URL: https://dev.to/t/reinforcementlearning
Reinforcementlearning - DEV Community
The Whole Paper Fits in One Sigmoid: Implementing the SDAR Gate
Shoaibali Mir
Jun 14
#machinelearning #reinforcementlearning #python #aws
1 comment
5 min read
Four Models in One Training Loop: Architecting SDAR on AWS (Before Renting a Single GPU)
Shoaibali Mir
Jun 6
#aws #machinelearning #reinforcementlearning #mlops
5 min read
How to Add Live Telemetry and Failure Diagnosis to Isaac Lab, MuJoCo, or Gazebo Training in Under 5 Minutes
SimTooReal
Jun 6
#ai #robotics #mujoco #reinforcementlearning
4 min read
Why robotics RL training pipelines fail at scale
Robosynx
May 30
#robotics #machinelearning #reinforcementlearning #simulation
4 min read
ARTIST: RL-Powered Tool Use for LLM Agents Explained
Jangwook Kim
May 27
#reinforcementlearning #llmagents #tooluse #agenticai
9 min read
Q-Learning for Games: Teaching an Agent Tic-Tac-Toe Through Self-Play
Berkan Sesen
May 11
#reinforcementlearning #gametheory
14 min read
Your RL Agent Failed a 12-Step Task. Which Step Was Wrong? (The Supervision Problem in Agentic RL)
Shoaibali Mir
May 31
#machinelearning #reinforcementlearning #llm #aws
2 comments
5 min read
Value Iteration vs Q-Learning: Dynamic Programming Meets RL
Berkan Sesen
May 4
#reinforcementlearning #optimisation #dynamicprogramming
12 min read
Solving CartPole Without Gradients: Simulated Annealing
Berkan Sesen
Apr 23
#reinforcementlearning #optimisation
13 min read
The Cross-Entropy Method: Solving RL Without Gradients
Berkan Sesen
Apr 21
#reinforcementlearning #optimisation
1 reaction
12 min read
Self-Learning AI Agents; Architectures and Challenges
Vishal Uttam Mane
Apr 21
#selflearningai #aiagents #agentarchitecture #reinforcementlearning
1 reaction
1 comment
3 min read
Policy Gradients: REINFORCE from Scratch with NumPy
Berkan Sesen
Apr 8
#reinforcementlearning #deeplearning #optimisation
16 min read
Deep Q-Networks: Experience Replay and Target Networks
Berkan Sesen
Apr 6
#reinforcementlearning #deeplearning #optimisation
18 min read
Q-Learning from Scratch: Navigating the Frozen Lake
Berkan Sesen
Apr 4
#reinforcementlearning #optimisation
11 min read
Evolution Is Back: A New Way to Fine-Tune LLMs
Ankit Dey
May 4
#ai #reinforcementlearning #machinelearning #coding
1 reaction
7 min read