1/1
@HaHoang411
Mind-blowing work by the team at @FLAIR_Ox! They've created Kinetix, a framework for training general-purpose RL agents that can tackle physics-based challenges.
The coolest part? Their agents can solve complex physical reasoning tasks zero-shot!
Congrats @mitrma and team.
[Quoted tweet]
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL!
We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments.
1/🧵
https://video.twimg.com/ext_tw_video/1856003600159256576/pu/vid/avc1/1280x720/zJNdBD1Yq0uFl9Nf.mp4
To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
1/12
@mitrma
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL!
We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments.
1/🧵
https://video.twimg.com/ext_tw_video/1856003600159256576/pu/vid/avc1/1280x720/zJNdBD1Yq0uFl9Nf.mp4
2/12
@mitrma
Kinetix can represent problems ranging from robotic locomotion and grasping, to classic RL environments and video games, all within a unified framework. This opens the door to training a single generalist agent for all these tasks!
2/
https://video.twimg.com/ext_tw_video/1856003839851220992/pu/vid/avc1/640x640/J_w1M8wm8ibiGCAn.mp4
3/12
@mitrma
By procedurally generating random environments, we train an RL agent that can zero-shot solve unseen handmade problems. This includes some where RL from scratch fails!
3/
https://video.twimg.com/ext_tw_video/1856003979878051840/pu/vid/avc1/720x720/JAcE26Hprn1NXPvU.mp4
4/12
@mitrma
Each environment has the same goal: make 🟩 touch 🟦 while preventing 🟩 touching 🟥. The agent controls all motors and thrusters.
In this task the car has to first be flipped with thrusters. The general agent solves it zero-shot, having never seen it before.
4/
https://video.twimg.com/ext_tw_video/1856004286943002624/pu/vid/avc1/720x720/hjhITONkJiDY9tD2.mp4
5/12
@mitrma
Our general agent shows emergent physical reasoning capabilities, for instance being able to zero-shot control unseen morphologies by moving them underneath a goal (🔵).
5/
https://video.twimg.com/ext_tw_video/1856004409559306241/pu/vid/avc1/994x540/AA6c6MHpWRkFt3OJ.mp4
6/12
@mitrma
We also show that finetuning this general model on target tasks is more sample efficient than training from scratch, providing a step towards a foundation model for RL.
In some cases, training from scratch completely fails, while our finetuned general model succeeds 👇
6/
https://video.twimg.com/ext_tw_video/1856004545525972993/pu/vid/avc1/1280x720/jMqgYcCwx-q4tSpm.mp4
7/12
@mitrma
One big takeaway from this work is the importance of autocurricula. In particular, we found significantly improved results by dynamically prioritising levels with high 'learnability'.
7/
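To make the 'learnability' idea concrete, here is a minimal JAX sketch of prioritised level sampling. The tweet does not define the score, so the form used here, p * (1 - p) for a level's success rate p, is an assumption, as are the function names `learnability` and `sample_levels` and the top-k sampling rule. It is an illustration of the general idea, not the authors' implementation.

```python
import jax
import jax.numpy as jnp


def learnability(success_rate):
    # Assumed score: p * (1 - p). It peaks at p = 0.5 and is zero for levels
    # that are always solved or never solved.
    return success_rate * (1.0 - success_rate)


def sample_levels(key, success_rate, batch_size, top_k):
    """Draw a training batch uniformly from the top_k highest-scoring levels."""
    scores = learnability(success_rate)
    _, top_idx = jax.lax.top_k(scores, top_k)  # indices of the best levels
    choice = jax.random.randint(key, (batch_size,), 0, top_k)
    return top_idx[choice]


# Hypothetical per-level success rates estimated from recent rollouts.
success_rate = jnp.array([0.0, 0.1, 0.5, 0.9, 1.0, 0.45])
batch = sample_levels(jax.random.PRNGKey(0), success_rate, batch_size=8, top_k=2)
```

With these made-up success rates, only levels 2 and 5 (success rates 0.5 and 0.45) can be drawn, since they are neither trivially easy nor currently impossible for the agent.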
8/12
@mitrma
The core of Kinetix is our new 2D rigid body physics engine: Jax2D. This is a minimal rewrite of the classic Box2D engine made by @erin_catto. Jax2D allows us to run thousands of heterogeneous parallel environments on a single GPU (yes, you can vmap over different tasks!)
8/
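As a rough illustration of what "vmap over different tasks" can look like, the sketch below pads every level to the same number of body slots and marks unused slots with a mask, so levels with different contents share one array shape. This is not the Jax2D API: `MAX_BODIES`, `make_level`, `step`, and the toy gravity-only physics are all hypothetical, and the padding-and-masking layout is an assumption about how heterogeneous levels can be batched.

```python
import jax
import jax.numpy as jnp

MAX_BODIES = 4  # hypothetical fixed capacity shared by every level
DT = 0.1        # hypothetical time step


def make_level(num_bodies):
    """Build a level with num_bodies active bodies, padded to MAX_BODIES slots."""
    active = (jnp.arange(MAX_BODIES) < num_bodies).astype(jnp.float32)
    return {
        "pos": jnp.zeros((MAX_BODIES, 2)),
        "vel": jnp.zeros((MAX_BODIES, 2)),
        "active": active,
    }


def step(state):
    """One explicit Euler step under gravity for a single level."""
    gravity = jnp.array([0.0, -9.81])
    mask = state["active"][:, None]  # inactive slots are left untouched
    vel = state["vel"] + DT * gravity * mask
    pos = state["pos"] + DT * vel * mask
    return {"pos": pos, "vel": vel, "active": state["active"]}


# Three different levels (1, 3, and 4 bodies) stacked into one batch.
levels = [make_level(n) for n in (1, 3, 4)]
batch = jax.tree_util.tree_map(lambda *xs: jnp.stack(xs), *levels)

# A single compiled function steps every level in parallel.
batched_step = jax.jit(jax.vmap(step))
next_batch = batched_step(batch)
```

Because every level has the same array shapes, the same `step` function covers all of them, and scaling from three levels to thousands only changes the leading batch dimension.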
9/12
@mitrma
Don't take our word for it, try it out for yourself!
Create your own levels in your browser with Kinetix.js and see how different pretrained agents perform.
9/
https://video.twimg.com/ext_tw_video/1856004915501350912/pu/vid/avc1/1422x720/7wj1y_BcHHUnNtwx.mp4
10/12
@mitrma
This work was co-led with @mcbeukman and done at @FLAIR_Ox with @_chris_lu_ and @j_foerst.
Blog: https://kinetix-env.github.io/
GitHub: FLAIROx/Kinetix (Reinforcement learning on general 2D physics environments in JAX)
arXiv: [2410.23208] Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
end/
11/12
@_k_sridhar
Very cool paper! FYI, we recently pretrained a generalist agent that can generalize to unseen atari/metaworld/mujoco/procgen environments simply via retrieval-augmentation and in-context learning. Our work uses an imitation learning approach. REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context In New Environments.
12/12
@mitrma
This is really cool! Let's meet up and chat at ICLR if we both end up going?