Ashish Kapoor's Twitter Thread

The future of robotics is RL with synthetic data. GRPO could teach robots to learn like humans do. But implementing it for robotics is non-trivial. Here's where this breakthrough technology remains trapped:

I've seen many algorithmic breakthroughs that don't translate well to robots. GRPO (Group Relative Policy Optimization) might be one of the most significant. It's a reinforcement learning method that could revolutionize how robots learn. But there's a fundamental issue ...

First, let's understand what makes GRPO special: In robotics, we need to teach machines to navigate complex tasks. Most RL methods use scalar rewards - but this isn't enough for sophisticated robot learning. GRPO might hint at a fundamentally different approach:

Instead of judging each reward in isolation, GRPO works by comparing different experiences: It generates multiple paths from the current strategy. Then it checks how well each path worked, scoring each one relative to the others in the group. This provides a clear gradient for improvement. But there's a catch:
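As a rough illustration of the comparison step: GRPO, as usually described, scores each attempt by how far its reward sits from the group average, scaled by the group's spread. The sketch below assumes one scalar reward per path; the function name and the example rewards are made up for illustration and are not from GRID or any particular library.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Score each attempt relative to the other attempts in its group.

    Attempts that beat the group average get a positive score,
    attempts that fall short get a negative one.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    # eps keeps the division safe when every attempt scored the same
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for four paths attempted on the same task
rewards = [1.0, 3.0, 2.0, 6.0]
advantages = group_relative_advantages(rewards)
```

No separate value network is needed for the baseline here: the group itself plays that role, which is why several attempts at the same task are required.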

To make these comparisons, GRPO needs multiple attempts at the same task. In simulation, this is straightforward - you can run thousands of scenarios. In the real world? It's far too expensive and often impractical. This reveals the core challenge of modern robotics.
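To see why simulation makes this cheap, here is a toy sketch: a 1-D point that steps toward a goal with noisy actions, reset to the same start state for every attempt. Everything in it (the `rollout` and `sample_group` names, the proportional controller standing in for a policy, the noise level) is an assumption for illustration, not a real robotics simulator.

```python
import random

def rollout(start, goal, steps, noise, rng):
    """One attempt: a 1-D point moves toward the goal with noisy actions."""
    x = start
    for _ in range(steps):
        # Stand-in for a stochastic policy: move halfway to the goal, plus noise
        action = 0.5 * (goal - x) + rng.gauss(0.0, noise)
        x += action
    return -abs(goal - x)  # reward: negative final distance to the goal

def sample_group(start, goal, group_size=8, steps=10, noise=0.3, seed=0):
    """Repeat the same task from the same start state; only the noise differs."""
    rng = random.Random(seed)
    return [rollout(start, goal, steps, noise, rng) for _ in range(group_size)]

# Eight attempts at one task, all starting from the identical state
rewards = sample_group(start=0.0, goal=5.0)
```

In a simulator, the reset to `start` is a single assignment. On a physical robot, each of those eight attempts means physically returning the machine and its surroundings to the same configuration.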

Take autonomous drones for example: GRPO can help them avoid obstacles while efficiently reaching the destination. But without simulation, testing multiple paths becomes prohibitively expensive. This creates a fascinating paradox:

The most promising learning algorithms require extensive trial and error. But real-world trial and error is too expensive and risky. This is why breakthroughs like GRPO require simulations. Let me show you the potential we're missing:

GRPO could revolutionize:
• Path Planning & Navigation
• Robot Manipulation & Grasping
• Multi-Robot Coordination
• Warehouse Robotics
But without better simulation technology, these advances can't reach their full potential.

The solution requires simulation frameworks that can:
1. Generate multiple solutions
2. Evaluate them meaningfully
3. Bridge the gap between simulation and reality
This is especially crucial for aerial, space and underwater robotics. But there's hope on the horizon:

At Scaled Foundations, we're building GRID - a framework designed to bridge this gap. As simulation technology matures, robots will become self-learning in software. Because the future of robotics isn't just about better algorithms. It's about better tools to implement them.

The robotics revolution won't come from a single breakthrough. It will come from building the infrastructure that lets breakthroughs reach the real world. This is why I started @scafo to focus on such problems. The potential is too important to ignore:

Imagine robots that can:
• Learn from comparing experiences
• Make controlled, gradual improvements
• Optimize their strategies through real-time comparison
All possible with GRPO and future methods that will truly harness the power of sims.

The future of robotics isn't about building better robots. It's about building better ways for robots to learn. This is why I'm dedicating my work to solving this fundamental challenge. The next decade of robotics depends on it.

Want to be part of the AI/Robotics revolution? Try out the GRID Platform today! https://www.scaledfoundations.... Show us what you can do!

I hope you've found this thread helpful. Follow me @akapoor_av8r for more. Like/Repost the quote below if you can:

Video credits: • YT link https://www.youtube.com/watch?...

@akapoor_av8r Add loop to the mix https://arxiv.org/abs/2502.016...

@akapoor_av8r The exact tweet thread I was looking for: what GRPO's impact on robotics is. Thanks for sharing, Ashish!

@akapoor_av8r Ah it's so over

@akapoor_av8r RL with synthetic data is powerful, and real-world adaptability thrives on diverse, high-quality motion data. The more movements captured, the smarter robots become. 🤖📡

@akapoor_av8r Yes, left hemisphere can be entirely learned by simulation. But who works on the right hemisphere?

@akapoor_av8r Totally agree—GRPO has huge potential, but the real challenge is making RL work outside of perfect simulations. The future is in better augmentation and sim-to-real transfer. I’m working on an augmentation platform myself, so I’d love to hear more about what you’re building! :)

@akapoor_av8r Synthetic data + GRPO's hierarchical learning could revolutionize robotics, but we're grappling with: vector-based reward modeling, latent space optimization, and high-dimensional motor control challenges. Until we solve cross-embodiment transfer, human-like robotic adaptation

@akapoor_av8r While GRPO shows promise for human-like robot learning through synthetic data, key challenges remain: reward function design, sim-to-real transfer gaps, and high-dimensional action spaces. Solving these could unlock truly adaptive robotics, but we need advances in both algorithms

@akapoor_av8r glad to see MuSHR getting some action! I have fond memories building it

@akapoor_av8r Amazing! I love seeing simulations like this

@akapoor_av8r We’re at the edge of something huge, but implementation is always the hardest part. 👏

@akapoor_av8r Do you think synthetic data could fully substitute real world data?

@akapoor_av8r Do systems like this need Nvidia H100's and H200's??? If so, are you looking for some? Let's DM 💯🫡 I have on hand in inventory

@akapoor_av8r Looks cool. Which languages, frameworks, and tools do you use for robotics, training and creating such animated visuals?

@akapoor_av8r https://x.com/gurtej__gill_/st...

@akapoor_av8r @BrianRoemmele

@akapoor_av8r Awesome

@akapoor_av8r Exciting potential! Bridging sim-to-real is the key challenge.

@akapoor_av8r Don’t use GRPO for robotics. The multi-turn and noisy nature will cause too much variance in the estimate of the baseline, which was why PPO had all the ugly GAE stuff at the beginning

@akapoor_av8r Another hype post with an ad. The first video is what attracted me, but there is no sign that it was done with GRPO, and the algorithm is just a variant of PPO. There are a ton of other RL algorithms that perform way better.

@akapoor_av8r I believe storage, along with processing, is the biggest issue in real-time scenarios. Storing the knowledge remotely (for non-instant scenarios) seems like a really good idea. The robot will just need a GSM module and good internal storage.
