Hong-Xing "Koven" Yu

@Koven_Yu

Published: December 28, 2024
1/6
08:46 AM

🤩Forget MoCap -- Let’s generate human interaction motions with *Real-world 3D scenes*!🏃🏞️ Introducing ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation. No training, No MoCap data! 🧵1/5 Web: https://awfuact.github.io/zero...

2/6
08:49 AM

Generating 4D human-scene interaction motions is central to gaming/VR/robotics. Yet existing methods generally require many Motion-Scene pairs for training, which are expensive💸 and infeasible to collect across diverse real-world scenes ❌.

3/6
08:50 AM

We propose ZeroHSI to generate 4D interactions without requiring any MoCap data. Instead, our main idea is to distill human motions from a well-trained video generation model 🎥 that has already seen many human videos.

4/6
08:50 AM

The technical idea is very simple: We generate a human-scene interaction video for the 3D scene, and then we use differentiable human rendering to extract the 3D human motion.
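
To make the "render, compare, optimize" step concrete, here is a toy sketch, not the ZeroHSI code. It assumes a rigid stick-figure template, a pinhole camera, and 2D keypoints standing in for video frames; the names `TEMPLATE`, `FOCAL`, `render`, `fit_frame`, and `fit_motion` are all hypothetical. It recovers a per-frame 3D root translation by gradient descent on the reprojection error, which is the same analysis-by-synthesis pattern in miniature.

```python
# Toy analysis-by-synthesis: recover a 3D root translation from 2D keypoints.
# Template, camera, and optimizer are illustrative assumptions,
# not the ZeroHSI implementation.

FOCAL = 1.0  # hypothetical pinhole focal length

# Hypothetical rigid stick-figure template: (x, y, z) joint offsets.
TEMPLATE = [
    (0.0, 0.0, 0.0),    # pelvis
    (0.0, 0.7, 0.0),    # head
    (-0.6, 0.3, 0.0),   # left hand
    (0.6, 0.3, 0.0),    # right hand
    (-0.2, -0.9, 0.0),  # left foot
    (0.2, -0.9, 0.0),   # right foot
]


def render(t):
    """Project the translated template to 2D with a pinhole camera."""
    points = []
    for jx, jy, jz in TEMPLATE:
        x, y, z = jx + t[0], jy + t[1], jz + t[2]
        points.append((FOCAL * x / z, FOCAL * y / z))
    return points


def loss_and_grad(t, observed):
    """Squared reprojection error and its analytic gradient w.r.t. t."""
    loss = 0.0
    grad = [0.0, 0.0, 0.0]
    for (jx, jy, jz), (u_obs, v_obs) in zip(TEMPLATE, observed):
        x, y, z = jx + t[0], jy + t[1], jz + t[2]
        du = FOCAL * x / z - u_obs
        dv = FOCAL * y / z - v_obs
        loss += du * du + dv * dv
        grad[0] += 2.0 * du * FOCAL / z
        grad[1] += 2.0 * dv * FOCAL / z
        grad[2] += -2.0 * FOCAL * (du * x + dv * y) / (z * z)
    return loss, grad


def fit_frame(observed, init, lr=0.3, steps=2000):
    """Plain gradient descent on the translation for one frame."""
    t = list(init)
    for _ in range(steps):
        _, grad = loss_and_grad(t, observed)
        t = [ti - lr * gi for ti, gi in zip(t, grad)]
    return t


def fit_motion(frames, init):
    """Fit each frame in order, warm-starting from the previous result."""
    motion, t = [], list(init)
    for observed in frames:
        t = fit_frame(observed, t)
        motion.append(t)
    return motion


# Synthetic "video": keypoints rendered from a known motion.
true_motion = [(0.2, -0.1, 3.0), (0.3, -0.1, 3.1), (0.4, -0.1, 3.2)]
frames = [render(t) for t in true_motion]
recovered = fit_motion(frames, init=(0.0, 0.0, 2.5))
print(recovered)
```

A real pipeline would optimize a full articulated body model against rendered images rather than a rigid template against keypoints; the sketch only shows why a differentiable renderer lets video evidence drive 3D motion parameters.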

5/6
08:51 AM

Our ZeroHSI works with both (1) static scenes and (2) dynamic scenes with interactable objects.

6/6
08:53 AM

See our project website https://awfuact.github.io/zero... for more visualizations! Work done w/ Hongjie Li (summer intern in our group), @jiaman01, and @jiajunwu_cs at @StanfordSVL @StanfordAILab.
