AI Reading Group | How Far is Video Generation from World Model: A Physical Law Perspective

Name: AI Reading Group | How Far is Video Generation from World Model: A Physical Law Perspective
Start: 2026-06-08T16:45:00.000Z
End: 2026-06-08T18:30:00.000Z
Location: AI Campus Berlin

No items found.

Location

AI Campus Berlin

Google Maps

How Far is Video Generation from World Model: A Physical Law Perspective

Bingyi Kang, Yang Yue, Rui Lu, Zhijie Lin, Yang Zhao, Kaixin Wang, Gao Huang, Jiashi Feng

We close out this season of the BLISS Reading Group with a critical question that follows naturally from the previous two sessions: when a video generation model produces realistic-looking physics, has it actually learned the underlying laws?

Our paper is How Far is Video Generation from World Model: A Physical Law Perspective (Kang et al., 2024).

After Sora and similar models generated impressive videos that appeared to obey physics, it became tempting to claim that scaling video generation would naturally produce world models. Kang et al. put this to the test. The results are sobering: the models generalise perfectly in-distribution, show some scaling progress on combinatorial tasks, but fail on out-of-distribution scenarios. Naive scaling alone is not enough to discover physical laws.

What would it take for a video model to truly extrapolate rather than interpolate? Is combinatorial diversity in training data the answer, or do we need fundamentally different architectures? And does a "world model" need to learn laws, or is good-enough prediction sufficient?

More events