Generative Video Model Comparisons - through dragons!

Models vary in their quality and purpose, so to explore how different models actually behave, I decided to give them all the same challenge: a dragon queen flying above her kingdom.

Each model had this starting frame:

To make the comparison fairer, I wrote each prompt for that specific model (or at least I tried), but they all aim for the same shot. Each clip was generated at 720p (because, you know, credit budget).

Kling 3.0 Omni

One of the most interesting aspects of Kling Omni is that it approaches video generation differently from many competing models.

Most generative video models treat each prompt as a largely self-contained request. You provide a starting image and a prompt, and the model attempts to create motion that satisfies the instruction. The model understands the current frame extremely well, but its grasp of the broader narrative context is often limited: as each frame in a sequence is generated, it usually only takes into account the one just before it. That is why a shot can start out really great, yet by the end of a film (or even just within the shot) the details have shifted ever so slightly.

Kling Omni instead lets you load elements: descriptions of people, things or locations, with up to 4 reference images. So as the camera gets closer to her face, she looks as expected. And if I generated her again later in another shot, she'd still look the same there too.

Using the same element, I could put her in a totally different context and she’d still be largely consistent.
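
To get a feel for why a fixed reference helps, here is a toy sketch in Python. It is not how Kling or any other model works internally; it just treats one "detail" (say, the shape of her face) as a single number and compares two made-up strategies: building each frame on the previous one, versus building each frame on a fixed reference. The function names, the noise level and the frame counts are all hypothetical, chosen only for illustration.

```python
import random

# Toy illustration only: "detail" is a single number standing in for something
# like a character's face. Nothing here reflects a real model's internals.

def simulate_drift(num_frames, noise=0.05, anchored=False, seed=0):
    """Return how far the detail sits from the reference at every frame."""
    rng = random.Random(seed)
    reference = 1.0          # the look established by the starting frame
    current = reference
    deviations = []
    for _ in range(num_frames):
        # Chained: build on the previous frame, so small errors accumulate.
        # Anchored: build on the fixed reference, so errors do not accumulate.
        base = reference if anchored else current
        current = base + rng.gauss(0.0, noise)
        deviations.append(abs(current - reference))
    return deviations

def mean_final_drift(num_frames, anchored, trials=200):
    """Average deviation at the last frame over several random seeds."""
    finals = [
        simulate_drift(num_frames, anchored=anchored, seed=s)[-1]
        for s in range(trials)
    ]
    return sum(finals) / len(finals)

if __name__ == "__main__":
    print("chained :", round(mean_final_drift(100, anchored=False), 3))
    print("anchored:", round(mean_final_drift(100, anchored=True), 3))
```

In this toy setup the chained version wanders further from the starting look the longer the shot runs, while the anchored version stays close to it, which is roughly the behaviour I saw when using an element.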

Veo 3.1 Fast

Veo 3.1 (even the Fast version) is certainly one of the stronger models. It does have a tendency to do weird things with movement, and to create a sort of cartoonish overlay on the characters.

Ray 3.14

I LOVE Ray 3.14 for the way it handles fast-moving, epic cinematic shots.

Firefly

Yeah, nah. Not really suitable for this one, I reckon.

Runway Gen 4.5

The motions didn’t really work here.