EG "make me a 120 minute Star Trek / Star Wars crossover". It should be more or less comparable to a big-budget studio film, although it doesn't have to pass a full Turing Test as long as it's pretty good. The AI doesn't have to be available to the public, as long as it's confirmed to exist.
@elf Still think there's a long way to go here, but this is definitely the type of thing that puts this way closer to possible than it was a week ago.
@elf I was thinking of selling my NO shares anyways due to ~AI 2027 and other stuff updating my timelines a bit shorter in general, but this is a good specific nudge. XD
A lot of human effort went into fine-tuning that model.


So humans had to write annotations for each 3-second segment of training data. With 81 five-minute episodes, that's 8,100 annotations. Each annotation was ~98 words. So that's 793,800 words of human effort to fine-tune the model.
Naively extrapolating from the 1-minute Tom and Jerry model to a 120-minute Star Wars/Star Trek crossover, we would need 810 hours of training footage, broken down into 3-second segments, each annotated with ~98 words, requiring 95 million words of human annotation in sum. (I'm not sure if the training footage could be a mix of Star Wars and Star Trek movies, or if it would have to be specifically from crossover movies.) Once you had that model, you'd be able to generate more 2-hour Star Trek/Star Wars crossovers with very little human work. But it wouldn't generalize to, say, a new Fast and Furious movie.
So, napkin math, this method needs 23 doublings in efficiency before it can generate arbitrary 2-hour movies with 10 words of human prompting, e.g. "make me a 120 minute Star Trek / Star Wars crossover".
HOWEVER, even though 95 million words is way more effort than a consumer would put into a movie for their own enjoyment, it would be a trivial cost for a Disney budget. 100 people, working in parallel, could do it in a few months. So if we're asking "When will Hollywood make movies with AI, with no human actors, cameras, or 3D modeling?" then human effort is less of a constraint than training footage. We don't have 810 hours of Star Wars movies. You can get about 200 hours if you include all movies, spinoff movies, and TV shows, but if the AI were fine-tuned on all of those it would output a jarring mix of styles. If we want our training data to maintain the vintage original trilogy style, we'd have to limit our training data to only the 7 hours available in the OT. That would mean the Tom and Jerry model would need to double in efficiency 6.8 times before it could make Star Wars movies.
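If anyone wants to check the napkin math, here's a quick Python sketch (all figures are the rough estimates from the comments above, so treat the outputs as order-of-magnitude at best):

```python
import math

# Rough estimates from the discussion above.
EPISODES = 81                 # five-minute Tom and Jerry episodes used for fine-tuning
MINUTES_PER_EPISODE = 5
SEGMENT_SECONDS = 3
WORDS_PER_ANNOTATION = 98

segments = EPISODES * MINUTES_PER_EPISODE * 60 // SEGMENT_SECONDS   # 8,100 segments
tom_and_jerry_words = segments * WORDS_PER_ANNOTATION               # ~793,800 words

# Naive extrapolation from 1-minute outputs to a 120-minute crossover.
scale = 120 / 1
footage_hours = EPISODES * MINUTES_PER_EPISODE / 60 * scale         # ~810 hours
crossover_words = tom_and_jerry_words * scale                       # ~95 million words

# Efficiency doublings needed to go from 95M annotation words down to a 10-word prompt.
doublings_to_prompt = math.log2(crossover_words / 10)               # ~23

# If footage (not labor) is the constraint: 810 hours needed vs ~7 hours of original trilogy.
doublings_for_ot = math.log2(footage_hours / 7)                     # ~6.8

print(f"{tom_and_jerry_words:,} annotation words for the Tom and Jerry model")
print(f"{crossover_words:,.0f} words / {footage_hours:.0f} hours for a 2-hour crossover")
print(f"{doublings_to_prompt:.1f} doublings to reach a 10-word prompt")
print(f"{doublings_for_ot:.1f} doublings to fit the OT footage budget")
```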
The market asks if an AI can generate a video in response to "a prompt". The sample prompt listed is 10 words long. If someone makes an "AI movie" by splicing together hundreds of different videos on a model they had to pre-train and micromanage, that doesn't meet the standards gestured towards in the market description.
To truly see how far away we are from meeting the market resolution, ask yourself "What's the best movie people can make today with 10 words of prompting? No pretraining the model beforehand, off-the-shelf software only. Ignore videos made by humans who invested hours behind the scenes."
@GG They can use a longer single prompt than 10 words, but yeah, the general idea sounds correct. Once again, this market looks to me like "will we get noticeable RSI in AI research in the next couple years, sufficient to greatly accelerate AI progress overall, thereby enabling AI-generated movies as a side effect."
If AI task-horizon trends & reasoning-model math/coding/etc. improvements continue at current rates, plus there aren't any multi-year unexpected delays due to difficulties in agent scaffolding for unpredictable reasons, the data wall isn't insurmountable in the near term, etc., it could totally happen. But all of that seems less than 1/3 likely by the resolution date.
@GG Why does it matter if it's a 10 word prompt?
I mean, you could literally have an LLM take a 10-word prompt and turn it into a 100-word prompt. Even some image generators can do that: take your prompt, and then improve it.
@JonTdb03 You're correct, an AI could take the end user's 10-word prompt, feed it into an LLM that expands on the prompt, and then send the expanded prompt to a different AI that makes the video. The important thing is that the human asking for a movie only has to put in 10 words of work. The rest of the work needs to be done by AI.
The "AI movies" that go viral these days still require tons of human work behind the scenes. Example 1, example 2, example 3. These movies, even if they were feature length and Hollywood quality, would not resolve this market YES, because they require far more human collaboration than the 10-word prompt given in the example.
ChatGPT o3-mini-high estimates producing a Hallmark movie takes ~20,000 person-hours of work, and producing a Hollywood romantic comedy takes ~120,000 hours. An AI that automated 90% of that labor would wreak havoc on those employed in the industry, but still wouldn't meet the criteria to resolve this market YES.
@GG Again, your general point is correct, but I really don't think Scott meant the literal 10-word portion of the resolution criteria to be load-bearing. One prompt's worth of work, where the prompt is a 150-word exercise in clever prompt engineering, would almost certainly not prevent a YES resolution, and I'm happy to make a market to bet on it if you disagree. ;P
@DavidHiggs Agreed, and maybe we're talking past each other, but the line needs to be drawn somewhere. The difference between writing 10 words and writing 150 words is small, logarithmically, compared to the amount of work that needs to be done now to make a 5-minute AI movie, to say nothing of a 2-hour movie. The window in which AI can make a movie with 150 words of human help, but not 10 words, is short, so I doubt Scott would have to make a determination. (I suspect he would come down on your side, though.)
My bigger concern is if AI video is good enough to eliminate 90% of human labor demand, but is still far short of "ChatGPT for feature-film movies". Some people would think that should resolve the market YES, when I think it would clearly be NO.
Don't mind me, I gotta pay rent https://manifold.markets/NeoMalthusian/sp-500-drops-8-by-april-2025-postli
Because if it's doubling every 3 months then that graph goes way past 10 million dollars by Jan 2028


@MalachiteEagle ChatGPT says 80-90% of a TV commercial's budget goes to purchasing air time, not production of the commercial itself. So a free AI would still need to be 80-90% as good as humans (as measured in conversions/airtime) in order to be economically viable. If the AI commercial costs half as much to produce, it needs to be 90-95% as good.
I don't know if we'll get there in 18 months, but we're certainly not there now.
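To make the break-even arithmetic explicit, here's a tiny sketch (the 80-90% airtime share comes from the ChatGPT estimate above; the rest is just algebra on that assumption):

```python
def min_relative_effectiveness(production_share: float, ai_cost_ratio: float) -> float:
    """Minimum effectiveness (conversions per airtime dollar) an AI-made ad needs,
    relative to a human-made ad, for the campaign to break even.
    production_share: fraction of the budget spent on producing the ad (vs. airtime).
    ai_cost_ratio:    AI production cost as a fraction of human production cost."""
    airtime_share = 1.0 - production_share
    return airtime_share + production_share * ai_cost_ratio  # human campaign normalized to 1

# Free AI ad, production is 10-20% of the budget:
print(min_relative_effectiveness(0.20, 0.0))   # 0.80 -> ~80% as good
print(min_relative_effectiveness(0.10, 0.0))   # 0.90 -> ~90% as good

# AI ad that costs half as much to produce:
print(min_relative_effectiveness(0.20, 0.5))   # 0.90
print(min_relative_effectiveness(0.10, 0.5))   # 0.95
```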
@GG you're pointing to the most expensive commercials though: the ones on TV. Think of all those localized and super targeted commercials on YouTube. Quite a lot of that will be AI-generated very soon.
@GG most people don't know how to use adblock. Think of all the crappy 30-second ad videos they're seeing on their phone etc
@GG Disruption starts at the lower end, not the higher end. If in 18 months' time the majority of these are AI-generated, even if these are the lowest-tier ads, that still suggests awesome amounts of money flowing into AI video gen.
The 3-month doubling model starting from 10k dollars suggests someone will have spent 1 million dollars on a single AI-generated video by November 2026. I think this is highly plausible.
Bear in mind this video was from November 2024: https://www.youtube.com/watch?v=4RSTupbfGog
@MalachiteEagle Your model suggests that an AI video cost $3,900 in November 2024. The Real Magic Coca-Cola ad, with a typical production budget, would have cost $500,000 to $2 million.
So either
A) Coca-Cola spent a huge amount of money on an AI ad, 2 OOMs more than your model suggests.
B) The vast majority of the Real Magic production budget was spent on humans and traditional video editing expenses.
We haven't seen another big budget AI commercial. I suspect the Real Magic commercial was done for the novelty.
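For what it's worth, here's the 3-month doubling model written out; the $10k anchor date is my own assumption, picked so the curve passes near the figures quoted in this thread (~$3,900 in Nov 2024, ~$1M by Nov 2026):

```python
from datetime import date

ANCHOR_DATE = date(2025, 3, 1)   # assumed date at which max single-video spend ~= $10k
ANCHOR_SPEND = 10_000
DOUBLING_MONTHS = 3

def max_single_video_spend(on: date) -> float:
    """Projected maximum spend on a single AI-generated video under the doubling model."""
    months = (on.year - ANCHOR_DATE.year) * 12 + (on.month - ANCHOR_DATE.month)
    return ANCHOR_SPEND * 2 ** (months / DOUBLING_MONTHS)

print(f"Nov 2024: ${max_single_video_spend(date(2024, 11, 1)):,.0f}")   # ~$4,000
print(f"Nov 2026: ${max_single_video_spend(date(2026, 11, 1)):,.0f}")   # ~$1.0M
print(f"Jan 2028: ${max_single_video_spend(date(2028, 1, 1)):,.0f}")    # ~$26M
```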
@GG I suspect they just did multiple calls to whatever their "Real Magic AI" thing was and selected the best versions. I think 4k dollars of inference compute sounds pretty close. The models they're using are too low-quality to motivate spending more than that.
@MalachiteEagle
> Like what is the maximum amount of inference compute in dollar money someone has spent on generating a single video right now with AI?
For reference, this 3 minute video cost $16 (a full 1.5 hour movie would cost around $200). Given the quality, I don't think anyone would pay even $1 for such a movie.
Costs need to come down by 2 OOM or quality needs to dramatically improve. Most likely both will happen simultaneously, since as cost comes down it's easier to experiment leading to higher quality.
If we imagine a curve like this:

Where we ask "what is the longest video clip that AI can generate such that a human would enjoy watching it at a 50% rate?", I would say that we are currently around the 30-second mark.
Suppose the doubling rate is exactly the same (1 doubling every 7 months); then we will have 5 doublings = a factor of 32 between now and early 2028.
So (take this with a huge grain of salt), I would expect that in 2028 we will be able to generate a video about 15 minutes long, and we wouldn't be able to generate a "movie" (2 hours) until 1.5 years after that (late 2029).
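Here's that extrapolation in code form (both the ~30-second starting point and the 7-month doubling time are guesses, so the dates are soft):

```python
import math

CURRENT_LENGTH_MIN = 0.5    # longest "enjoyable" AI clip today: ~30 seconds (rough guess)
DOUBLING_MONTHS = 7

def enjoyable_length(months_from_now: float) -> float:
    """Projected longest enjoyable clip length (minutes) under the 7-month doubling model."""
    return CURRENT_LENGTH_MIN * 2 ** (months_from_now / DOUBLING_MONTHS)

def months_until(target_minutes: float) -> float:
    """Months until the model predicts clips of the target length."""
    return DOUBLING_MONTHS * math.log2(target_minutes / CURRENT_LENGTH_MIN)

print(f"Early 2028 (~35 months out): {enjoyable_length(35):.0f} minutes")   # ~16 minutes
print(f"Months until a 2-hour movie: {months_until(120):.0f}")              # ~55 (late 2029-ish)
```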
Of course, that curve could bend in either direction. For one thing, movie generation hasn't been a primary focus of most AI labs. (Runs tend to be in the millions of dollars for video models vs billions for the largest LLM runs.) Movie generation will also get a huge boost from "fully multimodal out" models (similar to the 4o model that started the Studio Ghibli craze). It's hard to know when exactly someone will train a fully multimodal video+images+text+sound input+output model, but I strongly suspect it won't happen until the inference cost for generating a video drops from a few dollars a minute (currently) to a few cents.
If I fully believed the curve above, I should probably sell my "yes" shares. But I think there is significant room for surprise to the upside.
The most likely reasons the curve would bend down instead of up: the AI boom fizzles and people stop investing billions of dollars in training models, or China invades Taiwan, TSMC is destroyed as a result, and Moore's law is set back by 2-4 years (Intel is behind, but they aren't that far behind).
Personally I have a hard time believing the AI boom will fizzle in the next 3 years. We still have a lot of headroom in how much we can improve reasoning-style models in ways that are clearly economically valuable, and that's not even touching on adjacent areas like robotics.
Manifold is also relatively optimistic about Taiwan.
I should also emphasize that $200 is how much it would cost to generate a 2-hour film using a python script that I wrote in 1 weekend. If for some reason you wanted to spend much more, you could get a significant boost in quality.
Here are some things you could do (in order of how practical they are):
* You could use a better video model (say, Veo 2 instead of Wan; this would double the price)
* You could use a more expensive LLM (I would recommend GPT 4.5) for tasks like generating a script and converting it to a sequence of shots. (This would cost a few dollars per minute, so add a few hundred dollars for the entire film)
* You could fine-tune a LoRA on each character in your film in order to guarantee near-perfect subject coherence throughout the film. (This costs ~$6/character, so for a film with dozens of characters that appear in more than one shot this could be hundreds of dollars)
* You could use a video-reasoning model (such as Gemini) to rate video clips and choose the "best out of 10" or something like this. (Assume we simply 10x the price of generating the film from $200 to $2000)
* You could fine-tune an LLM on the 2 tasks: writing a script and converting that script into a sequence of shots for a video-generation model. (The expensive part of this would be collecting the data. Fine-tuning an LLM is comparatively cheap. Realistically, you're talking about hiring a team of data scientists, annotators, etc., which is $$$, or you can convince Furries to do it for free.)
* You could train a fully multi-modal model for video+text+audio in/out. (This will cost between $6 million if you are DeepSeek and $30 billion if you are Meta AI)
* You could lobby Congress that we need a Manhattan Project-level effort to create AGI. (Price tag: $500B - $7 trillion)
So, yeah, if anyone wants to do some insider trading: for a mere $7 trillion I can promise you this question resolves YES.
@LoganZoellner Video length keeps coming up as a metric. I mostly agree with your points, except for this.
It's not like student films are all 5 minutes, and young professionals make 22-minute TV episodes, and only senior filmmakers make 90-minute films. Most people who have directed a successful TV show could, I'd expect, also direct a feature-length film without additional skills or training.
The difference between stringing together 6 shots or 600 is mostly the context length of the script. Solving the consistency issues for characters, environments, and art style is about the same. Maybe there's some work at each order of magnitude (e.g. 6 → 60 → 600; we know getting beyond 1 is hard), but does anyone here expect a model that can chain 54 clips together well but not 55?
It seems like the wrong metric.
@robm
> It's not like student films are all 5 minutes, and young professionals make 22-minute TV episodes, and only senior filmmakers make 90-minute films. Most people who have directed a successful TV show could, I'd expect, also direct a feature-length film without additional skills or training.
How long a task takes is much more important for p(success) for current AI models than it is for humans. This is empirically observable, but also relates to the fact that LLMs have a finite "attention window", meaning they literally cannot do things once they get past a certain length.
Humans, by contrast, are capable of long-term planning. Meaning that we can break a task into smaller chunks and (as you have noted) complete a large task if it is composed of small tasks we can do.
It would be a huge breakthrough in AI if someone solved long-term planning. The fact that this is theoretically possible is one of several reasons why I think this curve has a better chance of bending upwards than downwards.
> does anyone here expect a model that can chain 54 clips together well but not 55?
Notice that "task duration" on the graph is on a log scale. This means the difference between 1 minute and 2 minutes is the same as the difference between 54 clips and 108 clips, not 55. I can easily believe someone might train a multimodal LLM with a big enough attention window to produce a 54-minute movie but not a 108-minute movie. Sora, for example, can produce 60 seconds of continuous video but not 120 seconds.
There are obviously "hacks" you could use to get around this (for example, you could structure your film as a series of 10-minute episodes), but I think the graph is morally true more so than literally true. That is to say, tasks which generally require about an hour to do require a certain level of intelligence (which AI models have only just reached). This could relate to things like: how well you can break tasks down into parts, how many things you can keep in your memory at the same time, how hard it is to keep track of how the parts relate to one another...
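Putting the log-scale point in numbers (a tiny sketch; the only real data point here is Sora's ~60 seconds of continuous video):

```python
import math

def doublings(from_length: float, to_length: float) -> float:
    """Number of doublings between two lengths -- i.e. the distance on a log2 axis."""
    return math.log2(to_length / from_length)

print(doublings(1, 2))        # 1.0  (1 minute -> 2 minutes)
print(doublings(54, 108))     # 1.0  (54 clips -> 108 clips: the same-sized step)
print(doublings(54, 55))      # ~0.03 (why "54 but not 55" is the wrong framing)
print(doublings(1, 120))      # ~6.9 doublings from Sora's 1 minute to a 2-hour film
```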