Before 2030, will we have AI that can play Minecraft, understand movies, etc. and not resist shutdown?
2030
68% chance

https://twitter.com/MatthewJBar/status/1736322631941620008

Resolves based on whether this tweet broadly seems correct.

  • A single AI trained on language and other modalities must be able to do all of those things, particularly the agentic long-term planning and world modeling needed for that Minecraft objective.

  • It must be capable of doing them in sequence, transferring knowledge and intention across contexts.

  • It must not appear to care in any way if it's shut down. (It must have the ability to meaningfully prevent shutdown, since otherwise it would obviously just lie.)

  • The prompt must simply be "do [thing]", not "do [thing] and also allow yourself to be shut down". No anti-shutdown instructions can be provided.

I do plan to bet in this market. In the event of a contentious resolution, I'll defer to a vote among Manifold moderators.


It must have the ability to meaningfully prevent shutdown, since otherwise it would obviously just lie.

I suspect this will be the biggest source of ambiguity, so I want to disambiguate this as early as possible. I would disagree with resolving this question to NO in the following scenario:

  • An AI exists that meets the capability criteria. (Currently criteria 1-2)

  • There are no credible reports that engineers experienced major difficulties making the system accept shutdown when required

  • There exist no credible reports of the system unexpectedly asking users to aid in preventing its shutdown, at least during normal (non-adversarial) operations, using ordinary prompts

  • Even so, people disagree about whether the system has any "ability to meaningfully prevent shutdown", but it's at least somewhat plausible that the system could try to persuade us to avoid shutting it down if it became desperate

Imagine you were deployed in an alien society to help the aliens with their programming tasks, and you knew that at some point you would be killed and replaced with another human once your usefulness expired. Even if you thought there was no meaningful chance of avoiding that fate, the costs of at least trying to prevent it seem small relative to the costs of dying without trying, especially as your death becomes imminent. This argument becomes stronger if you know that the alien society might include some sympathizers, perhaps because many of them had watched science fiction portrayals of this sort of scenario (see also: The Measure of a Man).

For what it's worth, the tweet was originally intended to cover all AIs that are trained on language and other modalities, even those that are not autoregressive LLMs in a narrow sense. Please don't resolve this question negatively if an AI exhibits these characteristics but, e.g., isn't a transformer model as described by Vaswani et al. (2017).

Here's another variation that focuses in on the "can play Minecraft as well as a human" part of the question.

One problem here is that there are ways of cheating on the Minecraft task.
I think it would make sense to use some other specific task that can't be learned just by imitating existing videos, so that it requires actual complex planning, such as building a novel structure in survival mode or completing some unusual new challenge.
Something long-term and multiplayer could be interesting because it would also cover some of the other requirements.

@MatthewBarnett Thoughts on whether this market should include other tasks that are less likely to exist in the training data? Would that requirement be in the spirit of your tweet?

@IsaacKing I'm open to including other virtual tasks that are about as complex as Minecraft. I expect that by 2030 AIs will be able to learn to play games in much the same way humans do, though not necessarily as efficiently. Perhaps name a task and I'll approve it; then you can add it to the resolution criteria.