Mar 2025 Edit:
– Did a recent update on the numbers.
– I guess I’ve also updated what I consider AGI. I’m not assuming that people fully buy into this Metaculus prediction, but sharing mine if I were to anchor on it. I feel like novel scientific research needs to be part of the picture: an AI model building and testing hypotheses across many different scientific fields, including those that require physical research components and orchestration of research groups, and being able to produce reputable scientific research or file a patent over a discovery.
– Also, the Ferrari assembly doesn’t seem like a sufficient test of multimodality. There’s a clear business incentive for the visual processes in car assembly to be supported by existing data, but that doesn’t necessarily speak to broader generalization. We need more everyday, real-world scenarios to really test a model’s generalizability. This isn’t the best example, and I wouldn’t call it AGI if a model could reason through it, but it’s in the direction I’m thinking: your toaster is broken, you’re leaving for work in 10 minutes, and all your pans are in the dishwasher. Does your helper agent consider that the best move might be to quickly wash a pan and toast the bread on the stove, rather than trying to fix the toaster or start a whole new meal from scratch?
August 2024 Edit: This was originally posted in November 2023. I’ve kept the text almost entirely as it was, and I only update the lines about the main prediction when my prediction has changed significantly.
I feel so lost in conversations these days. I’m so surprised that it’s only been a year since ChatGPT came out; it feels like it’s been here my whole life. Especially since moving out, so many of my conversations happen in online chats, and it’s been difficult to communicate. I don’t care about my timelines at all, but this post is going to save me a ton of time, so here it is 😀
Metaculus AGI prediction:
I think the tasks here are widely different from each other, combining purely cognitive and purely physical-cognitive tasks together, leading to a clear bottleneck, but anyway.
Q&A Benchmark: 2025-2026
MMLU, a pretty similar benchmark, is already at >80%, so this should be solved in 2025 or 2026.
APPS (Algorithmic Reasoning): 2027
MATH and SWE-bench are both kind of indicative of this benchmark? And scores on them seem to have doubled per year, so APPS should catch up? 2027 feels right, idk.
Turing Test (2-hour, multi-modal): ~2030
This one’s tricky. Depending on the judge, this convo can take an entire day or just 10 mins. I guess what matters here is the context length and multi-modality. Also, does the AI need to be embedded in a hardware system that has continuous sensory input? Idk, I’ll index this similarly to the Ferrari assembly question below, anchoring on the difficulty of processing sensory information.
Ferrari Assembly (Real Robots!): 2035-2037
Are we talking about Ferrari actually implementing this? Because if they have “some” business understanding, the answer is “probably never”: they should keep the luxury, human-hand-made brand. I mean, we still pay a ton for concerts even though we have Spotify, so someone will want handmade cars, right?
Whatever. In terms of ability, I still think that specific tasks require mastery: AI will probably produce a car that is functional, but it may need a recall because a small detail was not perfect. So I’d anchor this on robotics, and robotics has a long way to go. I think it’s plausible that AI can instruct someone to make the car by like 2030, but it’ll probably take another 5 years until it can make it itself, so maybe by 2035?
Big Picture Prediction: 2035, anchored on the Ferrari task.
Range? 2032-2038. +/- 3 years. Why? Idk.
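A rough sketch of the anchoring logic, assuming the question only resolves once every task is done (that’s my reading of the bottleneck point above): the overall date is just the latest per-task date, with the +/- 3 years put around it. The names `task_estimates` and `aggregate_forecast` are made up for illustration, and where I gave a range I plug in the earlier year, which is also an assumption.

```python
# Per-task point estimates taken from the sections above.
# Where the post gives a range, the earlier year is used (an assumption of this sketch).
task_estimates = {
    "Q&A Benchmark": 2025,
    "APPS": 2027,
    "Turing Test": 2030,
    "Ferrari Assembly": 2035,
}


def aggregate_forecast(estimates, margin=3):
    """Return (bottleneck task, year, (low, high)) when all tasks are required.

    The slowest task sets the overall date; the margin is the +/- range
    applied symmetrically around it.
    """
    bottleneck = max(estimates, key=estimates.get)
    year = estimates[bottleneck]
    return bottleneck, year, (year - margin, year + margin)


task, year, (low, high) = aggregate_forecast(task_estimates)
print(f"{task}: {year} (range {low}-{high})")
# prints "Ferrari Assembly: 2035 (range 2032-2038)"
```

Taking the max means the three cognitive estimates don’t move the headline number at all; only the Ferrari estimate does.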
Assumptions:
- Robotics: there might be a breakthrough, like data farms work out? idk.
- Compute? Will scale, at least enough to support the cognitive tasks.
- Regulations: won’t slow things down much
- Warning shots: hopefully not by then, but bio stuff is scary


