
arindale t1_iqwduom wrote

You have to look at the source data. Specifically, it links to Metaculus's "Date Weakly General AI is Publicly Known" question. They further define this below (the bolded text in the original); I have added some notes of my own in [square brackets].

"For these purposes we will thus define "AI system" as a single unified software system that can satisfy the following criteria, all easily completable by a typical college-educated human.

Able to reliably pass a Turing test of the type that would win the Loebner Silver Prize. [The Silver Prize would be awarded to the first chatbot that judges cannot distinguish from a real human in a text-only Turing test, i.e., one that can convince judges that the human is the computer program.]

Able to score 90% or more on a robust version of the Winograd Schema Challenge, e.g. the "Winogrande" challenge or comparable data set for which human performance is at 90+% [The Winograd Schema Challenge is a commonsense-reasoning benchmark: a set of 273 expert-crafted pronoun-resolution problems originally designed to be unsolvable for statistical models. Recent neural language models have already reached around 90% accuracy on WSC variants; per Cornell University, this problem was effectively solved by 2019, but note that the models that solved it were narrow AI. A minimal evaluation sketch follows the quoted criteria below.]

Be able to score 75th percentile (as compared to the corresponding year's human students; this was a score of 600 in 2016) on the full mathematics section of a circa-2015-2020 standard SAT exam, using just images of the exam pages and having less than ten SAT exams as part of the training data. (Training on other corpuses of math problems is fair game as long as they are arguably distinct from SAT exams.) [I believe something like this was achieved as early as 2015, though that system may have effectively cheated by training on previous SAT tests. More recent work suggests that an AI can solve university-level math problems, which are harder; the link provided is one of many news articles on this. I consider this criterion likely solved.]

Be able to learn the classic Atari game "Montezuma's revenge" (based on just visual inputs and standard controls) and explore all 24 rooms based on the equivalent of less than 100 hours of real-time play (see closely-related question.)" [Montezuma's Revenge was solved in 2018 by Uber AI's Go-Explore. I am unsure whether it met the threshold of 100 hours of real-time play, but other models have been released since Uber's paper and one may already have met it; a sketch of how this criterion could be checked also follows below.]
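To make the Winogrande criterion concrete, here is a minimal evaluation sketch. It assumes the Hugging Face `datasets` package can load the `winogrande` dataset (with `sentence`, `option1`, `option2`, and `answer` fields) and uses a hypothetical `choose_option` function as a stand-in for whatever model is being tested; it illustrates the 90% threshold, not how Metaculus would actually grade resolution.

```python
# Sketch: estimate accuracy on the Winogrande validation split.
# `choose_option(sentence, option1, option2)` is a hypothetical model hook
# that returns "1" or "2" for whichever candidate filler the model prefers.
from datasets import load_dataset

def winogrande_accuracy(choose_option, config="winogrande_xl"):
    data = load_dataset("winogrande", config, split="validation")
    correct = 0
    for ex in data:
        # Each example is a sentence with a blank ("_") and two candidate fillers.
        prediction = choose_option(ex["sentence"], ex["option1"], ex["option2"])
        correct += int(prediction == ex["answer"])  # gold label is the string "1" or "2"
    return correct / len(data)

# The Metaculus criterion would correspond roughly to
# winogrande_accuracy(model_fn) >= 0.90 on a set where humans also score 90%+.
```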
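Similarly, the Montezuma's Revenge criterion can be framed as counting distinct rooms reached within a frame budget: 100 hours of real-time play at the Atari's ~60 frames per second is about 21.6 million frames. The sketch below assumes Gymnasium with the ALE Atari environments installed and uses a random-action placeholder policy; reading the room index from RAM byte 3 is a commonly cited convention for this game, but it is an assumption worth verifying.

```python
# Sketch: count distinct rooms reached within a 100-hour frame budget.
import gymnasium as gym  # assumes gymnasium[atari] / ale-py is installed

FRAME_BUDGET = 100 * 3600 * 60  # 100 h of real-time play at ~60 frames/s = 21.6M frames

def rooms_visited(policy=None, budget=FRAME_BUDGET):
    env = gym.make("ALE/MontezumaRevenge-v5", frameskip=1)  # frameskip=1 so steps == frames
    obs, info = env.reset()
    rooms, frames = set(), 0
    while frames < budget:
        action = env.action_space.sample() if policy is None else policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        rooms.add(int(env.unwrapped.ale.getRAM()[3]))  # assumed room-number RAM byte
        frames += 1
        if terminated or truncated:
            obs, info = env.reset()
    env.close()
    return rooms

# A pass in the spirit of the criterion would be len(rooms_visited(trained_policy)) == 24
# within the budget; a random policy will not come close.
```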


Now, personally, I think this specific set of criteria is insufficiently broad for a weak AGI. But I admit that everyone has a different definition of weak AGI, and Metaculus at least provides a precise definition that can be measured against. Given this definition, I think it's somewhat possible for a single model to meet all of the criteria in 2022 or 2023. Two notable challenges remain:

  1. To make a truly remarkable chatbot that is indistinguishable from a human. There are some serious contenders, but I would argue that none is quite there yet.

  2. To create a SINGLE AI model that can do ALL of these tasks.

Will we see a single AI model in 2023 that fits all of these criteria? I have high hopes for a Gato 2 scale-up, but who knows at this point.
