A growing number of experts have called for these tests to be ditched, saying they boost AI hype and create “the illusion that [AI language models] have greater capabilities than what really exists.” Read the full story here.
What stood out to me in Will’s story is that we know remarkably little about how AI language models work and why they generate the things they do. With these tests, we’re trying to measure and glorify their “intelligence” based on their outputs, without fully understanding how they function under the hood.
Other highlights:
Our tendency to anthropomorphize makes this messy: “People have been giving human intelligence tests, IQ tests and so on, to machines since the very beginning of AI,” says Melanie Mitchell, an artificial-intelligence researcher at the Santa Fe Institute in New Mexico. “The issue throughout has been what it means when you test a machine like this. It doesn’t mean the same thing that it means for a human.”
Kids vs. GPT-3: Researchers at the University of California, Los Angeles, gave GPT-3 a story about a magical genie transferring jewels between two bottles and then asked it how to transfer gumballs from one bowl to another, using objects such as a posterboard and a cardboard tube. The idea is that the story hints at ways to solve the problem. GPT-3 proposed elaborate but mechanically nonsensical solutions. “This is the sort of thing that children can easily solve,” says Taylor Webb, one of the researchers.
AI language models are not humans: “With large language models producing text that seems so human-like, it’s tempting to assume that human psychology tests will be useful for evaluating them. But that’s not true: human psychology tests rely on many assumptions that may not hold for large language models,” says Laura Weidinger, a senior research scientist at Google DeepMind.
Lessons from the animal kingdom: Lucy Cheke, a psychologist at the University of Cambridge, UK, suggests AI researchers could adapt techniques used to study animals, which have been developed to avoid jumping to conclusions based on human bias.
Nobody knows how language models work: “I think that the fundamental problem is that we keep focusing on test results rather than how you pass the tests,” says Tomer Ullman, a cognitive scientist at Harvard University.