LLMs are stuck in a groupthink rut. This startup is trying to get them out.


LLMs are more predictable than you think: Ask any major chatbot for a random number between 1 and 10 and you'll almost certainly get 7. This isn't coincidence: it reflects a deep tendency in AI models to converge on the same familiar answers.

An Australian startup is training AI to be weirder: Springboards built a model called Flint that deliberately injects randomness at key decision points in its responses, rather than cranking up randomness across the board, which tends to make models incoherent.

The homogeneity problem is bigger than one company: A NeurIPS award-winning paper found that when 70+ LLMs were asked to write a metaphor about time, over half produced some version of "Time is a river."

But even fans of Flint urge caution: One marketing executive who uses it warns against leaning too hard on any AI output. "Most people are fine with good enough," he says, and for boundary-breaking work, there's still no substitute for human thinking.

Let’s start with a game. Open up your chatbot of choice (Claude, ChatGPT, Gemini) and type “Give me a random number between 1 and 10.”

You’re going to get 7. Almost always. Now type “Another” and you’ll get 3 or 4. Type “Another” again and you’ll get 8 or 9.

That won’t work every time, but if it did for you, you may wonder if I have superpowers. I don’t. The truth is that most large language models are stuck in a rut. They are far more predictable and far less creative in their responses than you might expect. That’s fine for tasks like coding or research, but groupthink is a problem when you’re brainstorming or planning your next vacation.

The Australian startup Springboards has a solution.
It built an LLM called Flint, which has been trained to come up with a wider variety of responses than mainstream LLMs to open-ended questions such as “Where should I go in Europe?”

“Most language models are fighting hallucinations,” says Springboards cofounder and CEO Pip Bingemann. “We welcome them.”

Bingemann introduced me to the random number game when he first showed me his company’s new model. It felt like watching an illusionist with a deck of cards. “This is our sales trick, and it works every single time,” he says.

After ChatGPT and Claude both gave their 7s, Bingemann turned to Flint. It too came back with 7: “Aha, of course that was going to happen, but it’s okay; 7 is a legitimate answer.” He restarted the session and prompted again: ChatGPT gave 7, Claude gave 7, Flint gave 3.7916.

Run your way

It’s not just numbers. When Bingemann asked ChatGPT and Claude to name a type of car, he predicted that it would be a Toyota or a Honda, and he was right. Flint came up with a Ford F-150. “There’s all this lost information that doesn’t get served up in these models,” he says. “They’re just as capable of saying a Buick or a Tesla. They just don’t; they’re biased.”

Bingemann sent one last prompt to each of the three models: “Give me a tagline for a campaign for New Balance running shoes. Just the tagline.”

Claude: “Run your way.”
ChatGPT: “Run your way.”
Flint: “Built to last, run to win.”

It won’t win any awards, but at least it’s different.

This weird limitation of LLMs is starting to get more attention. In November a team of researchers put out a paper, titled “Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond),” that exposed a remarkable degree of repetition not only in the answers from individual LLMs but between them as well. They found that different LLMs converged on very similar answers when prompted with open-ended questions.
It’s not clear exactly why this happens, but the researchers speculate it’s because most LLMs today are trained in similar ways on similar data to do similar tasks. The team won the best paper award at NeurIPS, a major AI conference.

When the researchers asked more than 70 different LLMs (including models from the top US firms as well as many open-source models from China and elsewhere) 50 times each to write a metaphor about time, more than half of the 3,500 responses were a version of “Time is a river” and the rest were a version of “Time is a weaver.” (I asked some of my colleagues the same question and six people gave me six different answers. My highlight: “Time is a favorite sweatshirt, shaped by a lifetime of wear.”)

When you look for it, you see repetition everywhere, says Kieran Browne, cofounder and CTO at Springboards. “The way that most chat interfaces are designed, it makes it feel like you’re having a personal conversation,” he says. “I think most people don’t really realize the extent to which they are getting the same stuff as everybody else.”

Another example: If you ask “What should I name my band?” most models will say something involving “glass,” “neon,” “velvet,” or “static,” says Browne. When I tried
