Why Aren’t We Measuring How AI Affects Humans?

🤖 Artificial Intelligence 📰 spectrumieee 🕐 2 days ago

As AI systems become more capable, a lot of resources and effort are being put toward measuring their abilities. Researchers look at technical evaluation metrics, subject AIs to reasoning tests, track their throughput, and much more. But there’s one key metric that often gets overlooked, and it’s arguably the most important of all: What is AI doing to humans?

Imran Khan leads psychosocial evaluation of AI at the nonprofit Center for Humane Technology. In a recent essay published on the organization’s Substack, Khan points out that we’re deploying AI tools capable of reshaping our cognition, relationships, and behavior, but with little systematic effort to measure the downstream impacts they’re having on us.

The push to look more closely at AI’s psychosocial effects is similar to debates that emerged around social media and its harms, but Khan believes AI could have even broader and more intimate effects. The focus on measuring AI performance and progress misses the question of whether the technology is ultimately helping humans flourish or eroding some of our most fundamental capacities.

IEEE Spectrum spoke with Khan about why AI evaluation is so narrowly focused, what meaningful measurement of human outcomes might look like, and whether the AI industry has incentives to ask these questions at all.

The missing question about AI model performance

In your essay, you argue that we’ve become very good at measuring what AI systems can do, but bad at measuring what they do to humans. What made you realize this was the missing question?

Khan: If you spend any time in and around the AI development space, you see this amazing progress in terms of what models are capable of, with graphs of how well different models perform on tests like SWE-bench or Humanity’s Last Exam or LLM arena. There’s a competitive dynamic to how AI companies want to progress and be known for their models being the best.
You see that impressive data, but then you also see these scary and dangerous things that happen in the real world, like teenagers dying by suicide and people succumbing to AI psychosis. So on the one hand, we’re devoting an incredible amount of energy to measuring how AI does on these sometimes quite abstruse things that have limited relevance to most people’s day-to-day lives. And on the other hand, AI is impacting human well-being, and we’re measuring that much less. It seemed like a strange paradox that the things we should care about most, we’re measuring least.

Your essay points out that with social media, harms were already entrenched by the time the evidence was strong enough to act on them. Do you think AI is already producing measurable harms at scale, or are we still in an early-warning phase? What differences might there be in how quickly harm from AI evolves?

Khan: There are some really high-profile cases that I think are the tip of the iceberg: the teen suicides, AI psychosis, people spending immense amounts of time or money engaging with these AI chatbots that are designed to be incredibly sycophantic. I think those harms are already there.

Yet there is plenty we can do. Under public pressure over sycophancy, OpenAI had to tweak one of its ChatGPT models. It’s a high-profile example of how the labs will pay attention and respond to scrutiny. So there is potential to change the direction of the technology to make it still useful, but less harmful. If we can measure some of those harms, that’s part of the ammunition we’d have to push for that change.

Where it feels trickier is the question of harms on the societal level. What’s going to happen to romantic relationships, to families, to teenagers’ identities as a result of people using AI every day for months and years? I worry that if we don’t start measuring those kinds of phenomena soon, it will be too late to make a difference.
AI companies would likely argue that their users value convenience and productivity above all else. What would you say to this claim?

Khan: If you put a doughnut in front of me right now, I would probably not have the willpower to not eat it. Yet I also want to control my sugar intake and eat healthy. But technology design often gets boiled down to “well, we’re just trying to give the users what they want, and what the users want is defined by what choice they make in an individual moment.”

This is the complexity of what it means to be a human and a consumer: We want contradictory things. We need to understand not just the choice a user might make when they’re busy or in a high-stress moment, but what they want a healthy relationship with this technology to look like. In the moment we often want low friction. But I don’t think any of us believe that a low-friction life is the most fulfilling or gives us the most learning and agency. So I think it’s asking a subtly different question, which is not what people choos

📌 Source

This summary was compiled automatically from spectrumieee. See the original article for the full text.