Z.ai’s open-weights GLM-5.2 beats GPT-5.5 on multiple long-horizon coding benchmarks for 1/6th the cost

🤖 Artificial Intelligence 📰 United States 🕐 2 hours ago

Today, Chinese AI startup Z.ai (formerly Zhipu AI) announced the immediate release of GLM-5.2, a 753-billion-parameter open-weights large language model (LLM) engineered specifically to dominate "long-horizon" autonomous coding and engineering tasks. Available immediately on Hugging Face, the Z.ai API, and more than 20 third-party coding environments, the model boasts a highly stable 1-million-token context window alongside enterprise subscription tiers starting at just $12.60 per month.

In excellent news for cost- and security-conscious businesses, Z.ai has released GLM-5.2's core weights under an unrestricted MIT open-source license, allowing enterprises to download the model freely from Hugging Face, customize or fine-tune it to their liking, and potentially run it locally or on virtual machines for only the cost of their compute and electricity.

This is an increasingly appealing option for enterprises, as state-of-the-art American proprietary models face an uncertain and potentially interrupted regulatory future, following the Trump Administration's export control directive last week prohibiting foreign nationals from using Anthropic's new Claude Fable 5 model (Anthropic responded by taking the models in question entirely offline for all users). For enterprise technical decision-makers, Z.ai's GLM-5.2 provides a highly capable path to hosting frontier-level AI locally, bypassing geographic fencing and commercial limitations entirely.

IndexShare re-uses one indexer for every four sparse attention layers, reducing compute needs

Under the hood, GLM-5.2 operates with 753 billion parameters and introduces a major architectural optimization called "IndexShare". In standard large language models, recomputing attention across long documents is computationally exorbitant. IndexShare addresses this by reusing the same indexer across every four sparse attention layers.
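The sharing pattern described here can be illustrated with a small sketch. It assumes the layers are grouped into consecutive blocks of four and that the first layer in each block runs the indexer; the article does not say how groups are aligned, and the function names and the eight-layer depth are hypothetical, chosen only for illustration.

```python
def indexer_owner(layer: int, share_every: int = 4) -> int:
    """Return the layer whose indexer output `layer` reuses (assumed grouping)."""
    return (layer // share_every) * share_every


def indexer_calls(num_layers: int, share_every: int = 4) -> int:
    """Count indexer evaluations needed for one pass over `num_layers` layers."""
    return len({indexer_owner(layer, share_every) for layer in range(num_layers)})


if __name__ == "__main__":
    # Toy depth of 8 sparse attention layers; the real depth is not given.
    for layer in range(8):
        print(f"layer {layer} reuses the indexer computed at layer {indexer_owner(layer)}")
    print("indexer evaluations:", indexer_calls(8),
          "instead of", indexer_calls(8, share_every=1))
```

Under these assumptions, eight layers need two indexer evaluations rather than eight. The sketch counts indexer calls only and does not attempt to reproduce any compute figure reported for the model.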
At the maximum 1-million-token context length, IndexShare reduces per-token compute (FLOPs) by a factor of 2.9. The model also features an upgraded Multi-Token Prediction (MTP) layer for speculative decoding, which boosts accepted token length by up to 20% during inference.

Additionally, Z.ai has implemented selectable "Thinking Modes". Users can toggle the model's reasoning effort between "Max," designed to push the limits of logical problem-solving, and "High," which balances high-end performance against latency-sensitive token efficiency.

State-of-the-art benchmarks for an open model, matching and even beating proprietary leaders in some categories

On industry-standard third-party benchmark tests, GLM-5.2 performs above most open-source flagship models, including DeepSeek v4, and scores near or above its closed-weights rivals, OpenAI's GPT-5.5 and Anthropic's Claude Opus 4.8. The model particularly shines in agentic tool use and long-horizon software engineering tasks:

SWE-bench Pro: GLM-5.2 scored 62.1, decisively beating GPT-5.5 (58.6) and its own predecessor, GLM-5.1 (58.4).
FrontierSWE (Dominance): On this benchmark, designed to test long-horizon task completion, GLM-5.2 hit 74.4%, surpassing GPT-5.5 (72.6%) and finishing in a near-tie with Claude Opus 4.8 (75.1%).
MCP-Atlas: On this tool-usage evaluation, GLM-5.2 achieved 77.0, outscoring GPT-5.5 (75.3) and landing just shy of Claude Opus 4.8 (77.8).
Humanity's Last Exam (w/ Tools): When equipped with external tools, GLM-5.2 scored 54.7, ahead of GPT-5.5 (52.2) and close behind Claude Opus 4.8 (57.9).
PostTrainBench & SWE-Marathon: In extended, multi-hour engineering workloads, GLM-5.2 topped GPT-5.5 on both, scoring 34.3% against GPT-5.5's 25.0% on PostTrainBench and 13.0% against GPT-5.5's 12.0% on SWE-Marathon.
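To make the GPT-5.5 comparisons above easier to scan, the following sketch tabulates the margin on each benchmark. The scores are copied from the figures listed here; the `SCORES` dictionary and `margins` helper are illustrative names, and because each benchmark uses its own scale, the gaps are only loosely comparable with one another.

```python
# Scores as reported in the article: (GLM-5.2, GPT-5.5).
SCORES = {
    "SWE-bench Pro": (62.1, 58.6),
    "FrontierSWE": (74.4, 72.6),
    "MCP-Atlas": (77.0, 75.3),
    "Humanity's Last Exam (w/ Tools)": (54.7, 52.2),
    "PostTrainBench": (34.3, 25.0),
    "SWE-Marathon": (13.0, 12.0),
}


def margins(scores):
    """Return GLM-5.2 minus GPT-5.5 per benchmark, rounded to one decimal."""
    return {name: round(glm - gpt, 1) for name, (glm, gpt) in scores.items()}


if __name__ == "__main__":
    # Print the benchmarks from the widest to the narrowest reported gap.
    ranked = sorted(margins(SCORES).items(), key=lambda kv: kv[1], reverse=True)
    for name, gap in ranked:
        print(f"{name:32s} +{gap:.1f}")
```

On these reported numbers, the widest gap is on PostTrainBench and the narrowest on SWE-Marathon.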
While GLM-5.2 trails Claude Opus 4.8 and GPT-5.5 slightly on raw Terminal-Bench 2.1 scores (81.0 versus 85.0 and 84.0, respectively), it significantly outscores Google's Gemini 3.1 Pro (74.0). Beyond traditional coding metrics, GLM-5.2 took first place on the crowdsourced design benchmark Design Arena with an Elo score of 1360, beating out even the aforementioned Claude Fable 5.

Furthermore, the impact of Z.ai's selectable "thinking modes" is clearly visible in the data: under the "Max" effort level, GLM-5.2 reaches its peak scores but uses nearly 85k output tokens per task. Switching to the "High" setting sacrifices only a few points of performance while effectively halving the token output, providing a useful optimization lever for latency-sensitive applications.

Available via Coding Plans and API

To operationalize the model, Z.ai launched the GLM Coding Plan, aimed squarely at developer workflows rather than simple chat interfaces. The plan offers out-of-the-box support for third-party U.S. and global agentic coding

