Researchers say they trained a foundation model from scratch for about $1,500
Training a foundation LLM from scratch costs millions and requires internet-scale data — which is why most enterprises don't bother. Sapient thinks it has a cheaper path. To overcome this brute-force scaling dogma, researchers at Sapient developed HRM-Text , which replaces standard Transformers with a highly sample-efficient Hierarchical Recurrent Model (HRM), an architecture they first introduced last year . HRM decouples computation into slow-evolving strategic and fast-evo
Researchers have developed a new method for training large language models (LLMs) that significantly reduces costs and computational resources. Their approach, called HRM-Text, utilizes a Hierarchical Recurrent Model (HRM) architecture instead of traditional Transformers. This model is trained exclusively on instruction-response pairs, mimicking real-world enterprise use cases. A 1-billion parameter version was trained from scratch for approximately $1,500, demonstrating performance competitive with much larger models.
This development democratizes access to powerful AI models, allowing organizations with limited resources to train their own capable reasoning systems affordably.
📌 Kaynak
Bu özet VentureBeat kaynağından otomatik derlenmiştir. Tamamı için orijinal habere gidin.
Orijinal haberi oku →