Cohere today released two new open-weight models in its Aya project to close the language gap in foundation models.
Aya Expanse 8B and 32B, now available on Hugging Face, extend performance advances across 23 languages. Cohere said in a blog post that the 8B parameter model "makes breakthroughs more accessible to researchers worldwide," while the 32B parameter model provides state-of-the-art multilingual capabilities.
The Aya project seeks to expand access to foundation models in global languages beyond English. Cohere for AI, the company's research arm, launched the Aya initiative last year. In February, it released the Aya 101 large language model (LLM), a 13-billion-parameter model covering 101 languages. Cohere for AI also released the Aya dataset to help expand access to other languages for model training.
Aya Expanse uses much of the same recipe used to build Aya 101.
"The improvements in Aya Expanse are the result of a sustained focus on expanding how AI serves languages around the world by rethinking the core building blocks of machine learning breakthroughs," Cohere said. "Our research agenda for the last few years has included a dedicated focus on bridging the language gap, with several breakthroughs that were critical to the current recipe: data arbitrage, preference training for general performance and safety, and finally model merging."
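Cohere does not detail its merging recipe in the post, but the core idea behind model merging can be sketched with the simplest variant: averaging the parameter tensors of several trained checkpoints of the same architecture. All names below are illustrative, not Cohere's.

```python
# Minimal sketch of weight-space model merging: a (optionally weighted)
# average of checkpoint parameters. Plain floats stand in for tensors.

def merge_checkpoints(state_dicts, weights=None):
    """Average parameters across checkpoints of the same architecture."""
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Toy example: two checkpoints, uniform average.
ckpt_a = {"layer.w": 1.0, "layer.b": 0.0}
ckpt_b = {"layer.w": 3.0, "layer.b": 2.0}
print(merge_checkpoints([ckpt_a, ckpt_b]))  # {'layer.w': 2.0, 'layer.b': 1.0}
```

In practice, merging recipes vary (weighted averages, per-layer schemes), but they all combine separately trained models in weight space rather than retraining from scratch.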
Aya performs well
Cohere said the two Aya Expanse models consistently outperformed similar-sized AI models from Google, Mistral and Meta.
Aya Expanse 32B did better in multilingual benchmark tests than Gemma 2 27B, Mixtral 8x22B and even the much larger Llama 3.1 70B. The smaller 8B also performed better than Gemma 2 9B, Llama 3.1 8B and Ministral 8B.
Cohere developed the Aya models using a data sampling method called data arbitrage to avoid the gibberish that models can generate when they rely on synthetic data. Many models are trained on synthetic data created by a "teacher" model. However, good teacher models are difficult to find for other languages, especially low-resource ones, so data arbitrage instead samples strategically from a pool of models.
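The arbitrage idea above can be sketched as follows: rather than distilling from one fixed teacher, generate candidate completions from a pool of models and keep the best-scoring one per prompt. The teacher and reward functions here are stand-ins, not Cohere's actual models.

```python
import random

def toy_teacher(name):
    """Return a stand-in 'teacher model' that produces a labeled completion."""
    def generate(prompt):
        return f"{name} answer to: {prompt}"
    return generate

def toy_reward(prompt, completion):
    # Stand-in scorer; a real pipeline would use a learned reward model.
    return random.random()

def arbitrage_sample(prompt, teacher_pool, reward_fn):
    """Generate a candidate from each teacher and keep the best-scoring one."""
    candidates = [teacher(prompt) for teacher in teacher_pool]
    return max(candidates, key=lambda c: reward_fn(prompt, c))

pool = [toy_teacher("teacher_a"), toy_teacher("teacher_b")]
best = arbitrage_sample("Translate 'hello' to Swahili", pool, toy_reward)
```

The "arbitrage" is in the selection step: for each prompt, the pipeline takes whichever model in the pool happens to produce the strongest output, instead of trusting a single teacher for every language.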
It also focused on guiding the models toward "global preferences" and accounting for different cultural and linguistic perspectives. Cohere said it figured out a way to improve performance and safety even while guiding the models' preferences.
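Preference training typically rests on a pairwise objective: given a response humans preferred and one they rejected, the model is pushed to score the preferred response higher. This is a generic sketch of that logistic (Bradley-Terry style) loss, not Cohere's specific training setup.

```python
import math

def preference_loss(score_chosen, score_rejected):
    """Pairwise preference loss: -log sigmoid(chosen - rejected)."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the chosen response is scored further above the rejected one:
print(round(preference_loss(2.0, 0.0), 3))  # small loss: chosen already wins
print(round(preference_loss(0.0, 2.0), 3))  # large loss: rejected wins
```

Collecting preference pairs across many languages and cultures, rather than from one region, is what "global preferences" refers to in the passage above.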
"We think of it as the 'final sparkle' in training an AI model," the company said. "However, preference training and safety measures often overfit to harms pre ...