Open Source Weights, Code, and Dataset; Performance Surpassing Mistral-7B; Apple’s Small Model is Here
OpenAI launched its small model GPT-4o-mini, officially kicking off the small-model race, and Apple has now joined the competition.
Apple, one of the institutions behind the DataComp-LM (DCLM) project, has released the DCLM-7B open-source model on Hugging Face. Its performance already surpasses Mistral-7B and approaches other leading open-source models, including Llama 3 and Gemma.
A persistent evaluation challenge for large language models (LLMs) is the lack of controlled comparisons: LLM research often compares models that differ in architecture, compute budget, or hyperparameters, making it difficult to isolate the factors that actually determine model quality.
In response, the research team proposed a new benchmark for comparing language-model training data: DCLM. It is the first benchmark for the curation of language-model training data, aimed at improving model performance through the design of higher-quality datasets.
The research team found that model-based filtering, in which machine learning (ML) models automatically score and select high-quality documents from a larger corpus, could be the key to building high-quality training sets.
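The idea can be sketched in a few lines: score every document with a quality classifier and keep only those above a threshold. The scorer below is a toy heuristic standing in for a real trained classifier (the actual DCLM pipeline is more sophisticated); the function names and threshold are illustrative assumptions, not the project's code.

```python
# Minimal sketch of model-based filtering. quality_score() is a toy
# stand-in for a trained quality classifier; in a real pipeline it would
# be a learned model scoring each document.

def quality_score(doc: str) -> float:
    """Toy heuristic: reward longer text that ends in sentence punctuation."""
    words = doc.split()
    if not words:
        return 0.0
    length_signal = min(len(words) / 50.0, 1.0)          # saturate at 50 words
    punct_signal = 1.0 if doc.strip().endswith((".", "!", "?")) else 0.5
    return length_signal * punct_signal

def filter_corpus(docs: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only documents the scorer rates above the threshold."""
    return [d for d in docs if quality_score(d) >= threshold]

docs = [
    "Short fragment",
    "A complete, well-formed paragraph " * 10 + "ends with a period.",
]
kept = filter_corpus(docs)  # only the second document survives
```

Swapping the heuristic for a learned classifier turns this into the kind of data-curation strategy DCLM is designed to evaluate.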
The overall idea behind DCLM is straightforward: hold the experimental setup fixed, including model architectures, training code, hyperparameters, and evaluations, so that experiments isolate which data-curation strategy is best suited to training high-performing models.
Using DCLM, the research team constructed a high-quality dataset called DCLM-BASELINE and used it to train a 7B-parameter model, DCLM-7B, from scratch.
DCLM-7B employs a pre-training scheme based on the OpenLM framework, achieving a 5-shot accuracy of 64% on the MMLU benchmark, comparable to Mistral-7B-v0.3 (63%) and Llama 3 8B (66%). Additionally, its average performance across 53 natural language understanding tasks is also comparable to Mistral-7B-v0.3 and Llama 3 8B, while requiring only 1/6 of the computational resources of Llama 3 8B.
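The "1/6 of the compute" claim can be sanity-checked with the common training-cost approximation FLOPs ≈ 6·N·D, where N is the parameter count and D the number of training tokens. The token counts below are assumptions for illustration (roughly 2.5T for DCLM-7B and roughly 15T for Llama 3 8B), not figures from this article.

```python
# Back-of-envelope check of the compute comparison using FLOPs ~ 6 * N * D.
# Token counts are assumed values for illustration, not from the article.

def train_flops(params: float, tokens: float) -> float:
    """Approximate training FLOPs: 6 * parameters * training tokens."""
    return 6 * params * tokens

dclm_flops = train_flops(7e9, 2.5e12)    # DCLM-7B, ~2.5T tokens (assumed)
llama3_flops = train_flops(8e9, 15e12)   # Llama 3 8B, ~15T tokens (assumed)
ratio = dclm_flops / llama3_flops        # ~0.15, i.e. roughly 1/6 to 1/7
```

Under these assumptions the ratio lands near 1/7, consistent with the article's "1/6 of the computational resources" figure.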
By contrast, most other models release open weights but keep their training data closed, which is why Vaishaal Shankar describes the DCLM models as "truly open source."