As the world marvels at GPT-4o mini, Apple has expanded its own family of AI models. Apple's research team, working within the DataComp for Language Models (DCLM) project, recently released a series of DCLM models on Hugging Face.

Apple introduced two main models: one with 7 billion parameters and another with 1.4 billion. The 7-billion-parameter model has shown exceptional performance, surpassing Mistral-7B and approaching leading models such as Llama 3 and Gemma. Vaishaal Shankar of Apple described them as the best-performing truly open-source models to date.

These models are not only capable but also fully open-source: Apple released the model weights, the training code, and the pretraining dataset, fostering collaboration and progress in the AI community. Both models are available under licenses that permit commercial use, distribution, and modification. However, they are still early-stage research artifacts and may exhibit biases or produce inappropriate responses stemming from the data they were trained on. The release highlights the importance of data curation in training language models and provides a solid foundation for future research in the field.
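
Because the weights are published on Hugging Face, they can in principle be loaded with the standard transformers API. The sketch below is illustrative only: the repository identifier apple/DCLM-7B is an assumption based on the release announcement, and the checkpoint may rely on custom modeling code, hence the trust_remote_code flag.

```python
# Minimal sketch: loading a DCLM checkpoint with Hugging Face transformers.
# The repository id "apple/DCLM-7B" is an assumption; check the Hugging Face
# hub for the exact identifier used by the release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/DCLM-7B"  # assumed identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # the checkpoint may ship custom modeling code
)

prompt = "Careful data curation matters because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```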

Multidisciplinary Collaboration

The DataComp project is a collaborative effort involving researchers from Apple, the University of Washington, Tel Aviv University, and the Toyota Research Institute. Using a standardized framework in which the model architecture and training code are held fixed, the team has experimented with different data curation strategies, isolating the effect of the data itself on how efficient the resulting models are.

The DCLM-Baseline-7B model, trained on 2.5 trillion tokens, achieved 63.7% accuracy on the 5-shot MMLU benchmark, surpassing the previous leader among open-data language models. The smaller 1.4-billion-parameter model also performed impressively, competing strongly with other models of its size.
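
Scores like these are typically produced with a standardized evaluation harness. As an illustration only, a 5-shot MMLU run might look like the following using EleutherAI's lm-evaluation-harness; this is one common way to obtain comparable numbers, not necessarily the tooling the DCLM team used, and the apple/DCLM-7B identifier is again an assumption.

```python
# Illustrative sketch: scoring a checkpoint on 5-shot MMLU with EleutherAI's
# lm-evaluation-harness (pip install lm-eval). Not necessarily the harness
# behind the reported 63.7% figure.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=apple/DCLM-7B,trust_remote_code=True",  # assumed repo id
    tasks=["mmlu"],
    num_fewshot=5,
)

# Per-task and aggregate accuracies live under the "results" key.
print(results["results"])
```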

A New Step Forward

The launch of Apple Intelligence a few months ago marked a significant change for a company that had waited for the right moment to enter the AI arena. Apple's proposal is innovative and distinctive: a hybrid AI that combines on-device and cloud capabilities while preserving user privacy.

Tim Cook emphasized that Apple Intelligence should be powerful, intuitive, and private, in line with Apple's broader strategy. Craig Federighi, Apple's head of software engineering, explained that "local model execution is prioritized to avoid collecting personal data."

This AI system harnesses the latest iPad, iPhone, and Mac chips to run generative models on-device without compromising user privacy, and it relies on a semantic index built from the user's information to deliver accurate results. For more complex requests, it turns to the cloud through Apple's Private Cloud Compute, which processes data without storing it, discarding it after each request. With this combination of on-device and cloud AI, Apple aims to offer a solution that is both powerful and secure, positioning itself strongly in the AI market.