Microsoft has introduced a new set of lightweight, open-source AI models that are reported to outperform Google’s Gemini 1.5 Flash, Meta’s Llama 3.1, and OpenAI’s GPT-4o in certain areas.
These models – Phi-3.5-mini-instruct, Phi-3.5-Mixture of Experts (MoE)-instruct, and Phi-3.5-vision-instruct – are the latest additions to Microsoft’s lineup of small language models (SLMs), the Phi-3 series. The first in this series, the Phi-3-mini, was launched in April 2024.
The Phi-3.5-mini-instruct has 3.82 billion parameters, while the Phi-3.5-MoE-instruct features 41.9 billion parameters, of which only 6.6 billion are active at any given time. The Phi-3.5-vision-instruct includes 4.15 billion parameters. A model’s parameter count indicates its size and serves as a rough proxy for the knowledge and capabilities it acquires during training.
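These parameter counts also translate into rough memory requirements. As an illustrative rule of thumb (not an official Microsoft figure), each parameter stored at 16-bit precision occupies two bytes:

```python
# Rough memory-footprint estimate from parameter counts. The 2-bytes-per-
# parameter figure assumes 16-bit floating-point weights; quantized
# deployments would need less.

BYTES_PER_PARAM_FP16 = 2

def fp16_size_gb(num_params: float) -> float:
    """Approximate model size in gigabytes at 16-bit precision."""
    return num_params * BYTES_PER_PARAM_FP16 / 1e9

mini_total = fp16_size_gb(3.82e9)   # Phi-3.5-mini-instruct, all parameters
moe_total = fp16_size_gb(41.9e9)    # Phi-3.5-MoE-instruct, all parameters
moe_active = fp16_size_gb(6.6e9)    # MoE parameters active per token

print(f"mini total: ~{mini_total:.1f} GB")
print(f"MoE total:  ~{moe_total:.1f} GB")
print(f"MoE active: ~{moe_active:.1f} GB")
```

The gap between the MoE model’s total (~84 GB) and active (~13 GB) footprint illustrates why mixture-of-experts designs can pack in more knowledge without a proportional increase in per-token compute.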
All three Phi-3.5 models support a context window of up to 128k tokens. The context window, measured in tokens, is the amount of input the model can attend to while generating a response. A larger context window lets the model work over longer inputs – such as lengthy documents, large codebases, or, for multimodal models, many images – in a single pass.
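To get an intuition for the scale, a quick back-of-the-envelope estimate helps. The four-characters-per-token figure below is a common rule of thumb for English text, not a property of any specific Phi tokenizer:

```python
# Back-of-the-envelope estimate of how much English text fits in a
# 128k-token context window. CHARS_PER_TOKEN and CHARS_PER_PAGE are
# illustrative assumptions, not measured tokenizer statistics.

CONTEXT_TOKENS = 128_000
CHARS_PER_TOKEN = 4      # assumption: typical for English text
CHARS_PER_PAGE = 3_000   # assumption: roughly 500 words per page

approx_chars = CONTEXT_TOKENS * CHARS_PER_TOKEN
approx_pages = approx_chars / CHARS_PER_PAGE

print(f"~{approx_chars:,} characters, roughly {approx_pages:.0f} pages")
```

Under these assumptions, 128k tokens corresponds to around half a million characters, on the order of a short novel in a single prompt.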
Microsoft reports that the Phi-3.5 Mini model was trained over ten days using 3.4 trillion tokens, while the Phi-3.5 MoE model underwent training for 23 days on 4.9 trillion tokens. The Phi-3.5 Vision model required six days and 500 billion tokens for training. The datasets used for training these models consisted of high-quality, reasoning-intensive, publicly available data.
The Phi-3.5 Mini is designed for quick, basic reasoning tasks, making it well suited to generating code or solving mathematical and logical problems. The Phi-3.5 MoE model combines multiple specialized “expert” sub-networks, activating only a few of them for each input, which lets it tackle more complex AI tasks across various languages.
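The routing idea behind a mixture-of-experts model can be sketched in a few lines. In this simplified illustration, a gating network scores every expert for a token and only the top-k experts are run; the expert count (16) and k (2) mirror public descriptions of Phi-3.5-MoE, but the gate scores are made-up numbers:

```python
import math

# Minimal sketch of mixture-of-experts routing: a gating network scores
# each expert for a token, and only the top-k experts are executed.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(gate_logits, k=2):
    """Return indices and normalized weights of the top-k experts."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# One token's (hypothetical) gate scores over 16 experts:
gate_logits = [0.1, 2.3, -0.5, 0.0, 1.7, -1.2, 0.4, 0.9,
               -0.3, 0.2, 3.1, -0.8, 0.6, 1.1, -0.1, 0.5]

for expert, weight in route(gate_logits):
    print(f"expert {expert}: weight {weight:.2f}")
```

Only the two selected experts do any work for this token, which is why a 41.9-billion-parameter MoE model can run with just 6.6 billion parameters active at a time.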
The Phi-3.5 Vision model, being multimodal, can handle both text and images, enabling it to perform visual tasks like summarizing videos or analyzing charts and tables.
These models are available for download, customization, and integration at no cost under an open-source license, allowing developers to freely use and modify them for commercial purposes. They can be accessed via Hugging Face, a platform for hosting and sharing AI models.