On April 5, Meta unveiled the first two versions of Llama 4: Scout and Maverick. These open models, designed to be natively multimodal, can process text, images, and videos. Thanks to a distillation process based on Llama 4 Behemoth, the largest and most powerful model in the series and still in development, they stand out for their speed and efficiency.
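Meta has not published the exact Llama 4 distillation recipe, but a generic knowledge-distillation objective gives a feel for the idea: a smaller student model is trained to match both the ground-truth labels and the softened output distribution of the larger teacher. The PyTorch sketch below is purely illustrative; the temperature and weighting are arbitrary assumptions, not Meta's actual loss.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Generic knowledge-distillation loss: a weighted mix of the
    cross-entropy on the hard labels and the KL divergence between
    the student's and teacher's softened output distributions.
    Illustrative only -- not Meta's published Llama 4 loss."""
    # Hard-target term: standard cross-entropy against the labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Soft-target term: match the teacher's softened distribution.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher,
                         reduction="batchmean") * temperature ** 2

    return alpha * hard_loss + (1 - alpha) * soft_loss
```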
Unlike previous versions of Llama, which used a dense Transformer architecture, Llama 4 relies on a Mixture-of-Experts (MoE) architecture, in which only a subset of the parameters is activated for each request. This approach improves efficiency and reduces computational cost: Scout and Maverick activate "only" 17 billion parameters per request. Meta has also expanded language support from 8 languages in Llama 3.3, introduced last December, to 12. Note, however, that the models' knowledge cutoff is August 2024.
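To make the idea of partial activation concrete, the toy PyTorch layer below routes each token to its top-k experts, so only a fraction of the layer's parameters is used per token. The dimensions, expert count, and routing scheme are illustrative assumptions and do not reproduce Llama 4's actual implementation.

```python
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Toy Mixture-of-Experts layer: a router picks the top-k experts
    for each token, so only part of the layer runs per token.
    Illustrative sketch, not Llama 4's real architecture."""

    def __init__(self, d_model=64, n_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        # Each token is processed only by its top-k selected experts.
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(8, 64)
print(layer(tokens).shape)   # torch.Size([8, 64])
```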

Llama 4 Scout: An Unprecedented Context Window

With 109 billion parameters, Scout is the lightest model in the series. Its main advantage is a context window of up to 10 million tokens, a huge leap from Llama 3's 128,000 tokens. Because it can run on a single Nvidia H100 GPU, the model is particularly well suited to companies with limited resources for tasks such as multi-document synthesis or in-depth analysis of user activity. According to Meta's internal evaluations, Llama 4 Scout outperforms Gemma 3, Mistral 3.1, and Gemini 2.0 Flash-Lite.
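A rough back-of-the-envelope estimate shows why the single-H100 claim is plausible, assuming the weights are quantized to roughly 4 bits per parameter (activations and the KV cache for a long context are ignored here):

```python
# Back-of-the-envelope memory estimate for Llama 4 Scout's weights.
# Assumption: ~4-bit (0.5 bytes) per parameter after quantization,
# ignoring activations and the KV cache.
total_params = 109e9              # 109 billion total parameters
bytes_per_param = 0.5             # ~Int4 quantization
weight_gb = total_params * bytes_per_param / 1e9
print(f"~{weight_gb:.1f} GB of weights vs. 80 GB on an H100")  # ~54.5 GB
```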

Llama 4 Maverick

Maverick is more powerful, with 400 billion parameters and 128 experts. Its 1-million-token context window paves the way for applications such as:
  • Intelligent enterprise agents capable of reasoning and interacting securely with various tools and workflows;
  • Document understanding systems, facilitating the extraction of structured data from PDFs, scans, and forms;
  • Multilingual assistants offering nuanced responses adapted to cultural specifics;
  • Creation tools dedicated to writing stories, marketing content, and personalized texts.
According to Meta, Llama 4 Maverick is the best multimodal model in its category, outperforming GPT-4o and Gemini 2.0 Flash across a wide range of benchmarks. It also achieves results comparable to DeepSeek v3 on reasoning and coding while using fewer than half as many active parameters.
Moreover, Llama 4 Maverick offers an exceptional performance-to-cost ratio, with an experimental chat version reaching an ELO score of 1417 on LMArena.

Llama 4 Behemoth

Meta also shared a preview of Llama 4 Behemoth, the 2-trillion-parameter teacher model that served as the basis for these first two versions. With 288 billion active parameters and 16 experts, it delivers top-tier performance even though it is not a specialized reasoning model. According to the company, it excels on math, multilingual, and image benchmarks.
According to Meta's evaluations, Behemoth thus surpasses GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on various benchmarks.

Availability

Scout and Maverick are available on Llama.com and Hugging Face, distributed as open source under Meta's license, which requires companies with more than 700 million monthly active users to request permission to use them. While Llama 4 already powers Meta AI, the company's assistant on WhatsApp, Messenger, and Instagram in about forty countries, the multimodal features are currently available only in the United States. In the EU, the wait continues...
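For teams that want to experiment, a common route is to pull the weights from Hugging Face once Meta's license has been accepted. The sketch below uses the huggingface_hub library; the repository name is an assumption and should be checked against the actual listing on huggingface.co/meta-llama.

```python
# Minimal download sketch using the huggingface_hub library.
# The repo id below is an assumption -- verify it on huggingface.co/meta-llama.
# Access is gated: accept Meta's license on Hugging Face and log in first
# (e.g. with `huggingface-cli login`).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="meta-llama/Llama-4-Scout-17B-16E-Instruct")
print("Model files downloaded to:", local_dir)
```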