Meta, like every major tech company today, has a flagship generative AI model: Llama. What sets Llama apart is its openness: developers can download and use the model with relatively few restrictions. That contrasts with models such as Anthropic's Claude, OpenAI's GPT-4, and Google's Gemini, which are accessible only through APIs. To give developers more choice, Meta has also partnered with vendors including AWS, Google Cloud, and Microsoft Azure to offer cloud-hosted versions of Llama, and it has released tools that make the model easier to fine-tune and customize.
Here's everything you need to know about Llama, from its capabilities and editions to where you can use it. We'll keep this post updated as Meta releases upgrades and new developer tools for the model.

Llama is not a single model but a family of models: Llama 8B, Llama 70B, and Llama 405B. The latest versions, released in July 2024, are Llama 3.1 8B, Llama 3.1 70B, and Llama 3.1 405B. The models are trained on web pages in a range of languages, public code and files on the web, as well as synthetic data.
Llama 3.1 8B and 70B are compact models designed to run on hardware ranging from laptops to servers, while Llama 3.1 405B is a large-scale model that generally requires data center hardware. The 8B and 70B models are less capable than 405B, but they are faster and built for lower storage overhead and latency. All three models have a 128,000-token context window, roughly 100,000 words or 300 pages, which helps them keep track of information across long documents.
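As a quick sanity check on those figures, the token-to-word conversion is just arithmetic. The ratios below are common rules of thumb for English text, not exact values:

```python
# Back-of-the-envelope conversion from a context window in tokens to words
# and pages. Assumes ~0.75 English words per token and ~330 words per page;
# both are rough rules of thumb, not exact figures.
CONTEXT_TOKENS = 128_000
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 330

words = int(CONTEXT_TOKENS * WORDS_PER_TOKEN)  # ~96,000 words
pages = words / WORDS_PER_PAGE                 # ~290 pages

print(f"~{words:,} words, ~{pages:.0f} pages")
```

Depending on the word-per-token ratio assumed, estimates land in the ballpark of the 100,000 words / 300 pages cited above.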
Llama can handle a range of assistive tasks, such as coding, answering basic math questions, and summarizing documents in eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. Most text-based workloads are within its reach, though it cannot yet process or generate images; that may change soon. Llama models can also integrate with third-party apps, tools, and APIs: Brave Search for questions about recent events, the Wolfram Alpha API for math and science queries, and a Python interpreter for validating code.
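To make the tool integration concrete, here is a sketch of how Llama 3.1's built-in tools are enabled and invoked, loosely based on Meta's published prompt format. Exact token names and conventions vary across serving stacks, so treat the strings below as illustrative rather than canonical:

```python
# Sketch of Llama 3.1 built-in tool use (illustrative; check your serving
# stack's documentation for the exact prompt format it expects).

# Built-in tools are typically opted into via the system prompt.
system_prompt = (
    "Environment: ipython\n"
    "Tools: brave_search, wolfram_alpha\n\n"
    "You are a helpful assistant."
)

# When the model decides to use a tool, it emits a Python-style call
# prefixed with a special tag token, for example:
tool_call = '<|python_tag|>brave_search.call(query="latest Llama release")'

# The serving layer parses calls like this, executes the search, and feeds
# the results back to the model as a tool message so it can answer.
print(system_prompt)
print(tool_call)
```

The key design point is that the model itself only emits a structured call; the application around it is responsible for actually running the tool and returning results.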
Meta claims that the Llama 3.1 models can use tools they haven't encountered before, though how reliably they do so is another matter. For personal use, you can chat with Llama through the Meta AI chatbot on Facebook Messenger, WhatsApp, Instagram, Oculus, and Meta.ai. Developers can download, fine-tune, and deploy Llama models on all the major cloud platforms, with support from more than 25 partners including Nvidia, Databricks, Groq, Dell, and Snowflake. These partners have built tools and services on top of Llama, for example letting it reference proprietary data or run at lower latencies.
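Many of these hosting partners expose Llama through an OpenAI-compatible chat-completions API. The sketch below shows the general shape of such a request; the model identifier is a placeholder, and the actual endpoint URL, model name, and authentication details depend on the provider you use:

```python
import json

# Illustrative request body for an OpenAI-compatible chat-completions API,
# the style many Llama hosting providers offer. The model name below is a
# placeholder; consult your provider's docs for real identifiers.
payload = {
    "model": "llama-3.1-70b-instruct",  # placeholder model identifier
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize this document in three bullets."},
    ],
    "max_tokens": 256,
    "temperature": 0.2,
}

body = json.dumps(payload)
# A real client would POST `body` to the provider's chat-completions
# endpoint with an API key; the network call is omitted in this sketch.
```

Because the request shape is shared across providers, swapping between hosted Llama backends is often just a matter of changing the base URL and model name.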
Meta recommends the smaller models, Llama 8B and 70B, for general applications such as chatbots and code generation, and suggests the 405B model for model distillation and for generating synthetic data to train other models. Note that the Llama license requires app developers with more than 700 million monthly users to request a special license from Meta before deploying the model. Meta also offers several tools intended to make Llama safer to use: Llama Guard, Prompt Guard, and CyberSecEval.
Llama Guard moderates content fed into or produced by Llama, flagging categories such as criminal activity, hate speech, and child exploitation. Prompt Guard defends against prompt injection attacks, prompts crafted to make Llama behave in unintended ways. CyberSecEval assesses the cybersecurity risks of apps built on Llama models. For all its features, though, Llama has limitations. Its training data may include copyrighted content, potentially exposing users to infringement claims; the issue has already sparked legal battles, including a lawsuit brought by authors such as Sarah Silverman.
Llama can also produce buggy or insecure code, so any AI-generated code should be reviewed by a human before it ships in a service or piece of software. And Meta's controversial training practices, which draw on users' Instagram and Facebook posts and offer no easy way to opt out, add another layer of concern.