
Axera NPU completes Llama 3 and Phi-3 large model adaptation, promoting widespread application of AI large model technology
Background
The continuous advancement of AI large model technology is driving intelligent transformation across industries. Recently, Meta and Microsoft released their landmark Llama 3 and Phi-3 model families. The Llama 3 series comes in two sizes, 8B and 70B, while the Phi-3 series comes in three: mini (3.8B), small (7B), and medium (14B). To give developers more opportunities to experience these models early, Axera's NPU toolchain team responded quickly and completed the adaptation of the Llama 3 8B and Phi-3-mini models on the AX650N platform.
Llama 3
Last Friday, Meta released the Meta Llama 3 series of large language models (LLMs), specifically an 8B model and a 70B model. On benchmark tests, the Llama 3 models performed exceptionally well, rivaling popular closed-source models in both usefulness and safety evaluations.

Official website: https://llama.meta.com/llama3
At the architectural level, Llama 3 adopts a standard decoder-only Transformer and uses a tokenizer with a 128K-token vocabulary. Llama 3 was trained on over 15T tokens of publicly available data, of which about 5% is non-English data covering more than 30 languages. The training data volume is seven times that of the previous generation, Llama 2.
According to Meta's test results, the Llama 3 8B model outperformed Gemma 7B and Mistral 7B Instruct on multiple performance benchmarks including MMLU, GPQA, and HumanEval. The 70B model surpassed the renowned closed-source model Claude 3's intermediate version Sonnet, and compared to Google's Gemini Pro 1.5, it won three times and lost twice.

On-device Results
Currently, AX650N has completed adaptation of the Int8 version of Llama 3 8B; switching to Int4 quantization would roughly double the tokens-per-second rate. This is already sufficient for normal human-computer interaction.
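The claim that Int4 roughly doubles Int8 throughput follows from autoregressive decoding being memory-bandwidth bound: each generated token must stream essentially all model weights from memory, so the decode rate scales inversely with bytes per weight. The sketch below is a back-of-envelope estimate, not measured AX650N data; the bandwidth figure is an assumption for illustration only.

```python
def est_tokens_per_second(params_billions: float,
                          bits_per_weight: int,
                          bandwidth_gb_s: float) -> float:
    """Rough upper bound on decode rate when weight streaming
    dominates: tokens/s ~= memory bandwidth / model size in bytes."""
    model_bytes = params_billions * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / model_bytes

# Hypothetical 8B-parameter model on a device with ~60 GB/s
# of usable DRAM bandwidth (illustrative numbers, not AX650N specs):
int8_rate = est_tokens_per_second(8, 8, 60)
int4_rate = est_tokens_per_second(8, 4, 60)
print(f"Int8: ~{int8_rate:.1f} tok/s, Int4: ~{int4_rate:.1f} tok/s")
# Halving the bits per weight halves the bytes moved per token,
# so the bandwidth-bound rate doubles.
```

Real throughput is lower than this bound (KV-cache traffic, compute, and scheduling overheads all cost time), but the 2x Int8-to-Int4 ratio holds whenever decoding stays bandwidth-limited.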
Phi-3
Shortly after the release of Llama 3, a competitor arrived: a small-sized model that can run on a mobile phone.
This Tuesday, Microsoft released its self-developed small-sized model Phi-3. Although Phi-3-mini is optimized for deployment on mobile phones, its performance can compete with models like Mixtral 8x7B and GPT-3.5. Microsoft states that the innovation primarily lies in using a higher-quality training dataset.

Online DEMO:
https://ai.azure.com/explore/models/Phi-3-mini-4k-instruct/version/2/registry/azureml

On-device Results
Currently, AX650N has completed the adaptation of the Phi-3-mini Int8 version, which is sufficient for normal human-computer interaction.
Other Achievements
AX650N was able to adapt Llama 3 and Phi-3 so quickly because, earlier this year, the team had already quietly optimized the existing NPU toolchain for large language models. In addition to Llama 3, adaptations of other mainstream open-source large language models have been completed, including Llama 2, TinyLlama, Phi-2, Qwen1.5, and ChatGLM3.

The related work has been released to the developer community (https://github.com/AXERA-TECH/ax-llm); you are welcome to try it out.
Future Plans
This year marks the first year of the AIPC era. We will provide more solutions for common AIPC applications, giving full play to the high energy efficiency of the AxeraNeurons NPU technology so that all kinds of interesting large models can be deployed locally at reasonable cost, making "large models affordable for everyone" and practicing "Inclusive AI for Better Life."
