arxiv.org
- The Phi-3 Technical Report introduces phi-3-mini, a compact 3.8 billion parameter language model trained on 3.3 trillion tokens, designed to run locally on smartphones, with performance comparable to larger models such as Mixtral 8x7B and GPT-3.5 (a minimal local-inference sketch follows this list).
- The model's innovation lies in its training dataset, a scaled-up version of the one used for phi-2, composed of heavily filtered web data and synthetic data; the model is additionally aligned for robustness, safety, and chat format.
- The report also discusses initial results from scaling the model to 7 billion and 14 billion parameters (phi-3-small and phi-3-medium), showing significant performance improvements over phi-3-mini on benchmarks such as MMLU and MT-bench.
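For context, here is a minimal sketch of how one might run phi-3-mini locally with Hugging Face transformers. The repo ID `microsoft/Phi-3-mini-4k-instruct`, the prompt, and the generation settings are assumptions for illustration, not details taken from the report.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repo ID for phi-3-mini (4k-context, instruction-tuned variant).
# Older transformers versions may additionally require trust_remote_code=True.
model_id = "microsoft/Phi-3-mini-4k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 for a workstation GPU; the report relies on 4-bit quantization for on-phone use
    device_map="auto",           # requires the `accelerate` package
)

# The model is aligned for chat format, so build the prompt with the chat template.
messages = [{"role": "user", "content": "Explain why small language models matter."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```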