Microsoft has released Orca 2 to explore the capabilities of smaller language models, those with about 10 billion parameters or fewer.
The model demonstrates that improved training methods can raise the reasoning abilities of smaller language models to levels typically seen only in much larger models.
“Orca 2’s success lies in its application of diverse reasoning techniques and the identification of optimal solutions for various tasks. While it has several limitations, including limitations inherited from its base models and common to other language models, Orca 2’s potential for future advancements is evident, especially in improved reasoning, specialization, control, and safety of smaller models. The use of carefully filtered synthetic data for post-training emerges as a key strategy in these improvements,” the Microsoft team wrote in a blog post.
Microsoft's blog post says that Orca 2 significantly outperforms models of similar size, including the original Orca, and achieves performance similar to or better than that of models 5 to 10 times larger.
“Our findings underscore the value of smaller models in scenarios where efficiency and capability need to be balanced. As larger models continue to excel, our work with Orca 2 marks a significant step in diversifying the applications and deployment options of language models,” the Microsoft team added.
Orca 2 comes in two sizes, 7 billion and 13 billion parameters. Both were created by fine-tuning the corresponding Llama 2 base models on tailored, high-quality synthetic data. Microsoft has made the Orca 2 weights publicly available to encourage further research on the development, evaluation, and alignment of smaller language models.
The training responses were elicited from a teacher model using detailed instructions and, in some cases, multiple queries, so that the student model can learn the underlying strategies and reasoning capabilities even when it is given no explicit task instructions. The objective is to improve the performance of smaller models by tailoring the solution strategy to the specific task at hand.
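The data-construction idea described above can be illustrated with a minimal Python sketch. It is not Microsoft's code: the names `TrainingExample`, `build_student_example`, and `fake_teacher` are hypothetical, and the teacher is a stand-in function rather than a real model. The sketch assumes that the detailed instruction is shown only to the teacher, while the student's training prompt carries a generic instruction plus the task.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TrainingExample:
    prompt: str    # what the student model sees during fine-tuning
    response: str  # what the student model is trained to produce


def build_student_example(
    task: str,
    detailed_instruction: str,
    teacher: Callable[[str], str],
    generic_instruction: str = "You are a helpful assistant.",
) -> TrainingExample:
    """Query the teacher with a detailed instruction, then pair its answer
    with a prompt that omits that instruction (assumed setup, see lead-in)."""
    teacher_prompt = f"{detailed_instruction}\n\n{task}"
    response = teacher(teacher_prompt)
    # The student never sees the detailed instruction, only the task.
    student_prompt = f"{generic_instruction}\n\n{task}"
    return TrainingExample(prompt=student_prompt, response=response)


def fake_teacher(prompt: str) -> str:
    """Hypothetical stand-in for a large teacher model."""
    if "step by step" in prompt:
        return "Step 1: 17 + 5 = 22. Step 2: 22 * 2 = 44. Answer: 44"
    return "Answer: 44"


if __name__ == "__main__":
    example = build_student_example(
        task="What is (17 + 5) * 2?",
        detailed_instruction="Solve the problem step by step, showing each step.",
        teacher=fake_teacher,
    )
    print(example.prompt)
    print(example.response)
```

Because the stored response still shows the step-by-step strategy while the stored prompt does not ask for it, a student fine-tuned on such pairs has to pick the strategy from the task itself.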