OpenAI introduces text-embedding-ada-002, a cutting-edge embedding model that combines the capabilities of five previous text search, text similarity, and code search models.
This new model outperforms the previous most capable model, Davinci, on most tasks while being priced 99.8% lower. It is also simpler to use, since one model replaces five.
Embeddings are numerical representations of concepts, converted into sequences of numbers, that make it easier for computers to understand the relationships between those concepts. Since the initial launch of the OpenAI /embeddings endpoint, many applications have incorporated embeddings for personalizing, recommending, and searching content.
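To make the idea of "relationships between concepts" concrete, here is a minimal sketch of cosine similarity, a common way to compare two embedding vectors. The three-dimensional vectors are invented for illustration only; real text-embedding-ada-002 vectors have 1536 dimensions and come from the API.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors, made up for this example (not real model output).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Related concepts should score closer to 1 than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```

A search or recommendation feature would typically rank candidates by this score against the embedding of a query.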
Model enhancements
Higher performance. text-embedding-ada-002 outperforms all legacy embedding models on text search, code search, and sentence similarity tasks and obtains comparable performance on text classification. For each task category, the models are evaluated on the same datasets that were used to evaluate the old embeddings.
Merging capabilities. We have greatly simplified the interface of the /embeddings endpoint by merging the five separate models (text-similarity, text-search-query, text-search-doc, code-search-text, and code-search-code) into one new model.
Longer context. The context length of the new model has been increased fourfold, from 2048 to 8192 tokens, making it more convenient for working with long documents.
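With a single model, a request to the /embeddings endpoint only needs the model name and the input text. The following is a sketch using only the Python standard library; the helper names build_embedding_request and fetch_embedding are hypothetical, and the API key is a placeholder you would supply yourself.

```python
import json
import urllib.request

EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_embedding_request(text, api_key, model="text-embedding-ada-002"):
    """Build (but do not send) a POST request for one input string."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_embedding(text, api_key):
    """Send the request and return the embedding as a list of floats.

    Needs network access and a valid API key, so it is not called here.
    """
    with urllib.request.urlopen(build_embedding_request(text, api_key)) as resp:
        payload = json.load(resp)
    return payload["data"][0]["embedding"]
```

The same call serves search queries, documents, and code, which previously required choosing among five model names.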
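As a rough illustration of what the longer context means in practice, the sketch below splits an already-tokenized document into pieces that fit the limit. It assumes the token list comes from the model's own tokenizer, which is not shown; the integers here are stand-ins, and chunk_tokens is a hypothetical helper.

```python
NEW_CONTEXT = 8192  # context length of the new model, as stated above
OLD_CONTEXT = 2048  # context length of the older models


def chunk_tokens(tokens, max_tokens=NEW_CONTEXT):
    """Split a token sequence into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


# Stand-in for a 20,000-token document (real tokens come from a tokenizer).
document_tokens = list(range(20000))

new_chunks = chunk_tokens(document_tokens, NEW_CONTEXT)
old_chunks = chunk_tokens(document_tokens, OLD_CONTEXT)
print(len(new_chunks), len(old_chunks))
```

Fewer, larger chunks mean fewer requests and fewer vectors to store per document.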
Smaller embedding size. The new embeddings have only 1536 dimensions, which is one-eighth the size of the davinci-001 embeddings, making the new embeddings more cost-effective when working with vector databases.
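The effect on a vector database can be estimated with simple arithmetic. This sketch assumes each dimension is stored as a 4-byte float32 and ignores index overhead; both are assumptions, since actual storage depends on the database. The older dimension count is derived from the one-eighth ratio stated above.

```python
NEW_DIMS = 1536
OLD_DAVINCI_DIMS = NEW_DIMS * 8  # 12288, from the "one-eighth" ratio
BYTES_PER_FLOAT32 = 4  # assumed storage format


def storage_bytes(num_vectors, dims, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw bytes needed to store num_vectors vectors of the given size."""
    return num_vectors * dims * bytes_per_value


one_million = 1_000_000
new_gb = storage_bytes(one_million, NEW_DIMS) / 1e9
old_gb = storage_bytes(one_million, OLD_DAVINCI_DIMS) / 1e9
print(f"new: {new_gb:.3f} GB, old: {old_gb:.3f} GB")
```

Under these assumptions, one million vectors take about 6.1 GB instead of about 49.2 GB.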
Reduced cost. We have reduced the price of the new embeddings by 90% compared to old models of the same size.
With text-embedding-ada-002, a single model is more capable, cheaper, and simpler to use than the five it replaces, which makes embeddings practical for a wider range of applications and users.