Meta AI and Papers with Code recently released Galactica, a scientific language model with 120 billion parameters that can search and summarize academic literature, solve math problems, and write scientific code.

Galactica’s architecture is based on the transformer, a neural network architecture that uses attention to draw global dependencies between input and output.
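To make that concrete, the snippet below is a minimal sketch of scaled dot-product attention, the core operation a transformer uses to relate every input position to every other one. It is purely illustrative and not Galactica’s implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d_k) arrays of queries, keys, and values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity between every pair of positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V                               # weighted mixture of value vectors

# Toy example: 4 tokens with 8-dimensional representations, self-attention (Q = K = V).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)   # (4, 8)
```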

Some of the changes compared with the original transformer include GeLU activations, learned position embeddings, a vocabulary built with byte pair encoding, and no bias parameters in the dense layers or layer norms.
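The following PyTorch sketch illustrates those choices in a single toy block. It is an assumption-laden, simplified example (model sizes, head counts, and the class name are made up here), not Galactica’s actual code.

```python
import torch
import torch.nn as nn

class TinyGalacticaStyleBlock(nn.Module):
    def __init__(self, vocab_size=50_000, max_len=2048, d_model=512, n_heads=8):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # Learned (not fixed sinusoidal) position embeddings.
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, bias=False, batch_first=True)
        # Feed-forward sub-layer with GELU activation and no bias terms.
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model, bias=False),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model, bias=False),
        )
        # Layer norms without bias (bias=False requires PyTorch >= 2.1).
        self.norm1 = nn.LayerNorm(d_model, bias=False)
        self.norm2 = nn.LayerNorm(d_model, bias=False)

    def forward(self, token_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.tok_emb(token_ids) + self.pos_emb(positions)
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.ff(x))

block = TinyGalacticaStyleBlock()
print(block(torch.randint(0, 50_000, (1, 16))).shape)  # torch.Size([1, 16, 512])
```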

The researchers trained the model on data tokenized in a modality-aware way, handling natural language, mathematical formulas, molecular sequences, and other kinds of scientific content differently.
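As a rough illustration, a preprocessing step along these lines might wrap each modality in distinct marker tokens before byte pair encoding, so the model can tell, say, a molecular SMILES string from ordinary prose. The marker names and mapping below are illustrative assumptions, not a claim about Galactica’s exact implementation.

```python
# Hypothetical modality-aware preprocessing: each kind of content gets its own markers.
MODALITY_MARKERS = {
    "smiles":  ("[START_SMILES]", "[END_SMILES]"),
    "protein": ("[START_AMINO]", "[END_AMINO]"),
    "text":    ("", ""),  # plain natural language gets no special wrapping
}

def wrap_for_tokenization(content: str, modality: str) -> str:
    start, end = MODALITY_MARKERS[modality]
    return f"{start}{content}{end}"

print(wrap_for_tokenization("CC(=O)OC1=CC=CC=C1C(=O)O", "smiles"))
# [START_SMILES]CC(=O)OC1=CC=CC=C1C(=O)O[END_SMILES]
```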

The source material for the dataset included 48 million papers, textbooks, reference materials, compounds, proteins, and other sources of scientific knowledge. The researchers also wrapped sections of step-by-step reasoning in a special token, which encourages Galactica to use an internal working memory of sorts that it would otherwise lack.
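The sketch below shows what a prompt with such a delimited reasoning span might look like. The Galactica paper refers to a working-memory token written as <work>; the arithmetic example itself is made up here for illustration.

```python
# Illustrative prompt with a step-by-step reasoning span wrapped in a working-memory token.
prompt = (
    "Question: What is 27 * 14?\n"
    "<work>\n"
    "27 * 14 = 27 * 10 + 27 * 4\n"
    "        = 270 + 108\n"
    "        = 378\n"
    "</work>\n"
    "Answer: 378"
)
print(prompt)
```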

Galactica has a few limitations. It can generate text that sounds authoritative but is incorrect, a behavior known as “hallucination”. Other limitations are frequency bias and overconfidence, especially about highly specialized scientific content.
