Meta AI and Papers with Code recently released Galactica, a scientific language model with 120 billion parameters that can search and summarize academic literature, solve math problems, and write scientific code.
Galactica’s architecture is based on the transformer, an architecture that uses attention to draw global dependencies between input and output.
Changes relative to the original transformer include GeLU activation functions, learned position embeddings, a vocabulary built with byte-pair encoding, and no bias parameters in the dense kernels or layer norms.
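A minimal sketch of such a block may help make these changes concrete. This is not Galactica's actual implementation; it is an illustrative NumPy version of a pre-norm transformer block with a GeLU feed-forward network, bias-free projections, and scale-only layer norms, as the architectural notes above describe.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GeLU activation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, gamma, eps=1e-5):
    # layer norm with a scale parameter only -- no bias term
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps)

def attention(x, Wq, Wk, Wv, Wo):
    # single-head self-attention; all projections are bias-free
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return (weights @ v) @ Wo

def block(x, params):
    # pre-norm residual block: attention sublayer, then GeLU feed-forward
    h = x + attention(layer_norm(x, params["g1"]), *params["attn"])
    return h + gelu(layer_norm(h, params["g2"]) @ params["W1"]) @ params["W2"]
```

Learned position embeddings would enter before the first block, e.g. `emb[token_ids] + pos_emb[np.arange(seq_len)]`, where `pos_emb` is a trained matrix rather than a fixed sinusoidal table.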
The researchers tokenized the training data with modality-specific rules, handling natural language, mathematical formulas, and molecular sequences differently so that each could be represented appropriately in a single model.
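One way to implement modality-aware tokenization is to wrap each non-text modality in delimiter tokens before feeding it to a shared tokenizer. The token names and helper below are illustrative assumptions, not Galactica's exact vocabulary:

```python
# Hypothetical modality delimiter tokens; names are illustrative only.
MODALITY_TOKENS = {
    "smiles": ("[START_SMILES]", "[END_SMILES]"),    # chemical compounds
    "amino": ("[START_AMINO]", "[END_AMINO]"),       # protein sequences
}

def mark_modality(text: str, modality: str) -> str:
    """Wrap a sequence in delimiter tokens so the model can tell
    molecular data apart from ordinary natural-language text."""
    start, end = MODALITY_TOKENS[modality]
    return f"{start}{text}{end}"
```

For example, `mark_modality("CCO", "smiles")` yields `[START_SMILES]CCO[END_SMILES]`, letting downstream tokenization treat the compound as a distinct modality.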
The source material for the dataset included 48 million papers, textbooks, reference materials, compounds, proteins, and other sources of scientific knowledge. The researchers also introduced a special token that marks sections of step-by-step reasoning, encouraging Galactica to use a form of internal working memory it would otherwise lack.
Galactica has several limitations. Like other language models, it can generate confident-sounding but incorrect text, a behavior known as “hallucination,” and it can also produce toxic language. Other limitations include frequency bias and overconfidence, especially regarding highly specialized scientific content.