GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Scaling language models with mixture-of-experts architecture for efficient training

December 29th, 2024
About GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Scaling language models with more data, compute, and parameters has driven significant progress in natural language processing. However, training these large dense models requires vast amounts of computing resources. The paper proposes GLaM (Generalist Language Model), which uses a sparsely activated mixture-of-experts architecture to scale model capacity while incurring substantially less training cost than dense variants. GLaM achieves strong results on in-context learning tasks across zero-shot, one-shot, and few-shot settings.
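To give a rough sense of how a sparsely activated mixture-of-experts layer keeps per-token compute low while total capacity grows, here is a minimal sketch of a top-k gated MoE feed-forward layer in NumPy. The class name, dimensions, and top-2 routing are illustrative assumptions for this sketch, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class SparseMoELayer:
    """Toy sparsely activated mixture-of-experts feed-forward layer (hypothetical sketch)."""
    def __init__(self, d_model, d_hidden, num_experts, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        # Gating network: one score per expert for each token.
        self.w_gate = rng.normal(0, 0.02, (d_model, num_experts))
        # Each expert is an independent two-layer feed-forward network.
        self.w_in = rng.normal(0, 0.02, (num_experts, d_model, d_hidden))
        self.w_out = rng.normal(0, 0.02, (num_experts, d_hidden, d_model))

    def __call__(self, tokens):
        # tokens: (num_tokens, d_model)
        gate_probs = softmax(tokens @ self.w_gate, axis=-1)      # (T, num_experts)
        # Keep only the top-k experts per token; the rest are skipped entirely,
        # so per-token compute stays roughly constant as num_experts grows.
        top_idx = np.argsort(gate_probs, axis=-1)[:, -self.top_k:]
        out = np.zeros_like(tokens)
        for t in range(tokens.shape[0]):
            chosen = top_idx[t]
            weights = gate_probs[t, chosen]
            weights = weights / weights.sum()                     # renormalize over chosen experts
            for w, e in zip(weights, chosen):
                h = np.maximum(tokens[t] @ self.w_in[e], 0)       # ReLU expert MLP
                out[t] += w * (h @ self.w_out[e])
        return out

# Usage: 8 experts in total, but only 2 are active for any given token.
layer = SparseMoELayer(d_model=16, d_hidden=32, num_experts=8)
x = np.random.default_rng(1).normal(size=(4, 16))
print(layer(x).shape)  # (4, 16)
```

The key design point the sketch illustrates is that adding experts increases the number of parameters (capacity) without increasing the amount of computation each token pays, since only the top-k experts run per token.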
Key Features
- Efficient scaling of language models.
- Mixture-of-experts architecture.
- Scalability to large model capacity.
- Significantly reduced training cost.
Use Cases
- Natural language processing.
- In-context learning tasks.
- Zero-shot, one-shot, and few-shot learning.
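To make the three in-context settings concrete, the snippet below sketches how the prompts differ in each case. The translation task and example pairs are purely illustrative; the model simply completes the final line, with no gradient updates.

```python
# Hypothetical prompts illustrating zero-, one-, and few-shot in-context learning.
zero_shot = "Translate English to French: cheese =>"

one_shot = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "cheese =>"
)

few_shot = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "plush giraffe => girafe peluche\n"
    "cheese =>"
)
```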