LLM Engineer Interview Questions: Transformer Architecture, Self-Attention, and Modern LLM Foundations
What is Mixture of Experts (MoE)?
Mixture of Experts is an architecture in which the feedforward layers are replaced with multiple expert networks, but only a small subset is activated for each token. A learned router decides which experts process each token. This lets the model have many more parameters in total while keeping per-token compute low. Models like Mixtral and DeepSeek-V3 use MoE architectures, and GPT-4 is widely reported to as well.
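A minimal sketch of the routing step described above, in NumPy. All names, shapes, and weights here are hypothetical: a linear router scores every expert, only the top-k experts are evaluated, and their outputs are combined with softmax-normalized gate weights.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 8, 4, 2

# Hypothetical parameters: a linear router plus one weight matrix per
# expert (real experts are full FFN blocks; a single matrix suffices here).
W_router = rng.standard_normal((d_model, n_experts))
W_experts = rng.standard_normal((n_experts, d_model, d_model))

def moe_layer(x):
    """Route a single token vector x through its top-k experts."""
    logits = x @ W_router                 # one score per expert, shape (n_experts,)
    top = np.argsort(logits)[-top_k:]     # indices of the k highest-scoring experts
    # Softmax over only the selected experts' logits -> gate weights.
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()
    # Weighted sum of the chosen experts' outputs; the other experts
    # are never computed, which is where the compute savings come from.
    return sum(g * (x @ W_experts[e]) for g, e in zip(gates, top))

token = rng.standard_normal(d_model)
out = moe_layer(token)
print(out.shape)  # → (8,)
```

Note that total parameters scale with `n_experts`, but each token only touches `top_k` expert matrices, which is the "many parameters, low per-token compute" trade-off the card describes.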