LLM Engineer Interview Questions: Transformer Architecture, Self-Attention, and Modern LLM Foundations

What is Mixture of Experts (MoE)?


Mixture of Experts is an architecture in which the feedforward layers are replaced with multiple expert networks, only a subset of which is activated for each token. A learned router decides which experts process each token. This lets the model hold many more parameters in total while keeping per-token compute low. Mixtral and DeepSeek V3 use MoE architectures, and GPT-4 is widely reported to as well, though its architecture has not been officially disclosed.
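The routing idea can be illustrated with a minimal sketch: a toy MoE feedforward layer in NumPy with top-k routing. All names (`MoELayer`, `w_router`, the sizes) are illustrative, not from any real model; a production implementation would batch the dispatch instead of looping per token and would add a load-balancing loss.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Toy Mixture-of-Experts feedforward layer with top-k routing (illustrative)."""
    def __init__(self, d_model, d_hidden, n_experts, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        # Router: one logit per expert for each token.
        self.w_router = rng.standard_normal((d_model, n_experts)) * 0.02
        # Each expert is a small two-layer ReLU FFN.
        self.experts = [
            (rng.standard_normal((d_model, d_hidden)) * 0.02,
             rng.standard_normal((d_hidden, d_model)) * 0.02)
            for _ in range(n_experts)
        ]

    def __call__(self, x):
        # x: (n_tokens, d_model)
        logits = x @ self.w_router                          # (n_tokens, n_experts)
        top = np.argsort(logits, axis=-1)[:, -self.top_k:]  # indices of the k chosen experts
        # Gate weights: softmax over only the selected experts' logits.
        gates = softmax(np.take_along_axis(logits, top, axis=-1), axis=-1)
        out = np.zeros_like(x)
        for t in range(x.shape[0]):        # per-token dispatch (looped for clarity)
            for k in range(self.top_k):    # only top_k experts run per token
                w1, w2 = self.experts[top[t, k]]
                out[t] += gates[t, k] * (np.maximum(x[t] @ w1, 0.0) @ w2)
        return out

moe = MoELayer(d_model=16, d_hidden=32, n_experts=8, top_k=2)
x = np.random.default_rng(1).standard_normal((4, 16))
y = moe(x)
print(y.shape)  # (4, 16)
```

Note the compute asymmetry this demonstrates: the layer holds 8 experts' worth of FFN parameters, but each token passes through only 2 of them.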