LLM Engineer Interview Questions: Transformer Architecture, Self-Attention, and Modern LLM Foundations
What is Mixture of Experts (MoE)?
Mixture of Experts is an architecture in which the feedforward layers are replaced with multiple expert networks, but only a small subset is activated for each token. A learned router decides which experts process each token. This lets the model have many more parameters in total while keeping per-token compute low. Models like Mixtral and DeepSeek-V3 use MoE architectures, and GPT-4 is widely reported to as well.
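A minimal sketch of the routing step described above, in NumPy. All names, shapes, and weights here are hypothetical: a linear router scores every expert, only the top-k experts are evaluated, and their outputs are combined with softmax-normalized gate weights.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 8, 4, 2

# Hypothetical parameters: a linear router plus one weight matrix per
# expert (real experts are full FFN blocks; a single matrix suffices here).
W_router = rng.standard_normal((d_model, n_experts))
W_experts = rng.standard_normal((n_experts, d_model, d_model))

def moe_layer(x):
    """Route a single token vector x through its top-k experts."""
    logits = x @ W_router                 # one score per expert, shape (n_experts,)
    top = np.argsort(logits)[-top_k:]     # indices of the k highest-scoring experts
    # Softmax over only the selected experts' logits -> gate weights.
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()
    # Weighted sum of the chosen experts' outputs; the other experts
    # are never computed, which is where the compute savings come from.
    return sum(g * (x @ W_experts[e]) for g, e in zip(gates, top))

token = rng.standard_normal(d_model)
out = moe_layer(token)
print(out.shape)  # → (8,)
```

Note that total parameters scale with `n_experts`, but each token only touches `top_k` expert matrices, which is the "many parameters, low per-token compute" trade-off the card describes.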