LLM Engineer Interview Questions: Fine-Tuning, LoRA, QLoRA, PEFT, and Instruction Tuning

What is RLHF?

RLHF stands for Reinforcement Learning from Human Feedback. It is a multi-stage process that typically follows supervised fine-tuning: first, train a reward model on human preference rankings of model outputs; then, use reinforcement learning, typically PPO (Proximal Policy Optimization), to fine-tune the language model so that its outputs maximize the learned reward. RLHF is what makes models like ChatGPT and Claude follow instructions and refuse harmful requests.
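The two optimization targets mentioned above can be illustrated with toy scalar functions. This is a minimal sketch for intuition only: real implementations operate on batched tensors of token-level log-probabilities, and the function names here are invented for illustration.

```python
import math


def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise preference loss for reward-model training:
    -log sigmoid(r_chosen - r_rejected). The loss shrinks as the reward
    model scores the human-preferred response above the rejected one."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


def ppo_clipped_objective(logp_new: float, logp_old: float,
                          advantage: float, eps: float = 0.2) -> float:
    """PPO clipped surrogate objective for a single action:
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A), where
    ratio = pi_new(a|s) / pi_old(a|s). Clipping keeps the updated
    policy close to the policy that generated the samples."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)


# A tied comparison gives the maximum-entropy loss log(2) ~ 0.693;
# a large positive margin drives the loss toward zero.
print(round(reward_model_loss(0.0, 0.0), 4))
print(reward_model_loss(5.0, 0.0) < 0.01)
# With a positive advantage, a large policy ratio is capped at 1 + eps.
print(ppo_clipped_objective(1.0, 0.0, 1.0))
```

In full RLHF pipelines the PPO reward also includes a KL penalty against the supervised-fine-tuned model, which keeps the policy from drifting into degenerate outputs that exploit the reward model.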