Model Training
Reinforcement Learning from Human Feedback (RLHF)
Training an AI to be more helpful and less harmful by having humans rate its outputs and feeding that feedback back into training.
Definition
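A training technique that uses human preferences to fine-tune language models. Human evaluators rank or compare model outputs, a reward model is trained to predict those preferences, and the language model is then optimized with reinforcement learning to produce outputs the reward model scores highly.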
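To make the reward-model step concrete, here is a minimal sketch in plain Python. It assumes a pairwise (Bradley-Terry style) preference loss and a toy linear reward model. The function names, the two made-up features, and the sample pairs are all hypothetical illustrations, not part of any specific RLHF library or production pipeline.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, features):
    """Toy linear reward model: reward = weights . features."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Fit weights so each human-preferred output scores above the rejected one."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = score(weights, chosen) - score(weights, rejected)
            # Gradient descent on preference_loss: the loss gradient w.r.t.
            # the margin is -(1 - sigmoid(margin)).
            step = lr * (1.0 - sigmoid(margin))
            weights = [
                w + step * (c - r)
                for w, c, r in zip(weights, chosen, rejected)
            ]
    return weights


# Hypothetical features per output: [helpfulness, rudeness].
# Each pair is (output the human preferred, output the human rejected).
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.3, 0.9]),
]
weights = train_reward_model(pairs, dim=2)
```

In a full RLHF setup the reward model is typically a neural network rather than a linear scorer, and its scores then drive a reinforcement learning step on the language model. The sketch covers only the preference-fitting part.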
Why it matters
RLHF is the technique that made ChatGPT conversational and helpful, and it is one of the most widely used methods for aligning LLMs with human intent.
Related terms in Model Training
Adversarial Training
Teaching an AI to defend itself by constantly attacking it with tricky or malicious inputs during training.
Autoencoders
Neural networks that learn to compress data into a small code and then reconstruct the original from it.
Distillation (Model Distillation)
Teaching a small, fast AI model to mimic a large, expensive one, so you get similar results at a fraction of the cost.
Dropout
Randomly turning off some neurons during training so the AI doesn't over-memorize and can generalize better.