Evolution of Preference Optimization Techniques
November 13, 2024 | 5 min read
RLHF is not the only method for AI alignment. This article introduces modern algorithms like DPO and KTO that offer simpler and more stable alternatives.