This is a quick summary of an interesting paper I read today: "Supervising strong learners by amplifying weak experts."
Iterated Distillation and Amplification (IDA) is a proposed scheme for training machine learning systems that can be robustly aligned with complex human values. The approach draws inspiration from AlphaGo Zero's training methodology and is notably similar to expert iteration.
The core concept involves two key processes: amplification and distillation. In the amplification phase, a learned model serves as a subroutine in a more powerful decision-making process, similar to how AlphaGo Zero uses Monte Carlo Tree Search (MCTS) to improve upon its policy network's choices. The distillation phase then involves training the model to directly predict the results of this amplified process, effectively compressing the improved capabilities into a faster system.
IDA aims to address AI safety problems by creating a powerful AI that never intentionally optimizes for something harmful to humans and remains correctable after deployment. Rather than proposing a specific implementation, it presents a design framework where capabilities are scaled up safely through iteration: the output of a safe but slow amplification step is distilled into a faster but slightly weaker model, which can then be amplified again, and the process repeats until a sufficiently capable system is reached.
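To make the two phases concrete, here is a minimal Python sketch of the loop as I understand it. The helpers `amplify`, `distill`, `overseer.solve`, and `train` are hypothetical stand-ins of my own, not anything from the paper; this is just the shape of the procedure, not an implementation.

```python
def amplify(model, overseer):
    """Return a slow but more capable agent that uses `model` as a subroutine."""
    def amplified_agent(task):
        # The overseer (e.g. a human) breaks the task down, calls the model
        # on the pieces, and combines the answers into a better decision,
        # analogous to MCTS wrapping AlphaGo Zero's policy network.
        return overseer.solve(task, assistant=model)
    return amplified_agent

def distill(amplified_agent, tasks, train):
    """Train a fast model to predict the amplified agent's outputs."""
    dataset = [(task, amplified_agent(task)) for task in tasks]
    return train(dataset)  # ordinary supervised / imitation learning

def ida(initial_model, overseer, tasks, train, num_iterations):
    """Repeat amplification and distillation, hopefully gaining capability each round."""
    model = initial_model
    for _ in range(num_iterations):
        strong_but_slow = amplify(model, overseer)      # amplification
        model = distill(strong_but_slow, tasks, train)  # distillation
    return model
```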
A key aspect of IDA is its use of an "overseer" (typically a human) who guides the process. The goal is to produce an agent that does what the overseer would want it to do, with the definition of "what the overseer would want" being determined through repeated application of the amplification procedure.
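One natural way to picture the overseer's role in amplification is as question decomposition: the overseer answers a hard question by splitting it into subquestions, delegating those to the current model, and combining the results. The sketch below is my own illustration under that assumption; `decompose` and `combine` are hypothetical methods standing in for the overseer's judgment.

```python
def amplified_answer(question, model, overseer):
    """Answer `question` with the overseer's judgment plus calls to the fast model."""
    subquestions = overseer.decompose(question)      # overseer breaks the problem apart
    sub_answers = [model(q) for q in subquestions]   # current model handles the pieces
    return overseer.combine(question, sub_answers)   # overseer assembles the final answer
```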