Overview:
- MEND is a method for editing a trained model so that its output changes in a way that is local, reliable, and general.
- "Local" means outputs on unrelated inputs do not change. "Reliable" means the model actually produces the desired correction. "General" means rephrasings and closely related questions that need the same correction are also corrected.
- Scales to very large models; the paper reports edits on models with more than 10 billion parameters.
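To make the three properties concrete, here is a minimal sketch of how they could be scored for a single edit. It is a toy, not MEND's evaluation code: the models are plain lookup functions, exact string match stands in for a real metric, and every name (`fraction_matching`, `score_edit`, the questions and answers) is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]


def fraction_matching(model: Model, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (input, expected output) pairs the model reproduces."""
    return sum(model(x) == y for x, y in pairs) / len(pairs)


def score_edit(
    base: Model,
    edited: Model,
    edit_pair: Tuple[str, str],
    paraphrases: List[str],
    unrelated: List[str],
) -> Dict[str, float]:
    """Score one edit on the three properties from the notes above."""
    _, target = edit_pair
    return {
        # Reliable: the edited model gives the corrected answer on the edit input.
        "reliable": fraction_matching(edited, [edit_pair]),
        # General: rephrasings of the edit input also get the corrected answer.
        "general": fraction_matching(edited, [(p, target) for p in paraphrases]),
        # Local: unrelated inputs still get whatever the base model said.
        "local": fraction_matching(edited, [(u, base(u)) for u in unrelated]),
    }


# Toy stand-ins for a model before and after an edit (assumed data).
BASE_ANSWERS = {
    "Who is the UK prime minister?": "Old PM",
    "Who leads the UK government?": "Old PM",
    "What is the capital of France?": "Paris",
}
EDIT_OVERRIDES = {
    "Who is the UK prime minister?": "New PM",
    "Who leads the UK government?": "New PM",
}


def base_model(question: str) -> str:
    return BASE_ANSWERS.get(question, "unknown")


def edited_model(question: str) -> str:
    return EDIT_OVERRIDES.get(question, base_model(question))


scores = score_edit(
    base_model,
    edited_model,
    edit_pair=("Who is the UK prime minister?", "New PM"),
    paraphrases=["Who leads the UK government?", "Name the UK prime minister."],
    unrelated=["What is the capital of France?"],
)
print(scores)
```

In this toy the edit is reliable and local but only half general, because one of the two rephrasings is not covered.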
Differences to Prior Art:
Open Questions:
- They describe it as a method for transforming raw fine-tuning gradients into a targeted parameter update, which would suggest that a new model results. At the same time, they say the edits are applied to the model's weights at test time, which would mean the base MEND models have to be kept around. Which is it?
- If the output is a parameter edit for a layer, why do they care about producing a low-rank embedding of the deltas? Why not just accumulate the deltas and apply them?
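One fact that may bear on the low-rank question: for a single example, the gradient of the loss with respect to a linear layer's weight matrix is the outer product of two vectors (the error signal at the layer's output and the layer's input), so it is rank 1. The sketch below checks this numerically in plain Python. It assumes a bias-free layer y = W x with a squared-error loss, all function names are hypothetical, and it illustrates the general property only, not MEND's implementation.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def loss(W, x, t):
    """Squared-error loss 0.5 * ||W x - t||^2 for one example."""
    y = matvec(W, x)
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, t))


def rank_one_grad(W, x, t):
    """Return delta = dL/dy and the weight gradient delta * x^T (outer product)."""
    y = matvec(W, x)
    delta = [yi - ti for yi, ti in zip(y, t)]
    grad = [[d * xi for xi in x] for d in delta]
    return delta, grad


def numeric_grad(W, x, t, eps=1e-6):
    """Central-difference estimate of dL/dW, used only as a check."""
    grad = []
    for i, row in enumerate(W):
        grad_row = []
        for j in range(len(row)):
            W_plus = [r[:] for r in W]
            W_minus = [r[:] for r in W]
            W_plus[i][j] += eps
            W_minus[i][j] -= eps
            grad_row.append((loss(W_plus, x, t) - loss(W_minus, x, t)) / (2 * eps))
        grad.append(grad_row)
    return grad


W = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]  # 2 outputs, 3 inputs
x = [1.0, 2.0, -1.0]                       # layer input
t = [0.0, 1.0]                             # target output

delta, grad = rank_one_grad(W, x, t)
numeric = numeric_grad(W, x, t)
max_error = max(
    abs(a - b) for row_a, row_b in zip(grad, numeric) for a, b in zip(row_a, row_b)
)

full_size = len(W) * len(W[0])       # numbers in the full gradient matrix
factored_size = len(delta) + len(x)  # numbers in the two rank-1 factors
print(max_error, full_size, factored_size)
```

For a d-by-d layer the full gradient has d * d entries while the two factors have 2 * d, so the gap grows quickly with layer width. Whether that is the whole reason for working with the low-rank form is still the open question above.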