Differential Transformer (Oct 2024)

Published on Oct 8, 2024

Title: Differential Transformer
Link: https://arxiv.org/abs/2410.05258
Date: 7 Oct 2024
Authors: Tianzhu Ye, Li Dong, Yuqing Xia, Yutao Sun, Yi Zhu, Gao Huang, Furu Wei

Summary

This paper introduces the Differential Transformer, a new architecture for large language models (LLMs) that addresses attention noise, the tendency of Transformers to over-allocate attention to irrelevant context. The authors propose a differential attention mechanism, which takes the difference between two softmax attention maps to cancel out noise and encourage the model to focus on critical information. Experimental results show that the Differential Transformer outperforms the conventional Transformer on a range of tasks, including language modelling, long-context modelling, key information retrieval, hallucination mitigation, and in-context learning. Notably, the Differential Transformer also reduces activation outliers, which is beneficial for model quantization. The paper concludes by highlighting the Differential Transformer's promise as a foundation architecture for future LLMs.
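As a rough illustration of the mechanism summarised above, here is a minimal single-head sketch of differential attention in PyTorch. It is a simplification rather than the authors' reference implementation: the function name, the tensor shapes, and the fixed scalar lam are illustrative assumptions (in the paper, lambda is a learnable, reparameterised parameter and head outputs are additionally normalised).

    import torch
    import torch.nn.functional as F

    def diff_attention(x, W_q1, W_k1, W_q2, W_k2, W_v, lam=0.5):
        # x: (seq_len, d_model); projection matrices map d_model -> d_head.
        # Two independent softmax attention maps are computed; subtracting
        # them cancels noise that both maps place on irrelevant tokens.
        d = W_k1.shape[1]
        q1, k1 = x @ W_q1, x @ W_k1   # queries/keys for the first map
        q2, k2 = x @ W_q2, x @ W_k2   # queries/keys for the second map
        v = x @ W_v
        a1 = F.softmax(q1 @ k1.T / d**0.5, dim=-1)  # first attention map
        a2 = F.softmax(q2 @ k2.T / d**0.5, dim=-1)  # second attention map
        return (a1 - lam * a2) @ v    # differential attention output

Note that the fixed lam here is only for readability; the paper learns lambda during training and applies per-head normalisation before concatenating heads.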

Key Topics

Differential Transformer, Attention Noise, Long-context Modelling, Key Information Retrieval, Contextual Hallucination
