TransformerFAM: Feedback attention is working memory (Apr 2024)
AI Paper Podcasts

Published on Oct 8, 2024

Title: "TransformerFAM: Feedback attention is working memory"
Link: https://arxiv.org/abs/2404.09173
Date: 14 Apr 2024
Authors: Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar

Summary

This paper proposes a novel Transformer architecture, called TransformerFAM, which addresses the challenge of processing infinitely long sequences by introducing a feedback loop that functions as working memory. The authors argue that by allowing attention to operate on both homogeneous sequence data and latent representations through a feedback loop, the model can effectively retain and propagate information over long contexts. The paper compares TransformerFAM with existing approaches, such as Block Sliding Window Attention (BSWA) and TransformerXL, through various experiments on long-context tasks and GPT-3 benchmark tasks. These experiments highlight the ability of TransformerFAM to significantly improve performance on long-context tasks across various model sizes (1B, 8B, and 24B). The paper concludes that TransformerFAM offers a promising solution for addressing the limited memory capabilities of current Large Language Models.
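The core idea can be illustrated with a minimal, single-head sketch in Python. This is not the authors' implementation: the function name fam_block_attention, the block size, and the choice of 8 FAM tokens are assumptions made for illustration. Each block attends over itself concatenated with the feedback memory (FAM), and the FAM is then updated by attending over the same keys and values, so a compressed summary of earlier blocks is carried forward indefinitely.

import torch
import torch.nn.functional as F

def fam_block_attention(x, fam, w_q, w_k, w_v, block_size=64):
    # x:   (seq_len, d_model) activations entering this attention layer
    # fam: (fam_len, d_model) feedback memory carried over from earlier blocks
    outputs = []
    scale = w_k.size(-1) ** 0.5
    for start in range(0, x.size(0), block_size):
        block = x[start:start + block_size]
        # Keys/values include the feedback memory, so the current block can
        # read information compressed from all previous blocks.
        kv = torch.cat([fam, block], dim=0)
        q, k, v = block @ w_q, kv @ w_k, kv @ w_v
        outputs.append(F.softmax(q @ k.T / scale, dim=-1) @ v)
        # Update the memory: FAM queries attend over the block plus their own
        # previous state, writing a summary of the block back into the memory.
        fam_q = fam @ w_q
        fam = F.softmax(fam_q @ k.T / scale, dim=-1) @ v
    return torch.cat(outputs, dim=0), fam

# Example usage (hypothetical dimensions):
d = 32
w_q, w_k, w_v = (torch.randn(d, d) * 0.02 for _ in range(3))
x = torch.randn(512, d)        # a long input sequence
fam0 = torch.zeros(8, d)       # 8 initial FAM tokens (an assumption)
y, fam = fam_block_attention(x, fam0, w_q, w_k, w_v)

Because the FAM is re-fed as keys and values for the next block, information can propagate across arbitrarily many blocks at constant memory cost per step, which is the working-memory behavior the paper emphasizes.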

Key Topics:

Long-context tasks, Feedback attention, TransformerFAM, Working memory, Attention mechanisms
