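Chaudhari et al.¹ trace the origins of attention models to two 1964 papers (one each by Watson and Nadaraya) that explore a relevance-aware regression model. Given training examples $(x_1, y_1), \dots, (x_n, y_n)$, the model predicts the target $\hat{y}$ for a query $x$ as a weighted sum of the training targets:

$$\hat{y} = \sum_{i=1}^{n} \alpha(x, x_i)\, y_i$$

In this model, $\alpha(x, x_i)$ is a weighting function that encodes the relevance of example $x_i$ to the prediction for $x$. The modern twist is that attention models can learn the function $\alpha$.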
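The relevance-weighted regression described above can be sketched in a few lines: a weighting function scores how relevant each training example is to the query, and the prediction is the weighted sum of the training targets. This is a minimal illustration, not the formulation from the cited papers. The Gaussian kernel, the `bandwidth` parameter, the function names, and the toy data are all assumptions chosen for the example; here the weighting function is fixed by hand, whereas an attention model would learn it.

```python
import math


def gaussian_weights(query, xs, bandwidth=1.0):
    """Return normalized relevance weights, one per training input in xs.

    Assumption: relevance is measured with a Gaussian kernel on the
    distance between the query and each training input.
    """
    scores = [math.exp(-((query - x) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    total = sum(scores)
    # Normalize so the weights sum to 1.
    return [s / total for s in scores]


def predict(query, xs, ys, bandwidth=1.0):
    """Predict the target for query as a relevance-weighted sum of ys."""
    weights = gaussian_weights(query, xs, bandwidth)
    return sum(w * y for w, y in zip(weights, ys))


# Toy training data (hypothetical): inputs and their targets.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]

weights = gaussian_weights(1.0, xs, bandwidth=0.5)
prediction = predict(1.0, xs, ys, bandwidth=0.5)
```

With a small `bandwidth`, nearly all of the weight falls on the training examples closest to the query; a larger value spreads the weight more evenly, so the prediction moves toward the plain average of the targets.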

Sources

Footnotes

  1. Chaudhari et al. (2021)