Relative positional embedding for any attention mechanism

Ivan Ukhov
Jan 17, 2024

In Shaw et al. (2018), the authors introduce relative positional embedding for self-attention in transformer models, and in Huang et al. (2018), the authors present a memory-efficient approach to calculating this embedding in decoder blocks, where the self-attention is causal. In this article, the approach is generalized to any attention mechanism, be it self- or cross-attention, full or causal.
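As a rough illustration of the quantity involved, the following is a minimal, unoptimized sketch in PyTorch of an additive relative positional term for arbitrary query and key lengths, in the spirit of Shaw et al. (2018). The function name, tensor shapes, and clipping distance are illustrative assumptions, not the article's implementation, which is concerned with computing the same term memory-efficiently.

```python
import torch

def relative_attention_logits(q, k, rel_emb, max_distance):
    """Attention logits with an additive relative positional term.

    q:       (batch, heads, n_q, d)     queries
    k:       (batch, heads, n_k, d)     keys
    rel_emb: (2 * max_distance + 1, d)  one embedding per clipped relative distance
    """
    n_q, n_k = q.shape[-2], k.shape[-2]
    # Content-based term: the usual dot product between queries and keys.
    content = torch.einsum("bhqd,bhkd->bhqk", q, k)
    # Relative distance j - i between each query position i and key position j,
    # clipped to [-max_distance, max_distance] and shifted to be a valid index.
    distance = torch.arange(n_k)[None, :] - torch.arange(n_q)[:, None]
    index = distance.clamp(-max_distance, max_distance) + max_distance
    # Position-based term: the dot product between each query and the embedding
    # of its relative distance to each key; rel_emb[index] has shape (n_q, n_k, d).
    position = torch.einsum("bhqd,qkd->bhqk", q, rel_emb[index])
    return content + position
```

Since the query and key lengths are allowed to differ, the same sketch covers cross- and full-attention settings, for instance:

```python
q = torch.randn(2, 4, 5, 8)                 # 2 sequences, 4 heads, 5 queries
k = torch.randn(2, 4, 7, 8)                 # 7 keys, as in cross-attention
rel_emb = torch.randn(2 * 3 + 1, 8)         # embeddings for distances in [-3, 3]
logits = relative_attention_logits(q, k, rel_emb, max_distance=3)  # (2, 4, 5, 7)
```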

On to the article!

Originally published at https://blog.ivanukhov.com on January 17, 2024.
