Relative positional embedding for any attention mechanism
Jan 17, 2024
In Shaw et al. (2018), the authors introduce relative positional embedding for self-attention in transformer models, and in Huang et al. (2018), the authors present a memory-efficient approach to calculating this embedding in decoder blocks, where the self-attention is causal. In this article, the approach is generalized to any attention mechanism, whether self- or cross-attention and whether full or causal.
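For context, here is a minimal sketch of the relative term from Shaw et al. (2018) for a single attention head, written in NumPy with hypothetical names (relative_logits, q, w, max_distance); it deliberately materializes the full pairwise tensor of relative embeddings, which is precisely the memory cost that the memory-efficient formulation avoids.

```python
import numpy as np

def relative_logits(q, n_keys, w, max_distance):
    """Relative-position term added to the attention logits (naive version).

    q: (n_queries, d) queries for one attention head.
    n_keys: number of key positions (equal to n_queries for self-attention).
    w: (2 * max_distance + 1, d) learned embeddings, one per clipped offset.
    Returns an (n_queries, n_keys) matrix of q_i . w_{clip(j - i)} terms.
    """
    n_queries = q.shape[0]
    # Relative offset j - i for every query-key pair, clipped to the table
    # and shifted so that it can be used as an index into w.
    offsets = np.arange(n_keys)[None, :] - np.arange(n_queries)[:, None]
    offsets = np.clip(offsets, -max_distance, max_distance) + max_distance
    # Gathering w[offsets] materializes an (n_queries, n_keys, d) tensor,
    # which is the memory overhead that the efficient approach sidesteps.
    return np.einsum('id,ijd->ij', q, w[offsets])
```

This sketch already works for cross-attention, since the number of queries and keys may differ; the point of the article is to obtain the same result without forming the intermediate (n_queries, n_keys, d) tensor.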
Originally published at https://blog.ivanukhov.com on January 17, 2024.