dm.cs.tu-dortmund.de/mlbits/neural-nlp-positional-encoding/
Positional Encoding – Lecture Notes
can be achieved by adding a distance bias.
\[ \operatorname{softmax}\bigl(q_i K^\top + m \cdot [-(i-1), \ldots, -2, -1, 0]\bigr) \]
In practice, however, this bias works well only up to moderately long contexts; it does not extrapolate reliably to very long context sizes …
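As a minimal NumPy sketch of the biased attention above (the function name `alibi_attention_weights` and the slope value `m` are illustrative choices, not from the notes): for query position `i`, key position `j ≤ i` receives the bias `m · -(i - j)`, so each row of the bias matrix is exactly the vector `m · [-(i-1), ..., -2, -1, 0]` from the formula.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def alibi_attention_weights(Q, K, m):
    """Causal attention weights with a linear distance bias.

    Q, K: (seq_len, d) query/key matrices.
    m:    per-head slope (a fixed hyperparameter, illustrative here).
    """
    n = Q.shape[0]
    scores = Q @ K.T                   # (n, n) raw dot-product scores
    i = np.arange(n)[:, None]          # query positions
    j = np.arange(n)[None, :]          # key positions
    bias = -m * (i - j).astype(float)  # penalty grows linearly with distance
    scores = scores + bias
    scores[j > i] = -np.inf            # causal mask: no attending to the future
    return softmax(scores, axis=-1)
```

With `m = 0` this reduces to plain causal softmax attention; larger slopes push each query's weight mass toward its nearest keys, which is the intuition behind the short-context behaviour discussed above.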