Relative Positional Encoding RQ Presentation

2024. 10. 5. 20:04 · AI/LLM

Relative positional encoding (RPE) uses a word's relative position in a sentence rather than its absolute position.

In the first picture, an RPE bias based on the relative distance between tokens i and j is added to the Key and Value.

Clipping is applied beyond a certain distance k for computational efficiency.
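
As a minimal sketch of this clipping step (the function name and the value of k below are illustrative assumptions, not taken from the slides), the pairwise relative distance can be clipped to [-k, k] so that only 2k + 1 relative embeddings need to be stored:

import numpy as np

def clipped_relative_positions(seq_len, k):
    pos = np.arange(seq_len)
    rel = pos[None, :] - pos[:, None]  # relative distance j - i for every pair (i, j)
    return np.clip(rel, -k, k)         # distances beyond +-k all share one embedding

print(clipped_relative_positions(6, 2))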

The second picture shows an example of an RPE bias table of length 512.

Each token's bias with respect to itself is 0, and the bias increases or decreases with distance.
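
A small sketch of what such a distance-dependent bias can look like (the linear slope is an assumption made for illustration; the actual values of the 512-entry table in the slide may differ):

import numpy as np

def distance_bias(seq_len, slope=0.1):
    pos = np.arange(seq_len)
    rel = pos[None, :] - pos[:, None]   # relative distance j - i
    return -slope * np.abs(rel)         # 0 on the diagonal, magnitude grows with distance

print(distance_bias(6))  # a 6 x 6 slice instead of the full 512 x 512 table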

 

There have been several approaches to implementing RPE.

The first line is the result of expanding the absolute positional encoding in the attention score.

x is the content vector and p is the positional encoding bias.
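
Written out with the usual query and key projections W_Q and W_K (the projection notation is an assumption about the slide's formula, not shown in the text), the expansion is:

(x_i + p_i) W_Q ((x_j + p_j) W_K)^T
  = x_i W_Q W_K^T x_j^T + x_i W_Q W_K^T p_j^T + p_i W_Q W_K^T x_j^T + p_i W_Q W_K^T p_j^T

The first term is pure content-to-content attention, while the remaining terms involve the positional bias p; these are the terms that the relative variants rewrite.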

Some approaches change the absolute bias to a relative one, and some make the bias itself trainable.
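
As a rough sketch of the trainable-bias variant (a PyTorch example with assumed names and clipping distance k; the slides do not specify an implementation), a learnable scalar per head and per clipped relative distance can be added to the attention logits:

import torch
import torch.nn as nn

class TrainableRelativeBias(nn.Module):
    def __init__(self, k=64, num_heads=8):
        super().__init__()
        self.k = k
        # one learnable scalar per head for each clipped relative distance in [-k, k]
        self.bias = nn.Parameter(torch.zeros(num_heads, 2 * k + 1))
    def forward(self, scores):
        # scores: attention logits of shape (batch, num_heads, seq_len, seq_len)
        seq_len = scores.size(-1)
        pos = torch.arange(seq_len, device=scores.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.k, self.k) + self.k
        return scores + self.bias[:, rel]  # broadcasts over the batch dimension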



RPE can preserve more contextual information because it uses the relative positional information between tokens.

It is also more robust to variable-length inputs.

However, RPE may require more computing resources than APE, for example when the bias is trainable.

 

 

Link to the full slides on positional encoding: https://docs.google.com/presentation/d/e/2PACX-1vSfM_sKpoMFZxixOwsDgd68HCUtjGPuhiaVnsh8xkk-527gTjin2E-sHBEglRw-qVOPoMxdFsRlGEMQ/pub?start=false&loop=false&delayms=3000