
[Transformer Series] 01. Positional Encoding

geum 2022. 3. 30. 13:24

 

์‚ฌ์šฉ ์ด์œ 

- ์ž…๋ ฅ์ด RNN์ฒ˜๋Ÿผ ์ˆœ์„œ๋Œ€๋กœ ๋“ค์–ด์˜ค๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๊ธฐ ๋•Œ๋ฌธ์— ๋ชจ๋ธ์ด ๋ฌธ์žฅ ๋‚ด ๋‹จ์–ด์˜ ์œ„์น˜๋ฅผ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๊ธฐ ์œ„ํ•ด Positional Encoding ์‚ฌ์šฉ

- ์ƒ์„ฑ๋œ ๊ณ ์œ ํ•œ Positional Encoding์„ ๋‹จ์–ด ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ์™€ ๋”ํ•  ๊ฒฝ์šฐ ๋ชจ๋ธ์ด ๋‹จ์–ด์˜ ์ ˆ๋Œ€ ์œ„์น˜ ํŒŒ์•… ๊ฐ€๋Šฅ

 

How It Works

- The N-th Positional Encoding is added to the N-th word embedding of each sentence.

- The paper's authors use sine and cosine functions (see the formulas below) → pos: position of the embedding vector within the sentence, i: index within the embedding vector
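
Concretely, the encoding defined in the original paper (Attention Is All You Need) fills the even dimensions with sine and the odd dimensions with cosine:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$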

 

 

- Since the Positional Encoding must be added to the word embedding vector, $d_{\text{positional encoding}} = d_{\text{embedding vector}}$

 

 

๐Ÿง sin ํ•จ์ˆ˜, cos ํ•จ์ˆ˜๋ฅผ ์“ฐ๋Š” ์ด์œ ๋Š”?

 

์•„์ฃผ ๊ดœ์ฐฎ์€ ๊ธ€์„ ํ•˜๋‚˜ ๋ฐœ๊ฒฌํ•ด์„œ ๋”ฐ๋กœ ์ •๋ฆฌ ํ›„ ๋งํฌ ์ถ”๊ฐ€ ์˜ˆ์ •

 

 

๐Ÿง sin ํ•จ์ˆ˜, cos ํ•จ์ˆ˜๋ฅผ ํ•จ๊ป˜ ์‚ฌ์šฉํ•˜๋Š” ์ด์œ ๋Š”?

 

Shifting a sine function gives a cosine function. Internally, the relative relationship between word positions can be converted from one to the other by a rotation transformation of a matrix, and a rotation matrix is itself built from sine and cosine functions.
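
To spell out that rotation, the standard angle-addition identities can be written in matrix form: for a fixed offset $k$,

$$\begin{pmatrix} \sin(pos+k) \\ \cos(pos+k) \end{pmatrix} = \begin{pmatrix} \cos k & \sin k \\ -\sin k & \cos k \end{pmatrix} \begin{pmatrix} \sin(pos) \\ \cos(pos) \end{pmatrix}$$

so the encoding at position $pos + k$ is a linear function (a rotation) of the encoding at position $pos$, which is what makes it easy for the model to learn to attend by relative positions.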

 

 

๐Ÿง Word embedding, Positional encoding์„ ๊ณฑํ•˜์ง€ ์•Š๊ณ  ๋”ํ•˜๋Š” ์ด์œ ๋Š”?

 

If the positional encoding vector contains entries equal to 0, multiplying would wipe out the word embedding information at those positions. Adding preserves the word embedding information while still injecting the position information.
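
A minimal numeric sketch of this point (the embedding values below are toy numbers; [0, 1, 0] is what the sinusoidal encoding actually produces at pos = 0, since sin(0) = 0 and cos(0) = 1):

import torch

# toy 3-dimensional word embedding (made-up values)
word_emb = torch.tensor([0.7, -1.2, 0.5])
# positional encoding at pos = 0: sin(0) = 0 on even dims, cos(0) = 1 on odd dims
pos_enc = torch.tensor([0.0, 1.0, 0.0])

print(word_emb * pos_enc)  # tensor([ 0.0000, -1.2000,  0.0000]) -> embedding info wiped out
print(word_emb + pos_enc)  # tensor([ 0.7000, -0.2000,  0.5000]) -> embedding info preserved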

 

Code Implementation

import math

import torch
from torch import nn, Tensor


class PositionalEncoding(nn.Module):
    def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)

        # position: [max_len, 1], each row holds a token position (pos)
        position = torch.arange(max_len).unsqueeze(1)
        # div_term: 1 / 10000^(2i / d_model) for every even dimension index 2i
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)
        pe[:, 0, 0::2] = torch.sin(position * div_term)  # even dimensions: sine
        pe[:, 0, 1::2] = torch.cos(position * div_term)  # odd dimensions: cosine
        # buffer: saved with the module but not updated by the optimizer
        self.register_buffer('pe', pe)

    def forward(self, x: Tensor) -> Tensor:
        """
        Args:
            x: Tensor, shape [seq_len, batch_size, embedding_dim]
        """
        # add the first seq_len positional encodings to the input embeddings
        x = x + self.pe[:x.size(0)]
        return self.dropout(x)
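
A quick usage sketch on top of the class above (d_model, seq_len, batch_size, and vocab_size are arbitrary example values, not taken from the tutorial):

import torch
from torch import nn

d_model, seq_len, batch_size, vocab_size = 512, 10, 2, 1000

embedding = nn.Embedding(vocab_size, d_model)
pos_encoding = PositionalEncoding(d_model, dropout=0.1)

tokens = torch.randint(0, vocab_size, (seq_len, batch_size))  # [seq_len, batch_size]
x = embedding(tokens)        # [seq_len, batch_size, d_model]
x = pos_encoding(x)          # same shape, position information added
print(x.shape)               # torch.Size([10, 2, 512])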

 

โญ ๊ฐœ์ธ์ ์ธ ์˜๊ฒฌ์œผ๋กœ๋Š” ํŒŒ์ดํ† ์น˜ ํŠœํ† ๋ฆฌ์–ผ๋ณด๋‹ค https://github.com/hyunwoongko/transformer์— ์˜ฌ๋ผ์™€ ์žˆ๋Š” ์ฝ”๋“œ๊ฐ€ ๋” ์ง๊ด€์ ์ด๊ณ  ์ดํ•ดํ•˜๊ธฐ ์‰ฌ์šด ๊ฒƒ ๊ฐ™๋‹ค. ํŒŒ์ดํ† ์น˜ ํŠœํ† ๋ฆฌ์–ผ ์ฝ”๋“œ์™€ ๋น„๊ตํ•ด๋ณด๋ฉด์„œ ๋ถ„์„ํ•ด๋ด์•ผ์ง€ 

 

์ฐธ๊ณ  ์ž๋ฃŒ

Hands-On Machine Learning

https://yngie-c.github.io/nlp/2020/07/01/nlp_transformer/

https://www.youtube.com/watch?v=1biZfFLPRSY&ab_channel=AICoffeeBreakwithLetitia

http://www.aitimes.kr/news/articleView.html?idxno=17442

https://tutorials.pytorch.kr/beginner/transformer_tutorial.html