👩‍💻

[Code Review] Implementing a Sentiment Classification Model for Elderly Conversations (3): Transformer ①

geum 2022. 12. 27. 18:54

๊ฐ์„ฑ ๋ถ„๋ฅ˜ ๋ชจ๋ธ ๊ตฌํ˜„ ์‹œ๋ฆฌ์ฆˆ (1) | CNN

๊ฐ์„ฑ ๋ถ„๋ฅ˜ ๋ชจ๋ธ ๊ตฌํ˜„ ์‹œ๋ฆฌ์ฆˆ (2) | RNN


The Transformer classification model isn't a single file, so going through it one piece at a time will probably take about three or four posts.

 

👩‍🏫 Model Class

import torch
import torch.nn as nn
import torch.nn.functional as F

from copy import deepcopy

from .encoder import Encoder, EncoderLayer
from .sublayers import *  # MultiHeadAttention, PositionwiseFeedForward, PositionalEncoding, Embeddings

# Sub-layer modules built at module level (d_model = 152)
attn = MultiHeadAttention(8, 152)             # 8 attention heads
ff = PositionwiseFeedForward(152, 1024, 0.5)  # feed-forward dim 1024, dropout 0.5
pe = PositionalEncoding(152, 0.5)             # positional encoding, dropout 0.5

class Transformer(nn.Module):
    def __init__(self, vocab_size, d_model, n_layer, num_class):
        super(Transformer, self).__init__()

        self.encoder = Encoder(EncoderLayer(d_model, deepcopy(attn), deepcopy(ff)), n_layer)
        self.src_embed = nn.Sequential(Embeddings(d_model, vocab_size), deepcopy(pe))
        self.linear = nn.Linear(d_model, num_class)

    def forward(self, x):
        x = self.src_embed(x)         # (batch, seq_len) -> (batch, seq_len, d_model)
        x = self.encoder(x)           # (batch, seq_len, d_model)
        x = x[:, -1, :]               # keep the last position: (batch, d_model)
        x = self.linear(x)            # (batch, num_class)

        probs = F.softmax(x, dim=-1)  # per-label probabilities (softmax output, not raw logits)

        return probs

 

์ฒ˜์Œ์—๋Š” attn, ff, pe ๋ณ€์ˆ˜๋„ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ๋„˜๊ธฐ๋ ค๊ณ  ํ–ˆ๋Š”๋ฐ ๋ญ”๊ฐ€ ์ž˜ ์•ˆ๋ผ์„œ ํด๋ž˜์Šค๋ž‘ ํ•œ ํŒŒ์ผ์— ๊ฐ™์ด ๋’€๋‹ค. ์ข‹์€ ๋ฐฉ๋ฒ•์€ ์•„๋‹ˆ๋ผ๊ณ  ์ƒ๊ฐํ•œ๋‹ค.

 

🎯 Parameters

◽ vocab_size: size of the vocabulary fed into the embedding layer

◽ d_model: embedding vector dimension

◽ n_layer: number of encoder layers (the original paper stacks 6 encoder layers to form a single encoder)

◽ num_class: number of sentiment labels to predict
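As a rough usage example (the vocab_size and num_class values below are made up; d_model has to be 152 here because the module-level attn, ff, and pe were built with 152):

model = Transformer(vocab_size=5000, d_model=152, n_layer=6, num_class=3)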

 

โณ ์ž‘๋™ ๋ฐฉ์‹

1. __init__

1) Encoder(EncoderLayer(d_model, deepcopy(attn), deepcopy(ff)), n_layer)

※ I plan to add links to this part once the posts on the Encoder and EncoderLayer classes are written.

◽ Put very simply, this means stacking n_layer copies of an EncoderLayer that takes (d_model, deepcopy(attn), deepcopy(ff)) as input.
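Just to illustrate the stacking, the sketch below shows the usual pattern (as in the Annotated Transformer) that this constructor call implies; the actual Encoder and EncoderLayer code will be covered in the later posts and may differ (for example, it likely adds LayerNorm).

from copy import deepcopy
import torch.nn as nn

def clones(module, n):
    # n independent deep copies of the same layer
    return nn.ModuleList([deepcopy(module) for _ in range(n)])

class EncoderSketch(nn.Module):
    def __init__(self, layer, n_layer):
        super().__init__()
        self.layers = clones(layer, n_layer)

    def forward(self, x):
        for layer in self.layers:   # run the input through every copy in order
            x = layer(x)
        return x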

 

2) nn.Sequential(Embeddings(d_model, vocab_size), deepcopy(pe))

◽ The embedding layer and the positional encoding are chained with nn.Sequential so that their combined result is fed into the encoder as input. Because the positional encoding is added to the embedding output, its dimension must be the same as d_model.
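For reference, the standard sinusoidal positional encoding from the original paper looks like the sketch below. I show it only to make the d_model-width requirement concrete; the PositionalEncoding class actually used in this project may differ in details.

import math
import torch
import torch.nn as nn

class SinusoidalPE(nn.Module):
    def __init__(self, d_model, dropout, max_len=5000):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        pe = torch.zeros(max_len, d_model)                        # one row per position
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)               # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)               # odd dimensions
        self.register_buffer("pe", pe.unsqueeze(0))                # (1, max_len, d_model)

    def forward(self, x):                                          # x: (batch, seq_len, d_model)
        x = x + self.pe[:, :x.size(1)]                             # widths match, so they can be added
        return self.dropout(x)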

 

2. forward

1) x = self.src_embed(x)

◽ The input x, with shape torch.Size([16 (batch size), 152 (max sentence length)]), passes through the embedding layer and comes out with shape torch.Size([16, 152, 152]).
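A quick shape check with a plain nn.Embedding behaves the same way (the vocabulary size of 5000 is made up, and the project's Embeddings class is assumed to produce the same shape):

import torch
import torch.nn as nn

embed = nn.Embedding(num_embeddings=5000, embedding_dim=152)  # vocab_size=5000 is made up
x = torch.randint(0, 5000, (16, 152))                         # (batch, max sentence length)
print(embed(x).shape)                                         # torch.Size([16, 152, 152])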

 

2) x = x[:, -1, :]

◽ Selects the hidden state at the last position of the encoder output, changing x's shape from (16, 152, 152) to (16, 152). → torch.Size([16, 152])
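On a dummy tensor of the same shape:

import torch

h = torch.randn(16, 152, 152)     # encoder output: (batch, seq_len, d_model)
print(h[:, -1, :].shape)          # torch.Size([16, 152]), the last position only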

 

3) x = self.linear(x)

◽ The linear layer takes a d_model-dimensional vector as input and produces an output vector with num_class entries. Applying the softmax function to that vector yields per-label probabilities, as in the example below, and the label with the highest probability becomes the model's final prediction.
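For example, with a made-up num_class = 3 and a batch of two sentences:

import torch
import torch.nn.functional as F

out = torch.tensor([[1.2, -0.3, 0.4],
                    [0.1,  2.0, -1.0]])   # output of self.linear, shape (2, 3)
probs = F.softmax(out, dim=-1)            # per-label probabilities, each row sums to 1
pred = probs.argmax(dim=-1)               # label with the highest probability
print(probs)                              # approx. tensor([[0.598, 0.133, 0.269], ...])
print(pred)                               # tensor([0, 1])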