Artificial Intelligence 38

[NLP Basics] Tokenization (Tokenizing)

Concept

Tokenization is the process of splitting text into units according to a chosen criterion; it goes by several names, including "tokenization" and "tokenizing". A token can be a sentence or a word, and in most cases a token is defined as a meaningful unit.

Example

◽ Paragraph
An abundance of feeling with no known cause, he finally filled in the period. And alone he empties out the stories he never finished. But the protagonist of his story is still her. I decided to call this romance.
BIG Naughty, "I Decided to Call It Romance" (Narr. Kim Ki-hyun)

◽ Sentence-level tokenization
Tokenizing at the sentence level splits on the period (.), so the paragraph is divided into four sentences in total.

◽ Word-level tokenization
Either split on whitespace without treating punctuation as tokens, as Python's split() does, or treat each punctuation mark as a single..
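The two levels of tokenization described above can be sketched in a few lines of Python. This is a naive illustration rather than a production tokenizer: the regular expressions are my own assumption (a period followed by whitespace ends a sentence), and real text with abbreviations or decimals would need a proper library.

```python
import re

paragraph = (
    "An abundance of feeling with no known cause, he finally filled in the period. "
    "And alone he empties out the stories he never finished. "
    "But the protagonist of his story is still her. "
    "I decided to call this romance."
)

# Sentence-level: split on the whitespace that follows a period (naive rule).
sentences = [s for s in re.split(r"(?<=\.)\s+", paragraph) if s]

# Word-level, variant 1: whitespace only, punctuation stays glued to words.
words_by_space = sentences[0].split()

# Word-level, variant 2: every punctuation mark becomes its own token.
words_and_punct = re.findall(r"\w+|[^\w\s]", sentences[0])

print(len(sentences))    # number of sentence tokens
print(words_by_space)    # e.g. 'cause,' and 'period.' keep their punctuation
print(words_and_punct)   # ',' and '.' appear as separate tokens
```

With this paragraph the sentence split yields four tokens, matching the count stated in the post.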

[NLP ๊ธฐ์ดˆ] BoW(Bag of Words)

Concept

A method that counts how many times each word in a sentence occurs and vectorizes the document based on those counts.

Example

The BoW model vectorizes text by referring to a vocabulary. Suppose we have a document made up of the four sentences below, and let us represent it with the BoW model.

◽ Document : ["It was the best of times", "It was the worst of times", "It was the age of wisdom", "It was the age of foolishness"]
◽ Vocabulary built from the document : ['It', 'was', 'the', 'best', 'of', 'times', 'worst', 'age', 'wisdom', 'foolishness']
◽ Vector representation of the first sentence (the rest ..
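A minimal sketch of the example above in plain Python. The helper name `bow_vector` is hypothetical, and the vocabulary is built in order of first appearance so that it matches the list shown in the post.

```python
docs = [
    "It was the best of times",
    "It was the worst of times",
    "It was the age of wisdom",
    "It was the age of foolishness",
]

# Build the vocabulary in order of first appearance (case-sensitive).
vocab = []
for sentence in docs:
    for word in sentence.split():
        if word not in vocab:
            vocab.append(word)

def bow_vector(sentence, vocab):
    """Count how often each vocabulary word occurs in the sentence."""
    words = sentence.split()
    return [words.count(term) for term in vocab]

vectors = [bow_vector(s, vocab) for s in docs]

print(vocab)
print(vectors[0])  # [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
```

Each vector has one slot per vocabulary word, so all four sentences map to vectors of length 10 regardless of how many words they contain.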

[Notes] Numpy ② : squeeze

    b = np.array(range(1, 13, 2)).reshape(2, 3, 1)  # my guess: 2 rows, 3 columns

This time I practiced with a multidimensional array, and it took me a long time to understand the array that came out. I thought the reshape arguments were (rows, columns, depth), in that order? They were not! A numpy array is printed without commas between elements, unlike arrays in other languages, so the result did not click the moment I saw it.

With three arguments, reshape builds a multidimensional array of shape (depth, rows, columns).
→ reshape(2, 3, 1) means two arrays of 3 rows and 1 column stacked together

squeeze

    # axis default: None; a specific axis can be given
    b_squeeze = b.squeeze()

Removes the axes of length 1 from the array. (2, 3, ..
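To see the effect described above, here is a short sketch that reuses the array `b` from the post, assuming NumPy is imported as `np`. It checks the shape before and after `squeeze`, and shows the `axis` argument.

```python
import numpy as np

# Six odd numbers arranged as 2 blocks of 3 rows x 1 column.
b = np.array(range(1, 13, 2)).reshape(2, 3, 1)
print(b.shape)                  # (2, 3, 1)

# squeeze() with the default axis=None drops every axis of length 1.
b_squeeze = b.squeeze()
print(b_squeeze.shape)          # (2, 3)

# Passing axis removes only that axis; it must have length 1.
print(b.squeeze(axis=2).shape)  # (2, 3)

# Asking to squeeze an axis whose length is not 1 raises ValueError.
try:
    b.squeeze(axis=0)
except ValueError as err:
    print("cannot squeeze axis 0:", err)
```

The values are unchanged by `squeeze`; only the length-1 axis disappears.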

[Notes] Numpy ① : shape, ndim, axis

    a = np.array([0, 1, 2, 3, 4, 5])

I create an arbitrary array for practice and try a few things with it.

shape/ndim/size

    print("shape: ", a.shape)  # my guess: (6, 1); actual: (6,)
    print("ndim: ", a.ndim)    # my guess: 2; actual: 1
    print("size: ", a.size)    # my guess: 6; actual: 6

◻ shape : the length of each axis, as a tuple → here (6,), a single axis of length 6, not (6, 1) with rows and columns flipped
◻ ndim : the number of dimensions (axes) of the array
◻ size : the number of elements in the array

The thing to note is that no 1 appears in the shape: a 1-D array has no second axis at all, so (6,) and (6, 1) are different shapes. I still find the concept of dimension really hard ㅠㅠ I was treating it like tensor rank and assumed a scalar was 1-D and an array therefore 2-D, but the array turned out to be 1-D (a scalar is 0-D). It is just an array in Python ..
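A quick check of the three attributes, assuming NumPy is imported as `np`. The `col` and `row` arrays are my own additions, included only to contrast a 1-D array with explicit 2-D column and row layouts.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5])
print(a.shape, a.ndim, a.size)        # (6,) 1 6

# Explicit 2-D layouts have a second axis, so ndim becomes 2.
col = a.reshape(6, 1)   # 6 rows, 1 column
row = a.reshape(1, 6)   # 1 row, 6 columns
print(col.shape, col.ndim)            # (6, 1) 2
print(row.shape, row.ndim)            # (1, 6) 2

# A scalar wrapped in an array has no axes at all.
s = np.array(7)
print(s.shape, s.ndim)                # () 0
```

In every case `ndim` equals `len(shape)`, which is a convenient way to keep the two ideas straight.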

[Transformer Series] 01. Positional Encoding

Why it is used
- Unlike in an RNN, the input does not arrive in order, so Positional Encoding is used to let the model determine the position of each word in the sentence
- Adding the generated, unique Positional Encoding to the word embedding vector lets the model determine a word's absolute position

How it works
- The N-th Positional Encoding is added to the N-th word embedding of each sentence.
- The paper's authors use the sin and cos functions → pos: position of the embedding vector within the sentence, i: position within the embedding vector
- Because the Positional Encoding has to be added to the word embedding vector, $d_{\text{positional encoding}} = d_{\text{embedding vector}}$

🧐 The sin function, the cos func..
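The points above can be sketched with NumPy. The formula used is the sinusoidal one from "Attention Is All You Need": even dimensions take sin(pos / 10000^(2i/d_model)) and odd dimensions take cos of the same angle. The function name, the sizes, and the zero embeddings are placeholders of my own, not something from the post.

```python
import numpy as np

def positional_encoding(max_len, d_model, base=10000.0):
    """Return a (max_len, d_model) matrix of sinusoidal position codes."""
    pos = np.arange(max_len)[:, np.newaxis]   # position in the sentence
    dim = np.arange(d_model)[np.newaxis, :]   # position in the embedding vector
    # Dimensions 2i and 2i+1 share the same frequency.
    angle = pos / np.power(base, (2 * (dim // 2)) / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])      # even dimensions
    pe[:, 1::2] = np.cos(angle[:, 1::2])      # odd dimensions
    return pe

# The encoding has the same width as the embeddings, so the two can be added.
seq_len, d_model = 5, 8
embeddings = np.zeros((seq_len, d_model))     # placeholder word embeddings
x = embeddings + positional_encoding(seq_len, d_model)
print(x.shape)                                # (5, 8)
```

Row N of the matrix is what gets added to the N-th word embedding, so every sentence shares the same codes for the same positions.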

Sequence to Sequence Learning with Neural Networks

There are a great many papers and concepts to go through in order to understand the Transformer properly. I plan to work through them one by one and then revisit the Transformer.

💬 Comments on the paper or on this post, and corrections of typos, are welcome. Feel free to leave a comment!

Original paper : https://arxiv.org/pdf/1409.3215.pdf

Abstract
- DNNs have achieved excellent results on difficult learning tasks such as speech recognition, but because they work with fixed dimensionality they were not suited to problems involving sequences (sentences) whose input and output lengths differ.
- This paper proposes using multilayer LSTMs as an encoder and decoder to output a variable-length sequence that corresponds to the meaning of the input sequence.
- When the word order of the input sequence is reversed(..

[Notes] Splitting a dataset with train_test_split

While practicing Bagging I once ran into an error message caused by the order of the dataset split (a whole two months ago), and I am only now writing it up.

    from sklearn.model_selection import train_test_split
    from sklearn.datasets import load_breast_cancer
    import numpy as np

I loaded the same data I used two months ago and imported only the modules I needed. I confirmed that the Wisconsin breast cancer diagnostic dataset contains 569 samples in total.

    X_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target)

I split the dataset with train_test_split, and the importance of the order..
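A runnable sketch of the split above, assuming scikit-learn is installed. The line `dataset = load_breast_cancer()` is my assumption, since the excerpt does not show how `dataset` is defined, and `random_state=0` is added only to make the split reproducible.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Assumption: the post's `dataset` is the object returned by load_breast_cancer().
dataset = load_breast_cancer()
print(dataset.data.shape)   # (569, 30)

# train_test_split returns, for each input array in turn, its train part
# followed by its test part: X_train, X_test, then y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, random_state=0
)

print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

Unpacking in a different order, for example `X_train, y_train, X_test, y_test`, would pair feature rows with the wrong labels and typically surfaces later as a shape mismatch when fitting a model.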

[Concept] Sparse representation / Dense representation

Sparse representation
- When a sentence is represented as a vector, this uses the idea of a sparse matrix, in which most values are 0 → the index of the word being represented is set to 1 and every other index to 0
- The drawback is that the dimension grows along with the number of words.
e.g.) The example on the left has 3 words to represent and is therefore 3-dimensional, but the one on the right has more than 100 words and therefore exceeds 100 dimensions, so sparse representation is inefficient when a long sentence has to be represented as a vector.
⭐ For the question "is the number of elements the dimension?", the linked page should be a helpful reference. (To be honest, I have not fully got the concept of dimension straight myself.)

Dense representation
- The user sets the dimension regardless of the number of words, which has the advantage of reducing dimensionality.
- ..
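The contrast above can be sketched with NumPy. The vocabularies, the helper `one_hot`, and the random embedding table are all hypothetical stand-ins; in practice the dense vectors would be learned rather than drawn at random.

```python
import numpy as np

def one_hot(word, vocab):
    """Sparse representation: 1 at the word's index, 0 everywhere else."""
    vec = np.zeros(len(vocab), dtype=int)
    vec[vocab.index(word)] = 1
    return vec

small_vocab = ["apple", "banana", "cherry"]       # 3 words -> 3 dimensions
big_vocab = [f"word{i}" for i in range(100)]      # 100 words -> 100 dimensions

sparse_small = one_hot("banana", small_vocab)
sparse_big = one_hot("word42", big_vocab)
print(sparse_small, sparse_big.shape)

# Dense representation: the dimension is chosen by the user and does not
# depend on the vocabulary size (random values stand in for learned ones).
embedding_dim = 4
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(big_vocab), embedding_dim))
dense = embedding_table[big_vocab.index("word42")]
print(dense.shape)                                # (4,)
```

The one-hot vector grows with every word added to the vocabulary, while the dense vector stays at the chosen `embedding_dim`.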

Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

When I was studying AI security in the lab, I seem to have studied only attacks, so I became curious about defense techniques. This one won the pick for this week's paper 👊

Original paper : https://arxiv.org/pdf/1704.01155.pdf

Abstract
Previous work focused on improving the DNN (Deep Neural Network) model (which requires modifying the model itself) to defend against adversarial examples, but it has the drawbacks of limited success and high computational cost → the paper presents Feature Squeezing, an approach that can harden DNN models by detecting adversarial examples

Introduction
- The classifier advers..

Attention Is All You Need

Starting this week, I plan to read one paper a week. I can manage it, right? ^_^

Original paper : https://arxiv.org/pdf/1706.03762.pdf

Abstract
The dominant sequence transduction models have complex RNN/CNN structures → the paper proposes the Transformer, a new and simple architecture based solely on the attention mechanism

Added 2022. 3. 4
Transformer summary : a model that is easy to train and parallelize and that uses the attention structure to increase speed

Introduction
The attention mechanism makes modeling possible regardless of the distance between input and output, and in that respect..
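The excerpt stops before the details, so as a rough illustration of what "attention" computes, here is a NumPy sketch of the scaled dot-product attention defined in the paper, softmax(QK^T / sqrt(d_k)) V. The function name, shapes, and random inputs are my own assumptions; this is a single head with no masking, not the full Transformer layer.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Single-head attention: softmax(q k^T / sqrt(d_k)) v."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                        # (n_queries, n_keys)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))   # 4 query positions
k = rng.normal(size=(6, 8))   # 6 key positions
v = rng.normal(size=(6, 8))   # one value per key

out, weights = scaled_dot_product_attention(q, k, v)
print(out.shape, weights.shape)   # (4, 8) (4, 6)
```

Every query is scored against every key in a single matrix product, which is why the distance between two positions does not affect how directly they can interact.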