Artificial Intelligence/Paper 8

Improving Language Understanding by Generative Pre-Training

💬 Opinions on the paper and this post, as well as typo reports, are welcome. Feel free to leave a comment! 💬
◾ marks content from the original paper; ◽ marks my own thoughts.
Original: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
Abstract
◾ In natural language understanding (NLU), unlabeled text is abundant, but labeled data for specific tasks (textual entailment, QA, semantic similarity assessment, etc.) is scarce.
◾ This scarcity of labeled data makes it hard for discriminatively trained models to perform adequately.
◾ On a diverse corpus of unlabeled text..

RoBERTa: A Robustly Optimized BERT Pretraining Approach

◽ marks my own thoughts; ◾ marks content from the original paper.
Original: https://arxiv.org/pdf/1907.11692.pdf
Abstract
◾ Through a replication study of BERT, the authors measure how training data size and key hyperparameters affect the results.
◾ They find that BERT was significantly undertrained and, when trained properly, can match or exceed the performance of every model published after it.
◾ The results highlight the importance of previously overlooked design choices.
◽ RoBERTa is not a new model architecture; it is BERT trained in the best way the authors found.
◽ 'undertra..

EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks

Original: https://aclanthology.org/D19-1670.pdf
1 Introduction
▪️ Machine learning and deep learning have achieved high accuracy on NLP tasks ranging from sentiment analysis to topic classification, but that performance often depends on the size and quality of the training data.
▪️ Automatic data augmentation is widely used in computer vision and speech, but because it is hard to come up with general rules for language transformation, universal data augmentation techniques for NLP have not been thoroughly explored.
▪️ The paper presents EDA (Easy Data Augmentation), a simple NLP data augmentation technique..
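The excerpt stops before the method itself. As I understand the paper, EDA consists of four operations: synonym replacement, random insertion, random swap, and random deletion. Below is a minimal sketch (not the authors' code) of the two operations that need no synonym dictionary. The function names, the `n`/`p` parameters, and the "keep one word if everything was deleted" fallback are my own choices; words are assumed to be pre-split on whitespace.

```python
import random
from typing import List


def random_swap(words: List[str], n: int, rng: random.Random) -> List[str]:
    """Pick two positions at random and swap the words there; repeat n times."""
    words = list(words)  # copy so the caller's list is not modified
    if len(words) < 2:
        return words  # nothing to swap
    for _ in range(n):
        i, j = rng.sample(range(len(words)), 2)  # two distinct positions
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each word independently with probability p."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    # Avoid returning an empty sentence: keep one random original word instead.
    return kept if kept else [rng.choice(words)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    sentence = "this movie was surprisingly fun to watch".split()
    print(" ".join(random_swap(sentence, n=2, rng=rng)))
    print(" ".join(random_deletion(sentence, p=0.2, rng=rng)))
```

Both operations keep the label of the original sentence unchanged, so each call yields an extra training example for the same class.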

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

It's been ages since I last read a paper. I tried to use a BERT-based pre-trained model, but since I didn't know any of the underlying concepts, there was too much I couldn't follow (what goes into the model input, how the data needs to be shaped, and so on), so I went straight to the paper itself.
Original: https://arxiv.org/pdf/1810.04805.pdf
■ : parts I don't fully understand yet
Introduction
1. There are two strategies for applying pre-trained language representations to downstream tasks
1) Feature-based
- Uses task-specific architectures that include the pre-trained representations as additional features
- e.g., ELMo
2) ..
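Since "what goes into the model input" was the part that confused me, here is a minimal sketch of the input layout the paper describes: every sequence starts with [CLS], sentences are separated (and terminated) by [SEP], and a segment ID marks whether each token belongs to sentence A or sentence B. Assumptions: the tokens are already split (a real BERT uses a WordPiece tokenizer and then maps tokens to vocabulary IDs, which this skips), and `build_bert_input` is a hypothetical helper, not a library function.

```python
from typing import List, Optional, Tuple


def build_bert_input(
    tokens_a: List[str], tokens_b: Optional[List[str]] = None
) -> Tuple[List[str], List[int]]:
    """Arrange already-tokenized text in BERT's input layout.

    Single sequence: [CLS] A [SEP]
    Sequence pair:   [CLS] A [SEP] B [SEP]
    Segment IDs: 0 for [CLS], A and the first [SEP]; 1 for B and the final [SEP].
    """
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b:
        tokens += list(tokens_b) + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


if __name__ == "__main__":
    tokens, segment_ids = build_bert_input("my dog is cute".split(), "he likes playing".split())
    print(tokens)       # ['[CLS]', 'my', 'dog', 'is', 'cute', '[SEP]', 'he', 'likes', 'playing', '[SEP]']
    print(segment_ids)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```

In practice a tokenizer library also converts tokens to IDs, pads to a fixed length, and builds an attention mask; those steps are left out to keep the layout itself visible.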

Sequence to Sequence Learning with Neural Networks

There are a lot of papers and concepts to go through before I can really understand the Transformer. I plan to work through them step by step and then revisit the Transformer.
Original: https://arxiv.org/pdf/1409.3215.pdf
Abstract
- DNNs have achieved excellent results on difficult learning tasks such as speech recognition, but because their inputs and outputs have fixed dimensionality, they are ill-suited to problems where the input and output sequences (sentences) differ in length.
- This paper proposes using multilayer LSTMs as an encoder-decoder: one LSTM maps the input sequence to a fixed-dimensional vector, and another deep LSTM decodes a variable-length output sequence from that vector.
- Reversing the word order of the input sequence (..
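The excerpt cuts off at the word-reversal trick. The paper's explanation, as I read it: reversing the source sentence (but not the target) leaves the average distance between corresponding source and target words unchanged, yet puts the first few source words right next to the first few target words, which shortens the "minimal time lag". A small sketch that counts those distances, assuming equal-length sentences with a one-to-one, in-order word alignment and ignoring the <EOS> step (a simplification; both helpers are hypothetical):

```python
from typing import List


def reverse_source(sentence: str) -> List[str]:
    """Reverse the word order of a source sentence (targets are left as-is)."""
    return sentence.split()[::-1]


def alignment_distances(length: int, reverse: bool) -> List[int]:
    """Time-step gap between source word i and target word i.

    The encoder reads `length` source words (steps 0 .. length-1); the decoder
    then emits target word i at step length + i. Assumes source word i aligns
    with target word i, and ignores the <EOS> step.
    """
    gaps = []
    for i in range(length):
        src_step = (length - 1 - i) if reverse else i
        gaps.append((length + i) - src_step)
    return gaps


if __name__ == "__main__":
    print(reverse_source("a b c"))                 # ['c', 'b', 'a']
    print(alignment_distances(4, reverse=False))  # [4, 4, 4, 4]
    print(alignment_distances(4, reverse=True))   # [1, 3, 5, 7]
```

Both distance lists average 4, but with reversal the first word pair is only one step apart, which is the short-range dependency the paper credits for easier optimization.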

Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

When I studied AI security in the lab, it feels like I only studied attacks, so I got curious about defenses. This week's paper pick 👊
Original: https://arxiv.org/pdf/1704.01155.pdf
Abstract
Previous work on defending against adversarial examples focused on improving the DNN (Deep Neural Network) model (which requires modifying the model itself), but these approaches had limited success and were computationally expensive → the paper proposes Feature Squeezing, which hardens DNN models by detecting adversarial examples
Introduction
- A classifier advers..
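The core detection idea, as I understand the paper: run the model on the original input and on one or more "squeezed" copies (e.g. reduced color bit depth, median smoothing), and flag the input as adversarial if the L1 distance between the prediction vectors exceeds a threshold; with several squeezers, the maximum distance is used. A minimal pure-Python sketch with only the bit-depth squeezer. `toy_model`, the inputs, and the 0.1 threshold are hypothetical stand-ins (the paper picks its threshold using legitimate examples), and the "borderline" input is simply one sitting near the decision boundary, not a real attack.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Sequence[float]], Vector]
Squeezer = Callable[[Sequence[float]], Vector]


def reduce_bit_depth(x: Sequence[float], bits: int) -> Vector:
    """Squeezer: round values in [0, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return [round(v * levels) / levels for v in x]


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))


def squeeze_score(model: Model, x: Sequence[float], squeezers: List[Squeezer]) -> float:
    """Largest L1 gap between the prediction on x and on any squeezed copy of x."""
    original = model(x)
    return max(l1_distance(original, model(squeeze(x))) for squeeze in squeezers)


def is_adversarial(model: Model, x: Sequence[float],
                   squeezers: List[Squeezer], threshold: float) -> bool:
    return squeeze_score(model, x, squeezers) > threshold


def toy_model(x: Sequence[float]) -> Vector:
    """Hypothetical 2-class classifier: class-1 probability is a sigmoid of the mean input."""
    z = 20.0 * (sum(x) / len(x) - 0.5)
    p1 = 1.0 / (1.0 + math.exp(-z))
    return [1.0 - p1, p1]


if __name__ == "__main__":
    squeezers = [lambda v: reduce_bit_depth(v, bits=1)]
    clean = [0.9, 0.8, 0.95, 0.85]
    borderline = [0.52, 0.52, 0.52, 0.52]  # stand-in for a perturbed input
    for name, x in [("clean", clean), ("borderline", borderline)]:
        score = squeeze_score(toy_model, x, squeezers)
        print(name, round(score, 4), is_adversarial(toy_model, x, squeezers, threshold=0.1))
```

The appeal of this design is that the model itself is untouched: detection only needs extra forward passes on squeezed inputs.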

Attention Is All You Need

Starting this week, I'm going to read one paper a week. I can do this, right? ^_^
Original: https://arxiv.org/pdf/1706.03762.pdf
Abstract
The dominant sequence transduction models rely on complex RNN/CNN architectures → the paper proposes the Transformer, a new, simple architecture based solely on attention mechanisms
(Added 2022. 3. 4) Transformer summary: a model that is easy to train and parallelize, and that gains speed by using attention
Introduction
Attention mechanisms allow dependencies to be modeled regardless of their distance in the input or output sequences, and in this respect..
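The excerpt ends before the model details, but the operation at the heart of the Transformer is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / √d_k) V. Because every query is compared against every key in a single step, the path between any two positions has the same length no matter how far apart they are, which is the point about distance in the Introduction. A minimal single-head sketch in plain Python; masking, multi-head projections, and batching are all omitted for brevity.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """softmax(Q K^T / sqrt(d_k)) V for one head; returns (output, attention weights)."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # each query row compared with every key row
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[10.0, 0.0], [0.0, 10.0]]
    out, weights = scaled_dot_product_attention(q, k, v)
    print(weights)  # the query matches the first key more, so it gets more weight
    print(out)
```

Each output row is a weighted average of the value rows, with weights that sum to 1; dividing by √d_k keeps the dot products from growing with the key dimension and saturating the softmax.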

Adversarial Examples in the Physical World

Honestly, this is closer to a literal translation than an analysis, but I'm writing it to organize the content and, with the help of collective intelligence, to understand the parts I didn't get.
Original: https://arxiv.org/abs/1607.02533
Abstract
◾ This paper shows that machine learning systems are vulnerable to adversarial examples even in the physical world
Introduction
◾ Machine learning models are vulnerable to adversarial manipulations of their input intended to cause misclassification; in particular, they are highly vulnerable to inputs that have been slightly modified at test time
◾ Given a machine learning model M and an input sample C (a clean, unmodified sample)..
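The excerpt stops while setting up the notation (model M, clean sample C). One of the attack methods the paper uses is the "fast method" (fast gradient sign method): X^adv = X + ε · sign(∇_X J(X, y_true)). A minimal sketch, assuming a hypothetical logistic-regression model so that the input gradient of the cross-entropy loss has the closed form (p − y)·w. The weights, the input, ε = 0.2, and the clipping to [0, 1] (treating features like pixel intensities) are my assumptions, not values from the paper.

```python
import math
from typing import List, Sequence


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Hypothetical model: P(y = 1 | x) for logistic regression."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(w: Sequence[float], b: float, x: Sequence[float], y_true: int) -> List[float]:
    """Gradient of the binary cross-entropy loss w.r.t. the input x: (p - y) * w."""
    p = predict_proba(w, b, x)
    return [(p - y_true) * wi for wi in w]


def sign(g: float) -> int:
    return (g > 0) - (g < 0)


def fast_gradient_sign(w: Sequence[float], b: float, x: Sequence[float],
                       y_true: int, eps: float) -> List[float]:
    """X_adv = X + eps * sign(grad_X J(X, y_true)), clipped to [0, 1] (assumed valid range)."""
    grad = input_gradient(w, b, x, y_true)
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]


if __name__ == "__main__":
    w, b = [2.0, -3.0, 1.5], -0.2   # hypothetical trained weights
    clean = [0.6, 0.2, 0.5]          # plays the role of the clean sample C
    adv = fast_gradient_sign(w, b, clean, y_true=1, eps=0.2)
    print(round(predict_proba(w, b, clean), 3), round(predict_proba(w, b, adv), 3))
```

Every feature moves by at most ε, yet the predicted class flips, which is the kind of small, deliberate perturbation the paper then tests after printing and photographing the images.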