
Adversarial Examples in the Physical World

geum 2022. 1. 27. 16:55

Honestly this is closer to a literal translation than an analysis, but I am writing it to organize the content and, with the help of collective intelligence, to understand the parts I did not know well.

 

💬 Thoughts on the paper or on this post, and typo corrections, are all welcome. Feel free to leave a comment!


Original paper: https://arxiv.org/abs/1607.02533

 

Abstract

◾ This paper shows that machine learning systems are vulnerable to adversarial examples even in the physical world

 

 

Introduction

◾ Machine learning models are vulnerable to adversarially manipulated inputs crafted to cause misclassification; in particular, they are highly vulnerable to receiving slightly modified inputs at test time

 

◾ Given a machine learning model M and an input sample C (a clean, unmodified sample), it is possible to generate an adversarial example A that is indistinguishable from C but that the model fails to classify correctly

◾ The transferability property of adversarial examples means that an attack is possible even without access to the target model

 

🔍 Transferability property

An adversarial example crafted to make model M1 misclassify also causes misclassification in model M2

(**I understood this to mean that a single adversarial example can be used to attack multiple models.)

 

◾ Whether adversarial examples would still be misclassified when they are created in the real world and captured through a camera had not been established by prior work (as of the paper's publication)

▪ The authors ask: can adversarial examples be crafted, and adversarial attacks carried out, against machine learning systems that operate in the physical world and perceive data through various sensors?

 

◾ To see how well the properties of adversarial examples hold up in the physical world, the authors experiment with a pre-trained ImageNet classifier: adversarial examples captured with a cell-phone camera are fed into the classifier and the classification accuracy is measured

▪ Confirmed that the model also misclassifies adversarial examples perceived through a camera

 

 

Method of Generating Adversarial Images

Notation

◾ Clipping equation
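For reference, the paper's clipping operation keeps a modified image X' inside the ε-neighborhood of the source image X while staying in the valid pixel range: Clip_{X,ε}{X'}(x, y, z) = min{255, X(x, y, z) + ε, max{0, X(x, y, z) − ε, X'(x, y, z)}}. Below is a minimal NumPy sketch of that formula, assuming pixel values are stored as floats in [0, 255]. The function name `clip_eps` is my own and does not come from the paper.

```python
import numpy as np

def clip_eps(x_src, x_new, eps):
    """Per-pixel clipping: keep x_new within eps of x_src and within [0, 255].

    x_src: source (clean) image, x_new: modified image, eps: max perturbation.
    The name and signature are illustrative assumptions, not the authors' code.
    """
    lower = np.maximum(0.0, x_src - eps)    # max{0, X - eps}
    upper = np.minimum(255.0, x_src + eps)  # min{255, X + eps}
    return np.minimum(upper, np.maximum(lower, x_new))
```

For example, with eps = 8 a source pixel of 250 that was pushed to 300 comes back as 255, and a source pixel of 100 that was moved to 103 is left unchanged.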

 

Comparison of Methods of Generating Adversarial Examples

◾ Because there is no guarantee that an adversarial example will actually be misclassified, the authors run an experimental comparison to understand the actual classification accuracy on the generated images and the types of perturbation produced by each method (fast method, basic iterative method, iterative least-likely class method)

▪ Uses a pre-trained Inception v3 classifier and the 50,000 validation samples of the ImageNet dataset
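To make the three methods concrete, here is a rough sketch of how I understand them: the fast method takes one step of size ε along the sign of the loss gradient, the basic iterative method repeats small steps of size α with clipping, and the iterative least-likely class method steps against the gradient of the loss for the least-likely class. Everything about the setup is an assumption for illustration: the "model" is a toy linear softmax classifier with weights `W` (not Inception v3), and the function names, `alpha`, and `steps` defaults are my own choices.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss_grad(W, x, y):
    """Gradient of the cross-entropy loss w.r.t. x for the toy model softmax(W @ x)."""
    p = softmax(W @ x)
    p[y] -= 1.0
    return W.T @ p

def clip_eps(x_src, x_new, eps):
    """Keep x_new within eps of x_src and within the pixel range [0, 255]."""
    return np.clip(x_new, np.maximum(0.0, x_src - eps), np.minimum(255.0, x_src + eps))

def fast_method(W, x, y_true, eps):
    # One step of size eps; the final clip to [0, 255] is my addition.
    return np.clip(x + eps * np.sign(loss_grad(W, x, y_true)), 0.0, 255.0)

def basic_iterative(W, x, y_true, eps, alpha=1.0, steps=10):
    x_adv = x.copy()
    for _ in range(steps):
        step = alpha * np.sign(loss_grad(W, x_adv, y_true))
        x_adv = clip_eps(x, x_adv + step, eps)  # increase loss for the true class
    return x_adv

def least_likely_iterative(W, x, eps, alpha=1.0, steps=10):
    y_ll = int(np.argmin(softmax(W @ x)))       # least-likely class for the clean input
    x_adv = x.copy()
    for _ in range(steps):
        step = alpha * np.sign(loss_grad(W, x_adv, y_ll))
        x_adv = clip_eps(x, x_adv - step, eps)  # decrease loss for the least-likely class
    return x_adv
```

In all three cases the result stays within ε of the original input per pixel, which is what makes the perturbation hard to notice.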

 

 

Photos of Adversarial Examples

Destruction Rate of Adversarial Images

◾ To study how arbitrary transformations affect adversarial images, the authors introduce the notion of destruction rate

 

๐Ÿ” Destruction rate

The fraction of adversarial images that are no longer misclassified after the transformation

 

The destruction rate as defined by the authors
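As I read the definition, only "successful" adversarial images count: those where the clean image was classified correctly and its adversarial version was not. The destruction rate is the share of those whose transformed adversarial version is classified correctly again. A small sketch under that reading, where the three inputs are per-image correctness flags (the function and argument names are my own):

```python
def destruction_rate(clean_correct, adv_correct, transformed_adv_correct):
    """Fraction of successful adversarial images that a transformation 'destroys'.

    Each argument is a sequence of booleans, one per image k:
      clean_correct[k]            -> clean image classified correctly
      adv_correct[k]              -> adversarial image classified correctly
      transformed_adv_correct[k]  -> transformed adversarial image classified correctly
    """
    destroyed = 0
    successful = 0
    for c, a, t in zip(clean_correct, adv_correct, transformed_adv_correct):
        if c and not a:       # clean was right, adversarial fooled the model
            successful += 1
            if t:             # after the transformation the model is right again
                destroyed += 1
    if successful == 0:
        raise ValueError("no successful adversarial images to evaluate")
    return destroyed / successful
```

With four images where only the first two are successful adversarial examples and only the first of those is classified correctly after the transformation, the rate is 1/2.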

 

Experimental Setup

◾ Print the clean and adversarial images, photograph the printed pages, then crop the images out of the photos

▪ The authors call this the photo transformation (it can be thought of as a black-box transformation)

 

◾ Compute the accuracy on clean and adversarial images before and after the photo transformation, along with the destruction rate of the adversarial images subjected to the photo transformation

 

Demonstration of Black Box Adversarial Attack in the Physical World

◾ The experiments above assume that the adversary has access to the model, but in the real world a black-box scenario is more realistic

▪ Thanks to the transferability property, adversarial examples can be used in black-box attacks; the authors confirm that printed adversarial examples also fool the open-source TensorFlow camera demo

 

 

Artificial Image Transformations

◾ To understand destruction rates under artificial image transformations, the authors apply contrast/brightness changes, Gaussian blur, Gaussian noise, and JPEG encoding

(**Printing the image, photographing it, and cropping it can be viewed as simple transformations, so I think this means they applied contrast/brightness changes plus a few others.)
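To get a feel for what these artificial transformations look like in code, here is a small NumPy sketch of three of them. The parameterizations are my own simple assumptions (additive brightness, multiplicative contrast, additive zero-mean Gaussian noise), not necessarily the exact ones used in the paper, and Gaussian blur and JPEG encoding are left out because they would need an image library.

```python
import numpy as np

def change_brightness(img, delta):
    """Shift every pixel by delta, then clip to the valid range [0, 255]."""
    return np.clip(img + delta, 0.0, 255.0)

def change_contrast(img, factor):
    """Scale every pixel by factor, then clip to the valid range [0, 255]."""
    return np.clip(img * factor, 0.0, 255.0)

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma, then clip."""
    noise = rng.normal(0.0, sigma, size=img.shape)
    return np.clip(img + noise, 0.0, 255.0)
```

Each function could play the role of the transformation when computing a destruction rate: apply it to the adversarial images, classify the results, and check how many are classified correctly again.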

 

 

Conclusion

I'm skipping this one!

 

 

 

Looking back over it, this write-up is really rough. Hoping I can write the next paper summary more cleanly 🤍