Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

geum 2022. 3. 3. 10:59

While studying AI security in the lab, I realized I had only been studying attacks, so I got curious about defense techniques. This one got picked as the paper of the week 👊

 

💬 Opinions on the paper or on this post, and corrections of typos, are welcome. Feel free to leave a comment!


Original paper : https://arxiv.org/pdf/1704.01155.pdf

 

Abstract

Previous work on defending against adversarial examples focused on improving the DNN (Deep Neural Network) model, which requires modifying the model itself, but these defenses have had limited success and are computationally expensive.

→ adversarial examples๋ฅผ ํƒ์ง€ํ•จ์œผ๋กœ์จ DNN ๋ชจ๋ธ์„ ๊ฐ•ํ™”ํ•  ์ˆ˜ ์žˆ๋Š” Feature Squeezing ๋ฐฉ์‹ ์ œ์‹œ

 

Introduction

- If a classifier can detect adversarial inputs, it can warn the user or take a fail-safe action, so detecting attack attempts is as important as predicting the correct result.

- The paper's approach differs from previous work in that it modifies the input samples but leaves the model itself unchanged.

- Feature squeezing removes unnecessary parts of the input space, which reduces the attacker's opportunities to construct adversarial examples.

- The key idea of feature squeezing is to compare the model's prediction on the original sample with its prediction on the squeezed sample.

 

[Figure: Feature squeezing framework]

 

Background

Defensive Techniques

 

1) Adversarial Training : Adversarial inputs that have been found, together with their corresponding ground-truth labels, are used for training. The drawbacks are that there is no guarantee the attacker will only use the attack methods seen during training (= attacks the model already knows), and that training cost increases.

 

2) Gradient Masking : ** There is a write-up on the cleverhans blog, but I couldn't follow it, so I'll come back and summarize this later.

 

3) Input Transformation : Transforms the input so that the model is less sensitive to small changes in it, which hardens the model.

** Needs more content

 

Feature Squeezing Methods

A. Color Depth

 

The authors hypothesize that reducing the bit depth can shrink the adversarial opportunity without hurting classifier accuracy. They focus on the two representations used in their test datasets: 8-bit greyscale and 24-bit color. A greyscale image provides 256 possible values (0~255) for each pixel. Extending the 8-bit scale to the three RGB channels gives a color image.

 

1) Squeezing Color Bits : People prefer a larger bit depth because it makes the displayed image look closer to a natural image, but in practice a large color depth is often unnecessary for interpreting an image (people have no trouble recognizing black-and-white images). The authors studied bit depth squeezing on the MNIST, CIFAR-10, and ImageNet datasets.

 

[Figure: Example of bit depth reduction]

 

◻ Greyscale Images(MNIST) : In the figure above, the leftmost image is the original (8-bit) image and the rightmost is the 1-bit monochrome image. The 1-bit feature space is only 1/128 the size of the 8-bit one (2 values per pixel instead of 256), yet the images are still easy to recognize.

 

◻ Color Images(CIFAR-10, ImageNet) : As with MNIST, color images remain recognizable as the bit depth decreases, but unlike the greyscale images, some loss becomes visible once the depth is reduced below 4 bits.

 

2) Implementation : This part covers which library was used and how the squeezer was implemented, so I'm skipping the summary!
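Since the implementation details are skipped here, the following is only a rough sketch of how bit depth reduction can be done, not the authors' code. It assumes pixel values are normalized to [0, 1]; the function name `reduce_bit_depth` is made up for illustration.

```python
def reduce_bit_depth(pixels, bits):
    """Squeeze pixel values in [0.0, 1.0] down to `bits` bits per channel."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    levels = 2 ** bits - 1  # highest integer level at the target depth
    # Scale to [0, levels], snap to the nearest integer level, scale back to [0, 1].
    return [int(p * levels + 0.5) / levels for p in pixels]


row = [0.0, 0.2, 0.49, 0.51, 0.8, 1.0]
print(reduce_bit_depth(row, 1))  # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(reduce_bit_depth(row, 3))  # every value snapped to one of 8 levels
```

At 1 bit every pixel collapses to 0 or 1, which corresponds to the monochrome MNIST images described above.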

 

B. Spatial Smoothing(=blur)

 

1) Local Smoothing : Smooths each pixel using its nearby pixels. Gaussian smoothing, mean smoothing, and median smoothing differ in how they weight the neighboring pixels, and median smoothing is particularly effective at mitigating adversarial examples generated by L0 attacks.

 

2) Non-local Smoothing : Uses a wider area than local smoothing. ** I need to look into this in more detail. I don't really understand it yet 😵
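The median smoothing mentioned under 1) Local Smoothing can be sketched in plain Python as below. This is an illustration only: the function name `median_smooth`, the 3x3 default window, and the choice to clamp coordinates at the image border are all assumptions, not details taken from the paper.

```python
from statistics import median


def median_smooth(image, size=3):
    """Apply a size x size sliding median to a 2-D greyscale image (list of rows)."""
    height, width = len(image), len(image[0])
    lo = -(size // 2)  # window offsets, e.g. -1..1 for size=3
    hi = lo + size
    out = []
    for r in range(height):
        new_row = []
        for c in range(width):
            window = []
            for dr in range(lo, hi):
                for dc in range(lo, hi):
                    # Clamp coordinates so border pixels reuse the nearest edge value.
                    rr = min(max(r + dr, 0), height - 1)
                    cc = min(max(c + dc, 0), width - 1)
                    window.append(image[rr][cc])
            new_row.append(median(window))
        out.append(new_row)
    return out


# A flat image with a single pixel pushed to the maximum, as an L0-style attack might do.
image = [[0.1] * 5 for _ in range(5)]
image[2][2] = 1.0
smoothed = median_smooth(image)
print(smoothed[2][2])  # 0.1
```

Because an L0 attack changes only a few pixels, each by a large amount, the altered pixel is an outlier in its window and the median simply discards it.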

 

C. Other Squeezing Methods

 

Although the paper does not use them, lossy compression and dimension reduction could also serve as feature squeezing methods.

 

Robustness

For feature squeezing to detect adversarial examples effectively, it must satisfy the two properties below. This part checks how well the different feature squeezing methods satisfy them.

 

① on adversarial examples, squeezing reverses the effect of the adversarial perturbation

② on legitimate examples, squeezing does not significantly affect the classifier's predictions

 

◻ Threat Model

- To evaluate robustness, the paper assumes a powerful adversary who has full access to the trained target model but cannot influence it. The adversary is also assumed to be unaware of feature squeezing and to use white-box attack techniques to find inputs that cause the model to misclassify.

- The robustness of a standalone feature squeezer is analyzed, but the authors do not propose using a standalone squeezer as a defense, because an attacker could take feature squeezing into account when attacking the DNN model (it reads as "not recommended").

 

◻ Target Models : Uses the MNIST, CIFAR-10, and ImageNet datasets, with a top-performing pre-trained model chosen for each dataset.

 

[Table: Summary of each model's prediction performance and architecture]

 

◻ Attacks : Feature squeezing was evaluated against 11 attack methods in total, and each targeted attack uses two kinds of target: the Next class, t = L+1 mod #classes, and the least-likely class (the class with the lowest predicted probability), t = argmin(y hat).

 

- t : target class

- L : index of the ground-truth class

- y hat : prediction vector of the input image
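The two target choices defined above can be written out as a small sketch. The function names are hypothetical, and the prediction vector is made-up example data.

```python
def next_class_target(true_label, num_classes):
    """Next class: t = (L + 1) mod #classes."""
    return (true_label + 1) % num_classes


def least_likely_target(prediction):
    """Least-likely class: index of the smallest entry in the prediction vector."""
    return min(range(len(prediction)), key=lambda i: prediction[i])


y_hat = [0.05, 0.70, 0.01, 0.24]  # made-up prediction vector for a 4-class model
print(next_class_target(1, len(y_hat)))  # 2
print(next_class_target(3, len(y_hat)))  # 0, wraps around past the last class
print(least_likely_target(y_hat))        # 2
```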

 

 

Detecting Adversarial Inputs

The Robustness part confirmed that feature squeezing yields accurate model predictions without a significant drop in accuracy on legitimate examples. The basic idea of the feature squeezing framework is to compare the model's prediction on the original sample with the same model's prediction on the sample after squeezing. The two predictions should be similar, so if the same model gives noticeably different predictions for the two samples, the input is likely to be adversarial.

 

A. Detection Method

 

The prediction vector produced by a DNN classifier represents a probability distribution over the possible classes for the input sample. Comparing the model's prediction on the original sample with its prediction on the squeezed sample therefore means comparing two probability distribution vectors, which can be done with the L1 norm, the L2 norm, K-L divergence, and so on.

 

I'm skipping the formulas because the notation is a bit of a pain to type out.
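As a stand-in for the omitted formulas, here is a minimal sketch of the three comparison measures named above and of a threshold check built on the L1 distance. The function names, the example prediction vectors, and the threshold value 0.5 are all assumptions for illustration; in practice the threshold would be chosen from data.

```python
import math


def l1_distance(p, q):
    """Sum of absolute differences between two prediction vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def l2_distance(p, q):
    """Euclidean distance between two prediction vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kl_divergence(p, q, eps=1e-12):
    """K-L divergence D(p || q); eps guards against division by zero."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def looks_adversarial(pred_original, pred_squeezed, threshold):
    """Flag the input when squeezing shifts the prediction by more than the threshold."""
    return l1_distance(pred_original, pred_squeezed) > threshold


THRESHOLD = 0.5  # hypothetical value, not taken from the paper

legit = ([0.90, 0.05, 0.05], [0.88, 0.07, 0.05])    # prediction barely moves
suspect = ([0.10, 0.85, 0.05], [0.80, 0.15, 0.05])  # prediction flips after squeezing

print(looks_adversarial(*legit, THRESHOLD))    # False
print(looks_adversarial(*suspect, THRESHOLD))  # True
```

For two probability vectors the L1 distance always lies between 0 and 2, which makes it easy to reason about a threshold.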

 

B. Experimental Setup

 

◻ Datasets : 2,000 examples from MNIST (1,000 legitimate and 1,000 adversarial) + 2,200 from CIFAR-10 + 1,800 from ImageNet

 

- Each detection dataset was randomly split into two groups, one used to train the detector and the other for validation.
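A random split like the one described above could look like the sketch below. The equal-halves ratio, the fixed seed, and the `(name, label)` sample format are assumptions made for the example, with placeholder names standing in for real images.

```python
import random


def split_detection_set(samples, seed=0):
    """Shuffle a copy of the samples and cut it into two halves: (train, validation)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Placeholder MNIST-sized detection set: label 0 = legitimate, 1 = adversarial.
samples = [(f"legit_{i}", 0) for i in range(1000)]
samples += [(f"adv_{i}", 1) for i in range(1000)]

train, validation = split_detection_set(samples)
print(len(train), len(validation))  # 1000 1000
```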

 

◻ Squeezers : First, each squeezing configuration is evaluated on how well it works against the adversarial examples generated by each attack method. Then the paper considers the scenario where the defender does not know which attack will be used and must pick a configuration that works well across a distribution of attacks.