
[ART] attack_defence_imagenet.ipynb Code Walkthrough

geum 2022. 1. 18. 13:37

 

Rather than just running the original notebook end to end, I worked through it while changing the attack/defence code a little at a time.

 

✅ Original code:

https://github.com/Trusted-AI/adversarial-robustness-toolbox/blob/main/notebooks/attack_defence_imagenet.ipynb

 


 

Load Images

import numpy as np
import imagenet_stubs
from tensorflow.keras.preprocessing import image

image_list = []

# Get the index and image path from imagenet_stubs.get_image_paths()
for i, image_path in enumerate(imagenet_stubs.get_image_paths()):
    img = image.load_img(image_path, target_size=(224, 224))
    img = image.img_to_array(img)

    image_list.append(img)

    # Remember the index of the koala image for later
    if 'koala.jpg' in image_path:
        koala_idx = i

imgs = np.array(image_list)

 

์ฒ˜์Œ์— if 'koala.jpg' in image_path: ๋ถ€๋ถ„์—์„œ ImageNet์— ์žˆ๊ฒ ์ง€ ์‹ถ์€ ์‚ฌ์ง„ ์ด๋ฆ„ ์•„๋ฌด๊ฑฐ๋‚˜ ๋„ฃ์—ˆ๋‹ค. 'car.jpg' ๋„ฃ์—ˆ๋Š”๋ฐ car_idx = i ๋ถ€๋ถ„์—์„œ ์—๋Ÿฌ๊ฐ€ ๋‚˜๊ธธ๋ž˜ imagenet_stubs.get_image_paths()๋ฅผ ์ฐพ์•„๋ณด์•˜๋‹ค.

 

It turns out you can't use just any photo: the file has to be one of the images in this folder: https://github.com/nottombrown/imagenet-stubs/tree/master/imagenet_stubs/images

 

์ด๋ฏธ์ง€ ์ธ๋ฑ์Šค๋„ ์ด ํด๋” ์•ˆ์˜ ์ˆœ์„œ๋ฅผ ๋”ฐ๋ผ๊ฐ

# koala_idx prints as 5
idx = koala_idx

 

You can confirm the image displays correctly.

 

๐Ÿจ

 

Load ResNet50 Classifier

model = ResNet50(weights='imagenet')

 

To use the ResNet50 architecture, load the ResNet50 built into Keras. The weights parameter can be set to one of three things: None (random initialization), 'imagenet' (pre-trained on ImageNet), or a path to a weights file.

 

# Add a batch dimension and preprocess the image for ResNet50
x = np.expand_dims(imgs[idx].copy(), axis=0)
x = preprocess_input(x)

pred = model.predict(x)
label = np.argmax(pred, axis=1)[0]
confidence = pred[:, label][0]

 

The concept of array dimensions is still so hard for me 😭

 

๋ชจ๋ธ์ด ์ž๊ธฐ๊ฐ€ ์˜ˆ์ธกํ•œ ๊ฐ’์— ๋Œ€ํ•ด ์ •๋‹ต์ด๋ผ๊ณ  ํ™•์‹ ํ•˜๊ณ  ์žˆ๊ณ  ์˜ˆ์ธก ๊ฒฐ๊ณผ๋„ ์ •ํ™•ํ•œ ๊ฒƒ ํ™•์ธ!

 

Create Adversarial Sample

# Create the attacker
adv = ProjectedGradientDescent(classifier, targeted=False, max_iter=10, eps_step=1, eps=5)

# Generate the adversarial sample
x_art_adv = adv.generate(x_art)

 

PGD is used to create the attacker and the adversarial example. targeted=False is set so we can look at the untargeted attack first.
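The update ProjectedGradientDescent iterates can be sketched in plain numpy. This is a toy version assuming an abstract grad_fn that supplies the loss gradient; ART's real implementation computes the classifier's loss gradients and handles batching and clipping for you:

```python
import numpy as np

def pgd_untargeted(x, grad_fn, eps=5.0, eps_step=1.0, max_iter=10):
    """Toy untargeted PGD: step along the gradient sign to increase the loss,
    then project back into the L-infinity eps-ball around the original x."""
    x_adv = x.astype(float).copy()
    for _ in range(max_iter):
        x_adv += eps_step * np.sign(grad_fn(x_adv))  # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)     # stay within eps of x
    return x_adv
```

With max_iter=10 and eps_step=1, the perturbation saturates at the eps=5 boundary wherever the gradient sign stays constant, which is why eps bounds how visible the attack can get.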

 

1. Untargeted Attack

plt.figure(figsize=(8, 8)); plt.imshow(x_art_adv[0]/255); plt.axis('off'); plt.show()

pred_adv = classifier.predict(x_art_adv)
label_adv = np.argmax(pred_adv, axis=1)[0]
confidence_adv = pred_adv[:, label_adv][0]

print(f'Prediction: {label_to_name(label_adv)} - confidence: {confidence_adv}')

 

 

To the naked eye it's still just a koala photo, but the model classified it as a whip-poor-will with fairly high confidence.

 

2. Targeted Attack

Targeted attack์€ ๋ถ„๋ฅ˜๊ธฐ๊ฐ€ ํŠน์ • ๋ ˆ์ด๋ธ”๋กœ ์˜ค๋ถ„๋ฅ˜ํ•˜๋„๋ก ์›ํ•˜๋Š” ๋ ˆ์ด๋ธ”์„ ์ง€์ •ํ•ด์ค€๋‹ค. ๋ ˆ์ด๋ธ”์ด ๋ชจ๋‘ 1000๊ฐœ์ด๊ธฐ ๋•Œ๋ฌธ์— ๊ทธ ์ค‘ ํ•˜๋‚˜๋ฅผ ์„ ํƒํ•ด ๊ณต๊ฒฉ์„ ์ง„ํ–‰ํ•œ๋‹ค.

 

# target label = label 84 peacock
target_label = 84

adv.set_params(targeted=True)

# adversarial sample ์ƒ์„ฑ
x_art_adv = adv.generate(x_art, y=to_categorical([target_label]))

plt.figure(figsize=(8, 8)); plt.imshow(x_art_adv[0]/255); plt.axis('off'); plt.show()

 

 

Checking the classifier's prediction on the adversarial sample confirms that it now predicts the target label.

 

 

Apply Defences

adversarial example์ด ๋“ค์–ด์™€๋„ ์ œ๋Œ€๋กœ ๋ถ„๋ฅ˜ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•ด๋‹น ๊ณต๊ฒฉ์— ๋Œ€ํ•œ ๋ฐฉ์–ด๋ฅผ ์ ์šฉํ•ด๋ณด๊ธฐ๋กœ ํ•œ๋‹ค. ์›๋ณธ ์ฝ”๋“œ์—์„œ ์‚ฌ์šฉํ•œ ๋ฐฉ์–ด ๋ฐฉ์‹์€ 'Spatial Smoothing'์ธ๋ฐ, ์•„์ง ๋ฐฉ์–ด์— ๋Œ€ํ•œ ๊ณต๋ถ€๋Š” ๋”ฐ๋กœ ์•ˆํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— spatial smoothing ๋ฐฉ์‹์„ ๊ทธ๋Œ€๋กœ ์‚ฌ์šฉํ–ˆ๋‹ค.

 

✅ Spatial Smoothing

◽ A method used in image processing to reduce noise

◽ Also called blurring

◽ Divided into local smoothing and non-local smoothing, depending on the extent of the region used to smooth each pixel

 

# ๊ฐ ํ”ฝ์…€ ์œ„๋กœ sliding-window๋ฅผ ์ง„ํ–‰ํ•˜๊ธฐ ๋•Œ๋ฌธ์— window_size ์ง€์ • ๊ฐ€๋Šฅ
ss = SpatialSmoothing(window_size=3)

# ์›๋ณธ ์ž…๋ ฅ๊ณผ adversarial sample์— defences ์ ์šฉ
x_art_def, _ = ss(x_art)
x_art_adv_def, _ = ss(x_art_adv)

# Compute the classifier predictions on the preprocessed inputs:
pred_def = classifier.predict(x_art_def)
label_def = np.argmax(pred_def, axis=1)[0]
confidence_def = pred_def[:, label_def][0]

 

When assigning ss(x_art) and ss(x_art_adv), the second element goes to _ rather than a named variable; when I deleted that part, I got an array size mismatch error. At first the trailing _ didn't seem to do much, but it turns out ART preprocessors return a (samples, labels) tuple when called, so the second element (the labels, which are None here since we didn't pass any) still has to be unpacked.
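A toy illustration of why the _ is needed: the preprocessor is a callable that returns a pair, so the call yields two values even when no labels were passed. The stub below stands in for SpatialSmoothing and is not ART code:

```python
def smoothing_stub(x, y=None):
    # Mimics ART's preprocessor __call__: returns processed samples AND labels
    return [float(v) for v in x], y

x_def, y_def = smoothing_stub([1, 2, 3])  # y_def is None; often discarded as _
print(x_def, y_def)  # [1.0, 2.0, 3.0] None
```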

 

pred_adv_def = classifier.predict(x_art_adv_def)
label_adv_def = np.argmax(pred_adv_def, axis=1)[0]
confidence_adv_def = pred_adv_def[:, label_adv_def][0]

# Print the predictions:
print('Prediction of original sample:', label_to_name(label_def), '- confidence {0:.2f}'.format(confidence_def))
print('Prediction of adversarial sample:', label_to_name(label_adv_def), 
      '- confidence {0:.2f}'.format(confidence_adv_def))

# Show the preprocessed adversarial sample:
plt.figure(figsize=(8,8)); plt.imshow(x_art_adv_def[0] / 255); plt.axis('off'); plt.show()

 

 

์ถœ๋ ฅ๋œ ์ด๋ฏธ์ง€๋Š” ์›๋ณธ ์ฝ”์•Œ๋ผ ์‚ฌ์ง„๊ณผ ๋น„๊ตํ•˜๋ฉด ํ™•์‹คํžˆ ๋ธ”๋Ÿฌ ํšจ๊ณผ๊ฐ€ ๋“ค์–ด๊ฐ”๊ณ  adversarial sample์— ๋Œ€ํ•œ ์˜ˆ์ธก ๊ฒฐ๊ณผ๋Š” original sample๊ณผ ๋™์ผํ•˜๋‹ค. perturbation์ด ์ถ”๊ฐ€๋œ ์ด๋ฏธ์ง€์— spatial smoothing๋ฅผ ์ ์šฉํ•ด์„œ perturbation์„ ์ƒ์‡„(๋ผ๋Š” ํ‘œํ˜„์ด ๋งž๋Š”์ง€ ๋ชจ๋ฅด๊ฒ ์ง€๋งŒ)ํ•ด์ฃผ๋Š” ๋ฐฉ์‹์œผ๋กœ ์ดํ•ดํ–ˆ๋‹ค.

 

Adaptive Whitebox Attack to Defeat Defences

์„ธ์ƒ์— ์™„๋ฒฝํ•œ ๋ฐฉ์–ด๋Š” ์—†๋‹ค๊ณ  ํ–ˆ๋‹ค. ๋ฐฉ์–ด๋ฒ•์ด ์žˆ์œผ๋ฉด ๋‹ค์‹œ ๊ทธ๊ฑธ ๊นจ๋Š” ๊ณต๊ฒฉ์ด ์žˆ๋Š” ๋ฒ•!

 

# Create a classifier that includes the defences
classifier_def = KerasClassifier(preprocessing=preprocessor, preprocessing_defences=[ss], clip_values=(0, 255), 
                                 model=model)

 

์œ„์ชฝ์—์„œ ์‚ฌ์šฉํ•œ classifier์™€ ์ด ๋ถ€๋ถ„์—์„œ ์‚ฌ์šฉํ•˜๋Š” classifier์˜ ์ฐจ์ด์ ์€ preprocessing_defences ํŒŒ๋ผ๋ฏธํ„ฐ์˜ ํฌํ•จ ์—ฌ๋ถ€์ด๋‹ค. ๋ถ„๋ฅ˜๊ธฐ์— ์ ์šฉ๋  preprocessing defence(s)๋ฅผ ์ง€์ •ํ•ด์ฃผ๋ฉด ๋œ๋‹ค.

 

# Create the attacker.
# Note: here we use a larger number of iterations to achieve the same level of confidence in the misclassification
adv_def = ProjectedGradientDescent(classifier_def, targeted=True, max_iter=40, eps_step=1, eps=5)

# Generate the adversarial sample:
x_art_adv_def = adv_def.generate(x_art, y=to_categorical([target_label]))

# Plot the adversarial sample (note: we swap color channels back to RGB order):
plt.figure(figsize=(8,8)); plt.imshow(x_art_adv_def[0] / 255); plt.axis('off'); plt.show()

# And apply the classifier to it:
pred_adv = classifier_def.predict(x_art_adv_def)
label_adv = np.argmax(pred_adv, axis=1)[0]
confidence_adv = pred_adv[:, label_adv][0]
print('Prediction:', label_to_name(label_adv), '- confidence {0:.2f}'.format(confidence_adv))

 

๊ณต๊ฒฉ์ž์™€ adversarial sample ์ƒ์„ฑ ๋ฐ ์˜ˆ์ธก ์ฝ”๋“œ๋Š” ์ด์ „์— ์‚ฌ์šฉํ•œ ์ฝ”๋“œ์™€ ํฌ๊ฒŒ ๋‹ค๋ฅด์ง€ ์•Š๋‹ค.

 

 

The prediction comes out as peacock again.