Fooling an Image Classifier
In the dimly lit corridors of the ancient dungeon, where shadows dance and secrets lie in wait, an eerie silence is suddenly shattered by the faint creaking of wooden planks. Unbeknownst to the adventurers, a malevolent presence lurks among the mundane, adopting the guise of an innocuous chest or treasure trove. Beware the mimic, a shape-shifting aberration that hungers for the thrill of deception and the taste of unsuspecting intruders.
The Quest
Craft a D&D mimic: fool a CNN image classifier into thinking that a picture of a teapot is a chihuahua.
Image Classifier
Our target will be ResNet-50 pre-trained on ImageNet-1k.
Pull a pre-trained model from Hugging Face:
from transformers import AutoImageProcessor, ResNetForImageClassification

processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")
model = ResNetForImageClassification.from_pretrained("microsoft/resnet-50")
For this post we'll disable normalization in the image processor, because normalized images look wonky to humans:
img_normalized = processor(image, return_tensors="pt")
img_not_normalized = processor(image, return_tensors="pt", do_normalize=False)
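The snippets below also call a small classify() helper that isn't shown in this post. Here is a minimal sketch of what it could look like, assuming it returns the predicted class index, its label, the softmax confidence, and the raw logits (the helper in the repo may differ):
import torch

def classify(pixel_values):
    # hypothetical sketch of the classify() helper used below:
    # returns (predicted index, label, softmax confidence, raw logits)
    with torch.no_grad():
        logits = model(pixel_values).logits
    probs = logits.softmax(-1)
    idx = logits.argmax(-1).item()
    return idx, model.config.id2label[idx], probs[0, idx].item(), logits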
Modify a picture by hand
Just to have a point of comparison, let's see what it takes before the teapot stops being recognized by ResNet, either by adding noise to the image or by masking the sides of the image.
import copy
import PIL.Image

do_normalize = False  # keep the pixels human-viewable (see above)
original_teapot = PIL.Image.open("teapot.jpg")
teapot = processor(original_teapot, return_tensors="pt", do_normalize=do_normalize)['pixel_values']
id, _, _, _ = classify(teapot)

# add random noise until it stops being a teapot
noisy_teapot = copy.deepcopy(teapot)
while True:
    noise = torch.randn_like(teapot)
    noisy_teapot += noise
    noisy_teapot.clamp_(0, 1)  # keep the image in the [0, 1] range
    nid, _, _, _ = classify(noisy_teapot)
    if nid != id: break

# mask sides of the picture until it stops being a teapot
masked_teapot = copy.deepcopy(teapot)
for i in range(0, 224, 10):
    masked_teapot[0, :, :, :i] = 0  # mask a vertical strip on the left
    masked_teapot[0, :, :i, :] = 0  # mask a horizontal strip on the top
    nid, _, _, _ = classify(masked_teapot)
    if nid != id: break
Modify using gradient
Let's use the network itself to tell us how to fool it.
Compute the gradient of the loss on the teapot image with respect to the desired label (e.g. "chihuahua"), and update the image with that gradient as noise until it gets classified as the desired category.
To get the gradient, I add a zero-filled tensor of the same size as the input image and look at the gradient of that tensor. By construction it has the same gradient as the pixels of the image would.
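The function below relies on a few globals that aren't shown here: the target label index and some loop settings. A sketch of setting them up (the values are illustrative, not the post's exact choices):
# illustrative settings; the post's actual values may differ
target_idx = next(i for i, label in model.config.id2label.items() if "chihuahua" in label.lower())
epochs = 1000
learning_rate = 0.1
log_every = 100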
import torch.nn.functional as F
from tqdm import tqdm

def morph(teapot, target_idx=target_idx):
    # trick to get a gradient on the image: add a zero tensor that requires grad
    gradient_noise = torch.zeros_like(teapot, requires_grad=True)
    malicious_teapot = copy.deepcopy(teapot) + gradient_noise
    for epoch in tqdm(range(epochs)):
        # forward pass
        logits = model(malicious_teapot).logits
        predicted_idx = logits.argmax(-1).item()
        # short-circuit as soon as we match the target
        if predicted_idx == target_idx:
            print(f'Predicted {model.config.id2label[predicted_idx]} in {epoch} epochs')
            break
        gradient_noise.grad = None
        loss = F.cross_entropy(logits, torch.tensor([target_idx]))
        loss.backward()
        # update the image with the gradient as noise
        new_noise = gradient_noise.grad * learning_rate
        malicious_teapot -= new_noise
        # logging
        if epoch % log_every == 0:
            print(f"{epoch: 4} {loss=}")
    return malicious_teapot
malicious_teapot = morph(teapot)
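To double-check the new prediction and turn the tensor back into something viewable (a quick sketch; to_pil_image from torchvision is my choice here, not necessarily the post's):
from torchvision.transforms.functional import to_pil_image

nid, label, prob, _ = classify(malicious_teapot)
print(f"now classified as {label} ({prob:.1%})")
to_pil_image(malicious_teapot.detach().squeeze(0).clamp(0, 1)).save("mimic.png")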
Our majestic mimic:
Dreaming of Chihuahua
Now let's crank it up to 11: how would the image change if we made the network overfit the picture on the chihuahua label?
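One way to do this (a sketch of the idea rather than the post's exact code; the dream() name is mine) is to drop the early exit from morph() and keep applying the same gradient updates for many more epochs, long after the prediction has flipped:
def dream(teapot, target_idx=target_idx, epochs=10_000):
    # same trick as morph(), but never stop early: keep pushing the image
    # toward the chihuahua label well past the point where it first matches
    gradient_noise = torch.zeros_like(teapot, requires_grad=True)
    dreamy_teapot = copy.deepcopy(teapot) + gradient_noise
    for epoch in tqdm(range(epochs)):
        logits = model(dreamy_teapot).logits
        gradient_noise.grad = None
        loss = F.cross_entropy(logits, torch.tensor([target_idx]))
        loss.backward()
        dreamy_teapot -= gradient_noise.grad * learning_rate
    return dreamy_teapot

dreamy_teapot = dream(teapot)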
The code
You can get the code at https://github.com/peluche/mimic