In the dimly lit corridors of the ancient dungeon, where shadows dance and secrets lie in wait, an eerie silence is suddenly shattered by the faint creaking of wooden planks. Unbeknownst to the adventurers, a malevolent presence lurks among the mundane, adopting the guise of an innocuous chest or treasure trove. Beware the mimic, a shape-shifting aberration that hungers for the thrill of deception and the taste of unsuspecting intruders.
The Quest
Crafting a D&D mimic: fool a CNN image classifier into thinking that a picture of a teapot is a chihuahua.
Just to have a point of comparison, let’s see what it would take for the teapot to stop being recognized by ResNet, either by adding noise to the image or by masking the sides of the image.
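The snippets below rely on a processor, a model, and a small classify helper. Here is a minimal sketch of what that setup could look like; the checkpoint name, the extra values returned by classify, the target index, and the hyperparameters are assumptions chosen to match how they are used later.

import copy

import PIL.Image
import torch
import torch.nn.functional as F
from tqdm import tqdm
from transformers import AutoImageProcessor, AutoModelForImageClassification

checkpoint = "microsoft/resnet-50"  # assumed ImageNet-1k classifier
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(checkpoint)
model.eval()

do_normalize = False  # keep pixel values in [0, 1] so clamping stays meaningful

# assumed values for the gradient attack further down
target_idx = 151  # "Chihuahua" in the ImageNet-1k label set
epochs, learning_rate, log_every = 1_000, 0.1, 100

def classify(pixel_values):
    # return the predicted class index, its label, its probability, and the raw logits
    with torch.no_grad():
        logits = model(pixel_values).logits
    probs = logits.softmax(-1)
    idx = logits.argmax(-1).item()
    return idx, model.config.id2label[idx], probs[0, idx].item(), logits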
original_teapot = PIL.Image.open("teapot.jpg")
teapot = processor(original_teapot, return_tensors="pt", do_normalize=do_normalize)['pixel_values']
id, _, _, _ = classify(teapot)

# add random noise until it stops being a teapot
noisy_teapot = copy.deepcopy(teapot)
while True:
    noise = torch.randn_like(teapot)
    noisy_teapot += noise
    noisy_teapot.clamp_(0, 1)  # keep the image in the [0, 1] range
    nid, _, _, _ = classify(noisy_teapot)
    if nid != id:
        break

# mask sides of the picture until it stops being a teapot
masked_teapot = copy.deepcopy(teapot)
for i in range(0, 224, 10):
    masked_teapot[0, :, :, :i] = 0  # mask a vertical strip on the left
    masked_teapot[0, :, :i, :] = 0  # mask a horizontal strip on the top
    nid, _, _, _ = classify(masked_teapot)
    if nid != id:
        break
Modify using gradients
Let’s take advantage of the network itself to tell us how to fool it.
Compute the gradient of the loss between the model’s prediction for the teapot and the desired label (e.g. “chihuahua”), then update the image with that gradient as noise until it is classified as the desired category.
To get the gradient, I add a tensor of the same size as the image, filled with zeros, and look at the gradient of that tensor. By construction it will have the same gradient as the pixels of the image would.
def morph(teapot, target_idx=target_idx):
    # some trick to get a gradient
    gradient_noise = torch.zeros_like(teapot, requires_grad=True)
    malicious_teapot = copy.deepcopy(teapot) + gradient_noise
    for epoch in tqdm(range(epochs)):
        # forward pass
        logits = model(malicious_teapot).logits
        predicted_idx = logits.argmax(-1).item()
        # shortcircuit as soon as we match the target
        if predicted_idx == target_idx:
            print(f'Predicted {model.config.id2label[predicted_idx]} in {epoch} epochs')
            break
        gradient_noise.grad = None
        loss = F.cross_entropy(logits, torch.tensor([target_idx]))
        loss.backward()
        # update image with noise
        new_noise = gradient_noise.grad * learning_rate
        malicious_teapot -= new_noise
        # logging
        if epoch % log_every == 0:
            print(f"{epoch: 4}{loss=}")
    return malicious_teapot

malicious_teapot = morph(teapot)
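To look at the result, we can turn the tensor back into a PIL image. A quick sketch, assuming do_normalize=False so the pixel values are still in the [0, 1] range (the output filename is made up):

# detach from the autograd graph, clamp to a valid range, and convert CHW float -> HWC uint8
adv = malicious_teapot.detach().clamp(0, 1)
adv = (adv[0].permute(1, 2, 0).numpy() * 255).astype("uint8")
PIL.Image.fromarray(adv).save("mimic_teapot.png")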
Our majestic mimic:
Dreaming of Chihuahua
Now let’s crank it up to 11: how would the image change if we made the network overfit the picture on the chihuahua label?