A Beholder awakens. Its myriad eyes, each a facet of mechanistic insight, gaze upon the intricate layers of information, revealing hidden patterns in the dreams of code. In the tapestry of deepdream, the Beholder becomes the guardian of interpretability, its central eye illuminating the enigmatic connections woven within the digital labyrinth.
Beauty is in the eye of the Beholder
The Quest
Produce deepdreams from an image classifier. Try to identify specific features in the network, and alter them to blind the network.
Deepdream
Deepdream is fairly similar to what we used to fool an image classifier. Instead of backpropagating to the original image to minimize the loss for a malicious label, we backpropagate to the original image to maximize some activation layer in the middle of the network. This is called gradient ascent.
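Schematically, the only change from the fooling setup is the sign of the update (a sketch of the update rule, not code from the notebook):

def descent_step(image, lr):
    # fooling: step against the gradient to minimize the loss for a malicious label
    return image - lr * image.grad

def ascent_step(image, lr):
    # dreaming: step along the gradient to maximize an intermediate activation's norm
    return image + lr * image.grad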
Hook into the image classifier
First we need to hook into the classifier to get access to the activation values of the network.
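Here is a minimal sketch of that setup, assuming VGG19 from torchvision: forward hooks fill a dict vgg_mem with each layer's activations (these imports and names are what the later snippets rely on):

import copy
from collections import defaultdict, Counter

import torch
import torchvision
from tqdm import tqdm

device = 'cuda' if torch.cuda.is_available() else 'cpu'

vgg = torchvision.models.vgg19(weights='DEFAULT')
vgg_mem = {}  # layer index -> activation from the last forward pass

def remember(idx):
    def hook(module, input, output):
        vgg_mem[idx] = output  # keep the graph attached so we can backprop through it
    return hook

for idx, layer in enumerate(vgg.features):
    layer.register_forward_hook(remember(idx))

vgg_hooked = vgg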
By performing gradient ascent on any layer, we amplify the input pixels that make that layer more active.
def shallowdream(start, layer=35, channel=None, m=vgg_hooked, mem=vgg_mem, learning_rate=0.01, epochs=30):
    start = copy.deepcopy(start.detach())
    # move to device
    dream = start.to(device).requires_grad_()
    m = m.to(device)
    for _ in tqdm(range(epochs)):
        m(dream)
        loss = mem[layer].norm() if channel is None else mem[layer][:, channel, :, :].norm()
        dream.grad = None
        loss.backward()
        dream.data = torch.clip((dream + dream.grad * learning_rate), 0., 1.).data  # jumping through hoops to please pytorch
    return dream
Now we just have to choose a layer and let the model dream.
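For example, assuming sky is a preprocessed 1x3xHxW image tensor with values in [0, 1]:

sky_dream = shallowdream(sky, layer=25)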
Dreaming from the sky on all layers
In this example we can see that earlier layers produce simple features, and the deeper we probe into the network, the more complex the emerging patterns become.
To me some of the layers seem to have meaning (but it might just be an illusion spell):
25 looks like buildings and human constructions,
30 like mountains,
and 27 like creatures.
Some special layers?
We can also choose a given layer and just dream deeper and deeper, feeding each dream back in as the next starting point.
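A possible loop for that (a sketch, not the notebook's exact code):

def dream_deeper(start, layer, rounds=5, **kwargs):
    dream = start
    frames = []
    for _ in range(rounds):
        dream = shallowdream(dream, layer=layer, **kwargs)  # restart from the previous dream
        frames.append(dream.detach().cpu())
    return frames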
A dog's nightmare
And a few more samples for different starting points:
Dog on different layers
Van Gogh on different layers
Void on different layers
Mechanistic interpretability
Our secondary goal is to identify the channels in each layer that cater particularly to Kelpie dogs.
Identify channels
We can feed a bunch of pictures of kelpies and look at the channels that are the most activated and shared between all kelpies.
def save_activations(start, m=vgg_hooked, mem=vgg_mem):
    # move to device
    dream = start.to(device).requires_grad_()
    m = m.to(device)
    # run the model
    m(dream)
    # make a copy of the activations
    activations = {k: copy.deepcopy(output.detach()) for k, output in mem.items()}
    return activations

# compute the top n channels with the highest norm
def topn(activation, n, threshold):
    channels = activation.shape[1]
    top = sorted(zip(activation.view(channels, -1).norm(dim=1), range(channels)), reverse=True)[:n]
    return [idx for norm, idx in top if norm > threshold]

def topn_activations(activations, n=10, threshold=0):
    return {k: topn(activation, n=n, threshold=threshold) for k, activation in activations.items()}

def count_topn(all_topn):
    counts = defaultdict(Counter)
    for topn in all_topn:
        for layer, top in topn.items():
            counts[layer].update(top)
    return counts

all_activations = [save_activations(kelpie) for kelpie in kelpies]
all_topn = [topn_activations(activations) for activations in all_activations]
counts = count_topn(all_topn)
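counts now maps each layer to a Counter of channel indices, so we can peek at the channels most shared across the kelpies and dream on one of them to see what it responds to (layer 35 and the sky starting image here are just illustrative choices):

print(counts[35].most_common(5))  # [(channel_idx, number_of_kelpies), ...]

# visualize what one shared channel responds to
channel, _ = counts[35].most_common(1)[0]
feature = shallowdream(sky, layer=35, channel=channel)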
Take a look at the features:
Channel shared by all Kelpies
Blind the network
Now let's disable the channels we identified and see how the classifier behaves.
def blinder(counts, model, min_layer=20, most_common=5, threshold=4):
    # nuke a channel
    def nuke(layer, channel):
        def f(module, input, output):
            output[:, channel, :, :] = 0.
        layer.register_forward_hook(f)

    m = copy.deepcopy(model).to(device)
    for layer, count in counts.items():
        # lower layers are basic features like edges, so it doesn't make sense to nuke them
        if layer < min_layer:
            continue
        for channel, occurrences in count.most_common(most_common):
            if occurrences < threshold:
                break
            nuke(m.features[layer], channel)
    return m

vgg_blind = blinder(counts, vgg)

@torch.no_grad()
def evaluate_blinder(img):
    res = []
    models = [
        ('VGG19', vgg.eval()),
        ('blinded', vgg_blind.eval()),
    ]
    for name, model in models:
        label, confidence = classify(img, model)
        if len(label) > 20:
            label = label[:20] + '...'
        res.append(f'{name:8}: {label:23} ({confidence*100:.2f}%)')
    return '\n'.join(res)
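For example, assuming classify returns a (label, confidence) pair for a preprocessed image tensor:

print(evaluate_blinder(kelpies[0]))  # one line per model: name, predicted label, confidence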
Classifying Kelpies after brain surgery
Classifying controls after brain surgery
The third picture still registers as a kelpie, but everything else is gone, and the controls still match. I'm ok with that; even a blind chicken finds a grain once in a while ;)