In this example we'll classify an image with the bundled CaffeNet model (which is based on the network architecture of Krizhevsky et al. for ImageNet).
We'll compare CPU and GPU modes and then dig into the model to inspect features and the output.
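Switching modes is a one-liner in each direction; as a quick preview, here is a minimal hedged sketch (the actual timing comparison with net.forward() happens later in the notebook):

# switch Caffe to CPU mode
caffe.set_mode_cpu()
# ... run and time net.forward() here ...

# switch Caffe to GPU mode (device 0)
caffe.set_device(0)
caffe.set_mode_gpu()
# ... run and time net.forward() again here ...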
Set up input preprocessing. (We'll use Caffe's caffe.io.Transformer to do this, but this step is independent of other parts of Caffe, so any custom preprocessing code may be used.)
Our default CaffeNet is configured to take images in BGR format. Values are expected to start in the range [0, 255] and then have the mean ImageNet pixel value subtracted from them. In addition, the channel dimension is expected as the first (outermost) dimension.
As matplotlib loads images with values in the range [0, 1], in RGB format, and with the channel as the innermost dimension, we set up the transformations needed to convert them to CaffeNet's expected input here.
In [5]:
# load the mean ImageNet image (as distributed with Caffe) for subtraction
mu = np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy')
mu = mu.mean(1).mean(1)  # average over pixels to obtain the mean (BGR) pixel values
print 'mean-subtracted values:', zip('BGR', mu)

# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})

transformer.set_transpose('data', (2,0,1))     # move image channels to outermost dimension
transformer.set_mean('data', mu)               # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255)         # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0))  # swap channels from RGB to BGR

mean-subtracted values: [('B', 104.0069879317889), ('G', 116.66876761696767), ('R', 122.6789143406786)]
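Since this step is independent of the rest of Caffe, the same preprocessing could be written by hand. Here is a minimal NumPy sketch (not from the original notebook) of what the transformer above does to a matplotlib-style image; it assumes `image` is an H x W x 3 RGB array with values in [0, 1], already resized to the net's 227 x 227 input size (the transformer also handles resizing):

# a hedged NumPy equivalent of transformer.preprocess('data', image)
manual = image[:, :, ::-1]          # swap channels from RGB to BGR
manual = manual * 255.0             # rescale from [0, 1] to [0, 255]
manual = manual - mu                # subtract the mean (BGR) pixel value in each channel
manual = manual.transpose(2, 0, 1)  # move channels to the outermost dimension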
First we'll see how to read out the structure of the net in terms of activation and parameter shapes.

For each layer, let's look at the activation shapes, which typically have the form (batch_size, channel_dim, height, width).
The activations are exposed as an OrderedDict, net.blobs.
In [13]:
# for each layer, show the output shape
for layer_name, blob in net.blobs.iteritems():
    print layer_name + '\t' + str(blob.data.shape)

data    (50, 3, 227, 227)
conv1   (50, 96, 55, 55)
pool1   (50, 96, 27, 27)
norm1   (50, 96, 27, 27)
conv2   (50, 256, 27, 27)
pool2   (50, 256, 13, 13)
norm2   (50, 256, 13, 13)
conv3   (50, 384, 13, 13)
conv4   (50, 384, 13, 13)
conv5   (50, 256, 13, 13)
pool5   (50, 256, 6, 6)
fc6     (50, 4096)
fc7     (50, 4096)
fc8     (50, 1000)
prob    (50, 1000)

Now look at the parameter shapes. The parameters are exposed as another OrderedDict, net.params. We need to index the resulting values with either [0] for weights or [1] for biases.
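For example (a hedged one-liner, not in the original notebook), the conv1 weights and biases can be pulled out like this:

# a hedged example of the [0]/[1] indexing for a single layer
W = net.params['conv1'][0].data  # weights
b = net.params['conv1'][1].data  # biases
print W.shape, b.shape           # (96, 3, 11, 11) (96,)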
The param shapes typically have the form (output_channels, input_channels, filter_height, filter_width) (for the weights) and the 1-dimensional shape (output_channels,) (for the biases).
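These shapes also determine the net's parameter budget. As a minimal sketch (not part of the original notebook), we can tally the learnable parameters, which for CaffeNet comes to roughly 61 million; the per-layer shapes themselves are listed by the next cell:

# a hedged sketch: tally learnable parameters across all layers
total = 0
for layer_name, param in net.params.iteritems():
    total += param[0].data.size + param[1].data.size  # weights + biases
print 'total learnable parameters:', total  # roughly 61 million for CaffeNet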
In [14]:
for layer_name, param in net.params.iteritems():
    print layer_name + '\t' + str(param[0].data.shape), str(param[1].data.shape)

conv1   (96, 3, 11, 11) (96,)
conv2   (256, 48, 5, 5) (256,)
conv3   (384, 256, 3, 3) (384,)
conv4   (384, 192, 3, 3) (384,)
conv5   (256, 192, 3, 3) (256,)
fc6     (4096, 9216) (4096,)
fc7     (4096, 4096) (4096,)
fc8     (1000, 4096) (1000,)

Since we're dealing with four-dimensional data here, we'll define a helper function for visualizing sets of rectangular heatmaps.

In [15]:
def vis_square(data):
    """Take an array of shape (n, height, width) or (n, height, width, 3)
       and visualize each (height, width) thing in a grid of size approx. sqrt(n) by sqrt(n)"""

    # normalize data for display
    data = (data - data.min()) / (data.max() - data.min())

    # force the number of filters to be square
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = (((0, n ** 2 - data.shape[0]),
                (0, 1), (0, 1))                 # add some space between filters
               + ((0, 0),) * (data.ndim - 3))   # don't pad the last dimension (if there is one)
    data = np.pad(data, padding, mode='constant', constant_values=1)  # pad with ones (white)

    # tile the filters into an image
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])

    plt.imshow(data); plt.axis('off')

First we'll look at the first layer filters, conv1.

In [16]:
# the parameters are a list of [weights, biases]
filters = net.params['conv1'][0].data
vis_square(filters.transpose(0, 2, 3, 1))

The first layer output, conv1 (rectified responses of the filters above, first 36 only)

In [17]:
feat = net.blobs['conv1'].data[0, :36]
vis_square(feat)

The fifth layer after pooling, pool5

In [18]:
feat = net.blobs['pool5'].data[0]
vis_square(feat)
The first fully connected layer, fc6 (rectified)
We show the output values and the histogram of the positive values.
In [19]:
feat = net.blobs['fc6'].data[0]
plt.subplot(2, 1, 1)
plt.plot(feat.flat)
plt.subplot(2, 1, 2)
_ = plt.hist(feat.flat[feat.flat > 0], bins=100)

The final probability output, prob

In [20]:
feat = net.blobs['prob'].data[0]
plt.figure(figsize=(15, 3))
plt.plot(feat.flat)

Out[20]:
[<matplotlib.lines.Line2D at 0x7f09587dfb50>]

Note the cluster of strong predictions; the labels are sorted semantically. The top peaks correspond to the top predicted labels, as shown above.
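To read the strongest prediction off programmatically rather than from the plot, a minimal sketch (not in the original notebook):

# a hedged sketch: index of the strongest prediction in the prob blob
top_ind = net.blobs['prob'].data[0].argmax()
print 'top predicted class index:', top_ind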
Now we'll grab an image from the web and classify it using the steps above.
Try setting my_image_url to any JPEG image URL.

In [ ]:
# download an image
my_image_url = "..."  # paste your URL here
# for example:
# my_image_url = "https://upload.wikimedia.org/wikipedia/commons/b/be/Orang_Utan,_Semenggok_Forest_Reserve,_Sarawak,_Borneo,_Malaysia.JPG"
!wget -O image.jpg $my_image_url

# transform it and copy it into the net
image = caffe.io.load_image('image.jpg')
net.blobs['data'].data[...] = transformer.preprocess('data', image)

# perform classification
net.forward()

# obtain the output probabilities
output_prob = net.blobs['prob'].data[0]

# sort top five predictions from softmax output
top_inds = output_prob.argsort()[::-1][:5]

plt.imshow(image)

print 'probabilities and labels:'
zip(output_prob[top_inds], labels[top_inds])
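For a more readable printout of the same result, a hedged follow-up (like the cell above, it assumes labels was loaded earlier in the notebook):

# a hedged follow-up: print the top-5 probabilities and labels one per line
for p, label in zip(output_prob[top_inds], labels[top_inds]):
    print '%.4f  %s' % (p, label)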