Learning neural networks takes some effort. First, you have to understand the theory; it still feels a bit like magic to me, and deriving all the formulas is enough to fill a sheet of paper.
Second, a neural network needs data to train on. Without a large amount of data, your code has nothing concrete to work with and you won't be able to see any results.
I'm just getting started myself. For reading, I've been going through Neural Networks and Deep Learning (a Chinese translation can be found online), and it explains things very clearly. When reading it, it's best to work carefully through both the basics and the details, and to derive every formula yourself.
I won't go into the specifics of neural networks here; the book explains them more clearly than I could anyway.
I had planned to write more today, but debugging the code took the whole day. Embarrassing...
The most important thing is the data. You can download the handwritten-digit dataset from https://github.com/csuldw/MachineLearning/tree/master/dataset/MNIST . The book uses the binary MNIST format, which I won't cover here; I downloaded mnist.zip, which unpacks into a pile of jpg images. That feels more intuitive, since with binary data you can't see the results, nor what kind of digit ends up recognized as what.
The nice thing about using images directly is that with a small code change you can, after training, point the network at a specific picture and see whether it's really that powerful~ You can even throw in pictures you made yourself. (There's a sketch of this after the network code below.)
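To get a feel for the data, you can open one of the jpgs and look at it as an array first. A minimal sketch (the filename below is made up; substitute a real one from the unpacked zip, where the label is encoded in the name):

from PIL import Image
import numpy as np

im = Image.open('mnist/3.12.jpg')   # hypothetical name: a sample labeled 3
arr = np.array(im)
print arr.shape             # expect (28, 28): a 28x28 grayscale image
print arr.min(), arr.max()  # raw pixel values, roughly 0~255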
The code in this post is essentially based on the book above. I've added comments throughout, so if you've understood the book and derived the formulas, it should pose no obstacles; after all, this is the simplest possible neural network.
One thing I ran into while coding: the images are grayscale, and at first I fed the raw gray values straight in as inputs. In theory that's fine, but the sigmoid computation overflows along the way, and the accuracy ends up no better than random guessing... Fix that and it works; the inputs should be values between 0 and 1.
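To see why raw gray values blow up: with 784 inputs on the order of 0~255, the weighted input z = w·a + b easily lands in the hundreds, and np.exp(-z) overflows for large negative z. A tiny illustration (the numbers are made up, just to show the scale):

import numpy as np

def sigmoid(z):
    return 1.0/(1 + np.exp(-z))

print sigmoid(-1000.0)   # RuntimeWarning: overflow encountered in exp -> saturates to 0.0
print sigmoid(-4.0)      # about 0.018, well-behaved once inputs are scaled into 0~1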
If you run into problems, leave a comment and we can learn from each other.
Data-loading code (read_data.py):
#!/usr/bin/python
#coding=utf-8
from PIL import Image
import numpy as np
import os
import sys
import random

# Build the input matrix (28x28 image -> 784x1 column vector, scaled to 0~1)
def createImageInput(data):
    n = len(data)
    result = [[data[i][j]/255.0] for i in range(n) for j in range(n)]
    return np.array(result)

# Build the expected-output matrix (digit -> 10x1 one-hot vector)
def createResultOutput(value):
    result = [[0] for i in range(10)]
    result[int(value)][0] = 1
    return np.array(result)

def readDataFromImage(dir_path):
    fileSet = os.walk(dir_path)
    total = 0
    count = 0
    temp_data = []
    for path, d, filelist in fileSet:
        total = len(filelist)
        print "Reading Data:"
        for filename in filelist:
            fullPath = os.path.join(path, filename)
            im = Image.open(fullPath)
            im_array = np.array(im)
            # file names encode the label as the part before the first dot
            temp = filename.split('.')
            x = createImageInput(im_array)
            y = createResultOutput(temp[0])
            temp_data.append((x, y))
            count = count + 1
            sys.stdout.write("\b\b\b\b\b\b\b\b\b\b\b{0}/{1}".format(count, total))
            sys.stdout.flush()
        if count >= total:
            break
    print ""
    # shuffle, then split 80% training / 20% test
    random.shuffle(temp_data)
    train_count = int(total*0.8)
    test_data = temp_data[train_count:total]
    train_data = temp_data[:train_count]
    # tally how many samples of each digit ended up in the training set
    n_count = [0 for i in range(10)]
    for x, y in train_data:
        for v in range(10):
            if y[v][0] == 1:
                n_count[v] = n_count[v] + 1
    print "over"
    return train_data, test_data

if __name__ == "__main__":
    readDataFromImage('mnist')
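To sanity-check the loader, you can run something like this from an interactive session (the shapes are what the code above should produce):

import read_data

train_data, test_data = read_data.readDataFromImage('mnist')
x, y = train_data[0]
print x.shape                           # (784, 1): normalized pixel column vector
print y.shape                           # (10, 1): one-hot label
print len(train_data), len(test_data)  # the 80/20 split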
Neural network code (network.py):
#!/usr/bin/python
#coding=utf-8
import numpy as np
import random
import read_data

# Neural network class
class Network(object):
    # Constructor
    # sizes      : number of nodes in each layer
    # num_layers : number of layers
    # biases     : bias matrices, one per layer after the input layer
    # weights    : weight matrices, one per layer
    def __init__(self, sizes):
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

    # Feed forward: for a given input a, return the network's output
    def feedforward(self, a):
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a) + b)
        return a

    # Stochastic gradient descent
    # @param training_data   training data, as (x, y) pairs
    # @param epochs          number of training epochs
    # @param mini_batch_size number of samples per mini-batch
    # @param eta             learning rate
    # @param test_data       optional test data, evaluated after each epoch
    def SGD(self, training_data, epochs, mini_batch_size, eta, test_data = None):
        n = len(training_data)
        for j in xrange(epochs):
            random.shuffle(training_data)
            mini_batches = [training_data[k:k + mini_batch_size]
                            for k in xrange(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print "Epoch {0}: {1} / {2}".format(j, self.evaluate(test_data), len(test_data))
            else:
                print "Epoch {0} complete".format(j)

    # One gradient step over a single mini-batch
    def update_mini_batch(self, mini_batch, eta):
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_b, delta_w = self.backprop(x, y)
            nabla_b = [(nb + dnb) for nb, dnb in zip(nabla_b, delta_b)]
            nabla_w = [(nw + dnw) for nw, dnw in zip(nabla_w, delta_w)]
        self.weights = [w - (eta / len(mini_batch))*nw for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b - (eta / len(mini_batch))*nb for b, nb in zip(self.biases, nabla_b)]

    # Backpropagation
    # @param x input activations
    # @param y expected output
    # @return nabla_b, nabla_w: gradients of the cost w.r.t. biases and weights
    def backprop(self, x, y):
        # result matrices
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        activation = x
        activations = [x]   # output activations of each layer
        zvalue = []         # weighted input z = w*a + b of each layer
        # forward pass: compute each layer's weighted input and activation
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation) + b
            zvalue.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # equation (BP1) from the book: error of the output layer
        delta = self.cost_derivative(activations[-1], y) * sigmoid_derivative(zvalue[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # equation (BP2): propagate the error backwards layer by layer
        for layer in xrange(2, self.num_layers):
            z = zvalue[-layer]
            sp = sigmoid_derivative(z)
            delta = np.dot(self.weights[-layer + 1].transpose(), delta) * sp
            nabla_b[-layer] = delta
            nabla_w[-layer] = np.dot(delta, activations[-layer - 1].transpose())
        return (nabla_b, nabla_w)

    # derivative of the quadratic cost
    def cost_derivative(self, output, y):
        return (output - y)

    # count how many test samples are classified correctly
    def evaluate(self, test_data):
        test_result = [(np.argmax(self.feedforward(x)), y) for (x, y) in test_data]
        return sum(int(y[x]) for (x, y) in test_result)

# sigmoid function
def sigmoid(z):
    return 1.0/(1 + np.exp(-z))

# derivative of the sigmoid
def sigmoid_derivative(z):
    return sigmoid(z) * (1 - sigmoid(z))

if __name__ == "__main__":
    train_data, test_data = read_data.readDataFromImage("mnist")
    net = Network([784, 30, 10])
    net.SGD(train_data, 30, 10, 3.0, test_data)
    test_res = net.evaluate(test_data)
    print test_res * 1.0 / len(test_data)
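Finally, as promised above, testing a single picture only takes a few extra lines once training is done. A minimal sketch to append to the __main__ block ('my_digit.jpg' is a made-up name; use any 28x28 grayscale image of your own):

from PIL import Image

# assumes `net` has already been trained by net.SGD(...) above
im_array = np.array(Image.open('my_digit.jpg'))   # 'my_digit.jpg' is hypothetical
x = read_data.createImageInput(im_array)
print "Predicted digit:", np.argmax(net.feedforward(x))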