logistic regression:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$

neural networks:
$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log((h_\Theta(x^{(i)}))_k) + (1 - y_k^{(i)}) \log(1 - (h_\Theta(x^{(i)}))_k) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} (\Theta_{j,i}^{(l)})^2$$

For training example t = 1 to m:
1. Set $a^{(1)} := x^{(t)}$
2. Perform forward propagation to compute $a^{(l)}$ for $l = 2, 3, \ldots, L$
3. Using $y^{(t)}$, compute $\delta^{(L)} = a^{(L)} - y^{(t)}$
4. Compute $\delta^{(L-1)}, \delta^{(L-2)}, \ldots, \delta^{(2)}$ using $\delta^{(l)} = ((\Theta^{(l)})^T \delta^{(l+1)}) \; .* \; g'(z^{(l)}) = ((\Theta^{(l)})^T \delta^{(l+1)}) \; .* \; a^{(l)} \; .* \; (1 - a^{(l)})$
5. $\Delta_{i,j}^{(l)} := \Delta_{i,j}^{(l)} + a_j^{(l)} \delta_i^{(l+1)}$, or with vectorization, $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$

Hence we update our new $\Delta$ matrix and obtain the gradients $D$:
- $D_{i,j}^{(l)} := \frac{1}{m}\left(\Delta_{i,j}^{(l)} + \lambda \Theta_{i,j}^{(l)}\right)$, if $j \neq 0$
- $D_{i,j}^{(l)} := \frac{1}{m}\Delta_{i,j}^{(l)}$, if $j = 0$

$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)}$$
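A minimal Octave sketch of one pass over the training set for a network with a single hidden layer ($L = 3$), assuming sigmoid activations and illustrative variable names (`Theta1`, `Theta2`, `X`, `Y`, `lambda`) rather than any particular assignment's code:

```
% Assumes: X is m x n (one example per row), Y is m x K (one-hot labels),
% Theta1 is s2 x (n+1), Theta2 is K x (s2+1).
function [D1, D2] = backprop_sketch(Theta1, Theta2, X, Y, lambda)
  m = size(X, 1);
  Delta1 = zeros(size(Theta1));
  Delta2 = zeros(size(Theta2));
  for t = 1:m
    % 1. Set a1 to the t-th example, with the bias unit prepended
    a1 = [1; X(t, :)'];
    % 2. Forward propagation
    z2 = Theta1 * a1;
    a2 = [1; sigmoid(z2)];
    z3 = Theta2 * a2;
    a3 = sigmoid(z3);                       % h_Theta(x^(t))
    % 3. Output-layer error
    delta3 = a3 - Y(t, :)';
    % 4. Hidden-layer error (skip the bias column of Theta2)
    delta2 = (Theta2(:, 2:end)' * delta3) .* a2(2:end) .* (1 - a2(2:end));
    % 5. Accumulate the gradients
    Delta2 = Delta2 + delta3 * a2';
    Delta1 = Delta1 + delta2 * a1';
  end
  % Average and regularize; the bias column (j = 0) is not regularized
  D1 = Delta1 / m;
  D2 = Delta2 / m;
  D1(:, 2:end) = D1(:, 2:end) + (lambda / m) * Theta1(:, 2:end);
  D2(:, 2:end) = D2(:, 2:end) + (lambda / m) * Theta2(:, 2:end);
end

function g = sigmoid(z)
  g = 1 ./ (1 + exp(-z));
end
```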
$$\text{cost}(t) = y^{(t)} \log(h_\Theta(x^{(t)})) + (1 - y^{(t)}) \log(1 - h_\Theta(x^{(t)}))$$
$$\delta_j^{(l)} = \frac{\partial}{\partial z_j^{(l)}} \text{cost}(t)$$
$$\frac{\partial}{\partial \Theta} J(\Theta) \approx \frac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}, \qquad \epsilon \approx 10^{-4}$$
$$\frac{\partial}{\partial \Theta_j} J(\Theta) \approx \frac{J(\Theta_1, \ldots, \Theta_j + \epsilon, \ldots, \Theta_n) - J(\Theta_1, \ldots, \Theta_j - \epsilon, \ldots, \Theta_n)}{2\epsilon}$$
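A minimal Octave sketch of this check, assuming `costFunc` is a function handle that maps an unrolled parameter vector to the cost $J(\Theta)$ (the name `costFunc` is illustrative):

```
% Two-sided numerical gradient, perturbing one parameter at a time.
function numgrad = numerical_gradient(costFunc, theta)
  epsilon = 1e-4;
  numgrad = zeros(size(theta));
  perturb = zeros(size(theta));
  for j = 1:numel(theta)
    perturb(j) = epsilon;
    numgrad(j) = (costFunc(theta + perturb) - costFunc(theta - perturb)) / (2 * epsilon);
    perturb(j) = 0;
  end
end
```

Compare `numgrad` against the gradient from backpropagation (the difference should be very small), then disable the check: computing it for every parameter is far too slow to run on each training iteration.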
Hence, we initialize each $\Theta_{ij}^{(l)}$ to a random value in $[-\epsilon, \epsilon]$.
If the dimensions of Theta1 are 10x11, Theta2 is 10x11, and Theta3 is 1x11:

Theta1 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta3 = rand(1,11) * (2 * INIT_EPSILON) - INIT_EPSILON;

Here rand(10,11) returns a 10x11 matrix of values uniformly distributed in (0,1), so each weight ends up in (−INIT_EPSILON, INIT_EPSILON).

Defaults: 1 hidden layer. If you have more than one hidden layer, it is recommended that you use the same number of units in every hidden layer.
1. Randomly initialize the weights
2. Implement forward propagation to get $h_\Theta(x^{(i)})$ for any $x^{(i)}$
3. Implement the cost function
4. Implement backpropagation to compute the partial derivatives
5. Use gradient checking to confirm that your backpropagation works, then disable gradient checking
6. Use gradient descent or a built-in optimization function to minimize the cost function with the weights in theta (see the sketch below)
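Putting the steps together, a hedged Octave sketch using the built-in `fminunc`; it assumes a one-hidden-layer network and a `nnCostFunction` you have written yourself that returns the cost and the unrolled gradient. The names `nnCostFunction`, `input_layer_size`, `hidden_layer_size`, `num_labels`, `X`, `y`, and `lambda` are illustrative assumptions about how your own code is organized:

```
% Unroll the randomly initialized weight matrices into a single vector
initial_nn_params = [Theta1(:); Theta2(:)];

% Handle that returns [J, grad] for an unrolled parameter vector
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                   num_labels, X, y, lambda);

% Minimize with a built-in optimizer (gradient checking already disabled)
options = optimset('GradObj', 'on', 'MaxIter', 50);
[nn_params, cost] = fminunc(costFunction, initial_nn_params, options);

% Reshape the optimized vector back into the weight matrices
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + hidden_layer_size * (input_layer_size + 1)):end), ...
                 num_labels, (hidden_layer_size + 1));
```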