logistic regression:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$

neural networks:
$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log((h_\Theta(x^{(i)}))_k) + (1 - y_k^{(i)}) \log(1 - (h_\Theta(x^{(i)}))_k) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} (\Theta_{j,i}^{(l)})^2$$

For training example t = 1 to m:
1. Set $a^{(1)} := x^{(t)}$
2. Perform forward propagation to compute $a^{(l)}$ for $l = 2, 3, \ldots, L$
3. Using $y^{(t)}$, compute $\delta^{(L)} = a^{(L)} - y^{(t)}$
4. Compute $\delta^{(L-1)}, \delta^{(L-2)}, \ldots, \delta^{(2)}$ using $\delta^{(l)} = ((\Theta^{(l)})^T \delta^{(l+1)}) \; .* \; g'(z^{(l)}) = ((\Theta^{(l)})^T \delta^{(l+1)}) \; .* \; a^{(l)} \; .* \; (1 - a^{(l)})$
5. $\Delta_{i,j}^{(l)} := \Delta_{i,j}^{(l)} + a_j^{(l)} \delta_i^{(l+1)}$, or with vectorization, $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$

Hence we update our new $\Delta$ matrix and obtain the gradients $D$:
- $D_{i,j}^{(l)} := \frac{1}{m}\left(\Delta_{i,j}^{(l)} + \lambda \Theta_{i,j}^{(l)}\right)$, if $j \neq 0$
- $D_{i,j}^{(l)} := \frac{1}{m}\Delta_{i,j}^{(l)}$, if $j = 0$

$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)}$$
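A minimal Octave sketch of one pass over the training set for a network with a single hidden layer ($L = 3$), assuming sigmoid activations and illustrative variable names (`Theta1`, `Theta2`, `X`, `Y`, `lambda`) rather than any particular assignment's code:

```
% Assumes: X is m x n (one example per row), Y is m x K (one-hot labels),
% Theta1 is s2 x (n+1), Theta2 is K x (s2+1).
function [D1, D2] = backprop_sketch(Theta1, Theta2, X, Y, lambda)
  m = size(X, 1);
  Delta1 = zeros(size(Theta1));
  Delta2 = zeros(size(Theta2));
  for t = 1:m
    % 1. Set a1 to the t-th example, with the bias unit prepended
    a1 = [1; X(t, :)'];
    % 2. Forward propagation
    z2 = Theta1 * a1;
    a2 = [1; sigmoid(z2)];
    z3 = Theta2 * a2;
    a3 = sigmoid(z3);                       % h_Theta(x^(t))
    % 3. Output-layer error
    delta3 = a3 - Y(t, :)';
    % 4. Hidden-layer error (skip the bias column of Theta2)
    delta2 = (Theta2(:, 2:end)' * delta3) .* a2(2:end) .* (1 - a2(2:end));
    % 5. Accumulate the gradients
    Delta2 = Delta2 + delta3 * a2';
    Delta1 = Delta1 + delta2 * a1';
  end
  % Average and regularize; the bias column (j = 0) is not regularized
  D1 = Delta1 / m;
  D2 = Delta2 / m;
  D1(:, 2:end) = D1(:, 2:end) + (lambda / m) * Theta1(:, 2:end);
  D2(:, 2:end) = D2(:, 2:end) + (lambda / m) * Theta2(:, 2:end);
end

function g = sigmoid(z)
  g = 1 ./ (1 + exp(-z));
end
```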
$$\text{cost}(t) = y^{(t)} \log(h_\Theta(x^{(t)})) + (1 - y^{(t)}) \log(1 - h_\Theta(x^{(t)}))$$
$$\delta_j^{(l)} = \frac{\partial}{\partial z_j^{(l)}} \text{cost}(t)$$
$$\frac{\partial}{\partial \Theta} J(\Theta) \approx \frac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}, \qquad \epsilon \approx 10^{-4}$$
$$\frac{\partial}{\partial \Theta_j} J(\Theta) \approx \frac{J(\Theta_1, \ldots, \Theta_j + \epsilon, \ldots, \Theta_n) - J(\Theta_1, \ldots, \Theta_j - \epsilon, \ldots, \Theta_n)}{2\epsilon}$$
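A minimal Octave sketch of this check, assuming `costFunc` is a function handle that maps an unrolled parameter vector to the cost $J(\Theta)$ (the name `costFunc` is illustrative):

```
% Two-sided numerical gradient, perturbing one parameter at a time.
function numgrad = numerical_gradient(costFunc, theta)
  epsilon = 1e-4;
  numgrad = zeros(size(theta));
  perturb = zeros(size(theta));
  for j = 1:numel(theta)
    perturb(j) = epsilon;
    numgrad(j) = (costFunc(theta + perturb) - costFunc(theta - perturb)) / (2 * epsilon);
    perturb(j) = 0;
  end
end
```

Compare `numgrad` against the gradient from backpropagation (the difference should be very small), then disable the check: computing it for every parameter is far too slow to run on each training iteration.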
Hence, we initialize each $\Theta_{ij}^{(l)}$ to a random value in $[-\epsilon, \epsilon]$.
If the dimensions of Theta1 are 10x11, Theta2 is 10x11, and Theta3 is 1x11:

Theta1 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta3 = rand(1,11) * (2 * INIT_EPSILON) - INIT_EPSILON;

Here rand(10,11) returns a 10x11 matrix of values uniformly distributed in (0,1), so each weight ends up in (−INIT_EPSILON, INIT_EPSILON).

Defaults: 1 hidden layer. If you have more than one hidden layer, it is recommended that you use the same number of units in every hidden layer.
1. Randomly initialize the weights
2. Implement forward propagation to get $h_\Theta(x^{(i)})$ for any $x^{(i)}$
3. Implement the cost function
4. Implement backpropagation to compute the partial derivatives
5. Use gradient checking to confirm that your backpropagation works, then disable gradient checking
6. Use gradient descent or a built-in optimization function to minimize the cost function with the weights in theta (see the sketch below)
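Putting the steps together, a hedged Octave sketch using the built-in `fminunc`; it assumes a one-hidden-layer network and a `nnCostFunction` you have written yourself that returns the cost and the unrolled gradient. The names `nnCostFunction`, `input_layer_size`, `hidden_layer_size`, `num_labels`, `X`, `y`, and `lambda` are illustrative assumptions about how your own code is organized:

```
% Unroll the randomly initialized weight matrices into a single vector
initial_nn_params = [Theta1(:); Theta2(:)];

% Handle that returns [J, grad] for an unrolled parameter vector
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                   num_labels, X, y, lambda);

% Minimize with a built-in optimizer (gradient checking already disabled)
options = optimset('GradObj', 'on', 'MaxIter', 50);
[nn_params, cost] = fminunc(costFunction, initial_nn_params, options);

% Reshape the optimized vector back into the weight matrices
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + hidden_layer_size * (input_layer_size + 1)):end), ...
                 num_labels, (hidden_layer_size + 1));
```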