Coursera ML Notes 5


    Classification

    Binary classification: K = 1 output unit. Multi-class classification: K output units (K >= 3).

    cost function

    $L$ = total number of layers in the network
    $s_l$ = number of units (not counting the bias unit) in layer $l$
    $K$ = number of output units/classes

    Logistic regression:

    $$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y^{(i)} \log(h_\theta(x^{(i)})) + (1-y^{(i)}) \log(1-h_\theta(x^{(i)})) \,\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

    Neural networks:

    $$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[\, y_k^{(i)} \log\big((h_\Theta(x^{(i)}))_k\big) + (1-y_k^{(i)}) \log\big(1-(h_\Theta(x^{(i)}))_k\big) \,\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{j,i}^{(l)}\big)^2$$
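
    As a rough illustration, here is a minimal Octave sketch of this cost for a 3-layer network (input, one hidden layer, output). The variables are assumptions not defined in these notes: X (m x n inputs), Y (m x K one-hot labels), Theta1 and Theta2 (weight matrices including the bias column), and lambda.

        % Minimal sketch (assumed variables: X, Y, Theta1, Theta2, lambda as above)
        sigmoid = @(z) 1 ./ (1 + exp(-z));

        m  = size(X, 1);
        A1 = [ones(m, 1) X];                     % add the bias unit
        A2 = [ones(m, 1) sigmoid(A1 * Theta1')]; % hidden layer activations
        H  = sigmoid(A2 * Theta2');              % h_Theta(x), an m x K matrix

        % Unregularized cost: sum over examples i and classes k
        J = (-1/m) * sum(sum(Y .* log(H) + (1 - Y) .* log(1 - H)));

        % Regularization term: skip the bias column of each Theta
        J = J + (lambda/(2*m)) * (sum(sum(Theta1(:, 2:end).^2)) + ...
                                  sum(sum(Theta2(:, 2:end).^2)));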

    Forward Propagation Algorithm
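
    For reference, the layer-by-layer recurrence is: set $a^{(1)} = x$, then for $l = 1, \dots, L-1$ compute $z^{(l+1)} = \Theta^{(l)} a^{(l)}$ and $a^{(l+1)} = g(z^{(l+1)})$ (adding the bias unit $a^{(l+1)}_0 = 1$ for every layer except the output layer), and finally $h_\Theta(x) = a^{(L)}$, where $g$ is the sigmoid function.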

    Backpropagation Algorithm

    calculation

    For training example t = 1 to m:

    1. Set $a^{(1)} := x^{(t)}$
    2. Perform forward propagation to compute $a^{(l)}$ for $l = 2, 3, \dots, L$
    3. Using $y^{(t)}$, compute $\delta^{(L)} = a^{(L)} - y^{(t)}$
    4. Compute $\delta^{(L-1)}, \delta^{(L-2)}, \dots, \delta^{(2)}$ using $\delta^{(l)} = ((\Theta^{(l)})^T \delta^{(l+1)}) \,.*\, g'(z^{(l)}) = ((\Theta^{(l)})^T \delta^{(l+1)}) \,.*\, a^{(l)} \,.*\, (1 - a^{(l)})$
    5. $\Delta^{(l)}_{i,j} := \Delta^{(l)}_{i,j} + a^{(l)}_j \delta^{(l+1)}_i$, or with vectorization, $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$

    Hence we update our new $\Delta$ matrix:

    - $D^{(l)}_{i,j} := \frac{1}{m}\left(\Delta^{(l)}_{i,j} + \lambda\Theta^{(l)}_{i,j}\right)$ if $j \neq 0$
    - $D^{(l)}_{i,j} := \frac{1}{m}\Delta^{(l)}_{i,j}$ if $j = 0$

    $$\frac{\partial}{\partial\Theta^{(l)}_{ij}} J(\Theta) = D^{(l)}_{ij}$$
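
    A minimal Octave sketch of these steps for a 3-layer network, under the same assumed variables as the cost sketch above (X, Y one-hot, Theta1, Theta2, lambda):

        % Minimal sketch: accumulate Delta over all training examples
        sigmoid = @(z) 1 ./ (1 + exp(-z));
        m = size(X, 1);
        Delta1 = zeros(size(Theta1));
        Delta2 = zeros(size(Theta2));

        for t = 1:m
          a1 = [1; X(t, :)'];                        % step 1: a^(1) = x^(t), with bias
          a2 = [1; sigmoid(Theta1 * a1)];            % step 2: forward propagation
          a3 = sigmoid(Theta2 * a2);

          d3 = a3 - Y(t, :)';                        % step 3: delta^(L) = a^(L) - y^(t)
          d2 = (Theta2' * d3) .* (a2 .* (1 - a2));   % step 4: includes the bias row...
          d2 = d2(2:end);                            % ...which is dropped here

          Delta2 = Delta2 + d3 * a2';                % step 5: accumulate
          Delta1 = Delta1 + d2 * a1';
        end

        % D^(l): divide by m and regularize every column except the bias column (j = 0)
        Theta1_grad = Delta1 / m;
        Theta2_grad = Delta2 / m;
        Theta1_grad(:, 2:end) = Theta1_grad(:, 2:end) + (lambda/m) * Theta1(:, 2:end);
        Theta2_grad(:, 2:end) = Theta2_grad(:, 2:end) + (lambda/m) * Theta2(:, 2:end);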

    cost function

    $$\mathrm{cost}(t) = y^{(t)} \log(h_\Theta(x^{(t)})) + (1-y^{(t)}) \log(1-h_\Theta(x^{(t)}))$$

    $$\delta^{(l)}_j = \frac{\partial}{\partial z^{(l)}_j}\,\mathrm{cost}(t)$$

    Gradient Checking

    $$\frac{\partial}{\partial\Theta} J(\Theta) \approx \frac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}, \qquad \epsilon \approx 10^{-4}$$

    With multiple parameters:

    $$\frac{\partial}{\partial\Theta_j} J(\Theta) \approx \frac{J(\Theta_1, \dots, \Theta_j + \epsilon, \dots, \Theta_n) - J(\Theta_1, \dots, \Theta_j - \epsilon, \dots, \Theta_n)}{2\epsilon}$$
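
    A minimal Octave sketch of this check, assuming a hypothetical function costFunc(theta) that returns $J(\Theta)$ for an unrolled parameter vector theta:

        % Minimal sketch: two-sided numerical gradient for an unrolled parameter vector.
        % costFunc and theta are assumed; grad is the gradient from backpropagation.
        epsilon = 1e-4;
        numgrad = zeros(size(theta));
        perturb = zeros(size(theta));
        for j = 1:numel(theta)
          perturb(j) = epsilon;
          loss1 = costFunc(theta - perturb);
          loss2 = costFunc(theta + perturb);
          numgrad(j) = (loss2 - loss1) / (2 * epsilon);
          perturb(j) = 0;
        end
        % numgrad should closely match the backpropagation gradient, e.g.
        % norm(numgrad - grad) / norm(numgrad + grad) should be very small.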

    Random Initialization

    Hence, we initialize each $\Theta^{(l)}_{ij}$ to a random value in $[-\epsilon, \epsilon]$.

    If the dimensions of Theta1, Theta2, and Theta3 are 10x11, 10x11, and 1x11 respectively:

        Theta1 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
        Theta2 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
        Theta3 = rand(1,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
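
    One commonly suggested choice of INIT_EPSILON (e.g., in the course programming exercises) is to base it on the number of units adjacent to the layer; here L_in and L_out are assumed variables holding those counts:

        INIT_EPSILON = sqrt(6) / sqrt(L_in + L_out);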

    Training a neural network

    Number of input units = dimension of the features $x^{(i)}$
    Number of output units = number of classes
    Number of hidden units per layer = usually the more the better (must be balanced against the cost of computation, which increases with more hidden units)

    Defaults: 1 hidden layer. If you have more than 1 hidden layer, then it is recommended that you have the same number of units in every hidden layer.

    1. Randomly initialize the weights.
    2. Implement forward propagation to get $h_\Theta(x^{(i)})$ for any $x^{(i)}$.
    3. Implement the cost function.
    4. Implement backpropagation to compute the partial derivatives.
    5. Use gradient checking to confirm that your backpropagation works. Then disable gradient checking.
    6. Use gradient descent or a built-in optimization function to minimize the cost function with the weights in Theta (see the sketch after this list).
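
    A minimal Octave sketch of the last step, assuming a hypothetical function nnCostFunction that returns [J, grad] for an unrolled parameter vector, and the usual layer-size variables (input_layer_size, hidden_layer_size, num_labels):

        % Minimal sketch: minimize the cost with the built-in fminunc optimizer.
        % nnCostFunction and the layer-size variables are assumptions, not defined here.
        initial_nn_params = [Theta1(:); Theta2(:)];   % unroll the randomly initialized weights
        options = optimset('GradObj', 'on', 'MaxIter', 100);
        costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                           num_labels, X, y, lambda);
        [nn_params, cost] = fminunc(costFunction, initial_nn_params, options);

        % Reshape the learned parameters back into the weight matrices
        Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                         hidden_layer_size, input_layer_size + 1);
        Theta2 = reshape(nn_params((1 + hidden_layer_size * (input_layer_size + 1)):end), ...
                         num_labels, hidden_layer_size + 1);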
