Optimization: Stochastic Gradient Descent

xiaoxiao, 2021-03-25

introduction
visualizing the loss function
optimization
    random search
    random local search
    following the gradient
computing the gradient
    numerically with finite differences
    analytically with calculus
gradient descent

introduction

Optimization is the process of finding the weights W that minimize the loss function.

visualizing the loss function

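Note that once the score function f is extended to neural networks, the objective function is no longer convex. The loss also has non-differentiable points, such as the kink in the max(0, ·) of the hinge loss. At those points a subgradient has to be used in place of the gradient.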
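Below is a minimal sketch of what a subgradient means for the hinge max(0, z). Away from z = 0 the ordinary slope is used. At the kink, any slope g in [0, 1] satisfies the subgradient inequality f(y) ≥ f(x) + g·(y − x). The function names are hypothetical, and the check only runs on a sample grid of points, so it is not a proof.

```python
def hinge(z):
    """max(0, z): convex, but not differentiable at z == 0."""
    return max(0.0, z)


def hinge_subgradient(z):
    """Return one valid subgradient of hinge at z.

    The slope is 1 for z > 0 and 0 for z < 0. At the kink z == 0 every
    value in [0, 1] is a valid subgradient; this sketch picks 0.
    """
    return 1.0 if z > 0 else 0.0


def is_subgradient(f, g, x, points):
    """Check f(y) >= f(x) + g * (y - x) on the sample points (small float tolerance)."""
    return all(f(y) >= f(x) + g * (y - x) - 1e-12 for y in points)


grid = [i / 10 for i in range(-20, 21)]
# Every slope in [0, 1] is a valid subgradient at the kink...
print(all(is_subgradient(hinge, g, 0.0, grid) for g in (0.0, 0.5, 1.0)))  # True
# ...but a slope outside that interval is not.
print(is_subgradient(hinge, 1.5, 0.0, grid))  # False
```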

optimization

random search

Try many different random weight matrices W and keep the one with the lowest loss.
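Here is a sketch of random search. It assumes a multiclass SVM loss and hypothetical toy data labeled by a hidden random linear classifier. The names `svm_loss` and `random_search`, the data, and the sampling scale are illustrative assumptions, not taken from the original notes.

```python
import numpy as np


def svm_loss(W, X, y, delta=1.0):
    """Average multiclass SVM (hinge) loss. X is (N, D), W is (D, C), y holds labels."""
    n = X.shape[0]
    scores = X @ W
    correct = scores[np.arange(n), y][:, None]  # score of the correct class, as a column
    margins = np.maximum(0.0, scores - correct + delta)
    margins[np.arange(n), y] = 0.0  # the correct class itself contributes no margin
    return margins.sum() / n


def random_search(X, y, num_classes, num_tries=1000, scale=1.0, seed=0):
    """Draw independent random W matrices and keep the one with the lowest loss."""
    rng = np.random.default_rng(seed)
    best_W, best_loss = None, np.inf
    for _ in range(num_tries):
        W = rng.standard_normal((X.shape[1], num_classes)) * scale
        loss = svm_loss(W, X, y)
        if loss < best_loss:
            best_W, best_loss = W, loss
    return best_W, best_loss


# Hypothetical toy data: 50 points with 4 features, labeled by a hidden linear classifier.
data_rng = np.random.default_rng(42)
X = data_rng.standard_normal((50, 4))
W_true = data_rng.standard_normal((4, 3))
y = np.argmax(X @ W_true, axis=1)

W_best, loss_best = random_search(X, y, num_classes=3)
print(f"best loss found by random search: {loss_best:.3f}")
```

Every candidate is drawn independently, so a good guess is never refined. This makes random search useful mainly as a baseline.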

random local search

Start from a random W. Then repeatedly try a random perturbation δW and update W to W + δW only if it makes the loss smaller.
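The sketch below implements random local search on the same kind of hypothetical toy problem. It assumes a multiclass SVM loss, a Gaussian perturbation with a fixed `step_size`, and illustrative names throughout.

```python
import numpy as np


def svm_loss(W, X, y, delta=1.0):
    """Average multiclass SVM (hinge) loss. X is (N, D), W is (D, C), y holds labels."""
    n = X.shape[0]
    scores = X @ W
    correct = scores[np.arange(n), y][:, None]
    margins = np.maximum(0.0, scores - correct + delta)
    margins[np.arange(n), y] = 0.0
    return margins.sum() / n


def random_local_search(X, y, num_classes, num_steps=1000, step_size=0.1, seed=0):
    """Start from a random W and keep a random perturbation only when it lowers the loss."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], num_classes))  # random starting point
    best_loss = svm_loss(W, X, y)
    for _ in range(num_steps):
        delta_W = rng.standard_normal(W.shape) * step_size  # random perturbation
        candidate = W + delta_W
        loss = svm_loss(candidate, X, y)
        if loss < best_loss:  # accept the move only if the loss goes down
            W, best_loss = candidate, loss
    return W, best_loss


# Hypothetical toy data: 50 points with 4 features, labeled by a hidden linear classifier.
data_rng = np.random.default_rng(42)
X = data_rng.standard_normal((50, 4))
y = np.argmax(X @ data_rng.standard_normal((4, 3)), axis=1)

W_local, loss_local = random_local_search(X, y, num_classes=3)
print(f"loss after random local search: {loss_local:.3f}")
```

Unlike pure random search, each candidate starts from the current best W, so improvements accumulate. The method is still wasteful, because most random directions do not lower the loss.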

following the gradient

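Instead of searching blindly, compute the gradient of the loss with respect to W and step along the negative gradient, which is the direction of steepest descent. This is gradient descent.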
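Why follow the gradient rather than some other direction? The reason can be checked numerically. This sketch assumes a hypothetical least-squares objective 0.5·||Aw − b||² whose gradient is known in closed form. It compares the slope along the normalized negative gradient with the slopes along random unit directions. By the Cauchy–Schwarz inequality, no unit direction can have a steeper downhill slope than −||g||.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)


def f(w):
    """Hypothetical stand-in for a loss: 0.5 * ||A w - b||^2."""
    r = A @ w - b
    return 0.5 * (r @ r)


def grad_f(w):
    """Analytic gradient of f: A^T (A w - b)."""
    return A.T @ (A @ w - b)


w = rng.standard_normal(5)
g = grad_f(w)

# The directional derivative of f at w along a unit vector d is g . d.
steepest = -g / np.linalg.norm(g)
slope_steepest = g @ steepest  # equals -||g||
random_dirs = rng.standard_normal((1000, 5))
random_dirs /= np.linalg.norm(random_dirs, axis=1, keepdims=True)
slopes_random = random_dirs @ g

print(slope_steepest <= slopes_random.min())  # True
# A small step against the gradient lowers the objective.
print(f(w - 1e-3 * g) < f(w))  # True
```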

computing the gradient

numerically with finite differences

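Perturb each dimension by a tiny amount h and compute an approximate gradient. In practice the centered difference formula [f(x+h) − f(x−h)] / (2h) is used, because it is more accurate than the one-sided [f(x+h) − f(x)] / h. Once the gradient is known, the key choice is the step size (also called the learning rate) used when moving along the negative gradient. This is separate from the small h in the formula. Efficiency: the cost grows linearly with the number of parameters, since every dimension needs its own loss evaluations. This makes the method too slow when W has many dimensions.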
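A minimal sketch of the centered-difference gradient follows, assuming `f` is any Python function that maps an array to a scalar. The helper name `numerical_gradient` and the default h = 1e-5 are illustrative choices. The function is tested on sum(x³), whose analytic gradient is 3x².

```python
import numpy as np


def numerical_gradient(f, x, h=1e-5):
    """Centered-difference estimate of the gradient of a scalar function f at x."""
    x = np.array(x, dtype=float)  # float copy, so the caller's array is never modified
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):  # one pass per dimension (parameter)
        old_value = x[idx]
        x[idx] = old_value + h
        f_plus = f(x)
        x[idx] = old_value - h
        f_minus = f(x)
        x[idx] = old_value  # restore before moving to the next dimension
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad


x = np.array([[1.0, -2.0], [0.5, 3.0]])
num = numerical_gradient(lambda v: np.sum(v ** 3), x)
print(np.allclose(num, 3 * x ** 2, atol=1e-6))  # True
```

Each dimension costs two evaluations of f, so D parameters need 2·D loss evaluations. That linear cost is what makes the approach slow for large models.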

analytically with calculus

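Gradient check: the analytic gradient derived with calculus is exact and fast to compute, but deriving and implementing it is error-prone. In practice it is therefore compared against the numerical gradient.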
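Here is a sketch of a gradient check. It assumes a softmax (cross-entropy) loss rather than the hinge loss. The softmax loss is smooth, so it avoids spurious mismatches where a finite difference straddles a non-differentiable kink. All function names are hypothetical, and the relative error is measured over the whole gradient with vector norms.

```python
import numpy as np


def softmax_loss_and_grad(W, X, y):
    """Average softmax cross-entropy loss and its analytic gradient with respect to W."""
    n = X.shape[0]
    scores = X @ W
    scores -= scores.max(axis=1, keepdims=True)  # shift scores for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(n), y]).mean()
    dscores = probs.copy()
    dscores[np.arange(n), y] -= 1.0  # d(loss)/d(scores) = probs - one_hot(y)
    return loss, X.T @ dscores / n


def numerical_gradient(f, x, h=1e-5):
    """Centered-difference gradient of scalar f at x (works on a float copy of x)."""
    x = np.array(x, dtype=float)
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        old_value = x[idx]
        x[idx] = old_value + h
        f_plus = f(x)
        x[idx] = old_value - h
        f_minus = f(x)
        x[idx] = old_value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad


def relative_error(a, b):
    """||a - b|| / max(||a||, ||b||), guarded against division by zero."""
    return np.linalg.norm(a - b) / max(np.linalg.norm(a), np.linalg.norm(b), 1e-12)


rng = np.random.default_rng(0)
X = rng.standard_normal((10, 4))
y = rng.integers(0, 3, size=10)
W = rng.standard_normal((4, 3)) * 0.1

_, dW_analytic = softmax_loss_and_grad(W, X, y)
dW_numeric = numerical_gradient(lambda w: softmax_loss_and_grad(w, X, y)[0], W)
err = relative_error(dW_analytic, dW_numeric)
print(f"relative error: {err:.2e}")

# A typical bug, forgetting to average over the batch, is flagged immediately.
dW_buggy = dW_analytic * X.shape[0]
print(f"relative error with bug: {relative_error(dW_buggy, dW_numeric):.2f}")
```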

gradient descent

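Mini-batch gradient descent: when the training set is large, computing the full loss and gradient for every single update is wasteful. Instead, the gradient is estimated on a small batch of training examples, and W is updated after each batch. The extreme case of a batch containing a single example is stochastic gradient descent (SGD), although the term is often used for mini-batch updates as well.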
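Below is a sketch of mini-batch gradient descent on a hypothetical toy problem. It assumes a softmax loss, a fixed learning rate, a start from W = 0, and a fresh batch sampled without replacement at each step. The names and hyperparameter values are illustrative, not tuned.

```python
import numpy as np


def softmax_loss_and_grad(W, X, y):
    """Average softmax cross-entropy loss and its gradient with respect to W."""
    n = X.shape[0]
    scores = X @ W
    scores -= scores.max(axis=1, keepdims=True)  # shift scores for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(n), y]).mean()
    dscores = probs.copy()
    dscores[np.arange(n), y] -= 1.0
    return loss, X.T @ dscores / n


def minibatch_gd(X, y, num_classes, learning_rate=0.1, batch_size=32, num_steps=500, seed=0):
    """Hypothetical mini-batch gradient descent loop starting from W = 0."""
    rng = np.random.default_rng(seed)
    W = np.zeros((X.shape[1], num_classes))
    for _ in range(num_steps):
        idx = rng.choice(X.shape[0], size=batch_size, replace=False)  # sample one mini-batch
        _, dW = softmax_loss_and_grad(W, X[idx], y[idx])
        W -= learning_rate * dW  # step against the mini-batch gradient estimate
    return W


# Hypothetical toy data: 500 points with 5 features, labeled by a hidden linear classifier.
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((500, 5))
y = np.argmax(X @ data_rng.standard_normal((5, 3)), axis=1)

W = minibatch_gd(X, y, num_classes=3)
loss_start, _ = softmax_loss_and_grad(np.zeros((5, 3)), X, y)
loss_end, _ = softmax_loss_and_grad(W, X, y)
print(f"full-data loss: {loss_start:.3f} -> {loss_end:.3f}")
```

Setting `batch_size` to the full training-set size recovers ordinary (full-batch) gradient descent. Setting `batch_size=1` gives single-example stochastic gradient descent.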

When reposting, please credit the original URL: https://ju.6miu.com/read-36902.html
