Gradient descent gives one way of minimizing J. Another way is the normal equation, which minimizes J explicitly rather than through an iterative algorithm:
θ = (XᵀX)⁻¹Xᵀy
There is no need to do feature scaling with the normal equation.
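As a sketch, the normal equation can be computed directly. The snippet below uses NumPy rather than the course's Octave, and the toy design matrix and target values are made up for illustration:

```python
import numpy as np

# Toy data (invented for illustration): one row per training example,
# with a leading column of ones for the intercept term.
X = np.array([[1.0, 2104.0],
              [1.0, 1416.0],
              [1.0, 1534.0],
              [1.0,  852.0]])
y = np.array([460.0, 232.0, 315.0, 178.0])

# Normal equation: theta = (X^T X)^{-1} X^T y.
# Solving the linear system is numerically preferable to forming the
# inverse explicitly.
theta = np.linalg.solve(X.T @ X, X.T @ y)
```

Note that the raw feature values are used unscaled, consistent with the remark above that the normal equation needs no feature scaling.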
The following is a comparison of gradient descent and the normal equation:
Gradient Descent:
- Need to choose alpha
- Needs many iterations
- O(kn²)
- Works well when n is large

Normal Equation:
- No need to choose alpha
- No need to iterate
- O(n³): need to calculate the inverse of XᵀX
- Slow if n is very large

With the normal equation, computing the inversion has complexity O(n³). So if we have a very large number of features, the normal equation will be slow. In practice, when n exceeds 10,000 it might be a good time to go from the normal equation to an iterative process.
When implementing the normal equation in Octave we want to use the 'pinv' function rather than 'inv'. The 'pinv' function will give you a value of θ even if XᵀX is not invertible.
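A minimal sketch of the difference, in NumPy rather than Octave (`np.linalg.pinv` plays the role of Octave's pinv; the data is invented, with a duplicated feature column so that XᵀX is singular):

```python
import numpy as np

# Two identical feature columns make X^T X noninvertible.
X = np.array([[1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0],
              [1.0, 4.0, 4.0]])
y = np.array([1.0, 2.0, 3.0])

A = X.T @ X
# np.linalg.inv(A) would fail (or return garbage) here because A is
# singular; pinv returns the Moore-Penrose pseudoinverse instead,
# giving the minimum-norm solution of the normal equations.
theta = np.linalg.pinv(A) @ X.T @ y
```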
If XᵀX is noninvertible, the common causes might be:

- Redundant features, where two features are very closely related (i.e. they are linearly dependent)
- Too many features (e.g. m ≤ n). In this case, delete some features or use "regularization" (to be explained in a later lesson).

Solutions to the above problems include deleting a feature that is linearly dependent with another, or deleting one or more features when there are too many.
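The two causes above can be checked numerically by looking at the rank of XᵀX. This is an illustrative sketch (ours, in NumPy rather than Octave) with invented data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Cause 1: redundant (linearly dependent) features.
# Here column 2 is exactly twice column 1, so rank(X^T X) < n.
m, n = 5, 3
X = rng.standard_normal((m, n))
X[:, 2] = 2.0 * X[:, 1]
A = X.T @ X                # singular: rank 2 < n = 3

# Fix: delete the redundant feature; X^T X is full rank again.
X_fixed = X[:, :2]

# Cause 2: too many features (m <= n).
# With 2 examples and 3 features, rank(X^T X) <= m = 2 < n = 3.
X_small = rng.standard_normal((2, 3))
```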