This post is compiled from notes on the Coursera course; discussion and reposting are welcome.
We can use the likelihood $\ell(\mathbf{w})$ to measure how well a set of coefficients classifies the data.
MLE (maximum likelihood estimation)

Note: when computing these probabilities, we have to pay attention to whether the true label $y$ is $+1$ or $-1$, and use the corresponding probability in each case.

With these probabilities in hand, the quantity we use to evaluate how good a model is:

$$\ell(\mathbf{w}) = P(y_1 \mid \mathbf{x}_1, \mathbf{w}) \cdot P(y_2 \mid \mathbf{x}_2, \mathbf{w}) \cdot P(y_3 \mid \mathbf{x}_3, \mathbf{w}) \cdot P(y_4 \mid \mathbf{x}_4, \mathbf{w}) \cdots = \prod_{i=1}^{N} P(y_i \mid \mathbf{x}_i, \mathbf{w})$$
We want $\ell(\mathbf{w})$ to be as large as possible.
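As a toy illustration, the sketch below computes the likelihood of a tiny four-example dataset from hand-picked per-example probabilities (the values are invented purely for illustration):

```python
import numpy as np

# Hypothetical per-example probabilities P(y_i | x_i, w) for four examples,
# invented purely for illustration.
probs = np.array([0.9, 0.8, 0.7, 0.6])

# The likelihood of the whole dataset is the product of the per-example probabilities.
likelihood = np.prod(probs)
print(likelihood)  # ~0.3024 -- a better model gives a larger value
```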
Recalling the gradient descent method covered earlier, the analogous general form of gradient ascent is:

$$\mathbf{w}^{(t+1)} \leftarrow \mathbf{w}^{(t)} + \eta \left.\frac{d\ell}{d\mathbf{w}}\right|_{\mathbf{w}^{(t)}}$$
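As a minimal sketch of this update rule, assuming only a fixed step size $\eta$ and a simple one-dimensional concave objective (the function, names, and values below are illustrative, not part of the course material):

```python
import numpy as np

def gradient_ascent(grad, w0, eta=0.1, max_iter=100):
    """Generic gradient ascent: w(t+1) = w(t) + eta * dl/dw evaluated at w(t)."""
    w = np.array(w0, dtype=float)
    for _ in range(max_iter):
        w = w + eta * grad(w)
    return w

# Example: maximize l(w) = -(w - 3)^2, whose derivative is -2(w - 3); the maximum is at w = 3.
w_opt = gradient_ascent(lambda w: -2 * (w - 3), w0=[0.0])
print(w_opt)  # close to [3.]
```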
The partial derivative of the likelihood with respect to each coefficient is:

$$\frac{\partial \ell(\mathbf{w})}{\partial w_j} = \sum_{i=1}^{N} h_j(\mathbf{x}_i)\left(\mathbf{1}[y_i = +1] - P(y = +1 \mid \mathbf{x}_i, \mathbf{w})\right)$$

where:
$$\mathbf{1}[y_i = +1] =
\begin{cases}
1 & \text{if } y_i = +1 \\
0 & \text{if } y_i = -1
\end{cases}$$

The algorithm then proceeds as follows:

init $\mathbf{w} = 0$
while $|\Delta \ell(\mathbf{w}^{(t)})| > \epsilon$:
  for $j = 0, 1, 2, \dots, D$:
    $\mathrm{partial}[j] = \sum_{i=1}^{N} h_j(\mathbf{x}_i)\left(\mathbf{1}[y_i = +1] - P(y = +1 \mid \mathbf{x}_i, \mathbf{w}^{(t)})\right)$
  $\mathbf{w}^{(t+1)} \leftarrow \mathbf{w}^{(t)} + \eta \left.\frac{d\ell}{d\mathbf{w}}\right|_{\mathbf{w}^{(t)}}$
  $t \leftarrow t + 1$

Now let us derive the gradient used above. First, recall some basic algebra:

$$\ln(a \cdot b) = \ln a + \ln b, \qquad \ln\frac{a}{b} = \ln a - \ln b$$

We already have:

$$\ell(\mathbf{w}) = P(y_1 \mid \mathbf{x}_1, \mathbf{w}) \cdot P(y_2 \mid \mathbf{x}_2, \mathbf{w}) \cdots = \prod_{i=1}^{N} P(y_i \mid \mathbf{x}_i, \mathbf{w})$$

To simplify the calculations that follow, we define:

$$\ell\ell(\mathbf{w}) = \ln(\ell(\mathbf{w})) = \ln \prod_{i=1}^{N} P(y_i \mid \mathbf{x}_i, \mathbf{w})$$
Here we rely on the fact that applying a logarithm to a function does not change the point at which it attains its maximum, so working with $\ell\ell(\mathbf{w})$ instead of $\ell(\mathbf{w})$ does not affect which coefficient vector is optimal.
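A quick numeric check of this property, using an arbitrary positive function chosen only for illustration:

```python
import numpy as np

# f(w) is an arbitrary positive function; f and ln(f) peak at the same w.
w = np.linspace(0.01, 5, 1000)
f = np.exp(-(w - 2.0) ** 2)
print(w[np.argmax(f)], w[np.argmax(np.log(f))])  # both approximately 2.0
```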
Using the basic algebra recalled above, we get:

$$\ell\ell(\mathbf{w}) = \ln(\ell(\mathbf{w})) = \ln \prod_{i=1}^{N} P(y_i \mid \mathbf{x}_i, \mathbf{w}) = \sum_{i=1}^{N} \ln P(y_i \mid \mathbf{x}_i, \mathbf{w}) = \sum_{i=1}^{N} \left\{ \mathbf{1}[y_i = +1] \ln P(y_i = +1 \mid \mathbf{x}_i, \mathbf{w}) + \mathbf{1}[y_i = -1] \ln P(y_i = -1 \mid \mathbf{x}_i, \mathbf{w}) \right\}$$

From the previous post we know that

$$P(y = +1 \mid \mathbf{x}, \mathbf{w}) = \frac{1}{1 + e^{-\mathbf{w}^T h(\mathbf{x})}}$$

and therefore

$$P(y = -1 \mid \mathbf{x}, \mathbf{w}) = 1 - P(y = +1 \mid \mathbf{x}, \mathbf{w}) = \frac{e^{-\mathbf{w}^T h(\mathbf{x})}}{1 + e^{-\mathbf{w}^T h(\mathbf{x})}}$$

Using these, the contribution of a single $\mathbf{x}_i$ to $\ell\ell(\mathbf{w})$ simplifies to:

$$\begin{aligned}
\ell\ell_i(\mathbf{w}) &= \mathbf{1}[y_i = +1] \ln P(y_i = +1 \mid \mathbf{x}_i, \mathbf{w}) + \mathbf{1}[y_i = -1] \ln P(y_i = -1 \mid \mathbf{x}_i, \mathbf{w}) \\
&= \mathbf{1}[y_i = +1] \cdot \ln \frac{1}{1 + e^{-\mathbf{w}^T h(\mathbf{x}_i)}} + (1 - \mathbf{1}[y_i = +1]) \cdot \ln \frac{e^{-\mathbf{w}^T h(\mathbf{x}_i)}}{1 + e^{-\mathbf{w}^T h(\mathbf{x}_i)}} \\
&= -(1 - \mathbf{1}[y_i = +1])\, \mathbf{w}^T h(\mathbf{x}_i) - \ln\left(1 + e^{-\mathbf{w}^T h(\mathbf{x}_i)}\right)
\end{aligned}$$

Then:

$$\begin{aligned}
\frac{\partial \ell\ell_i}{\partial w_j} &= -(1 - \mathbf{1}[y_i = +1]) \frac{\partial\, \mathbf{w}^T h(\mathbf{x}_i)}{\partial w_j} - \frac{\partial}{\partial w_j} \ln\left(1 + e^{-\mathbf{w}^T h(\mathbf{x}_i)}\right) \\
&= -(1 - \mathbf{1}[y_i = +1])\, h_j(\mathbf{x}_i) + h_j(\mathbf{x}_i)\, P(y = -1 \mid \mathbf{x}_i, \mathbf{w}) \\
&= h_j(\mathbf{x}_i)\left[\mathbf{1}[y_i = +1] - P(y = +1 \mid \mathbf{x}_i, \mathbf{w})\right]
\end{aligned}$$

Summing over all examples gives:

$$\frac{\partial \ell\ell(\mathbf{w})}{\partial w_j} = \sum_{i=1}^{N} h_j(\mathbf{x}_i)\left(\mathbf{1}[y_i = +1] - P(y = +1 \mid \mathbf{x}_i, \mathbf{w})\right)$$

which is exactly the gradient used in the update rule above.
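One way to sanity-check this derivation is to compare the analytic partial derivative against a finite-difference estimate of the log-likelihood gradient. The sketch below does this on made-up data; all names and values here are assumptions for illustration, not part of the assignment:

```python
import numpy as np

def log_likelihood(H, y, w):
    # ll(w) = sum_i [ (1[y_i=+1] - 1) * w^T h(x_i) - ln(1 + exp(-w^T h(x_i))) ]
    scores = H.dot(w)
    indicator = (y == +1)
    return np.sum((indicator - 1) * scores - np.log(1. + np.exp(-scores)))

def analytic_partial(H, y, w, j):
    # d ll / d w_j = sum_i h_j(x_i) * (1[y_i=+1] - P(y=+1 | x_i, w))
    probs = 1. / (1. + np.exp(-H.dot(w)))
    return np.sum(H[:, j] * ((y == +1) - probs))

# Made-up data: 3 examples, 2 features (an intercept plus one word count).
H = np.array([[1., 2.], [1., 0.], [1., 3.]])
y = np.array([+1, -1, +1])
w = np.array([0.1, -0.2])

j, eps = 1, 1e-6
e_j = np.eye(len(w))[j]
numeric = (log_likelihood(H, y, w + eps * e_j) -
           log_likelihood(H, y, w - eps * e_j)) / (2 * eps)
print(analytic_partial(H, y, w, j), numeric)  # the two values should agree closely
```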
If $\eta$ is too small, convergence is very slow; if $\eta$ is too large, the algorithm oscillates and fails to converge. In practice we can first pick a fairly large and a fairly small $\eta$, and then search between them; this takes multiple attempts, plus some experience and skill. If possible, we can start with a fairly large $\eta$ and gradually decrease it as the number of iterations grows.
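One common way to realize this "start large, then shrink" idea is a simple decay schedule; the schedule below is just one illustrative choice, not something prescribed by the course:

```python
def step_size(t, eta0=1.0, decay=0.01):
    """Step size that starts at eta0 and shrinks as the iteration count t grows."""
    return eta0 / (1.0 + decay * t)

print([round(step_size(t), 3) for t in (0, 100, 1000)])  # [1.0, 0.5, 0.091]
```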
The code and data files can be downloaded here.
```python
import graphlab
products = graphlab.SFrame('amazon_baby_subset.gl/')

import json
with open('important_words.json', 'r') as f:  # Reads the list of most frequent words
    important_words = json.load(f)
important_words = [str(s) for s in important_words]

# Remove punctuation from the raw review text
def remove_punctuation(text):
    import string
    return text.translate(None, string.punctuation)

products['review_clean'] = products['review'].apply(remove_punctuation)

# Create one column per important word, holding the number of times the word appears
for word in important_words:
    products[word] = products['review_clean'].apply(lambda s: s.split().count(word))

# Quiz question 1: how many reviews contain the word "perfect"?
products['contains_perfect'] = products['perfect'] >= 1
products['contains_perfect']
sum(products['contains_perfect'])

import numpy as np

# Convert the SFrame into numpy arrays (feature matrix + label array)
def get_numpy_data(data_sframe, features, label):
    data_sframe['intercept'] = 1
    features = ['intercept'] + features
    features_sframe = data_sframe[features]
    feature_matrix = features_sframe.to_numpy()
    label_sarray = data_sframe[label]
    label_array = label_sarray.to_numpy()
    return (feature_matrix, label_array)

# Warning: This may take a few minutes...
feature_matrix, sentiment = get_numpy_data(products, important_words, 'sentiment')

def predict_probability(feature_matrix, coefficients):
    '''
    Produces a probabilistic estimate for P(y_i = +1 | x_i, w).
    Each estimate ranges between 0 and 1.
    '''
    # Take dot product of feature_matrix and coefficients
    score = np.dot(feature_matrix, coefficients)
    # Compute P(y_i = +1 | x_i, w) using the link function
    predictions = 1. / (1. + np.exp(-score))
    return predictions

def feature_derivative(errors, feature):
    # Compute the dot product of errors and feature
    derivative = np.dot(errors, feature)
    # Return the derivative
    return derivative

def compute_log_likelihood(feature_matrix, sentiment, coefficients):
    indicator = (sentiment == +1)
    scores = np.dot(feature_matrix, coefficients)
    logexp = np.log(1. + np.exp(-scores))
    # Simple check to prevent overflow
    mask = np.isinf(logexp)
    logexp[mask] = -scores[mask]
    lp = np.sum((indicator - 1) * scores - logexp)
    return lp

def logistic_regression(feature_matrix, sentiment, initial_coefficients, step_size, max_iter):
    coefficients = np.array(initial_coefficients)  # make sure it's a numpy array
    for itr in xrange(max_iter):
        # Predict P(y_i = +1|x_i,w) using the predict_probability() function
        predictions = predict_probability(feature_matrix, coefficients)

        # Compute indicator value for (y_i = +1)
        indicator = (sentiment == +1)

        # Compute the errors as indicator - predictions
        errors = indicator - predictions
        for j in xrange(len(coefficients)):  # loop over each coefficient
            # feature_matrix[:,j] is the feature column associated with coefficients[j];
            # compute the derivative for coefficients[j]
            derivative = feature_derivative(errors, feature_matrix[:, j])
            # Add the step size times the derivative to the current coefficient
            coefficients[j] += step_size * derivative

        # Check whether the log likelihood is increasing
        if itr <= 15 or (itr <= 100 and itr % 10 == 0) or (itr <= 1000 and itr % 100 == 0) \
        or (itr <= 10000 and itr % 1000 == 0) or itr % 10000 == 0:
            lp = compute_log_likelihood(feature_matrix, sentiment, coefficients)
            print 'iteration %*d: log likelihood of observed labels = %.8f' % \
                (int(np.ceil(np.log10(max_iter))), itr, lp)
    return coefficients
```
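Putting the pieces together, a run might look like the following. The step size and iteration count here are illustrative guesses rather than the exact values the assignment asks for:

```python
# Illustrative run: one coefficient per column of feature_matrix
# (the intercept plus each of the important_words).
initial_coefficients = np.zeros(feature_matrix.shape[1])
coefficients = logistic_regression(feature_matrix, sentiment,
                                   initial_coefficients,
                                   step_size=1e-7, max_iter=301)

# Pair each word with its learned coefficient and look at the extremes.
word_coefficients = sorted(zip(important_words, list(coefficients[1:])),
                           key=lambda pair: pair[1])
print word_coefficients[-10:]   # ten most positive words
print word_coefficients[:10]    # ten most negative words
```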