Affinity Propagation

    xiaoxiao  2025-01-31

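In affinity propagation, the similarity s(i, j) between two points defaults to the negative squared Euclidean distance.

sklearn.cluster.AffinityPropagation has the following parameters and attributes:

preference: the preference assigned to each point. A point's preference is compared against its similarities to the other points, and points with higher preference are more likely to be chosen as exemplars. If it is not set, the median of all input similarities is used.
damping: the damping factor. At each iteration the new values of r (responsibility) and a (availability) are a convex combination of the previous value, weighted by damping, and the freshly computed value, weighted by 1 - damping. This helps the algorithm converge. The default is 0.5 and the value must lie in [0.5, 1).
cluster_centers_indices_: the indices of the exemplars (cluster centers) among the samples.

Clustering metrics defined via conditional entropy:

sklearn.metrics.homogeneity_score: the degree to which each cluster contains members of only one class.
sklearn.metrics.completeness_score: the degree to which all members of a class are assigned to the same cluster.
sklearn.metrics.v_measure_score: a compromise between the two (their harmonic mean), usable as a single measure of a clustering result:

    v = 2 * (homogeneity * completeness) / (homogeneity + completeness)

Other metrics:

sklearn.metrics.adjusted_rand_score: the adjusted Rand index.
sklearn.metrics.adjusted_mutual_info_score: the adjusted mutual information.
sklearn.metrics.silhouette_score: for a single sample the silhouette coefficient is (b - a) / max(a, b), where a is the mean distance from the sample to the other points in its own cluster and b is the mean distance from the sample to the points of the nearest cluster it does not belong to. silhouette_score returns the mean of this value over all samples.

For all of these metrics, larger is better. All except silhouette_score need the ground-truth labels.

Here is an example:

    from itertools import cycle

    import matplotlib.pyplot as plt
    from sklearn import metrics
    from sklearn.cluster import AffinityPropagation
    from sklearn.datasets import make_blobs

    # Generate three Gaussian blobs with known labels.
    centers = [[1, 1], [-1, -1], [1, -1]]
    X, labels_true = make_blobs(n_samples=300, centers=centers,
                                cluster_std=0.5, random_state=0)

    # Fit affinity propagation with an explicit preference.
    af = AffinityPropagation(preference=-50, random_state=0).fit(X)
    cluster_centers_indices = af.cluster_centers_indices_
    labels = af.labels_
    n_clusters_ = len(cluster_centers_indices)

    print("Estimated number of clusters: %d" % n_clusters_)
    print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))
    print("Completeness: %0.3f" % metrics.completeness_score(labels_true, labels))
    print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))
    print("Adjusted Rand Index: %0.3f"
          % metrics.adjusted_rand_score(labels_true, labels))
    print("Adjusted Mutual Information: %0.3f"
          % metrics.adjusted_mutual_info_score(labels_true, labels))
    print("Silhouette Coefficient: %0.3f"
          % metrics.silhouette_score(X, labels, metric='sqeuclidean'))

    # Plot each cluster, its exemplar, and lines from the exemplar to its members.
    plt.close('all')
    plt.figure(1)
    plt.clf()
    colors = cycle('bgrcmyk')
    for k, col in zip(range(n_clusters_), colors):
        class_members = labels == k
        cluster_center = X[cluster_centers_indices[k]]
        plt.plot(X[class_members, 0], X[class_members, 1], col + '.')
        plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
                 markeredgecolor='k', markersize=14)
        for x in X[class_members]:
            plt.plot([cluster_center[0], x[0]], [cluster_center[1], x[1]], col)
    plt.title('Estimated number of clusters: %d' % n_clusters_)
    plt.show()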
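To make the role of damping more concrete, here is a minimal NumPy sketch of one damped update of the r (responsibility) and a (availability) messages. It follows the standard update rules of the algorithm as I understand them and is not scikit-learn's actual code. The function names update_responsibility and update_availability and the small similarity matrix S (whose diagonal plays the role of preference) are made up for illustration.

```python
import numpy as np

def update_responsibility(S, A, R, damping=0.5):
    """One damped update of the responsibility matrix R (illustrative sketch)."""
    rows = np.arange(S.shape[0])
    AS = A + S
    # Largest and second-largest a(i, k') + s(i, k') in each row.
    best = np.argmax(AS, axis=1)
    first = AS[rows, best]
    AS[rows, best] = -np.inf
    second = AS.max(axis=1)
    # r(i, k) = s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
    R_new = S - first[:, None]
    R_new[rows, best] = S[rows, best] - second
    # Convex combination of the old and the new message.
    return damping * R + (1 - damping) * R_new

def update_availability(R, A, damping=0.5):
    """One damped update of the availability matrix A (illustrative sketch)."""
    rows = np.arange(R.shape[0])
    # Positive parts of R, but keep r(k, k) itself on the diagonal.
    Rp = np.maximum(R, 0)
    Rp[rows, rows] = R[rows, rows]
    # Column sums with the contribution of row i removed.
    A_new = Rp.sum(axis=0)[None, :] - Rp
    diag = A_new[rows, rows].copy()   # a(k, k) is not clipped at 0
    A_new = np.minimum(A_new, 0)      # a(i, k) = min(0, ...) for i != k
    A_new[rows, rows] = diag
    return damping * A + (1 - damping) * A_new

# Hypothetical 3-point similarity matrix; the diagonal -2 acts as the preference.
S = np.array([[-2.0, -1.0, -9.0],
              [-1.0, -2.0, -4.0],
              [-9.0, -4.0, -2.0]])
R = np.zeros_like(S)
A = np.zeros_like(S)
R = update_responsibility(S, A, R, damping=0.5)
A = update_availability(R, A, damping=0.5)
print(R)
print(A)
```

With damping = 0.5 each message moves only halfway toward its newly computed value. Values closer to 1 change the messages more slowly, which is the usual remedy when the updates oscillate.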
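The conditional-entropy metrics above can also be worked out by hand. The sketch below uses only the standard library. It assumes the usual definitions homogeneity = 1 - H(C|K) / H(C) and completeness = 1 - H(K|C) / H(K), where C are the true classes and K the predicted clusters, and sets a score to 1 when its denominator is zero. The helper names are hypothetical, and this is not scikit-learn's implementation.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy of a label sequence (natural log)."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def conditional_entropy(target, given):
    """H(target | given), estimated from label co-occurrence counts."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum(c / n * log(c / given_counts[g]) for (_, g), c in joint.items())

def homogeneity_completeness_v(labels_true, labels_pred):
    """Return (homogeneity, completeness, v) for two labelings."""
    h_c = entropy(labels_true)
    h_k = entropy(labels_pred)
    homogeneity = (1.0 if h_c == 0
                   else 1.0 - conditional_entropy(labels_true, labels_pred) / h_c)
    completeness = (1.0 if h_k == 0
                    else 1.0 - conditional_entropy(labels_pred, labels_true) / h_k)
    if homogeneity + completeness == 0:
        v = 0.0
    else:
        # Same formula as above: the harmonic mean of the two scores.
        v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v

# Class 1 is split across two clusters: every cluster is pure, but class 1 is not kept together.
h, c, v = homogeneity_completeness_v([0, 0, 1, 1], [0, 0, 1, 2])
print("homogeneity=%.3f completeness=%.3f v=%.3f" % (h, c, v))
```

In this toy case homogeneity stays at 1 while completeness drops, so v falls below 1. Merging everything into a single cluster does the opposite: completeness becomes 1 and homogeneity 0.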
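The silhouette formula (b - a) / max(a, b) can be checked on a tiny data set. The following standard-library sketch computes the per-sample value and then the mean. It uses plain Euclidean distance (the example above passes metric='sqeuclidean' instead), assumes at least two clusters, and gives a sample that is alone in its cluster a score of 0. The name silhouette_samples_sketch and the demo data are made up.

```python
from math import dist

def silhouette_samples_sketch(X, labels):
    """Per-sample silhouette values with Euclidean distance (illustrative sketch)."""
    scores = []
    for i, (x, li) in enumerate(zip(X, labels)):
        # Group the distances from x to every other point by cluster label.
        by_cluster = {}
        for j, (y, lj) in enumerate(zip(X, labels)):
            if j != i:
                by_cluster.setdefault(lj, []).append(dist(x, y))
        own = by_cluster.pop(li, [])
        if not own:
            scores.append(0.0)  # assumption: a singleton cluster scores 0
            continue
        a = sum(own) / len(own)                                # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return scores

# Two well-separated 1-D clusters.
X_demo = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels_demo = [0, 0, 1, 1]
scores = silhouette_samples_sketch(X_demo, labels_demo)
mean_score = sum(scores) / len(scores)
print(scores)
print("mean silhouette: %.3f" % mean_score)
```

For the point at 0, a = 1 and b = (10 + 11) / 2 = 10.5, giving (10.5 - 1) / 10.5. The mean of the per-sample values is the quantity that a silhouette score reports.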
    When reposting, please cite the original address: https://ju.6miu.com/read-1295948.html