sklearn.cluster.MiniBatchKMeans

    xiaoxiao, 2021-12-14

    Algorithm:

    http://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf

    The motivation behind this method is that mini-batches tend to have lower stochastic noise than individual examples in SGD (allowing convergence to better solutions), but do not suffer increased computational cost when data sets grow large with redundant examples.

    Mini-batches are drawn by bootstrap sampling, so the same sample may appear more than once in a batch. The important part is the update rule for the centers c.
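The center update from the Sculley paper can be sketched as follows. This is a hypothetical minimal implementation, not sklearn's actual code: each center keeps a count v[c] of samples assigned to it so far, and is pulled toward each new sample with a per-center learning rate 1/v[c].

```python
import numpy as np

def minibatch_kmeans_step(X_batch, centers, counts):
    """One mini-batch update; centers and counts are modified in place."""
    # assign every sample in the batch to its nearest center
    d = ((X_batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d.argmin(axis=1)
    # gradient step with per-center learning rate eta = 1 / counts[c],
    # so each center converges to the running mean of its assigned samples
    for x, c in zip(X_batch, nearest):
        counts[c] += 1
        eta = 1.0 / counts[c]
        centers[c] = (1.0 - eta) * centers[c] + eta * x
    return centers, counts
```

Because the learning rate shrinks as a center accumulates samples, repeated bootstrap draws of the same point have a diminishing effect on that center.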

    from sklearn.cluster import MiniBatchKMeans

    parameters:

    n_clusters : int, optional, default: 8

    The number of clusters to form as well as the number of centroids to generate.

    max_iter : int, optional

    Maximum number of iterations over the complete dataset before stopping independently of any early stopping criterion heuristics.

    max_no_improvement : int, default: 10

    Control early stopping based on the consecutive number of mini batches that do not yield an improvement on the smoothed inertia.

    To disable convergence detection based on inertia, set max_no_improvement to None.

    tol : float, default: 0.0

    Control early stopping based on the relative center changes as measured by a smoothed, variance-normalized estimate of the mean squared center position changes. This early stopping heuristic is closer to the one used for the batch variant of the algorithm but induces a slight computational and memory overhead over the inertia heuristic.

    To disable convergence detection based on normalized center change, set tol to 0.0 (default).

    batch_size : int, optional, default: 100

    Size of the mini batches.

    init_size : int, optional, default: 3 * batch_size

    Number of samples to randomly sample for speeding up the initialization (sometimes at the expense of accuracy): the algorithm is initialized by running a batch KMeans on a random subset of the data. This needs to be larger than n_clusters.

    init : {‘k-means++’, ‘random’ or an ndarray}, default: ‘k-means++’

    Method for initialization, defaults to ‘k-means++’:

    ‘k-means++’ : selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See section Notes in k_init for more details.

    ‘random’: choose k observations (rows) at random from data for the initial centroids.

    If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.
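A short sketch of the ndarray form of init, assuming scikit-learn is installed; the centers and data here are illustrative only:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# explicit initial centers, shape (n_clusters, n_features)
init_centers = np.array([[0.0, 0.0], [5.0, 5.0]])

# n_init=1 because the initialization is deterministic when given an array
mbk = MiniBatchKMeans(n_clusters=2, init=init_centers, n_init=1,
                      random_state=0)

# two tight groups near the supplied centers
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
mbk.fit(X)
```

With an explicit array, ‘k-means++’ and the random subset step are skipped entirely.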

    n_init : int, default=3

    Number of random initializations that are tried. In contrast to KMeans, the algorithm is only run once, using the best of the n_init initializations as measured by inertia.

    compute_labels : boolean, default=True

    Compute label assignment and inertia for the complete dataset once the minibatch optimization has converged in fit.

    random_state : integer or numpy.RandomState, optional

    The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.

    reassignment_ratio : float, default: 0.01

    Control the fraction of the maximum number of counts for a center to be reassigned. A higher value means that low count centers are more easily reassigned, which means that the model will take longer to converge, but should converge to a better clustering.

    verbose : boolean, optional

    Verbosity mode.

    attributes:

    cluster_centers_ : array, [n_clusters, n_features]

    Coordinates of cluster centers

    labels_ : array

    Labels of each point (if compute_labels is set to True).

    inertia_ : float

    The value of the inertia criterion associated with the chosen partition (if compute_labels is set to True). The inertia is defined as the sum of squared distances of samples to their nearest cluster center.

    methods:

    fit(X[, y]) : Compute the centroids on X by chunking it into mini-batches.

    fit_predict(X[, y]) : Compute cluster centers and predict cluster index for each sample.

    fit_transform(X[, y]) : Compute clustering and transform X to cluster-distance space.

    get_params([deep]) : Get parameters for this estimator.

    partial_fit(X[, y]) : Update k means estimate on a single mini-batch X.

    predict(X) : Predict the closest cluster each sample in X belongs to.

    score(X[, y]) : Opposite of the value of X on the K-means objective.

    set_params(**params) : Set the parameters of this estimator.

    transform(X[, y]) : Transform X to a cluster-distance space.
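The methods above can be exercised with a short sketch, assuming scikit-learn is installed; the blob data and parameter values are illustrative only:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.RandomState(42)
# two well-separated blobs of 200 points each
X = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + 8])

mbk = MiniBatchKMeans(n_clusters=2, batch_size=100, max_iter=100,
                      n_init=3, random_state=42)
mbk.fit(X)                         # chunk X into mini-batches and fit
labels = mbk.predict(X)            # nearest center for each sample
print(mbk.cluster_centers_.shape)  # (2, 2)

# partial_fit lets you stream data one mini-batch at a time
mbk2 = MiniBatchKMeans(n_clusters=2, random_state=42)
for start in range(0, len(X), 100):
    mbk2.partial_fit(X[start:start + 100])
```

partial_fit is the natural entry point for out-of-core data: each call performs exactly one mini-batch update, so the full dataset never needs to fit in memory at once.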

