sklearn.cluster.MeanShift
- class sklearn.cluster.MeanShift(bandwidth=None, seeds=None, bin_seeding=False, min_bin_freq=1, cluster_all=True)
- MeanShift clustering
  - Parameters:
    - bandwidth : float, optional
      Bandwidth used in the flat kernel. If not given, the bandwidth is estimated using sklearn.cluster.estimate_bandwidth; see the documentation for that function for hints on scalability (see also the Notes, below).
    - seeds : array [n_samples, n_features], optional
      Seeds used to initialize kernels. If not set, the seeds are calculated by clustering.get_bin_seeds with bandwidth as the grid size and default values for other parameters.
    - bin_seeding : boolean, optional
      If true, the initial kernel locations are not the locations of all points but the locations of a discretized version of the points, where points are binned onto a grid whose coarseness corresponds to the bandwidth. Setting this option to True speeds up the algorithm because fewer seeds are initialized. Default: False. Ignored if the seeds argument is not None.
    - min_bin_freq : int, optional
      To speed up the algorithm, accept only those bins with at least min_bin_freq points as seeds. If not defined, set to 1.
    - cluster_all : boolean, default True
      If true, all points are clustered, including orphans that are not within any kernel; orphans are assigned to the nearest kernel. If false, orphans are given cluster label -1.
  - Notes
    - Scalability:
      - Because this implementation uses a flat kernel and a Ball Tree to look up members of each kernel, the complexity will tend to O(T*n*log(n)) in lower dimensions, with n the number of samples and T the number of points. In higher dimensions the complexity will tend towards O(T*n^2).
      - Scalability can be boosted by using fewer seeds, for example by using a higher value of min_bin_freq in the get_bin_seeds function.
      - Note that the estimate_bandwidth function is much less scalable than the mean shift algorithm and will be the bottleneck if it is used.
  - References
    - Dorin Comaniciu and Peter Meer, “Mean Shift: A robust approach toward feature space analysis”. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002. pp. 603-619.
  - Attributes
    - cluster_centers_ : array, [n_clusters, n_features]
      Coordinates of cluster centers.
    - labels_ :
      Labels of each point.
  - Methods
    - fit(X) : Perform clustering.
    - fit_predict(X[, y]) : Performs clustering on X and returns cluster labels.
    - get_params([deep]) : Get parameters for this estimator.
    - predict(X) : Predict the closest cluster each sample in X belongs to.
    - set_params(**params) : Set the parameters of this estimator.
 - __init__(bandwidth=None, seeds=None, bin_seeding=False, min_bin_freq=1, cluster_all=True)
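As a rough illustration of how the constructor parameters fit together, the sketch below estimates a bandwidth explicitly and enables bin seeding. The synthetic data, the random seed, and the `quantile=0.3` setting are assumptions made for this example, not recommendations from the documentation.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

# Two well-separated synthetic blobs (illustrative data only).
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(50, 2)),
])

# Estimate the bandwidth up front; quantile=0.3 is an arbitrary choice here.
# On large datasets this call can dominate the running time (see the Notes).
bandwidth = estimate_bandwidth(X, quantile=0.3, random_state=0)

# bin_seeding=True seeds the kernels from occupied grid bins instead of
# from every sample; cluster_all=True assigns orphans to the nearest kernel.
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True, cluster_all=True)
ms.fit(X)

n_clusters = len(ms.cluster_centers_)
print("estimated bandwidth:", bandwidth)
print("number of clusters found:", n_clusters)
```

The number of clusters is not fixed in advance; it depends on the bandwidth, so it is worth inspecting `cluster_centers_` after fitting rather than assuming a count.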
 - fit(X)
   Perform clustering.
   - Parameters:
     - X : array-like, shape=[n_samples, n_features]
       Samples to cluster.
 - fit_predict(X, y=None)
   Performs clustering on X and returns cluster labels.
   - Parameters:
     - X : ndarray, shape (n_samples, n_features)
       Input data.
   - Returns:
     - y : ndarray, shape (n_samples,)
       Cluster labels.
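A minimal sketch of `fit_predict`, assuming a tiny hand-made dataset and a fixed `bandwidth=2.0` chosen so that the two groups are clearly separated:

```python
import numpy as np
from sklearn.cluster import MeanShift

# Two small, well-separated groups of points (illustrative data only).
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])

ms = MeanShift(bandwidth=2.0)

# fit_predict fits the model and returns one label per row of X.
labels = ms.fit_predict(X)
print(labels)
```

The returned array matches the `labels_` attribute of the fitted estimator. Which group receives which integer label is an implementation detail, so downstream code should compare labels rather than rely on specific values.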
 - get_params(deep=True)
   Get parameters for this estimator.
   - Parameters:
     - deep : boolean, optional
       If True, will return the parameters for this estimator and contained subobjects that are estimators.
   - Returns:
     - params : mapping of string to any
       Parameter names mapped to their values.
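For illustration, `get_params` can be used to inspect how an estimator was configured. The parameter values below are arbitrary; newer library versions may report additional keys beyond those in the signature shown on this page.

```python
from sklearn.cluster import MeanShift

ms = MeanShift(bandwidth=1.5, min_bin_freq=3)

# Returns a dict mapping constructor parameter names to their current values.
params = ms.get_params()
for name in sorted(params):
    print(name, "=", params[name])
```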
 - predict(X)
   Predict the closest cluster each sample in X belongs to.
   - Parameters:
     - X : {array-like, sparse matrix}, shape = [n_samples, n_features]
       New data to predict.
   - Returns:
     - labels : array, shape [n_samples,]
       Index of the cluster each sample belongs to.
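A short sketch of assigning new samples to clusters learned earlier. The training points, the new points, and `bandwidth=2.0` are assumptions chosen so that the expected assignment is unambiguous.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Fit on two small, well-separated groups (illustrative data only).
X_train = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
                    [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])
ms = MeanShift(bandwidth=2.0).fit(X_train)

# Each new sample gets the index of its closest cluster center.
X_new = np.array([[0.0, 0.0], [9.0, 9.0]])
new_labels = ms.predict(X_new)
print(new_labels)
```

Here the first new point lands in the same cluster as the first training group and the second in the same cluster as the second group.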
 - set_params(**params)
   Set the parameters of this estimator.
   The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
   - Returns:
     - self :
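The `<component>__<parameter>` form can be sketched with a small pipeline. The step names `"scale"` and `"cluster"` are hypothetical names picked for this example, and the parameter values are arbitrary.

```python
from sklearn.cluster import MeanShift
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are addressed by their plain names.
ms = MeanShift().set_params(bandwidth=2.0)

# Nested object: prefix the parameter with the step name and two underscores.
pipe = Pipeline([("scale", StandardScaler()), ("cluster", MeanShift())])
returned = pipe.set_params(cluster__bandwidth=1.0, cluster__bin_seeding=True)

print(ms.bandwidth)
print(pipe.named_steps["cluster"].get_params())
```

Because `set_params` returns the estimator itself, calls can be chained directly after construction, as in the first line of the example.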
 
 
        