pyriemann.clustering.Kmeans
- class pyriemann.clustering.Kmeans(n_clusters=2, max_iter=100, metric='riemann', random_state=None, init='random', n_init=10, n_jobs=1, tol=0.0001)
Clustering by k-means with SPD matrices as inputs.
Find clusters that minimize the sum of squared distances to their centroids. This is a direct implementation of the k-means algorithm with a Riemannian metric.
- Parameters
- n_clusters : int, default=2
Number of clusters.
- max_iter : int, default=100
The maximum number of iterations to reach convergence.
- metric : string, default='riemann'
The type of metric used for centroid and distance estimation.
- random_state : integer or np.random.RandomState, optional
The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
- init : 'random' or ndarray, shape (n_clusters, n_channels, n_channels), default='random'
Method for initialization of centers. 'random': choose k observations (rows) at random from data for the initial centroids. If an ndarray is passed, it should be of shape (n_clusters, n_channels, n_channels) and gives the initial centers.
- n_init : int, default=10
Number of times the k-means algorithm will be run with different centroid seeds. The final result is the best output of the n_init consecutive runs in terms of inertia.
- n_jobs : int, default=1
The number of jobs to use for the computation. This works by computing each of the n_init runs in parallel. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
- tol : float, default=1e-4
Stopping criterion for convergence: the minimum amount of change in labels between two iterations.
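The n_jobs rule above can be checked with a few lines of arithmetic. This is only a sketch of the formula quoted in the description; `resolve_n_jobs` and its `n_cpus` argument are hypothetical names introduced for illustration and are not part of pyRiemann.

```python
def resolve_n_jobs(n_jobs, n_cpus):
    """Hypothetical helper: map an n_jobs setting to a number of workers."""
    if n_jobs < 0:
        # Negative values count back from the number of CPUs:
        # -1 gives all CPUs, -2 gives all CPUs but one.
        return n_cpus + 1 + n_jobs
    return n_jobs


# Illustration on an assumed 8-CPU machine.
for setting in (1, -1, -2):
    print(setting, "->", resolve_n_jobs(setting, n_cpus=8))
```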
See also
Kmeans
MDM
Notes
New in version 0.2.2.
- Attributes
- mdm_ : MDM instance
MDM instance containing the centroids.
- labels_
Labels of each point.
- inertia_ : float
Sum of distances of samples to their closest cluster center.
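As an illustration of how inertia_ relates to n_init, the sketch below computes the inertia as described above (sum of each sample's distance to its closest centroid) for two made-up runs and keeps the run with the lowest value. The distance tables and run names are invented for the example; they are not produced by the library.

```python
def inertia(dist_table):
    """Sum over samples of the distance to the closest centroid."""
    return sum(min(row) for row in dist_table)


# Hypothetical distance tables from two runs:
# one row per sample, one column per centroid.
runs = {
    "run_a": [[0.2, 1.5], [0.3, 1.4], [1.6, 0.1]],
    "run_b": [[0.9, 1.0], [0.8, 1.1], [1.2, 0.7]],
}

scores = {name: inertia(table) for name, table in runs.items()}
best = min(scores, key=scores.get)  # run with the lowest inertia
print(best)
```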
- __init__(n_clusters=2, max_iter=100, metric='riemann', random_state=None, init='random', n_init=10, n_jobs=1, tol=0.0001)
Init.
- centroids()
Helper for fast access to the centroids.
- Returns
- centroids : list of SPD matrices, len (n_clusters)
Return a list containing the centroid of each cluster.
- fit(X, y=None)
Fit (estimate) the clusters.
- Parameters
- X : ndarray, shape (n_matrices, n_channels, n_channels)
Set of SPD matrices.
- y : ndarray, shape (n_matrices,) | None, default=None
Not used, here for compatibility with sklearn API.
- Returns
- self : Kmeans instance
The Kmeans instance.
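To make the fitting loop concrete, here is a minimal NumPy sketch of k-means on SPD matrices. It is not the library's implementation: it swaps in the log-Euclidean metric (distances and means taken on matrix logarithms) because that keeps the centroid update in closed form, it runs a single initialization, and it stops when labels no longer change. The names `spd_log`, `spd_exp` and `kmeans_logeuclid` are hypothetical.

```python
import numpy as np


def spd_log(C):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T


def spd_exp(S):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T


def kmeans_logeuclid(X, n_clusters=2, max_iter=100, seed=0):
    """Simplified k-means with the log-Euclidean metric (assumption)."""
    rng = np.random.default_rng(seed)
    logs = np.array([spd_log(C) for C in X])
    # 'random' initialization: pick k observations as initial centers.
    centers = logs[rng.choice(len(X), size=n_clusters, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Frobenius distance between log-matrices, shape (n_matrices, k).
        dist = np.linalg.norm(logs[:, None] - centers[None], axis=(2, 3))
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # labels are stable: converged
        labels = new_labels
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = logs[labels == k].mean(axis=0)
    # Map the centers back to SPD matrices.
    return labels, np.array([spd_exp(S) for S in centers])


# Two well-separated groups of 2x2 SPD matrices.
X = np.array([s * np.eye(2) for s in (1.0, 1.1, 0.9, 100.0, 110.0, 90.0)])
labels, centers = kmeans_logeuclid(X, n_clusters=2, seed=0)
print(labels, centers.shape)
```

The cluster indices themselves are arbitrary; only the grouping is meaningful.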
- fit_predict(X, y=None)
Perform clustering on X and return cluster labels.
- Parameters
- X : array-like of shape (n_samples, n_features)
Input data.
- y : Ignored
Not used, present for API consistency by convention.
- Returns
- labels : ndarray of shape (n_samples,), dtype=np.int64
Cluster labels.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params : dict
Parameter names mapped to their values.
- predict(X)
Get the predictions.
- Parameters
- X : ndarray, shape (n_matrices, n_channels, n_channels)
Set of SPD matrices.
- Returns
- pred : ndarray of int, shape (n_matrices,)
Prediction for each matrix according to the closest centroid.
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters
- X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns
- score : float
Mean accuracy of self.predict(X) w.r.t. y.
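For reference, mean accuracy with optional sample weights can be sketched as below. `mean_accuracy` is a hypothetical stand-alone function written for illustration; it mirrors the description above rather than the actual code path behind score.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of samples whose predicted label equals the true one."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    hits = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return hits / sum(sample_weight)


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
print(mean_accuracy(y_true, y_pred))                # unweighted
print(mean_accuracy(y_true, y_pred, [1, 1, 2, 1]))  # third sample counts double
```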
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
- **params : dict
Estimator parameters.
- Returns
- self : estimator instance
Estimator instance.
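The <component>__<parameter> naming can be illustrated with a small pure-Python sketch that groups such keys by component. `split_nested_params` and the step name "kmeans" are hypothetical; this only shows the key convention, not how the estimator actually applies the values.

```python
def split_nested_params(params):
    """Group 'component__parameter' keys by component; plain keys go under ''."""
    grouped = {}
    for key, value in params.items():
        # Split on the first double underscore only, so deeper nesting
        # ('a__b__c') is left for the component to resolve.
        component, sep, sub_key = key.partition("__")
        if sep:
            grouped.setdefault(component, {})[sub_key] = value
        else:
            grouped.setdefault("", {})[key] = value
    return grouped


grouped = split_nested_params(
    {"kmeans__n_clusters": 3, "kmeans__metric": "riemann", "verbose": True}
)
print(grouped)
```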
- transform(X)
Get the distance to each centroid.
- Parameters
- X : ndarray, shape (n_matrices, n_channels, n_channels)
Set of SPD matrices.
- Returns
- dist : ndarray, shape (n_matrices, n_clusters)
The distance to each centroid according to the metric.
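The shape of the returned array can be seen in a short NumPy sketch. It assumes that the 'riemann' metric denotes the affine-invariant Riemannian distance, computed here from the eigenvalues of A^(-1/2) B A^(-1/2); the helper names `inv_sqrtm_spd`, `airm_distance` and `distances_to_centroids` are hypothetical and do not belong to the library.

```python
import numpy as np


def inv_sqrtm_spd(C):
    """Inverse square root of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return (V / np.sqrt(w)) @ V.T


def airm_distance(A, B):
    """Affine-invariant distance: sqrt(sum(log(eig(A^-1/2 B A^-1/2))^2))."""
    A_isqrt = inv_sqrtm_spd(A)
    w = np.linalg.eigvalsh(A_isqrt @ B @ A_isqrt)
    return np.sqrt(np.sum(np.log(w) ** 2))


def distances_to_centroids(X, centroids):
    """One row per matrix, one column per centroid."""
    return np.array([[airm_distance(C, M) for M in centroids] for C in X])


eye = np.eye(2)
X = np.array([eye, np.e * eye, np.diag([1.0, np.e])])
centroids = [eye, np.e * eye]

dist = distances_to_centroids(X, centroids)
print(dist.shape)           # (n_matrices, n_clusters)
print(dist.argmin(axis=1))  # index of the closest centroid per matrix
```

Taking the argmin along the centroid axis gives the closest-centroid assignment described under predict.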