8.17.3.4. sklearn.metrics.homogeneity_score
- sklearn.metrics.homogeneity_score(labels_true, labels_pred)
Homogeneity metric of a cluster labeling given a ground truth
A clustering result satisfies homogeneity if all of its clusters contain only data points which are members of a single class.
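For reference, the V-Measure paper [R64] formalizes homogeneity with conditional entropy. Write n for the number of samples, n_c for the size of class c, n_k for the size of cluster k, and n_{c,k} for the number of samples of class c that fall in cluster k. The definition is sketched below. The convention h = 1 when H(C) = 0 (a single class, where the ratio is undefined) is an assumption chosen to match scikit-learn's behavior. The worked example uses labels_true = [0, 0, 1, 1] and labels_pred = [0, 0, 0, 1], where cluster 0 mixes both classes.

```latex
H(C) = -\sum_{c} \frac{n_c}{n} \log \frac{n_c}{n}, \qquad
H(C \mid K) = -\sum_{c} \sum_{k} \frac{n_{c,k}}{n} \log \frac{n_{c,k}}{n_k}

h = \begin{cases} 1 & \text{if } H(C) = 0 \\ 1 - \dfrac{H(C \mid K)}{H(C)} & \text{otherwise} \end{cases}

% Worked example (log base 2): labels_true = [0, 0, 1, 1], labels_pred = [0, 0, 0, 1]
H(C) = 1, \qquad
H(C \mid K) = -\left( \tfrac{2}{4} \log_2 \tfrac{2}{3} + \tfrac{1}{4} \log_2 \tfrac{1}{3} + \tfrac{1}{4} \log_2 \tfrac{1}{1} \right) \approx 0.689, \qquad
h \approx 1 - 0.689 = 0.311
```

When every cluster is pure, each ratio n_{c,k}/n_k equals 1, so H(C | K) = 0 and h = 1. This is why splitting a class into more clusters cannot lower the score.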
This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won’t change the score value in any way.
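The invariance holds because the score is computed only from how many samples fall in each (class, cluster) pair, together with the row and column totals of that table. Renaming label values moves each count to a new key but changes no count. The following minimal sketch illustrates this with a hypothetical helper, `contingency_counts`, which is not part of scikit-learn. The renamings are arbitrary one-to-one maps chosen for illustration.

```python
from collections import Counter


def contingency_counts(labels_true, labels_pred):
    """Hypothetical helper: number of samples in each (class, cluster) pair."""
    return Counter(zip(labels_true, labels_pred))


labels_true = [0, 0, 1, 1, 2]
labels_pred = [0, 0, 0, 1, 1]

# Arbitrary one-to-one renamings of the class and cluster label values.
true_map = {0: "b", 1: "c", 2: "a"}
pred_map = {0: 7, 1: 3}
true_renamed = [true_map[c] for c in labels_true]
pred_renamed = [pred_map[k] for k in labels_pred]

original = contingency_counts(labels_true, labels_pred)
renamed = contingency_counts(true_renamed, pred_renamed)

# Renaming only moves each count to a new (class, cluster) key; no count changes,
# so the entropies computed from these counts (and hence the score) stay the same.
moved = Counter({(true_map[c], pred_map[k]): n for (c, k), n in original.items()})
print(moved == renamed)  # True
```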
This metric is not symmetric: switching labels_true with labels_pred returns the completeness_score, which will in general be different.
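The asymmetry can be seen with a minimal pure-Python sketch of the conditional-entropy definition h = 1 - H(C | K) / H(C). The names `entropy`, `conditional_entropy` and `homogeneity_sketch` are hypothetical helpers, not scikit-learn API. Returning 1.0 for a single class is an assumed convention. With the arguments swapped, the same formula computes 1 - H(K | C) / H(K), which is the completeness of the original labeling.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H(X) of a label sequence, in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(a, b):
    """H(A | B) from joint and marginal label counts, in nats."""
    n = len(a)
    b_counts = Counter(b)
    joint_counts = Counter(zip(a, b))
    return -sum(
        (n_ab / n) * log(n_ab / b_counts[y])
        for (_, y), n_ab in joint_counts.items()
    )


def homogeneity_sketch(labels_true, labels_pred):
    """Hypothetical helper: h = 1 - H(C|K) / H(C); 1.0 if there is a single class."""
    h_c = entropy(labels_true)
    if h_c == 0:
        return 1.0
    return 1.0 - conditional_entropy(labels_true, labels_pred) / h_c


classes = [0, 0, 1, 1]
clusters = [0, 0, 1, 2]

print(homogeneity_sketch(classes, clusters))  # 1.0: every cluster is pure
# Swapped arguments measure completeness of the original labeling instead:
print(homogeneity_sketch(clusters, classes))  # approximately 2/3: class 1 is split in two
```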
Parameters:
labels_true : int array, shape = [n_samples]
    Ground truth class labels to be used as a reference.
labels_pred : array, shape = [n_samples]
    Cluster labels to evaluate.
Returns:
homogeneity : float
    Score between 0.0 and 1.0; 1.0 stands for a perfectly homogeneous labeling.
See also: completeness_score, v_measure_score
References
[R64] Andrew Rosenberg and Julia Hirschberg. V-Measure: A conditional entropy-based external cluster evaluation measure. 2007. http://acl.ldc.upenn.edu/D/D07/D07-1043.pdf
Examples
Perfect labelings are homogeneous:
>>> from sklearn.metrics.cluster import homogeneity_score
>>> homogeneity_score([0, 0, 1, 1], [1, 1, 0, 0])
1.0
Non-perfect labelings that further split classes into more clusters can be perfectly homogeneous:
>>> homogeneity_score([0, 0, 1, 1], [0, 0, 1, 2])
1.0
>>> homogeneity_score([0, 0, 1, 1], [0, 1, 2, 3])
1.0
Clusters that include samples from different classes do not make for a homogeneous labeling:
>>> homogeneity_score([0, 0, 1, 1], [0, 1, 0, 1])
0.0
>>> homogeneity_score([0, 0, 1, 1], [0, 0, 0, 0])
0.0
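The examples above all turn on one condition. A labeling scores 1.0 exactly when H(C | K) = 0, that is, when every cluster contains a single class, regardless of how many clusters a class is spread over. Here is a minimal sketch of that check. It uses a hypothetical helper, `is_perfectly_homogeneous`, which is not part of scikit-learn and tests cluster purity directly instead of computing entropies.

```python
from collections import defaultdict


def is_perfectly_homogeneous(labels_true, labels_pred):
    """Hypothetical helper: True iff every cluster contains a single class.

    This is the condition H(C | K) = 0, i.e. a homogeneity score of 1.0.
    """
    classes_per_cluster = defaultdict(set)
    for true_label, cluster in zip(labels_true, labels_pred):
        classes_per_cluster[cluster].add(true_label)
    return all(len(classes) == 1 for classes in classes_per_cluster.values())


# Splitting a class across several clusters keeps every cluster pure.
print(is_perfectly_homogeneous([0, 0, 1, 1], [0, 0, 1, 2]))  # True
print(is_perfectly_homogeneous([0, 0, 1, 1], [0, 1, 2, 3]))  # True
# Merging both classes into one cluster breaks purity.
print(is_perfectly_homogeneous([0, 0, 1, 1], [0, 0, 0, 0]))  # False
```

This check only says whether the score is exactly 1.0; it does not grade partially homogeneous labelings, which is what the entropy ratio is for.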