This documentation is for scikit-learn version 0.11-git

8.17.3.5. sklearn.metrics.completeness_score

sklearn.metrics.completeness_score(labels_true, labels_pred)

Completeness metric of a cluster labeling given a ground truth.

A clustering result satisfies completeness if all the data points that are members of a given class are elements of the same cluster.
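To make the definition concrete, here is a minimal pure-Python sketch. It assumes the conditional-entropy formulation from the Rosenberg and Hirschberg reference, completeness = 1 - H(K|C) / H(K), where C is the class labeling and K the cluster labeling, with the score taken to be 1.0 when H(K) is zero. The helper name `completeness_sketch` is hypothetical and this is not necessarily how scikit-learn implements the metric.

```python
from collections import Counter
from math import log


def completeness_sketch(labels_true, labels_pred):
    """Illustrative completeness: 1 - H(K|C) / H(K).

    Hypothetical helper for explanation only, not scikit-learn's code.
    """
    n = len(labels_true)
    if n == 0:
        return 1.0

    cluster_sizes = Counter(labels_pred)
    class_sizes = Counter(labels_true)
    joint_sizes = Counter(zip(labels_true, labels_pred))

    # Entropy of the cluster assignment, H(K).
    h_k = -sum((m / n) * log(m / n) for m in cluster_sizes.values())
    if h_k == 0.0:
        # A single cluster cannot split any class, so the labeling is complete.
        return 1.0

    # Conditional entropy of clusters given classes, H(K|C).
    h_k_given_c = -sum(
        (m / n) * log(m / class_sizes[c])
        for (c, _k), m in joint_sizes.items()
    )
    return 1.0 - h_k_given_c / h_k


# Class 1 is split across two clusters, so the score falls between 0 and 1.
partial = completeness_sketch([0, 0, 1, 1], [0, 0, 1, 2])
```

H(K|C) is zero exactly when every class lands in a single cluster, which is why merging classes into one cluster does not lower the score while splitting a class does.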

This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won’t change the score value in any way.
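A short check of this invariance, assuming scikit-learn is installed. The label values below are arbitrary illustrations; the second call renames the classes (0 to 7, 1 to 3) and the clusters (0 to 2, 1 to 0, 2 to 1) while keeping the same grouping of samples.

```python
from sklearn.metrics import completeness_score

# Original labeling: class 1 is split across clusters 1 and 2.
original = completeness_score([0, 0, 1, 1], [0, 0, 1, 2])

# Same partition of the samples, different label values.
relabeled = completeness_score([7, 7, 3, 3], [2, 2, 0, 1])

print(original, relabeled)  # the two values agree
```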

This metric is not symmetric: switching labels_true with labels_pred will return the homogeneity_score, which will generally be different.
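The asymmetry can be seen on a small example, again assuming scikit-learn is installed and that homogeneity_score is importable from sklearn.metrics alongside completeness_score. The labelings are illustrative only.

```python
from sklearn.metrics import completeness_score, homogeneity_score

labels_true = [0, 0, 1, 1]
labels_pred = [0, 0, 1, 2]  # class 1 is split across clusters 1 and 2

# Completeness is penalized because class 1 is split.
score = completeness_score(labels_true, labels_pred)

# Swapping the arguments gives the homogeneity of the original pair.
swapped = completeness_score(labels_pred, labels_true)
homogeneity = homogeneity_score(labels_true, labels_pred)

print(score)                 # strictly below 1.0
print(swapped, homogeneity)  # the two values agree
```

Here every cluster contains members of only one class, so the homogeneity is perfect even though the completeness is not.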

Parameters :

labels_true : int array, shape = [n_samples]

ground truth class labels to be used as a reference

labels_pred : array, shape = [n_samples]

cluster labels to evaluate

Returns :

completeness : float

score between 0.0 and 1.0. 1.0 stands for perfectly complete labeling

References

[R63] Andrew Rosenberg and Julia Hirschberg. V-Measure: A conditional entropy-based external cluster evaluation measure, 2007. http://acl.ldc.upenn.edu/D/D07/D07-1043.pdf

Examples

Perfect labelings are complete:

>>> from sklearn.metrics.cluster import completeness_score
>>> completeness_score([0, 0, 1, 1], [1, 1, 0, 0])
1.0

Non-perfect labelings that assign all members of each class to the same cluster are still complete:

>>> completeness_score([0, 0, 1, 1], [0, 0, 0, 0])
1.0
>>> completeness_score([0, 1, 2, 3], [0, 0, 1, 1])
1.0

If class members are split across different clusters, the assignment cannot be complete:

>>> completeness_score([0, 0, 1, 1], [0, 1, 0, 1])
0.0
>>> completeness_score([0, 0, 0, 0], [0, 1, 2, 3])
0.0