sklearn.utils.resample

sklearn.utils.resample(*arrays, **options)

Resample arrays or sparse matrices in a consistent way.

The default strategy implements one step of the bootstrapping procedure.
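As a rough illustration of what "one step of the bootstrapping procedure" means, the sketch below draws row indices with replacement once and applies the same indices to every input, which is what keeps the inputs consistent with each other. It is a pure-Python approximation under stated assumptions, not scikit-learn's implementation: `bootstrap_once` and its `seed` argument are hypothetical names introduced only for this example.

```python
import random


def bootstrap_once(*arrays, n_samples=None, seed=None):
    """Hypothetical helper: one bootstrap step over row-aligned sequences."""
    if not arrays:
        return []
    n_rows = len(arrays[0])
    if any(len(a) != n_rows for a in arrays):
        raise ValueError("all inputs must have the same number of rows")
    if n_samples is None:
        # Mirror the documented default: as many samples as there are rows
        n_samples = n_rows
    rng = random.Random(seed)
    # Draw the row indices with replacement once, then reuse them for every input
    indices = [rng.randrange(n_rows) for _ in range(n_samples)]
    return [[a[i] for i in indices] for a in arrays]


X = [[1.0, 0.0], [2.0, 1.0], [0.0, 0.0]]
y = [0, 1, 2]
X_boot, y_boot = bootstrap_once(X, y, seed=0)

# Each label in y equals its original row index, so this checks the pairing survived
rows_still_paired = all(X[label] == row for row, label in zip(X_boot, y_boot))
```

Because a single index draw is shared, a row of `X` and its entry in `y` are always selected together, and the original inputs are left unmodified.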

Parameters :

`*arrays` : sequence of arrays or scipy.sparse matrices with same shape[0]

replace : boolean, True by default

If True, resample with replacement. If False, this will implement (sliced) random permutations, so no row is drawn more than once.

n_samples : int, None by default

Number of samples to generate. If left as None, this is automatically set to the first dimension of the arrays.

random_state : int or RandomState instance

Control the shuffling for reproducible behavior.

Returns :

Sequence of resampled views of the collections. The original arrays are not impacted.

Examples

It is possible to mix sparse and dense arrays in the same run:

>>> import numpy as np
>>> X = [[1., 0.], [2., 1.], [0., 0.]]
>>> y = np.array([0, 1, 2])

>>> from scipy.sparse import coo_matrix
>>> X_sparse = coo_matrix(X)

>>> from sklearn.utils import resample
>>> X, X_sparse, y = resample(X, X_sparse, y, random_state=0)
>>> X
array([[ 1.,  0.],
       [ 2.,  1.],
       [ 1.,  0.]])

>>> X_sparse
<3x2 sparse matrix of type '<... 'numpy.float64'>'
    with 4 stored elements in Compressed Sparse Row format>

>>> X_sparse.toarray()
array([[ 1.,  0.],
       [ 2.,  1.],
       [ 1.,  0.]])

>>> y
array([0, 1, 0])

>>> resample(y, n_samples=2, random_state=0)
array([0, 1])
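
The `replace` and `n_samples` options described under Parameters can also be combined to draw a permutation or a subsample without replacement. The following is a minimal sketch of that usage; the array values and the variable names (`X_perm`, `X_sub`, `aligned`) are chosen for illustration only, and no specific output is shown because the drawn rows depend on the random state.

```python
import numpy as np
from sklearn.utils import resample

X = np.array([[1., 0.], [2., 1.], [0., 0.], [3., 1.]])
y = np.array([0, 1, 2, 3])  # each label equals its original row index

# replace=False with the default n_samples: a random permutation of the rows
X_perm, y_perm = resample(X, y, replace=False, random_state=0)

# replace=False with n_samples=2: a random subset of 2 distinct rows
X_sub, y_sub = resample(X, y, replace=False, n_samples=2, random_state=0)

# Since y holds original row indices, this checks that X and y stayed aligned
aligned = np.array_equal(X_sub, X[y_sub])
```

With `replace=False` every row appears at most once, so `y_perm` contains each label exactly once and `y_sub` contains two distinct labels.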