This documentation is for scikit-learn version 0.11-git

8.7.2.6. sklearn.feature_extraction.text.Vectorizer

class sklearn.feature_extraction.text.Vectorizer(analyzer=None, max_df=1.0, max_features=None, norm='l2', use_idf=True, smooth_idf=True, sublinear_tf=False)

Convert a collection of raw documents to a matrix of tf–idf features.

Equivalent to CountVectorizer followed by TfidfTransformer.
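
The two-stage equivalence can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the helper names `count_terms` and `tfidf_weight` are hypothetical, tokenisation is a simple lowercase whitespace split, and the smoothed idf formula ln((1 + n) / (1 + df)) + 1 is one common variant that may differ from what this version computes.

```python
import math
from collections import Counter


def count_terms(docs):
    """Stage 1 (the CountVectorizer role): vocabulary plus per-document counts."""
    # Assumption: lowercase whitespace tokenisation, for illustration only.
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocabulary)}
    counts = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for term, c in Counter(toks).items():
            row[index[term]] = c
        counts.append(row)
    return vocabulary, counts


def tfidf_weight(counts):
    """Stage 2 (the TfidfTransformer role): idf reweighting, then l2 normalisation."""
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    # Document frequency: number of documents in which each term occurs.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    # Assumed smoothed idf; terms found in every document get weight 1.0.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    weighted = []
    for row in counts:
        vec = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in vec))
        weighted.append([v / norm for v in vec] if norm > 0 else vec)
    return weighted


docs = ["the cat sat", "the dog sat", "the bird flew"]
vocabulary, counts = count_terms(docs)
vectors = tfidf_weight(counts)
```

In this sketch a term that occurs in every document ("the") ends up with a smaller weight than a term that occurs in only one ("cat"), and each row has unit l2 norm, matching the `norm='l2'` default in the signature above.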

Methods

fit(raw_documents)                   Learn a mapping from documents to array data.
fit_transform(raw_documents[, y])    Learn the representation and return the vectors.
get_params([deep])                   Get parameters for the estimator.
inverse_transform(X)                 Return terms per document with nonzero entries in X.
set_params(**params)                 Set the parameters of the estimator.
transform(raw_documents[, copy])     Transform raw text documents to tf–idf vectors.

__init__(analyzer=None, max_df=1.0, max_features=None, norm='l2', use_idf=True, smooth_idf=True, sublinear_tf=False)

fit(raw_documents)

Learn a mapping from documents to array data.

fit_transform(raw_documents, y=None)

Learn the representation and return the vectors.

Parameters :

raw_documents: iterable :

an iterable which yields either str, unicode or file objects

Returns :

vectors: array, [n_samples, n_features] :

get_params(deep=True)

Get parameters for the estimator

Parameters :

deep: boolean, optional :

If True, will return the parameters for this estimator and contained subobjects that are estimators.

inverse_transform(X)

Return terms per document with nonzero entries in X.

Parameters :

X : {array, sparse matrix}, shape = [n_samples, n_features]

Returns :

X_inv : list of arrays, len = n_samples

List of arrays of terms.
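
A minimal sketch of what "terms per document with nonzero entries" means, assuming a dense list-of-lists input and a hypothetical `vocabulary` list in which position j names column j. The function name `inverse_transform_sketch` is illustrative and is not part of the library.

```python
def inverse_transform_sketch(X, vocabulary):
    """For each row of X, list the terms whose entries are nonzero."""
    return [
        [vocabulary[j] for j, value in enumerate(row) if value != 0]
        for row in X
    ]


# Hypothetical data: column j of X corresponds to vocabulary[j].
vocabulary = ["bird", "cat", "dog"]
X = [
    [0.0, 0.7, 0.7],  # contains "cat" and "dog"
    [1.0, 0.0, 0.0],  # contains "bird"
    [0.0, 0.0, 0.0],  # contains no known term
]
terms = inverse_transform_sketch(X, vocabulary)
```

Only the presence of a term is recovered. The weights themselves and the original word order are not.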

set_params(**params)

Set the parameters of the estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that each component of a nested object can be updated.
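
The `<component>__<parameter>` convention can be illustrated with a small stand-alone sketch. The classes `ParamsMixin`, `Leaf` and `Nested` below are hypothetical and only mimic the routing idea. They are not scikit-learn's own base classes.

```python
class ParamsMixin:
    """Hypothetical helper that routes double-underscore parameter names."""

    def set_params(self, **params):
        for key, value in params.items():
            # Split "step__norm" into ("step", "__", "norm").
            name, sep, sub_key = key.partition("__")
            if not hasattr(self, name):
                raise ValueError("Invalid parameter %r" % name)
            if sep:
                # Nested form: delegate to the named component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                # Simple form: set the attribute on this object.
                setattr(self, name, value)
        return self


class Leaf(ParamsMixin):
    def __init__(self, norm="l2"):
        self.norm = norm


class Nested(ParamsMixin):
    def __init__(self, step, verbose=False):
        self.step = step
        self.verbose = verbose


nested = Nested(step=Leaf())
# One call updates the outer object and its inner component.
result = nested.set_params(verbose=True, step__norm="l1")
```

Returning `self` from `set_params` is what allows calls to be chained, for example `estimator.set_params(...).fit(...)`.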

Returns :

self :

transform(raw_documents, copy=True)

Transform raw text documents to tf–idf vectors

Parameters :

raw_documents: iterable :

an iterable which yields either str, unicode or file objects

Returns :

vectors: sparse matrix, [n_samples, n_features] :
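
The division of labour between fit, transform and fit_transform can be sketched as follows. `TinyTfidf` is a hypothetical stand-in written for illustration: it tokenises on whitespace, uses an assumed smoothed idf formula, and returns dense Python lists rather than a sparse matrix.

```python
import math


class TinyTfidf:
    """Hypothetical stand-in; not the scikit-learn class."""

    def fit(self, raw_documents):
        # Learn the vocabulary and idf weights from the training documents.
        docs = [set(doc.lower().split()) for doc in raw_documents]
        terms = sorted(set().union(*docs))
        self.vocabulary_ = {term: j for j, term in enumerate(terms)}
        n = len(docs)
        self.idf_ = [0.0] * len(terms)
        for term, j in self.vocabulary_.items():
            df = sum(1 for d in docs if term in d)
            # Assumed smoothed idf variant.
            self.idf_[j] = math.log((1 + n) / (1 + df)) + 1.0
        return self

    def transform(self, raw_documents):
        # Reuse the learned vocabulary; terms unseen during fit are ignored.
        rows = []
        for doc in raw_documents:
            row = [0.0] * len(self.vocabulary_)
            for token in doc.lower().split():
                j = self.vocabulary_.get(token)
                if j is not None:
                    row[j] += self.idf_[j]
            norm = math.sqrt(sum(v * v for v in row))
            rows.append([v / norm for v in row] if norm else row)
        return rows

    def fit_transform(self, raw_documents, y=None):
        # Same result as fit followed by transform on the same documents.
        return self.fit(raw_documents).transform(raw_documents)


train = ["the cat sat", "the dog sat"]
vec = TinyTfidf().fit(train)
new = vec.transform(["the cat barked", "zebra"])
```

Here `new` has one column per term learned from `train`. The token "barked" contributes nothing, and the document "zebra" maps to an all-zero row because none of its terms were seen during fit.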