.. _example_applications_plot_out_of_core_classification.py: ====================================================== Out-of-core classification of text documents ====================================================== This is an example showing how scikit-learn can be used for classification using an out-of-core approach: learning from data that doesn't fit into main memory. We make use of an online classifier, i.e., one that supports the partial_fit method, that will be fed with batches of examples. To guarantee that the features space remains the same over time we leverage a HashingVectorizer that will project each example into the same feature space. This is especially useful in the case of text classification where new features (words) may appear in each batch. The dataset used in this example is Reuters-21578 as provided by the UCI ML repository. It will be automatically downloaded and uncompressed on first run. The plot represents is the learning curve of the classifier: the evolution of classification accuracy over the course of the mini-batches. Accuracy is measured on the first 1000 samples, held out as a validation set. To limit the memory consumption, we queue examples up to a fixed amount before feeding them to the learner. **Python source code:** :download:`plot_out_of_core_classification.py ` .. literalinclude:: plot_out_of_core_classification.py :lines: 25-