
datasets

Anaconda Verified

HuggingFace community-driven open-source library of datasets

Installation

To install this package, run one of the following:

Conda
$ conda install main::datasets

Description

Datasets is a lightweight library providing two main features:

  • one-line dataloaders for many public datasets: one-liners to download and pre-process any of the major public datasets (text datasets in 467 languages and dialects, image datasets, audio datasets, etc.) provided on the HuggingFace Datasets Hub. With a simple command like squad_dataset = load_dataset("squad"), get any of these datasets ready to use in a dataloader for training/evaluating an ML model (Numpy/Pandas/PyTorch/TensorFlow/JAX),
  • efficient data pre-processing: simple, fast and reproducible data pre-processing for the above public datasets as well as your own local datasets in CSV/JSON/text/PNG/JPEG/etc. With simple commands like processed_dataset = dataset.map(process_example), efficiently prepare the dataset for inspection and ML model evaluation and training.
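
The two features above can be sketched as follows. This is a minimal illustration, not the library's reference usage: it builds a small in-memory dataset with Dataset.from_dict as a stand-in for a Hub download, so it runs without network access. The names process_example, run_demo, and the question_length column are assumptions made for this example.

```python
def process_example(example):
    """Return a new column with the number of whitespace-separated words in the question."""
    return {"question_length": len(example["question"].split())}


def run_demo():
    # Imported here so process_example stays usable without the library installed.
    from datasets import Dataset

    # A tiny in-memory dataset standing in for a Hub download such as
    # load_dataset("squad"), which needs network access.
    dataset = Dataset.from_dict(
        {
            "question": [
                "Who wrote Hamlet?",
                "What is the capital of France?",
            ],
            "context": [
                "Hamlet is a tragedy by William Shakespeare.",
                "Paris is the capital of France.",
            ],
        }
    )

    # map() calls process_example on every row and merges the returned
    # dictionary into that row as new or updated columns.
    processed_dataset = dataset.map(process_example)
    return processed_dataset
```

Calling run_demo() in an environment where the package is installed returns a dataset that keeps the original question and context columns and adds question_length. Keeping the per-example function free of library imports makes it easy to test on plain dictionaries before applying it to a full dataset.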

About

Information Last Updated

Dec 5, 2025 at 13:59

License

Apache-2.0

Total Downloads

3.4K

Platforms

Linux ppc64le Version: 2.12.0
Linux aarch64 Version: 4.4.1
macOS 64 Version: 3.3.2
macOS arm64 Version: 4.4.1
Linux 64 Version: 4.4.1
Win 64 Version: 4.4.1
noarch Version: 1.12.1
Linux s390x Version: 3.3.2