Datasets is a lightweight library providing two main features:

- One-line dataloaders for many public datasets: one-liners to download and pre-process any of the major public datasets (text datasets in 467 languages and dialects, image datasets, audio datasets, etc.) provided on the HuggingFace Datasets Hub. With a simple command like `squad_dataset = load_dataset("squad")`, get any of these datasets ready to use in a dataloader for training or evaluating an ML model (Numpy/Pandas/PyTorch/TensorFlow/JAX).
- Efficient data pre-processing: simple, fast and reproducible data pre-processing for the above public datasets as well as your own local datasets in CSV/JSON/text/PNG/JPEG/etc. With simple commands like `processed_dataset = dataset.map(process_example)`, efficiently prepare the dataset for inspection and for ML model evaluation and training.
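As a minimal sketch of the `dataset.map(process_example)` pattern described above, the example below builds a tiny in-memory dataset instead of downloading one from the Hub. The sample rows, the `n_words` column, and the body of `process_example` are illustrative assumptions, not part of the package; the fallback branch only keeps the snippet runnable when `datasets` is not installed.

```python
# Illustrative rows (assumption): a real workflow would typically start from
# load_dataset("squad") or from local CSV/JSON files instead.
rows = {"text": ["hello world", "datasets makes preprocessing simple"]}


def process_example(example):
    """Hypothetical per-example transform: add a whitespace word count."""
    example["n_words"] = len(example["text"].split())
    return example


try:
    from datasets import Dataset
except ImportError:  # library not installed in this environment
    Dataset = None

if Dataset is not None:
    dataset = Dataset.from_dict(rows)
    # map() applies process_example to every row and returns a new dataset.
    processed_dataset = dataset.map(process_example)
    word_counts = list(processed_dataset["n_words"])
else:
    # Plain-Python equivalent of the same per-row transform.
    word_counts = [process_example({"text": t})["n_words"] for t in rows["text"]]

print(word_counts)
```

The same `process_example` function could be passed to `map` on a dataset loaded from the Hub, provided that dataset has a `text` column.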
Uploaded | Mon Mar 31 21:27:31 2025 |
md5 checksum | 977f3d4dba8c72a0a0bccc1f865176fe |
arch | x86_64 |
build | py312h06a4308_0 |
constrains | pillow >=6.2.1, apache-beam >=2.26.0, soundfile >=0.12.1, jax >=0.3.14, jaxlib >=0.3.14, tensorflow-base >=2.6.0 |
depends | aiohttp, dill >=0.3.0,<0.3.9, filelock, fsspec >=2023.1.0,<=2024.3.1, huggingface_hub >=0.21.2, multiprocess, numpy >=1.17,<2.0a0, packaging, pandas, pyarrow >=12.0.0, python >=3.12,<3.13.0a0, python-xxhash, pyyaml >=5.1, requests >=2.19.0, tqdm >=4.62.1 |
license | Apache-2.0 |
license_family | Apache |
md5 | 977f3d4dba8c72a0a0bccc1f865176fe |
name | datasets |
platform | linux |
sha1 | 867eef824435ba0d7f9a24662f3294a0673f4367 |
sha256 | 12a17204c362faf74fd5b3c5554c05d56546cf6145c84dc34539c1de6192ac4a |
size | 948112 |
subdir | linux-64 |
timestamp | 1716911797269 |
version | 2.19.1 |
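
The size and checksums listed above can be used to confirm that a downloaded package archive is intact. The following is a rough sketch using only the Python standard library; the helper names `file_digests` and `verify` are hypothetical, and the path of the downloaded archive is left to the caller because the listing does not state a file name.

```python
import hashlib
import os

# Values copied from the metadata table above.
EXPECTED = {
    "md5": "977f3d4dba8c72a0a0bccc1f865176fe",
    "sha1": "867eef824435ba0d7f9a24662f3294a0673f4367",
    "sha256": "12a17204c362faf74fd5b3c5554c05d56546cf6145c84dc34539c1de6192ac4a",
    "size": 948112,
}


def file_digests(path, chunk_size=1 << 20):
    """Return the md5, sha1 and sha256 hex digests and byte size of a file."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as fh:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    result = {name: hasher.hexdigest() for name, hasher in hashers.items()}
    result["size"] = os.path.getsize(path)
    return result


def verify(path, expected=EXPECTED):
    """Map each expected field to True if the file matches it, else False."""
    actual = file_digests(path)
    return {key: actual[key] == value for key, value in expected.items()}
```

Calling `verify` on the path of the downloaded archive returns one boolean per field; any `False` entry suggests a corrupted or different file.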