Datasets is a lightweight library providing two main features:

- One-line dataloaders for many public datasets: one-liners to download and pre-process any of the major public datasets (text datasets in 467 languages and dialects, image datasets, audio datasets, etc.) provided on the HuggingFace Datasets Hub. With a simple command like `squad_dataset = load_dataset("squad")`, get any of these datasets ready to use in a dataloader for training or evaluating an ML model (NumPy/Pandas/PyTorch/TensorFlow/JAX).
- Efficient data pre-processing: simple, fast, and reproducible data pre-processing for the above public datasets as well as your own local datasets in CSV/JSON/text/PNG/JPEG/etc. With simple commands like `processed_dataset = dataset.map(process_example)`, efficiently prepare the dataset for inspection and for ML model evaluation and training.

| Field | Value |
| --- | --- |
| Uploaded | Mon Mar 31 21:27:28 2025 |
| md5 checksum | 49edab3bdae5e7a9bf29f1066dec4788 |
| arch | x86_64 |
| build | py310h06a4308_0 |
| constrains | pillow >=9.4.0, jaxlib >=0.3.14, jax >=0.3.14, soundfile >=0.12.1, tensorflow-base >=2.6.0, soxr >=5.1, apache-beam >=2.26.0 |
| depends | aiohttp, dill >=0.3.0,<0.3.9, filelock, fsspec >=2023.1.0,<=2024.12.0, huggingface_hub >=0.24.0, multiprocess <0.70.17, numpy >=1.17, packaging, pandas, pyarrow >=15.0.0, python >=3.10,<3.11.0a0, python-xxhash, pyyaml >=5.1, requests >=2.32.2, tqdm >=4.66.3 |
| license | Apache-2.0 |
| license_family | Apache |
| md5 | 49edab3bdae5e7a9bf29f1066dec4788 |
| name | datasets |
| platform | linux |
| sha1 | 45999fbb2d9d8a5087aac54dce9f480625673357 |
| sha256 | cd47db2a02ee2f71e806e8bbcfa482361a416c1eb44b780dfc7d51b5030ff4c5 |
| size | 666467 |
| subdir | linux-64 |
| timestamp | 1741368138624 |
| version | 3.3.2 |
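
The checksums listed in the table can be used to check a downloaded package file. The following is a minimal sketch using only the Python standard library; the helper names `file_digest` and `matches_expected` are hypothetical and chosen for this example, and the path of the downloaded archive is left to the caller because the file name is not given here.

```python
import hashlib

# SHA-256 value copied from the metadata table above.
EXPECTED_SHA256 = "cd47db2a02ee2f71e806e8bbcfa482361a416c1eb44b780dfc7d51b5030ff4c5"


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, read in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256, algorithm="sha256"):
    """Compare the computed digest with the expected value, ignoring case."""
    return file_digest(path, algorithm) == expected.lower()
```

Calling `matches_expected(path_to_downloaded_file)` returns `True` only when the file's SHA-256 digest equals the value in the table. Passing `algorithm="md5"` or `algorithm="sha1"` together with the corresponding table value checks the other listed digests in the same way.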