transformers
A model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.
To install this package, run one of the following:
Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.
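These models can be applied to text (for tasks like text classification, information extraction, question answering, summarization, translation, and text generation), images (for tasks like image classification, object detection, and segmentation), and audio (for tasks like speech recognition and audio classification).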
Transformer models can also perform tasks on several modalities combined, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
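As a rough illustration of applying a pretrained model to one modality, the sketch below maps a few modalities to task identifiers and builds a pipeline with the library's `pipeline` function. The `EXAMPLE_TASKS` mapping and the `make_pipeline` helper are hypothetical names introduced here, not part of the package. The sketch assumes the listed task identifiers are available in the installed version and that a default model can be downloaded at first use.

```python
# Hypothetical mapping from modality to a pipeline task identifier.
# Illustrative only: this is not an exhaustive or official list, and the
# set of supported tasks may differ between library versions.
EXAMPLE_TASKS = {
    "text": "text-classification",
    "vision": "image-classification",
    "audio": "automatic-speech-recognition",
}


def make_pipeline(modality: str):
    """Build a pipeline for one of the example modalities (hypothetical helper)."""
    if modality not in EXAMPLE_TASKS:
        raise ValueError(f"No example task registered for modality: {modality!r}")
    # Imported lazily so the mapping can be inspected without the library installed.
    from transformers import pipeline

    # With no model argument, the library selects a default checkpoint for
    # the task and downloads it on first use.
    return pipeline(EXAMPLE_TASKS[modality])


# Example usage (requires the package and network access on first run):
#   classifier = make_pipeline("text")
#   print(classifier("Pretrained models are convenient to reuse."))
```

In practice you would usually pass an explicit `model=` argument to `pipeline` rather than rely on the default checkpoint, so that results stay reproducible across versions.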
Summary
A model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.
Last Updated
Mar 30, 2026 at 19:50
License
Apache-2.0
GitHub Repository
https://github.com/huggingface/transformers
Documentation
https://huggingface.co/docs/transformers/index