sentencepiece

Community

An unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems.

Installation

To install this package, run one of the following:

Conda
$ conda install rocketce::sentencepiece

Usage Tracking

Versions shown: 0.1.99, 0.1.97, 0.1.96 (3 of 8 versions selected)
Downloads (last 6 months): 0

Description

SentencePiece is an unsupervised text tokenizer and detokenizer, mainly for neural network-based text generation systems in which the vocabulary size is fixed before the neural model is trained. SentencePiece implements subword units (e.g., byte-pair encoding (BPE) [Sennrich et al.] and the unigram language model [Kudo]), extended to train directly from raw sentences. This makes it possible to build a purely end-to-end system that does not depend on language-specific pre- or postprocessing.
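To make "subword units trained directly from raw sentences" concrete, here is a minimal toy sketch of BPE-style merge learning in pure Python. It is not SentencePiece's implementation and does not use the sentencepiece package; the function names (`train_bpe`, `encode`, `decode`) are hypothetical and chosen only for illustration. It borrows one idea from SentencePiece: whitespace is kept as an ordinary symbol ("▁"), so no language-specific pre-tokenization is needed and detokenization is lossless. Unlike the library's defaults, this toy may also merge symbols across whitespace.

```python
from collections import Counter

WS = "\u2581"  # "▁", the visible whitespace marker SentencePiece uses


def to_symbols(sentence):
    """Turn a raw sentence into a list of characters, spaces included."""
    return list(WS + sentence.replace(" ", WS))


def apply_merge(seq, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def train_bpe(sentences, num_merges):
    """Learn up to `num_merges` merge rules from raw sentences (toy BPE)."""
    corpus = [to_symbols(s) for s in sentences]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in corpus:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        # Most frequent pair wins; ties are broken lexicographically.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        corpus = [apply_merge(seq, best) for seq in corpus]
    return merges


def encode(sentence, merges):
    """Tokenize by replaying the learned merges in training order."""
    seq = to_symbols(sentence)
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


def decode(pieces):
    """Detokenize: join pieces, restore spaces, drop the leading marker."""
    return "".join(pieces).replace(WS, " ")[1:]


# Illustrative data only; any raw sentences would do.
sentences = ["low lower lowest", "new newer newest"]
merges = train_bpe(sentences, num_merges=10)
pieces = encode("lowest newer", merges)
print(pieces)
print(decode(pieces))
```

The number of merges stands in for the predetermined vocabulary size mentioned above: each merge adds one new subword symbol, so the vocabulary is fixed once training stops.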

About

Last Updated

Jul 12, 2024 at 11:18

License

Apache License 2.0

Total Downloads

6.6K

Supported Platforms

linux-ppc64le