
conda-forge / packages / sentencepiece 0.2.0

Unsupervised text tokenizer for Neural Network-based text generation.

copied from cf-staging / sentencepiece

Installers

  • linux-64 v0.2.0
  • osx-arm64 v0.2.0
  • linux-aarch64 v0.2.0
  • linux-ppc64le v0.2.0
  • win-64 v0.2.0
  • osx-64 v0.2.0

conda install

To install this package, run:
conda install conda-forge::sentencepiece

Description

SentencePiece is an unsupervised text tokenizer and detokenizer, mainly for neural network-based text generation systems in which the vocabulary size is fixed before the neural model is trained.

SentencePiece implements subword units (e.g., byte-pair encoding (BPE) [Sennrich et al.] and the unigram language model [Kudo]) and extends them with direct training from raw sentences. This makes it possible to build a purely end-to-end system that does not depend on language-specific pre- or postprocessing.
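
To make the idea of subword units trained directly from raw sentences more concrete, here is a minimal toy sketch of byte-pair encoding in pure Python. It is not SentencePiece's implementation and does not use its API: the function names (`to_symbols`, `merge_pair`, `train_bpe`, `encode`, `decode`) are hypothetical and chosen only for illustration. The one detail borrowed from SentencePiece is the "▁" (U+2581) meta symbol that marks whitespace, which lets the pieces be joined back into the original text without a language-specific detokenizer. Unlike SentencePiece's default behaviour, this sketch lets merges span whitespace, and it assumes the input text does not already contain "▁".

```python
from collections import Counter

SPACE = "\u2581"  # "▁", the meta symbol SentencePiece uses to mark whitespace


def to_symbols(sentence):
    """Split a raw sentence into single characters, making spaces explicit."""
    return list(sentence.replace(" ", SPACE))


def merge_pair(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` with the joined symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def train_bpe(sentences, num_merges):
    """Learn up to `num_merges` merge rules from raw, untokenized sentences."""
    corpus = [to_symbols(s) for s in sentences]
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair in the current corpus.
        counts = Counter()
        for symbols in corpus:
            counts.update(zip(symbols, symbols[1:]))
        if not counts:
            break
        # Pick the most frequent pair; ties are broken by the pair itself
        # so that training is deterministic.
        best = max(counts, key=lambda p: (counts[p], p))
        merges.append(best)
        corpus = [merge_pair(symbols, best) for symbols in corpus]
    return merges


def encode(sentence, merges):
    """Apply the learned merges, in training order, to a new sentence."""
    symbols = to_symbols(sentence)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


def decode(pieces):
    """Join pieces and restore spaces; no language-specific rules needed."""
    return "".join(pieces).replace(SPACE, " ")


if __name__ == "__main__":
    merges = train_bpe(["low lower lowest", "new newer newest"], num_merges=10)
    pieces = encode("lower newest", merges)
    print(pieces)
    print(decode(pieces))
```

In this sketch the number of merges plays the role of the predetermined vocabulary size: each merge adds exactly one new symbol, so the final vocabulary is the set of initial characters plus at most `num_merges` learned pieces.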

