SentencePiece implements subword units (e.g., byte-pair encoding (BPE) [Sennrich et al.] and the unigram language model [Kudo]) with the extension of direct training from raw sentences. SentencePiece allows us to build a purely end-to-end system that does not depend on language-specific pre- or postprocessing.
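
To make the idea of training subword units directly from raw sentences concrete, here is a minimal toy sketch of BPE merge learning in plain Python. It is not SentencePiece's implementation or API: the function names (`learn_merges`, `encode`, `decode`) are hypothetical, and the tie-breaking rule is an arbitrary choice made for determinism. The one detail borrowed from SentencePiece is that whitespace is treated as an ordinary symbol (U+2581), so no language-specific tokenizer is needed before training.

```python
from collections import Counter

SPACE = "\u2581"  # the meta symbol SentencePiece uses to represent whitespace


def apply_merge(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_merges(sentences, num_merges):
    """Learn up to `num_merges` merge rules from raw, untokenized sentences."""
    # Start from characters; spaces become a regular symbol.
    corpus = [list(s.replace(" ", SPACE)) for s in sentences]
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for symbols in corpus:
            pairs.update(zip(symbols, symbols[1:]))
        if not pairs:
            break
        # Pick the most frequent pair (ties broken lexicographically; an assumption).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        corpus = [apply_merge(symbols, best) for symbols in corpus]
    return merges


def encode(text, merges):
    """Split text into subword pieces by replaying the learned merges in order."""
    symbols = list(text.replace(" ", SPACE))
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


def decode(pieces):
    """Invert `encode`: concatenate pieces and restore spaces."""
    return "".join(pieces).replace(SPACE, " ")
```

Because whitespace is kept inside the pieces, `decode(encode(text, merges))` returns the original text (as long as the text does not itself contain U+2581), which is what makes language-specific postprocessing unnecessary in this sketch.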

Uploaded	Mon Mar 31 02:25:49 2025
md5 checksum	54d65074e36f5e0f66fdbc28b6924800
arch	x86_64
build	py39hd09550d_0
depends	libgcc-ng >=7.5.0, libstdcxx-ng >=7.5.0, python >=3.9,<3.10.0a0
license	Apache-2.0
license_family	Apache
md5	54d65074e36f5e0f66fdbc28b6924800
name	sentencepiece
platform	linux
sha1	44df3e7845c8ed02790304a57e8f87b9a36eb554
sha256	6e3fae5b9153c5ae91a4b7778dc74a460b0af6ea5456889002aa4ff0a294eaa2
size	2693190
subdir	linux-64
timestamp	1635181246668
version	0.1.95

linux-64/sentencepiece-0.1.95-py39hd09550d_0.conda
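
The checksums listed above can be used to confirm that a downloaded package file is intact. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names (`compute_digest`, `verify_package`) are hypothetical, and the local path assumes the file was saved to the current directory under its original name.

```python
import hashlib
import os


def compute_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_package(path, expected_hex, algorithm="sha256"):
    """Return True if the file's digest matches the published checksum."""
    return compute_digest(path, algorithm) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed local download location (hypothetical); sha256 taken from the listing above.
    local_path = "sentencepiece-0.1.95-py39hd09550d_0.conda"
    published_sha256 = "6e3fae5b9153c5ae91a4b7778dc74a460b0af6ea5456889002aa4ff0a294eaa2"
    if os.path.exists(local_path):
        print("sha256 matches:", verify_package(local_path, published_sha256))
```

Passing `algorithm="md5"` or `algorithm="sha1"` checks the other listed digests in the same way, though sha256 is the stronger choice when it is available.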