
sentence segmentation and word tokenization toolkit

copied from cf-staging / syntok

Installers

  • noarch v1.4.2

conda install

To install this package run one of the following:
conda install conda-forge::syntok

Description

Syntok is the successor to segtok, an earlier and very similar tool, but it has evolved significantly: it offers better segmentation and tokenization quality and higher throughput (syntok can segment documents at a rate of about 100k tokens per second). For example, if a sentence terminal marker is not followed by a spacing character, segtok cannot detect it as a terminal marker, whereas syntok segments that case correctly because it tokenizes first and performs segmentation afterwards.
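The difference between the two orderings can be illustrated with a small standard-library sketch. This is not syntok's or segtok's actual implementation; the function names (`tokenize`, `segment`, `split_on_spaced_terminals`), the regular expressions, and the sample text are assumptions made purely for illustration.

```python
import re

# Toy tokenizer (an assumption, not syntok's rules): a token is either a run
# of word characters or a single non-space, non-word character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
TERMINALS = {".", "!", "?"}


def tokenize(text):
    """Split text into word and punctuation tokens, ignoring whitespace."""
    return TOKEN_RE.findall(text)


def segment(tokens):
    """Group tokens into sentences, closing a sentence at each terminal token."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in TERMINALS:
            sentences.append(current)
            current = []
    if current:  # keep trailing tokens that lack a terminal marker
        sentences.append(current)
    return sentences


def split_on_spaced_terminals(text):
    """Baseline: split only where a terminal marker is followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text)


text = "It works.Even without a space. Done!"
token_first = segment(tokenize(text))
baseline = split_on_spaced_terminals(text)

print(len(token_first), "sentences when tokenizing first")
print(len(baseline), "sentences when relying on trailing whitespace")
```

Because the terminal marker is already a separate token by the time segmentation runs, the missing space after "works." no longer matters. The sketch deliberately ignores abbreviations, decimal numbers, and similar cases that a real segmenter has to handle.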

