syntok

sentence segmentation and word tokenization toolkit

Installation

To install this package with conda, run:

$ conda install conda-forge::syntok

Description

Syntok is the successor to segtok, an earlier and very similar tool, but it has evolved significantly, offering better segmentation and tokenization quality as well as higher throughput: syntok can segment documents at a rate of about 100k tokens per second. For example, if a sentence terminal marker is not followed by a whitespace character, segtok cannot detect it as a terminal marker, whereas syntok handles that case without problems, because it tokenizes the text first and performs segmentation afterwards.
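To illustrate why tokenizing before segmenting helps with the missing-whitespace case, here is a minimal toy sketch in Python. It is not syntok's implementation or API: `tokenize`, `segment`, `whitespace_split`, the token regex, and the rule "a terminal marker followed by a capitalized token ends a sentence" are all hypothetical simplifications chosen for illustration.

```python
import re
from typing import List

# Hypothetical toy tokenizer (NOT syntok's rules): a token is either a run of
# word characters or a single character that is neither a word char nor space.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
TERMINALS = {".", "!", "?"}


def tokenize(text: str) -> List[str]:
    """Split text into word and punctuation tokens, discarding whitespace."""
    return TOKEN_RE.findall(text)


def segment(tokens: List[str]) -> List[List[str]]:
    """Group tokens into sentences. A terminal marker ends a sentence when it is
    the last token or the next token starts with an uppercase letter."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for i, tok in enumerate(tokens):
        current.append(tok)
        is_last = i == len(tokens) - 1
        if tok in TERMINALS and (is_last or tokens[i + 1][0].isupper()):
            sentences.append(current)
            current = []
    if current:  # trailing tokens that never hit a terminal marker
        sentences.append(current)
    return sentences


def whitespace_split(text: str) -> List[str]:
    """Naive splitter: only breaks after a terminal marker followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


text = "The build passed.Deploy it now! Then rest."

# Whitespace-based splitting misses the boundary in "passed.Deploy".
print(whitespace_split(text))

# Tokenize first, then segment: all three boundaries are found.
for sentence in segment(tokenize(text)):
    print(" ".join(sentence))
```

The whitespace-based splitter keeps "The build passed.Deploy it now!" as a single piece, while the token-based segmenter yields three sentences. A production segmenter must also handle cases this toy rule gets wrong, such as abbreviations like "Dr. Smith", which it would incorrectly split.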

About

Last Updated

Jan 31, 2022 at 15:06

License

MIT

Supported Platforms

noarch