CMD + K

tweet-preprocessor

Community

Installation

To install this package, run one of the following:

Pip
$pip install -i https://pypi.anaconda.org/berber/simple tweet-preprocessor

Usage Tracking

0.5.0
1 / 8 versions selected
Downloads (Last 6 months): 0

Description

Preprocessor ============

    .. image:: https://travis-ci.org/s/preprocessor.svg?branch=master

    Preprocessor is a preprocessing library for tweet data written in Python.

    When building Machine Learning systems based on tweet data, a preprocessing is required. This library makes it easy to clean, parse or tokenize the tweets.

    Features
    ========
    Currently supports cleaning, tokenizing and parsing:

    - URLs
    - Hashtags
    - Mentions
    - Reserved words (RT, FAV)
    - Emojis
    - Smileys

    Supports Python 2.7 and 3.3+

    Usage
    =====

    Basic cleaning:
    ^^^^^^^^^^^^^^^

    .. code-block:: python

        >>> import preprocessor as p
        >>> p.clean('Preprocessor is #awesome 👍 https://github.com/s/preprocessor')
        'Preprocessor is'

    Tokenizing:
    ^^^^^^^^^^^

    .. code-block:: python

        >>> p.tokenize('Preprocessor is #awesome 👍 https://github.com/s/preprocessor')
        'Preprocessor is $HASHTAG$ $EMOJI$ $URL$'

    Parsing:
    ^^^^^^^^

    .. code-block:: python

        >>> parsed_tweet = p.parse('Preprocessor is #awesome https://github.com/s/preprocessor')
        <preprocessor.parse.ParseResult instance at 0x10f430758>
        >>> parsed_tweet.urls
        [(25:58) => https://github.com/s/preprocessor]
        >>> parsed_tweet.urls[0].start_index
        25
        >>> parsed_tweet.urls[0].match
        'https://github.com/s/preprocessor'
        >>> parsed_tweet.urls[0].end_index
        58

    Fully customizable:
    ^^^^^^^^^^^^^^^^^^^

    .. code-block:: python

        >>> p.set_options(p.OPT.URL, p.OPT.EMOJI)
        >>> p.clean('Preprocessor is #awesome 👍 https://github.com/s/preprocessor')
        'Preprocessor is #awesome'

    Preprocessor will go through all of the options by default unless you specify some options.

    Available Options:
    ^^^^^^^^^^^^^^^^^^
    ==============  ======================
    Option Name     Option Short Code
    ==============  ======================
    URL             :code:`p.OPT.URL`
    Mention         :code:`p.OPT.MENTION`
    Hashtag         :code:`p.OPT.HASHTAG`
    Reserved Words  :code:`p.OPT.RESERVED`
    Emoji           :code:`p.OPT.EMOJI`
    Smiley          :code:`p.OPT.SMILEY`
    Number          :code:`p.OPT.NUMBER`
    ==============  ======================


    Installation
    ===================
    using pip:

    .. code-block:: bash

        $ pip install tweet-preprocessor

    using manual installation:

    .. code-block:: bash

        $ python setup.py build
        $ python setup.py install

About

Last Updated

May 7, 2019 at 13:34

Supported Platforms

noarch