ramp

Installation

To install this package, run the following command:

Pip
$ pip install -i https://pypi.anaconda.org/saundramonroe/simple ramp

Version

0.1.4

Downloads (Last 6 months): 0

Description

Ramp - Rapid Machine Learning Prototyping
=========================================

    Ramp is a python module for rapid prototyping of machine learning
    solutions. It is essentially a [pandas](http://pandas.pydata.org)
    wrapper around various python machine learning and statistics libraries
    ([scikit-learn](http://scikit-learn.org), [rpy2](http://rpy.sourceforge.net/rpy2.html), etc.),
    providing a simple, declarative syntax for
    exploring features, algorithms and transformations quickly and
    efficiently.

    Documentation: http://ramp.readthedocs.org

    **Why Ramp?**

     *  **Clean, declarative syntax**

        No more hackish one-off spaghetti scripts!

     *  **Complex feature transformations**

        Chain and combine features:
```python
Normalize(Log('x'))
Interactions([Log('x1'), (F('x2') + F('x3')) / 2])
```
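        Outside of Ramp, the first chain above corresponds to roughly this plain-Python computation (a sketch with made-up sample values, assuming `Normalize` means a z-score; this is not Ramp's implementation):

```python
import math
import statistics

# made-up sample values for the 'x' column
x = [1.0, 10.0, 100.0, 1000.0]

# Log('x'): elementwise natural log
logged = [math.log(v) for v in x]

# Normalize(...): z-score the result (zero mean, unit variance)
mu = statistics.mean(logged)
sigma = statistics.pstdev(logged)
normalized = [(v - mu) / sigma for v in logged]
```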

        Reduce feature dimension:
```python
DimensionReduction([F('x%d' % i) for i in range(100)], decomposer=PCA(n_components=3))
```

        Incorporate residuals or predictions to blend with other models:
```python
Residuals(config_model1) + Predictions(config_model2)
```
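        The residuals/predictions pattern is a boosting-style blend: fit one model, then fit a second model to what the first one missed. A minimal stdlib sketch of the arithmetic (a constant baseline plus a least-squares slope fit on its residuals; this is not Ramp's API):

```python
# Toy data: y is roughly linear in x (made-up values)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Model 1: predict the mean of y (a constant baseline)
mean_y = sum(ys) / len(ys)
residuals = [y - mean_y for y in ys]  # what model 1 missed

# Model 2: least-squares slope fit on the residuals of model 1
xc = [x - sum(xs) / len(xs) for x in xs]  # centered x
slope = sum(a * b for a, b in zip(xc, residuals)) / sum(a * a for a in xc)

# Blend: model 1's prediction plus model 2's correction
blended = [mean_y + slope * c for c in xc]
```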

        Any feature that uses the target ("y") variable will automatically respect the
        current training and test sets.
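        This guards against leakage in target-dependent features such as target (mean) encoding. A stdlib sketch of the guarantee, with hypothetical data: the per-category means are fit on the training rows only, then applied to every row, so test targets never leak into the feature:

```python
# Toy rows of (category, target); rows 0-3 are "train", 4-5 are "test"
rows = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 12.0), ("a", 2.0), ("b", 11.0)]
train_idx = [0, 1, 2, 3]

# Fit: per-category mean of the target, using ONLY training rows
sums, counts = {}, {}
for i in train_idx:
    cat, y = rows[i]
    sums[cat] = sums.get(cat, 0.0) + y
    counts[cat] = counts.get(cat, 0) + 1
means = {cat: sums[cat] / counts[cat] for cat in sums}

# Transform: apply the training-set statistics to all rows
encoded = [means[cat] for cat, _ in rows]
```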


     *  **Caching**

        Ramp caches all the features and models it computes to disk in the
        fast HDF5 format (or elsewhere, if you prefer), so nothing is
        recomputed unnecessarily. Results are stored and can be retrieved,
        compared, blended, and reused between runs.
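        The caching idea, stripped to its core: key each computed result by a hash of its feature specification, and reload from disk on a repeat request. (Ramp itself uses HDF5 via PyTables; this sketch substitutes pickle files to stay self-contained.)

```python
import hashlib
import os
import pickle
import tempfile

CACHE_DIR = tempfile.mkdtemp()
calls = {"count": 0}  # track how often we actually compute

def cached_feature(spec, compute):
    """Return the feature for `spec`, computing it at most once."""
    key = hashlib.sha1(repr(spec).encode()).hexdigest()
    path = os.path.join(CACHE_DIR, key + ".pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)  # cache hit: no recomputation
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)  # store for later runs
    return result

def expensive():
    calls["count"] += 1
    return [v * v for v in range(5)]

first = cached_feature(("Square", "x"), expensive)
second = cached_feature(("Square", "x"), expensive)  # served from disk
```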

     *  **Easy extensibility**

        Ramp has a simple API, allowing you to plug in estimators from
        scikit-learn, rpy2 and elsewhere, or easily build your own feature
        transformations, metrics, feature selectors, reporters, or estimators.
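        As an illustration of the kind of extension point this implies, here is a hypothetical miniature of the pattern: a base feature with an `apply()` hook that a custom transformation overrides. The class names here are invented for this sketch and are not Ramp's actual API:

```python
class Feature:
    """Hypothetical base: pulls one column out of a list of row dicts."""
    def __init__(self, column):
        self.column = column

    def apply(self, rows):
        return [row[self.column] for row in rows]

class Clip(Feature):
    """Custom transformation: clamp values into [lo, hi]."""
    def __init__(self, column, lo, hi):
        super().__init__(column)
        self.lo, self.hi = lo, hi

    def apply(self, rows):
        return [min(max(v, self.lo), self.hi) for v in super().apply(rows)]

rows = [{"x": -5.0}, {"x": 0.5}, {"x": 9.0}]
clipped = Clip("x", 0.0, 1.0).apply(rows)
```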


    ## Quick start
    [Getting started with Ramp: Classifying insults](http://www.kenvanharen.com/2012/11/getting-started-with-ramp-detecting.html)

    Or, the quintessential Iris example:
```python
import pandas
from ramp import *
import urllib2  # Python 2; on Python 3, use urllib.request instead
import sklearn.ensemble
import sklearn.linear_model
from sklearn import decomposition


# fetch and clean iris data from UCI
data = pandas.read_csv(urllib2.urlopen(
    "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"))
data = data.drop([149])  # bad line
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
data.columns = columns


# all features
features = [FillMissing(f, 0) for f in columns[:-1]]

# features, log-transformed features, and interaction terms
expanded_features = (
    features +
    [Log(F(f) + 1) for f in features] +
    [
        F('sepal_width') ** 2,
        combo.Interactions(features),
    ]
)


# Define several models and feature sets to explore,
# run 5-fold cross-validation on each, and print the results.
# We define 2 models and 4 feature sets, so this will be
# 4 * 2 = 8 models tested.
shortcuts.cv_factory(
    data=data,

    target=[AsFactor('class')],
    metrics=[[metrics.GeneralizedMCC()]],

    # Try out two algorithms
    model=[
        sklearn.ensemble.RandomForestClassifier(n_estimators=20),
        sklearn.linear_model.LogisticRegression(),
    ],

    # and 4 feature sets
    features=[
        expanded_features,

        # Feature selection
        [trained.FeatureSelector(
            expanded_features,
            # use random forest's importance to trim
            selectors.RandomForestSelector(classifier=True),
            target=AsFactor('class'),  # target to use
            n_keep=5,  # keep top 5 features
        )],

        # Reduce feature dimension (pointless on this dataset)
        [combo.DimensionReduction(expanded_features,
                                  decomposer=decomposition.PCA(n_components=4))],

        # Normalized features
        [Normalize(f) for f in expanded_features],
    ]
)
```


    ## Status
    Ramp is currently very alpha, so expect bugs, bug fixes, and API changes.

    ## Requirements
     * Numpy
     * Scipy
     * Pandas
     * PyTables
     * scikit-learn

    ## Author
    Ken Van Haren. Email with feedback/questions: [email protected]

About

Last Updated

Mar 24, 2023 at 19:57

Supported Platforms

noarch