Ramp - Rapid Machine Learning Prototyping
=========================================
Ramp is a Python module for rapid prototyping of machine learning
solutions. It is essentially a [pandas](http://pandas.pydata.org)
wrapper around various Python machine learning and statistics libraries
([scikit-learn](http://scikit-learn.org), [rpy2](http://rpy.sourceforge.net/rpy2.html), etc.),
providing a simple, declarative syntax for exploring features,
algorithms, and transformations quickly and efficiently.
Documentation: http://ramp.readthedocs.org
**Why Ramp?**
* **Clean, declarative syntax**
No more hackish one-off spaghetti scripts!
* **Complex feature transformations**
Chain and combine features:
```python
Normalize(Log('x'))
Interactions([Log('x1'), (F('x2') + F('x3')) / 2])
```
Reduce feature dimension:
```python
DimensionReduction([F('x%d' % i) for i in range(100)], decomposer=PCA(n_components=3))
```
Incorporate residuals or predictions to blend with other models:
```python
Residuals(config_model1) + Predictions(config_model2)
```
Any feature that uses the target ("y") variable will automatically respect the
current training and test sets.
* **Caching**
Ramp caches every feature and model it computes and stores them on disk
in the fast HDF5 format (or elsewhere if you prefer), so nothing is
recomputed unnecessarily. Results can be retrieved, compared, blended,
and reused between runs.
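The real cache lives in HDF5 via PyTables, but the idea can be sketched with nothing beyond the standard library: key each computed feature by its spec and recompute only on a cache miss (a simplified sketch, not Ramp's actual storage code):

```python
import hashlib
import math
import os
import pickle
import tempfile

# Key each computed feature by a hash of its spec; on a hit, load the
# stored result from disk instead of recomputing.
class FeatureCache:
    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)
        self.computations = 0  # how many times we actually computed

    def get(self, spec, compute):
        key = hashlib.sha1(repr(spec).encode()).hexdigest()
        fname = os.path.join(self.path, key + '.pkl')
        if os.path.exists(fname):
            with open(fname, 'rb') as f:
                return pickle.load(f)
        self.computations += 1
        result = compute()
        with open(fname, 'wb') as f:
            pickle.dump(result, f)
        return result

cache = FeatureCache(tempfile.mkdtemp())
spec = ('Log', 'x', (1.0, 2.0, 4.0))
first = cache.get(spec, lambda: [math.log(v) for v in spec[2]])
second = cache.get(spec, lambda: [math.log(v) for v in spec[2]])  # cache hit
```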
* **Easy extensibility**
Ramp has a simple API, allowing you to plug in estimators from
scikit-learn, rpy2 and elsewhere, or easily build your own feature
transformations, metrics, feature selectors, reporters, or estimators.
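For instance, Ramp drives estimators through the scikit-learn style `fit`/`predict` interface, so (assuming only those two methods are called, which is not guaranteed by the source above) even a hand-rolled model can be dropped in:

```python
# A deliberately tiny custom estimator: it just predicts the
# training-set mean. Anything exposing this fit/predict shape can
# stand in for a scikit-learn model.
class MeanRegressor:
    def fit(self, x, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, x):
        return [self.mean_ for _ in x]

model = MeanRegressor().fit([[0], [1], [2]], [1.0, 2.0, 6.0])
preds = model.predict([[5], [6]])
```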
## Quick start
[Getting started with Ramp: Classifying insults](http://www.kenvanharen.com/2012/11/getting-started-with-ramp-detecting.html)
Or, the quintessential Iris example:
```python
import pandas
from ramp import *
import urllib2
import sklearn
from sklearn import decomposition, ensemble, linear_model

# fetch and clean iris data from UCI
data = pandas.read_csv(urllib2.urlopen(
    "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"))
data = data.drop([149])  # bad line
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
data.columns = columns

# all features
features = [FillMissing(f, 0) for f in columns[:-1]]

# features, log transformed features, and interaction terms
expanded_features = (
    features +
    [Log(F(f) + 1) for f in features] +
    [
        F('sepal_width') ** 2,
        combo.Interactions(features),
    ]
)

# Define several models and feature sets to explore,
# run 5 fold cross-validation on each and print the results.
# We define 2 models and 4 feature sets, so this will be
# 4 * 2 = 8 models tested.
shortcuts.cv_factory(
    data=data,
    target=[AsFactor('class')],
    metrics=[[metrics.GeneralizedMCC()]],

    # Try out two algorithms
    model=[
        sklearn.ensemble.RandomForestClassifier(n_estimators=20),
        sklearn.linear_model.LogisticRegression(),
    ],

    # and 4 feature sets
    features=[
        expanded_features,

        # Feature selection
        [trained.FeatureSelector(
            expanded_features,
            # use random forest's importance to trim
            selectors.RandomForestSelector(classifier=True),
            target=AsFactor('class'),  # target to use
            n_keep=5,  # keep top 5 features
        )],

        # Reduce feature dimension (pointless on this dataset)
        [combo.DimensionReduction(expanded_features,
                                  decomposer=decomposition.PCA(n_components=4))],

        # Normalized features
        [Normalize(f) for f in expanded_features],
    ]
)
```
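For orientation, a single cell of that grid (the random forest on one feature set) is roughly equivalent to this plain scikit-learn cross-validation, written here with current scikit-learn APIs (the snippet above targets Python 2 and an older scikit-learn):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# 5-fold cross-validation of one model on one feature set: the four
# raw iris measurements, without Ramp's feature expansion or caching.
X, y = load_iris(return_X_y=True)
scores = cross_val_score(RandomForestClassifier(n_estimators=20), X, y, cv=5)
```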
## Status
Ramp is currently very much alpha software, so expect bugs, bug fixes, and API changes.
## Requirements
* NumPy
* SciPy
* pandas
* PyTables
* scikit-learn
## Author
Ken Van Haren. Email with feedback/questions: [email protected]
Last updated: Mar 24, 2023