brittainhard / packages / fancyimpute 0.2.0

0

Matrix completion and feature imputation algorithms

License: Apache Software License
Home: https://github.com/hammerlab/fancyimpute
1452 total downloads
Last upload: 7 years and 8 months ago

Installers

linux-64 v0.2.0

conda install

To install this package run one of the following:
conda install brittainhard::fancyimpute

Description

fancyimpute

A variety of matrix completion and imputation algorithms implemented in Python.

Usage

.. code:: python

from fancyimpute import BiScaler, KNN, NuclearNormMinimization, SoftImpute

# X is the complete data matrix
# X_incomplete has the same values as X except a subset have been replace with NaN

# Use 3 nearest rows which have a feature to fill in each row's missing features
X_filled_knn = KNN(k=3).complete(X_incomplete)

# matrix completion using convex optimization to find low-rank solution
# that still matches observed values. Slow!
X_filled_nnm = NuclearNormMinimization().complete(X_incomplete)

# Instead of solving the nuclear norm objective directly, instead
# induce sparsity using singular value thresholding
X_filled_softimpute = SoftImpute().complete(X_incomplete_normalized)

# print mean squared error for the three imputation methods above
nnm_mse = ((X_filled_nnm[missing_mask] - X[missing_mask]) ** 2).mean()
print("Nuclear norm minimization MSE: %f" % nnm_mse)

softImpute_mse = ((X_filled_softimpute[missing_mask] - X[missing_mask]) ** 2).mean()
print("SoftImpute MSE: %f" % softImpute_mse)

knn_mse = ((X_filled_knn[missing_mask] - X[missing_mask]) ** 2).mean()
print("knnImpute MSE: %f" % knn_mse)

Algorithms

SimpleFill: Replaces missing entries with the mean or median of each column.
KNN: Nearest neighbor imputations which weights samples using the mean squared difference on features for which two rows both have observed data.
SoftImpute: Matrix completion by iterative soft thresholding of SVD decompositions. Inspired by the softImpute <https://web.stanford.edu/~hastie/swData/softImpute/vignette.html>_ package for R, which is based on Spectral Regularization Algorithms for Learning Large Incomplete Matrices <http://web.stanford.edu/~hastie/Papers/mazumder10a.pdf>_ by Mazumder et. al.
IterativeSVD: Matrix completion by iterative low-rank SVD decomposition. Should be similar to SVDimpute from Missing value estimation methods for DNA microarrays <http://www.ncbi.nlm.nih.gov/pubmed/11395428>__ by Troyanskaya et. al.
MICE: Reimplementation of Multiple Imputation by Chained Equations <http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/>__.
MatrixFactorization: Direct factorization of the incomplete matrix into low-rank U and V, with an L1 sparsity penalty on the elements of U and an L2 penalty on the elements of V. Solved by gradient descent.
NuclearNormMinimization: Simple implementation of Exact Matrix Completion via Convex Optimization <http://statweb.stanford.edu/~candes/papers/MatrixCompletion.pdf>_ by Emmanuel Candes and Benjamin Recht using cvxpy <http://www.cvxpy.org/en/latest/>_. Too slow for large matrices.
BiScaler: Iterative estimation of row/column means and standard deviations to get doubly normalized matrix. Not guaranteed to converge but works well in practice. Taken from Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares <http://arxiv.org/abs/1410.2596>__.

.. |Build Status| image:: https://travis-ci.org/hammerlab/fancyimpute.svg?branch=master :target: https://travis-ci.org/hammerlab/fancyimpute .. |Coverage Status| image:: https://coveralls.io/repos/github/hammerlab/fancyimpute/badge.svg?branch=master :target: https://coveralls.io/github/hammerlab/fancyimpute?branch=master .. |DOI| image:: https://zenodo.org/badge/doi/10.5281/zenodo.51773.svg :target: http://dx.doi.org/10.5281/zenodo.51773

brittainhard / packages / fancyimpute 0.2.0 0