pyminhash


Efficient MinHashing


Installation

To install this package from conda-forge, run:

conda install conda-forge::pyminhash


Description

MinHashing is an efficient way to find similar records in a dataset based on their Jaccard similarity. PyMinHash implements efficient MinHashing for pandas DataFrames. See the documentation linked below or the example notebook to get started.
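MinHashing rests on one property: under a random permutation of the token universe, the probability that two sets have the same minimum element equals their Jaccard similarity |A ∩ B| / |A ∪ B|. Repeating this with many hash functions and counting agreements gives an estimate of that similarity. The sketch below illustrates the idea in plain Python. It is not PyMinHash's implementation or API: the word tokenisation, the universal hash family standing in for random permutations, and the function names (`tokens`, `make_hash_params`, `signature`, `estimate_jaccard`) are all hypothetical choices made for illustration.

```python
import random
import re
import zlib

# Hypothetical sketch, not PyMinHash's API: a Mersenne prime for the hash family.
PRIME = (1 << 61) - 1


def tokens(text):
    """Split a string into a set of lowercase word tokens (assumed tokenisation)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def make_hash_params(n_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n_hashes)]


def signature(token_set, params):
    """MinHash signature: for each hash function, the minimum hash over all tokens."""
    # Map each token to a stable integer first (crc32 is deterministic across runs).
    base = [zlib.crc32(t.encode("utf-8")) for t in token_set]
    if not base:
        # Empty input: use a sentinel so two empty sets still compare as equal.
        return [PRIME] * len(params)
    return [min((a * x + b) % PRIME for x in base) for a, b in params]


def estimate_jaccard(sig1, sig2):
    """Fraction of hash functions whose minimums agree approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


if __name__ == "__main__":
    params = make_hash_params(200, seed=42)
    a = tokens("Jolly Joker Amsterdam BV")
    b = tokens("Jolly Joker BV")
    print("exact:", jaccard(a, b))  # 3 shared tokens out of 4 distinct -> 0.75
    print("estimate:", estimate_jaccard(signature(a, params), signature(b, params)))
```

More hash functions reduce the variance of the estimate at the cost of longer signatures. Comparing short signatures instead of full token sets is what makes the approach efficient on large tables.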

Developed by Frits Hermans

PyPI: https://pypi.org/project/PyMinHash/

About

Last Updated

Jan 6, 2023 at 19:56

License

MIT

Total Downloads

15.0K

Version Downloads

2.1K

Supported Platforms

noarch

Home

https://github.com/fritshermans/pyminhash

GitHub Repository

https://github.com/fritshermans/pyminhash

Documentation

https://pyminhash.readthedocs.io/en/latest/