pyminhash
Efficient MinHashing
MinHashing is an efficient technique for finding similar records in a dataset by approximating their Jaccard similarity. PyMinHash implements minhashing for pandas DataFrames. See the instructions below or look at the example notebook to get started.
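To make the idea concrete, here is a minimal, library-independent sketch of minhashing in plain Python. It does not use or reproduce the PyMinHash API; the helper names (`shingles`, `jaccard`, `minhash_signature`, `estimated_jaccard`), the choice of character 3-grams, and the seeded BLAKE2b hash are all assumptions made for illustration. The fraction of positions where two signatures agree estimates the Jaccard similarity of the underlying token sets.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of character k-grams of a lowercased string."""
    text = text.lower()
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(token, seed):
    """Deterministic 64-bit hash of a token under a given seed (illustrative choice)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, n_hashes=128):
    """For each seeded hash function, keep the minimum hash over all tokens."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(n_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two signatures agree."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Two near-duplicate records and one unrelated record (made-up example data).
a = shingles("Frits Hermans")
b = shingles("Fritz Hermans")
c = shingles("completely different")

sig_a, sig_b, sig_c = (minhash_signature(s) for s in (a, b, c))

print("exact a~b:", jaccard(a, b), "estimated a~b:", estimated_jaccard(sig_a, sig_b))
print("exact a~c:", jaccard(a, c), "estimated a~c:", estimated_jaccard(sig_a, sig_c))
```

Signatures have a fixed length regardless of record size, so comparing them is cheap; more hash functions give a tighter estimate at the cost of more hashing work.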
Developed by Frits Hermans
Summary
Efficient MinHashing
Last Updated
Jan 6, 2023 at 19:56
License
MIT
Total Downloads
15.0K
Version Downloads
2.1K
GitHub Repository
https://github.com/fritshermans/pyminhash
Documentation
https://pyminhash.readthedocs.io/en/latest/