Deduplicate CSVs

A robust, no-code solution for complex deduplication needs. Start with a table or flat file, or use SQL to query specific columns (full SQL keyword support). Deduplicate on the values of multiple columns.

    ./dedupercli -r
    Successfully activated license file.
    ./dedupercli dedupe -sj CsvToCsvIn default_ds Sacramentorealestatetransactions -hc street,city,state,zip,price -TJtype=csv -TJjndiName=CsvToCsvOut -TJcontextName=default_ds -TJdeleteIfExists=true
    Successfully dedupified new file saved: /bin/data/DedupifiedCsvFile.csv
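
To make the idea of deduplicating on multiple column values concrete, here is a minimal Python sketch. It is not Dedupify's implementation: the function name `dedupe_rows`, the whitespace and case normalization, and the sample rows are all assumptions made for illustration. It only reuses the column names from the command above.

```python
import csv
import io

def dedupe_rows(rows, key_columns):
    """Keep the first row seen for each combination of key column values.

    Hypothetical helper: values are trimmed and lower-cased before comparison,
    which is an assumption, not documented Dedupify behavior.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[col].strip().lower() for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Invented sample data, shaped like a real-estate transactions CSV.
sample = io.StringIO(
    "street,city,state,zip,price\n"
    "1 Main St,Sacramento,CA,95814,100000\n"
    "1 Main St,Sacramento,CA,95814,100000\n"
    "2 Oak Ave,Sacramento,CA,95815,150000\n"
)
rows = list(csv.DictReader(sample))
unique_rows = dedupe_rows(rows, ["street", "city", "state", "zip", "price"])
print(len(rows), "rows in,", len(unique_rows), "rows out")
```

Choosing fewer key columns makes the match looser: with only `city` as the key, all three sample rows collapse into one.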

Things you can do with little effort using Dedupify:


Explore data

Dedupify accesses data via JDBC and supports SQL.
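
As a rough illustration of exploring data by querying specific columns with SQL, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a JDBC data source. The table name `transactions`, its columns, and the rows are invented for the example and say nothing about how Dedupify itself connects.

```python
import sqlite3

# In-memory SQLite database standing in for any SQL-capable source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (street TEXT, city TEXT, zip TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("1 Main St", "Sacramento", "95814", 100000),
        ("2 Oak Ave", "Sacramento", "95815", 150000),
    ],
)

# Select only the columns of interest and filter with a parameterized query.
cursor = conn.execute(
    "SELECT street, zip FROM transactions WHERE price > ? ORDER BY street",
    (120000,),
)
selected = cursor.fetchall()
print(selected)
conn.close()
```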


Profile data

Understand and document how big your duplicate issue is.
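
One way to size a duplicate problem is to count how many rows share each key. The sketch below is an assumption about what such a profile could report, not a description of Dedupify's output; `profile_duplicates` and the sample rows are hypothetical.

```python
from collections import Counter

def profile_duplicates(rows, key_columns):
    """Summarize how many rows repeat an already-seen key (hypothetical helper)."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    total = sum(counts.values())
    redundant_rows = total - len(counts)  # rows beyond the first of each key
    return {
        "total_rows": total,
        "distinct_keys": len(counts),
        "duplicate_groups": sum(1 for n in counts.values() if n > 1),
        "redundant_rows": redundant_rows,
        "redundant_pct": round(100.0 * redundant_rows / total, 1) if total else 0.0,
    }

# Invented sample rows: one address appears three times.
rows = [
    {"street": "1 Main St", "zip": "95814"},
    {"street": "1 Main St", "zip": "95814"},
    {"street": "1 Main St", "zip": "95814"},
    {"street": "2 Oak Ave", "zip": "95815"},
]
report = profile_duplicates(rows, ["street", "zip"])
print(report)
```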


Clean data

Export cleaned data to a flat file or to any database available via JDBC.



Identify duplicates in one table and find them in another table (MD5 signatures).
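
A minimal sketch of how MD5 signatures could link duplicates across two tables: hash the selected column values of each row, collect the signatures that occur more than once in the first table, then look those signatures up in the second. The function names, the normalization step, the field separator, and the sample tables are assumptions for illustration, not Dedupify's actual signature scheme.

```python
import hashlib

def row_signature(row, key_columns):
    """MD5 hex digest of the normalized key values (hypothetical scheme)."""
    # The unit-separator character keeps ("ab", "c") distinct from ("a", "bc").
    normalized = "\x1f".join(str(row[c]).strip().lower() for c in key_columns)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def duplicate_signatures(rows, key_columns):
    """Return the signatures that appear more than once in rows."""
    seen, dupes = set(), set()
    for row in rows:
        sig = row_signature(row, key_columns)
        if sig in seen:
            dupes.add(sig)
        else:
            seen.add(sig)
    return dupes

def find_in_other(rows, key_columns, signatures):
    """Return the rows whose signature is in the given set."""
    return [r for r in rows if row_signature(r, key_columns) in signatures]

# Invented sample tables.
table_a = [
    {"street": "1 Main St", "zip": "95814"},
    {"street": "1 main st ", "zip": "95814"},
    {"street": "2 Oak Ave", "zip": "95815"},
]
table_b = [
    {"street": "1 MAIN ST", "zip": "95814"},
    {"street": "9 Elm Rd", "zip": "95816"},
]
keys = ["street", "zip"]
dupes = duplicate_signatures(table_a, keys)
matches = find_in_other(table_b, keys, dupes)
print(len(dupes), "duplicate signature(s);", len(matches), "match(es) in table_b")
```

Comparing fixed-length signatures instead of full rows keeps the lookup cheap and lets the two tables have different extra columns, as long as the key columns agree.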