Deduplicate CSVs

A robust, no-code solution for complex deduplication needs. Start with a table or flat file, or use SQL to query specific columns (full SQL keyword support). Deduplicate using the values of multiple columns.

./dedupercli -r
Successfully activated license file.

./dedupercli dedupe -sj CsvToCsvIn default_ds Sacramentorealestatetransactions -hc street,city,state,zip,price -TJtype=csv -TJjndiName=CsvToCsvOut -TJcontextName=default_ds -TJdeleteIfExists=true
Successfully dedupified new file saved: /bin/data/DedupifiedCsvFile.csv
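The -hc list (street,city,state,zip,price) appears to name the columns that together define a duplicate. As a rough sketch of that idea, assuming rows that match on all listed columns count as duplicates and the first occurrence is kept, the Python below deduplicates CSV data on a composite key. It is not Dedupify's implementation; the function name dedupe_rows, the sale_date column, and the sample rows are hypothetical.

```python
import csv
import io


def dedupe_rows(rows, key_columns):
    """Keep the first row for each distinct combination of key_columns."""
    seen = set()
    unique = []
    for row in rows:
        # Composite key built only from the columns that define a duplicate.
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data; the first two rows differ only in sale_date.
sample = """street,city,state,zip,price,sale_date
1 MAIN ST,SACRAMENTO,CA,95800,100000,2008-05-21
1 MAIN ST,SACRAMENTO,CA,95800,100000,2008-05-22
2 OAK AVE,SACRAMENTO,CA,95801,150000,2008-05-21
"""

reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)
result = dedupe_rows(rows, ["street", "city", "state", "zip", "price"])

# Write the deduplicated rows back out as CSV (to memory here, a file in practice).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
writer.writerows(result)
print(out.getvalue())
```

Because sale_date is not part of the key, the two 1 MAIN ST rows collapse into one and the earlier row survives.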

Things you can do with Dedupify without writing code:

Explore data

Dedupify accesses data via JDBC and supports SQL queries.
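Dedupify itself connects over JDBC. To illustrate the kind of SQL that surfaces duplicates, the sketch below uses Python's built-in sqlite3 module as a stand-in data source; the sales table, its columns, and its rows are hypothetical. Grouping on the key columns and keeping only groups with more than one row lists each duplicated record and how many copies exist.

```python
import sqlite3

# In-memory SQLite database standing in for a JDBC-reachable source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (street TEXT, city TEXT, zip TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("1 MAIN ST", "SACRAMENTO", "95800", 100000),
        ("1 MAIN ST", "SACRAMENTO", "95800", 100000),
        ("2 OAK AVE", "SACRAMENTO", "95801", 150000),
    ],
)

# Group on the columns that define a duplicate; keep groups seen more than once.
dup_query = """
    SELECT street, city, zip, price, COUNT(*) AS copies
    FROM sales
    GROUP BY street, city, zip, price
    HAVING COUNT(*) > 1
"""
duplicates = conn.execute(dup_query).fetchall()
for row in duplicates:
    print(row)
```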

Profile data

Understand and document how big your duplicate problem is.
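A basic duplicate profile can be summarized in a few numbers: total rows, distinct keys, rows that would be dropped if one row per key were kept, and how many keys occur more than once. The sketch below computes these with collections.Counter; duplicate_profile and the sample rows are hypothetical, and Dedupify's actual profiling output may report different measures.

```python
from collections import Counter


def duplicate_profile(rows, key_columns):
    """Summarize duplication in rows, keyed on key_columns."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    total = len(rows)
    distinct = len(counts)
    # Rows that would be removed if exactly one row per key were kept.
    redundant = total - distinct
    # Number of keys that appear more than once.
    groups = sum(1 for n in counts.values() if n > 1)
    return {
        "total_rows": total,
        "distinct_keys": distinct,
        "redundant_rows": redundant,
        "duplicate_groups": groups,
        "duplicate_rate": redundant / total if total else 0.0,
    }


rows = [
    {"street": "1 MAIN ST", "zip": "95800"},
    {"street": "1 MAIN ST", "zip": "95800"},
    {"street": "1 MAIN ST", "zip": "95800"},
    {"street": "2 OAK AVE", "zip": "95801"},
]
profile = duplicate_profile(rows, ["street", "zip"])
print(profile)
```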

Clean data

Export cleaned data to a flat file or to any database available via JDBC.

Advanced

Identify duplicates in one table and find them in another table using MD5 signatures.
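A minimal sketch of the signature idea, assuming a signature is the MD5 hash of a row's normalized key columns: hash every row of the first table, then keep the rows of the second table whose hash appears in that set. The normalization (trim and upper-case), the separator character, the function names row_signature and find_matches, and the sample tables are assumptions for illustration, not Dedupify's documented behavior.

```python
import hashlib


def row_signature(row, key_columns):
    """MD5 hex digest of the normalized key columns of one row."""
    # Assumed normalization: trim whitespace and upper-case each value.
    parts = [str(row[c]).strip().upper() for c in key_columns]
    # A unit-separator join keeps ("AB", "C") distinct from ("A", "BC").
    payload = "\x1f".join(parts).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def find_matches(source_rows, target_rows, key_columns):
    """Return rows of target_rows whose signature also occurs in source_rows."""
    source_sigs = {row_signature(r, key_columns) for r in source_rows}
    return [r for r in target_rows if row_signature(r, key_columns) in source_sigs]


table_a = [
    {"street": "1 Main St", "zip": "95800"},
    {"street": "2 Oak Ave", "zip": "95801"},
]
table_b = [
    {"street": "1 MAIN ST ", "zip": "95800"},
    {"street": "9 Elm Rd", "zip": "95809"},
]
matches = find_matches(table_a, table_b, ["street", "zip"])
print(matches)
```

Comparing fixed-length digests lets two tables be matched without comparing every column value directly; MD5 is adequate for this kind of matching, though not for security purposes.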