A command-line interface for deduplicating data. Never write custom code again.

./dedupercli -r
Successfully activated license file.
./dedupercli dedupe -sj CsvToCsvIn default_ds Sacramentorealestatetransactions -hc street,city,state,zip,price -TJtype=csv -TJjndiName=CsvToCsvOut -TJcontextName=default_ds -TJdeleteIfExists=true
Successfully dedupified new file saved: /bin/data/DedupifiedCsvFile.csv

./dedupercli dedupe -sj CsvToCsvIn default_ds inputCSVfile -hc street,city,state,zip,price -TJtype=csv -TJjndiName=CsvToCsvOut -TJcontextName=default_ds -TJdeleteIfExists=true
Successfully dedupified new file saved: DedupifiedFile.csv

Deduplicate any table(s) or CSV(s)

Target a single table, a set of tables (selected with SQL), or flat files. Access all the major databases.

Robust duplicate discovery

Deduplicate on any combination of columns.

Easy output and workflow integrations

Get metadata, a duplicates report, the deduped dataset, and the duplicates dataset.
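The tool's internal matching logic isn't described here, so the following is only a minimal Python sketch of what "deduplicate on a combination of columns" means and how a single pass can produce both a deduped dataset and a duplicates dataset. It assumes exact matching on the raw string values of the key columns and keeps the first row seen for each key. The key columns mirror the `-hc street,city,state,zip,price` list from the example command above; the sample rows, the `listing_id` column, and the `split_duplicates` function are all hypothetical.

```python
import csv
import io


def split_duplicates(rows, key_columns):
    """Split rows into (deduped, duplicates) by the given key columns.

    Hypothetical helper: exact match on raw values, first occurrence wins.
    """
    seen = set()
    deduped, duplicates = [], []
    for row in rows:
        # The "identity" of a row is the tuple of its key-column values only;
        # columns outside the key (e.g. listing_id) are ignored for matching.
        key = tuple(row[col] for col in key_columns)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            deduped.append(row)
    return deduped, duplicates


# Made-up rows for illustration; the first and third differ only in listing_id.
sample = """street,city,state,zip,price,listing_id
100 MAIN ST,SACRAMENTO,CA,95814,100000,A1
200 OAK AVE,SACRAMENTO,CA,95816,150000,A2
100 MAIN ST,SACRAMENTO,CA,95814,100000,A3
"""

rows = list(csv.DictReader(io.StringIO(sample)))
deduped, duplicates = split_duplicates(rows, ["street", "city", "state", "zip", "price"])

# A tiny "report": row counts for each output dataset.
print(f"input={len(rows)} deduped={len(deduped)} duplicates={len(duplicates)}")
```

Keeping the first occurrence is just one possible policy; a real deduplication tool may also normalize values (case, whitespace, address formats) before comparing, or use fuzzier matching than the exact tuple comparison shown here.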


What databases does it work with?

Any SQL database reachable through JDBC, plus flat files such as TXT and CSV. We connect via JDBC.

What are the requirements?

A JVM, version 8 or later. That’s it.

Does it overwrite my source?

It’s read-only with respect to the source. You’re safe!

Does it have a CLI?