A command-line interface for deduplicating data. Never write custom deduplication code again.

./dedupercli -r
Successfully activated license file.
./dedupercli dedupe -sj CsvToCsvIn default_ds Sacramentorealestatetransactions -hc street,city,state,zip,price -TJtype=csv -TJjndiName=CsvToCsvOut -TJcontextName=default_ds -TJdeleteIfExists=true
Successfully dedupified new file saved: /bin/data/DedupifiedCsvFile.csv

./dedupercli dedupe -sj CsvToCsvIn default_ds inputCSVfile -hc street,city,state,zip,price -TJtype=csv -TJjndiName=CsvToCsvOut -TJcontextName=default_ds -TJdeleteIfExists=true
Successfully dedupified new file saved: DedupifiedFile.csv

Deduplicate any table(s) or CSV(s)

Target a table, a set of tables (using SQL), or flat files. Works with all the major databases.
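
To make "a set of tables (using SQL)" concrete, here is a minimal Python sketch in which one SQL query pulls rows from two tables so they can be deduplicated together. It is only an illustration under assumptions: the tool itself connects through JDBC, while this sketch uses Python's built-in `sqlite3` as a stand-in, and the table names, columns, and rows are made up.

```python
import sqlite3

# In-memory SQLite database standing in for a real source (assumption:
# the actual tool reads sources over JDBC, not through Python).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_2023 (street TEXT, city TEXT, price INTEGER);
CREATE TABLE sales_2024 (street TEXT, city TEXT, price INTEGER);
INSERT INTO sales_2023 VALUES
    ('1 MAIN ST', 'SPRINGFIELD', 100000),
    ('2 OAK AVE', 'SPRINGFIELD', 150000);
INSERT INTO sales_2024 VALUES
    ('1 MAIN ST', 'SPRINGFIELD', 100000),
    ('3 ELM RD', 'SHELBYVILLE', 200000);
""")

# One query defines the whole source: every row from both tables.
# UNION ALL keeps repeats so that duplicates remain visible.
SOURCE_QUERY = """
SELECT street, city, price FROM sales_2023
UNION ALL
SELECT street, city, price FROM sales_2024
"""

# The query only reads; the source tables are never modified.
source_rows = conn.execute(SOURCE_QUERY).fetchall()

# dict.fromkeys keeps the first occurrence of each row, in order.
distinct_rows = list(dict.fromkeys(source_rows))
conn.close()

print(len(source_rows), len(distinct_rows))  # prints "4 3"
```

The point of the sketch is that the query, not a single table, defines the dataset in which duplicates are searched.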

Robust duplicate discovery

Deduplicate on any combination of columns.
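
As a rough illustration of what deduplicating on a combination of columns means, the Python sketch below treats the chosen columns as a composite key and keeps the first row seen for each key. It is not the tool's implementation: the helper `split_duplicates`, the exact-match comparison, and the sample rows are all assumptions made for this example.

```python
import csv
import io

def split_duplicates(rows, key_columns):
    """Split rows into (unique, duplicates) using key_columns as a composite key.

    The first row seen for each key is kept; later rows with the same key
    are treated as duplicates. Values are compared exactly (an assumption).
    """
    seen = set()
    unique, duplicates = [], []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            unique.append(row)
    return unique, duplicates

# Made-up sample data; the column names mirror the -hc list in the demo.
SAMPLE = """street,city,state,zip,price
1 MAIN ST,SPRINGFIELD,CA,90001,100000
2 OAK AVE,SPRINGFIELD,CA,90001,150000
1 MAIN ST,SPRINGFIELD,CA,90001,100000
2 OAK AVE,SPRINGFIELD,CA,90001,175000
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Key on every column: only the exact repeat of row 1 is a duplicate.
unique_all, dupes_all = split_duplicates(
    rows, ["street", "city", "state", "zip", "price"])

# Key on the address only: the re-priced row also counts as a duplicate.
unique_addr, dupes_addr = split_duplicates(
    rows, ["street", "city", "state", "zip"])

print(len(unique_all), len(dupes_all))    # prints "3 1"
print(len(unique_addr), len(dupes_addr))  # prints "2 2"
```

The same input yields different duplicates depending on which columns form the key, which is why the choice of column combination matters.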

Easy output and workflow integrations

Get metadata, a duplicates report, the deduplicated dataset, and the duplicates dataset.
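
The four outputs named above can be pictured with a small Python sketch. This is a hypothetical illustration, not the tool's actual output format: the function `dedupe_with_report`, the field names in the metadata and report, and the sample rows are all assumptions.

```python
from collections import defaultdict

def dedupe_with_report(rows, key_columns):
    """Return (metadata, report, deduped, duplicates) for a list of dict rows."""
    # Group rows by composite key; dicts preserve insertion order.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in key_columns)].append(row)

    # Deduplicated dataset: the first row of every group.
    deduped = [members[0] for members in groups.values()]
    # Duplicates dataset: every row after the first in its group.
    duplicates = [row for members in groups.values() for row in members[1:]]
    # Duplicates report: one entry per key that occurs more than once.
    report = [
        {"key": key, "occurrences": len(members)}
        for key, members in groups.items()
        if len(members) > 1
    ]
    # Metadata: summary counts for the run (field names are hypothetical).
    metadata = {
        "key_columns": list(key_columns),
        "rows_in": len(rows),
        "rows_out": len(deduped),
        "duplicates_removed": len(duplicates),
    }
    return metadata, report, deduped, duplicates

# Made-up sample rows.
rows = [
    {"street": "1 MAIN ST", "city": "SPRINGFIELD"},
    {"street": "2 OAK AVE", "city": "SPRINGFIELD"},
    {"street": "1 MAIN ST", "city": "SPRINGFIELD"},
    {"street": "1 MAIN ST", "city": "SPRINGFIELD"},
]

metadata, report, deduped, duplicates = dedupe_with_report(
    rows, ["street", "city"])

print(metadata["rows_in"], metadata["rows_out"])  # prints "4 2"
print(report)
```

Keeping the duplicates as their own dataset, rather than discarding them, makes it possible to review what was removed before the result is used downstream.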

FAQs

What databases does it work with?

Any SQL database, plus flat files such as TXT and CSV. We connect through JDBC.

What are the requirements?

JVM 8+. That’s it.

Does it overwrite my source?

It’s read-only with respect to the source. You’re safe!

Does it have a CLI?