How It Works

Prerequisites 

Getting Started

Purchase a yearly subscription here
Your subscription renews automatically every year on the same date and time.

Open your order confirmation email (or visit the “My Account” area on dedupify.com) and obtain your License Key

Visit the “My Account” area on dedupify.com and download the .zip file

Extract (unzip) the file

Launch your favorite terminal and head to the extracted folder’s location

Register your License

First, register your license key.

Navigate to the /bin/registration/ folder and edit the licensekey.txt file

Replace the content of licensekey.txt with your unique License Key

Save the file, navigate to the /bin/ folder and execute the registration command:

./dedupercli -r

If your registered key is active, you will receive the following message:

“Successfully activated license file.”

IMPORTANT: Do NOT share that key with anyone. It can be activated only ONCE.
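
The registration steps above can also be scripted. The following is a minimal shell sketch, not part of dedupify: register_license is a hypothetical helper name, and it assumes that a trailing newline in licensekey.txt is acceptable and that the confirmation message is printed to standard output or standard error.

```shell
# Hypothetical helper: write the key, run the registration, check the message.
# Usage: register_license <path-to-bin-folder> <license-key>
register_license() {
    bin_dir=$1
    key=$2
    # Replace the content of licensekey.txt with the key.
    printf '%s\n' "$key" > "$bin_dir/registration/licensekey.txt" || return 1
    # Run the documented registration command from inside the bin folder
    # and capture whatever it prints.
    output=$(cd "$bin_dir" && ./dedupercli -r 2>&1) || true
    printf '%s\n' "$output"
    # Succeed only if the documented confirmation message appears.
    case $output in
        *"Successfully activated license file."*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Because the key can be activated only once, run a helper like this a single time and keep the key out of your shell history if other people share the machine.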

How to Use

This is cross-platform software. It runs on both Windows and Linux.

Inside the /bin/data/ folder you will find a few test databases and a flat file.

chinook.db
real_estate.db
Sacramentorealestatetransactions.csv

To check the columns of the flat file Sacramentorealestatetransactions.csv, open it:

vim Sacramentorealestatetransactions.csv

street,city,zip,state,beds,baths,sq__ft,type,sale_date,price,latitude,longitude
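
If you only want the column names, you do not need to open the whole file in an editor. A small sketch, assuming the first line of the file is the header and that no column name contains a comma; csv_columns is a hypothetical helper name:

```shell
# Print the header of a CSV file as a numbered list, one column per line.
# Usage: csv_columns Sacramentorealestatetransactions.csv
csv_columns() {
    # Take the header line, drop any carriage return, split on commas,
    # and prefix each name with its 1-based position.
    head -n 1 "$1" | tr -d '\r' | tr ',' '\n' | awk '{ print NR ": " $0 }'
}
```

The positions are useful whenever you need to refer to a column by number rather than by name.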

You can inspect the SQLite test databases with sqlite3. To install sqlite3, follow this guide

sqlite3 chinook.db
sqlite> .schema

You can see all the different tables inside the test chinook.db file.

sqlite> select * from tracks
...> ;
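
The sqlite3 shell can also run a single statement non-interactively when the SQL is passed as a second argument, which is handy for quick checks. A sketch, assuming sqlite3 is on your PATH; table_row_count is a hypothetical helper name, and the table name is placed into the SQL as-is, so only pass names you trust:

```shell
# Print the number of rows in one table of a SQLite database file.
# Usage: table_row_count chinook.db tracks
table_row_count() {
    # The SQL statement is passed as the second argument to sqlite3,
    # so no interactive prompt is opened.
    sqlite3 "$1" "SELECT COUNT(*) FROM \"$2\";"
}
```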

Use these test files to confirm that the software runs correctly on your machine. You don’t have to worry about connecting to a network; everything is file-based.

IMPORTANT: All of the existing tasks can run from the /bin/data/ folder 

JNDI Config

This is a way of defining your database connections.

cd /bin/conf/jndi
vim default_ds.properties

In later steps, you refer to the properties files by name, without the .properties extension. You can either edit the .properties files directly or use the CLI (Command Line Interface) shortcuts.

Two types are allowed in dedupify.

The first is javax.sql.DataSource

This is a JDBC4 connection. It has a driver, a URL, a user, and a password.

SqlLiteTest/type=javax.sql.DataSource
SqlLiteTest/driver=org.sqlite.JDBC4
SqlLiteTest/url=jdbc:sqlite:data/real_estate.db
SqlLiteTest/user=
SqlLiteTest/password=

The other type is java.util.Map

In this case, RealEstateOutHash has an extension, txt, and a delimiter, the pipe character |.

If you spool out a flat file, it is written with the .txt extension and a pipe | between each value. The targetName is a file path.

RealEstateOutHash/type=java.util.Map
RealEstateOutHash/ext=txt
RealEstateOutHash/delimiter=|
RealEstateOutHash/targetName=data/outputData/dupeName
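
Because later commands refer to entries by name, it can help to list what a properties file currently defines. A minimal sketch that relies only on the name/type=value layout shown above; jndi_entries is a hypothetical helper, not a dedupify command:

```shell
# List every entry name in a JNDI properties file together with its type.
# Usage: jndi_entries conf/jndi/default_ds.properties
jndi_entries() {
    # Keep only the "<name>/type=<type>" lines and print "<name> <type>".
    grep '^[^#/]*/type=' "$1" | sed 's|/type=| |'
}
```

For the two entries above, this would print one line for SqlLiteTest with javax.sql.DataSource and one for RealEstateOutHash with java.util.Map.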

Run the help

To display the help, run:

./dedupercli -h

The help is self-documenting and lists everything you can do in the software.

How to run a CSV to a CSV – dedupified output

Add the following lines to /bin/conf/jndi/default_ds.properties

CsvToCsvIn/type=javax.sql.DataSource
CsvToCsvIn/driver=org.relique.jdbc.csv.CsvDriver
CsvToCsvIn/url=jdbc:relique:csv:data/
CsvToCsvIn/user=
CsvToCsvIn/password=

CsvToCsvOut/type=java.util.Map
CsvToCsvOut/targetName=data/DedupifiedCsvFile
CsvToCsvOut/ext=csv

To generate the dedupified CSV, run:

./dedupercli dedupe -sj CsvToCsvIn default_ds Sacramentorealestatetransactions -hc street,city,state,zip,price -TJtype=csv -TJjndiName=CsvToCsvOut -TJcontextName=default_ds -TJdeleteIfExists=true

This produces a new dedupified file at /bin/data/DedupifiedCsvFile.csv
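
To build an intuition for what deduplicating on a set of columns means, the following sketch keeps the first row for each distinct combination of key columns. It is only an illustration in awk, not dedupify's actual algorithm, and it assumes a simple CSV with no quoted commas. dedupe_by_columns and its 1-based column list are hypothetical; in the sample file, street, city, state, zip, and price are columns 1, 2, 4, 3, and 10.

```shell
# Illustration only: print each row whose key-column combination has not
# been seen before. The header line is treated like any other row.
# Usage: dedupe_by_columns <file.csv> <comma-separated 1-based columns>
dedupe_by_columns() {
    awk -F, -v cols="$2" '
        BEGIN { n = split(cols, idx, ",") }
        {
            # Build a key from the selected columns of this row.
            key = ""
            for (i = 1; i <= n; i++) key = key SUBSEP $(idx[i])
            # Print the row only the first time its key appears.
            if (!(key in seen)) { seen[key] = 1; print }
        }
    ' "$1"
}
```

Two rows that differ only in columns outside the key list count as duplicates here, which is why the choice of key columns matters.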

How to run a CSV to a SQLite – only dupes output

Add the following lines to /bin/conf/jndi/default_ds.properties

CsvToSqliteIn/type=javax.sql.DataSource
CsvToSqliteIn/driver=org.relique.jdbc.csv.CsvDriver
CsvToSqliteIn/url=jdbc:relique:csv:data/
CsvToSqliteIn/user=
CsvToSqliteIn/password=

CsvToSqliteOut/type=javax.sql.DataSource
CsvToSqliteOut/driver=org.sqlite.JDBC4
CsvToSqliteOut/url=jdbc:sqlite:data/CsvToSqliteOutDupes.db
CsvToSqliteOut/user=
CsvToSqliteOut/password=

To generate the dupes-only database, run:

./dedupercli dedupe -sj CsvToSqliteIn default_ds Sacramentorealestatetransactions -hc street,city,state,zip,price -DJtype=sql -DJjndiName=CsvToSqliteOut -DJcontextName=default_ds -DJdeleteIfExists=true

This produces a dupes-only file at /bin/data/CsvToSqliteOutDupes.db

How to run an SQL query in SQLite and export to CSV – dedupified output

Add the following lines to /bin/conf/jndi/default_ds.properties

SqliteToCsvDupesIn/type=javax.sql.DataSource
SqliteToCsvDupesIn/driver=org.sqlite.JDBC4
SqliteToCsvDupesIn/url=jdbc:sqlite:data/real_estate.db
SqliteToCsvDupesIn/user=
SqliteToCsvDupesIn/password=

SqliteToCsvDupesOut/type=java.util.Map
SqliteToCsvDupesOut/targetName=data/SqliteToCsvDedupified
SqliteToCsvDupesOut/ext=csv

To generate the dedupified CSV, run:

./dedupercli dedupe -sj SqliteToCsvDupesIn default_ds real_estate -hc city -TJtype=csv -TJjndiName=SqliteToCsvDupesOut -TJcontextName=default_ds -TJdeleteIfExists=true

This produces a dedupified file at /bin/data/SqliteToCsvDedupified.csv
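
The two CSV-output commands in this guide differ only in the input entry, the table or file name, the columns, and the output entry. The wrapper below is a sketch that assembles the same flags shown in those commands and nothing more; dedupe_to_csv and the DEDUPER variable are hypothetical conveniences, and it assumes both entries live in default_ds and that you run it from the /bin/ folder.

```shell
# Hypothetical wrapper around the documented dedupe-to-CSV command.
# Usage: dedupe_to_csv <input-entry> <table-or-file> <columns> <output-entry>
# Example: dedupe_to_csv SqliteToCsvDupesIn real_estate city SqliteToCsvDupesOut
dedupe_to_csv() {
    # DEDUPER lets you point at a different executable path; by default
    # the command is ./dedupercli in the current folder.
    "${DEDUPER:-./dedupercli}" dedupe \
        -sj "$1" default_ds "$2" \
        -hc "$3" \
        -TJtype=csv \
        -TJjndiName="$4" \
        -TJcontextName=default_ds \
        -TJdeleteIfExists=true
}
```

Setting DEDUPER=echo prints the assembled arguments instead of running the tool, which is a quick way to review the command before executing it.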