Blade Runner - pet project


Lately, I've been cleaning out my drives. You know that feeling when you have a backup folder inside a backup final folder, inside another backup final (1) folder? I have a lot of these recursive backups, and they keep migrating from one drive to another, from one PC to another.

One of the biggest directories is the one with all my photos and videos. Every backup contains this photos directory, just in case.

So there's a problem. Content is copied multiple times, the copies eat up my gigabytes, and it gets harder with each iteration. I have no desire to deal with it manually.

I thought it was a great example of a problem I could solve with my programming skills while contributing to the open-source world. Just a small pet-project utility that does all of this. It can also serve as an approachable beginner's showcase of how to write a console utility and deliver it to users.

Moreover, October is the month of Hacktoberfest. It is a nice time to get things done.

Todo

What do we need to do? Let's define this task.

  • Read files one by one and copy each to the result directory if it's unique;
  • Decide whether a file is unique using cached hashes of all the files we have already read;
  • Clones must not end up in the result directory (naming is vital, replicants should be destroyed).
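
The list above can be sketched in a few lines of Kotlin using only the standard library. This is a rough illustration, not the actual bladerunner code: the function names contentHash and copyUnique are made up for this example, and SHA-256 is my assumption, since any content hash with a low collision risk would do.

```kotlin
import java.io.File
import java.security.MessageDigest

/** Hex-encoded SHA-256 digest of the file content, read in chunks (hash choice is an assumption). */
fun contentHash(file: File): String {
    val digest = MessageDigest.getInstance("SHA-256")
    file.inputStream().use { input ->
        val buffer = ByteArray(DEFAULT_BUFFER_SIZE)
        while (true) {
            val read = input.read(buffer)
            if (read < 0) break
            digest.update(buffer, 0, read)
        }
    }
    return digest.digest().joinToString("") { "%02x".format(it) }
}

/**
 * Hypothetical helper: copies every file under [input] whose content hash
 * has not been seen yet into [output], keeping the relative path.
 * Returns the list of files created in [output].
 */
fun copyUnique(input: File, output: File): List<File> {
    val seen = HashSet<String>()          // cache of hashes of files read so far
    val copied = mutableListOf<File>()
    input.walkTopDown()
        .filter { it.isFile }
        .sortedBy { it.path }             // stable order, so the "first" copy is predictable
        .forEach { source ->
            // add() returns false when the hash is already cached, i.e. the file is a clone
            if (seen.add(contentHash(source))) {
                val target = File(output, source.relativeTo(input).path)
                target.parentFile.mkdirs()
                copied += source.copyTo(target)
            }
        }
    return copied
}
```

Calling copyUnique(File("backups"), File("clean")) would then leave one copy of each distinct file in the clean directory. Note that copyTo fails if the target already exists, so this sketch assumes an empty output directory.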

Let's add some extra tasks.

  • Get a list of all clones without removing them;
  • Make it possible to delete clones from the given directory;
  • The main command should copy all unique files from the given input directory to the given output directory, trying to keep the directory tree and file names;
  • A flatten option to copy all unique files into a single directory without keeping the directory tree;
  • Make it possible to rename copied files by some sensible rule, like date;
  • For photo files, we can use the date on which the photo was taken.

Tools

I love Kotlin. It may not look like the best language for writing console utilities, but why not? Using Kotlin significantly decreases the time I need to develop this thing. In my experience, pet projects often get forgotten because you don't have time to finish them.

I wanted to create a console utility with a simple argument syntax. Clikt is amazing for this. It makes writing a command-line interface really intuitive. I love how it generates all the help messages without me doing anything. A programmer just declares typed arguments and Clikt does everything else.

MetadataExtractor is a nice library for working with media files. The main purpose of bladerunner for me is to copy unique photos, so I need something to work with photo metadata. It's used to get the date a photo was taken, so I don't need to think about it.

That's about all. As the cherry on top, I wanted to write unit tests. JUnit 5 and AssertJ are my choice for this project. The assertion library is just a matter of habit or personal sense of beauty.

GitHub Actions is used as my CI/CD tool.

Syntax

There are three main commands.

Run

Copies all unique files to the output directory.

Usage: bladerunner run [OPTIONS]

Options:
  -din, --directory-in DIRECTORY   Path to root directory of input
  -dout, --directory-out DIRECTORY
                                   Path to output directory
  -ns, --naming-strategy [DEFAULT|DATE_MODIFIED|PHOTO_TAKEN]
                                   Naming strategy for created files
  -o, --out FILE                   Path to output file
  -s, --silent                     Do not log activity
  -f, --flatten                    Copy all files into out directory without
                                   saving directory tree
  -h, --help                       Show this message and exit
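
To make the difference between the default mode and --flatten concrete, here is a minimal sketch of how the destination path could be chosen. The function targetFor and its parameters are hypothetical names for this example, not taken from the bladerunner sources.

```kotlin
import java.io.File

/**
 * Hypothetical helper: picks the destination for [source].
 * With flatten = true only the file name is kept, so everything lands
 * directly in [outputRoot]; otherwise the path relative to [inputRoot]
 * is recreated under [outputRoot].
 */
fun targetFor(source: File, inputRoot: File, outputRoot: File, flatten: Boolean): File =
    if (flatten) {
        File(outputRoot, source.name)
    } else {
        File(outputRoot, source.relativeTo(inputRoot).path)
    }
```

With flattening, files from different subdirectories can collide on the same name, which is why some rule for name conflicts is needed.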

Naming strategy

A few words on how names for copied files are created.

  • DEFAULT keeps the original file name. DEFAULT is also the fallback behavior.
  • DATE_MODIFIED uses the file's last modified date.
  • PHOTO_TAKEN uses the photo taken date from the file's EXIF data if possible, and uses the fallback otherwise.

A random UUID string is added as a file name suffix if a file with the same name already exists in the output directory.
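
Here is a rough sketch of the naming rules above, using only the standard library. Several things are my assumptions for the sake of the example: the date pattern, the placement of the UUID before the extension, and the names baseName and resolveTarget. PHOTO_TAKEN is left out because it needs an EXIF reader.

```kotlin
import java.io.File
import java.text.SimpleDateFormat
import java.util.Date
import java.util.UUID

// PHOTO_TAKEN is omitted here: it requires reading EXIF metadata.
enum class NamingStrategy { DEFAULT, DATE_MODIFIED }

/** Hypothetical helper: file name for the copy according to [strategy]. */
fun baseName(source: File, strategy: NamingStrategy): String = when (strategy) {
    NamingStrategy.DEFAULT -> source.name
    NamingStrategy.DATE_MODIFIED -> {
        // The date pattern is an assumption, chosen to be safe in file names.
        val stamp = SimpleDateFormat("yyyy-MM-dd_HH-mm-ss").format(Date(source.lastModified()))
        if (source.extension.isEmpty()) stamp else "$stamp.${source.extension}"
    }
}

/**
 * Hypothetical helper: returns a free file in [directory] for [name].
 * If the name is taken, a random UUID is appended to the name
 * (before the extension, which is an assumption of this sketch).
 */
fun resolveTarget(directory: File, name: String): File {
    val candidate = File(directory, name)
    if (!candidate.exists()) return candidate
    val suffixed = "${candidate.nameWithoutExtension}_${UUID.randomUUID()}"
    val extension = candidate.extension
    return File(directory, if (extension.isEmpty()) suffixed else "$suffixed.$extension")
}
```

A copy step would then call something like resolveTarget(outputDir, baseName(source, strategy)) right before copying.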

Clean

Deletes all non-unique files from the given directory.

Usage: bladerunner clean [OPTIONS]

Options:
  -din, --directory-in DIRECTORY  Path to root directory of input
  -o, --out FILE                  Path to output file
  -s, --silent                    Do not log activity
  -h, --help                      Show this message and exit

Find

Reports, for every file in the given directory, whether it is a clone or not.

Usage: bladerunner find [OPTIONS]

Options:
  -din, --directory-in DIRECTORY  Path to root directory of input
  -o, --out FILE                  Path to output file
  -h, --help                      Show this message and exit
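
Both clean and find boil down to grouping files by content hash. The sketch below shows one way to do it with the standard library. The names sha256, findCloneGroups and deleteClones are invented for this example, SHA-256 is an assumed hash choice, and "keep the first file by path" is an assumed rule for deciding which copy survives.

```kotlin
import java.io.File
import java.security.MessageDigest

/** Hex-encoded SHA-256 of the file content, read in chunks (hash choice is an assumption). */
fun sha256(file: File): String {
    val digest = MessageDigest.getInstance("SHA-256")
    file.inputStream().use { input ->
        val buffer = ByteArray(DEFAULT_BUFFER_SIZE)
        while (true) {
            val read = input.read(buffer)
            if (read < 0) break
            digest.update(buffer, 0, read)
        }
    }
    return digest.digest().joinToString("") { "%02x".format(it) }
}

/** "find": groups of files under [root] sharing the same content, sorted by path inside each group. */
fun findCloneGroups(root: File): List<List<File>> =
    root.walkTopDown()
        .filter { it.isFile }
        .sortedBy { it.path }
        .groupBy { sha256(it) }
        .values
        .filter { it.size > 1 }           // a group of one is a unique file

/** "clean": keeps the first file of each group, deletes the rest, returns what was deleted. */
fun deleteClones(root: File): List<File> =
    findCloneGroups(root)
        .flatMap { it.drop(1) }
        .filter { it.delete() }           // keep only files that were actually removed
```

Running findCloneGroups first and reviewing its output is a cheap safety net before letting deleteClones remove anything.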

Result

Now I have a nice repo, pavelkorolevxyz/bladerunner-desktop, in my profile. You can find the latest bladerunner artifacts on the Releases page.

Every push to the main development branch is automagically tested and built with the help of GitHub Actions. For now, it looks pretty finished in terms of features, documentation, and tests. I love this fact.

Finally, the main thing is that my backup folders are now clean and free of tons of clones.