Lately, I've been cleaning out my drives. You know that feeling when you have a backup folder inside a backup final folder, inside yet another backup final (1) folder? I have a lot of these recursive backups, and they keep migrating from one drive to another, from one PC to another.
One of the biggest directories is the one with all my photos and videos. Every backup contains this photos directory, just in case.
So there's a problem: content gets copied multiple times, the copies eat up my gigabytes, and it gets worse with each iteration. I have no desire to deal with it manually.
I thought it was a great example of a problem I could solve with my programming skills while contributing to the open-source world. Just a small pet project, a utility doing all this stuff. It can also serve as an approachable showcase for beginners of how to write a console utility and deliver it to users.
Moreover, October is the month of Hacktoberfest. It is a nice time to get things done.
What do we need to do? Let's define this task.
- Read files one by one and copy each to the result directory if it's unique;
- Decide whether a file is unique using cached hashes of all the files read so far;
- Clones should not end up in the result directory (naming is vital, replicants should be destroyed).
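The core of these steps can be sketched in a few lines of Kotlin. This is only a minimal illustration, not bladerunner's actual code: the function names `sha256` and `uniqueFiles` are made up here, and SHA-256 is an assumption, since any stable content hash would do.

```kotlin
import java.io.File
import java.security.MessageDigest

/** Streams the file through SHA-256 so large videos are never loaded into memory at once. */
fun sha256(file: File): String {
    val digest = MessageDigest.getInstance("SHA-256") // assumed algorithm
    file.inputStream().use { input ->
        val buffer = ByteArray(DEFAULT_BUFFER_SIZE)
        while (true) {
            val read = input.read(buffer)
            if (read < 0) break
            digest.update(buffer, 0, read)
        }
    }
    return digest.digest().joinToString("") { "%02x".format(it) }
}

/** Returns the files under [root] whose content was not seen earlier in the walk. */
fun uniqueFiles(root: File): List<File> {
    val seenHashes = HashSet<String>() // cache of hashes of everything read so far
    return root.walkTopDown()
        .filter { it.isFile }
        .sortedBy { it.path }                  // makes the "first seen" choice deterministic
        .filter { seenHashes.add(sha256(it)) } // add() returns false for a clone
        .toList()
}
```

Everything `uniqueFiles` returns is safe to copy; everything it skips is a clone of a file that was already kept.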
Let's add some extra tasks.
- Get a list of all clones without removing them;
- Delete clones from the given directory;
- The main command should copy all unique files from the given input directory to the given output directory, trying to keep the directory tree and file names;
- A flatten option to copy all unique files into a single directory without keeping the directory tree;
- Make it possible to rename copied files by some sensible rule, like date;
- For photo files, we can use the date on which the photo was taken.
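The difference between keeping the directory tree and flattening it comes down to how the target path is computed. Here is a small sketch under that assumption; `resolveTarget` is a hypothetical helper name, not something taken from the project.

```kotlin
import java.io.File

/**
 * Hypothetical helper: decides where [source] should land under [outputRoot].
 * With flatten = true only the file name survives; otherwise the path relative
 * to [inputRoot] is re-created under [outputRoot].
 */
fun resolveTarget(source: File, inputRoot: File, outputRoot: File, flatten: Boolean): File =
    if (flatten) {
        File(outputRoot, source.name)
    } else {
        File(outputRoot, source.relativeTo(inputRoot).path)
    }

fun main() {
    val input = File("in")
    val output = File("out")
    val photo = File(input, "2019/trip/photo.jpg")
    println(resolveTarget(photo, input, output, flatten = false))
    println(resolveTarget(photo, input, output, flatten = true))
}
```

Flattening makes name collisions much more likely, because files from different subdirectories end up side by side.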
I love Kotlin. It may not look like the best language for writing console utilities, but why not? Using Kotlin significantly cuts the time I need to develop this thing. In my experience, pet projects often get forgotten because you don't have time to finish them.
I wanted to create a console utility with a simple, argument-based syntax. Clikt is amazing for this. It makes writing a command-line interface really intuitive. I love how it generates all the help messages without me doing anything. The programmer just declares typed arguments, and Clikt does everything else.
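To give a feeling for what "typed arguments" means, here is a rough sketch of a command declared with Clikt. It is not the project's real source: the class and property names are invented, the option names are borrowed from the usage text further down, marking the directories as required is a guess, and the code assumes the Clikt 3.x/4.x API (newer major versions moved `help` and `main` around).

```kotlin
import com.github.ajalt.clikt.core.CliktCommand
import com.github.ajalt.clikt.parameters.options.default
import com.github.ajalt.clikt.parameters.options.flag
import com.github.ajalt.clikt.parameters.options.option
import com.github.ajalt.clikt.parameters.options.required
import com.github.ajalt.clikt.parameters.types.enum
import com.github.ajalt.clikt.parameters.types.file

// Hypothetical enum; the value names come from the tool's help output.
enum class NamingStrategy { DEFAULT, DATE_MODIFIED, PHOTO_TAKEN }

class Run : CliktCommand(name = "run", help = "Copies all unique files to output directory.") {
    // Each option is a typed property; Clikt parses, validates and documents it.
    val directoryIn by option("-din", "--directory-in", help = "Path to root directory of input")
        .file()
        .required() // assumption: the real tool may treat this differently
    val directoryOut by option("-dout", "--directory-out", help = "Path to output directory")
        .file()
        .required() // assumption, as above
    val namingStrategy by option("-ns", "--naming-strategy", help = "Naming strategy for created files")
        .enum<NamingStrategy>()
        .default(NamingStrategy.DEFAULT)
    val flatten by option("-f", "--flatten", help = "Copy all files into out directory without saving directory tree")
        .flag()

    override fun run() {
        // Placeholder body: the real command would do the copying here.
        echo("Copying from $directoryIn to $directoryOut (naming=$namingStrategy, flatten=$flatten)")
    }
}

fun main(args: Array<String>) = Run().main(args)
```

By the time `run()` is called, `directoryIn` is already a `File` and `namingStrategy` is already an enum value, so there is no manual string parsing anywhere.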
MetadataExtractor is a nice library for working with media files. For me, bladerunner's main purpose is to copy unique photos, so I need something to work with photo metadata. I use it to get the date a photo was taken, so I don't need to think about it.
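Reading the "taken" date with this library can look roughly like the sketch below. The helper name `photoTakenDate` is made up, and treating every failure as "no date" is a choice made for this illustration, not necessarily what bladerunner does.

```kotlin
import com.drew.imaging.ImageMetadataReader
import com.drew.metadata.exif.ExifSubIFDDirectory
import java.io.File
import java.util.Date

/** Returns the EXIF "date/time original" of [file], or null if it cannot be read. */
fun photoTakenDate(file: File): Date? =
    try {
        ImageMetadataReader.readMetadata(file)
            .getFirstDirectoryOfType(ExifSubIFDDirectory::class.java)
            ?.getDate(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL)
    } catch (e: Exception) {
        null // not an image, unsupported format, or unreadable file
    }
```

A `null` result is the signal to fall back to another naming rule.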
That's pretty much it. As the cherry on top, I wanted to write unit tests. JUnit 5 and AssertJ are my choice for this project. The choice of assertion library is just a matter of habit or personal sense of beauty.
GitHub Actions is my CI/CD tool.
There are three main commands: run, clean, and find.
The run command copies all unique files to the output directory.

    Usage: bladerunner run [OPTIONS]

    Options:
      -din, --directory-in DIRECTORY    Path to root directory of input
      -dout, --directory-out DIRECTORY  Path to output directory
      -ns, --naming-strategy [DEFAULT|DATE_MODIFIED|PHOTO_TAKEN]
                                        Naming strategy for created files
      -o, --out FILE                    Path to output file
      -s, --silent                      Do not log activity
      -f, --flatten                     Copy all files into out directory without
                                        saving directory tree
      -h, --help                        Show this message and exit
A few words on how names for copied files are created.
- DEFAULT keeps the original file name. It is also the fallback behavior.
- DATE_MODIFIED uses the file's last modified date.
- PHOTO_TAKEN uses the date the photo was taken, read from the file's EXIF data if possible, and uses the fallback otherwise.
A random UUID string is added as a file name suffix if a file with the same name already exists in the output directory.
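The collision rule can be sketched like this. The helper name `resolveCollision`, the `-` separator, and the placement of the UUID before the extension are all assumptions for the sake of the example; the real naming format may differ.

```kotlin
import java.io.File
import java.util.UUID

/**
 * Returns a file in [directory] named [desiredName]. If that name is already
 * taken, a random UUID is appended to the base name, before the extension.
 */
fun resolveCollision(directory: File, desiredName: String): File {
    val candidate = File(directory, desiredName)
    if (!candidate.exists()) return candidate

    val base = candidate.nameWithoutExtension
    val extension = candidate.extension
    val suffix = UUID.randomUUID().toString()
    val newName = if (extension.isEmpty()) "$base-$suffix" else "$base-$suffix.$extension"
    return File(directory, newName)
}
```

A random UUID avoids having to probe for `photo (1).jpg`, `photo (2).jpg` and so on, at the cost of less pretty names.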
The clean command deletes all non-unique files from the given directory.

    Usage: bladerunner clean [OPTIONS]

    Options:
      -din, --directory-in DIRECTORY  Path to root directory of input
      -o, --out FILE                  Path to output file
      -s, --silent                    Do not log activity
      -h, --help                      Show this message and exit
The find command prints, for every file in the given directory, whether it is a clone or not.

    Usage: bladerunner find [OPTIONS]

    Options:
      -din, --directory-in DIRECTORY  Path to root directory of input
      -o, --out FILE                  Path to output file
      -h, --help                      Show this message and exit
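Conceptually, clean and find are two views of the same grouping: files with identical content form a group, and every file in a group except one is a clone. Below is a minimal sketch of that idea. The function names are invented, SHA-256 is an assumed hash, and which file in a group counts as the "original" (here, the first by path) is also an assumption.

```kotlin
import java.io.File
import java.security.MessageDigest

// Reads the whole file into memory: fine for a sketch, wasteful for big videos.
fun contentHash(file: File): String =
    MessageDigest.getInstance("SHA-256")
        .digest(file.readBytes())
        .joinToString("") { "%02x".format(it) }

/** Groups files by content; every group with more than one file is a set of clones. */
fun findClones(root: File): List<List<File>> =
    root.walkTopDown()
        .filter { it.isFile }
        .groupBy(::contentHash)
        .values
        .filter { it.size > 1 }
        .map { group -> group.sortedBy { it.path } }

/** Keeps the first file of every group, deletes the rest, and returns how many were deleted. */
fun deleteClones(root: File): Int =
    findClones(root).sumOf { group -> group.drop(1).count { it.delete() } }
```

Reporting only needs `findClones`; deleting is the same list with one extra step, which is why it is safer to look at the report before running the destructive command.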
Now I have a nice repo, pavelkorolevxyz/bladerunner-desktop, in my profile. You can find the latest bladerunner artifacts on the Releases page.
Every push to the main development branch is automagically tested and built with the help of GitHub Actions. For now, it looks pretty finished in terms of features, documentation, and tests. I love this fact.
Finally, the main thing is that my backup folders are now clean and free of tons of clones.