#jdupe

2026-02-13

On Local Photo Management and the Command Line

Reading Time: 5 minutes

Picasa and iPhoto

Picasa and iPhoto were great apps. Both were free, both let you manage your photos locally, and both let you take pictures with a camera or your phone and sync them when you got home. Over time our phones synced to these apps via the cloud.

We lost the habit of getting home and ingesting photos because everything was done automatically. We took pictures and they appeared in Picasa and iCloud and we didn't think about it too much.

This was the gateway habit that led some of us to cloud-first photo libraries rather than local ones, helped along by the fact that both Picasa and iPhoto synced automatically to the cloud.

This was excellent because, for years, our photographs were stored both locally and in the cloud. That is, until libraries grew too big for laptop hard drives. At that point we had to choose: get a laptop with a larger hard drive, delete photos, or spend more on iCloud. At 3 CHF per month, spending more on iCloud was an easy choice.

Later, when the library grew beyond 200 GB, Google One became an interesting proposition: two terabytes for 100 CHF per year. That headroom is a huge luxury, for as long as you can back up your photos from the cloud to a local volume.

The issue is that you can't, and you never really could. It's only recently that I managed to export my photos from Google Photos and Flickr, and only after two weeks of experimenting and learning.

The Issue

Cloud services are great for syncing all your photos and videos, along with the photos you get from WhatsApp, your screenshots and more. They're not good when it comes to re-organising files.

If friends and family share photos via WhatsApp or Signal, or you download videos from TikTok or Flickr, they all get mixed in with your own photos. This creates noise twice: once in iCloud or Google Photos, and again in your WhatsApp history.

When people share photos and videos, WhatsApp downloads them to its own backup and, if you allow it, to your photo gallery too, which I recommend for one reason: WhatsApp has a nasty habit of accumulating 100 MB or more per chat. This noise comes from photos, videos, PDFs, GIFs and more. You might end up with a copy in Google Photos, in Apple Photos, and potentially in Immich, Photoprism and other photo clouds.

Hard to tidy

Google Photos, Apple Photos, Immich and Photoprism are great at automatic cataloguing but not at helping you tidy the mess they help you create. For a start, Immich and Apple Photos make a tremendous mess of your photo files and hierarchy if you give them free rein. You go from a neatly organised hierarchy to a machine-friendly mess that you need to clean up if you ever choose to move away from them.

With Apple Photos and Google Photos I find it excruciatingly hard to "spring clean" when storage gets low. With Apple Photos I noticed that files are almost immediately backed up to iCloud, so if you migrate to Immich or Photoprism you download the entire library again every time Immich or Photoprism crashes and needs to be repopulated. This often takes a day of keeping the phone's screen on. That's why having a local library is key, and why kDrive is a great tool and a better solution.

The Local Advantage

With kDrive, as with Google Photos, Photoprism, Immich, iPhoto and the others, you save your photos to a cloud. Unlike with them, though, kDrive gives you a hierarchical folder structure that you can download and work on: via command-line tools for batch operations, or visually for manual tidying tasks.

Exiftool

If your files are fresh from Google Takeout, the Immich folder structure or elsewhere, you can use a command-line tool to reorganise everything chronologically and more.
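
To illustrate the idea, here is a minimal Python sketch of that chronological reorganisation. The paths are hypothetical, and it uses the file's modification time as a stand-in, since Python's standard library cannot read EXIF tags; exiftool reads the real capture date, with an idiom roughly like the one in the comment.

```python
"""Sketch of exiftool-style chronological sorting.

exiftool can do this directly, roughly:
    exiftool -r "-Directory<DateTimeOriginal" -d "%Y/%m/%d" SOURCE_DIR
This stand-in uses the file's modification time instead of EXIF data.
"""
from datetime import datetime
from pathlib import Path
import shutil

def sort_chronologically(source: Path, dest: Path) -> list[Path]:
    """Move every file under source into dest/YYYY/MM/DD/ folders."""
    moved = []
    # sorted() materialises the listing before we start moving files
    for f in sorted(p for p in source.rglob("*") if p.is_file()):
        taken = datetime.fromtimestamp(f.stat().st_mtime)
        target_dir = dest / taken.strftime("%Y/%m/%d")
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / f.name
        shutil.move(str(f), target)
        moved.append(target)
    return moved
```

On a real library you would run exiftool instead; the sketch only shows the shape of the result: one folder per year, month and day.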

jdupes

With jdupes you can look for duplicates automatically. With Immich I noticed that I had 27,000 duplicates to sort through. In some cases they're triplicates, and in others the duplicates are thumbnails. Doing this sorting manually with the Immich tool would take weeks or months. With jdupes it takes a few seconds to a few hours, depending on how many duplicates there are.
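
The core of what jdupes does can be sketched in a few lines of Python: group files by size, then confirm real duplicates with a content hash. jdupes itself (run as jdupes -r DIRECTORY) is far faster and compares byte by byte; this is only an illustration of the idea.

```python
"""Minimal sketch of duplicate detection, jdupes-style:
files of different sizes can't be duplicates, so hash only
the files whose sizes collide."""
from collections import defaultdict
from pathlib import Path
import hashlib

def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files with identical contents."""
    by_size = defaultdict(list)
    for f in root.rglob("*"):
        if f.is_file():
            by_size[f.stat().st_size].append(f)
    by_digest = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size can't have a duplicate
        for f in same_size:
            digest = hashlib.sha256(f.read_bytes()).hexdigest()
            by_digest[digest].append(f)
    return [sorted(g) for g in by_digest.values() if len(g) > 1]
```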

rsync

With rsync you can transfer files between volumes with ease and convenience. The computer does the work in the background, backing up to a local drive and a remote drive.
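
The essence of rsync's one-way sync is "copy only what is missing or newer". A typical invocation is rsync -avh source/ destination/. Here is a rough stdlib sketch of that behaviour, with hypothetical paths; real rsync adds delta transfer, permissions, deletion and resume.

```python
"""Sketch of rsync-style incremental mirroring: a file is copied
only when it is absent from the destination or newer at the source."""
from pathlib import Path
import shutil

def mirror(source: Path, dest: Path) -> list[Path]:
    copied = []
    for f in source.rglob("*"):
        if not f.is_file():
            continue
        target = dest / f.relative_to(source)
        if not target.exists() or f.stat().st_mtime > target.stat().st_mtime:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)  # copy2 preserves timestamps
            copied.append(target)
    return copied
```

Run it twice and the second pass copies nothing, which is exactly why rsync is pleasant for repeated backups.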

The Tailscale Caveat

If you're syncing gigabytes of files, use the local IP address rather than Tailscale, because I suspect Tailscale throttles you after a certain amount of data transfer. It's also a lot faster to work locally. If you do need a remote copy, sync it locally first, then move the drive to the remote location.

Visual Sorting and Find

While waiting for rsync to complete certain jobs I went through libraries manually and noticed patterns. I asked Gemini to create a command to move webp, png and mp4 files matching one pattern from my photo library to a secondary library that I can sort through later. In one instance that removed 130 gigabytes of noise.
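
The command I asked for amounted to something like the following sketch. The extensions and the idea of a "sort later" library come from my own case; the paths and the exact rules are illustrative, not a recipe.

```python
"""Sketch of moving noisy file types (webp, png, mp4) out of a main
photo library into a secondary one, preserving the folder structure."""
from pathlib import Path
import shutil

NOISY = {".webp", ".png", ".mp4"}

def quarantine(library: Path, secondary: Path) -> int:
    """Move matching files; return how many were moved."""
    moved = 0
    # list() snapshots the tree first, since we mutate it while moving
    for f in list(library.rglob("*")):
        if f.is_file() and f.suffix.lower() in NOISY:
            target = secondary / f.relative_to(library)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), target)
            moved += 1
    return moved
```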

The Motivating Push

I abandoned iCloud as the single source of truth for my photos when they grew beyond 200 GB and shifted to Google Photos. With two terabytes of storage I enjoyed the luxurious feeling. I enjoyed it until I saw that I could get 6 TB for 67 CHF from Infomaniak, and that's when I spent a long time migrating off Google Photos.

They make it very hard, because you can't just download a chronological tree of folders and files as you can with kDrive.

Almost a Terabyte to Sort

My Apple, Google and Flickr libraries came to almost a terabyte of data, most of it duplicates. Sorting through it by hand would take months. With the tools listed above, once I had a workflow prepared, it took days. Now my library is 370-390 GB.

And Finally

27,000 Duplicates in Immich

I tried ingesting from mobile phones and an old Immich library, but in doing so I ended up with 27,000 duplicate pairs that I would have had to sort through by hand, a task that would take months. By removing all the duplicates before ingesting into Immich, I will save weeks of tedious work.

jdupes and Peace of Mind

My iCloud library hasn't been the single source of truth for years, due to the 200 GB limit. For a while Google Photos was, until I downgraded the plan and it became a former single source of truth. Now I hope that Flickr will have filled many of the gaps. On a drive or two I have old iPhoto libraries.

If required I can open the package, extract the originals, run exiftool to create a chronological library, and repeat until all my libraries are consolidated. Then I can import them into my main photo library and ingest them into Immich and Photoprism.

Conclusion

With command-line tools you can consolidate photo libraries from multiple sources into a single source of truth, and move on. By maintaining that single source of truth and backing it up to kDrive, Google Drive or even iCloud, you ensure that it is complete and easy for Immich, Photoprism or some other tool to ingest.

#cloud #exiftools #GoogleDrive #iphoto #jdupe #Picasa
Three pillars of rock
2026-02-12

Migrating to kDrive from Flickr, Apple and Google Photo Clouds

Reading Time: 4 minutes

As I write this, my consolidated photo album is being uploaded to kDrive to serve as an offsite backup. The journey to this point took about two weeks, due in part to experimentation and learning to use various tools.

Tools I used

  • rsync
  • Google Takeout
  • Flickr Export
  • jdupes
  • Gemini
  • Euria
  • Le Chat, by Mistral

Work Flow

The first step is to request your data from Google Photos via the Google Takeout tool, and from Flickr via its export tool, and to download all your photos locally from Apple Photos before disconnecting the local library from iCloud. Disconnecting Photos from iCloud gives you 30 days to realise you made a terrible mistake and fix it.

Export and organise

The next step is to unzip the Google Takeout files in one place and the Flickr export in another. You want to keep the tree structure created by the zips for the next part.

Exiftool

Exiftool is a command-line tool. Google Takeout and Flickr Export may detach metadata from your photos and put it in JSON files; exiftool writes that data back into your photo files. If you ask Gemini or another AI for help, it will provide the command you need. Request a dry run, and have the dry run write to a text file so you can double-check that it does what you expect.
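
To show what that repair looks like, here is a rough Python sketch. It assumes the sidecar layout Google Takeout used in my export, a JSON file next to each photo with a photoTakenTime.timestamp field, and it only restores the file's modification time; exiftool writes the date back into the EXIF tags themselves, which is what you actually want.

```python
"""Sketch of re-marrying Takeout sidecar JSON with photos:
read each photo's capture timestamp from its sidecar and
restore it as the file's modification time."""
import json
import os
from pathlib import Path

def restore_dates(folder: Path) -> int:
    """Return how many photos had their timestamp restored."""
    fixed = 0
    for sidecar in folder.rglob("*.json"):
        photo = sidecar.with_suffix("")  # IMG_1.jpg.json -> IMG_1.jpg
        if not photo.is_file():
            continue
        meta = json.loads(sidecar.read_text())
        try:
            ts = int(meta["photoTakenTime"]["timestamp"])
        except (KeyError, TypeError, ValueError):
            continue  # sidecar without a usable capture date
        os.utime(photo, (ts, ts))
        fixed += 1
    return fixed
```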

Keep the zip files as they are. If you make a mistake it's good to have them on hand. Downloading 50 GB files from Google Takeout takes time.

With Flickr it's even more critical, because Flickr generates 2 GB files. I created a script to automatically download my 168 files.

Once you are happy that exiftool is behaving as expected you can run the command for real. Both of these steps take time so let them run in the background.

Google Takeout

Google Takeout generates albums in three key ways: by individual name if you used face recognition, by event name if you created an album, and by year, automatically. You will have two or three copies of some photos. In some directories you will only find JSON files.

When exiftool has run you can back up or delete the JSON files. As long as you still have the zip files, you're safe.

Flickr

When I expanded the Flickr zips it created one monolithic directory with all the photos. I ran exiftool to marry the JSON data with the photos.

Apple Photos

If you want to extract photos from Apple Photos, the quickest solution is to right-click the library, choose Show Package Contents, navigate to originals, and copy the photos to another directory. You will need exiftool to create a directory tree sorted by year, month and day, and then you can run jdupes and add the photos to your main library.

Looking for Duplicates and Creating Chronological Libraries

With the data added by exiftool we can now organise the photos chronologically. The issue is that we have event photos in albums and the same event photos in the year folder. That's where jdupes comes in: it lets us automatically compare photos within a directory before removing the duplicate copies.

Once this is done we can organise all the photos chronologically. This makes comparing photos much easier. It also adds a human accessible way of organising photos by year, month and day.

We repeat this step for Google Takeout and Flickr so that we end up with two clean chronological libraries.

The next step is to run jdupes again, this time comparing Flickr to Google Photos. In an ideal world we have a perfect mirror, with both libraries complete. In reality we might have interrupted payment to Flickr or Google Photos, so we have gaps. That's why we look for duplicates before merging unique photos into our main photo library.
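
jdupes can take several directories at once (jdupes -r google/ flickr/) and reports the overlap directly. As a rough illustration of the same comparison, this sketch lists the files in one library whose contents appear nowhere in the other, i.e. the genuinely new photos worth merging; directory names are hypothetical.

```python
"""Sketch of a cross-library comparison: which files in `candidate`
have content that is absent from `reference`?"""
import hashlib
from pathlib import Path

def _hashes(root: Path) -> set[str]:
    return {hashlib.sha256(f.read_bytes()).hexdigest()
            for f in root.rglob("*") if f.is_file()}

def unique_to(candidate: Path, reference: Path) -> list[Path]:
    """Files under candidate whose content never occurs in reference."""
    seen = _hashes(reference)
    return sorted(f for f in candidate.rglob("*")
                  if f.is_file()
                  and hashlib.sha256(f.read_bytes()).hexdigest() not in seen)
```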

Tools such as rsync will help you merge the two libraries into the main library, as well as back up the clean library to an external hard drive or another device.

The kDrive migration

If you have not already done so, install the kDrive app and log in. Open the app, navigate to your library's folder and tell kDrive to sync it. It will then start copying the data to your cloud. Now you wait for it to be done.

Cleanup and Looking Forward

Once the main library is synced to kDrive I can delete the two old photo folders from kDrive and my local machine, and tell kDrive on my phone to sync to the new library folder on kDrive.

That Synching Feeling

For now:

  • Photosync adds photos to photoprism
  • immich app adds photos to Immich
  • kDrive app uploads to kDrive storage

Photoprism and Immich Watching

Both Photoprism and Immich allow you to watch an import folder (Photoprism) or an external library (Immich). If you set the main library as a watch folder, then new photos uploaded to kDrive will be added to the main library, and Photoprism and Immich will add them to their own libraries. Unselect the "move" option to keep the chronological library intact.

And Finally

With jdupes, exiftool and rsync you can whittle three photo libraries down to just one. You can then tell kDrive desktop to watch and sync that folder, and use rsync to mirror the library to two or three other drives and filesystems. I have APFS, APFS (case sensitive) and ext4, plus an offsite backup via kDrive.

#Apple #exiftool #Google #infomaniak #jdupe #kdrive #photos #rsync #takeout
2026-02-08

Sorting Photoprism Photos With the Mistral Cat

Reading Time: 3 minutes

I chose to experiment with Le Chat by Mistral, the French AI alternative to Gemini, Claude and CatIFARTED (ChatGPT). For the experiment I copied my Photoprism photos from the drive connected to a Raspberry Pi onto a laptop before running scripts to sort and remove duplicates. It worked well, with a nice little bonus which I'll expand on later.

Goal: Clean Up Duplicate Photos

My objective was to remove duplicate photos from a large collection while keeping the best version of each file. I used jdupes to identify duplicates and a custom script to decide which files to keep.

The sources of duplication were that I imported photos from Google Takeout on one side, as well as from two or three iPhones and an Android phone. I suspect that Photosync might also contribute, by encouraging the creation of one folder per device that we import from.

Custom Rules for Keeping Files

After running jdupes I set up custom rules. The first looked at file naming: prefer IMG_ or VIRB prefixes over hash-named files. iPhones, Android phones and photo cameras rarely use hash-like names; those are usually created by WhatsApp and similar apps.
The second applied directory priority: keep files in human-readable directories (e.g., "Spain bike ride") over generic ones (e.g., "Photos from 2018"). Google Takeout creates two or more folders: a primary year folder with all photos from that year, plus secondary event-specific folders named either after the album name we chose, for example 'Spain bike ride', or after the date if we did not.

In the final step I noticed that it seemed to be choosing to delete HEIC files rather than .JPG/.JPEG files. As HEIC is usually the original, I want to keep it. Eventually I saw that we had duplicate HEIC files, in which case I allowed it to remove duplicates of that file type. Finally, I noticed that video files were either kept as MOV files or converted, so I accepted a rule to choose .MOV over .MP4. I used Le Chat (The Cat) to help me understand the output from the jdupes runs.
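
Pulled together, the rules above can be sketched as a small scoring function applied to one duplicate group at a time. This is an illustration of the logic, not the actual script Le Chat wrote: the prefixes, format ranks and directory pattern are the ones described above, and the paths are made up.

```python
"""Sketch of the keep-or-delete rules: prefer camera-style names
(IMG_, VIRB) over hash names, HEIC over JPEG and MOV over MP4,
and readable album folders over generic "Photos from YYYY" ones."""
import re
from pathlib import Path

CAMERA_PREFIXES = ("IMG_", "VIRB")
FORMAT_RANK = {".heic": 2, ".mov": 2, ".jpg": 1, ".jpeg": 1, ".mp4": 1}

def score(path: Path) -> tuple:
    camera_name = path.name.upper().startswith(CAMERA_PREFIXES)
    fmt = FORMAT_RANK.get(path.suffix.lower(), 0)
    readable_dir = not re.match(r"Photos from \d{4}", path.parent.name)
    return (camera_name, fmt, readable_dir)

def choose_keeper(group: list[Path]) -> Path:
    """Return the file to keep; the rest become deletion candidates."""
    return max(group, key=score)
```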

Script Development and Testing

As we progressed through the project, Le Chat offered several types of automation. It suggested Digikam, Pillow, a bespoke Python script, and exiftool. In several cases it wrote a Python script to apply the rules and generate a list of files to delete based on the output of a jdupes run.

Testing and Iteration

Part of collaborating with AI tools is experimentation and iteration. It's about running a command, seeing the output, understanding what you see, and then refining the command until you get what you want. It's also about seeing opportunities.

One of the scripts I had The Cat run checked the "to delete" list for EXIF creation dates, i.e. the dates when the photos were taken. Once a script confirmed the dates were there, the process of fine-tuning the deletion script advanced. These are the rules mentioned above.

Verification and Safety Checks

We ran a lot of dry runs. When you run jdupes as a dry run it checks for duplicates and prints the results to the terminal. When you have thousands of duplicates, the terminal window scrolls past plenty of them; that's where writing to a text file helps. It's persistent.
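
A saved report (for example jdupes -r library/ > dupes.txt) is also easy to post-process yourself: jdupes' default output lists each duplicate set as a block of file paths separated by a blank line. A minimal parser, assuming that default format:

```python
"""Sketch of parsing a saved jdupes report into duplicate groups.
Each group is a blank-line-separated block of file paths."""

def parse_report(text: str) -> list[list[str]]:
    groups = []
    for block in text.split("\n\n"):
        paths = [line for line in block.splitlines() if line.strip()]
        if len(paths) > 1:  # a "group" of one is not a duplicate set
            groups.append(paths)
    return groups
```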

The beauty of these text files is that they're light, and you can share them with The Cat, which, in some situations, will actually run the script you discussed with it rather than just outputting the Python code. This differs from Gemini in two ways: first, it runs the script so you don't have to; secondly, if it reaches the token limit for script execution it gives you the Python script to run locally.

What is especially nice is that you can still keep "chatting" even if you reach that limit. It just won't run scripts internally.

Backup: Emphasized

Along the way, The Cat constantly encourages you to make sure you have a backup before running a command. As I was working from a copy rather than the primary library, I felt safe to experiment. Eventually I did execute the command to delete the duplicates and ran jdupes one last time to ensure they were gone.

And Finally

While experimenting I hit the limitations of the free plan, first for code execution and then for chat. I didn't intend for it to run scripts on the files I uploaded; I think running scripts locally makes more sense. I uploaded the data for The Cat to get a better understanding of it.

Hitting the data limit is a feature. It encourages us to take a break and work on something else.

What surprised me, yesterday, but again today, is that I get fatigued from playing with AI, because although large language models do some of the thinking, you still need to babysit them, and understand and supervise what they're doing.

#AI #cat #jdupe #machineLearning #mistral #photoprism
Roadworks on a Foggy Night
