#Deduplication

Nicolas Fränkel 🇪🇺🇺🇦🇬🇪 frankel@mastodon.top
2026-02-06
🅹🅴🅳🅸🅴 🇺🇦🕊️ jedie@chaos.social
2026-01-15

I've completely rewritten my PyHardLinkBackup. Originally started in 2015 and used until 2020, it then lay dormant for almost 6 years...

But when I stumbled across old backups I had created with it, I thought the concept was actually quite useful.

So, a complete rewrite: github.com/jedie/PyHardLinkBac

#backup #OpenSource #Python #deduplication #hardlinks

Screenshot of PyHardLinkBackup while it is running. Screenshot of the finished run, with a "Summary".
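
The #hardlinks tag hints at the general idea behind this kind of backup: a file whose content is already stored is hard-linked instead of copied again. A minimal Python sketch of that idea follows. It is not taken from PyHardLinkBackup; the function name `backup_with_hardlinks` and the in-memory `seen` index are assumptions made for illustration.

```python
import hashlib
import os
import shutil
from pathlib import Path


def backup_with_hardlinks(src: Path, dst: Path, seen: dict) -> None:
    """Copy the tree under src into dst; repeated content is hard-linked.

    `seen` maps a SHA-256 hex digest to the first backup file holding that
    content. It is a hypothetical in-memory index, not a real tool's format.
    """
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        # Reads the whole file at once; fine for a sketch, not for huge files.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            # Same content already backed up: add another name for it.
            os.link(seen[digest], target)
        else:
            shutil.copy2(path, target)
            seen[digest] = target
```

Passing the same `seen` dictionary to a later run against a new `dst` directory would link unchanged files to the earlier backup, provided both directories are on the same filesystem.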
Schenkl | 🏳️‍🌈🦄 schenklklopfer@chaos.social
2025-12-24

TIL: #XFS can do #Snapshots but no #Compression; in return it offers #deduplication, although that is still experimental

Anyone looking for a #snapshot-style backup for #Linux might want to take a look at #kopia.
It can be configured in very fine detail through rules.
That said, it took me almost a week to get it running the way I wanted. But with a lot of #script'ing everything worked out.
#deduplication and #kompression, fast and easy.
Highly recommended.

Whatisgoingonthemipper
2025-11-17

And once in a while I clean up the external libraries with

This is amazing software for #deduplication of image folders.

github.com/qarmin/czkawka

Kevin Karhan :verified: kkarhan@infosec.space
2025-11-15

@stratosphere it's always interesting what's constantly trying to hack...

Also one little feedback: It would be cool to see some "#deduplication" or rather reduction of entries by using #CIDR notation which should save a lot of lines = table entries compared to the single IPs.

  • Tho I'm sure this is not done for ease of statistics and further research down the line.
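
The CIDR idea can be tried with Python's standard `ipaddress` module, whose `collapse_addresses` merges adjacent addresses and networks into the fewest covering prefixes. The helper name `collapse_ips` and the sample addresses (taken from documentation ranges) are assumptions for illustration; this is a sketch, not a description of how the list in question is built.

```python
import ipaddress


def collapse_ips(ips):
    """Return the shortest list of CIDR blocks covering exactly the given IPs.

    All inputs must share one IP version; mixing IPv4 and IPv6 makes
    collapse_addresses raise TypeError.
    """
    networks = [ipaddress.ip_network(ip) for ip in ips]
    return [str(net) for net in ipaddress.collapse_addresses(networks)]


single_ips = ["192.0.2.0", "192.0.2.1", "192.0.2.2", "192.0.2.3", "198.51.100.7"]
print(collapse_ips(single_ips))  # ['192.0.2.0/30', '198.51.100.7/32']
```

Four neighbouring addresses shrink to one table entry, while the isolated address stays a /32, so the saving depends on how clustered the offending IPs are.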
Hacker News h4ckernews
2025-10-28

Sick: Indexed deduplicated binary storage for JSON-like data structures

github.com/7mind/sick

Python Job Support pythonjobsupport
2025-10-01

Part 1 : Data Pre-processing Essentials || || Data Cleansing.

Learn PySpark data pre-processing with our tutorial! Learn the art of filtering and deduplication, essential techniques for cleaning ... source

quadexcel.com/wp/part-1-pyspar

Paula Gentle on Friendica gehrke_test@libranet.de
2025-09-20

I tried to quantify the storage savings from #Deduplication for #Backup with #restic. This is after a runtime of just under 2 years.

The result: 22.4%

# restic stats latest 
repository d989459c opened successfully, password is correct
scanning...
Stats in restore-size mode:
Snapshots processed:   1
   Total File Count:   438037
         Total Size:   23.271 GiB

# restic stats latest --mode raw-data 
repository d989459c opened successfully, password is correct
scanning...
Stats in raw-data mode:
Snapshots processed:   1
   Total Blob Count:   265960
         Total Size:   18.409 GiB

I hope I interpreted that correctly.

restic.readthedocs.io/en/stabl…
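
One way to turn the two `restic stats` outputs above into a savings figure is to compare the raw-data size (what the repository stores) with the restore size (what a restore would write). The sketch below assumes that reading of the two modes; the function name `dedup_savings` is made up for illustration. With the rounded figures printed above it comes to about 20.9%. The post does not say how its 22.4% was derived, so the two numbers may rest on different inputs.

```python
def dedup_savings(restore_size_gib: float, raw_data_gib: float) -> float:
    """Fraction of space saved: 1 - stored size / logical size."""
    return 1.0 - raw_data_gib / restore_size_gib


# Values copied from the two "Total Size" lines in the post.
savings = dedup_savings(restore_size_gib=23.271, raw_data_gib=18.409)
print(f"{savings:.1%}")  # 20.9%
```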

Sylvain Lesage severo
2025-09-05

Some design tests for 🟥 🟩 diff'ing files hosted on Hugging Face.

With the new storage backend (Xet), similar files share many data "chunks".

As HF provides an API to get the list of these chunks, you can now compute the diff and display a chart in the browser without downloading the files. Very fast.

Test here: observablehq.com/@severo/parqu

Details about the three charts in the responses.

Diff between Parquet files - green: same, red: added. The diff is represented by a square, with 32 rows of changes.
Diff between Parquet files - flesh: same, green: added, red: removed. All the changes are shown in two rows: above for the old file, and below for the new file.
Diff between Parquet files - flesh: same, green: added, red: removed. The diff is represented by a square, with 32 rows of changes.
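
The chunk-level diff described in this post can be sketched in Python once each file is reduced to an ordered list of chunk hashes. How those lists are fetched is not shown here: the hash strings below are made up, no Hugging Face API call is made, and `diff_chunks` is a hypothetical helper built on the standard library's `difflib`, not the code behind the charts.

```python
from difflib import SequenceMatcher


def diff_chunks(old, new):
    """Label each chunk hash as 'same', 'removed' or 'added', in file order."""
    ops = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops += [("same", h) for h in old[i1:i2]]
        else:
            # 'delete' has an empty new range, 'insert' an empty old range,
            # and 'replace' contributes to both lists.
            ops += [("removed", h) for h in old[i1:i2]]
            ops += [("added", h) for h in new[j1:j2]]
    return ops


old_chunks = ["a", "b", "c", "d"]       # placeholder chunk hashes
new_chunks = ["a", "x", "c", "d", "e"]
for label, chunk in diff_chunks(old_chunks, new_chunks):
    print(label, chunk)
```

Only the hash lists are compared, which is why such a diff needs no file download; the three labels correspond to the same / added / removed colours named in the image descriptions.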
otris systems GmbH otrissystems
2025-07-31

🔗 Do you already know about the integration of Proxmox Backup Server (PBS) with Proxmox Virtual Environment?

Anyone who uses Proxmox VE should deploy PBS:
🧩 Seamless integration
⚙️ Backup modes for maximum control
💾 Deduplication & performance with PBS
🚀 Fleecing for VM backups
📦 Container backups with change detection
Compression & retention

📞 Want to modernize your backup strategy?

Talk to us 👉 0441-309197-69

Hacker News h4ckernews
2025-07-20

Borg - Deduplicating Archiver with Compression and Encryption

borgbackup.org/

Stéphane Klein stephane_klein@social.coop
2025-06-16

I read the very good post by @athoune about #Kloset, the backup storage engine of #Plakar

notes.sklein.xyz/2025-06-16_16

#TIL #Backup #Restic #Deduplication

BLACKVOID ⚫️ blackvoid
2025-05-25

Testing some more concurrent operations on the new + as part of the upcoming

Dual remote machines (testing the in the process), along a simple file copy onto an external enclosure.

(yes 3rd party external drives work just fine!)

The NAS is nice and calm with no performance issues at all.

The OpenAIRE Graph OpenAIREGraph
2025-05-21

Happening now! In this month's , Training & Engagement Officer Stefania Amodeo takes us on a journey through the Graph process that uses a combination of advanced algorithms & human expertise. Missed out? All past materials can be found on the Graph portal, today's to be uploaded in the coming days.

Past Calls graph.openaire.eu/community-ca

Screenshot of presentation introduction. Screenshot of presentation. Screenshot of presentation. Screenshot of presentation.
2025-04-06

anyone know of an implementation of the CPM-SW deduplication algorithm of this paper?

[edited to change the url to the abstract page rather than direct to the pdf]

#deduplication #algorithm

Peter N. M. Hansteen pitrh
2025-03-14

The Problem Isn't Email, It's Microsoft Exchange -- it turns out my 2011-vintage rant still rings true, now also available trackerless: nxdomain.no/~peter/the_problem

2025-02-24

DB2 Query Deduplication: Optimizing Large Datasets with ROW_NUMBER()
Learn efficient DB2 Query Deduplication using ROW_NUMBER() for large datasets. Optimize your queries with CTEs & indexing for smoother performance.
tech-champion.com/database/db2
Learn how to efficiently deduplicate large DB2 datasets using ROW_NUMBER() and optimize query performance. ...
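
The ROW_NUMBER() pattern named in this post usually means numbering the rows within each group of duplicates and keeping only row 1. The sketch below runs that query shape from Python, with SQLite (via the standard `sqlite3` module, SQLite 3.25 or newer for window functions) standing in for DB2. The `customers` table, its columns, and the rule "keep the most recently updated row per email" are assumptions for illustration and do not come from the linked article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER, email TEXT, updated_at TEXT);
    INSERT INTO customers VALUES
        (1, 'a@example.com', '2024-01-01'),
        (2, 'a@example.com', '2024-03-01'),
        (3, 'b@example.com', '2024-02-01');
    """
)

# The CTE numbers rows per email, newest first; rn = 1 is the row to keep.
DEDUP_SQL = """
WITH ranked AS (
    SELECT id, email, updated_at,
           ROW_NUMBER() OVER (
               PARTITION BY email ORDER BY updated_at DESC
           ) AS rn
    FROM customers
)
SELECT id, email, updated_at
FROM ranked
WHERE rn = 1
ORDER BY email
"""

rows = conn.execute(DEDUP_SQL).fetchall()
print(rows)
```

On a large table, an index covering the partition and ordering columns is the usual way to keep the window step cheap, which matches the indexing advice in the post.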

BLACKVOID ⚫️ blackvoid
2025-02-22

Testing and appliance by making the ISO bare metal recovery media.

Let's see how fast this little machine can recover a 50GB setup.

The actual and are working really well.
