#DataLad

Michael Hanke mih@mas.to
2025-05-31

There is now a #gitAnnex package on #PyPi: pypi.org/project/git-annex/

This should make it simpler to deploy git-annex in Python virtual environments, including as a versioned dependency of software like #Datalad

Packages are built for Linux, Windows, and Mac via GitHub Actions: github.com/psychoinformatics-d

Contributions to cover more platforms are most welcome!
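
To check such a deployment from Python, here is a minimal sketch. It assumes that `pip install git-annex` places a `git-annex` executable on the `PATH` of the active virtual environment, which the post does not spell out. The helper names `find_git_annex` and `git_annex_version` are made up for illustration.

```python
import shutil
import subprocess
from typing import Optional


def find_git_annex() -> Optional[str]:
    """Return the path of the `git-annex` executable on PATH, or None."""
    return shutil.which("git-annex")


def git_annex_version(executable: str) -> str:
    """Return the first line printed by `git-annex version` (empty if none)."""
    result = subprocess.run(
        [executable, "version"],
        capture_output=True,
        text=True,
        check=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""
```

Calling `find_git_annex()` inside the activated environment should return a path below that environment if the assumption holds; `None` means no `git-annex` is visible on `PATH`.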

Yann Büchau :nixos: nobodyinperson@fosstodon.org
2025-03-17

I want a build system that:

- is as powerful and flexible as #SCons
- as readable and concise as #SnakeMake
- has a fricking progress bar+ETA
- is :datalad: #datalad / :gitannex: #gitannex agnostic (knows that files can be fetched from elsewhere)
- remembers how long building things takes
- balances that to decide if rebuilding locally instead of fetching gigabytes via slow internet is favorable
- integrates well with :nixos: #nix for reproducibility
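
The "rebuild locally or fetch" item in the wish list above can be sketched as a simple time comparison. This is only an illustration under stated assumptions: the `Artifact` record, its field names, and the idea of comparing a remembered build time against an estimated download time are hypothetical and not taken from any existing build system.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    name: str
    size_bytes: int             # size of the file if fetched from a remote
    last_build_seconds: float   # build duration remembered from an earlier run


def fetch_seconds(artifact: Artifact, bandwidth_bytes_per_s: float) -> float:
    """Estimated time to download the artifact at the given bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        return float("inf")  # no usable connection: fetching never wins
    return artifact.size_bytes / bandwidth_bytes_per_s


def should_rebuild(artifact: Artifact, bandwidth_bytes_per_s: float) -> bool:
    """True if rebuilding locally is expected to be faster than fetching."""
    return artifact.last_build_seconds < fetch_seconds(artifact, bandwidth_bytes_per_s)
```

A real tool would also have to account for CPU load, parallel builds, and transfer costs, none of which this sketch models.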

#rdm #dataAnalysis

Michał Szczepanik doktorpanik@masto.ai
2025-03-11

In the latest DataLad blog post I try out two changes which were introduced in git-annex within the last year: the git-remote-annex Git remote helper (this is the big one!) and a small change to enabling WebDAV special remotes. They work brilliantly, and combined they enable read-only data publishing on Nextcloud instances.

blog.datalad.org/posts/annex-n

#datalad #gitAnnex #nextcloud

David Philip Morgan DPMorgan@mastodon.world
2025-01-22

✨ Join the upcoming Mannheim Open Science Meetup! ✨

🗞️ Topic: Reproducible Research Data Management with @datalad
🗣️ Speaker: @lnnrtwttkhn
📅 Date: Wed, Feb 26, 2025
⏰ Time: 2:00 PM
📍 Location: Online, sign up here: uni-mannheim.zoom-x.de/meeting

Why Attend?
✔️ Learn cutting-edge tools like Git, Docker & DataLad
✔️ Boost transparency & reproducibility in research

#OpenScience #ResearchDataManagement #DataLad #Reproducibility

Chris Markiewicz effigies@mas.to
2025-01-03

Just set up a new Synology NAS box and installed forgejo-aneksajo (a git web UI with built-in git-annex support) on it: effigies.gitlab.io/posts/forge

Just a quick post that highlights what needed to be adapted from this earlier post on the #DataLad blog: mas.to/@mih/112880585950408351

#gitannex

Yann Büchau :nixos: nobodyinperson@fosstodon.org
2024-12-26

My :nixos: #NixOS always wants to build :datalad: #datalad from source and the tests take aaaages and neither override{,Python}Attrs doCheck=false nor pytestCheckPhase="" works to prevent it 😩

Simon Tournier zimoun@sciences.re
2024-10-22

@khinsen In the MOOC #ReproducibleResearch II, do you talk about #DataLad for managing datasets?

Well, git-annex is very nice, but it is somehow the plumbing, so which porcelain? ;-)

fun-mooc.fr/en/courses/reprodu

git-scm.com/book/en/v2/Git-Int

Chris Markiewicz effigies@mas.to
2024-09-19

@nobodyinperson The `datalad-fuse` extension allows you to use `datalad fsspec-head` to achieve this. I believe it uses git-annex to find a remote URL and then Python's `fsspec` to do the actual fetch. #DataLad

Michael Hanke mih@mas.to
2024-08-27

In a new article, I take a look at #Forgejo for hosting laaarge #Datalad datasets. I am talking about datasets with millions of files. Or rather millions of #gitAnnex file pointers.

...and...

It works really nicely, right out of the box! Millions of files in thousands of datasets. Not even a reason to switch away from a #SQLite database. A dual-core VM with 2-3GB of RAM should be good enough.

blog.datalad.org/posts/forgejo

Michał Szczepanik doktorpanik@masto.ai
2024-08-19

A new post on the DataLad blog: sharing my experiences from implementing a DataLad workflow, inspired by the existing "FAIRly big" paper, to cut and publish conference videos on a cluster.

Looking back, everything seems streamlined and logical, but getting there involved discovering the fine details of DataLad, git-annex, HTCondor, bash (and also Matroska metadata and video codecs). Hope it's a useful take.

blog.datalad.org/posts/fairly-

#datalad #gitAnnex #HTCondor #metadata #workflow #distribits

Michael Hanke mih@mas.to
2024-07-31

Here is another blog post on #Forgejo. This time looking into a user-space deployment with #podman and #systemd.

This combination really rocks! It feels like managing any other non-containerized service. The integration with podman v4.4+ (quadlets) is even better.

blog.datalad.org/posts/forgejo

#selfhosted #rdm #datalad

Yann Büchau :nixos: nobodyinperson@fosstodon.org
2024-07-24

That thing really kicks! Not having to think about where to separately put your large files is a major workflow boost.

I wonder how easy it is to get that fork working in :nixos: #NixOS, where a simple `services.forgejo.enable=true` should already fire up a #Forgejo.

@mih

#gitAnnex #git #DataLad

Michael Hanke mih@mas.to
2024-07-23

If you find yourself needing to reduce storage demand across many #gitAnnex repositories that (may) contain identical keys:

```
find /data \
-type d -name 'objects' -wholename '*/annex/objects' \
| rmlint -g -b \
-S 'r</data/original/*>Hma' --no-crossdev -f -T duplicates \
--no-hardlinked -c sh:handler=hardlink - \
&& ./rmlint.sh -d -k -n
```

It hardlinks any key file anywhere under /data to its respective "original", preferring copies under /data/original as the originals.
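
Before running a command like the one above, it can help to see how much there is to gain. The following is a read-only sketch that lists annex keys stored more than once and estimates the reclaimable space. It is not a replacement for rmlint and changes nothing on disk. It assumes that key files live below `annex/objects` directories under a file name equal to the key, and that equal names imply equal content (true for checksum-based key backends only). The function names are made up for illustration.

```python
import os
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def find_duplicate_keys(root: str) -> Dict[str, List[Path]]:
    """Map each annex key name to all copies found below `root`."""
    copies: Dict[str, List[Path]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        parts = Path(dirpath).parts
        # Only look inside .../annex/objects/... directories.
        if not any(a == "annex" and b == "objects" for a, b in zip(parts, parts[1:])):
            continue
        for name in filenames:
            copies[name].append(Path(dirpath) / name)
    # Keep only keys that occur at more than one path.
    return {key: paths for key, paths in copies.items() if len(paths) > 1}


def reclaimable_bytes(paths: List[Path]) -> int:
    """Bytes freed if all distinct copies were hardlinked to a single one."""
    seen = set()
    sizes = []
    for p in paths:
        st = p.stat()
        ident = (st.st_dev, st.st_ino)
        if ident not in seen:  # paths that are already hardlinked count once
            seen.add(ident)
            sizes.append(st.st_size)
    # One distinct copy has to stay; the rest could become hardlinks.
    return sum(sizes[1:])
```

Summing `reclaimable_bytes(paths)` over the values returned by `find_duplicate_keys("/data")` gives a rough upper bound; copies on different filesystems cannot be hardlinked and would need to be excluded.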

#datalad #rdm #unix

Michael Hanke mih@mas.to
2024-07-22

A post on a missing key piece in the DataLad world: an easy, free and open-source solution for self-hosting DataLad datasets with annex'ed files.

Now there is a #Forgejo variant with built-in git-annex support! Straightforward to deploy (even at home or in a small lab on a #RaspberryPi), and super convenient to use.

blog.datalad.org/posts/forgejo

#datalad #gitAnnex #selfhosted #foss #rdm

2024-07-18

@zimoun You had mentioned #DataLad a few times a while back.

Maybe this article ^^ is as interesting for you to read as it was for me. It's a great simple use case, which explains the basic workflows with #DataLad in a nice way and context.

Michael Hanke mih@mas.to
2024-07-18

A post on one step of my journey to get off of commercial cloud services: A personal music streaming service with #Navidrome running on a #RaspberryPi at home that does this on the side. All with minimal overhead and powered by #GitAnnex data logistics.

blog.datalad.org/posts/navidro

#nocloud #blog #musicstreaming #datalad #distributed #foss #diy #smarthome

Yann Büchau :nixos: nobodyinperson@fosstodon.org
2024-06-15

I published tutorial videos to set up :git: #git, :gitannex: #gitAnnex and :datalad: #DataLad on :linux: #Linux (Mint), :windows: #Windows and :mac: #MacOS over on my account @nobodyinperson@tube.tchncs.de :

tube.tchncs.de/w/p/4as7A1ZDo7F
