#ReproducibleComputing

Christian Meesters @rupdecat@fediscience.org
2025-06-16

The #isc25 is over and I have half-recovered from the weekend, too. Time to continue my thread summing up the #SnakemakeHackathon2025 !

To me, an important contribution came from Michael Jahn of the Charpentier Lab: a complete re-design of the workflow catalogue. Have a look: snakemake.github.io/snakemake- - the findability of ready-to-use workflows has greatly improved! Also, the description of how to contribute is now easy to find.

A detailed description has been published in the #researchequals collection researchequals.com/collections under doi.org/10.5281/zenodo.1557464

#Snakemake #ReproducibleComputing #ReproducibleResearch #OpenScience

Christian Meesters @rupdecat@fediscience.org
2025-06-13

Returning from the #isc25, I will continue this thread with something applicable everywhere, not just on #HPC clusters:

Workflow runs can crash, and there are a number of possible reasons. Snakemake offers the `--rerun-incomplete` flag (or `--ri` for short), which lets a user resume a workflow.

This contribution from Filipe G. Vieira describes a small fix to stabilize the feature. Not only are incomplete files removed after a crash, it is now ensured that their associated metadata is deleted too before resuming: zenodo.org/records/15490098
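
In practice, resuming after a crash is simply a matter of re-invoking the workflow with that flag (the core count here is illustrative):

`$ snakemake --cores 8 --rerun-incomplete`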

#Snakemake #SnakemakeHackathon2025 #ReproducibleComputing #OpenScience

Christian Meesters @rupdecat@fediscience.org
2025-06-11

Today I am tooting from the #ISC25 - the International Supercomputing Conference. What better opportunity to brag about something I've done to facilitate using GPUs with Snakemake?

Here is my contribution - simpler job configuration for GPU jobs:

doi.org/10.5281/zenodo.1555179
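
A minimal sketch of what this looks like in a Snakefile - assuming GPU requests via the `gpu` and `gpu_model` resource keywords handled by the SLURM executor plugin; rule name, files and values are illustrative:

```
rule train_model:
    input:
        "data/train.csv"
    output:
        "results/model.pt"
    resources:
        gpu=1,             # request one GPU for this rule
        gpu_model="a100",  # optionally pin a GPU model (cluster-specific)
        runtime=120,       # walltime in minutes
        mem_mb=16000
    shell:
        "python train.py {input} {output}"
```

The idea: the executor maps such resource declarations onto the scheduler's GPU options, so the Snakefile itself stays free of scheduler-specific details.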

I was not alone, though: this came together with valuable input from @dryak . Without him, I would have overlooked something crucial.

And when we talk about reproducible AI, my take is that we ought to consider workflow managers, too: something which records what you have done, with little effort.

#SnakemakeHackathon2025 #Snakemake #ReproducibleComputing #OpenScience

Christian Meesters @rupdecat@fediscience.org
2025-06-10

This morning, I am travelling to the #isc25 and hit a minor bug on #researchequals. Hence, no updates in the collection.

But there are still a few to describe, even without adding the latest contributions:

For instance, this one (zenodo.org/records/15490064) by Filipe G. Vieira: a helper function to extract checksums from files and compare them with the checksums Snakemake is already able to calculate. Really handy!
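
To illustrate the general idea only - this is not the actual helper, just a sketch of computing a file's checksum and comparing it with a previously recorded one (file name and recorded value are made up):

```python
import hashlib

def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Placeholder for a checksum recorded earlier, e.g. by Snakemake:
recorded = "0123abcd..."  # made-up value
if file_sha256("results/table.tsv") != recorded:
    print("results/table.tsv changed since its checksum was recorded")
```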

#Snakemake #ReproducibleComputing #OpenScience

Christian Meesters @rupdecat@fediscience.org
2025-06-09

Before I continue uploading - and I do have a couple more contributions to add to the #ResearchEquals collection - here is another contribution by Johanna Elena Schmitz and Jens Zentgraf, made at the #SnakemakeHackathon2025.

One difficulty when dealing with a different scientific question: do I need to re-invent the wheel (read: write a workflow from scratch) just to address my slightly different question?

Snakemake already allowed incorporating "alien" workflows, even #Nextflow workflows, into one's own workflow. The new contribution allows for a more dynamic integration - with very few changes.

Check it out: zenodo.org/records/15489694
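
For context, the existing module mechanism looks roughly like this (paths, rule names, and the overridden parameter are illustrative); the contribution makes this kind of reuse more dynamic:

```
# Declare an existing workflow as a module
module external_wf:
    snakefile: "path/to/other/workflow/Snakefile"  # illustrative path
    config: config

# Import all of its rules under a prefix to avoid name clashes
use rule * from external_wf as external_*

# Override just one rule to address the slightly different question
use rule mapping from external_wf as external_mapping with:
    params:
        extra="--sensitive"  # illustrative parameter change
```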

#Snakemake #ReproducibleComputing #OpenScience

Christian Meesters @rupdecat@fediscience.org
2025-06-06

Let's take a look at another contribution of Johanna Elena Schmitz and Jens Zentgraf from the #SnakemakeHackathon2025

Snakemake users probably know that

`$ snakemake [args] --report`

will generate a self-contained HTML report, including all the plots and #metadata a researcher's heart longs for.

Now, why trigger this manually? If the workflow runs successfully, we can instead write (or configure):

`$ snakemake [args] --report-after-run`

and Snakemake will autogenerate the same report.

For details see doi.org/10.5281/zenodo.1548976

#Snakemake #ReproducibleComputing
#OpenScience

Christian Meesters @rupdecat@fediscience.org
2025-06-05

One important feature implemented in the #SnakemakeHackathon2025 : Snakemake will calculate file checksums to detect changes. If a file changes, the rule producing it needs to be re-executed when the workflow is re-triggered. But what if a file is too big for reasonable checksum calculation? You do not want to wait forever, after all.

This contribution describes the implementation of a threshold users may set: doi.org/10.5281/zenodo.1548940

#Snakemake #ReproducibleComputing #OpenScience

Christian Meesters @rupdecat@fediscience.org
2025-06-04

One important bug fix during the #SnakemakeHackathon2025 : the config replacement. Now, users can overwrite existing configurations entirely with `--replace-workflow-config`.

Details: zenodo.org/records/15479268
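
In practice (config file name illustrative), the configuration supplied on the command line then replaces the workflow's own config instead of being merged into it:

`$ snakemake --cores 4 --configfile analysis_b.yaml --replace-workflow-config`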

More at researchequals.com/collections

#Snakemake #ReproducibleComputing #openscience

Christian Meesters @rupdecat@fediscience.org
2025-06-02

Did you know? During the #SnakemakeHackathon2025 we had a staggering 194 work items!

It took a while, but now we are gathering contribution reports and presenting them online as a ResearchEquals (fediscience.org/@ResearchEqual) collection:

researchequals.com/collections

The first 10 are online and I will post some highlights in the coming weeks.

#Snakemake #ReproducibleComputing #ReproducibleResearch #OpenScience

Christian Meesters @rupdecat@fediscience.org
2025-04-28

I have been on a little holiday, and meanwhile my pull request for the HPC Certification Forum (#HPCCF - we just _need_ a hashtag 😉 ) on workflow management systems (that is, the skill definitions) has been approved! Thanks to @jkunkel1 !

To me, it is important that the #HPC community realizes the value of automation in data analysis which, in turn, is a great leap for #ReproducibleResearch !

#OpenScience #ReproducibleComputing

Christian Meesters @rupdecat@fediscience.org
2025-03-27

The #SnakemakeHackathon2025 just sparked another collaboration for me!

It is always good to work alongside like-minded people interested in #ReproducibleComputing, #HPC and #Bioinformatics

#OpenScience

Christian Meesters @rupdecat@fediscience.org
2025-03-12

Today is the closing day for pull requests for #Snakemake. The #SnakemakeHackathon2025 participants worked at full speed!

We decided to write a white paper summarizing our achievements rather than posting individual items. Suffice it to say that the documentation, too, made a great leap towards better readability!

#OpenScience #ReproducibleComputing #ReproducibleResearch

Christian Meesters @rupdecat@fediscience.org
2025-03-10

#SnakemakeHackathon2025 ! We have started!

At CERN for better #ReproducibleComputing and #ReproducibleResearch .

The majority of the hackathon participants gathered at CERN for a photo.

Christian Meesters @rupdecat@fediscience.org
2025-02-28

There are many HPC admins who prohibit using considerable CPU time on login nodes. This is understandable.

I want to take this opportunity to provide a data point. My student measured the accumulated CPU time (user + system) for a 9 h (precisely: 33343 s) run of a Snakemake workflow. It was 225 s, or about 0.67 % - including jobs which were carried out on the login node itself, e.g. `mv`, `ln` or downloads of data.

There is certainly room for improvement. There will always be room for improvement.

But my dear fellow admins: running Snakemake on login nodes as a shepherd of jobs will impair nobody's work.

Over time, I will certainly gather more and different statistics, and will invest time in necessary improvements. Regarding the CPU time for checking job status, however, I believe I have demonstrated that this is pretty high-hanging fruit.

#HPC #Snakemake #reproduciblecomputing

Christian Meesters @rupdecat@fediscience.org
2025-02-25

I will continue to find it disturbing when new #HPC cluster users explicitly instruct a program to use only one core/CPU and then complain that the cluster is so slow - slower than their basement server.

Usually they do not spot their mistake on their own.

But THIS is actually NOT the disturbing part: such users also tend to always use default parameters. That might or might not be the sensible thing to do for their problem. Also, when reading papers, software parameterization is frequently not reported.

We have a long way to go.

#ReproducibleResearch #reproduciblecomputing

Christian Meesters @rupdecat@fediscience.org
2025-01-15

Again, we have a new patch-level release of the #Snakemake executor for #SLURM on #HPC systems.

It turned out that some clusters do not allow account checking with `sacctmgr` (which was there for historical reasons); hence, we now have a fallback to `sshare`.

See github.com/snakemake/snakemake for details.

#OpenScience #ReproducibleResearch #ReproducibleComputing

Christian Meesters @rupdecat@fediscience.org
2025-01-08

We have a new release for the #SLURM support plugin of #Snakemake !

It's a minor feature release which enables custom log directories and auto-deletion of the logs of successful jobs (so that zillions of meaningless files do not accumulate!).

Check it out at: github.com/snakemake/snakemake

It took a while to get to this release (stress, family, and sickness took a toll). Hopefully, a future release will not take that long to be realized — the feature request list is considerable. 😉

#HPC #ReproducibleResearch #ReproducibleComputing #OpenScience

2024-01-13

Explore the world of reproducible computing with NixOS in Matthew Croughan's talk at #SCaLE21X! Discover how to turn your old phone into a Linux computer, experiment with AI models, and more. #NixOS #ReproducibleComputing socallinuxexpo.org/scale/21x/p
