Postdoctoral fellow in plant-pathogen genomics w/ Detlef Weigel, MPI Tübingen. I work with clear fluids in small tubes, buggy code in vim, and muddy boots in search of wild plants and their pathogens. He/him.
Also consulting in HPC, genomics & bioinformatics to non-profit & commercial clients.
Australian, formerly with the Borevitz Lab, ANU, Canberra.
@joeyh I'll have to play with this as soon as it's available in the nightly.
@joeyh This will be *amazing* in combination with e.g. snakemake. "just" set the compute program to `snakemake`, along with an appropriate snakefile, and hey presto!
@Psy_Fer_ @lwpembleton love this idea, and agree this is the best way to learn the praxis of bioinformatics, which (unlike the principles) is less often formally taught.
I would be more than happy to contribute some A. thaliana examples from here in the Weigel group & friends, which has the advantage of being a massively over-studied system where, to be uncharitable to our own work, we basically redo the same papers every 5 years with new sequencing tech.
I'm sure you're already aware of this clusterf*@#, but in case not, take a look at https://kdmurray.id.au/post/2022-07-13_conda-bashrc/
It's the only way I've found to make conda work reliably on most clusters.
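For the record, a minimal sketch of the general shape of the trick (the paths and the exact guard are illustrative; see the linked post for the real details): many clusters source `~/.bashrc` non-interactively in batch jobs, and the default interactivity guard near the top can return before conda's init block ever runs. Putting the conda block above the guard is one way to make it stick:

```shell
# ~/.bashrc (illustrative layout)

# conda init block FIRST, so non-interactive shells (batch jobs) get it too
# >>> conda initialize >>>
if __conda_setup="$("$HOME/miniconda3/bin/conda" shell.bash hook 2>/dev/null)"; then
    eval "$__conda_setup"
fi
unset __conda_setup
# <<< conda initialize <<<

# the usual interactivity guard: everything below is skipped in batch jobs
case $- in
    *i*) ;;
    *) return ;;
esac

# interactive-only config (prompt, aliases, ...) goes here
```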
@lwpembleton sure, but it's *really* dumb: `system(paste0("libreoffice --calc ", tmpfile))`
On KDE/Wayland, this recently only works if I already have LibreOffice open
The function I mentioned is here, though as I said, it's pretty basic.
https://gist.github.com/kdm9/3d07e320844930a9edc2c87766d7e2d7
@lwpembleton I even have a function that write_csv()'s to a tmp file and opens it in libreoffice, which is super useful for this sort of debugging
@lwpembleton likewise, `write_csv()` and friends from `readr` return their inputs, so you can insert lots of write_csv() at various points of a pipeline to see how things change at multiple points
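The same trick exists in the shell, for what it's worth: `tee` passes its input through unchanged while writing a copy to a file, so you can snapshot a pipeline at several points:

```shell
# `tee` writes a copy of the stream to a file and forwards it unchanged,
# so intermediate stages can be inspected after the fact
seq 1 10 \
    | tee step1.txt \
    | awk '{ print $1 * 2 }' \
    | tee step2.txt \
    | sort -rn > final.txt
```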
This has been a huge effort over many years, led by Luisa Teadale (not on masto), @coevolution, and myself, with major contributions from many in the Weigel Group, so we're very happy to be sharing it at last.
Our main finding is that many uncorrelated mutational processes create NLR diversity, and that no single metric on its own captures the true extent of NLR structural and sequence variation. We also show that NLRs are themselves drivers of elevated pangenome complexity, rather than mere passengers in diverse regions.
3/n
We exhaustively annotated all intact and degraded NLRs within complete genomes, using long-read RNAseq and pangenome-wide TE annotation. We used pangenome graphs to explore the diversity of NLRs across accessions, and to define regions containing NLRs in a principled way.
2/n
At last, we're able to share our latest work on #plant #immune system #pangenomes, focusing on #NLR genes in #arabidopsis
"Pangenomic context reveals the extent of intraspecific plant NLR evolution"
1/n
https://biorxiv.org/cgi/content/short/2024.09.02.610789v1
#plantscience #plants #genomics #plant_pathogens #pathogens #coevolution
The big cluster I have access to allows us to run only 8 batch scripts at once, which if you use snakemake's "normal" slurm profile, limits you to 8 possible concurrent jobs (modulo grouping). With this profile, I can now request a single job of say 64 nodes, and have snakemake `srun` each job independently inside that big job, for a *much* faster runtime.
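The shape of the trick, roughly (this sketch uses snakemake's classic `--cluster` option, not the linked plugin's actual interface; the plugin wraps this pattern more robustly):

```shell
#!/bin/bash
#SBATCH --ntasks=64           # one big allocation: a single job from SLURM's view
#SBATCH --time=24:00:00

# Each rule is launched as a job *step* with srun, drawing tasks from the
# 64 reserved above, so up to 64 rules run concurrently while the batch
# scheduler only ever sees one submitted job.
snakemake --jobs 64 \
    --cluster "srun --exclusive --ntasks=1 --cpus-per-task={threads}"
```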
I've decided I'll start a new series of posts: solutions to problems I hope nobody else has.
First installment: A #snakemake profile for #SLURM #HPC clusters which limit the number of distinct jobs you can run, but not the number of CPUs you can use at once: https://github.com/kdm9/snakemake_executor_plugin_mpcdf
(1 of 2ish)
@yokofakun I think just increment the build number, but I'm not 100% sure (it's what I've done in the past, but that may well not be the proper sanctioned approach)
@josephguhlin @PhilippBayer this looks damn awesome @josephguhlin! Can it be indexed by other metadata, especially taxid? If not, any plans to?
@PhilippBayer it sure seems like a pretty cool tool, we're working on getting some models for brassicaceae up and running
@pjacock haven't tried, but does --scheduler greedy work around this?
@zerodivision @PhilippBayer @naturepoker Indeed, biocontainers are great, though I often prefer to build one container holding multiple tools for a given pipeline stage (e.g. `bwa | samtools`) -- more analogous to a conda environment than a conda package.
@PhilippBayer @naturepoker (see e.g. https://github.com/kdm9/bfx-containers/ if you want a copy-pasteable github actions script to do this yourselves)