Tommi Mäklin

I do statistics, metagenomics/bacterial genomics & bioinformatics software development.
Postdoc @ University of Helsinki

2023-07-15

This vid about optimizing Mario 64 source code just for the fun of it is coincidentally also a pretty neat example of why/when function inlining and loop unrolling are useful youtube.com/watch?v=t_rzYnXEQl

2023-07-13

@rupdecat yep, it's pretty great for the unexpected!

Tommi Mäklin boosted:
Vidar Hokstad / Galaxy Bound vidar@galaxybound.com
2023-07-13

Beating neural approaches to sentence classification for out-of-domain datasets with ... nearest neighbour over *gzip* (finding the ratio between the compressed size of the documents combined vs. smallest compressed size of the inputs being compared).

This is both hilariously simple, and somewhat damning of how the other methods do in this context.

aclanthology.org/2023.findings

#computerscience #programming #machinelearning

Screenshot of a tweet by Riley Goodside @goodside

this is wild — kNN using a gzip-based distance metric outperforms BERT and other neural methods for OOD sentence classification

intuition: 2 texts similar if cat-ing one to the other barely increases gzip size

no training, no tuning, no params — this is the entire algorithm:

Image w/Python code from tweet:

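import gzip
import numpy as np

k = 3  # number of neighbours that vote; value assumed, the tweet leaves k undefined
# test_set / training_set: lists of (text, label) pairs
for (x1, _) in test_set:
    Cx1 = len(gzip.compress(x1.encode()))
    distance_from_x1 = []
    for (x2, _) in training_set:
        Cx2 = len(gzip.compress(x2.encode()))
        x1x2 = " ".join([x1, x2])
        Cx1x2 = len(gzip.compress(x1x2.encode()))
        # normalized compression distance (NCD) between x1 and x2
        ncd = (Cx1x2 - min(Cx1, Cx2)) / max(Cx1, Cx2)
        distance_from_x1.append(ncd)
    sorted_idx = np.argsort(np.array(distance_from_x1))
    # labels of the k nearest training texts, as a plain list so .count works
    top_k_class = [training_set[i][1] for i in sorted_idx[:k]]
    predict_class = max(set(top_k_class), key=top_k_class.count)
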
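
The same idea packaged as reusable functions, as a minimal sketch. The distance is the one in the tweet's code, (C(x1 x2) - min(C(x1), C(x2))) / max(C(x1), C(x2)) with C the gzip-compressed length. The function names `compressed_len`, `ncd` and `classify`, the toy training set and k = 1 are hypothetical choices for illustration, and the majority vote uses `collections.Counter` in place of `max(set(...), key=....count)`.

```python
import gzip
from collections import Counter


def compressed_len(text: str) -> int:
    """Size in bytes of the gzip-compressed UTF-8 encoding of `text`."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x1: str, x2: str) -> float:
    """Normalized compression distance, same formula as the tweet's code:
    (C(x1 x2) - min(C(x1), C(x2))) / max(C(x1), C(x2))."""
    c1, c2 = compressed_len(x1), compressed_len(x2)
    c12 = compressed_len(" ".join([x1, x2]))
    return (c12 - min(c1, c2)) / max(c1, c2)


def classify(text: str, training_set: list[tuple[str, str]], k: int = 1) -> str:
    """Majority label among the k training texts closest to `text` by NCD."""
    nearest = sorted(training_set, key=lambda pair: ncd(text, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (text, label) pairs.
training = [
    ("the cat sat on the mat " * 5, "cats"),
    ("a cat purred on the warm mat " * 5, "cats"),
    ("stock prices fell sharply on monday " * 5, "finance"),
    ("markets rallied as stock prices rose " * 5, "finance"),
]
print(classify("the cat sat on the mat " * 4, training, k=1))
```

There is no model to train, but every query is compared against every training text with a few gzip calls per pair, so prediction cost grows linearly with the size of the training set.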
Tommi Mäklin boosted:
2023-07-13

Hey #bioinformatics, #OpenScience and #computerscience hive minds of Mastodon, I have need of your help.

See genomic.social/@MrHedmad/11060

I need a way to make our lab data FAIR without having to explain or use anything very complex. RO-Crate does not cut it for our personal use case. So I came up with github.com/MrHedmad/data-myr

Can anyone take a look and tell me if:
- It is dumb / useless / makes no sense
- There is something identical to it

and any other feedback. Thank you! Love you! xoxoxoxo

2023-07-13

@rupdecat the Finnish research HPC maintainers actually built a set of wrapper scripts to handle the filesystem load that individual users' conda installations cause github.com/CSCfi/hpc-container

It's unfortunate that conda is so incompatible with common HPC setups but something like this seems like an interesting temporary solution.

2023-07-10

Wrote a blogthing about how chatgpt is pretty bad at being a self-replicating machine maklin.fi/post/computer-scienc

2023-07-09

@apposada For computer hardware there's the Free Software Foundation's "Respects Your Freedom" scheme, which is a bit on the extreme side of views on open hardware (ryf.fsf.org/) and mainly certifies hardware that is more than 10 years old. For phones and gadgets, the Fairphone/PinePhone/Purism trio promises 5-year software support and repairability for most of their products.

The communities overlap a lot with the open source ones, so there isn't really any one dedicated to hardware.

Tommi Mäklin boosted:
Alberto Perez-Posada apposada@genomic.social
2023-07-09

People in the #FOSS community: the world of Open Source Software is vast and well documented, with numerous past and present examples of communities committed to promoting and teaching its philosophy and principles. But what about #hardware? Is there anything similar for hardware? I’d like to start reading more on sustainable hardware (i.e. devices that can be repaired, that have long-term support, software designed to keep supporting them, sustainable materials, etc.). Thanks!

Tommi Mäklin boosted:
2023-07-09

Most baffling thing about 'geek culture' is how ultra-conservative it is. People who grew up shouting "hack the planet" grew up to be averse to the smallest changes, from the small scale ("well it works for *me* so why should we change this 15-step setup") to the large ("i refuse to think about the social impacts of my work")

Tommi Mäklin boosted:
Bede Constantinides bede@mstdn.science
2023-07-08

Preprint: How best to remove human reads from microbial FASTQs? Our tool Hostile removed >99.6% of human reads while retaining >99.997% and 100% of simulated reads in bacterial and mycobacterial metagenomes
biorxiv.org/content/10.1101/20

Tommi Mäklin boosted:
2023-07-03

2008 v. 2023: I don’t think people realize the extent to which parked cars degrade public space.

Photo from 2008 of a street full of parked cars. Photo taken by me in 2023 showing a row of people on bikes but no cars.

Tommi Mäklin boosted:
2023-07-03

If the server costs are too much, have they considered #Twitter by mail? One of these bad boys can fit 1440000 / 280 = 5142 tweets! Just mail one out once a week! That’s 734 tweets a day! Problem solved!
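
The post's arithmetic checks out when counts are rounded down to whole tweets. A quick check in Python, assuming the post's figure of 1,440,000 bytes per disk and one byte per tweet character (plain ASCII text):

```python
# Figures from the post: a "1.44 MB" floppy taken as 1,440,000 bytes, and
# 280 bytes per tweet (assumes 1 byte per character, i.e. plain ASCII text).
FLOPPY_BYTES = 1_440_000
TWEET_BYTES = 280

# Integer division: only whole tweets fit, leftover bytes are dropped.
tweets_per_disk = FLOPPY_BYTES // TWEET_BYTES  # 5142, with 240 bytes left over
# One disk mailed per week, spread over 7 days.
tweets_per_day = tweets_per_disk // 7          # 734

print(tweets_per_disk, tweets_per_day)
```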

Photo of 3 high-density 3.5-inch floppy disks labeled with their capacity of 1.44 MB.

2023-07-01

@milotmirdita we really need to figure out how to better support software projects long-term without relying 100% on volunteers or lone devs.

2023-07-01

@milotmirdita debian-med looks interesting, thanks. I don't want to drop bioconda, really, since it's become somewhat standard for many, but rather provide alternatives for when problems occasionally pop up. Arm64 support is definitely a headache, though.

2023-07-01

@bgruening oh I didn't realize that issue is fixed, thanks! I admit I've been a bit frustrated with osx builds lately but it's mostly about how Apple does things rather than (bio)conda.

Thanks for putting your time into what's a really valuable resource for many people!

2023-07-01

bonus points if they have better (any) documentation for creating the "recipes"

2023-07-01

bioconda seems to be having lots of issues with builds failing recently, are there any alternatives that people use?

Tommi Mäklin boosted:
Adam Dalliance pre@boing.world
2023-07-01

Today
* Reddit ends free API access.
* Twitter turns off anonymous reading.
* Youtube is talking about banning ad-blocking users.

The tech industry was living on cheap money and low interest rates, and now they're all afraid to let their precious content get used for AI training.

The walls are going up, the lawful corporate web is collapsing in on itself.

2023-06-30

@pence surely that is the way the devs intended it
