Noel O'Boyle

Guided by the science.

A contributor to Open Source and commercial #cheminformatics software for many years. Now working in a biotech leading the cheminformatics line as part of a computational chemistry group.

Noel O'Boyle boosted:
OggCamp (oggcamp)
2026-02-13

Hey! We're doing an open source and free culture unconference in Manchester on April 25-26. Affordable, family-friendly, lots of fun maker and podcast-y and coder-ish stuff to discuss. Have you got your ticket yet? There's still time to submit a talk for the main track, as well.

Tell your friends!

oggcamp.org/

2026-01-17

Blogpost that looks at how LLMs have improved at 'reasoning' over time. This is a key capability that enables many scientific workflows.

baoilleach.blogspot.com/2026/0

Noel O'Boyle boosted:
2026-01-15

Ensembl is hiring!!

We are on the lookout for a Senior Platform Developer to join our team.

“In this role, you will help shape the Ensembl platform’s technical direction, applying your expertise to build reliable, scalable systems and guide best practices across teams.”

Based in South Cambridgeshire, UK

Please boost or apply!

embl.wd103.myworkdayjobs.com/e

#jobs #getFediHired #python #devops #science #fediHire

[Image: office building at sunset, with the sunlight reflecting off the glass.]

Noel O'Boyle boosted:
Emil Jacobs - Collectifission (collectifission@greennuclear.online)
2026-01-15

The rise and fall of Stack Overflow is a case in point of the parasitic nature of LLMs. LLMs feed their models on places like Stack Overflow to be useful to users, so users flock to them to avoid the eternal snarky comments and just get an answer to their problem right away. But this is a dead end. No new answers will be generated if no one uses Stack Overflow or similar places.

What goes for Stack Overflow goes essentially for the whole internet. Like a mold growing on food, consuming it, and dying once the food is gone - LLMs will kill large parts of the 'old' internet before long.

[Image: graph showing the rise and fall of Stack Overflow usage between 2008 and 2026, in number of questions asked per month. There's a clear bell curve, starting at 0 and returning to 0.]

Noel O'Boyle boosted:
Egon Willighagen (egonw@social.edu.nl)
2026-01-06

looking at @dalke's "Superimposed Coding of Count Fingerprints to Binary Fingerprints" doi.org/10.26434/chemrxiv-2026

"This paper proposes a novel method based on random superimposed coding to convert count fingerprints to binary fingerprints such that the binary Tanimoto similarity between two binary fingerprints better approximates the multiset Tanimoto similarity between their original count form."

#cheminformatics
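
For readers unfamiliar with the two similarity measures named in the quoted abstract, here is a minimal sketch of each. It only illustrates the measures and is not the superimposed coding method from the preprint; the function names and the toy fingerprints are made up for illustration.

```python
from collections import Counter

def binary_tanimoto(a: set, b: set) -> float:
    """Tanimoto on binary fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def multiset_tanimoto(a: Counter, b: Counter) -> float:
    """Tanimoto on count fingerprints: sum of minimum counts over sum of maximum counts."""
    # Counter's & keeps the minimum count per feature, | keeps the maximum.
    denominator = sum((a | b).values())
    return sum((a & b).values()) / denominator if denominator else 0.0

# Toy count fingerprints (hypothetical feature ids mapped to counts).
fp1 = Counter({1: 3, 2: 1})
fp2 = Counter({1: 1, 3: 2})

count_similarity = multiset_tanimoto(fp1, fp2)      # 1 / 6
bit_similarity = binary_tanimoto(set(fp1), set(fp2))  # 1 / 3
```

Simply discarding the counts gives 1/3 here where the count form gives 1/6, which is the kind of gap a better count-to-binary conversion aims to narrow.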

2026-01-05

Last chance (closing dates Jan 11) to apply for open positions in my team:
ebi.ac.uk/about/teams/chemical

The first is Technical Lead for the team - this is suitable for someone with relevant experience with either a scientific or computing background.

We also have two positions between ourselves and Open Targets as part of a collaboration to develop a resource that captures drug side effect information. This is advertised as NLP Data Scientist/Scientific Data Engineer.

Boosts appreciated!

2026-01-03

@egonw @rupdecat I haven't described the problem I'm working on, but I need a test set identified via an orthogonal approach (e.g. manually) to evaluate performance.

Noel O'Boyle boosted:
Andrew Dalke (dalke@toots.nu)
2026-01-03

ChemRXiv has accepted my #cheminformatics preprint "Superimposed Coding of Count Fingerprints to Binary Fingerprints". It is available at chemrxiv.org/engage/chemrxiv/a .

2026-01-03

@egonw @rupdecat yes to Christian (in progress, will post here) and for Egon, assessment of model predictions is no different for this than for anything else. You need a gold standard and of course inspect the results.

2026-01-02

Spent Christmas playing with the OpenAI API for the first time. With careful use of dictionary filters and a less accurate model (gpt-5-nano) as a gatekeeper, I've essentially run all PubMed abstracts through a classification prompt with GPT5.2 for <$20. Skipping the nano model and running directly would still be <$200.

The hardest part is dealing with the batch API, rate limits, etc. There's probably a business in this somewhere, a website that allows biologists to run these analyses over PubMed.
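
The gating idea described above could be sketched roughly as follows. This is an illustration under assumptions, not the actual pipeline: `cascade_classify`, the keyword list, and the two stand-in model functions are all hypothetical, and no real API calls are made.

```python
from typing import Callable, Iterable, List, Tuple

def cascade_classify(
    abstracts: Iterable[str],
    keywords: Iterable[str],
    cheap_gate: Callable[[str], bool],
    strong_classifier: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Run the expensive classifier only on abstracts that pass two cheaper gates."""
    lowered = [k.lower() for k in keywords]
    results = []
    for text in abstracts:
        # Gate 1: free dictionary filter (case-insensitive substring match).
        if not any(k in text.lower() for k in lowered):
            continue
        # Gate 2: cheap, less accurate model decides whether to escalate.
        if not cheap_gate(text):
            continue
        # Only survivors reach the expensive model.
        results.append((text, strong_classifier(text)))
    return results

# Stand-ins for the two models (hypothetical; real ones would call an LLM API).
strong_calls = []

def fake_cheap_gate(text: str) -> bool:
    return "patients" in text

def fake_strong_classifier(text: str) -> str:
    strong_calls.append(text)
    return "relevant"

abstracts = [
    "Aspirin caused gastric bleeding in a cohort of patients.",
    "A survey of galaxy cluster morphology.",
    "Aspirin synthesis by a new catalytic route.",
]
labelled = cascade_classify(abstracts, ["aspirin"], fake_cheap_gate, fake_strong_classifier)
```

In this toy run the expensive classifier is called once for three abstracts; the cost saving comes from how many inputs the two gates remove.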

Noel O'Boyle boosted:
2025-10-27

Ontologies4Chem Workshop

Registration for on-site participation is closed.
However, we are offering the opportunity to participate online.
For details, refer to the agenda.
t1p.de/83bom

#chemistry #Chemie #rdm #researchdata #Forschungsdaten #fairdata #workshop #openscience #ontology

2025-09-05

@dalke @cthoyt even so. Just flag it up.

2025-09-05

@dalke @cthoyt Hi Andrew. Can you email help with any unusual situations? Would be great to get problems ironed out as we need to transition to ChEBI 2 in a compressed timeframe.

Noel O'Boyle boosted:
Charles Tapley Hoyt (cthoyt@scholar.social)
2025-08-26

I used chembl-downloader to create some nice charts on how the number of compounds, assays, activities, and other entities in ChEMBL have grown over time

📖 cthoyt.com/2025/08/26/chembl-h

#chembl #chemistry #chemometrics #chemoinformatics #cheminformatics #rdkit #cdk #proteochemometrics

2025-08-17

@egonw I would worry that the stereo variants may not be possible due to a plane of symmetry, or a ring preventing the inversion.

Noel O'Boyle boosted:
Andrew Dalke (dalke@toots.nu)
2025-08-15

It's official - the upcoming chemfp 5.0 release will have limited support for sparse count #cheminformatics fingerprints, in addition to the normal binary fingerprints.

The new format is "FPC", a variant of the FPS format. Details at chemfp.com/fpc_format/.

There will also be "rdkit2fpc" for the four #RDKit count fingerprint generators.

Plus "fpc2fps" with several methods to convert sparse count features -> binary.

And "fps2fpc" for the reverse (it's just a list of on-bit indices.)

2025-08-02

Blog post on "A new job, a postdoc opportunity, an open biological curator role, and a user group meeting"

baoilleach.blogspot.com/2025/0

#cheminformatics #chemjobs #chempostdocs

Noel O'Boyle boosted:
Andrew Dalke (dalke@toots.nu)
2025-05-25

Did my first timings of chemfp's "shardsearch" for searching ~1 billion #cheminformatics fingerprints by aggregate search of smaller shards.

Was annoyed that k=3 NN search of 1024-bit Morgan fingerprints took 10 minutes on my desktop. It should have been much faster, like less than a minute.

Then realized "wc shard*.fpb" takes *23 minutes*.

I'm gonna need a faster disk. Have spinning rust for 7 TB of space, not speed.

Zstd should help. It uses 1/4 the space. I'll need smaller shards to test.
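
The "aggregate search of smaller shards" idea can be sketched generically: run the k-nearest-neighbour search on each shard, then merge the per-shard hits into a global top k. This is not chemfp's API or implementation; `merge_shard_hits` and the sample hits are hypothetical.

```python
import heapq
from itertools import chain
from typing import Iterable, List, Tuple

Hit = Tuple[str, float]  # (record id, similarity score)

def merge_shard_hits(per_shard_hits: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Combine each shard's nearest-neighbour hits into the global top k by score."""
    all_hits = chain.from_iterable(per_shard_hits)
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[1])

# Hypothetical top hits returned by three separate shard searches.
shard_hits = [
    [("a", 0.90), ("b", 0.50)],
    [("c", 0.95), ("d", 0.40)],
    [("e", 0.70)],
]
top3 = merge_shard_hits(shard_hits, k=3)
```

Each shard only needs to return its own best k hits, so the merge step stays small no matter how many fingerprints sit on disk.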
