#cheminformatics

2025-05-05

A comprehensive article on reaction prediction. macinchem.org/2025/05/05/react #cheminformatics.

2025-05-01

The 2025_03_1 release of #RDKit includes my contribution to speed up part of getting 2D fingerprints for a molecule by ~75x! So if you generate #chemical fingerprints, now is a good time to upgrade.

Reminder that I'm #OpenToWork so if you're hiring for #cheminformatics or #scientificSoftware development, let's talk.

#chemistry #DrugDiscovery #pharma #PythonForChemists

github.com/rdkit/rdkit/release

2025-04-29

Thanks to the @fosstodon admins for giving statements. Not all our #fosstodon questions have been answered.

We live in difficult times where tensions run high and where independent justice on social media is absent. @fosstodon welcomed our project, with hesitance, not knowing who is behind this account or who is behind the Blue Obelisk movement. This brings risks, courage, and misuse.

We would like to thank @fosstodon for allowing us to share our #openscience #cheminformatics news here for 2.5 years.

2025-04-28

@jhylin I also find that #cheminformatics project ideas evolve as I work on them. I sometimes start out with one idea, then when I solve it in code I realize that it opened a vista to another problem that I also need to solve to address the goal of the blog post.

2025-04-27

Here's the expanded CYP-ADRs dataset on adverse drug reactions for cytochrome P450 substrates (drugs) with ideas behind this work.

Dataset: github.com/jhylin/Adverse_drug

Ideas: jhylin.github.io/Data_in_life_

(I seem to be working in reverse lately... where project ideas are only more fully formed after having partially worked on them)

#prescription_drugs #cytochromep450 #AdverseReactions #cheminformatics

Egon Willighagen egonw@social.edu.nl
2025-04-27

Itching to put these 500+ experimental boiling points in @wikidata ... but this 2004 paper does not have SMILES, only this shorthand notation (screenshot). Should be doable, but it is also a nice B.Sc. student project, I guess. doi.org/10.1021/ci049802u

#cheminformatics

Screenshot of the supplementary info of the article linked in the post, zoomed in on the "Structure" column, with entries like this:

CH3-CH2-CH2Cl
CH2F2
CH3-CF2-CF3
etc
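
To give a feel for why this "should be doable", here is a minimal sketch of a converter for the three entries quoted above. It is not from the paper or the post: it assumes the shorthand is a hyphen-separated, unbranched chain of saturated carbons whose only non-hydrogen substituents are halogens, and the function names are hypothetical. Real entries with rings, double bonds, or other atoms would need more work, and valences are not checked.

```python
import re

# Assumption: only these substituents appear besides hydrogen.
HALOGENS = {"F", "Cl", "Br", "I"}
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def group_substituents(group):
    """Return the halogen substituents of one group such as 'CH2Cl' or 'CF3'."""
    tokens = TOKEN.findall(group)
    # Reject anything the tokenizer could not consume completely.
    if not tokens or "".join(sym + num for sym, num in tokens) != group:
        raise ValueError(f"cannot parse group {group!r}")
    if tokens[0] != ("C", ""):
        raise ValueError(f"group {group!r} does not start with a single carbon")
    substituents = []
    for symbol, number in tokens[1:]:
        count = int(number) if number else 1
        if symbol == "H":
            continue  # hydrogens are implicit in SMILES
        if symbol not in HALOGENS:
            raise ValueError(f"unsupported atom {symbol!r} in {group!r}")
        substituents.extend([symbol] * count)
    return substituents


def shorthand_to_smiles(text):
    """Convert e.g. 'CH3-CH2-CH2Cl' into a (non-canonical) SMILES string."""
    groups = text.strip().split("-")
    parts = []
    for index, group in enumerate(groups):
        subs = group_substituents(group)
        if index == len(groups) - 1 and subs:
            # On the final carbon the last substituent can close the chain.
            branches = "".join(f"({s})" for s in subs[:-1]) + subs[-1]
        else:
            branches = "".join(f"({s})" for s in subs)
        parts.append("C" + branches)
    return "".join(parts)


for entry in ["CH3-CH2-CH2Cl", "CH2F2", "CH3-CF2-CF3"]:
    print(entry, "->", shorthand_to_smiles(entry))
```

The output strings are valid but not canonical SMILES, so a toolkit would still be needed to normalize them before matching against existing records.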

Andrew Dalke dalke@toots.nu
2025-04-25

Woo-hoo! My #ICCS2025 poster was accepted. I'm going to Noordwijkerhout in a few months.

Like the last couple of times, I'll be going there by train.

Anyone on the route (Trollhättan→Copenhagen→Hamburg→Noordwijkerhout and vice versa) want me to visit? I can talk about SMILES, #cheminformatics history, and fingerprint similarity for hours. :)

Or perhaps interested in licensing chemfp?

For that matter, I'm also available to modernize old in-house cheminformatics code.

2025-04-13

Identify PDB id associated with Uniprot id vortex script
macinchem.org/2025/04/13/unipr #cheminformatics

2025-04-08

I'm excited to present "Finding Tautomers" at the first North American #RDKit User Group Meeting in the #Boston #MA area on Friday April 11!

Reminder that I'm #OpenToWork so if you're in the area and hiring for #cheminformatics or #scientificSoftware development, let me know and we can meet to discuss your needs.

Finding Tautomers title slide

Egon Willigh☮gen 🟥 egonw
2025-04-06

It seems I just released my first PyPI package ever.

pyBacting 2.14 with Bacting 1.0.5 is now out: pypi.org/project/pybacting/0.2

This gives you access in Python to (some of) the functionality of the Chemistry Development Kit, OPSIN, ChemSpider, PubChem, InChI, Excel files, BridgeDb, and BioJava

2025-04-05

#openscience #cheminformatics dates back to the late nineties with the emerging collaborative development of JChemPaint, Jmol, and the Chemical Markup Language. Sketch of the history by Chris Steinbeck: "The evolution of open science in cheminformatics: a journey from closed systems to collaborative innovation" jcheminf.biomedcentral.com/art

Andrew Dalke dalke@toots.nu
2025-03-31

I finally finished implementing the newest chemfp feature - similarity histograms, both full comparison and sampled, and both NxN (upper triangular) and NxM (two datasets)

$ chemfp simhist chembl_34.fpb --bins 10 --num-samples 1000000 --no-metadata
start end count percent
0.0 0.1 413799 41.380
0.1 0.2 561503 56.150
0.2 0.3 23685 2.369
0.3 0.4 887 0.089
0.4 0.5 105 0.011
0.5 0.6 13 0.001
0.6 0.7 5 0.001
0.7 0.8 1 0.000
0.8 0.9 1 0.000
0.9 1.0 1 0.000

Next, update the docs.

#cheminformatics
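
For readers unfamiliar with similarity histograms, the idea can be sketched in plain Python. This is not chemfp's implementation or API: fingerprints are stored as Python integers, the function names are hypothetical, and the bin-edge handling (half-open bins with 1.0 folded into the last one) is an assumption, not a statement about how `chemfp simhist` bins its scores.

```python
import random
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def bin_index(score, num_bins):
    # Assumption: bins are [start, end), with a score of 1.0 put in the last bin.
    return min(int(score * num_bins), num_bins - 1)


def full_histogram(fps, num_bins=10):
    """Histogram over every pair in the upper triangle of the NxN comparison."""
    counts = [0] * num_bins
    for a, b in combinations(fps, 2):
        counts[bin_index(tanimoto(a, b), num_bins)] += 1
    return counts


def sampled_histogram(fps, num_bins=10, num_samples=1000, seed=None):
    """Histogram over randomly sampled pairs of distinct fingerprints."""
    rng = random.Random(seed)
    counts = [0] * num_bins
    for _ in range(num_samples):
        i, j = rng.sample(range(len(fps)), 2)
        counts[bin_index(tanimoto(fps[i], fps[j]), num_bins)] += 1
    return counts


def format_rows(counts):
    """Format counts as 'start end count percent' rows."""
    total = sum(counts) or 1  # avoid dividing by zero for an empty histogram
    n = len(counts)
    return [
        f"{k / n:.1f} {(k + 1) / n:.1f} {c} {100.0 * c / total:.3f}"
        for k, c in enumerate(counts)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    fps = [rng.getrandbits(64) for _ in range(200)]  # toy data, not ChEMBL
    print("start end count percent")
    print("\n".join(format_rows(sampled_histogram(fps, num_samples=5000, seed=1))))
```

The sampled version trades exactness for speed: its cost depends on the number of samples rather than on the number of pairs, which is what makes it practical for a dataset the size of ChEMBL.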

2025-03-30

CMLXOM 4.11 has been released: doi.org/10.5281/zenodo.1510877

"Minor release, reverting to (the newer) xml-apis 1.4.01, updating to Joda time 2.14, and removing unused imports, updating deprecated code, and minimal added JavaDoc."

CMLXOM is a Java library for reading and writing Chemical Markup Language files

#xml #chemistry #cheminformatics #openscience

Andrew Dalke dalke@toots.nu
2025-03-26

@egonw @wdscholia

#cheminformatics advertisement - chemfp has a pretty fast Butina clustering implementation, and implements several variations for handling singletons and pruning the number of clusters.

chemfp.com/docs/chemfp_butina_

With last year's release you can compute and save the NxN matrix (for a given threshold), and quickly re-cluster using the matrix as a starting point.
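
As background, the basic Butina (Taylor-Butina) procedure can be sketched in a few lines of plain Python. This is a textbook-style illustration under simple assumptions, not chemfp's implementation: fingerprints are Python integers, the function names are hypothetical, and none of the singleton-handling or pruning variations mentioned above are included.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def butina(fps, threshold):
    """Cluster fingerprint indices; each cluster lists its centroid first."""
    n = len(fps)
    # Step 1: the thresholded NxN neighbor lists (the expensive part).
    neighbors = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if tanimoto(fps[i], fps[j]) >= threshold:
            neighbors[i].add(j)
            neighbors[j].add(i)
    # Step 2: visit candidates from most to fewest neighbors (ties by index).
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    assigned = set()
    clusters = []
    for center in order:
        if center in assigned:
            continue
        # Step 3: the centroid claims all of its still-unassigned neighbors.
        members = [center] + sorted(neighbors[center] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


if __name__ == "__main__":
    fps = [0b1111, 0b1110, 0b0111, 0b110000, 0b100000]
    print(butina(fps, 0.7))
    print(butina(fps, 0.5))
```

Step 1 dominates the run time, which is why saving the thresholded matrix and re-clustering from it is attractive: steps 2 and 3 are cheap by comparison.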

Andrew Dalke dalke@toots.nu
2025-03-24

@cfeldmann

25,000 samples should easily be enough.

Select 100,000 #cheminformatics fingerprints from ChEMBL at random. Compute the histogram of all 4,999,950,000 pairs in the upper triangle. 100 bins, shown as a bar chart.

Then sample sizes N∈{5K, 10K, 15K, 20K, and 50K}, each repeated 20 times to get a distribution of samplings for each point, shown as a boxplot of percentages, one boxplot per bin.

Here's the result. 50K doesn't seem all that much better than 20K.

Select 100,000 fingerprints from ChEMBL at random. Compute the histogram of all 4,999,950,000 pairs in the upper triangle. 100 bins, shown as a bar chart. The peak is at 0.15 Tanimoto similarity. The plot only goes up to 0.25 similarity as the tail gets very small.

Then sample sizes N∈{5K, 10K, 15K, 20K, and 50K}, each repeated 20 times to get a distribution of samplings for each point, shown as a boxplot of percentages, one boxplot per bin.

Here's the result. 50K doesn't seem much better than 20K. The boxplot dividers are about the same for both sample sizes, while they are clearly wider for 15K and smaller.

Egon Willigh☮gen 🟥 egonw
2025-03-23

I like to remind people: if you want fewer of my human rights, future, open science, and other opinionated posts, follow my account at @egonw@social.edu.nl

Of course, I would love for you to stay here too, because I honestly believe in a better future and am hopeful. But hoping is not enough, hence my posts here.

The Chemistry Development Kit cdk@fosstodon.org
2025-03-22

Jonas Schaub: "Last week, I presented my work on algorithmic substructure extraction (scaffolds, functional groups, and aglycones) at the Chemistry Development Kit User Group Meeting (#CDK25UGM) in Maastricht.

You can now find my slides on Zenodo: doi.org/10.5281/zenodo.1505800"

#cheminformatics #openscience

Prateek Yadav prateekcmi1
2025-03-21

Cheminformatics: Revolutionizing Drug Discovery, Molecular Design, and Chemical Research Paradigms

In the rapidly evolving landscape of scientific research and technological innovation, cheminformatics emerges as a groundbreaking interdisciplinary field that bridges the gap between chemistry and computer science.

Cheminformatics - articlescad.com/revolutionizin
