A comprehensive article on reaction prediction. https://macinchem.org/2025/05/05/reaction-prediction/ #cheminformatics.
The 2025_03_1 release of #RDKit includes my contribution that speeds up part of 2D fingerprint generation for a molecule by ~75x! So if you generate #chemical fingerprints, now is a good time to upgrade.
Reminder that I'm #OpenToWork so if you're hiring for #cheminformatics or #scientificSoftware development, let's talk.
#chemistry #DrugDiscovery #pharma #PythonForChemists
https://github.com/rdkit/rdkit/releases/tag/Release_2025_03_1
Thanks to the @fosstodon admins for their statements. Not all our #fosstodon questions have been answered.
We live in difficult times, where tensions run high and independent justice on social media is absent. @fosstodon welcomed our project, albeit hesitantly, not knowing who is behind this account or who is behind the Blue Obelisk movement. That took courage, and it also carries risks, including misuse.
We would like to thank @fosstodon for allowing us to share our #openscience #cheminformatics news here for 2.5 years.
@jhylin I also find that #cheminformatics project ideas evolve as I work on them. I sometimes start out with one idea, then when I solve it in code I realize that it opened a vista to another problem that I also need to solve to address the goal of the blog post.
Here's the expanded CYP-ADRs dataset on adverse drug reactions for cytochrome P450 substrates (drugs), along with the ideas behind this work.
Dataset: https://github.com/jhylin/Adverse_drug_reactions/blob/main/Data/cyp_substrates_adrs.csv
Ideas: https://jhylin.github.io/Data_in_life_blog/posts/22_Simple_dnn_adrs/0_Ideas.html
(I seem to be working in reverse lately... where project ideas only become fully formed after I've partially worked on them)
#prescription_drugs #cytochromep450 #AdverseReactions #cheminformatics
Itching to put these 500+ experimental boiling points in @wikidata... but this 2004 paper does not have SMILES, only this shorthand notation (screenshot). Should be doable, and it would also make a nice B.Sc. student project, I guess. https://doi.org/10.1021/ci049802u
Woo-hoo! My #ICCS2025 poster was accepted. I'm going to Noordwijkerhout in a few months.
Like the last couple of times, I'll be going there by train.
Anyone on the route (Trollhättan→Copenhagen→Hamburg→Noordwijkerhout and vice versa) want me to visit? I can talk about SMILES, #cheminformatics history, and fingerprint similarity for hours. :)
Or perhaps interested in licensing chemfp?
For that matter, I'm also available to modernize old in-house cheminformatics code.
A Vortex script to identify the PDB IDs associated with a UniProt ID
https://macinchem.org/2025/04/13/uniprot-to-pdb-id-vortex-script/ #cheminformatics
I'm excited to present "Finding Tautomers" at the first North American #RDKit User Group Meeting in the #Boston #MA area on Friday April 11!
Reminder that I'm #OpenToWork so if you're in the area and hiring for #cheminformatics or #scientificSoftware development, let me know and we can meet to discuss your needs.
A Vortex script for getting PDB ligand structures. https://macinchem.org/2025/04/06/vortex-script-for-getting-pdb-ligand-structures/ #cheminformatics
It seems I just released my first PyPI package ever.
pyBacting 0.2.14 with Bacting 1.0.5 is now out: https://pypi.org/project/pybacting/0.2.14/
This gives you access in Python to (some of) the functionality of the Chemistry Development Kit, OPSIN, ChemSpider, PubChem, InChI, Excel files, BridgeDb, and BioJava
#openscience #cheminformatics dates back to the late nineties with the emerging collaborative development of JChemPaint, Jmol, and the Chemical Markup Language. Sketch of the history by Chris Steinbeck: "The evolution of open science in cheminformatics: a journey from closed systems to collaborative innovation" https://jcheminf.biomedcentral.com/articles/10.1186/s13321-025-00990-w
I finally finished implementing the newest chemfp feature: similarity histograms, both full comparison and sampled, and both NxN (upper triangle) and NxM (two datasets).
$ chemfp simhist chembl_34.fpb --bins 10 --num-samples 1000000 --no-metadata
start end count percent
0.0 0.1 413799 41.380
0.1 0.2 561503 56.150
0.2 0.3 23685 2.369
0.3 0.4 887 0.089
0.4 0.5 105 0.011
0.5 0.6 13 0.001
0.6 0.7 5 0.001
0.7 0.8 1 0.000
0.8 0.9 1 0.000
0.9 1.0 1 0.000
Next, update the docs.
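For readers curious what such a histogram computes, here is a minimal pure-Python sketch of the exact NxN (upper triangle) version, the NxM version, and the sampled version. This is not chemfp's implementation. Assumptions: fingerprints are Python ints used as bit sets, all function names are hypothetical, bins are half-open with a score of 1.0 placed in the last bin, and two empty fingerprints score 0.0 (chemfp's own boundary and empty-fingerprint rules may differ).

```python
import random


def popcount(x: int) -> int:
    return bin(x).count("1")


def bin_index(a: int, b: int, num_bins: int) -> int:
    """Bin of the Tanimoto score |a & b| / |a | b|, using exact integer arithmetic.

    Assumption: a score of 1.0 goes into the last bin, and two empty
    fingerprints count as 0.0.
    """
    union = popcount(a | b)
    if union == 0:
        return 0
    return min(popcount(a & b) * num_bins // union, num_bins - 1)


def full_histogram(fps, num_bins=10):
    """Exact NxN histogram over the upper triangle (all pairs i < j)."""
    counts = [0] * num_bins
    for i in range(len(fps)):
        for j in range(i + 1, len(fps)):
            counts[bin_index(fps[i], fps[j], num_bins)] += 1
    return counts


def full_histogram_nm(queries, targets, num_bins=10):
    """Exact NxM histogram: every query compared with every target."""
    counts = [0] * num_bins
    for q in queries:
        for t in targets:
            counts[bin_index(q, t, num_bins)] += 1
    return counts


def sampled_histogram(fps, num_bins=10, num_samples=100_000, seed=None):
    """Approximate NxN histogram from randomly drawn pairs (with replacement)."""
    if len(fps) < 2:
        raise ValueError("need at least two fingerprints")
    rng = random.Random(seed)
    counts = [0] * num_bins
    for _ in range(num_samples):
        # Two distinct indices, so each unordered pair is equally likely.
        i, j = rng.sample(range(len(fps)), 2)
        counts[bin_index(fps[i], fps[j], num_bins)] += 1
    return counts


def print_table(counts):
    """Print start/end/count/percent rows in the same layout as the table above."""
    total = sum(counts)
    k = len(counts)
    print("start end count percent")
    for b, c in enumerate(counts):
        print(f"{b / k:.1f} {(b + 1) / k:.1f} {c} {100 * c / total:.3f}")


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic 1024-bit fingerprints; AND of two random words gives ~25% bit density.
    fps = [rng.getrandbits(1024) & rng.getrandbits(1024) for _ in range(300)]
    print_table(sampled_histogram(fps, num_bins=10, num_samples=10_000, seed=1))
```

The sampled version's cost scales with `num_samples` rather than with the N(N-1)/2 pairs of the full comparison.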
23 April Cambridge Cheminformatics Network Meeting https://macinchem.org/2025/03/31/cambridge-cheminformatics-meeting-on-23-april-2025/ #cheminformatics
CMLXOM 4.11 has been released: https://doi.org/10.5281/zenodo.15108779
"Minor release, reverting to (the newer) xml-apis 1.4.01, updating to Joda time 2.14, and removing unused imports, updating deprecated code, and minimal added JavaDoc."
CMLXOM is a Java library for reading and writing Chemical Markup Language files
#cheminformatics advertisement - chemfp has a pretty fast Butina clustering implementation, and implements several variations for handling singletons and pruning the number of clusters.
https://chemfp.com/docs/chemfp_butina_command.html
With last year's release you can compute and save the NxN matrix (for a given threshold), and quickly re-cluster using the matrix as a starting point.
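The post does not describe chemfp's algorithm or its singleton and pruning options, so this is only a sketch of the basic Butina idea under stated assumptions: fingerprints are Python ints, the saved "matrix" is a hypothetical sparse list of neighbor dicts built at some threshold, and re-clustering at a higher threshold filters that saved matrix instead of recomputing similarities. Centroids are picked in order of decreasing neighbor count (computed once), and anything left without unassigned neighbors becomes a singleton cluster. All names are illustrative.

```python
from typing import Dict, List


def popcount(x: int) -> int:
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0  # assumption: empty/empty -> 0.0


def neighbor_matrix(fps: List[int], threshold: float) -> List[Dict[int, float]]:
    """Sparse NxN matrix: row i maps each j != i to its score, if score >= threshold."""
    rows: List[Dict[int, float]] = [{} for _ in fps]
    for i in range(len(fps)):
        for j in range(i + 1, len(fps)):
            score = tanimoto(fps[i], fps[j])
            if score >= threshold:
                rows[i][j] = score
                rows[j][i] = score
    return rows


def restrict(rows: List[Dict[int, float]], threshold: float) -> List[Dict[int, float]]:
    """Reuse a matrix saved at a lower threshold: keep only scores >= the new threshold."""
    return [{j: s for j, s in row.items() if s >= threshold} for row in rows]


def butina(rows: List[Dict[int, float]]) -> List[List[int]]:
    """Basic Butina clustering: the most-connected unassigned fingerprint becomes a centroid."""
    # Sort once by neighbor count (descending), ties broken by index.
    order = sorted(range(len(rows)), key=lambda i: (-len(rows[i]), i))
    assigned = set()
    clusters: List[List[int]] = []
    for i in order:
        if i in assigned:
            continue
        # Centroid first, then its neighbors that are not yet in another cluster.
        members = [i] + sorted(j for j in rows[i] if j not in assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


if __name__ == "__main__":
    fps = [0b1111, 0b1110, 0b0111, 0b11110000, 0b11100000, 1 << 11]
    saved = neighbor_matrix(fps, 0.5)    # compute once, at the lowest threshold of interest
    print(butina(restrict(saved, 0.6)))  # → [[0, 1, 2], [3, 4], [5]]
    print(butina(restrict(saved, 0.8)))  # → [[0], [1], [2], [3], [4], [5]]
```

Re-clustering this way only works at thresholds at or above the one the matrix was built with, since lower scores were never stored.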
25,000 samples should easily be enough.
Select 100,000 #cheminformatics fingerprints from ChEMBL at random. Compute the histogram of all 4,999,950,000 pairs in the upper triangle, using 100 bins, shown as a bar chart.
Then, for sample sizes N ∈ {5K, 10K, 15K, 20K, 50K}, repeat the sampling 20 times each to get a distribution of samplings for each point, shown as a boxplot of percentages, one boxplot per bin.
Here's the result. 50K doesn't seem all that much better than 20K.
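The pair count in the upper triangle is n(n-1)/2, and the repeated-sampling experiment can be sketched as below. Assumptions: each sample is one randomly drawn pair (with replacement), the demo uses 10 bins, small sample sizes, and synthetic random fingerprints rather than ChEMBL, and all function names are hypothetical.

```python
import math
import random
import statistics


def num_pairs(n: int) -> int:
    """Pairs in the upper triangle of an NxN comparison (i < j): n(n-1)/2."""
    return n * (n - 1) // 2


def binomial_se_percent(p: float, num_samples: int) -> float:
    """Approximate standard error, in percentage points, of a bin holding fraction p."""
    return 100 * math.sqrt(p * (1 - p) / num_samples)


def popcount(x: int) -> int:
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def percentages(scores, num_bins):
    """Histogram of scores in [0, 1] as percentages; 1.0 goes in the last bin."""
    counts = [0] * num_bins
    for s in scores:
        counts[min(int(s * num_bins), num_bins - 1)] += 1
    return [100 * c / len(scores) for c in counts]


def replicate(sample_score, sizes, repeats, num_bins, seed=0):
    """For each sample size, repeat the sampling `repeats` times.

    Returns {size: per-bin lists of `repeats` percentages}, i.e. one
    boxplot's worth of values per bin.
    """
    rng = random.Random(seed)
    results = {}
    for size in sizes:
        runs = [percentages([sample_score(rng) for _ in range(size)], num_bins)
                for _ in range(repeats)]
        results[size] = [list(column) for column in zip(*runs)]  # transpose to per-bin
    return results


if __name__ == "__main__":
    print(num_pairs(100_000))  # → 4999950000
    gen = random.Random(7)
    fps = [gen.getrandbits(512) & gen.getrandbits(512) for _ in range(1000)]

    def random_pair_score(rng):
        i, j = rng.sample(range(len(fps)), 2)
        return tanimoto(fps[i], fps[j])

    res = replicate(random_pair_score, sizes=(500, 2_000, 5_000), repeats=20, num_bins=10)
    for size, bins in res.items():
        spread = max(statistics.stdev(b) for b in bins)
        print(size, f"largest per-bin stdev: {spread:.3f} percentage points")
```

Under the independent-draw assumption, the binomial standard error explains the diminishing returns: for a bin holding about half the pairs it is roughly 0.35 percentage points at N = 20,000 and 0.22 at N = 50,000, so 2.5x the samples shrinks the error only by a factor of sqrt(2.5) ≈ 1.6.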
I'd like to remind people: if you want fewer of my posts on human rights, the future, open science, and other opinionated topics, follow my #cheminformatics and #bioinformatics account at @egonw@social.edu.nl
Of course, I would love for you to stay here too, because I honestly believe in a better future and am hopeful. But hoping is not enough, hence my posts here.
Jonas Schaub: "Last week, I presented my work on algorithmic substructure extraction (scaffolds, functional groups, and aglycones) at the Chemistry Development Kit User Group Meeting (#CDK25UGM) in Maastricht.
You can now find my slides on Zenodo: https://doi.org/10.5281/zenodo.15058008"
Cheminformatics: Revolutionizing Drug Discovery, Molecular Design, and Chemical Research Paradigms
In the rapidly evolving landscape of scientific research and technological innovation, cheminformatics emerges as a groundbreaking interdisciplinary field that bridges the gap between chemistry and computer science.
Cheminformatics - https://articlescad.com/revolutionizing-drug-discovery-and-chemical-research-the-transformative-power-of-cheminformatics-5631.html
#Cheminformatics #DrugDiscovery #ComputationalChemistry #MolecularModeling #MachineLearning #CoherentMarketInsights