ChEMBL 36 is out! if you're using chembl-downloader for all of your ChEMBL needs, then you just have to re-run your reproducible workflows and get arbitrarily new and better results, you beautiful nerd
I used chembl-downloader to create some nice charts on how the number of compounds, assays, activities, and other entities in ChEMBL have grown over time
📖 https://cthoyt.com/2025/08/26/chembl-history.html
#chembl #chemistry #chemometrics #chemoinformatics #cheminformatics #rdkit #cdk #proteochemometrics
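Charts like these come from asking the same question of every ChEMBL release. Here is a minimal sketch of that loop, not the code behind the post. The query runner is passed in as a hypothetical `run_query` function so the counting logic can be tried without downloading anything. The wrapper mentioned in the comment assumes chembl-downloader's `cd.query` accepts a `version` keyword and returns a DataFrame.

```python
from typing import Callable, Dict, Iterable

# Hypothetical hook: run_query(sql, version) returns a single integer count.
# With chembl-downloader it might wrap something like
#   lambda sql, v: int(cd.query(sql, version=v).iloc[0, 0])
QueryRunner = Callable[[str, str], int]


def count_per_version(table: str, versions: Iterable[str], run_query: QueryRunner) -> Dict[str, int]:
    """Count the rows of one ChEMBL table in each requested release."""
    sql = f"SELECT COUNT(*) FROM {table}"
    return {version: run_query(sql, version) for version in versions}


def growth(counts: Dict[str, int]) -> Dict[str, int]:
    """Change in row count between consecutive releases (in insertion order)."""
    versions = list(counts)
    return {
        later: counts[later] - counts[earlier]
        for earlier, later in zip(versions, versions[1:])
    }
```

Keeping the list of releases explicit is what makes a chart like this reproducible. When a new release comes out, you append one version string and re-run.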
new blog: "One Million IUPAC names #4: a lot is happening" https://chem-bla-ics.linkedchemistry.info/2025/08/09/one-million-iupac-names-4.html
"A lot is happening. If you have been following this project more closely, you may have already seen some interesting updates, but I will post it here too."
replies to this post become blog comments.
Most cheminformatics code that queries ChEMBL struggles with reproducibility.
chembl-downloader can help:
>>> import chembl_downloader as cd
>>> df = cd.query("""
... SELECT chembl_id, pref_name
... FROM molecule_dictionary
... WHERE pref_name IS NOT NULL
... """)
It's even sneaking its way into @wpwalters and @dr_greg_landrum blogs :)
Code/Docs: https://github.com/cthoyt/chembl-downloader
Preprint: https://arxiv.org/pdf/2507.17783
#cheminformatics #chemoinformatics #chembl #reproducibility #chemistry #openscience
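chembl-downloader runs SQL like this against the SQLite dump it downloads, so the query itself is ordinary SQLite. Below is a self-contained sketch using Python's built-in `sqlite3` and a toy table. The rows are hypothetical and only two columns are modelled. It shows what the `WHERE pref_name IS NOT NULL` filter does.

```python
import sqlite3

# Tiny in-memory stand-in for ChEMBL's molecule_dictionary table.
# Only two columns are modelled and the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE molecule_dictionary (chembl_id TEXT, pref_name TEXT)")
conn.executemany(
    "INSERT INTO molecule_dictionary VALUES (?, ?)",
    [
        ("CHEMBL_A", "EXAMPLE ONE"),
        ("CHEMBL_B", None),  # no preferred name, so the WHERE clause drops it
        ("CHEMBL_C", "EXAMPLE TWO"),
    ],
)

# The same SQL as in the post.
rows = conn.execute(
    """
    SELECT chembl_id, pref_name
    FROM molecule_dictionary
    WHERE pref_name IS NOT NULL
    """
).fetchall()
print(rows)
```

Against the real dump, pinning the release keeps the result reproducible when the next ChEMBL comes out. For example, you could call `cd.query(sql, version="36")`, assuming the `version` keyword of current chembl-downloader releases.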
Here's a new post on my first encounter with building a simple deep learning model on manually-compiled adverse drug reactions data (thanks to @baoilleach for feedback) - https://jhylin.github.io/Data_in_life_blog/posts/22_Simple_dnn_adrs/2_ADR_regressor.html
Notes re. data - https://jhylin.github.io/Data_in_life_blog/posts/22_Simple_dnn_adrs/1_ADR_data.html
At the ChEBI 2.0 workshop, Muhammad Arsalan is presenting how ChEBI is using the Bioregistry to standardize its cross-references, generate URLs on their front-end, and more
An update on an older post about saving a relatively large CSV file (though some may not consider it large) as a Parquet file first, to be followed by 3 smaller posts detailing how to use Polars with scikit-learn without using Pandas at all
#Scikit_Learn #Polars #parquet #Python #ChEMBL #Cheminformatics
Here are some snapshots from the #ChEMBL symposium! Dr. Samantha attended & delivered a wonderful talk about the #SemanticWeb! You can find the slides right here ➡️ https://zenodo.org/records/13882075
Don't forget to catch Dr. Samantha's talk, "The Semantic Web is dead, Long live the Semantic Web - The future of Semantics in the Physical Sciences", at 14:00 BST at the @chembl symposium. Join virtually using the link in the agenda: https://t.ly/H20xH
#PSDI #ChEMBL #event #talk #semantics2024
Dr. Samantha Pearman-Kanza is speaking at the @chembl 15 year symposium!
In 2024, the European Bioinformatics Institute (@emblebi) celebrated the 15th anniversary of the first public release of the #ChEMBL database as well as the 10th anniversary of #SureChEMBL.
sometime in the next few months I am going to try to repeat this: https://github.com/egonw/chembl.rdf #chembl #cheminformatics #rdf
I have finally grown more trees, leading to this new post on boosted trees: chaining Scikit-mol's transformers with AdaBoost and XGBoost via scikit-learn's interface and pipelines
https://jhylin.github.io/Data_in_life_blog/posts/19_ML2-3_Boosted_trees/1_adaboost_xgb.html
#cheminformatics #chembl #rdkit #python #ml #xgboost #adaboost #sklearn #scikit_mol
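Here is a minimal sketch of the chaining idea using only scikit-learn, not the post's code. The `toy_features` function is a hypothetical stand-in for scikit-mol's SMILES-to-fingerprint transformers. It is wrapped in a `FunctionTransformer` so that featurization and `AdaBoostClassifier` fit and predict as one `Pipeline`. The SMILES and labels are made up.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def toy_features(smiles_list):
    """Hypothetical featurizer standing in for a scikit-mol fingerprint transformer:
    counts a few characters in each SMILES string (not a real fingerprint)."""
    return np.array(
        [[s.count("C"), s.count("N"), s.count("O"), len(s)] for s in smiles_list],
        dtype=float,
    )


pipe = Pipeline(
    [
        ("featurize", FunctionTransformer(toy_features)),  # SMILES -> feature matrix
        ("boost", AdaBoostClassifier(n_estimators=50, random_state=0)),
    ]
)

# Toy data: label 1 for the nitrogen-containing strings.
smiles = ["CCO", "CCN", "CCCO", "CCCN", "CO", "CN", "CCCCO", "CCCCN"]
labels = [0, 1, 0, 1, 0, 1, 0, 1]
pipe.fit(smiles, labels)
```

`xgboost.XGBClassifier` follows the scikit-learn estimator interface, so it could replace the `"boost"` step without changing the rest of the pipeline.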
Blog post on "Every ChEMBL everywhere, all at once"
https://baoilleach.blogspot.com/2024/06/every-chembl-everywhere-all-at-once.html
#chembl #cheminformatics
In an attempt to complete the random forest (RF) series, here's another follow-up post on an RF classifier, with more on imbalanced datasets - https://jhylin.github.io/Data_in_life_blog/posts/17_ML2-2_Random_forest/2_random_forest_classifier.html
#cheminformatics #ml #rf #chembl #chembl_downloader #scikit_mol #rdkit #Scikit_Learn #ghostml #Python
A follow-up on the decision tree series, leading to a random forest this time, with details on model building, imbalanced datasets, feature importances & hyperparameter tuning - https://jhylin.github.io/Data_in_life_blog/posts/17_ML2-2_Random_forest/1_random_forest.html
Jupyter notebook link: https://github.com/jhylin/ML2-2_random_forest/blob/main/1_random_forest.ipynb
Post updated to show a different max_features used for regression task (thanks @dr_greg_landrum for pointing this out)
#ml #randomforest #scikitlearn #pandas #seaborn #matplotlib #python #cheminformatics #chembl #drugdiscovery
okay, the `curl` command is not correct yet (will fix after shopping/dinner), but the "Run" and "Edit" links are now working for all SPARQL endpoints :) https://bigcat-um.github.io/PRA3006-SPARQL/wikipathways.html #wikidata #wikipathways #chembl #AOPWiki
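Under the SPARQL 1.1 Protocol, a query sent over HTTP GET is just the URL-encoded query in a `query` parameter plus an `Accept` header for the result format. Below is a stdlib sketch that builds, but does not send, such a request. The WikiPathways endpoint URL is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})  # URL-encodes the SPARQL text
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get_request(
    "https://sparql.wikipathways.org/sparql",  # assumed endpoint URL
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1",
)
print(req.full_url)
```

The curl equivalent would look something like `curl -G --data-urlencode 'query=SELECT ?s WHERE { ?s ?p ?o } LIMIT 1' -H 'Accept: application/sparql-results+json' https://sparql.wikipathways.org/sparql`. Here `-G` turns the urlencoded data into a GET query string.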
my lightning talk from the #RDKitUGM2023 is now on YouTube - all about making your work that uses datasets derived from ChEMBL more reproducible
📺 Video: https://youtu.be/PY-xaoRoSOY?list=PLugOo5eIVY3ExzpyKll6GGz4FRgBD2qzN&t=28
📜 Slides: https://bit.ly/cth-rdkit-ugm-2023
🤖 Code/Docs: https://github.com/cthoyt/chembl-downloader
Get started with: pip install chembl-downloader
Shiny app in R - https://lnkd.in/gGwVYQ2r - this post walks through the process of making a simple Shiny app (without the help of any LLMs). #shiny #rstats #rladies #chembl #cheminformatics