
The 2025_03_1 release of #RDKit includes my contribution to speed up part of getting 2D fingerprints for a molecule by ~75x! So if you generate #chemical fingerprints, now is a good time to upgrade.

Reminder that I'm #OpenToWork so if you're hiring for #cheminformatics or #scientificSoftware development, let's talk.

#chemistry #DrugDiscovery #pharma #PythonForChemists

https://github.com/rdkit/rdkit/releases/tag/Release_2025_03_1

I'm excited to present "Finding Tautomers" at the first North American #RDKit User Group Meeting in the #Boston #MA area on Friday April 11!

Reminder that I'm #OpenToWork so if you're in the area and hiring for #cheminformatics or #scientificSoftware development, let me know and we can meet to discuss your needs.

Interested in #MPI and #OpenMP parallel programming to speed up your scientific applications written in #C, #Cpp, #Fortran or #Python (with #numpy)?

Join our 4-day course in #Mainz at the Johannes Gutenberg University (#JGU) from 1 April to 4 April 2025!

See our announcement page for further details and to register: https://indico.zdv.uni-mainz.de/event/34/

Note, it is an on-site course.

#RSE #HPC #scientificsoftware

The #Energy #Climate & #Environment program at #IIASAVienna had its quarterly meeting last Friday (~100 researchers), so I had to reflect on our role as community data hub and what to present on behalf of the #ScenarioServices & #ScientificSoftware team.

We developed a new #ScenarioExplorer front-end last year, and we made a lot of progress with our #opensource packages for scenario analysis, validation & data-management.

Step by step towards #OpenScience and reusable, reproducible analysis...

Screenshot of the page "Key indicators by chapter", an entrypoint to representative figures and key indicators from the State of CDR report (2024).

Data Explorer in the SHAPE Scenario Explorer https://shape.apps.ece.iiasa.ac.at/explorer, showing emissions across the scenarios developed in SHAPE

Social media preview of the pyam package for scenario analysis and data visualization, including status badges and logos of tools: GitHub, Read The Docs, Groups.io, Slack

Preview from the GitHub repository "common-definitions" https://github.com/IAMconsortium/common-definitions for model comparison projects

Working with #NUTS administrative EU 🇪🇺 regions is one of the little nuisances in #energysystems modelling and scenario analysis.

So the #IIASA #ScenarioServices team put together a little #opensource #python utility package so that modelers can focus on #freethemodels and don’t have to spend too much time on data-wrangling…
#pysquirrel #ScientificSoftware
https://github.com/iiasa/pysquirrel

Here's an ~ official ~ release announcement for #numpydantic

repo: https://github.com/p2p-ld/numpydantic
docs: https://numpydantic.readthedocs.io

Problems: @pydantic is great for modeling data!! but at the moment it doesn't support array data out of the box. Often array shape and dtype are as important as whether something is an array at all, but there isn't a good way to specify and validate that with the Python type system. Many data formats and standards couple their implementation very tightly with their schema, making them less flexible, less interoperable, and more difficult to maintain than they could be. The existing tools for parameterized array types like nptyping and jaxtyping tie their annotations to a specific array library, rather than allowing array specifications that can be abstract across implementations.

numpydantic is a super small, few-dep, and well-tested package that provides generic array annotations for pydantic models. Specify an array along with its shape and dtype and then use that model with any array library you'd like! Extending support for new array libraries is just subclassing - no PRs or monkeypatching needed. The type has some magic under the hood that uses pydantic validators to give a uniform array interface to things that don't usually behave like arrays - pass a path to a video file, that's an array. pass a path to an HDF5 file and a nested array within it, that's an array. We take advantage of the rest of pydantic's features too, including generating rich JSON schema and smart array dumping.

This is a standalone part of my work with @linkml arrays and rearchitecting neurobio data formats like NWB to be dead simple to use and extend, integrating with the tools you already use and across the experimental process - specify your data in a simple yaml format, and get back high quality data modeling code that is standards-compliant out of the box and can be used with arbitrary backends. One step towards the wild exuberance of FAIR data that is just as comfortable in the scattered scripts of real experimental work as it is in carefully curated archives and high performance computing clusters. Longer term I'm trying to abstract away data store implementations to bring content-addressed p2p data stores right into the python interpreter as simply as if something was born in local memory.

plenty of todos, but hope ya like it.

#linkml #python #NewWork #pydantic #ScientificSoftware

[This and the following images aren't very screen reader friendly with a lot of code in them. I'll describe what's going on in brackets and then put the text below.

In this image: a demonstration of the basic usage of numpydantic, declaring an "array" field on a pydantic model with an NDArray class with a shape and dtype specification. The model can then be used with a number of different array libraries and data formats, including validation.]

Numpydantic allows you to do this:

from pydantic import BaseModel
from numpydantic import NDArray, Shape

class MyModel(BaseModel):
    array: NDArray[Shape["3 x, 4 y, * z"], int]

And use it with your favorite array library:

import numpy as np
import dask.array as da
import zarr

# numpy
model = MyModel(array=np.zeros((3, 4, 5), dtype=int))
# dask
model = MyModel(array=da.zeros((3, 4, 5), dtype=int))
# hdf5 datasets
model = MyModel(array=('data.h5', '/nested/dataset'))
# zarr arrays
model = MyModel(array=zarr.zeros((3,4,5), dtype=int))
model = MyModel(array='data.zarr')
model = MyModel(array=('data.zarr', '/nested/dataset'))
# video files
model = MyModel(array="data.mp4")
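To make the shape string above concrete, here is a minimal stdlib-only sketch of how a specification like "3 x, 4 y, * z" can be compared with an array's shape. This is not numpydantic's implementation: `parse_shape` and `shape_matches` are hypothetical helpers, and the sketch ignores the `...` notation for an arbitrary number of dimensions.

```python
def parse_shape(spec: str) -> list:
    """Turn "3 x, 4 y, * z" into [3, 4, None]; None means 'any size'."""
    dims = []
    for part in spec.split(","):
        size = part.split()[0]  # first token is the size, the rest is an optional label
        dims.append(None if size == "*" else int(size))
    return dims


def shape_matches(shape: tuple, spec: str) -> bool:
    """Check the number of dimensions first, then every fixed axis size."""
    dims = parse_shape(spec)
    if len(shape) != len(dims):
        return False
    return all(want is None or want == got for want, got in zip(dims, shape))


print(shape_matches((3, 4, 5), "3 x, 4 y, * z"))  # True
print(shape_matches((3, 4), "3 x, 4 y, * z"))     # False
```

The labels (x, y, z) are only documentation in this sketch; only the sizes take part in the comparison.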

[Further demonstration of validation and array expression, where a Union of NDArray specifications can specify a more complex data type - eg. an image that can be any shape in x and y, an RGB image, or a specific resolution of a video, each with independently checked dtypes]

For example, to specify a very special type of image that can either be

a 2D float array where the axes can be any size, or

a 3D uint8 array where the third axis must be size 3, or

a 1080p video

from typing import Union
from pydantic import BaseModel
import numpy as np

from numpydantic import NDArray, Shape

class Image(BaseModel):
    array: Union[
        NDArray[Shape["* x, * y"], float],
        NDArray[Shape["* x, * y, 3 rgb"], np.uint8],
        NDArray[Shape["* t, 1080 y, 1920 x, 3 rgb"], np.uint8]
    ]

And then use that as a transparent interface to your favorite array library!
Interfaces
Numpy

The Coca-Cola of array libraries

import numpy as np
# works
frame_gray = Image(array=np.ones((1280, 720), dtype=float))
frame_rgb = Image(array=np.ones((1280, 720, 3), dtype=np.uint8))

# fails
wrong_n_dimensions = Image(array=np.ones((1280,), dtype=float))
wrong_shape = Image(array=np.ones((1280,720,10), dtype=np.uint8))

# shapes and types are checked together, so this also fails
wrong_shape_dtype_combo = Image(array=np.ones((1280, 720, 3), dtype=float))
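Why the last case fails is easier to see in a tiny stdlib-only sketch of the idea: each alternative in the union is a (shape, dtype) pair, and an array has to satisfy both halves of the same pair. This is an illustration only, not how numpydantic or pydantic evaluate unions; `IMAGE_SPECS` and `matches_any` are hypothetical names, and dtypes are written as plain strings for simplicity.

```python
# Each alternative pairs a shape pattern (None = any size) with a dtype name.
IMAGE_SPECS = [
    ((None, None), "float64"),         # grayscale image, any size
    ((None, None, 3), "uint8"),        # RGB image
    ((None, 1080, 1920, 3), "uint8"),  # 1080p video
]


def matches_any(shape, dtype, specs=IMAGE_SPECS):
    """True if a single alternative accepts both the shape and the dtype."""
    def shape_ok(pattern):
        return len(pattern) == len(shape) and all(
            want is None or want == got for want, got in zip(pattern, shape)
        )
    return any(shape_ok(pattern) and dtype == want_dtype
               for pattern, want_dtype in specs)


print(matches_any((1280, 720), "float64"))     # True
print(matches_any((1280, 720, 3), "uint8"))    # True
print(matches_any((1280, 720, 3), "float64"))  # False
```

The last call has a valid RGB shape and a valid grayscale dtype, but no single alternative accepts that combination.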

[Demonstration of usage outside of pydantic as just a normal python type - you can validate an array against a specification by checking if the array is an instance of the array specification type]

And use the NDArray type annotation like a regular type outside of pydantic – eg. to validate an array anywhere, use isinstance:

array_type = NDArray[Shape["1, 2, 3"], int]
isinstance(np.zeros((1,2,3), dtype=int), array_type)
# True
isinstance(zarr.zeros((1,2,3), dtype=int), array_type)
# True
isinstance(np.zeros((4,5,6), dtype=int), array_type)
# False
isinstance(np.zeros((1,2,3), dtype=float), array_type)
# False

[Demonstration of JSON schema generation using the sort of odd case of an array with a specific dtype but an arbitrary shape. It has to use a recursive JSON schema definition, where the items of a given JSON array can either be the innermost dtype or another instance of that same array. Since JSON Schema doesn't support extended dtypes like 8-bit integers, we encode that information as maximum and minimum constraints on the `integer` class and add it in the schema metadata. Since pydantic renders all recursive schemas like this in the same $defs block, we use a blake2b hash against the dtype specification to keep them deduplicated.]

numpydantic can even handle shapes with unbounded numbers of dimensions by using recursive JSON schema!!!

So the any-shaped array (using nptyping’s ellipsis notation):

class AnyShape(BaseModel):
    array: NDArray[Shape["*, ..."], np.uint8]

is rendered to JSON-Schema like this:

{
  "$defs": {
    "any-shape-array-9b5d89838a990d79": {
      "anyOf": [
        {
          "items": {
            "$ref": "#/$defs/any-shape-array-9b5d89838a990d79"
          },
          "type": "array"
        },
        {"maximum": 255, "minimum": 0, "type": "integer"}
      ]
    }
  },
  "properties": {
    "array": {
      "dtype": "numpy.uint8",
      "items": {"$ref": "#/$defs/any-shape-array-9b5d89838a990d79"},
      "title": "Array",
      "type": "array"
    }
  },
  "required": ["array"],
  "title": "AnyShape",
  "type": "object"
}
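What the recursive definition accepts can be spelled out with a small stdlib-only sketch that mirrors the schema by hand: every item is either an integer between 0 and 255 or another list of such items. The function names are hypothetical and this is not a JSON Schema validator, just the same rule written as plain Python.

```python
def is_uint8_nested(value) -> bool:
    """Mirror of the "anyOf": an integer in 0..255, or a list of such values."""
    if isinstance(value, bool):  # bool is a subclass of int in Python; JSON booleans are not integers
        return False
    if isinstance(value, int):
        return 0 <= value <= 255
    if isinstance(value, list):
        return all(is_uint8_nested(item) for item in value)
    return False


def validates_any_shape(array) -> bool:
    """The top-level "array" property must itself be a list of valid items."""
    return isinstance(array, list) and all(is_uint8_nested(item) for item in array)


print(validates_any_shape([[0, 255], [1, 2]]))  # True
print(validates_any_shape([[0, 256]]))          # False
print(validates_any_shape(7))                   # False
```

Note that this rule, like the schema it mirrors, says nothing about the nested lists being rectangular: a ragged list such as [1, [2, 3]] passes.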

The Chapel team at HPE is looking for scientists to collaborate with.

Are you doing computation for science using #python or similar tools? Interested in trying something different, to run faster or scale further?

Let's make the world a better place together!

See this #blog post for details:
https://chapel-lang.org/blog/posts/python-science-collabs/

Boosts / reposts / etc greatly appreciated.

#ScientificSoftware #OpenSource #OpenScience #science #hpc

Time for a re-#introduction !

I'm a #scicomm enthusiast and board member of #Fediscience. My background is in #Biophysics; I did a postdoc in #GeneticEpidemiology, took an industry detour, and have now been working in #HPC for some years.

Interested in #HPC, #bioinformatics, #OpenScience, #workflows (#snakemake), #RDM, #scientificsoftware and #sciencecommunication

My blog can be found here: blogs.fediscience.org and my more political me can be found at @rupdecat@mastodon.m1234.de.

The #Energy #Climate & #Environment program at #IIASA is looking for an #IntegratedAssessment modeler to join the #MESSAGEix team!

Come work with us on policy-relevant scenarios and state-of-the-art #opensource #ScientificSoftware tools...
📝 https://iiasa.ac.at/employment/job-openings?jh=9vvlm7a64e2n7qejcxg3n0dhyqx528y

Screenshot of the vacancy announcement for an Integrated Assessment Modeler

The Integrated Assessment and Climate Change (IACC) Research Group within the Energy, Climate, and Environment (ECE) Program at IIASA seeks a strong candidate to help develop and extend the current global modeling framework used within the group. The successful candidate will work with a team of international scientists who also develop and use the framework to perform state-of-the-art scenario modeling and assessment.

It's so nice when #code that you have written more than a year ago just works 🎇 ☺️

#scientificsoftware #tdd

Why does most academic software, at least in medical centers, still look like Windows 95?

#science #software #windows95 #gui #aesthetics #researchlife #research #scientificsoftware

You have spent countless hours on your code. It is time for your hard work to pay off.
Document and publish your code as a software report with Seismica and let it shine!

Find out more at:

https://seismica.library.mcgill.ca/author-guidelines/#publication-types

#Seismology #EarthquakeScience #OpenAccess #Scientificjournals #DiamondOpenAccess #scientificsoftware #peerreviewed

Are you a #SoftwareDeveloper who would like to work on #Climate-related #opensource tools & #dataviz solutions?

Or are you an #EarlyCareer #Researcher who prefers programming to paper-writing?

The #ScenarioServices team at the #IIASA #Energy #Climate & #Environment program is hiring a #ScientificSoftware developer to implement new (#opensource) user interfaces & #dataviz features for our #ScenarioExplorer infrastructure.

👉https://iiasa.ac.at/employment/job-openings?jh=oj8uz8mtiehwnwyhls5u7b355mdre6b

Research Software Developer

The successful candidate will join a small team of software developers working with researchers in the IIASA Energy, Climate, and Environment (ECE) Program and collaborating institutions on further developing and continuously improving tools to support energy- and climate-related research in ECE and in the wider academic community.

An interest in the scientific content of the software tools – climate change and energy transition research – will help communication within the multidisciplinary teams and developing a deeper understanding of the user needs.

On an (un)related topic, does anybody have recommendations on a journal whose primary scope is #HPC #scientificSoftware? Preference for #CFD, but more generalist recommendations are welcome too. This manuscript looking for a home is more focused on the actual code (programming strategies etc) than the physics, which makes it a poor fit for journals like CPC and JCP.

🚨Job alert🚨

The #ScenarioServices team at the #IIASA #Energy #Climate & #Environment program is hiring a #ScientificSoftware developer to implement new (#opensource) user interfaces & #dataviz features for our #ScenarioExplorer infrastructure.
👉https://iiasa.ac.at/employment/job-openings?jh=oj8uz8mtiehwnwyhls5u7b355mdre6b

IIASA Vacancy Announcement 31-2023

Research Software Developer

The successful candidate will join a small team of software developers working with researchers in the IIASA Energy, Climate, and Environment (ECE) Program and collaborating institutions on further developing and continuously improving tools to support energy- and climate-related research in ECE and in the wider academic community.

An interest in the scientific content of the software tools – climate change and energy transition research – will help communication within the multidisciplinary teams and developing a deeper understanding of the user needs.

🎉 We just released the #python package #ixmp4, an #opensource reimplementation of the #scientificsoftware database management behind the #IIASA #ScenarioExplorer infrastructure & the #MESSAGEix #IntegratedAssessment modelling framework.
https://github.com/iiasa/ixmp4

@danielskatz FWIW, here‘s a non-paywalled link: https://rdcu.be/c6uMN

I think it‘s great to emphasise the lack of incentives for #scientificsoftware development (and #openscience more generally), but I believe that the solutions proposed by the authors are too narrow… We need to focus on an incentive structure that focuses on the usability of the tools!

And yet, the authors fail to imagine incentives & structures for #ScientificSoftware development that go beyond the current (ineffective) status quo.

Software shouldn't be coerced into an article format or a new journal; instead, we need proper #funding & dedicated #career paths for the foundation of #OpenScience.

More and more scientific disciplines realize the importance of (#opensource) #ScientificSoftware for excellent research - and the lack of recognition & incentives to ensure maintenance of crucial pillars of our work. Commentary in #Nature #ecology & #evolution https://rdcu.be/c6uMN

@bjenquist

And yet...

#scientificSoftware
