"A recent patch to glibc6 to add #POWER9 VSX strncpy was a whopping 250 hand-crafted assembly instructions, where its equivalent using Cray Vector principles is around 14."
https://archive.fosdem.org/2021/schedule/event/the_libresoc_project_simple_v_vectorisation/
(haven't watched it yet)
This reminds me of that #MillComputing talk about how they automatically parallelize strlen.
I think it was this one:
https://redirect.invidious.io/watch?v=JS5hCjueqQ0&list=PLFls3Q5bBInj_FfNLrV7gGdVtikeGoUc9
cc #theFoundry