Ah, the #tangled #web of #SIMD vector functions! 🤯 Who knew optimizing #code could be so messy, like trying to untangle your headphones while wrestling a grizzly bear 🐻. But fear not, a #workshop in Aurora promises to save the day, because nothing says "fun weekend" like #vectorization with strangers! 🎉
https://johnnysswlab.com/the-messy-reality-of-simd-vector-functions/ #optimization #fun #weekend #HackerNews #ngated

The messy reality of SIMD (vector) functions

https://johnnysswlab.com/the-messy-reality-of-simd-vector-functions/

#HackerNews #SIMD #SIMDFunctions #VectorProgramming #TechTrends #CodingInsights

Optimizing the Mamba language model for CPU execution

How do you optimize the Mamba model to run on a CPU? We make the code 20 times faster than PyTorch, breaking every rule of optimization along the way.

https://habr.com/ru/articles/925460/

#mamba #simd #векторизация #оптимизация_кода

💾✨ Behold, the #epic #saga of #SIMD #vector functions—a real page-turner that makes watching paint dry seem thrilling! 🎨🤯 Dive deep into the labyrinth of #buzzwords and workshops, only to discover it's as "messy" as a toddler with #spaghetti. 🍝🤦‍♂️
https://johnnysswlab.com/the-messy-reality-of-simd-vector-functions/ #functions #programming #humor #tech #messiness #HackerNews #ngated

The messy reality of SIMD (vector) functions

https://johnnysswlab.com/the-messy-reality-of-simd-vector-functions/

#HackerNews #SIMD #SIMDFunctions #VectorProgramming #TechReality #CodingChallenges

We investigate vector functions, more specifically, how to make your vector function available to the compiler's autovectorizer!

#vectorfunctions #simd #openmp #omp

https://johnnysswlab.com/the-messy-reality-of-simd-vector-functions/
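
For readers who have not met vector functions before, here is a minimal sketch of the idea in the post above, using the OpenMP `declare simd` directive. It is not code from the linked article: `scale_and_clamp` and `apply_gain` are hypothetical names, and whether a vector variant is actually generated and called depends on the compiler and its flags (for example an OpenMP SIMD option). Without such support the pragmas are ignored and the code runs as plain scalar C++.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scalar function (an assumption, not from the article).
// The directive asks the compiler to also emit vector variants of it;
// uniform(gain) says `gain` is the same for every lane, and notinbranch
// says the function is never called under a condition inside the loop.
#pragma omp declare simd uniform(gain) notinbranch
float scale_and_clamp(float x, float gain) {
    const float y = x * gain;
    return y > 1.0f ? 1.0f : y;
}

// A loop the autovectorizer may rewrite to call a vector variant of
// scale_and_clamp instead of calling the scalar version per element.
void apply_gain(const float* in, float* out, std::size_t n, float gain) {
    assert(n == 0 || (in != nullptr && out != nullptr));
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = scale_and_clamp(in[i], gain);
    }
}
```

The result is the same with or without vectorization; only the generated code differs, which is why inspecting the assembly is the usual way to check that the vector variant was picked up.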

ISO C++ meeting in Sofia: C++26 and reflection

Hi! This is Anton Polukhin from Yandex's City Services tech platform, and I'm going to tell you about the Sofia meeting of the international C++ standardization committee, in which I took an active part. This was the last meeting at which new language features whose design had been pre-approved at earlier meetings could still make it into C++26. And the result exceeded all expectations: compile-time reflection, reflection of function parameters, annotations, std::optional<T&>, parallel algorithms. I cover these and other new features in the post.

https://habr.com/ru/companies/yandex/articles/920470/

#c++29 #с++29 #c++26 #с++26 #с++ #c++ #reflection #constexpr #exception #simd #safety #security #undefined_behavior #annotations #parallel_programming #executor #executors #ranges #coroutines

On vectorized computation of the exponential function

How do you compute the exponential function quickly and with minimal error? We write vectorized code.

https://habr.com/ru/articles/923234/

#Simd #avx512 #параллельное_программирование #векторизация

Finding a billion factorials in 60 ms with SIMD

https://codeforces.com/blog/entry/143279

#HackerNews #Finding #a #billion #factorials #in #60 #ms #with #SIMD #codeforces #SIMD #performance #factorials #computing

And we just renamed it again. Who would have thought that we can name something 'vec' when we already have 'vector' 😅.
It'll be std::simd::vec<T, N> and std::simd::mask<T, N> in C++26.
Also vec and mask are (read-only) ranges now (range-based for works) and we got permutations, gather & scatter, compress & expand as well as mask conversions to and from bitset and unsigned. 🥳
Lots of implementation and optimization work ahead for me now.

#cpp26 #simd #cplusplus #cpp

Lighting up a billion colors with a million lines

Abusing C#, C++ and HLSL, games with bools and buffers, triple polyglotism, SIMD, "pepectors", DirectX, saving 800 TB of RAM, fast exponentiation, and much more. In this part I describe and show how I built, on my own framework, the software that drives a nuclear backlight and a mechanical video wall. Warning: heavy on traffic!

https://habr.com/ru/articles/902040/

#c# #net #C++ #hlsl #directx #подсветка #мониторы_и_тв #simd #ненормальное_программирование #программирование

New blog post "Sep 0.11.0 - 9.5 GB/s CSV Parsing Using ARM NEON SIMD on Apple M1 🚀"

🛠️ New #ARM #NEON #SIMD parser based on @geofflangdale bulk move mask

📈 Sep #performance up from 7 GB/s on #Apple #M1 and 1.5x faster on #Microsoft #Cobalt 100 (4 GB/s to 6 GB/s)

🧑‍💻 #csharp SIMD and #ARM assembly on #dotnet 9.0

👇
https://nietras.com/2025/06/17/sep-0-11-0/

more #chess bit tricks
https://87flowers.com/gf2p8affineqb-piece-directions/

#simd #x86

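🌕 SIMD-friendly algorithms for substring searching
➤ How to use SIMD instruction sets to speed up string searching
✤ http://0x80.pl/notesen/2016-11-28-simd-strfind.html
This article looks at optimizing string-search performance on modern CPU architectures. Traditional string-search algorithms (such as Knuth-Morris-Pratt, Boyer-Moore, Karp-Rabin) assume that comparing a single character is a cheap operation, but the SIMD instruction sets of modern CPUs can compare many characters at once, so that assumption no longer holds. The article presents two methods that use SIMD instructions to speed up string searching and gives benchmark results for implementations ranging from SWAR to AVX512F, demonstrating SIMD's potential to improve search efficiency.
+ The article explains the use of SIMD in string searching in an accessible way and is helpful for developers who want to improve program performance.
+ The real benchmark numbers are convincing: SIMD can indeed speed up searching considerably in some cases.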
#演算法 #SIMD #程式效能
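
To make the summary above a little more concrete, here is a scalar sketch of one filtering idea commonly used in SIMD substring search: test only the first and last character of the needle at each position, and compare the remaining bytes only for positions that pass. This is an illustration under assumptions, not the article's code: `find_first_last` is a hypothetical name, and a real SIMD version would test 16, 32 or 64 positions per instruction instead of one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Returns the index of the first occurrence of `needle` in `haystack`,
// or std::string_view::npos if there is none. An empty needle matches at 0.
std::size_t find_first_last(std::string_view haystack, std::string_view needle) {
    const std::size_t n = haystack.size();
    const std::size_t k = needle.size();
    if (k == 0) return 0;
    if (k > n) return std::string_view::npos;

    const char first = needle.front();
    const char last = needle.back();
    for (std::size_t i = 0; i + k <= n; ++i) {
        assert(i + k - 1 < n);  // the candidate window stays inside the haystack
        // Cheap filter: a SIMD version would evaluate this test for many
        // values of i at once and obtain a bitmask of candidate positions.
        if (haystack[i] != first || haystack[i + k - 1] != last) continue;
        // First and last bytes already match, so only the middle is compared.
        if (k <= 2 ||
            std::memcmp(haystack.data() + i + 1, needle.data() + 1, k - 2) == 0) {
            return i;
        }
    }
    return std::string_view::npos;
}
```

For example, `find_first_last("hello simd world", "simd")` returns 6.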

I'm putting a talk together about #programming Mandelbrot image generator with insight into profiling and optimisation. Main part will be normal optimisations, #simd, #multithreading, and possibly gpu acceleration.

I'll also show micro benchmarking, hotspot/perf, intel advisor, and also inspecting assembly code.

Any other interesting bits I should look into putting into my talk?

#cpp #cplusplus

SIMD-friendly algorithms for substring searching

http://0x80.pl/notesen/2016-11-28-simd-strfind.html

#HackerNews #SIMD #algorithms #substring #searching #performance #optimization

Believe it or not, it’s 2025 and I just implemented the first “standard FIR” class in my #DSP library for #PédaleVite (#DIY guitar/bass #multiFX). Such a basic processing neglected for years…

It’s optimized using #SIMD instruction sets (NEON and SSE) so I can run a 4096-tap impulse at 2.7 % CPU load per #audio channel on a #RaspberryPi5. This means a decent cabinet simulation without any of these complex zero-latency partitioned convolution algorithms.

https://gitlab.com/EleonoreMizo/pedalevite/-/blob/master/src/mfx/dsp/fir/Fir.h

A screen capture of a terminal displaying the speed performance corresponding to various settings for the convolution algorithm.
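
As a point of reference for the post above, a "standard FIR" is a direct-form convolution of the input with the impulse response. The sketch below is a plain scalar version and is not taken from the linked `Fir.h`; `fir_filter` is a hypothetical name, and samples before the start of the input are assumed to be zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct-form FIR: output[n] = sum over k of taps[k] * input[n - k],
// treating input samples before index 0 as zero.
std::vector<float> fir_filter(const std::vector<float>& input,
                              const std::vector<float>& taps) {
    assert(!taps.empty());
    std::vector<float> output(input.size(), 0.0f);
    for (std::size_t n = 0; n < input.size(); ++n) {
        // Only taps that reach back to existing samples contribute.
        const std::size_t k_max = (n + 1 < taps.size()) ? n + 1 : taps.size();
        float acc = 0.0f;
        // This dot product is the hot loop; a NEON or SSE implementation
        // would multiply and accumulate several taps per instruction.
        for (std::size_t k = 0; k < k_max; ++k) {
            acc += taps[k] * input[n - k];
        }
        output[n] = acc;
    }
    return output;
}
```

The cost is one multiply-add per tap per output sample, so a 4096-tap impulse needs 4096 of them for every sample, which is the work the SIMD version spreads across vector lanes.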

@Methylzero @diehlpk @hpcnotes I've been asking that question since 2009, when I started doing #SIMD. It's not just GPUs where this makes a difference. FP32 has been significantly more efficient on CPUs for a long time.

#simd
