#simd

2026-01-27

Glaze keeps getting faster thanks to SIMD optimizations and smarter whitespace handling. Update from GitHub PR #2270 and the discussion on Reddit.
#Programming #Glaze #SIMD #Bảncậpnhật #Vietnam #CộngđồngMastodon

reddit.com/r/programming/comme

M. H. Neifer @mhneifer
2026-01-22

"So here is an introduction on how to write SIMD-accelerated code in pure Rust (no nightly required), after all we all benefit when software goes faster." by Sylvain Kerkour

kerkour.com/introduction-rust-

2026-01-19

Autovectorization seems like a cool way to write cross-platform SIMD code. But does anyone know of solutions to the insight issue? If I were to write a function which relies on autovectorization, wouldn't I literally have to 1) compile with every compiler + compiler settings + CPU arch + platform I wanna support, 2) disassemble all resulting binaries, 3) read and analyze the assembly code to verify that it's vectorized how I expect, and 4) repeat for every change?

#programming #simd #compilers #plt
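
Steps 2 and 3 in the post above can at least be scripted. The following is a minimal sketch, not an established tool: it assumes disassembly text in the tab-separated style that `objdump -d` prints, and the names `mnemonics_by_function`, `uses_any`, and the hand-written `SAMPLE` listing are all hypothetical. It only checks that an expected vector mnemonic appears in a function; it does not prove the loop is vectorized the way you intended.

```python
import re

# Matches a function header line such as "0000000000001130 <sum_vec>:".
HEADER = re.compile(r"^[0-9a-f]+ <([^>]+)>:$")


def mnemonics_by_function(disassembly):
    """Map each function name to the list of instruction mnemonics in it."""
    result = {}
    current = None
    for line in disassembly.splitlines():
        match = HEADER.match(line.strip())
        if match:
            current = match.group(1)
            result[current] = []
            continue
        # Assumed layout: address, raw bytes, instruction text, tab-separated.
        fields = line.split("\t")
        if current is not None and len(fields) >= 3 and fields[2].strip():
            result[current].append(fields[2].split()[0])
    return result


def uses_any(disassembly, function, expected):
    """True if `function` contains at least one of the expected mnemonics."""
    found = mnemonics_by_function(disassembly).get(function, [])
    return any(m in expected for m in found)


# Hand-written sample in objdump style (illustrative, not real compiler output).
SAMPLE = (
    "0000000000001130 <sum_vec>:\n"
    "    1130:\tc5 fd fe 07\tvpaddd (%rdi),%ymm0,%ymm0\n"
    "    1134:\tc3\tret\n"
    "\n"
    "0000000000001140 <sum_scalar>:\n"
    "    1140:\t03 07\tadd    (%rdi),%eax\n"
    "    1142:\tc3\tret\n"
)

if __name__ == "__main__":
    for name in ("sum_vec", "sum_scalar"):
        print(name, uses_any(SAMPLE, name, {"paddd", "vpaddd"}))
```

A check like this could run in CI once per compiler and target, which turns the manual reading in step 3 into a regression test, at the cost of maintaining a list of expected mnemonics per platform.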

2026-01-14

🚀 Lunary has launched: a high-speed NASDAQ TotalView-ITCH 5.0 parser written in Rust. It uses ZeroCopyParser to avoid copying, SIMD (AVX2/AVX-512/SSE2) for 2-4× higher throughput, a lock-free design, and a safe API that still allows unsafe optimizations. The community is invited to give feedback on the API design, the unsafe abstractions, the SIMD handling, and the benchmarks. #Rust #NASDAQ #HiệuNăng #PhânTích #SIMD #ZeroCopy

github.com/lunyn-hft/lunary

2026-01-08

So here is an annoying thing:

You can't directly use the x86/ARM/RISC-V AES instructions as a quick scramble for hashing (e.g. meowhash), because they all mix in the round key slightly differently.

#simd

x86:
a[127:0] := InvShiftRows(a[127:0])
a[127:0] := InvSubBytes(a[127:0])
a[127:0] := InvMixColumns(a[127:0])
dst[127:0] := a[127:0] XOR RoundKey[127:0]

ARM NEON:
bits(128) result = operand1 EOR operand2;
result = AESInvShiftRows(result);
V[d, 128] = AESInvSubBytes(result);

RISC-V RVV:
let sr    : bits(128) = aes_shift_rows_inv(state);
let sb    : bits(128) = aes_subbytes_inv(sr);
let ark   : bits(128) = sb ^ rkey;
let mix   : bits(128) = aes_mixcolumns_inv(ark);
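
To make the difference concrete, here is a scalar Python sketch that transcribes the three pseudocode listings above and feeds them the same state and round key. It is an illustration under assumptions, not a reference implementation: the function names are made up for this example, and the state is assumed to be 16 bytes in column-major order (byte `r + 4*c` is row `r`, column `c`).

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_inv_sbox():
    """Derive the inverse S-box from the forward S-box definition."""
    inv_sbox = [0] * 256
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63
        inv_sbox[s] = x
    return inv_sbox


INV_SBOX = build_inv_sbox()


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def inv_shift_rows(s):
    # Row r is rotated right by r columns.
    return bytes(s[r + 4 * ((c - r) % 4)] for c in range(4) for r in range(4))


def inv_sub_bytes(s):
    return bytes(INV_SBOX[b] for b in s)


def inv_mix_columns(s):
    coeff = (14, 11, 13, 9)
    out = bytearray(16)
    for c in range(4):
        col = s[4 * c:4 * c + 4]
        for r in range(4):
            acc = 0
            for k in range(4):
                acc ^= gf_mul(coeff[(k - r) % 4], col[k])
            out[4 * c + r] = acc
    return bytes(out)


def x86_round(state, key):
    # Key is XORed last, after InvMixColumns.
    return xor(inv_mix_columns(inv_sub_bytes(inv_shift_rows(state))), key)


def arm_round(state, key):
    # Key is XORed first; there is no InvMixColumns in this step.
    return inv_sub_bytes(inv_shift_rows(xor(state, key)))


def riscv_round(state, key):
    # Key is XORed between InvSubBytes and InvMixColumns.
    return inv_mix_columns(xor(inv_sub_bytes(inv_shift_rows(state)), key))


state = bytes(16)
key = bytes(range(16))
outputs = {
    "x86": x86_round(state, key).hex(),
    "arm": arm_round(state, key).hex(),
    "riscv": riscv_round(state, key).hex(),
}

if __name__ == "__main__":
    for name, value in outputs.items():
        print(f"{name:6s} {value}")
```

Because InvMixColumns is linear, the x86 and RISC-V results in this model differ exactly by `key XOR InvMixColumns(key)`, so a hash built on one cannot be reproduced on the other by passing the same key.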

2026-01-08

📢 New ORC 0.4.42 release of the Optimised inner loop Runtime Compiler!

This release contains both bug fixes and new features.

discourse.gstreamer.org/t/orc-

#gstreamer #orc #assembly #release #opensource #simd

[Image: screenshot of some random SSE assembly with syntax highlighting]

2026-01-08

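Getting text chunking to 164 GB/s: 96,000× faster than existing libraries with memchr and SIMD

memchunk handles text chunking for RAG pipelines at 164 GB/s. The post explains how it uses SIMD and lookup tables to run up to 96,000× faster than existing libraries.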

aisparkup.com/posts/7996

nietras 👾 @nietras
2026-01-04

It's 2026, and I still don't know of a (free or cheap) way to do AVX-512 testing for oss on github (except self-hosted runner ofc).

2026-01-02

Spent 1h today trying to implement an equivalent of vpermilps (_mm_permutevar_ps) in SSE, only to find that my "solution" used a per-lane shift (vpsrlvd)… which is only available in AVX2 🙄 SIMD on Intel is really the Swiss cheese of APIs; so difficult to do anything without an extensive knowledge of all the quirks and holes in the API. In the end the correct solution was to use pshufb, which is probably obvious if you’re familiar enough with SIMD but requires jumping through hoops. #simd #sse
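
The post does not show its pshufb solution, so the following is only a guess at the idea, written as a scalar Python model rather than real intrinsics. Each 32-bit lane index is expanded into four byte selectors, and a byte shuffle then does the permutation. The helper names are hypothetical, and how the mask is built with SSE instructions is left out. Note that pshufb itself is an SSSE3 instruction.

```python
import struct


def pshufb(src, ctrl):
    """Scalar model of a 16-byte PSHUFB.

    A control byte with its high bit set zeroes the output byte;
    otherwise its low 4 bits select a byte from src.
    """
    return bytes(0 if c & 0x80 else src[c & 0x0F] for c in ctrl)


def permutevar_mask(indices):
    """Expand four 32-bit lane indices into 16 byte-level selectors.

    Only the low 2 bits of each index are used, as in vpermilps.
    Lane i of the output takes bytes 4*idx .. 4*idx+3 of the source.
    """
    return bytes(4 * (idx & 3) + j for idx in indices for j in range(4))


def permutevar_ps_model(lanes, indices):
    """Model of a variable 4 x float32 permute built on a byte shuffle."""
    src = struct.pack("<4f", *lanes)
    out = pshufb(src, permutevar_mask(indices))
    return list(struct.unpack("<4f", out))


if __name__ == "__main__":
    print(permutevar_ps_model([1.0, 2.0, 3.0, 4.0], [3, 2, 1, 0]))
```

For example, indices `[3, 2, 1, 0]` expand to the byte selectors `12..15, 8..11, 4..7, 0..3`, which reverses the four lanes.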

nietras 👾 @nietras
2026-01-02

preview of some hobby work I am doing

N-gated Hacker News @ngate
2025-12-29

Ah, the pinnacle of human achievement: yet another ++ claiming to "crush" the competition with some newfangled sorcery. 🚀 Apparently, it's the stuff of legends that will revolutionize table scanning (or bore a room full of engineers to tears). 🤓 But hey, at least the can admire its own self-awareness while it "writes better code." 😂
github.com/Cranot/grouped-simd

N-gated Hacker News @ngate
2025-12-27

🚀Welcome to "SIMD ," where we auto-vectorize your boredom into an endless parade of buzzwords! 🤖 Matt Godbolt takes us on a riveting through optimizations—because who needs excitement when you can have "sophisticated" math?🔢 Spoiler: it's really just big words for feeding numbers into arrays. 🌽👈
xania.org/202512/20-simd-city

:rss: Qiita - Popular Articles @qiita@rss-mstdn.studiofreesia.com
2025-12-24
2025-12-20

My introduction to the Elbrus-8SV processor: optimizing the summation of a byte array

A month ago someone messaged me on Telegram and offered access to a system with an Elbrus-8SV processor. Of course I agreed, because I was curious. It's not every day that strangers on the Internet offer you access to remote hosts. What could possibly go wrong?

habr.com/ru/articles/978730/?u

#эльбрус8св #эльбрус #e2k #vliw #simd #интринсики #ассемблер #си #оптимизация_кода

Boozook 🦀 :playdate: @boozook@mastodon.gamedev.place
2025-12-12

Hey all! 👋🏻
I’m looking for some shader-like pipeline/#rendering system/library/framework for 1-bit graphics with 2x #framebuffer (double-buffered — actual & previous) with #blitting on #SIMD and #SWAR? CPU-only, mostly targeting ARM32/64/Thumb1.
I understand that it’s rare and may well not exist, so I just need some source-based guidance or hints on old-school/demoscene tricks and algorithms that I don’t know yet (I know a lot already, I’m 40), and of course I can port them.
