Glaze keeps getting faster thanks to SIMD optimizations and smarter whitespace handling. Updates from GitHub PR #2270 and discussion on Reddit.
#Programming #Glaze #SIMD #Update #Vietnam #MastodonCommunity
"So here is an introduction on how to write SIMD-accelerated code in pure Rust (no nightly required), after all we all benefit when software goes faster." by Sylvain Kerkour
Autovectorization seems like a cool way to write cross-platform SIMD code. But does anyone know of solutions to the insight issue? If I were to write a function that relies on autovectorization, wouldn't I literally have to 1) compile with every compiler + compiler settings + CPU arch + platform I wanna support, 2) disassemble all resulting binaries, 3) read and analyze the assembly to verify that it's vectorized the way I expect, and 4) repeat for every change?
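One stable-Rust pattern that comes up in autovectorization discussions is splitting a floating-point reduction into independent lanes. Float addition is not associative, so a plain `iter().sum()` pins one sequential order and typically blocks vectorization; explicit per-lane partial sums give the compiler an order it may map onto SIMD registers. A minimal sketch, assuming a lane count of 8 (the name `sum_f32_lanes` is illustrative, not from the linked article). Whether it actually vectorizes still has to be confirmed in the generated assembly, which is exactly the insight problem raised in the question.

```rust
/// Sums a slice of f32 using 8 independent accumulators.
/// The lane count (8) is an illustrative choice; vectorization is not
/// guaranteed and should be checked in the assembly for each target.
pub fn sum_f32_lanes(data: &[f32]) -> f32 {
    const LANES: usize = 8;
    let mut acc = [0.0f32; LANES];

    let chunks = data.chunks_exact(LANES);
    let tail = chunks.remainder(); // borrows `data`, not `chunks`
    for chunk in chunks {
        // Lane-wise adds with no cross-lane dependency: a vectorizable shape.
        for i in 0..LANES {
            acc[i] += chunk[i];
        }
    }

    // Horizontal reduction of the partial sums, then the leftover elements.
    let mut total: f32 = acc.iter().sum();
    for &x in tail {
        total += x;
    }
    total
}
```

Note that the result can differ in the last bits from a strictly sequential sum, because the additions happen in a different order.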
🚀 Lunary has launched: a high-speed NASDAQ TotalView‑ITCH 5.0 parser written in Rust. It uses ZeroCopyParser to avoid copying, SIMD (AVX2/AVX‑512/SSE2) for 2‑4× higher throughput, and a lock‑free design, with a safe API that still allows unsafe optimizations. Feedback from the community is welcome on the API design, the unsafe abstractions, the SIMD handling, and the benchmarks. #Rust #NASDAQ #Performance #Analysis #SIMD #ZeroCopy
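Zero-copy parsing in the sense the Lunary post describes generally means handing out typed views that borrow the input buffer and decode fields only when asked. A minimal sketch for a single message type, assuming the ITCH 5.0 System Event layout as we understand it (type `'S'`, 12 bytes, big-endian fields: stock locate, tracking number, 6-byte nanosecond timestamp, event code); check the offsets against the specification. `SystemEvent` is a hypothetical name, not Lunary's `ZeroCopyParser` API.

```rust
/// Borrowed, zero-copy view over one ITCH 5.0 System Event message.
/// Hypothetical type; offsets assumed from the ITCH 5.0 layout.
#[derive(Debug, Clone, Copy)]
pub struct SystemEvent<'a> {
    buf: &'a [u8],
}

impl<'a> SystemEvent<'a> {
    pub const LEN: usize = 12;

    /// Wraps `buf` without copying; None if too short or not type 'S'.
    pub fn parse(buf: &'a [u8]) -> Option<Self> {
        if buf.len() < Self::LEN || buf[0] != b'S' {
            return None;
        }
        Some(Self { buf: &buf[..Self::LEN] })
    }

    pub fn stock_locate(&self) -> u16 {
        u16::from_be_bytes([self.buf[1], self.buf[2]])
    }

    pub fn tracking_number(&self) -> u16 {
        u16::from_be_bytes([self.buf[3], self.buf[4]])
    }

    /// Nanoseconds since midnight, stored as a 6-byte big-endian integer.
    pub fn timestamp(&self) -> u64 {
        self.buf[5..11]
            .iter()
            .fold(0u64, |acc, &b| (acc << 8) | b as u64)
    }

    /// Event code byte, e.g. b'O' for "start of messages".
    pub fn event_code(&self) -> u8 {
        self.buf[11]
    }
}
```

Nothing is decoded until an accessor is called, so a scanner that only needs the message type never touches the other fields.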
So here is an annoying thing:
You can't directly use the x86/ARM/RISC-V AES instructions as a quick scramble for hashing (e.g. meowhash), because they all mix in the round key slightly differently.
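The key-mixing difference can be made concrete with a software model of one AES encryption round in two orderings, assuming the commonly documented semantics: x86 AESENC applies SubBytes, ShiftRows and MixColumns and then XORs the round key, while ARMv8 AESE XORs the key first (followed by SubBytes and ShiftRows) and AESMC applies MixColumns. RISC-V's AES instructions are not modeled. The helper names are hypothetical, and this is a slow, non-constant-time teaching aid, not a hash function.

```rust
/// AES state in column-major order: byte (row r, column c) is at r + 4*c.
pub type State = [u8; 16];

/// Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
fn gf_mul(mut a: u8, mut b: u8) -> u8 {
    let mut p = 0u8;
    while b != 0 {
        if (b & 1) != 0 {
            p ^= a;
        }
        let hi = a & 0x80;
        a <<= 1;
        if hi != 0 {
            a ^= 0x1b;
        }
        b >>= 1;
    }
    p
}

/// AES S-box: inverse as x^254 (maps 0 to 0), then the affine transform.
pub fn sbox(x: u8) -> u8 {
    let mut inv = 1u8;
    for _ in 0..254 {
        inv = gf_mul(inv, x);
    }
    inv ^ inv.rotate_left(1) ^ inv.rotate_left(2) ^ inv.rotate_left(3) ^ inv.rotate_left(4) ^ 0x63
}

/// SubBytes + ShiftRows + MixColumns, with no key addition.
fn sub_shift_mix(s: State) -> State {
    let mut t = [0u8; 16];
    for c in 0..4 {
        for r in 0..4 {
            // ShiftRows: row r is rotated left by r columns.
            t[r + 4 * c] = sbox(s[r + 4 * ((c + r) % 4)]);
        }
    }
    let mut out = [0u8; 16];
    for c in 0..4 {
        let a = [t[4 * c], t[4 * c + 1], t[4 * c + 2], t[4 * c + 3]];
        for r in 0..4 {
            // MixColumns row r: 2*a[r] ^ 3*a[r+1] ^ a[r+2] ^ a[r+3].
            out[4 * c + r] = gf_mul(a[r], 2) ^ gf_mul(a[(r + 1) % 4], 3)
                ^ a[(r + 2) % 4] ^ a[(r + 3) % 4];
        }
    }
    out
}

fn xor(a: State, b: State) -> State {
    let mut out = a;
    for i in 0..16 {
        out[i] ^= b[i];
    }
    out
}

/// x86 AESENC ordering: round key XORed in after the round.
pub fn round_x86_style(s: State, k: State) -> State {
    xor(sub_shift_mix(s), k)
}

/// ARMv8 AESE followed by AESMC: round key XORed in before the round.
pub fn round_arm_style(s: State, k: State) -> State {
    sub_shift_mix(xor(s, k))
}
```

Running the ARM sequence with an all-zero key and XORing the real key afterwards reproduces the x86 result, at the cost of one extra XOR per round; that extra step is what makes a single-instruction "quick scramble" non-portable.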
📢 New ORC 0.4.42 release of the Optimised inner loop Runtime Compiler!
This release contains both bug fixes and new features.
Making text chunking run at 164 GB/s: 96,000× faster than existing libraries with memchr and SIMD
memchunk chunks text for RAG pipelines at 164 GB/s. This post explains how it reaches speeds up to 96,000× faster than existing libraries using SIMD and lookup tables.
Spent 1h today trying to implement an equivalent of vpermilps (_mm_permutevar_ps) in SSE, only to find that my "solution" used a per-lane variable shift (vpsrlvd)… which is only available in AVX2 🙄 SIMD on Intel really is the Swiss cheese of APIs: it's very difficult to do anything without extensive knowledge of all the quirks and holes in the API. In the end the correct solution was to use pshufb, which is probably obvious if you're familiar enough with SIMD but requires jumping through hoops. #simd #sse
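The pshufb route can be modeled in scalar Rust: each 32-bit lane index (only its low 2 bits matter for the 128-bit vpermilps) is expanded into four byte indices `4*idx + 0..3`, and one byte shuffle performs the permutation. With intrinsics, building that mask could be an AND with 3, a uniform left shift by 2, a pshufb that broadcasts byte 0 of each lane, and an add of 0,1,2,3 per lane. This is a sketch of one way to do it, not necessarily the author's exact sequence; the function names are hypothetical, and pshufb itself requires SSSE3 rather than baseline SSE2.

```rust
/// Scalar model of PSHUFB (_mm_shuffle_epi8): a control byte with the
/// high bit set produces 0; otherwise its low 4 bits select a source byte.
pub fn pshufb(src: [u8; 16], ctrl: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = if (ctrl[i] & 0x80) != 0 {
            0
        } else {
            src[(ctrl[i] & 0x0f) as usize]
        };
    }
    out
}

/// Emulates 128-bit VPERMILPS (_mm_permutevar_ps) using only a byte shuffle.
pub fn permutevar_ps(v: [f32; 4], idx: [u32; 4]) -> [f32; 4] {
    // Reinterpret the four floats as 16 little-endian bytes.
    let mut src = [0u8; 16];
    for (lane, x) in v.iter().enumerate() {
        src[4 * lane..4 * lane + 4].copy_from_slice(&x.to_le_bytes());
    }
    // Control mask: lane index i becomes byte indices 4i, 4i+1, 4i+2, 4i+3.
    let mut ctrl = [0u8; 16];
    for lane in 0..4 {
        let base = ((idx[lane] & 3) * 4) as u8;
        for b in 0..4 {
            ctrl[4 * lane + b] = base + b as u8;
        }
    }
    let bytes = pshufb(src, ctrl);
    let mut out = [0f32; 4];
    for lane in 0..4 {
        let mut word = [0u8; 4];
        word.copy_from_slice(&bytes[4 * lane..4 * lane + 4]);
        out[lane] = f32::from_le_bytes(word);
    }
    out
}
```

The key observation is that a 32-bit permutation is just a byte permutation with groups of four, so no per-lane variable shift is needed.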
Ah, the pinnacle of human achievement: yet another #C++ #hash #table claiming to "crush" the competition with some newfangled #SIMD sorcery. 🚀 Apparently, it's the stuff of legends that will revolutionize table scanning (or bore a room full of engineers to tears). 🤓 But hey, at least the #GitHub #Copilot can admire its own self-awareness while it "writes better code." 😂
https://github.com/Cranot/grouped-simd-hashtable #innovation #programming #humor #HackerNews #ngated
High-performance C++ hash table using grouped SIMD metadata scanning
https://github.com/Cranot/grouped-simd-hashtable
#HackerNews #HighPerformance #C++ #HashTable #SIMD #MetadataScanning #TechnologyOptimization #GitHub
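"Grouped metadata scanning" in Swiss-table-style hash tables means storing a small tag byte per slot and comparing a whole group of tags against a hash fragment at once. With SSE2 that is a byte compare plus a movemask; the sketch below shows the portable SWAR equivalent on 8 tags packed into a `u64`, assuming 7-bit tags with the high bit reserved for empty/deleted markers. It illustrates the general technique only and is not the code from the linked repository; the function names are hypothetical.

```rust
const LO: u64 = 0x0101_0101_0101_0101;
const HI7: u64 = 0x7f7f_7f7f_7f7f_7f7f;

/// Sets bit 7 of byte i exactly when byte i of `x` is zero.
fn zero_bytes(x: u64) -> u64 {
    // (x & 0x7f) + 0x7f sets bit 7 unless the low 7 bits are all zero,
    // and can never carry into the next byte. OR-ing in `x` catches bytes
    // whose own bit 7 was set; inverting leaves bit 7 only for zero bytes.
    !(((x & HI7) + HI7) | x | HI7)
}

/// Bitmask (bit i = slot i) of the slots in `group` whose tag equals `tag`.
pub fn match_tag(group: u64, tag: u8) -> u8 {
    // XOR with the tag broadcast to every byte turns matches into zero bytes.
    let hits = zero_bytes(group ^ (LO * tag as u64));
    let mut mask = 0u8;
    for i in 0..8 {
        if (hits & (0x80u64 << (8 * i))) != 0 {
            mask |= 1u8 << i;
        }
    }
    mask
}
```

Using the exact zero-byte test (rather than the cheaper `(x - 0x01..) & !x & 0x80..` trick) avoids false-positive matches caused by borrows between bytes, at the cost of one extra operation.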
simd-prng, a small WebAssembly library
🚀Welcome to "SIMD #City," where we auto-vectorize your boredom into an endless parade of buzzwords! 🤖 Matt Godbolt takes us on a riveting #journey through #compiler optimizations—because who needs excitement when you can have "sophisticated" math?🔢 Spoiler: it's really just big words for feeding numbers into arrays. 🌽👈
https://xania.org/202512/20-simd-city #SIMD #optimizations #tech #buzzwords #math #HackerNews #ngated
SIMD City: Auto-Vectorisation
https://xania.org/202512/20-simd-city
#HackerNews #SIMD #City #Auto-Vectorisation #Vectorisation #Tech #News #Programming #Insights
The AVX-512 unit on port 5 of Intel CPUs is physically far away: literature survey
https://qiita.com/Terminus-IMRC/items/659d4fd502a96baab9c5?utm_campaign=popular_items&utm_medium=feed&utm_source=popular_items
My first encounter with the Elbrus-8SV processor: optimizing the summation of a byte array
A month ago, someone messaged me on Telegram and offered me access to a system with an Elbrus-8SV processor. And of course I agreed, because I was curious. It's not every day that strangers on the Internet offer you access to remote hosts. What could possibly go wrong?
https://habr.com/ru/articles/978730/?utm_source=habrahabr&utm_medium=rss&utm_campaign=978730
#elbrus8sv #elbrus #e2k #vliw #simd #intrinsics #assembly #c #code_optimization
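Summing a byte array quickly is mostly about adding many bytes per instruction without letting the narrow lanes overflow. The article works with Elbrus (e2k) intrinsics and assembly; as a portable stand-in, the sketch below shows the same idea with SWAR on a 64-bit word, assuming nothing about the e2k instruction set. The function names are hypothetical.

```rust
/// Sum of the 8 bytes packed into `x` (at most 8 * 255 = 2040).
fn sum_bytes_in_word(x: u64) -> u64 {
    const M: u64 = 0x00ff_00ff_00ff_00ff;
    // Add neighbouring bytes: four 16-bit partial sums, each <= 510.
    let pairs = (x & M) + ((x >> 8) & M);
    // Multiplying by 0x0001_0001_0001_0001 accumulates all four 16-bit
    // fields into the top field; no field exceeds 16 bits, so the top
    // 16 bits hold the exact total.
    pairs.wrapping_mul(0x0001_0001_0001_0001) >> 48
}

/// Sums all bytes of `data`, 8 bytes at a time, then the leftover tail.
pub fn sum_bytes(data: &[u8]) -> u64 {
    let chunks = data.chunks_exact(8);
    let tail = chunks.remainder();
    let mut total = 0u64;
    for chunk in chunks {
        let mut w = [0u8; 8];
        w.copy_from_slice(chunk);
        total += sum_bytes_in_word(u64::from_le_bytes(w));
    }
    total + tail.iter().map(|&b| b as u64).sum::<u64>()
}
```

Byte order does not matter here because addition is commutative; real SIMD versions apply the same widening-before-overflow idea across wider registers.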
Hey all! 👋🏻
Does anyone know of a shader-like pipeline/#rendering system/library/framework for 1-bit graphics with a 2x #framebuffer (double-buffered: actual and previous) and #blitting via #SIMD and #SWAR? CPU-only, mostly targeting ARM32/64/Thumb-1.
I understand that such a thing is rare and quite possibly doesn't exist, so I mainly need source-based guidance or hints about old-school/demoscene tricks and algorithms I don't know yet (I already know a lot; I'm 40), and of course I can port them.
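One SWAR-friendly trick that fits an "actual and previous" double buffer: with 1-bit pixels packed 32 per `u32`, XORing the two buffers word by word finds changed pixels 32 at a time, so only the dirty span of each row has to be blitted or pushed to the display. A minimal sketch, assuming a row-major layout of `words_per_row` words per row; the function name and layout are hypothetical.

```rust
/// Returns (row, first_word, last_word_inclusive) for every row that changed
/// between `previous` and `actual`. Both buffers are 1 bit per pixel,
/// 32 pixels per u32, `words_per_row` words per row (assumed layout).
pub fn dirty_spans(actual: &[u32], previous: &[u32], words_per_row: usize) -> Vec<(usize, usize, usize)> {
    assert_eq!(actual.len(), previous.len());
    let mut spans = Vec::new();
    for (row, (a, p)) in actual
        .chunks(words_per_row)
        .zip(previous.chunks(words_per_row))
        .enumerate()
    {
        let mut first = None;
        let mut last = 0;
        for (i, (x, y)) in a.iter().zip(p).enumerate() {
            // Nonzero XOR means at least one of these 32 pixels flipped.
            if x ^ y != 0 {
                if first.is_none() {
                    first = Some(i);
                }
                last = i;
            }
        }
        if let Some(f) = first {
            spans.push((row, f, last));
        }
    }
    spans
}
```

The same XOR word can drive further tricks, for example `count_ones()` to estimate how much of a row changed before deciding whether a full-row copy is cheaper.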