#AVX512

David JONES drj@typo.social
2025-06-18

So here's an idea I had that I'm almost certainly not going to do anything with (so you should). With AVX-512, a single 512-bit register holds 16 x 32-bit lanes. Let's pretend that's a 16-deep stack. The permute instruction lets us do a DROP and a DUP (except you'd probably want to ROLL them, but whatever). I'm imagining that top-of-stack would always be lane 0; PUSHing something permutes all the lanes one higher and replaces lane 0. Now implement a FORTH.
#AVX512 #FORTH
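A minimal sketch of the idea, emulated in Python so it runs anywhere: the 16-element list stands in for one zmm register, and `permutexvar()` mimics the semantics of `_mm512_permutexvar_epi32` (vpermd), where result lane i takes source lane `idx[i] & 15`. `ZmmStack`, the index vectors, and the four words are hypothetical illustration, not code from the post.

```python
LANES = 16
MASK32 = 0xFFFFFFFF


def permutexvar(idx, src):
    """Emulate vpermd: result lane i = src[idx[i] & 15]."""
    return [src[i & (LANES - 1)] for i in idx]


# Precomputed index vectors (hypothetical names).
SHIFT_UP = [0] + list(range(LANES - 1))           # lane i <- lane i-1; old lane 15 falls off
SHIFT_DOWN = list(range(1, LANES)) + [LANES - 1]  # lane i <- lane i+1; lane 15 is duplicated
SWAP = [1, 0] + list(range(2, LANES))             # exchange lanes 0 and 1


class ZmmStack:
    """A 16-deep data stack kept in one (emulated) zmm register; lane 0 is TOS."""

    def __init__(self):
        self.reg = [0] * LANES

    def push(self, value):
        # Shift every lane up by one, then overwrite lane 0. On hardware the
        # overwrite could be a masked broadcast, e.g. _mm512_mask_set1_epi32(reg, 1, value).
        self.reg = permutexvar(SHIFT_UP, self.reg)
        self.reg[0] = value & MASK32

    def dup(self):
        # Same shift as PUSH: lanes 0 and 1 both end up holding the old TOS.
        self.reg = permutexvar(SHIFT_UP, self.reg)

    def drop(self):
        self.reg = permutexvar(SHIFT_DOWN, self.reg)

    def swap(self):
        self.reg = permutexvar(SWAP, self.reg)

    def add(self):
        # FORTH "+": pop TOS and add it into the new TOS, wrapping at 32 bits.
        tos = self.reg[0]
        self.drop()
        self.reg[0] = (self.reg[0] + tos) & MASK32

    def top(self):
        return self.reg[0]


def run(source):
    """Interpret a whitespace-separated FORTH-like program; integers are pushed."""
    stack = ZmmStack()
    words = {"DUP": stack.dup, "DROP": stack.drop, "SWAP": stack.swap, "+": stack.add}
    for token in source.split():
        if token in words:
            words[token]()
        else:
            stack.push(int(token))
    return stack
```

For example, `run("2 3 + DUP +").top()` returns 10. Because PUSH uses a plain shift, a 17th value silently pushes the deepest entry off the bottom of the register.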

2025-06-09

A detailed look at Galois fields

"Ask Jacobi or Gauss to give their opinion publicly, not on the truth, but on the importance of these theorems. Later, I hope, there will be people who will find it to their advantage to decipher all this mess." With these words Évariste Galois ended the letter he wrote to his friend Auguste Chevalier two days before he died, at the age of 20, of wounds received in a duel. Neither Jacobi nor Gauss ever worked through his theorems, but 15 years later Joseph Liouville did, and he published Galois's work, which went on to become the foundation of modern algebra and is now known as Galois theory. In this article I'll cover one part of that theory: Galois fields, which have become so widely used in cryptography and error-correcting codes that Intel and AMD released a set of processor extensions for implementing operations on these fields efficiently. Note: if you have already used or implemented Galois fields, most of the article will probably not be of interest to you, but the last sections may contain something new.

habr.com/ru/articles/916740/

#галуа #конечные_поля #avx512 #ридсоломон #aes
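The extensions the article refers to are GFNI, whose GF2P8MULB instruction multiplies bytes in GF(2^8) using the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B). A bit-serial Python sketch of that multiplication, plus an inverse built from it, assuming the same polynomial; the function names are illustrative.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gf256_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (shift-and-add, reducing mod AES_POLY)."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition in GF(2^8) is XOR
        b >>= 1
        a <<= 1                # multiply a by x
        if a & 0x100:
            a ^= AES_POLY      # reduce back to 8 bits
    return result


def gf256_inv(a: int) -> int:
    """Inverse via a^254: the multiplicative group has order 255, so a^255 = 1."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf256_mul(result, base)
        base = gf256_mul(base, base)
        e >>= 1
    return result
```

As a check, `gf256_mul(0x57, 0x83)` gives 0xC1, the worked example in the AES specification (FIPS-197).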

Benjamin Carr, Ph.D. 👨🏻‍💻🧬 BenjaminHCCarr@hachyderm.io
2025-05-14

#AMD #EPYC #4565P & #4585PX #Benchmarks Against #Xeon #6369P
For "conventional" #server workloads like web serving and databases, the EPYC 4005 series dominates.
With up to 16C/32T, #AVX512, DDR5-5600 memory and other advantages, the EPYC 4005 series is an easy answer for anyone looking for affordable #HPC.
The AMD #EPYC4005 series #CPU delivers excellent generational uplift over the EPYC 4004 series and outright obliterates the #Xeon6300 series.
phoronix.com/review/amd-epyc-4

nietras 👾 nietras
2025-05-09

New blog post "Sep 0.10.0 - 21 GB/s CSV Parsing Using SIMD on AMD 9950X 🚀"

📈 Sep from 7 GB/s to 21 GB/s over last two years
🧑‍💻 and assembly on 9.0
🛠️ Tweaks and new -to-256 parser
🔢 Lots of benchmarks

👇
nietras.com/2025/05/09/sep-0-1

Sep performance progression from 7 GB/s to 21 GB/s.
Jesper Stemann Andersen stemann
2025-05-01

5 lines of is all you need to get the most out of your hardware via , , or other microarchitecture-specific binaries instead of plain / / , / binaries!

… and the best part, you don’t even have to write those five lines - they’re here for you (and have been for a long time):

github.com/JuliaPackaging/Yggd

nietras 👾 nietras
2025-04-14

code gen is unfortunately not great in the face of masks which Sep uses heavily. This means AVX-512 is slower than AVX2 🤔

cc @tannergooding can this be improved?

PS: While I understand the arguments for not having explicit mask types in dotnet, I still think it will never be great: it becomes an endless whack-a-mole around code gen, compared to simply letting devs do what they want.
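For context on the mask-heavy style: on AVX-512 a byte compare such as `_mm512_cmpeq_epi8_mask` writes a 64-bit mask directly, whereas AVX2 needs `_mm256_cmpeq_epi8` followed by `_mm256_movemask_epi8`. The sketch below emulates the resulting scan loop in Python; the set of special characters and the function names are assumptions for illustration, not Sep's actual code.

```python
CHUNK = 64  # bytes per 512-bit register


def special_char_mask(chunk: bytes, specials: bytes = b',\n"') -> int:
    """Emulate OR-ing several byte compares into one 64-bit mask (bit i = byte i is special)."""
    mask = 0
    for i, byte in enumerate(chunk[:CHUNK]):
        if byte in specials:
            mask |= 1 << i
    return mask


def iter_set_bits(mask: int):
    """Yield positions of set bits, lowest first, as a tzcnt/blsr loop would."""
    while mask:
        yield (mask & -mask).bit_length() - 1  # index of lowest set bit (tzcnt)
        mask &= mask - 1                       # clear lowest set bit (blsr)


positions = list(iter_set_bits(special_char_mask(b'a,"b"\nc')))
```

Here `positions` is `[1, 2, 4, 5]`: the comma, both quotes, and the newline. Every step after the compare is mask arithmetic, which is why code generation around masks matters so much for this kind of parser.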

Benjamin Carr, Ph.D. 👨🏻‍💻🧬 BenjaminHCCarr@hachyderm.io
2025-04-04

#AMD #Ryzen9000 vs. #Intel #CoreUltra #ArrowLake On #Linux For Q1-2025 In ~400 Benchmarks
In cases where #AVX512 can be utilized, the Ryzen 9000 series is the definitive winner over the Intel Core Ultra Series 2 desktop processors. In some HPC applications the Core Ultra 9 285K, with 24 physical cores, does well in scenarios where SMT isn't leveraged.
Overall, the #Zen5-based #Ryzen9 #9950X took first place outright in 50% of the benchmarks.
phoronix.com/review/ryzen9000-

Benjamin Carr, Ph.D. 👨🏻‍💻🧬 BenjaminHCCarr@hachyderm.io
2025-03-05

The Compelling #AVX512 Performance Advantage On #AMD #EPYC 9005 "Turin"
Across the workloads tested on this #EPYC9655 Supermicro server, running with AVX-512 yielded 1.57x the performance of the same hardware/software with AVX-512 forced off.
phoronix.com/review/amd-epyc-t

Scalable Analyses scalable@fosstodon.org
2025-03-02

Meet *Einsum Trees*, an abstraction for optimizing the execution of tensor expressions: scalable.uni-jena.de/research/

#tensor #compiler #einsum #neon #avx512
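Why the shape of the tree matters: a multi-operand einsum is usually evaluated as a sequence of binary contractions, and different orderings can differ enormously in cost. A back-of-the-envelope Python sketch, assuming a simple cost model (product of all index sizes involved in each binary contraction) and made-up dimensions; this is not the project's implementation.

```python
from math import prod


def pair_cost(a: str, b: str, dims: dict) -> int:
    """Multiply-adds for one binary contraction of operands indexed by a and b:
    the product of the sizes of every distinct index involved."""
    return prod(dims[c] for c in set(a) | set(b))


# Made-up sizes for the chain expression ij,jk,kl->il.
dims = {"i": 10, "j": 1000, "k": 10, "l": 1000}

# Tree 1: contract ij,jk->ik first, then ik,kl->il.
left_first = pair_cost("ij", "jk", dims) + pair_cost("ik", "kl", dims)
# Tree 2: contract jk,kl->jl first, then ij,jl->il.
right_first = pair_cost("jk", "kl", dims) + pair_cost("ij", "jl", dims)

print(left_first, right_first)  # 200000 20000000
```

Same expression, same result, a 100x difference in work. Choosing a good tree and then mapping each binary node onto fast NEON or AVX-512 kernels is presumably the kind of problem such an abstraction targets.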

2025-02-27

How many xmm/ymm/zmm registers did x86 have vs. x86-64? I'm seeing conflicting information on Google: some pages claim that x86 had zmm0-zmm31, while others claim it only had up to xmm15/ymm15/zmm15. (This might be because some webpages treat x86-64 running in 32-bit mode as if it were the x86 architecture proper.)

#asm #assembly #x86 #x86_64 #avx512

阿宏伯浮浪貢丸🐈 hiroshiyui@g0v.social
2025-02-20

bazelisk-linux-amd64 build //tensorflow/tools/pip_package:wheel --repo_env=WHEEL_NAME=tensorflow_cpu

A command this simple, and it was sitting right there in the official documentation:

tensorflow.org/install/source?

Meanwhile, a pile of outdated, non-working commands on Stack Overflow and GitHub kept sending me into a wall.

#TensorFlow #CPU #AVX512

2024-12-02

How much more convenient a PC is than a smartphone

Laptops with free Libreboot firmware

The first generation of "advanced smartphone users" who have never worked at a computer has now grown up. They are finishing university and starting to look for jobs. People do amazing things on their smartphones, but they don't realize how poor these devices are compared to a full-fledged computer. A smartphone really is indispensable away from home or the office, on a hike or a trip: for navigation, taking photos and video, urgent messages, and so on. But when you have a proper computer, using a smartphone is, for the most part, silly.

habr.com/ru/companies/ruvds/ar

#FFmpeg #AVX512 #ассемблер #оптимизация_софта #тяжеловесные_приложения #Google_Play #Apple_AppStore #монополия #Spotube #DeskHop #Slack_Dumper #тачскрины #смартфонизация #нормисы #ruvds_статьи

2024-11-18

Scaling an RGB image: godbolt.org/z/vMojsrhcG

GCC can only vectorize it on RVV, where it generates nice code with three indexed loads and a three-segment segmented store. It fails to vectorize for AVX512/NEON.

clang manages something with AVX512, but you can barely call it vectorization.
The RVV codegen looks better, but it uses fixed length vectorization and seems to have miscalculated the best LMUL choice, which causes it to spill. You get better codegen if you set -mllvm --riscv-v-fixed-length-vector-lmul-max=4.

#RVV #AVX512 #NEON #gcc #llvm
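The godbolt source isn't reproduced here, so as an illustration of the access pattern being described, here is a hypothetical nearest-neighbour scaler for a packed RGB buffer, written in Python. The comments note one plausible mapping onto vector instructions when the inner loop over x is vectorized; the function name and signature are assumptions.

```python
def scale_rgb_nearest(src, src_w, src_h, dst_w, dst_h):
    """Nearest-neighbour resize of a packed RGB image (3 bytes per pixel, row-major)."""
    dst = bytearray(dst_w * dst_h * 3)
    for y in range(dst_h):
        sy = y * src_h // dst_h              # source row for this output row
        for x in range(dst_w):
            sx = x * src_w // dst_w          # source column: a data-dependent index
            s = (sy * src_w + sx) * 3
            # Three reads at computed offsets: vectorized over x, these become
            # three indexed (gather) loads, one per channel.
            r, g, b = src[s], src[s + 1], src[s + 2]
            # Interleaved R, G, B write: maps naturally to a 3-field segmented store on RVV.
            d = (y * dst_w + x) * 3
            dst[d], dst[d + 1], dst[d + 2] = r, g, b
    return dst
```

Doubling the width of a 2x1 image with pixels (1, 2, 3) and (4, 5, 6) repeats each pixel twice. The per-channel gather plus interleaved store is exactly the shape that RVV's indexed and segmented memory operations express directly, while AVX-512 and NEON need shuffles to emulate it.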

Benjamin Carr, Ph.D. 👨🏻‍💻🧬 BenjaminHCCarr@hachyderm.io
2024-11-05

#FFmpeg devs boast of up to 94x performance boost after implementing handwritten #AVX512 assembly code
The developers have created an optimized code path using the AVX-512 instruction set to accelerate specific functions within the FFmpeg multimedia processing library. By leveraging AVX-512, they were able to achieve significant performance improvements -- from three to 94 times faster -- compared to standard implementations.
tomshardware.com/pc-components

Benjamin Carr, Ph.D. 👨🏻‍💻🧬 BenjaminHCCarr@hachyderm.io
2024-10-24

#Intel #CoreUltra 9 285K "#ArrowLake" Delivers Strong #Linux Performance Review
Power efficiency improvements with Arrow Lake are real. The Core Ultra 9 285K averaged 136 W, right in line with the Ryzen 9 9950X's 137 W and much lower than the Core i9 14900K's 156 W average. The Core Ultra 9 285K was very competitive, but if you run a lot of #AVX512 workloads, or work in areas where Zen 5 delivered striking wins, the Ryzen 9 9950X and the ~$429 Ryzen 9 9900X can deliver great value.
phoronix.com/review/intel-core

Radio Azureus RadioAzureus
2024-08-20

Important quote

With AVX-512 on vs. off, the Ryzen 9 9950X impressively gained 56% more performance on average across all benchmarks. The 7950X similarly saw a still impressive 41% performance improvement with AVX-512 acceleration turned on vs. off.
