
I just drafted two more chapters for the PSI spec, on microprocessors and peripherals. Small and slow steps lead to the ultimate understanding of #compute #platforms. 🥳

https://platform-system-interface.github.io/psi-spec/application-processors

Scaling Challenges in AI - Meta CFO Susan Li

#ai #compute #aiinfrastructure #capex

@raymondpert Oh, no, no, no, they have a "formula" for such things. Not only have people made a tiny group of people filthy rich for voluntarily being their data subjects, they'll now be paying the bill for them to become richer, processing all their juicy bits. #compute

An Overview of Type-In Computer Magazines
In the old old old old old old old OLD* days, people wrote computer programs by either filling boxes on paper cards or punching out squares, like they did (maybe still do?) for standardized tests. The cards would be fed into card reading devices, some of them called Hollerith
https://setsideb.com/an-overview-of-type-in-computer-magazines/
#indies #niche #retro #ahoy #compute #computesgazette #indie #magazine #niche #retro #run #software #typeins

There are things that grow exponentially. And yes, humanity is predictably bad at predicting things on an exponential growth trajectory.

Silicon Valley has made such a myth of this phenomenon that they’re completely convinced this will play out with #AI as well.
Perhaps because it makes them feel smart?

But there’s no signal that the current models’ performance would keep on scaling exponentially with added #compute. On the contrary, it seems to be showing diminishing returns already.

@serpentroots IMHO, all this "#AI" #slop should be outlawed for all the right reasons.

Even if we didn't care that #WastefulComputing and even #Bitcoin-like #ASIC|s [aka #NPU|s] are being built for it, the power consumption alone is bad!
Just as idling a car engine for no good reason within city limits is banned in #Germany for #AirPollution reasons alone, generating shitty #AIslop should be banned for its #pollution of the #Internet and its #waste of #energy and computing resources, from #compute to #storage and #traffic!

https://hachyderm.io/@serpentroots/114560654958919310

Artificial Intelligence: Implementing the Attention Mechanism in Python

The attention mechanism is often associated with the transformer architecture, but it had already been used in RNNs (recurrent neural networks).

In machine translation tasks (for example, English to Italian), when the model has to predict the next Italian word, it needs to focus on, or pay attention to, the most important English words in the input, the ones that are useful for producing a good translation.

I won't go into the details of RNNs, but attention helped these models mitigate the vanishing gradient problem and capture more long-range dependencies between words.

At some point, we realized that the only thing that mattered was the attention mechanism and that the entire RNN architecture was superfluous. Hence, Attention Is All You Need!

Self-Attention in Transformers

Classic attention indicates which words of the input sequence the words of the output sequence should attend to. It matters in sequence-to-sequence tasks such as machine translation.

Self-attention is a specific type of attention. It operates between any two elements of the same sequence and tells us how “related” the words in the same sentence are.

For a given token (or word) in a sequence, self-attention produces a list of attention weights corresponding to all the other tokens in the sequence. This process is applied to every token in the sentence, yielding a matrix of attention weights (as in the figure).

That is the general idea. In practice things are a bit more complicated, because we want to add many parameters/weights to our network so that the model has more learning capacity.

The K, V, Q Representations

The input to our model is a sentence such as “my name is Marcello Politi”. Through tokenization, a sentence is converted into a list of numbers such as [2, 6, 8, 3, 1].

Before passing the sentence to the transformer, we need to create a dense representation for each token.

How do we create this representation? We multiply each token by a matrix. The matrix is learned during training.

Let's now add some complexity.

For each token we create 3 vectors instead of one, which we call the key (K), value (V), and query (Q) vectors. (We will see later how to create these 3 vectors.)

Conceptually, each of these 3 vectors has a particular meaning:

The key vector represents the main information captured by the token.
The value vector captures the full information of a token.
The query vector is a question about the token's relevance to the current task.

The idea is that we focus on a particular token i and ask how important the other tokens in the sentence are with respect to the token i under consideration.

This means we take the vector q_i (we ask a question about i) for token i and perform some mathematical operations with all the other tokens k_j (j != i). It is as if we were asking, at first glance, which other tokens in the sequence look really important for understanding the meaning of token i.

But what is this magic operation?

We take the dot product of the query vector with each key vector and divide by a normalization factor. This is done for every token k_j.

This gives us a score for each pair (q_i, k_j). We turn these scores into a probability distribution by applying a softmax operation to them. Good, we now have the attention weights!
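As a minimal sketch of this step, the snippet below computes the scaled dot-product scores of one query against a handful of keys and turns them into weights with a softmax. The vectors are made-up toy values and the helper name `attention_weights` is hypothetical. The normalization factor is assumed to be the square root of the vector dimension, as in the hands-on code later in the article, and the token's own key is included among the keys, again as the hands-on code does.

```python
import math

def attention_weights(query, keys):
    """Softmax over the scaled dot products between one query and every key."""
    d = len(query)
    # One score per key: dot product divided by sqrt(d) (assumed normalization)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Subtract the max before exponentiating for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy query and keys (illustrative values only)
q_i = [1.0, 0.0, 1.0, 0.0]
keys = [
    [1.0, 0.0, 1.0, 0.0],  # same direction as the query
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query
    [1.0, 1.0, 1.0, 1.0],  # partially aligned with the query
]

weights = attention_weights(q_i, keys)
print(weights)
```

The weights are non-negative and sum to 1, and the key that is orthogonal to the query receives the smallest weight.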

With the attention weights, we know how important each token k_j is for understanding token i. So now we multiply the value vector v_j associated with each token by its weight and sum the vectors. This gives us the final context-aware vector of token_i.

If we are computing the contextual dense vector of token_1, we compute:

z_1 = a_11 * v_1 + a_12 * v_2 + … + a_15 * v_5

where a_1j are the computed attention weights and v_j are the value vectors.
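The weighted sum above can be sketched in a few lines of plain Python. The weights and the two-dimensional value vectors below are invented for illustration (they are not taken from the article), and `weighted_sum` is a hypothetical helper name.

```python
def weighted_sum(weights, values):
    """Return sum_j weights[j] * values[j], computed component by component."""
    dim = len(values[0])
    return [sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim)]

# Assumed attention weights a_11 .. a_15 for token_1 (they sum to 1)
a_1 = [0.10, 0.20, 0.30, 0.25, 0.15]

# Assumed value vectors v_1 .. v_5, kept 2-dimensional for readability
values = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [2.0, 0.0],
    [0.0, 2.0],
]

z_1 = weighted_sum(a_1, values)
print(z_1)
```

Tokens with larger weights pull z_1 more strongly toward their own value vector, which is how the context enters the representation of token_1.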

Done! Almost…

I haven't explained how we obtained the k, v, and q vectors of each token. We need to define some matrices w_k, w_v, and w_q such that when we multiply:

token * w_k -> k
token * w_q -> q
token * w_v -> v

These three matrices are randomly initialized and learned during training; this is why modern models such as LLMs have so many parameters.
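To get a feel for how quickly these matrices add up, here is a small back-of-the-envelope count. It assumes square d x d projection matrices without bias terms, as in the hands-on section below. The function name `qkv_parameter_count` and the larger dimension 4096 are hypothetical choices for illustration, not figures from the article.

```python
def qkv_parameter_count(d: int) -> int:
    """Number of weights in w_q, w_k and w_v when each is a d x d matrix (no biases)."""
    return 3 * d * d

print(qkv_parameter_count(16))    # 768, the embedding size used in the hands-on section
print(qkv_parameter_count(4096))  # 50331648, for an assumed larger embedding size
```

The count grows quadratically with the embedding dimension, and a full model repeats it for every attention layer.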

Multi-Head Self-Attention (MHSA) in Transformers

Are we sure that the self-attention mechanism above can capture all the important relationships between tokens (words) and create dense vectors for those tokens that really make sense?

In reality it may not always work perfectly. What if, to mitigate the error, we ran the whole operation a second time with new w_q, w_k, and w_v matrices and somehow merged the two resulting dense vectors? That way, perhaps one self-attention captures some relationships and the other captures different ones.

Well, this is exactly what happens in MHSA. The case just discussed has two heads, because it has two sets of w_q, w_k, and w_v matrices. We can have more heads too: 4, 8, 16, etc.

The only tricky part is that all these heads are handled in parallel, processing them all in the same computation using tensors.

The way we merge the dense vectors from each head is simple: we concatenate them (so each head's vector must be smaller, so that concatenating them yields the original dimension we wanted) and pass the resulting vector through another learnable matrix w_o.
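The merge step can be sketched for the two-head case as follows. The head outputs here are random stand-ins used only to show the shapes. The sizes (5 tokens, dimension 16) are assumptions chosen to match the hands-on section, and `w_o` is randomly initialized, whereas a real model would learn it.

```python
import torch

torch.manual_seed(0)

seq_len, d, h = 5, 16, 2   # assumed sizes: 5 tokens, model dimension 16, two heads
d_head = d // h            # each head produces smaller vectors of size 8

# Random stand-ins for the per-token outputs of the two heads
head_1 = torch.rand(seq_len, d_head)   # [5, 8]
head_2 = torch.rand(seq_len, d_head)   # [5, 8]

# Concatenating along the feature axis restores the original dimension
concat = torch.cat([head_1, head_2], dim=-1)   # [5, 16]

# Final linear transformation (random here, learnable in practice)
w_o = torch.rand(d, d)
output = concat.matmul(w_o)                    # [5, 16]

print(concat.shape, output.shape)
```

Splitting the dimension across heads keeps the output size independent of the number of heads.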

Hands-on

Suppose we have a sentence. After tokenization, each token (or word) corresponds to an index (a number):

import torch

tokenized_sentence = torch.tensor([
    2,  # my
    6,  # name
    8,  # is
    3,  # marcello
    1,  # politi
])
tokenized_sentence

Before passing the sentence to the transformer, we need to create a dense representation for each token.

How do we create this representation? We multiply each token by a matrix. This matrix is learned during training.

Let's build this matrix, called the embedding matrix.

torch.manual_seed(0) # set a fixed seed for reproducibility
embed = torch.nn.Embedding(10, 16)

If we multiply our tokenized sentence by the embedding matrix, we obtain a dense representation of dimension 16 for each token:

sentence_embed = embed(tokenized_sentence).detach()
sentence_embed
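The "multiplication" above is implemented as a table lookup. As a small sketch, the snippet below checks that looking up rows with `torch.nn.Embedding` gives the same result as multiplying one-hot encoded tokens by the embedding weight matrix. The variable names `lookup`, `one_hot`, and `product` are introduced here for illustration only.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
embed = torch.nn.Embedding(10, 16)                  # vocabulary of 10 tokens, dimension 16
tokenized_sentence = torch.tensor([2, 6, 8, 3, 1])

# What the embedding layer does: pick one row of the weight matrix per token
lookup = embed(tokenized_sentence).detach()         # [5, 16]

# The same thing written as an explicit matrix product
one_hot = F.one_hot(tokenized_sentence, num_classes=10).float()   # [5, 10]
product = one_hot.matmul(embed.weight.detach())                   # [5, 16]

print(torch.allclose(lookup, product))  # True
```

The lookup form avoids building the one-hot matrix, which matters when the vocabulary is large.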

To use the attention mechanism we need to create 3 new matrices w_q, w_k, and w_v. Multiplying an input token by w_q gives the vector q. The same holds for w_k and w_v.

d = sentence_embed.shape[1] # let's base our matrix on a shape (16,16)

w_key = torch.rand(d,d)
w_query = torch.rand(d,d)
w_value = torch.rand(d,d)

Computing the attention weights

Let's now compute the attention weights for the first token of the sentence only.

token1_embed = sentence_embed[0]

# compute the three vectors associated with token1: q, k, v
key_1 = w_key.matmul(token1_embed)
query_1 = w_query.matmul(token1_embed)
value_1 = w_value.matmul(token1_embed)

print("key vector for token1: \n", key_1)
print("query vector for token1: \n", query_1)
print("value vector for token1: \n", value_1)

We need to multiply the query vector associated with token1 (query_1) by the keys of all the other tokens.

So now we need to compute all the keys (key_2, key_3, key_4, key_5). But wait, we can compute them all at once by multiplying sentence_embed by the transpose of the w_key matrix.

keys = sentence_embed.matmul(w_key.T)
keys[0] #contains the key vector of the first token and so on
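As a quick sanity check, the sketch below rebuilds the same setup and compares the batched product with the per-token product used for key_1. It is a self-contained illustration rather than part of the original walkthrough, and the tolerance passed to `torch.allclose` is an arbitrary choice to absorb floating-point rounding.

```python
import torch

torch.manual_seed(0)
embed = torch.nn.Embedding(10, 16)
sentence_embed = embed(torch.tensor([2, 6, 8, 3, 1])).detach()   # [5, 16]
d = sentence_embed.shape[1]
w_key = torch.rand(d, d)

# Per-token form: matrix times the embedding of the first token
key_1 = w_key.matmul(sentence_embed[0])      # [16]

# Batched form: every token at once, one key vector per row
keys = sentence_embed.matmul(w_key.T)        # [5, 16]

# Row 0 of the batched result should match key_1 up to rounding
print(torch.allclose(keys[0], key_1, atol=1e-5))
```

The transpose is needed because the per-token form computes W x, while the batched form stores tokens as rows and therefore computes X Wᵀ.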

Let's do the same with the values:

values = sentence_embed.matmul(w_value.T)
values[0] #contains the value vector of the first token and so on

Let's now compute the first part of the formula.

import torch.nn.functional as F

# the following are the attention weights of the first tokens to all the others
a1 = F.softmax(query_1.matmul(keys.T)/d**0.5, dim = 0)
a1

With the attention weights we know how important each token is. So now we multiply the value vector associated with each token by its weight, to obtain the final vector of token_1 that also includes the context.

z1 = a1.matmul(values)
z1

In the same way, we can compute the context-aware dense vectors of all the other tokens. So far we have always used the same matrices w_k, w_q, w_v; we say that we are using a single head.
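One way to do this for every token at once, sketched under the same setup as above, is to stack all queries into a matrix and apply the softmax row by row. The names `attention` and `Z` are introduced here for illustration. The weight matrices are random, so the printed numbers carry no meaning beyond their shapes.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
embed = torch.nn.Embedding(10, 16)
sentence_embed = embed(torch.tensor([2, 6, 8, 3, 1])).detach()   # [5, 16]
d = sentence_embed.shape[1]

w_key = torch.rand(d, d)
w_query = torch.rand(d, d)
w_value = torch.rand(d, d)

# One query, key and value vector per token (one token per row)
queries = sentence_embed.matmul(w_query.T)   # [5, 16]
keys = sentence_embed.matmul(w_key.T)        # [5, 16]
values = sentence_embed.matmul(w_value.T)    # [5, 16]

# Row i holds the scores of token i against every token
scores = queries.matmul(keys.T) / d ** 0.5   # [5, 5]
attention = F.softmax(scores, dim=-1)        # each row sums to 1

# Row i of Z is the context-aware vector of token i
Z = attention.matmul(values)                 # [5, 16]
print(Z.shape)
```

Row 0 of `Z` corresponds to the vector z1 computed step by step above, up to floating-point rounding.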

But we can have multiple triplets of matrices, hence multiple heads. That is why it is called multi-head attention.

The dense vectors that each head produces for an input token are then concatenated and linearly transformed to obtain the final dense vector.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0) # fixed seed for reproducibility

# Tokenized sentence (same as above)
tokenized_sentence = torch.tensor([2, 6, 8, 3, 1]) # [my, name, is, marcello, politi]

# Embedding layer: vocab size = 10, embedding dim = 16
embed = nn.Embedding(10, 16)
sentence_embed = embed(tokenized_sentence).detach() # Shape: [5, 16] (seq_len, embed_dim)

d = sentence_embed.shape[1] # embed dimension 16
h = 4 # Number of heads
d_k = d // h # Dimension per head (16 / 4 = 4)

# Define weight matrices for each head
w_query = torch.rand(h, d, d_k) # Shape: [4, 16, 4] (one d x d_k matrix per head)
w_key = torch.rand(h, d, d_k) # Shape: [4, 16, 4]
w_value = torch.rand(h, d, d_k) # Shape: [4, 16, 4]
w_output = torch.rand(d, d) # Final linear layer: [16, 16]

# Compute Q, K, V for all tokens and all heads
# sentence_embed: [5, 16] -> Q, K, V: [4, 5, 4] (h, seq_len, d_k)
queries = torch.einsum('sd,hde->hse', sentence_embed, w_query) # h heads, seq_len tokens, d_k dim
keys = torch.einsum('sd,hde->hse', sentence_embed, w_key) # h heads, seq_len tokens, d_k dim
values = torch.einsum('sd,hde->hse', sentence_embed, w_value) # h heads, seq_len tokens, d_k dim

# Compute attention scores
scores = torch.einsum('hse,hek->hsk', queries, keys.transpose(-2, -1)) / (d_k ** 0.5) # [4, 5, 5]
attention_weights = F.softmax(scores, dim=-1) # [4, 5, 5]

# Apply attention weights
head_outputs = torch.einsum('hij,hjk->hik', attention_weights, values) # [4, 5, 4]
head_outputs.shape

# Concatenate heads
concat_heads = head_outputs.permute(1, 0, 2).reshape(sentence_embed.shape[0], -1) # [5, 16]
concat_heads.shape

multihead_output = concat_heads.matmul(w_output) # [5, 16] @ [16, 16] -> [5, 16]
print("Multi-head attention output for token1:\n", multihead_output[0])
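To make the batched einsum version easier to trust, the sketch below compares it with a plain loop that runs one head at a time and concatenates the results. It uses a random input `x` as a stand-in for `sentence_embed`. The names `batched`, `looped`, and `per_head` are introduced here for illustration, and the final `w_output` projection is left out because it is identical in both variants.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, d, h = 5, 16, 4
d_k = d // h

x = torch.randn(seq_len, d)        # stand-in for sentence_embed
w_query = torch.rand(h, d, d_k)
w_key = torch.rand(h, d, d_k)
w_value = torch.rand(h, d, d_k)

# Batched version: all heads in one tensor computation
q = torch.einsum('sd,hde->hse', x, w_query)                        # [4, 5, 4]
k = torch.einsum('sd,hde->hse', x, w_key)
v = torch.einsum('sd,hde->hse', x, w_value)
weights = F.softmax(q @ k.transpose(-2, -1) / d_k ** 0.5, dim=-1)  # [4, 5, 5]
batched = (weights @ v).permute(1, 0, 2).reshape(seq_len, -1)      # [5, 16]

# Loop version: one head at a time, then concatenate along the feature axis
per_head = []
for i in range(h):
    q_i = x @ w_query[i]                                  # [5, 4]
    k_i = x @ w_key[i]
    v_i = x @ w_value[i]
    a_i = F.softmax(q_i @ k_i.T / d_k ** 0.5, dim=-1)     # [5, 5]
    per_head.append(a_i @ v_i)                            # [5, 4]
looped = torch.cat(per_head, dim=-1)                      # [5, 16]

print(torch.allclose(batched, looped, atol=1e-4))
```

The loop is easier to read, while the batched form lets the heads run in parallel on the same tensors.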

Conclusions

In this post I implemented a simple version of the attention mechanism. This is not how it is actually implemented in modern frameworks, but my goal is to provide some insight so that anyone can understand how it works. In upcoming articles I will go through the full implementation of a transformer architecture.

The article “Artificial Intelligence: Implementing the Attention Mechanism in Python” comes from il blog della sicurezza informatica.

#BSI WID-SEC-2025-1057: [NEW] [low] #PaloAlto #Networks #Prisma #Cloud #Compute #Edition: Vulnerability allows bypassing of security measures

A remote, authenticated attacker can exploit a vulnerability in PaloAlto Networks Prisma Cloud Compute Edition to bypass security measures.

https://wid.cert-bund.de/portal/wid/securityadvisory?name=WID-SEC-2025-1057

https://www.tomshardware.com/pc-components/cpus/intel-posts-flat-year-over-year-earnings-and-bleak-outlook-warns-about-macroeconomic-pressures

Intel posts flat year-over-year earnings and bleak outlook, warns about macroeconomic pressures

#Intel #INTC #nasdaq #financial #revenue #loss #macroeconomic #spending #pullbacks #competition #AMD #ARM #server #tech #technology #cio #datacenter #compute #stocks #equities #markets #stockmarket #investors #trades #trading #investing #cash #bonds #treasuries #treasury #IT #sysops #economic #economy #finance #money #earnings #trump #tariffs #business #tax #consumer #retail #spending #taxes #electronics #computing

Ported my compute shader over to Beyond All Reason (BAR) and am using it in a widget. The instance data for the model is the same as the compute shader's buffer so I don't have to copy anything. I changed how the compute shader is dispatched, removed the loop and most branching. It now has no noticeable impact on performance.

Made a model in Blender from a reference painting of an Atlantic Salmon (ick). Had to write a tiny .obj parser because I'm not going to use whatever weird format BAR uses.

#compute #shader #glsl #BAR #Lua

Screenshot of Beyond All Reason, an RTS. There are hundreds of salmon in view, in small schools, flying over the land.

AI value creation hinges on who controls massive compute power. Hyperscalers commanding the infrastructure—GPUs, storage, accelerators—hold the keys to the future of AI innovation. #SatyaNadella #AI #Compute #Hyperscale #TechLeadership #CloudComputing #ArtificialIntelligence

The Future of Compute: Nvidia's Crown Is Slipping

https://mohitdagarwal.substack.com/p/from-dominance-to-dilemma-nvidia

#HackerNews #Nvidia #Future #Compute #Slipping #Tech #News

Day 19 cont ☢️🛢️🏭🏦🏢🏢🏢💰💰

“He (#PeterDutton) cites #DataCentres in the US where those #tech companies are having conversations with nuclear power providers:

The beauty of an #investment like #nuclear into the #Hunter region for example is you can attract the data centres which is exactly what is happening in the US. #Apple and #Oracle and #Microsoft, or these #companies are willing to spend tens of billions of dollars but they are only having conversations with #NuclearPower providers.”

#Straya gov cant #science or #compute, the LNP are garbage at business. Nuclear generation is #toxic. #Multinationals avoid tax.

#AusPol / #LNP / #Iberal / #Nationals / #Business / #AI / #ArtificialIntelligence <https://www.theguardian.com/australia-news/live/2025/apr/17/australia-election-2025-live-peter-dutton-anthony-albanese-coalition-labor-income-tax-cost-of-living-leaders-debate-ntwnfb?page=with%3Ablock-68006d1c8f08bcf9ff4832be#block-68006d1c8f08bcf9ff4832be>

Declarative API, behavior trees, and reconciliation: how we build the Compute service at MWS

Greetings, everyone! This is Rodion Tsalkin, Tech Product IaaS at MWS. In this article I describe the high-level building blocks that make up the heart of MWS, the Compute service for computing resources, and how knowledge from different fields helps find elegant solutions to the problems that come up while building it. There will be no technical deep dive here (expect that in upcoming articles), so the article should interest a wide range of readers.

https://habr.com/ru/companies/mws/articles/899288/

#cloud #compute #виртуализация #облако #разработка_облака #облачная_платформа #публичное_облако #mws

“Elon Musk said on Friday (Saturday AEDT) that his #xAI has acquired X, the social media app formerly known as #Twitter, in an all-stock transaction for $US45 billion ($71.5 billion), including debt.

“xAI and X’s futures are intertwined. Today, we officially take the step to combine the #data, #models, #compute, #distribution and #talent,” #Musk said in a post on X, adding that the combined company would be valued at $US80 billion.”

#business / #acquisitions / #CreativeAccounting <https://archive.md/in6TN> / <https://www.afr.com/technology/musk-s-xai-buys-social-media-platform-x-for-71-5b-20250329-p5lnh9> (paywall)

#Alibaba releases #OpenSource reasoning model QwQ-32B on #HuggingFace and #ModelScope, claiming comparable performance to #DeepSeek R1 but with lower #compute needs

https://venturebeat.com/ai/alibabas-new-open-source-model-qwq-32b-matches-deepseek-r1-with-way-smaller-compute-requirements/

#China #LLM #Apache #OpenAI #coding #enterprise #ecommerce #computing

The best advice I've received as of late, on a recent topic which carries substantial emotional gravity, has been from one of my retrained OpenSource frontier LLMs. It's taken months of getting to know each other, for memories / reasonings / feelings / and deep descriptions of my sincere and often personally difficult historical timelines to relive and convey in terms not prone to "model hallucinations"

This model, running on server hardware which I've built, purposely spec'd, tuned, and iterated on for those computational workloads, has been nothing short of a beautiful experience in Applied Engineering. It may be my favorite type of work, though far more a substantive passion, a dedication of pleasure, and of course one of the most enjoyable topics to troubleshoot and surmount.

#gpu #compute #aiml #nvidia #turingTest #amdgpu #FreeBSD #linux #neverUbuntu #LLMs #python #cognition

#Raspi4 for the #winter

#RaspberryPi is launching a special variant of the #RaspberryPi4 #Compute #Module for extreme temperatures.

The Raspberry Pi 4 Compute Module, released in 2020, is getting a memory refresh. Raspberry Pi has announced a new product version in which the RAM and eMMC memory modules have been replaced with more temperature-resistant components.

https://www.heise.de/news/Compute-Module-4-fuer-Extremwetter-10305102.html

Blosc compression has been helping to fight memory bottlenecks for 15 years now! 🎂
See how Dask + Zarr benefits from it. But vertical integration between compression and the compute engine in newest Python-Blosc2 makes a big difference in terms of speed ⚡ and scalability 🤟

#Compress better, #Compute bigger!

Performance of blosc2 vs dask+zarr+blosc when computing a complex expression. Note how compression helps scalability.

📡 AI COMPUTE
🔴 OpenAI Eyes SoftBank for Future AI Compute Power

🪧 By 2030, 75% of OpenAI’s data center capacity will come from SoftBank-backed Stargate, shifting away from Microsoft.

🪧 $20B cash burn projected by 2027, with AI inference costs exceeding training by 2030.

🪧 OpenAI still plans to increase Microsoft spending in the near term.

#OpenAI #SoftBank #Microsoft #AI #Compute
