Lmst

Ever wonder how neural networks actually learn? It all starts with a simple but powerful concept from calculus: the chain rule.
📚 Dive in and level up your math game:
👉 https://machinelearningmastery.com/the-chain-rule-of-calculus-for-univariate-and-multivariate-functions/
#MachineLearning #DeepLearning #Backpropagation #NeuralNetworks #DataScience #AI

🚀 Behold, the groundbreaking revelation that neural networks can be trained without #backpropagation or forward-propagation! 😲 Why bother with actual #science when you can just wave your hands and hope for the best? 🤦‍♂️ Thank you, #arXiv, for this enlightening display of 🤡 #innovation.
https://arxiv.org/abs/2503.24322 #neuralnetworks #machinelearning #HackerNews #ngated

NoProp: Training Neural Networks without Back-propagation or Forward-propagation

https://arxiv.org/abs/2503.24322

#HackerNews #NoProp #Neural #Networks #Training #Backpropagation #Innovation #AI

NoProp: Реальный опыт обучения без Backprop – от провала к 99% на MNIST

Всем привет! Обучение нейронных сетей с помощью обратного распространения ошибки (backpropagation) — это стандарт де‑факто. Но у него есть ограничения: память, последовательные вычисления, биологическая неправдоподобность. Недавно я наткнулся на интересную статью « NOPROP: TRAINING NEURAL NETWORKS WITHOUT BACK‑PROPAGATION OR FORWARD‑PROPAGATION » (Li, Teh, Pascanu, arXiv:2403.13 502), которая обещает обучение вообще без сквозного backprop и даже без полного прямого прохода во время обучения ! Идея показалась захватывающей, и мы (я и ИИ‑ассистент Gemini) решили попробовать ее реализовать на PyTorch для MNIST. В этой статье я хочу поделиться нашим путешествием: как мы пытались следовать описанию из статьи, с какими трудностями столкнулись, как анализ связанных работ помог найти решение (которое, правда, отличается от оригинала) и каких впечатляющих результатов удалось достичь в итоге. Спойлер: получилось интересно, совсем не так, как ожидалось, но результат превзошел ожидания от процесса отладки. Дисклеймер 1: Это рассказ об учебном эксперименте. Результаты и выводы основаны на нашем опыте и могут не полностью отражать возможности оригинального метода при наличии всех деталей реализации.)

https://habr.com/ru/articles/900186/

#нейронные_сети #нейронные_сети_и_машинное_обучение #машинное_обучение #deep_learning #noprop #DDPM #backpropagation #research #искусственный_интеллект

Алгоритм Backpropagation на Python

Привет, Хабр! Алгоритм backpropagation , или обратное распространение ошибки, является некой базой для тренировки многослойных перцептронов и других типов искусственных нейронных сетей. Этот алгоритм впервые был предложен Полем Вербосом в 1974 году, а позже популяризирован Дэвидом Румельхартом, Джеффри Хинтоном и Рональдом Уильямсом в 1986 году.

https://habr.com/ru/companies/otus/articles/816667/

#ml #otus #python #backpropagation

I read #Rumelhart's #backpropagation paper in 1986. It was a stunner. It changed my life.

http://www.cs.utoronto.ca/~hinton/absps/naturebp.pdf

And I read #Vaswani's #transformer in 2017. It was a groundbreaking paper. It changed the world.

https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

It could be argued that transformers, through their use in #LLMs, have had far greater impact upon society, compared to backpropagation. On the other hand, there would be no modern #ML, but for backpropagation. So, it's a toss-up.

But to me, Rumelhart's paper is superior to Vaswani's, at least in terms of clarity, concision, coherence, and other indicia of writing style.

Backpropagation для взрослых: простое обьяснение

Перед обсуждением обратного распространения ошибки давайте рассмотрим, что такое нейронная сеть? Концептуально — что она делает — она пытается преобразовать ряд входных данных (например, изображения) в ряд выходных данных (ответы на вопросы, например, есть ли на этих картинках собаки) посредством процесса трансформации этих изображений, пропуская их через сеть нейронов. Изображения — это просто массивы байтов, так как это происходит?

https://habr.com/ru/companies/raft/articles/811371/

#backpropagation #обратное_распространение_ошибки #градиентный_спуск #нейросети #описание

Today's lesson in #machinelearning : you can't analytically differentiate a physical process.

#backpropagation failed.
Trying to apply reinforcement learning to control a cooling fan now.

https://github.com/YuhangSong/Prospective-Configuration
https://www.nature.com/articles/s41593-023-01514-1
#brain #Prospective-Configuration #backpropagation #learning #AI

RE: https://mastodon.social/users/eLearningworld/statuses/111890765589418177

On #biological vs #artificialintelligence and #neuralnetworks
Just skimmed through "Inferring neural activity before plasticity as a foundation for learning beyond backpropagation" by Yuhang Song et al. https://www.nature.com/articles/s41593-023-01514-1

Quite interesting but confusing, as I come from #backpropagation DL.
If I got it right, the authors focus on showing how and why biological neural networks would benefit from being Energy Based Models for Predictive Coding, instead of Feedforward Networks employing backpropagation.
I struggled to reach where they explain how to optimize a ConvNet in PyTorch as an EB model, but they do: there is an algorithm and formulae, but I'm curious about how long and stable training is, and whether all that generalizes to typical computer vision architectures (ResNets, MobileNets, ViTs, ...).
Code is also #opensource at https://github.com/YuhangSong/Prospective-Configuration

I would like to sit a few hours at my laptop and try to better see and understand, but I think in the next days I will go to Modern #HopfieldNetworks. These too are EB and there's an energy function that is optimised by the #transformer 's dot product attention.
I think I got what attention does in Transformers, so I'm quite curious to get in what sense it's equivalent to consolidating/retrieving patterns in a Dense Associative Memory. In general, I think we're treating memory wrong with our deep neural networks. I see most of them as sensory processing, shortcut to "reasoning" without short or long term memory surrogates, but I could see how some current features may serve similar purposes...

Why the Godfather of A.I. Fears What He’s Built

https://www.newyorker.com/magazine/2023/11/20/geoffrey-hinton-profile-ai #AI #NeuralNetworks #Connectionism #LearningTheory #Backpropagation #Feelings

Neural Networks
(1991) : Freeman, James A. Skapura, Dav...
isbn: 0201513765
#text_book #neural_network #simulated_annealing #algorithm #backpropagation #my_bibtex

Neural Networks
(1991) : Freeman, James A. Skapura, Dav...
isbn: 0201513765
#backpropagation #neural_network #simulated_annealing #text_book #algorithm #my_bibtex

Neural Networks
(1991) : Freeman, James A. Skapura, Dav...
isbn: 0201513765
#backpropagation #text_book #algorithm #simulated_annealing #neural_network #my_bibtex

Neural Networks
(1991) : Freeman, James A. Skapura, Dav...
isbn: 0201513765
#simulated_annealing #algorithm #neural_network #text_book #backpropagation #my_bibtex

A new type of #neuralnetworks and #AI 1/3

I've been thinking that #backpropagation based neural networks will reach their peak (if they haven't already), and it may be interesting to search for a new learning method. Some observations and ideas:

The two main modes of #neuralnetworks - training when weights are adjusted, and prediction when states change should be merged. After all real-life brains do prediction and learning at the same time, and they are not restarted for every task. ...

Concept backpropagation: An Explainable AI approach for visualising learned concepts
https://arxiv.org/abs/2307.12601

* concept detection method (concept backpropagation) for analysing how information representing a concept is internalised in a neural network

* allows visualisation of the detected concept directly in the input space of the model, to see what information the model depends on for representing the described concept

#NeuralNetworks #ML #MachineLearning #backpropagation

Extending the Forward Forward Algorithm
https://arxiv.org/abs/2307.04205

The Forward Forward algorithm (Geoffrey Hinton, 2022-11) is an alternative to backpropagation for training neural networks (NN)

Backpropagation - the most widely successful and used optimization algorithm for training NN - has 3 important limitations ...

Hinton's paper: https://www.cs.toronto.edu/~hinton/FFA13.pdf
Discussion: https://bdtechtalks.com/2022/12/19/forward-forward-algorithm-geoffrey-hinton
...

#GeoffHinton #ForwardPropagation #NeruralNetworks #parametrization #BackproPagation #LossFunction

Absorbing Phase Transitions in Artificial Deep Neural Networks
https://arxiv.org/abs/2307.02284

To summarize, we believe that the this work places the order-to-chaos transition in the initialized artificial deep neural networks in the broader context of absorbing phase transitions, & serves as the first step toward the systematic comparison between natural/biological & artificial neural networks.
...

#NeuralNetworks #MeanFieldTheory #SignalPropagation #PhaseTransitions #backpropagation #feedforward

A perspective on #chatGPT (or Large Language Models #LLMs in general): #Hype or milestone?

[Rodney Brooks (https://spectrum.ieee.org/amp/gpt-4-calm-down-2660261157) tells us that

What large language models are good at is saying what an answer should sound like, which is different from what an answer should be.

For a nice in-depth technical analysis, see this blog post by Stephen Wolfram (himself!) on "What is ChatGPT Doing ... and Why Does It Work? ". Worth reading -even for non-experts- in a non-trivial effort to make the whole process explainable. The different steps are:

#LLMs compute probabilities for the next word. To do this, they aggregate huge datasets of text so that they create a function that, given a sequence of words, computes for all possible words in the dictionary the probability that adding this new word is statistically congruent with past words. Interestingly, this probability, conditioned on what has been observed so far, falls of as a power law, just like the global probability of words in the dictionary,
These #probabilities are computed by a function that leans on the dataset to generate the best approximation. Wolfram makes a minute description of how to do such an approximation, starting from linear regression to using non-linearities. This leads to deep learning methods and their potential for universal function approximators,
Crucial is how these #models are trainable, in particular by way of #backpropagation. This leads the author to describe the process, but also to point out some limitations of the trained model, especially, as you might have guessed, compared to potentially more powerful systems, like #cellularautomata of course...
This now brings us to #embeddings, the crucial ingredient to define "words" in these #LLMs models. To relate "alligator" to "crocodile" vs. a "vending machine," this technique computes distances between words based on their relative distance in the large dataset of text corpus, so that each word is assigned an address in a high-dimensional space, with the intuition that words that are syntactically closer should be closer in the embedding space. It is highly non-trivial to understand the geometry of high-dimensional spaces - especially when we try to relate it to our physical 3D space - but this technique has proven to give excellent results, I highly recommend the #cemantix puzzle to test your intuition about word embeddings: https://cemantle.certitudes.org
Finally, these different parts are glued together by a humongous #transformer network. A standard #NeuralNetwork could perform a computation to predict the probabilities for the next word, but the results would mostly give nonsensical answers... Something more is needed to make this work. Just as traditional Convolutional Neural Networks #CNNs hardwire the fact that operations applied to an image should be applied to nearby pixels first, transformers do not operate uniformly on the sequence of words (i.e., embeddings), but weight them differently to ultimately get a better approximation. It is clear that much of the mechanism is a bunch of heuristics selected based on their performance - but we can understand the mechanism as giving different weights to different tokens - specifically based on the position of each token and its importance in the meaning of the current sentence. Based on this calculation, the sequence is reweighted so that a probability is ultimately computed. When applied to a sequence of words where words are added progressively, this creates a kind of loop in which the past sequence is constantly re-processed to update the generation.
Can we do more and include syntax? Wolfram discusses the internals of #chatGPT, and in particular how it trained iOS to "be a good bot" - and adds another possibility, which is to inject the knowledge that language is organized grammatically, and whether #transformers are able to learn such rules. This points to certain limitations of the architecture and the potential of using graphs as a generalization of geometric rules. The post ends with a comparison of #LLMs, which just aim to sound right, with rule-based models, a debate reminiscent of the older days of AI...

#Backpropagation

Client Info