Thanks again to the great team who worked on this! Check out our preprint https://arxiv.org/abs/2301.11375 for details, and please reach out with any questions or comments! (15/n, n = 15)
…and also possibly for enhancing the interpretability of data pruning methods that rely on measuring similarities in a pretrained embedding space, like https://arxiv.org/abs/2303.09540 (14/n)
We think this idea opens up many interesting avenues for future investigation, including whether this area expansion is generally helpful for generalization (i.e., when networks become too sensitive to small perturbations, as in adversarial attacks), … (13/n)
We can also apply this analysis to feature maps trained with self-supervised learning; for Barlow Twins (https://arxiv.org/abs/2103.03230) we observe broadly similar behavior. (11/n)
Formally, this framework requires differentiability of the feature map — the ResNets above have GELU activations — but we can apply the same analysis to ReLU networks, ignoring points of non-differentiability, and we see that the picture is qualitatively the same. (10/n)
…and two-dimensional slices. (9/n)
What about more realistic networks and tasks?
We can visualize low-dimensional slices through image space for ResNets trained on CIFAR-10 images, and we observe qualitatively consistent behavior, both for one-dimensional… (8/n)
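As a minimal sketch of what such a one-dimensional slice involves (illustrative only: the small convnet and random tensors below are placeholders for a trained ResNet and two CIFAR-10 images), along a slice x(t) = x0 + t(x1 - x0) the feature map f induces the scalar metric g(t) = ||J_f(x(t)) v||^2 with v = x1 - x0, which a Jacobian-vector product gives directly:

# Illustrative sketch (placeholder model and images, not the paper's setup).
import torch

torch.manual_seed(0)

# Stand-in feature map on 3x32x32 images; a real experiment would use a trained ResNet.
f = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, stride=2, padding=1), torch.nn.GELU(),
    torch.nn.Conv2d(16, 32, 3, stride=2, padding=1), torch.nn.GELU(),
    torch.nn.Flatten(), torch.nn.Linear(32 * 8 * 8, 64),
)

x0, x1 = torch.randn(2, 3, 32, 32)   # placeholders for two CIFAR-10 images
v = (x1 - x0).unsqueeze(0)

def slice_metric(t):
    # Directional derivative J_f(x(t)) v via a Jacobian-vector product,
    # giving the metric induced along the 1D slice.
    x = (x0 + t * (x1 - x0)).unsqueeze(0)
    _, Jv = torch.autograd.functional.jvp(f, x, v)
    return (Jv ** 2).sum()

g = torch.stack([slice_metric(t) for t in torch.linspace(0.0, 1.0, 21)])
print(g)   # plotting g(t) shows how the slice is stretched in feature space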
We then turn to trained networks. If we train a single-hidden-layer MLP on the same toy task considered by Amari and Wu, we see that a similar expansion of volume elements emerges over training. (7/n)
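A rough sketch of this kind of experiment (illustrative only; the toy task, architecture, and training details below are placeholders, not those used in the paper): train a small MLP on a 2D binary classification problem and compare the volume element sqrt(det J^T J) of its hidden-layer feature map on a grid before and after training.

import torch

torch.manual_seed(0)

# Toy 2D data with a curved decision boundary (placeholder task).
X = torch.randn(512, 2)
y = (X[:, 1] > torch.sin(2.0 * X[:, 0])).float()

hidden = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh())   # feature map
readout = torch.nn.Linear(64, 1)
opt = torch.optim.Adam(list(hidden.parameters()) + list(readout.parameters()), lr=1e-2)
loss_fn = torch.nn.BCEWithLogitsLoss()

def volume_element(x):
    # sqrt(det J^T J) of the hidden-layer feature map at a single point x.
    J = torch.autograd.functional.jacobian(hidden, x)   # shape (64, 2)
    return torch.sqrt(torch.linalg.det(J.T @ J))

xs = torch.linspace(-2.0, 2.0, 11)
grid = torch.stack(torch.meshgrid(xs, xs, indexing="ij"), dim=-1).reshape(-1, 2)
vol_before = torch.stack([volume_element(x) for x in grid])

for step in range(500):
    opt.zero_grad()
    loss = loss_fn(readout(hidden(X)).squeeze(-1), y)
    loss.backward()
    opt.step()

vol_after = torch.stack([volume_element(x) for x in grid])
# Plotting vol_after / vol_before over the grid shows where training magnifies volume.
print((vol_after / vol_before).reshape(11, 11))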
Here, we ask whether magnification of areas near decision boundaries emerges through training in standard deep nets.
As a baseline, we characterize the geometry induced by simple neural networks in the kernel regime, extending work by Cho and Saul https://arxiv.org/abs/1112.3712 (6/n)
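(For readers less used to the kernel picture, the link is the standard identity that a kernel $k(x, x') = \langle \phi(x), \phi(x') \rangle$ with feature map $\phi$ induces the metric
$$g_{ij}(x) = \langle \partial_i \phi(x), \partial_j \phi(x) \rangle = \left.\frac{\partial^2 k(x, x')}{\partial x_i\, \partial x'_j}\right|_{x' = x},$$
so the geometry of a network in the kernel regime can be read off from its kernel.)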
To achieve this, Amari and Wu hand-designed an iterative kernel learning algorithm.
Since then, deep neural networks trained simply with empirical risk minimization have largely eclipsed SVMs and hand-tuned kernel learning methods in popularity. (5/n)
They proposed that, for classification tasks, it’s desirable to have a kernel such that the spatial resolution is increased near the decision boundary — in Riemannian terms, that the volume element is enlarged there — so that the separability of classes is improved. (4/n)
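Concretely (the notation here is ours, and details may differ slightly from their paper), Amari and Wu do this with a conformal modification of a base kernel $k$,
$$\tilde{k}(x, x') = c(x)\, k(x, x')\, c(x'),$$
with $c(x)$ chosen to be large near the estimated decision boundary; when $c$ varies slowly, the induced metric rescales approximately as $\tilde{g}_{ij}(x) \approx c(x)^2\, g_{ij}(x)$, enlarging the volume element there.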
Back in 1999, Amari and Si Wu proposed a method to improve SVMs by learning a data-adaptive kernel: https://sciencedirect.com/science/article/abs/pii/S0893608099000325 (3/n)
In our paper, we view neural network representations as inducing a Riemannian metric — a local measure of distance — on input space. This perspective goes back (at least) to work by Christopher Burges and Shun’ichi Amari on kernel machines, e.g., https://dl.acm.org/doi/10.5555/299094.299100 (2/n)
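For concreteness, here is a minimal sketch of this construction (illustrative only, not code from the paper; the toy network and function name are placeholders): given a feature map f, the induced metric at x is g(x) = J(x)^T J(x), where J is the Jacobian of f, and the volume element is sqrt(det g(x)).

import torch

d, k = 2, 64
torch.manual_seed(0)

# Toy feature map standing in for a trained network's representation.
f = torch.nn.Sequential(
    torch.nn.Linear(d, 128),
    torch.nn.GELU(),
    torch.nn.Linear(128, k),
)

def metric_and_volume(x):
    # g(x) = J(x)^T J(x), with J the (k x d) Jacobian of f at x;
    # sqrt(det g(x)) is the Riemannian volume element.
    J = torch.autograd.functional.jacobian(f, x)
    g = J.T @ J
    return g, torch.sqrt(torch.linalg.det(g))

g, vol = metric_and_volume(torch.randn(d))
print(g.shape, vol.item())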
How does training shape the Riemannian geometry of deep neural network representations?
Excited to (belatedly) share our updated preprint, with Shang Yang, Julian Rubinfien, and @cpehlevan: https://arxiv.org/abs/2301.11375!
If you’re attending #cosyne, come check out Poster II-119 from Hamza Chaudhry, me, Dima Krotov, and @cpehlevan on how modern Hopfield networks with asymmetric Hebbian learning rules can be used to store large sequences of patterns!
Here is a recent review that pays homage to Mark Stokes.
Buschman, T. J., & Miller, E. K. (2022). Working memory is complex and dynamic, like your thoughts.
https://doi.org/10.1162/jocn_a_01940
Part of a Mark Stokes tribute issue of JOCN
https://direct.mit.edu/jocn/issue/35/1
PDF available on our lab webpage:
https://ekmillerlab.mit.edu/publications/
#NeuroPaperThread #NeuroNewPaper
1) Our article “The geometry of cortical representations of touch in rodents” with @chrisXrodgers, Randy Bruno, and @StefanoFusi is finally out! In brief, we found that whisker contacts in mouse S1 are represented in approximately orthogonal subspaces https://www.nature.com/articles/s41593-022-01237-9 🧵👇
🧠 Internship at NTT Research at Harvard! 🤖
Want to solve cutting-edge problems in deep learning by theory-guided algorithm design?
Want to apply ideas in ML to understand the brain?
Come join us this summer at the Center for Brain Science at Harvard!
Our young group, funded by the NTT Physics and Informatics Lab (https://ntt-research.com/phi/), uniquely bridges industry and academia, focusing on the intersection of physics, neuroscience, and machine learning.
See our work here: https://sites.google.com/view/htanaka/home
My (Mastodon-less) colleague Shanshan Qin's paper with @cpehlevan, Mitya Chklovskii, and others on models for representational drift is finally published! Check it out: https://www.nature.com/articles/s41593-022-01225-z!