Animesh Garg

Machine Learning for Perception and Control in Robotics.
Assistant Professor in AI at Georgia Tech and Univ of Toronto. Also at @NvidiaAI.
Here via @StanfordAILab, @berkeley_ai

2023-01-27

@mayankm155@twitter.com @UofTRobotics@twitter.com @leggedrobotics@twitter.com @nvidiaomniverse@twitter.com Importantly, this is an open effort!

We provide exemplar workflow integrations:
isaac-orbit.github.io/orbit/so

We welcome community contributions in both new environments and interfaces to algorithmic implementations (both Learning and Planning).

2023-01-27

RT @DrJimFan@twitter.com

Data is the new oil. But the physical world is too slow for robots to collect massive training data.

So let’s just speed up reality 1,000x. In simulation. With GPUs. RTX on!

@NVIDIAAI@twitter.com introduces ORBIT on IsaacSim, a GPU-powered virtual Gym for robots to work out:

1/🧵

🐦🔗: twitter.com/DrJimFan/status/16
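
For a rough sense of where that 1,000x comes from, here is a toy sketch (not Orbit's actual API): thousands of simple simulations stepped as a single batched tensor op on the GPU.

# Toy sketch, not Orbit: N point-mass "environments" advanced in parallel
# with one batched tensor operation per step.
import torch

N, dt = 4096, 0.01                       # number of parallel envs, timestep
device = "cuda" if torch.cuda.is_available() else "cpu"

pos = torch.zeros(N, 3, device=device)   # one state row per environment
vel = torch.zeros(N, 3, device=device)

def step(action):                        # action: (N, 3) force tensor
    global pos, vel
    vel = vel + action * dt              # integrates all N envs at once
    pos = pos + vel * dt
    return pos                           # observations for every env

obs = step(torch.randn(N, 3, device=device))
print(obs.shape)                         # torch.Size([4096, 3])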

2023-01-27

I am proud to share Isaac Orbit
isaac-orbit.github.io/

Orbit aims to unify all of the environments in the IsaacSim ecosystem while providing an intuitive, multi-functional API.

@mayankm155@twitter.com co-led this effort with a team from @UofTRobotics@twitter.com @leggedrobotics@twitter.com & @nvidiaomniverse@twitter.com

🧵👇
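
For a flavor of the intended workflow, a hypothetical snippet in the classic Gym style; the task id and the exact reset/step signatures here are assumptions for illustration, not taken from the Orbit docs.

import gym  # assumes Orbit's env registrations have been imported

env = gym.make("Isaac-Reach-Franka-v0")   # hypothetical task id
obs = env.reset()                         # classic Gym signatures, assumed
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()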

2023-01-26

SlotFormer is a wonderful collaboration with @Dazitu_616@twitter.com, @Dvornik_Nikita@twitter.com, Klaus Greff, & @thomaskipf@twitter.com
Paper: arxiv.org/abs/2210.05861
Project page: slotformer.github.io/
Code: github.com/pairlab/SlotFormer

Hoping to see folks in Rwanda at @iclr_conf@twitter.com
@VectorInst@twitter.com @UofTCompSci@twitter.com

(end)

2023-01-26

* How does SlotFormer improve downstream action planning tasks? *

We also apply SlotFormer as the world model on the action planning benchmark PHYRE. We achieve results competitive with supervised, task-specific baselines.

(12/N)
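
In sketch form (names like inject_action, dynamics, and success_head are stand-ins, not the released code), planning with a learned world model on a PHYRE-style task looks roughly like this:

import torch

def inject_action(slots, action):
    # Hypothetical conditioning step; in practice the modified initial
    # scene would be re-encoded into slots.
    return slots + action.mean()

def plan(slots0, dynamics, success_head, num_candidates=64, horizon=15):
    best_action, best_score = None, -float("inf")
    for _ in range(num_candidates):
        action = torch.rand(3)                # e.g. (x, y, radius) of a ball
        slots = inject_action(slots0, action)
        for _ in range(horizon):              # unroll the frozen world model
            slots = dynamics(slots)
        score = success_head(slots).item()    # predicted task success
        if score > best_score:
            best_action, best_score = action, score
    return best_action

# Toy usage with stand-in modules:
best = plan(torch.randn(7, 128), dynamics=lambda s: s * 0.99,
            success_head=lambda s: s.sum())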

2023-01-26

* How does SlotFormer improve downstream VQA tasks? *

SlotFormer is task-agnostic, and it turns out it can serve as the dynamics simulator to achieve SOTA performance on both the CLEVRER and Physion VQA datasets.

(11/N)
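
A rough sketch of that idea (module names are placeholders, not the released code): unroll the frozen dynamics model to obtain future slots, then let a VQA readout reason over observed plus predicted slots.

import torch

def answer(question_emb, slots_history, dynamics, vqa_head, extra_steps=10):
    # slots_history: list of (N, D) slot tensors from observed frames
    slots = slots_history[-1]
    future = []
    with torch.no_grad():                     # dynamics model stays frozen
        for _ in range(extra_steps):
            slots = dynamics(slots)           # predict next-step slots
            future.append(slots)
    all_slots = torch.stack(slots_history + future)   # (T + extra, N, D)
    return vqa_head(question_emb, all_slots)  # answer logits

# Toy usage with stand-in modules:
logits = answer(torch.randn(16), [torch.randn(7, 128)] * 6,
                dynamics=lambda s: s, vqa_head=lambda q, s: s.mean())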

2023-01-26

* How does SlotFormer perform in video prediction? *

SlotFormer performs the best in both visual quality and object dynamics.

(10/N)

2023-01-26

* Why is it important? *
Unsupervised dynamics learning not only enables generation tasks such as video prediction, but can also transfer the learned knowledge to improve downstream tasks such as VQA.

(3/N)

2023-01-26

The graphic below is a step-by-step explanation of SlotFormer

(9/N)

2023-01-26

*What is the task?*
Visual dynamics learning aims to learn scene dynamics from raw videos without any supervision
(2/N)

2023-01-26

*What is SlotFormer?*

It is a Transformer-based autoregressive dynamics model that performs joint spatio-temporal reasoning over slots.

SlotFormer applies a Transformer to slots from multiple timesteps and predicts the slots of the next step.

(8/N)

(Image: differences between SlotFormer and previous works)
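
A minimal sketch of that idea (heavily simplified relative to the released code at github.com/pairlab/SlotFormer): flatten the slots from the last T steps into one token sequence, run a Transformer over it, and read out the predicted slots for step T+1.

import torch
import torch.nn as nn

class SlotFormerSketch(nn.Module):
    def __init__(self, num_slots=7, slot_dim=128, history=6, num_layers=4):
        super().__init__()
        self.num_slots, self.history = num_slots, history
        # learned position embedding over the (time x slot) token grid
        self.pos = nn.Parameter(torch.zeros(history * num_slots, slot_dim))
        layer = nn.TransformerEncoderLayer(slot_dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers)
        self.readout = nn.Linear(slot_dim, slot_dim)

    def forward(self, slots):                 # slots: (B, T, N, D), T = history
        B, T, N, D = slots.shape
        tokens = slots.reshape(B, T * N, D) + self.pos
        h = self.transformer(tokens)          # joint spatio-temporal reasoning
        # simplification: predict next-step slots from the last step's tokens
        return self.readout(h[:, -N:])        # (B, N, D)

model = SlotFormerSketch()
hist = torch.randn(2, 6, 7, 128)              # 2 videos, 6 steps, 7 slots
next_slots = model(hist)                      # autoregress by appending this
print(next_slots.shape)                       # torch.Size([2, 7, 128])
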
2023-01-26

* Why object-centric representations? *

Humans perceive the world with discrete concepts. We usually abstract visual inputs into concepts such as objects and events, and then reason over these concepts.

(6/N)

2023-01-26

* Global vs object-centric representation *

Instead, our model builds upon object-centric representations, which decompose a scene, without supervision, into separate object features (slots) and explicitly model the interactions between them.

(Video credit: SAVi.)

(5/N)
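
For intuition, a compact sketch of slot-attention-style decomposition (in the spirit of Slot Attention / SAVi, not their exact code): K slot vectors compete for input features via attention normalized over slots, and each slot is refined with a GRU.

import torch
import torch.nn as nn

class SlotAttentionSketch(nn.Module):
    def __init__(self, num_slots=7, dim=64, iters=3):
        super().__init__()
        self.num_slots, self.dim, self.iters = num_slots, dim, iters
        self.slots_init = nn.Parameter(torch.randn(num_slots, dim))
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, feats):                 # feats: (B, num_pixels, dim)
        B = feats.shape[0]
        slots = self.slots_init.expand(B, -1, -1)
        k, v = self.to_k(feats), self.to_v(feats)
        for _ in range(self.iters):
            q = self.to_q(slots)
            # softmax over slots, so slots compete for each pixel
            attn = (q @ k.transpose(1, 2) * self.dim ** -0.5).softmax(dim=1)
            attn = attn / attn.sum(dim=-1, keepdim=True)  # weighted mean
            updates = attn @ v                 # (B, num_slots, dim)
            slots = self.gru(updates.reshape(-1, self.dim),
                             slots.reshape(-1, self.dim)).reshape(B, -1, self.dim)
        return slots                           # one feature vector per object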

2023-01-26

* Pixel-space video prediction *

The common solution is video prediction in pixel space, applying RNNs over global CNN feature maps. However, such models usually produce degenerate results.
(4/N)
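
That recipe in toy form (illustrative, not any specific published model): encode each frame to a single global feature vector, run an LSTM over time, and decode the next frame.

import torch
import torch.nn as nn

class PixelPredictor(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.enc = nn.Sequential(                       # 64x64 frame -> vector
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(64 * 16 * 16, dim))
        self.rnn = nn.LSTM(dim, dim, batch_first=True)  # temporal dynamics
        self.dec = nn.Sequential(                       # vector -> 64x64 frame
            nn.Linear(dim, 64 * 16 * 16), nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1))

    def forward(self, frames):                 # frames: (B, T, 3, 64, 64)
        B, T = frames.shape[:2]
        feats = self.enc(frames.reshape(B * T, 3, 64, 64)).reshape(B, T, -1)
        h, _ = self.rnn(feats)
        return self.dec(h[:, -1])              # predicted next frame

The global feature vector entangles all objects, which is one intuition for why such models blur and lose object identity over long rollouts.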

2023-01-26

Object-oriented world models are *the* key to reasoning.
But, unsupervised task-agnostic methods are hard!

SlotFormer, appearing at ICLR 2023, is an unsupervised video prediction model that also works for downstream tasks: VQA and model-based planning.
slotformer.github.io/

Read on for more!

2023-01-26

* How do previous object-centric dynamics models work? *

They typically separate spatial interaction (via a GNN or Transformer) from temporal dynamics (via an LSTM). However, the single-step context window of the LSTM still leads to inconsistent long-term rollouts.

(7/N)
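
That prior recipe in sketch form (illustrative, not any specific paper's code): a spatial module mixes slots within each step, while a per-slot LSTM cell carries only one step of recurrent state, which is what makes long rollouts drift.

import torch
import torch.nn as nn

class StepwiseDynamics(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.spatial = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.temporal = nn.LSTMCell(dim, dim)

    def forward(self, slots, state=None):      # slots: (B, N, D)
        B, N, D = slots.shape
        mixed = self.spatial(slots).reshape(B * N, D)  # per-step interaction
        h, c = self.temporal(mixed, state)     # only one step of context
        return h.reshape(B, N, D), (h, c)

SlotFormer instead attends over the full T-step slot history at once, as in the sketch above.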

Animesh Garg boosted:
2023-01-25

Last was an enthralling talk by @animesh_garg on building generally autonomous #robots at #CSAIL. The argument that making truly autonomous systems requires embodiment, pre-defined causal and world models, and data is explored from a variety of perspectives here, demonstrating some impressive results. Highly recommend youtube.com/watch?v=jHf8ysoqyC (10/10)

2023-01-25

Here is a story rolling out in real time.
What the broader community thinks of as AI advances are due not only to the original inventions, but also to iterative tinkering and discovery!

@markchen90@twitter.com from @OpenAI@twitter.com has a similar take on it. twitter.com/markchen90/status/

2023-01-25

I completed my PhD, spent years as a postdoc, and finally got a job!
Getting a real job as I neared 30 felt like an achievement.
Albeit a short-lived one!
‘coz Bay Area!

This 🧵is 🔥
…a family of four with dual incomes of ~200K each is just making do!

2023-01-25

RT @rachelmetz@twitter.com

For a family of four in the Bay Area today, what do you think two adults *each* need to make, on average, to afford a mortgage (don’t forget property tax), childcare for 2 young kids, and general life expenses in the Bay Area today? (Excluding any “fun” stuff like a vacation.)

🐦🔗: twitter.com/rachelmetz/status/
