Research
My research interests are broadly in deep learning (self-supervision, reasoning, scaling) and its applications
to computer vision and embodied systems.
Scaling Properties of Diffusion Models For Perceptual Tasks
Rahul Ravishankar*, Zeeshan Patel*,
Jathushan Rajasegaran, Jitendra Malik
project page /
arXiv /
code
We show how diffusion models benefit from scaling both training and test-time compute for perceptual tasks, and we unify tasks such as depth estimation, optical flow, and amodal segmentation under the framework of image-to-image translation.
An Empirical Study of Autoregressive Pre-training from Videos
Jathushan Rajasegaran,
Ilija Radosavovic, Rahul Ravishankar,
Yossi Gandelsman, Christoph Feichtenhofer,
Jitendra Malik
project page /
arXiv /
code [coming soon]
We trained LLaMA models of up to 1 billion parameters on 1 trillion visual tokens. The resulting models can perform diverse tasks, including image and video recognition, video tracking, action prediction, and robotics. We also study the scaling properties of this family of models.