Industry Talks

Alexey Dosovitskiy
Google Brain, Berlin, Germany

Towards non-convolutional architectures for recognition and generation

Abstract: Convolutional networks are the workhorses of modern computer vision, thanks to their efficiency on hardware accelerators and their inductive biases, which are well suited to processing and generating images. However, ConvNets spend an equal amount of compute at each location in the input, which makes them convenient to implement and train but can be extremely computationally inefficient, especially on high-dimensional inputs such as video or 3D data. Moreover, representations extracted by ConvNets lack interpretability and systematic generalization. In this talk, I will present our recent work towards models that aim to avoid these shortcomings by modeling the sparse structure of the real world. On the image recognition front, we are investigating architectures for learning object-centric representations, with or without supervision, as well as ways to scale these models from simple synthetic settings towards real-world data. For image generation, we scale a recent implicit-3D neural rendering approach, Neural Radiance Fields, from controlled small-scale datasets to noisy, large-scale real-world data.

Bio: Alexey Dosovitskiy is a Senior Research Scientist at Google Brain. He received MSc and PhD degrees in mathematics (functional analysis) from Moscow State University in 2009 and 2012, respectively. He then spent 2013-2016 as a postdoctoral researcher with Prof. Thomas Brox in the Computer Vision Group at the University of Freiburg, working on various topics in deep learning, including self-supervised learning (Exemplar CNN), image generation with neural networks, and motion and 3D structure estimation (FlowNet, DeMoN). In 2017 Alexey joined the Intel Visual Computing Lab, led by Dr. Vladlen Koltun, where he spent two years working on applications of deep learning to sensorimotor control, including autonomous driving (the CARLA simulator) and robotics. In April 2019 Alexey joined Google Brain in Berlin. His current research focuses on exploring non-convolutional architectures for recognition and generation.