Hierarchical models of natural scenes

Published:

A long-standing interest of mine is to develop hierarchical probabilistic models of natural scenes. While representing real-world scenes in terms of true probabilities is a mathematician’s nightmare, approximating such a representation has proven to be an extremely effective way to model scenes in an efficient and robust way. Perhaps just as importantly, so far it has shown a high degree of correspondence with what we see in real brains. Most of my works so far have built on the Sparse Coding framework, which represents a leading candidate for computation in the early vision system of mammals.

Nonlinear disentanglement with variational probabilistic neural networks
Most recently, I led a team to conduct a research project on understanding disentanglement in neural networks. The goal of disentanglement research is to learn an encoding function that can map inputs, such as images or videos, to the underlying factors that varied to produce those images, such as an object’s position, identity, or orientation. We used neural networks that approximate probabilistic inference to encode the images. You can find a short description with figures in our tweet or get all of the details in the paper.

A nonlinear, hierarchical subspace sparse coding network
I also recently published an alternative hierarchical sparse coding model that learns to represent natural scenes in terms of subspaces. These subspaces share properties, such as the orientation of an edge, but also allow for some variation, such as the position of an edge. By learning the subspaces directly from natural image data, we are able to make concrete statements about what invariances could exist in a brain. You can learn more about it from my talk at the Neurally Inspired Computational Elements workshop or check out the paper.

The sparse manifold transform
The Sparse Manifold Transform is an exciting development in hierarchical modeling of natural scenes that was developed by my colleague Yubei Chen. The network explicitly represents videos in terms of a hierarchy of composable elements, and is heavily inspired by recent ideas in neuroscience about how the brain might represent scene information using smooth manifolds. I co-authored the paper, which gives all of the details.

Deep deconvolutional sparse coding for scene spatial frequency decomposition
While working in Gar Kenyon’s lab, I developed a model that decomposes visual scenes into spatial frequency bands using a hierarchical convolutional sparse coding model. You can read all about it from the paper. The work has been extended by Edward Kim et al. in this paper, which presents a very interesting model of higher level brain processing.