Combining multiple visual processing streams for locating and classifying objects in video

Published in IEEE Southwest Symposium on Image Analysis and Interpretation, 2012

Paper link

Abstract:
Automated, invariant object detection has proven to be a substantial challenge for the artificial intelligence research community. In computer vision, many benchmarks have been established using whole-image classification on datasets that are too small to eliminate statistical artifacts. As an alternative, we used a new dataset consisting of ~62GB (on the order of 40,000 2Mpixel frames) of compressed high-definition aerial video, which we employed for both object classification and localization. Our algorithms mimic the processing pathways in primate visual cortex, exploiting color/texture, shape/form, and motion. We then combine the data using a clustering technique to produce a final output in the form of labeled bounding boxes around objects of interest in the video. Localization adds complexity not generally found in whole-image classification problems. Our results are evaluated qualitatively and quantitatively using a scoring metric that assesses the overlap between our detections and ground truth.
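The paper does not spell out the exact overlap formula in this abstract, but overlap-based detection scoring is commonly computed as intersection-over-union (IoU) between a detected bounding box and a ground-truth box. A minimal illustrative sketch, assuming axis-aligned boxes in (x1, y1, x2, y2) form:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes yield no overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

A detection is then typically counted as correct when its IoU with a ground-truth box exceeds a fixed threshold (0.5 is a common choice); the specific threshold used in the paper is described in its evaluation section.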

Recommended citation:
DM Paiton, SP Brumby, GT Kenyon, GJ Kunde, KD Peterson, MI Ham, PF Schultz, JS George, “Combining multiple visual processing streams for locating and classifying objects in video,” IEEE Southwest Symposium on Image Analysis and Interpretation, 2012, pp. 49-52, doi: 10.1109/SSIAI.2012.6202450.

@INPROCEEDINGS{paiton2012combining,
  author={Paiton, DM and Brumby, SP and Kenyon, GT and Kunde, GJ and Peterson, KD and Ham, MI and Schultz, PF and George, JS},
  booktitle={2012 IEEE Southwest Symposium on Image Analysis and Interpretation},
  title={Combining multiple visual processing streams for locating and classifying objects in video},
  year={2012},
  volume={},
  number={},
  pages={49-52},
  doi={10.1109/SSIAI.2012.6202450}
}