Counting People in Crowds with a Real-Time Network of Simple Image Sensors

Danny B. Yang    Héctor H. González-Baños     Leonidas J. Guibas

In Proc. IEEE International Conference on Computer Vision, Nice, France, Oct. 2003

Abstract

Estimating the number of people in a crowded environment is a central task in civilian surveillance. Most vision-based counting techniques depend on detecting individuals in order to count, an unrealistic proposition in crowded settings. We propose an alternative approach that directly estimates the number of people. In our system, groups of image sensors segment foreground objects from the background, aggregate the resulting silhouettes over a network, and compute a planar projection of the scene's visual hull. We introduce a geometric algorithm that calculates bounds on the number of persons in each region of the projection, after phantom regions have been eliminated. The computational requirements scale well with the number of sensors and the number of people, and only limited amounts of data are transmitted over the network. Because of these properties, our system runs in real-time and can be deployed as an untethered wireless sensor network. We describe the major components of our system, and report preliminary experiments with our first prototype implementation.
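To make the planar projection step concrete, here is a minimal sketch rather than the system's actual implementation. It assumes each camera reports the angular intervals in which its scanline is occluded, that every camera sees the whole area, and that the ground plane is rasterized into a grid. These are simplifications chosen for illustration; all function and parameter names are hypothetical. A grid cell is kept in the projected visual hull only if every camera reports its direction as occluded.

```python
import math

def bearing(camera, point):
    """Angle in radians of the ground-plane ray from camera to point."""
    return math.atan2(point[1] - camera[1], point[0] - camera[0])

def occluded_interval(camera, center, radius):
    """Angular interval blocked by a disc (a stand-in for a person's
    silhouette) as seen from a camera located outside the disc."""
    dist = math.hypot(center[0] - camera[0], center[1] - camera[1])
    half_width = math.asin(min(1.0, radius / dist))
    mid = bearing(camera, center)
    return (mid - half_width, mid + half_width)

def in_interval(angle, interval):
    """True if angle lies inside interval, handling wrap-around at +/- pi."""
    lo, hi = interval
    mid = 0.5 * (lo + hi)
    half = 0.5 * (hi - lo)
    diff = (angle - mid + math.pi) % (2.0 * math.pi) - math.pi
    return abs(diff) <= half

def projected_hull(cameras, intervals, size, step):
    """Return the set of grid cells (i, j) whose centre is occluded in
    every camera. intervals[k] lists the occluded intervals of cameras[k].
    Assumption: every camera sees the whole size-by-size area."""
    n = int(size / step)
    cells = set()
    for i in range(n):
        for j in range(n):
            p = ((i + 0.5) * step, (j + 0.5) * step)
            if all(
                any(in_interval(bearing(cam, p), iv) for iv in ivs)
                for cam, ivs in zip(cameras, intervals)
            ):
                cells.add((i, j))
    return cells

# Hypothetical setup: four corner cameras and one person in the middle.
cameras = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
person = ((5.0, 5.0), 0.5)
intervals = [[occluded_interval(cam, *person)] for cam in cameras]
hull = projected_hull(cameras, intervals, size=10.0, step=0.5)
```

Note that each camera only needs to send a handful of interval endpoints, which is consistent with the limited network traffic described above. With several people in the scene, the same intersection can also produce regions that contain no one (the phantom regions mentioned in the abstract), which is why a pruning step is needed before counting.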

ICCV paper


Results

An experimental run counting 5 people entering and exiting the area.
Count for 5 people (please see the paper for details).
This video shows the data sent from the cameras and the computed projections of the visual hull during the run with 5 people. The windows on the left show the occluded scanlines reported by the cameras. The bottom right window shows the projected visual hull. The upper right window shows the visual hull after pruning. Polygons with a lower bound of at least 1 are colored bright green. Circles show the result of the optional localization step that follows counting.
Video of projection for 5 people

An experimental run counting 8 people entering and exiting the area.
Count for 8 people
The corresponding video.
Video of projection for 8 people

An experimental run localizing 4 people. Seven views were used for the localization, and the predicted locations were projected into an eighth view that was not used for localization. The following video shows the predicted locations in this unused view. The height of the horizontal bar corresponds to the predicted distance from the camera.
Localization video
The corresponding video of the projected visual hull showing the localization of the 4 people.
Localization of 4 people


Danny Yang / dbyang@stanford.edu