York researcher improves video surveillance systems

Every time you look around, there seem to be more surveillance cameras in public and private spaces. Yet these devices may offer a false sense of security, since existing commercial cameras have various technical shortcomings.

A new vision system developed by York Professor James Elder and his team in the Centre for Vision Research may soon make surveillance systems more accurate and efficient by improving the way security cameras capture images.

Right: Professor James Elder

The new system “was inspired by the human visual system,” says Elder, who teaches in York’s Department of Psychology, Faculty of Health, and in the Department of Computer Science & Engineering, Faculty of Science & Engineering. His research in this field has earned him the Premier’s Research Excellence Award and the Young Investigator Award from the Canadian Image Processing and Pattern Recognition Society.

“For visual perception,” says Elder, “it’s important to have two views.” One is “a high-resolution or attentive view which allows you to make fine discriminations,” while the other is a large-field or “pre-attentive” view. The former allows us to focus on an object, such as the page of a book when reading or a person we recognize in the distance. It is also used when we perform detailed tasks, such as threading a needle. The pre-attentive view, on the other hand, keeps us aware of our surroundings, for example by letting us perceive when someone is approaching.

“But these two goals of vision are conflicting,” says Elder. To get a high-resolution view, the eye requires many light-sensing elements, called photoreceptors. To see at high resolution across a large field of view, says Elder, we would need a vast number of photoreceptors, as well as enough neurons in the brain to process all the detailed information. “The brain would need to be 10 times as big as it is,” he says. “It wouldn’t be possible for the human shoulders to hold it up.”

Right: The new vision system developed by Elder combines two cameras; the top gives a high-resolution, attentive view, the bottom, a wide or pre-attentive view

The human visual system has evolved an effective hybrid solution to this problem. Although the human field of view is almost hemispheric, only within a small central area of the retina, called the fovea, is photoreceptor density high enough to provide the resolution needed for fine visual tasks. Photoreceptor density falls off dramatically outside the fovea, providing a much lower-resolution view in the periphery.

In humans, “these two systems, the foveal and peripheral, work together through a fast-orientating mechanism,” says Elder. While we are using our foveal system to focus on the road ahead when driving, our peripheral system can detect a car that suddenly approaches to pass on one side. We may then quickly reorient our gaze to gather more information if necessary.

Artificial visual systems used commercially do not have the capacity to coordinate views this way. If a surveillance camera is zoomed in to get a high-resolution view of a doorway, a criminal act may be taking place just outside the field of view, yet the camera will be unable to register it.

But the new system Elder and his team have developed may change this as it successfully replicates the way in which the human visual system combines foveal and peripheral views.

The system consists of two cameras mounted one on top of the other. The bottom (pre-attentive) camera is equipped with a wide-angle lens, providing a low-resolution view of the entire scene, while the top (attentive) camera is equipped with a narrower lens, providing a higher-resolution view of a selected portion of the scene. The attentive camera is mounted on a motorized pan-tilt unit, so that the portion of the scene to be viewed at high resolution can be selected through a computer.
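To illustrate how a point in the wide view might be turned into a command for the pan-tilt unit, here is a minimal sketch in Python. It is not Elder's method, which the article does not describe. It assumes an idealized pinhole camera with no lens distortion, square pixels, and two cameras that share the same viewpoint. The function name `pixel_to_pan_tilt` and all of its parameters are hypothetical.

```python
import math


def pixel_to_pan_tilt(x, y, width, height, hfov_deg):
    """Convert a pixel in the wide-view image to pan/tilt angles in degrees.

    Assumptions (not from the article): ideal pinhole camera, square pixels,
    no lens distortion, and both cameras looking out from the same point.
    Positive pan is to the right; positive tilt is upward.
    """
    # Focal length in pixels, derived from the horizontal field of view.
    focal_px = (width / 2) / math.tan(math.radians(hfov_deg) / 2)

    # Offset of the pixel from the image centre (image y grows downward).
    dx = x - width / 2
    dy = y - height / 2

    # Pan rotates about the vertical axis; tilt is measured from the
    # horizontal plane after that pan rotation has been applied.
    pan = math.degrees(math.atan2(dx, focal_px))
    tilt = -math.degrees(math.atan2(dy, math.hypot(dx, focal_px)))
    return pan, tilt


# Example: a point at the right-hand edge of a 640x480 wide view
# whose lens covers 90 degrees horizontally.
pan, tilt = pixel_to_pan_tilt(640, 240, 640, 480, 90.0)
```

A real wide-angle lens distorts the image considerably, so a working system would need a calibration step between the two cameras rather than this simple geometry.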

The crux of the system is a set of computer algorithms that can instruct the attentive camera where to look in the wide-view scene if there is something out of the ordinary taking place. The patented system can merge the two views so security personnel monitoring an area can see over a wide-field view while focusing in on a specific point. “The idea of the sensor is that you have two components: one that is constantly aware of the whole visual scene and another that can get detailed information. That does not exist in any commercial system,” says Elder.
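The article does not say how the algorithms decide what counts as out of the ordinary. As a rough stand-in, the sketch below uses simple frame differencing: it compares two consecutive grayscale frames from the wide view and returns the centre of the pixels that changed, which could then serve as the point the attentive camera is sent to. The function name `find_change_centroid` and the threshold value are assumptions made for illustration only.

```python
def find_change_centroid(prev_frame, curr_frame, threshold=30):
    """Return the (x, y) centroid of pixels that changed between two frames.

    Frames are lists of rows of grayscale values (0-255). Returns None if
    no pixel changed by more than `threshold`. Frame differencing is an
    illustrative substitute, not the patented algorithm.
    """
    sum_x = 0
    sum_y = 0
    count = 0
    for y, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for x, (prev_val, curr_val) in enumerate(zip(prev_row, curr_row)):
            if abs(curr_val - prev_val) > threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # nothing moved, so the attentive camera stays put
    return (sum_x / count, sum_y / count)


# Example: a 4x4 scene in which two neighbouring pixels brighten.
previous = [[0] * 4 for _ in range(4)]
current = [row[:] for row in previous]
current[1][2] = 200
current[1][3] = 200
target = find_change_centroid(previous, current)
```

Differencing of this kind reacts to any change, including lighting shifts and camera noise, so a practical system would need a more selective notion of an unusual event.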

Left: The vision system’s algorithms allow it to focus in on a specific part of a scene

A face-detection component is another of its unique features.

Many practical applications in surveillance, security and crime-prevention come to mind, says Elder. The system could potentially be used in large buildings such as airports, factories or nuclear power plants to both monitor vast areas and focus in quickly on disturbances or trespassers in specific areas. For forensic purposes, the system could also make surveillance more efficient. It can potentially note unusual events and register when they occurred, saving personnel the trouble of viewing many hours of video footage after a crime has taken place.

There are also applications in other fields, such as remote learning or teleconferencing, says Elder. For example, a professor in Sudbury remotely teaching a class based at York would be able to see a wide view of all the students, then focus in on the face of one particular student who raises his or her hand to ask a question.

Architectural planning is another area where it could be useful, says Elder. Using the vision system, architects could track how many people are in a city square at different times of day and the routes they take. This information could help them design buildings and public spaces more efficiently. Retailers could also use this type of flow-management information to design store layouts in a way that maximizes profits.

A number of companies have expressed interest in the project. The Department of National Defence has also seen the potential of the vision system’s algorithms for search and rescue operations. “We’re just on the cusp in Canada of introducing advanced imaging technologies, such as thermal infrared TV, that could potentially improve our ability to detect search and rescue targets from the air,” says Elder. Deployment of these technologies, coupled with assisted target detection algorithms, may mean that hiking and boating accident victims will have a better chance of being located in time.

The next step for Elder and his research team is to develop the system further so that it recognizes individuals. He is collaborating with colleagues at University College London in the UK on this portion of the project.

Elder credits the excellent graduate students, postdoctoral fellows and research scientists, as well as the general research environment at York, with producing the invention, which combines research in two academic disciplines. “The Centre for Vision Research is an excellent environment to combine interdisciplinary knowledge – from the study of human perception to these technologies, which are an engineering endeavour,” says Elder. “That is something that is brave and unique about York – the University supports that kind of interdisciplinary mix.”

This article was written by Olena Wawryshyn, York communications officer.