Event Camera as Region Proposal Network

The human eye contains two types of photoreceptors: rods and cones. Rods are responsible for monochrome vision and cones for color vision. Rods greatly outnumber cones, which means that most human visual processing is done in monochrome. An event camera reports changes in pixel intensity and is therefore analogous to rods: event and color cameras in computer vision are like rods and cones in human vision. Humans can notice objects moving in peripheral vision (far right and left) but cannot classify them (think of someone passing by on your far left or right: this can trigger your attention without your knowing who they are). Rods thus act as a region proposal network (RPN) in human vision, so an event camera can play the same role in deep learning. Two-stage object detectors, such as Mask R-CNN, consist of a backbone for feature extraction and an RPN. Current RPNs use a brute-force approach, evaluating all possible bounding boxes to detect an object. This makes proposal generation computationally expensive and two-stage detectors impractical for fast applications. This work replaces the RPN in the Detectron2 implementation of Mask R-CNN with an event camera that generates proposals for moving objects, saving time and computation. The proposed approach is faster than two-stage detectors, with comparable accuracy.
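To make the idea concrete, here is a minimal sketch of how event-camera output could yield region proposals without an RPN. It is an illustration, not the paper's implementation: the function name, the event format (an array of `(x, y)` pixel coordinates), and the connected-component heuristic are all assumptions for this example.

```python
import numpy as np
from scipy import ndimage


def event_proposals(events, height, width, min_area=20):
    """Hypothetical sketch: convert a batch of events into bounding-box
    proposals for moving objects.

    events: (N, 2) array of (x, y) pixel coordinates where intensity changed.
    Returns a list of (x1, y1, x2, y2) boxes.
    """
    # Accumulate events into a per-pixel count image.
    counts = np.zeros((height, width), dtype=np.int32)
    xs = events[:, 0].astype(int)
    ys = events[:, 1].astype(int)
    np.add.at(counts, (ys, xs), 1)

    # Pixels that fired at least once mark moving regions.
    mask = counts > 0
    labels, _ = ndimage.label(mask)  # group events into connected clusters

    boxes = []
    for sl in ndimage.find_objects(labels):
        y, x = sl
        # Discard tiny clusters that are likely sensor noise.
        if (y.stop - y.start) * (x.stop - x.start) >= min_area:
            boxes.append((x.start, y.start, x.stop, y.stop))
    return boxes
```

Because the events themselves localize motion, the detector only needs to classify the returned boxes, avoiding the exhaustive anchor search a conventional RPN performs.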

  • Published in:
    arXiv
  • Type:
    Article
  • Authors:
    Awasthi, Shrutarv; Gouda, Anas; Lodenkaemper, Julian Richard; Roidl, Moritz
  • Year:
    2023

Citation information

Awasthi, Shrutarv; Gouda, Anas; Lodenkaemper, Julian Richard; Roidl, Moritz: Event Camera as Region Proposal Network, arXiv, 2023, https://arxiv.org/abs/2305.00718

Associated Lamarr Researchers


Anas Gouda
