M.S. Thesis Defense - Tianshun Miao

“A Video Processing Framework Based on Entity Detection, Tracking-Learning-Detection, State Extraction and Decision-making”

January 7, 2015
12:30 pm - 2:30 pm
Location
105 Cummings Hall
Sponsored by
Thayer School
Audience
Public
More information
Daryl Laware

Thesis Committee

Laura Ray, Ph.D. (Chair)

Richard Granger, Ph.D.

Eugene Santos Jr., Ph.D.

Abstract

This thesis develops a video processing framework that makes decisions by extracting and clustering the state information of moving entities in a video. The framework detects and tracks moving entities in order to extract “state” information about them without formally identifying or classifying them. The moving entities are then differentiated for decision-making by clustering their state information.

The framework is based on entity detection, multi-object Tracking-Learning-Detection (TLD), state extraction and a decision system. The entity detection uses frame differencing and blob detection to provide candidate patches of moving entities, which the TLD framework subsequently tracks. The state extraction system generates a state vector from the information provided by the TLD framework. The elements of the state vector are derived from the position information of moving entities in the video frames. It is hypothesized that many tasks can be accomplished, or decisions made, concerning the video contents using only information gleaned from moving entity detection and tracking. In this thesis, states identified by the detection, tracking and state extraction serve as inputs for training a decision system, such as a Bayesian network, to make inferences about a high-level state or characterization of the video as directed by a specific task or question. Additionally, the output of the TLD framework and/or the Bayesian network can be projected back to the input of the entity detection, the TLD, or the state extraction in order to reinforce the performance of the framework and to direct attention away from portions of the frame that provide no useful state information for the decision process.
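As a rough illustration of the detection and state extraction steps described above, the following minimal sketch applies frame differencing and blob detection to two grayscale frames and derives a position-based state vector. It is not the thesis implementation: the function names, the threshold, the 4-connected grouping, the minimum blob area and the layout of the state vector are all assumptions made for illustration, and the TLD tracking stage is omitted entirely.

```python
from collections import deque


def frame_difference(prev, curr, threshold=30):
    """Binary mask of pixels whose intensity changed by more than `threshold`.

    Frames are assumed to be equal-sized nested lists of grayscale values.
    """
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def find_blobs(mask, min_area=2):
    """Group foreground pixels into 4-connected blobs.

    Returns bounding boxes as (top, left, bottom, right), inclusive.
    Blobs smaller than `min_area` pixels are discarded as noise.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes


def state_vector(prev_box, curr_box):
    """Hypothetical state: (centroid_y, centroid_x, dy, dx) between two frames."""
    def centroid(box):
        return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

    (py, px), (cy, cx) = centroid(prev_box), centroid(curr_box)
    return (cy, cx, cy - py, cx - px)


# A 2x2 bright patch appears in an otherwise empty 5x6 frame.
empty = [[0] * 6 for _ in range(5)]
frame = [row[:] for row in empty]
for y in (1, 2):
    for x in (2, 3):
        frame[y][x] = 200

candidates = find_blobs(frame_difference(empty, frame))
# Suppose a tracker later reports the same entity one pixel to the right.
state = state_vector(candidates[0], (1, 3, 2, 4))
```

In a full pipeline of the kind the abstract describes, the candidate boxes would be handed to a tracker, and the per-entity state vectors would then be clustered or fed to the decision system.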

The framework is tested on a video dataset comprising an hour of video of natural street scenes, including pedestrians, cars, traffic lights, pedestrian walk signal lights and other objects. The procedures and instruments used to gather this dataset are covered in this thesis. The functionality of the subsystems and of the feedback mechanism is analyzed based on their performance in the experiments.
