Challenge Description

The objective of the "Wide-Area Activity Search and Recognition Challenge" is to search a video database given a short query clip in a wide-area surveillance scenario. Our intention is to encourage the development of activity recognition strategies that can incorporate information from a network of cameras covering a wide area. The UCR-Videoweb dataset provided in this challenge contains activities viewed from 4-8 cameras and allows performance to be tested in a camera network. For each query, a video clip containing a specific activity is provided, and the contestants are expected to search for similar clips. The contestants are free to analyze the query automatically and to gather training data from other sources to build suitable classifiers, but they should NOT assume that the query is manually labeled in order to obtain such data.

  •     [Figure: Wide-Area Activity sample snapshot]

Dataset

The Videoweb dataset consists of about 2.5 hours of video observed from 4-8 cameras. The data is divided into a number of scenes that were collected over many days. Each scene is observed by a camera network, where the actual number of cameras varies by scene due to the nature of the scene. For each scene, the videos from all cameras are available. Annotation is provided for each scene, and the annotation convention is described in the dataset; it identifies the frame numbers and camera ID for each annotated activity. The videos from the cameras are approximately synchronized. The videos contain several types of activities, including throwing a ball, shaking hands, standing in a line, handing out forms, running, limping, getting into/out of a car, and cars making turns. The number of instances of each activity varies widely.
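
As a rough illustration only, the Python sketch below loads such annotations into an index keyed by scene and activity label. The CSV-like layout (scene, camera ID, activity label, start frame, end frame) and the function name load_annotations are assumptions made for this example; the actual convention is the one documented with the dataset.

import csv
from collections import defaultdict

def load_annotations(path):
    """Group annotated activity instances by (scene, activity label).

    Assumes a hypothetical CSV layout:
        scene, camera_id, activity_label, start_frame, end_frame
    The real Videoweb annotation convention is documented with the
    dataset and may differ from this layout.
    """
    by_activity = defaultdict(list)
    with open(path, newline="") as f:
        for scene, camera_id, label, start, end in csv.reader(f):
            by_activity[(scene, label)].append(
                {"camera": camera_id, "frames": (int(start), int(end))}
            )
    return by_activity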

To obtain the Videoweb dataset, go to its website, http://vwdata.ee.ucr.edu/, and follow the access protocol described there.


Performance Evaluation Methodology

Each contestant must select query clips from some scenes in the dataset and use them to retrieve similar clips from other parts of the dataset. The query must not be manually labeled in order to build classifiers from other data; however, analyzing the query automatically to gather more extensive training data is acceptable. Results should be reported as:

  • Activities used in query
  • The number of correctly identified clips for each query, as a percentage of the total number of retrieved clips (i.e., precision).
  • The number of correctly identified clips for each query, as a percentage of the total number of such clips according to the annotation file (i.e., recall).
A correctly identified clip is one for which the range of frame numbers returned by the search engine overlaps at least 50% of the annotated range and is not more than 150% of that range. Performance numbers should be reported as a function of the number of cameras, and a result is counted as correct for a given number of cameras only when it is correct in all of those cameras. For example, an activity may be viewed in 2 out of 4 cameras. If the recognition algorithm has successfully retrieved the clip in both of these cameras, the retrieval is successful. If the algorithm retrieves it in only one camera, it is successful when reporting results for 1 camera, but unsuccessful when reporting results for 2 or more cameras. Thus the above metrics should be reported for 1 to N cameras. The combination of cameras chosen in each set is not a criterion at this point.
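
The following Python sketch shows one way to apply this scoring rule. It is not the official evaluation code: the data layout (per-camera (start_frame, end_frame) ranges), the function names, and the reading of the 50%-150% rule (the retrieved range covers at least half of the annotated range and is at most 1.5 times its length) are assumptions made for illustration.

def clip_is_correct(retrieved_range, annotated_range):
    """Check one reading of the 50%-150% overlap rule for a single camera.

    Both arguments are (start_frame, end_frame) tuples. The retrieved
    range must cover at least 50% of the annotated range, and its length
    must not exceed 150% of the annotated range.
    """
    r_start, r_end = retrieved_range
    a_start, a_end = annotated_range
    annotated_len = a_end - a_start + 1
    retrieved_len = r_end - r_start + 1
    overlap = max(0, min(r_end, a_end) - max(r_start, a_start) + 1)
    return overlap >= 0.5 * annotated_len and retrieved_len <= 1.5 * annotated_len

def correct_for_k_cameras(retrieved, annotated, k):
    """Count a retrieval as correct for k cameras only if the overlap rule
    holds in at least k of the cameras that observe the annotated activity.

    retrieved and annotated map camera ID -> (start_frame, end_frame);
    this layout is an assumption for the sketch.
    """
    hits = sum(
        1
        for cam, ann_range in annotated.items()
        if cam in retrieved and clip_is_correct(retrieved[cam], ann_range)
    )
    return hits >= k

Precision and recall for each query can then be computed by applying this check to all retrieved clips and to all annotated instances, respectively, for each value of k from 1 to N.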

The database on which the search is run must be at least 15 minutes long (users are free to choose which portion to work on), must not overlap with the portion used to select the queries, and processing must be completely automatic. Query refinement is not considered at this stage.

Citation

If you make use of the Videoweb dataset in any form, please cite the following reference.

@misc{UCR-Videoweb-Data,
      author = "C. Ding and A. Kamal and G. Denina and H. Nguyen and A. Ivers and B. Varda and C. Ravishankar and B. Bhanu and A. Roy-Chowdhury",
      title = "{V}ideoweb {A}ctivities {D}ataset, {ICPR} contest on {S}emantic {D}escription of {H}uman {A}ctivities ({SDHA})",
      year = "2010",
      howpublished = "http://cvrc.ece.utexas.edu/SDHA2010/Wide\_Area\_Activity.html"
}