UTKinect-FirstPerson Dataset


This dataset was collected as part of research on first-person activity recognition from RGB-D videos. The research is described in detail in the paper "Robot-Centric Activity Recognition from First-Person RGB-D Videos".


We collected two separate datasets. The first was collected using a Kinect mounted on top of a humanoid robot; the second was collected using a non-humanoid robot. For each dataset, we invited 8 subjects, between the ages of 20 and 80, to perform a variety of interactions with our robot. We asked each subject to perform 7-9 continuous sequences of activities in a few different background settings, with each sequence containing 2-5 actions. We provide the continuous depth, RGB, and skeleton sequences, together with the segment labels for each sequence. There are 9 action types in the humanoid robot dataset: stand up, wave, hug, point, punch, reach, throw, run, and shake hands. There are 9 action types in the non-humanoid robot dataset: ignore, pass by the robot, point at the robot, reach an object, run away, stand up, stop the robot, throw at the robot, and wave to the robot.
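Because each continuous sequence comes with segment labels, a common first step is to cut it into per-action clips. The following Python sketch illustrates one way to do that. The `Segment` class, the `split_sequence` function, and the inclusive start/end frame convention are assumptions made for illustration; they are not the dataset's actual label file format, which should be checked against the downloaded files.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Action vocabularies as listed in the dataset description.
HUMANOID_ACTIONS = [
    "stand up", "wave", "hug", "point", "punch",
    "reach", "throw", "run", "shake hands",
]
NON_HUMANOID_ACTIONS = [
    "ignore", "pass by the robot", "point at the robot",
    "reach an object", "run away", "stand up",
    "stop the robot", "throw at the robot", "wave to the robot",
]


@dataclass(frozen=True)
class Segment:
    """One labeled action inside a continuous sequence (hypothetical layout)."""
    action: str
    start: int  # first frame index, inclusive (assumed convention)
    end: int    # last frame index, inclusive (assumed convention)


def split_sequence(
    frames: Sequence,
    segments: List[Segment],
    vocabulary: List[str],
) -> List[Tuple[str, Sequence]]:
    """Cut a continuous frame sequence into (action, clip) pairs."""
    clips = []
    for seg in segments:
        if seg.action not in vocabulary:
            raise ValueError(f"unknown action label: {seg.action!r}")
        if not (0 <= seg.start <= seg.end < len(frames)):
            raise ValueError(f"segment out of range: {seg}")
        # Slice end is exclusive, so add 1 to keep the last labeled frame.
        clips.append((seg.action, frames[seg.start:seg.end + 1]))
    return clips
```

The same function can be applied to the depth, RGB, and skeleton streams of a sequence, provided their frames are aligned by index, which is also an assumption to verify against the data.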

The data was saved with the Kinect for Windows SDK. You may refer to the SDK if you want to manipulate the data, e.g., to calibrate images or convert skeleton coordinates.
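As a rough illustration of what converting skeleton coordinates involves, the sketch below projects a 3D joint position in camera space (meters) onto image pixel coordinates with a generic pinhole camera model. This is not the SDK's own mapping routine. The function name `project_to_pixel` is hypothetical, and the intrinsics `fx`, `fy`, `cx`, `cy` are placeholders that must be replaced with values appropriate for the sensor and image resolution actually used.

```python
from typing import Tuple


def project_to_pixel(
    x: float, y: float, z: float,
    fx: float, fy: float, cx: float, cy: float,
) -> Tuple[float, float]:
    """Project a camera-space point (meters) to pixel coordinates.

    Assumes a pinhole model with the camera y axis pointing up and the
    image v axis pointing down; both are assumptions to verify.
    """
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    u = cx + fx * x / z
    v = cy - fy * y / z  # flip sign: camera y is up, image rows grow downward
    return u, v


# Illustrative intrinsics only; these are NOT calibrated Kinect values.
u, v = project_to_pixel(1.0, 0.5, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

For accurate results, the coordinate mapping functions provided by the Kinect for Windows SDK are the better choice, since they account for the sensor's actual calibration.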

Dataset1 (humanoid robot): download (3.42G)

Dataset2 (non-humanoid robot): download (3.33G)

Each dataset contains 5 parts:


If you make use of the UTKinect-FirstPerson dataset in any form, please cite the following reference.

      title={Robot-centric activity recognition from first-person rgb-d videos},
      author={Xia, Lu and Gori, Ilaria and Aggarwal, J.K. and Ryoo, Michael S},
      booktitle={Applications of Computer Vision (WACV), 2015 IEEE Winter Conference on},

If you have any problems, questions, or suggestions regarding the dataset, please contact Lu Xia.

by Lu Xia 2015