This dataset was collected as part of research work on first-person activity recognition from RGB-D videos. The research is described in detail in the paper "Robot-Centric Activity Recognition from First-Person RGB-D Videos" (WACV 2015).
We collected two separate datasets. The first was collected using a Kinect mounted on top of a humanoid robot; the second was collected using a non-humanoid robot. For each dataset, we invited 8 subjects, between 20 and 80 years old, to perform a variety of interactions with our robot. We asked each subject to perform 7-9 continuous sequences of activities, each containing 2-5 actions, in a few different background settings. We provide the continuous sequences of depth, RGB, and skeleton data, together with the segment labels for each sequence.

There are 9 action types in the humanoid robot dataset: stand up, wave, hug, point, punch, reach, throw, run, and shake hands. There are 9 action types in the non-humanoid robot dataset: ignore, pass by the robot, point at the robot, reach an object, run away, stand up, stop the robot, throw at the robot, and wave to the robot.

The data was saved with the Kinect for Windows SDK. You may refer to the SDK if you want to manipulate the data, e.g. to calibrate images or convert skeleton coordinates.
Dataset 1 (humanoid robot): download (3.42 GB)
Dataset 2 (non-humanoid robot): download (3.33 GB)
Each dataset contains 5 parts:
(a) RGB images (.jpg); the resolution is 480x640.
(b) Depth images (.png); the resolution is 320x240.
(c) Calibrated depth images (.png) are also provided for your convenience; the resolution is 320x240. These depth images are calibrated so that they align with the (half-sized) color images (see the alignment sketch after this list).
(d) Skeletal joint locations (.txt). Each row contains the data of one frame; the format is: frame number, frame count, skeletonId, and the (x, y, z) locations of joints 1-20. The x, y, and z values are coordinates relative to the sensor array, in meters. A detailed description of the coordinate system can be found here, and the joint indices are described here. (A minimal parsing sketch also follows this list.)
(e) Labels of action sequences (.txt).
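Since the calibrated depth images match the half-sized color images, overlaying them only requires shrinking the RGB frame to 320x240. The following is a minimal sketch of one way to do this, assuming OpenCV and NumPy are installed; the file names (rgb_0001.jpg, caldepth_0001.png, overlay_0001.png) are hypothetical, and the actual naming scheme is defined by the downloaded data.

    import cv2
    import numpy as np

    # Hypothetical file names; substitute the dataset's actual paths.
    rgb = cv2.imread("rgb_0001.jpg")                               # 480x640 color image
    depth = cv2.imread("caldepth_0001.png", cv2.IMREAD_UNCHANGED)  # 320x240 calibrated depth

    # The calibrated depth is aligned with the half-sized color image,
    # so downscale the RGB frame to 320x240 before overlaying.
    rgb_half = cv2.resize(rgb, (320, 240), interpolation=cv2.INTER_AREA)

    # Normalize depth to 8-bit for visualization and blend with the color image.
    d8 = cv2.normalize(depth.astype(np.float32), None, 0, 255,
                       cv2.NORM_MINMAX).astype(np.uint8)
    overlay = cv2.addWeighted(rgb_half, 0.6,
                              cv2.applyColorMap(d8, cv2.COLORMAP_JET), 0.4, 0)
    cv2.imwrite("overlay_0001.png", overlay)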
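The skeleton files can be parsed with a few lines of code. The sketch below follows the row format described in (d): three header fields followed by 60 floats (x, y, z for joints 1-20). It assumes the fields are separated by whitespace and/or commas (verify against the actual files); the function name parse_skeleton_file is hypothetical.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class SkeletonFrame:
        frame_number: int
        frame_count: int
        skeleton_id: int
        joints: List[Tuple[float, float, float]]  # 20 (x, y, z) triples, in meters

    def parse_skeleton_file(path: str) -> List[SkeletonFrame]:
        """Parse a skeleton .txt file: one frame per row, 3 header fields
        followed by 60 floats (x, y, z for joints 1-20)."""
        frames = []
        with open(path) as f:
            for line in f:
                # Accept either comma- or whitespace-separated fields (assumption).
                fields = line.replace(",", " ").split()
                if len(fields) < 63:   # 3 header fields + 20 joints * 3 coords
                    continue           # skip malformed or empty rows
                vals = [float(v) for v in fields[:63]]
                xyz = vals[3:63]
                joints = [(xyz[i], xyz[i + 1], xyz[i + 2])
                          for i in range(0, 60, 3)]
                frames.append(SkeletonFrame(int(vals[0]), int(vals[1]),
                                            int(vals[2]), joints))
        return frames

Each call returns one SkeletonFrame per row, from which per-joint trajectories over a sequence can be assembled.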
If you make use of the UTKinect-FirstPerson dataset in any form, please cite the following reference:

@inproceedings{xia2015robot,
      title={Robot-centric activity recognition from first-person rgb-d videos},
      author={Xia, Lu and Gori, Ilaria and Aggarwal, J.K. and Ryoo, Michael S},
      booktitle={Applications of Computer Vision (WACV), 2015 IEEE Winter Conference on},
      pages={357--364},
      year={2015},
      organization={IEEE}
}

If you have any problems, questions, or suggestions regarding the dataset, please contact Lu Xia.