Current Research

Human Activity Recognition
The goal of this project is to construct a general methodology for recognizing human activities. We focus especially on semantic-level analysis, aiming to recognize high-level activities with complex temporal, spatial, and logical structure.



Human-Human Interactions
We have developed a new language-based approach for recognizing high-level interactions between two persons. A human activity is represented by decomposing it into multiple sub-events and specifying the necessary relationships among them (temporal, spatial, and logical), and it is recognized by matching this representation against input videos. A sub-event may in turn be composed of multiple sub-events of its own, capturing the hierarchical structure of human activities. As a result, continued and recursive activities of two persons, such as 'fighting', 'greeting', 'assault', and 'pursuit', are recognized.

M. S. Ryoo and J. K. Aggarwal, "Recognition of Composite Human Activities through Context-Free Grammar based Representation", Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 2, pp. 1709-1719, New York, NY, 2006.
M. S. Ryoo and J. K. Aggarwal, "Semantic Understanding of Continued and Recursive Human Activities", Proceedings of 18th International Conference on Pattern Recognition (ICPR), Vol. 1, pp. 379-382, Hong Kong, 2006.
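
The decomposition idea can be illustrated with a minimal Python sketch. It is not the representation language used in the papers above. It assumes that atomic sub-events have already been detected as frame intervals, and it checks only a few temporal relationships (spatial and logical ones are omitted). The names `Activity`, `PREDICATES`, and `recognize`, as well as the sub-event names and frame numbers in the example, are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start_frame, end_frame), both inclusive

# A few interval predicates used as temporal relationships (illustrative subset).
PREDICATES = {
    "before": lambda a, b: a[1] < b[0],
    "during": lambda a, b: b[0] <= a[0] and a[1] <= b[1],
    "overlaps": lambda a, b: a[0] < b[0] <= a[1] < b[1],
}


@dataclass
class Activity:
    """A composite activity: named sub-events plus required relationships."""
    sub_events: List[str]
    # Each relation is (predicate name, index of first sub-event, index of second).
    relations: List[Tuple[str, int, int]]


def recognize(name: str,
              definitions: Dict[str, Activity],
              detections: Dict[str, List[Interval]]) -> List[Interval]:
    """Return the intervals in which `name` occurs.

    Atomic events are looked up in `detections`; composite activities are
    matched by trying every combination of their sub-events' intervals.
    Definitions that refer to themselves are NOT handled by this sketch.
    """
    if name in detections:
        return list(detections[name])
    if name not in definitions:
        return []
    activity = definitions[name]
    candidates = [recognize(s, definitions, detections)
                  for s in activity.sub_events]
    matches = []
    for combo in product(*candidates):
        if all(PREDICATES[p](combo[i], combo[j])
               for p, i, j in activity.relations):
            # The composite activity spans all of its matched sub-events.
            matches.append((min(s for s, _ in combo),
                            max(e for _, e in combo)))
    return matches


# Hypothetical example: 'greeting' as an approach followed by a handshake.
detections = {"approach": [(0, 40)], "shake_hands": [(45, 80)]}
definitions = {
    "greeting": Activity(["approach", "shake_hands"], [("before", 0, 1)]),
}
print(recognize("greeting", definitions, detections))  # [(0, 80)]
```

Because `recognize` calls itself on each sub-event, a composite activity can serve as a sub-event of a larger one, which mirrors the hierarchical structure described above.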

Human-Object Interactions
The framework is extended to recognize interactions between humans and multiple objects. Human activities involving objects, such as "a person stealing another's suitcase", are recognized by considering the objects and their motion. The system can probabilistically compensate for the failure of its components (object recognition, for example), which makes recognition more reliable.

M. S. Ryoo and J. K. Aggarwal, "Hierarchical Recognition of Human Activities Interacting with Objects", Proceedings of 2nd International Workshop on Semantic Learning Applications in Multimedia (SLAM), in conjunction with CVPR, Minneapolis, MN, June 2007.

Fence Climbing
We propose the concept of stable contact for human motion analysis. Simply speaking, we first find extreme points on the human contour. Extreme points that do not move (within a maximum position deviation w) for a long enough period (at least a minimum duration threshold t) are detected as stable contacts. We use this concept to characterize walking, running, climbing (fence, rock), etc. The rationale is simple: in walking, the number of stable contacts alternates between 1 and 2; in running, it is usually 0 or 1; and in climbing, it may reach 3 or more.

Elden Yu, J. K. Aggarwal: Detection of Fence Climbing from Monocular Video. ICPR (1) 2006: 375-378
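
The stable-contact idea can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the method of the paper above: the extreme points are assumed to be given per frame, a point counts as stable if every one of the previous t - 1 frames contains an extreme point within distance w of it, and the final decision rule is a crude reading of the description. The function names `count_stable_contacts` and `classify_motion` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def count_stable_contacts(frames: Sequence[Sequence[Point]],
                          w: float, t: int) -> List[int]:
    """Count stable contacts in each frame.

    frames: extreme points of the human contour, one list per frame.
    w: maximum position deviation (same units as the points).
    t: minimum duration in frames.
    Frames with fewer than t - 1 predecessors get a count of 0.
    """
    counts = []
    for i, points in enumerate(frames):
        if i + 1 < t:
            counts.append(0)
            continue
        window = frames[i - t + 1:i]  # the t - 1 frames before frame i
        stable = 0
        for p in points:
            # p is stable if each earlier frame has a point within w of it.
            if all(any(math.dist(p, q) <= w for q in prev) for prev in window):
                stable += 1
        counts.append(stable)
    return counts


def classify_motion(counts: Sequence[int]) -> str:
    """Crude rule (an assumption): decide from the largest count observed."""
    peak = max(counts, default=0)
    if peak >= 3:
        return "climbing"
    if peak == 2:
        return "walking"
    return "running"


# Two fixed points for three frames, then one of them moves away.
frames = [[(0, 0), (10, 0)]] * 3 + [[(0, 0), (13, 0)]]
counts = count_stable_contacts(frames, w=1.0, t=3)
print(counts)                   # [0, 0, 2, 1]
print(classify_motion(counts))  # walking
```

A real system would need to track extreme points across frames and tolerate contour noise; the per-frame nearest-point check here is only the simplest stand-in for that step.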

Human Computer Interaction



Intelligent Workspaces
We constructed an intelligent environment which visually observes tasks of users to help the users complete their tasks. The system is designed to analyze the status of ongoing tasks and to generate appropriate feedback guiding the user.

M. S. Ryoo and J. K. Aggarwal, "Robust Human-Computer Interaction System Guiding a User by Providing Feedback", Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007.

Content-Based Image Retrieval



CIRES: Image Similarity Search Engine
Content-based image retrieval

Qasim Iqbal, Jake K. Aggarwal: Content-Based Retrieval in Digital Image Databases using Structure, Color and Texture. PRIS 2004: 1-2
Qasim Iqbal, Jake K. Aggarwal: Image Retrieval via Isotropic and Anisotropic Mappings, IAPR Workshop on Pattern Recognition in Information Systems (PRIS 2001), July 6-8, 2001, Setubal, Portugal, pages 34-49.
Qasim Iqbal, Jake K. Aggarwal: Perceptual Grouping for Image Retrieval and Classification, 3rd IEEE Computer Society Workshop on Perceptual Organization in Computer Vision, July 8, 2001, Vancouver, Canada, pp. 19.1-19.4.

Tracking



Temporal Spatio-Velocity (TSV) Transform
The temporal spatio-velocity (TSV) transform extracts pixel velocities from image sequences and groups pixels with similar velocities into blobs. As a result, blobs are segmented even under severe occlusion, based on their velocity information, and are tracked throughout the sequence. TSV has been applied to tracking pedestrians, cars, and soccer players.

Koichi Sato, J. K. Aggarwal: Temporal spatio-velocity transform and its application to tracking and interaction. Computer Vision and Image Understanding 96(2): 100-128 (2004)