
Academia Sinica, Institute of Information Science



Seminar


Self-Supervised Learning of 3D Human Pose; Recovering full 3D from Partial views using Optical Flow and Capsule Networks

  • Speaker: Chinghang Chen, Research Scientist (Amazon)
    Host: Hong-Yuan Mark Liao
  • Time: 2023-11-27 (Mon.) 10:00 ~ 12:00
  • Venue: Auditorium 101, New IIS Building
Abstract
Can 3D human pose be learned from 2D poses alone? We tackle this with a self-supervised learning approach that recovers 3D human pose from 2D skeletal joints extracted from a single image. Without multi-view image data or 2D-3D point correspondences, a lifting network accepts 2D landmarks as input and predicts a 3D skeleton. During training, the recovered 3D skeleton is reprojected onto random camera viewpoints to generate new 'synthetic' 2D poses. By lifting the synthetic 2D poses back to 3D and re-projecting them into the original camera view, we can define a self-consistency loss in both 3D and 2D; training can thus be self-supervised by exploiting the geometric consistency of the lift-reproject-lift process. Self-consistency alone is not sufficient to generate realistic skeletons, but adding a 2D pose discriminator enables the lifter to output valid 3D poses. Additionally, to learn from 2D poses 'in the wild', an unsupervised 2D domain adapter network is introduced to expand the available 2D data. This improves results and demonstrates the usefulness of 2D pose data for self-supervised 3D lifting.
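The lift-reproject-lift cycle above can be sketched in a few lines of NumPy. This is a minimal illustration, not the speaker's model: the "lifter" here is a placeholder that attaches an assumed constant depth to each joint, the projection is assumed orthographic, and the random viewpoint is a rotation about the vertical axis, so only the geometry of the consistency losses is shown.

```python
# Hedged sketch of the lift-reproject-lift self-consistency losses from the
# abstract. The lifter and projection model are placeholder assumptions.
import numpy as np

def lift(pose_2d, depth=2.0):
    """Toy 'lifting network': attach an assumed constant depth z to each joint."""
    z = np.full((pose_2d.shape[0], 1), depth)
    return np.hstack([pose_2d, z])

def project(pose_3d):
    """Orthographic projection (an assumption): drop the z coordinate."""
    return pose_3d[:, :2]

def random_rotation_y(rng):
    """Rotation about the vertical axis, standing in for a random camera viewpoint."""
    theta = rng.uniform(-np.pi, np.pi)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def self_consistency_losses(pose_2d, rng):
    """One lift -> rotate -> project -> lift -> rotate-back cycle.

    Returns (loss_3d, loss_2d): mean squared discrepancy between the original
    and the cycled skeleton, in 3D and after reprojection to the original view."""
    pose_3d = lift(pose_2d)                  # 2D -> 3D
    R = random_rotation_y(rng)
    synth_2d = project(pose_3d @ R.T)        # 'synthetic' 2D pose in a new view
    cycled_3d = lift(synth_2d) @ R           # lift again, rotate back (R.T applied)
    loss_3d = np.mean((cycled_3d - pose_3d) ** 2)
    loss_2d = np.mean((project(cycled_3d) - pose_2d) ** 2)
    return loss_3d, loss_2d

rng = np.random.default_rng(0)
skeleton_2d = rng.standard_normal((17, 2))   # 17 joints, e.g. a COCO-style skeleton
l3d, l2d = self_consistency_losses(skeleton_2d, rng)
print(l3d, l2d)
```

In the full method these losses would be backpropagated through a learned lifter; with this placeholder lifter they simply measure how far the fixed-depth assumption is from being cycle-consistent.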
Humans can infer the full 3D shape of an object even from a partial view, using learned priors. Single-image 3D reconstruction approaches similarly attempt to learn shape priors to recover 3D shape from supervised data. These approaches struggle to generalize to unseen classes and different domains, since they are primarily trained on synthetic data covering a limited number of classes. We propose a network that takes optical flow as input to recover partial 3D structure and uses a capsule network to complete the partial 3D shapes into full 3D shapes. This approach improves generalization to unseen classes and can account for domain shifts during inference.
BIO
Chinghang Chen is an applied scientist at Amazon, focusing on computer vision and machine learning. He graduated from the MSCV program at Carnegie Mellon University, where he was advised by Deva Ramanan. Earlier, he was a research assistant at the Institute of Information Science, Academia Sinica, advised by Hong-Yuan Mark Liao. His research interests include vision-based 3D reconstruction, neural architecture search (NAS), efficient neural networks, and object detection.