February 23, 2020

Deep learning based human pose estimation

Pose estimation uses computer vision to detect the position and orientation of an object. This usually means detecting keypoint locations that describe the object. In face pose estimation (also known as facial landmark detection), for example, we detect landmarks on a human face. A related task is head pose estimation, where we use the facial landmarks to obtain the 3D orientation of a human head with respect to the camera.

In this article, we will focus on human pose estimation, where the goal is to detect and localize the major parts and joints of the body (e.g. shoulders, ankles, knees, wrists). Remember the scene where Tony Stark puts on the Iron Man suit using gestures? If such a suit is ever built, it will require human pose estimation! For the purpose of this article, though, we will tone down our ambition a tiny bit and solve the simpler problem of detecting keypoints on the body. A typical output of a pose detector looks as shown below:

Figure 1 : Sample Skeleton output of pose estimation. Image Credit: Oliver Sjöström, Instagram: @ollivves, Website: https://ollivves.com

Keypoint detection datasets

Until recently, progress in pose estimation was held back by the lack of high-quality datasets. Such is the enthusiasm in AI these days that problems that once seemed out of reach are now being tackled. Several new datasets have been released in the last few years, making it easier for researchers to attack the problem with all their intellectual might.

Some of these datasets are:

COCO Keypoints challenge
MPII Human Pose Dataset
VGG Pose Dataset

In short, the more annotated images a model is trained on, the better it tends to perform.
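To make the idea of a keypoint annotation concrete, here is a minimal sketch of how a COCO-style person annotation can be decoded. COCO stores the 17 body keypoints as one flat list of (x, y, v) triplets, where v is 0 for "not labeled", 1 for "labeled but not visible" and 2 for "labeled and visible". The helper name `decode_keypoints` and the sample values are made up for illustration; they do not come from the dataset or its tools.

```python
# Keypoint order used by the COCO keypoints annotations.
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def decode_keypoints(flat):
    """Turn a flat [x1, y1, v1, x2, y2, v2, ...] list into
    {name: (x, y, v)}, keeping only keypoints that were labeled (v > 0)."""
    if len(flat) != 3 * len(COCO_KEYPOINT_NAMES):
        raise ValueError("expected 17 (x, y, v) triplets")
    labeled = {}
    for i, name in enumerate(COCO_KEYPOINT_NAMES):
        x, y, v = flat[3 * i: 3 * i + 3]
        if v > 0:
            labeled[name] = (x, y, v)
    return labeled


# Hypothetical annotation: only the nose (visible) and the
# left shoulder (labeled but occluded) were annotated.
example = [0, 0, 0] * 17
example[0:3] = [120, 80, 2]      # nose is keypoint 0
example[15:18] = [100, 140, 1]   # left_shoulder is keypoint 5
decoded = decode_keypoints(example)
print(decoded)
```

A pose estimation model trained on such data learns to predict these (x, y) locations directly from the image.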

Multi-person pose estimation model

The model used in this tutorial is based on a paper titled Multi-Person Pose Estimation by the Perceptual Computing Lab at Carnegie Mellon University. The authors of the paper train very deep neural networks for this task. Let’s briefly go over the architecture before we explain how to use the pre-trained model.

The model takes as input a color image of size w × h and produces, as output, the 2D locations of keypoints for each person in the image. The detection takes place in three stages:

Stage 1: The first 10 layers of the VGGNet are used to create feature maps for the input image.

Stage 2: A two-branch, multi-stage CNN is used. The first branch predicts a set of 2D confidence maps (S) of body part locations (e.g. elbow, knee). Shown below are the confidence maps and affinity maps for the Left Shoulder keypoint.

Figure 3 : Showing confidence maps for Left Shoulder for the given image
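A confidence map becomes useful once it is turned into candidate keypoint locations. The sketch below shows one simple way to do that: keep every cell whose score is above a threshold and strictly higher than its four neighbours. This is a simplified illustration, not the authors' implementation; the function name `find_peaks`, the threshold of 0.1 and the toy map are assumptions made for the example.

```python
def find_peaks(conf_map, threshold=0.1):
    """Return (x, y, score) for every local maximum above `threshold`.

    `conf_map` is a 2D list indexed as conf_map[y][x]. A cell counts as a
    peak if it is strictly greater than its up/down/left/right neighbours.
    """
    height, width = len(conf_map), len(conf_map[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            score = conf_map[y][x]
            if score <= threshold:
                continue
            neighbours = [
                conf_map[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width
            ]
            if all(score > n for n in neighbours):
                peaks.append((x, y, score))
    return peaks


# Toy 5x5 "Left Shoulder" map with two blobs, i.e. two people.
toy_map = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.1, 0.3, 0.0],
    [0.0, 0.0, 0.3, 0.8, 0.2],
    [0.0, 0.0, 0.0, 0.2, 0.0],
]
peaks = find_peaks(toy_map)
print(peaks)  # [(1, 1, 0.9), (3, 3, 0.8)]
```

Each peak is one candidate for the body part, so an image with several people yields several peaks per confidence map.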

The second branch predicts a set of 2D vector fields (L) of part affinities, which encode the degree of association between parts. The figure below shows the part affinity between the Neck and Left Shoulder.

Figure 4 : Showing Part Affinity maps for Neck – Left Shoulder pair for the given image
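To see how a part affinity field encodes "degree of association", consider scoring a candidate Neck to Left Shoulder connection. One common way is to sample points along the line between the two candidates and average the dot product of the field with the unit vector of that line; a field that points along the limb gives a high score. The code below is a minimal sketch of that idea under simplifying assumptions (nearest-cell sampling, a hand-made toy field); `paf_score` and its parameters are hypothetical names, not part of any library.

```python
import math


def paf_score(paf_x, paf_y, start, end, num_samples=10):
    """Average alignment between the affinity field and the segment start->end.

    `paf_x` and `paf_y` hold the x and y components of the vector field,
    indexed as [y][x]. `start` and `end` are (x, y) keypoint candidates.
    """
    (x1, y1), (x2, y2) = start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector along the candidate limb
    total = 0.0
    for i in range(num_samples):
        t = i / max(num_samples - 1, 1)
        px = int(round(x1 + t * dx))   # nearest grid cell on the segment
        py = int(round(y1 + t * dy))
        total += paf_x[py][px] * ux + paf_y[py][px] * uy
    return total / num_samples


# Toy 5x5 field: vectors point in +x along row 2, zero elsewhere.
paf_x = [[1.0 if y == 2 else 0.0 for _ in range(5)] for y in range(5)]
paf_y = [[0.0] * 5 for _ in range(5)]

neck = (0, 2)
aligned = paf_score(paf_x, paf_y, neck, (4, 2))     # follows the field
misaligned = paf_score(paf_x, paf_y, neck, (4, 0))  # leaves the field
reversed_dir = paf_score(paf_x, paf_y, (4, 2), neck)
print(aligned, misaligned, reversed_dir)
```

Because the field has a direction, the reversed pair scores negative, which is what lets the field distinguish which shoulder belongs to which neck when several people are in the image.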
