Antal.Ai: Innovative AI Solutions & Computer Vision Expertise

FACIAL LANDMARK DETECTION

Brief Introduction

This solution is built on specially trained dlib shape predictors with an inference time of 1-2 ms and an average error of 1-5 pixels.
They locate facial landmark points, eye points, and iris and even pupil contour points with high accuracy.

The example below shows how the system fits landmark points to the pupil. The person in the video suffers from a medical condition that makes his right pupil much larger than his left. This video shows how accurate my system can be.

Measuring pupil dilation can be used to improve emotion recognition or to measure cognitive workload.

This can be used in driver monitoring systems as well as in user experience tests.

Technical details

Inputs (video or image sources the system can process):

mjpeg stream
rtsp stream
USB camera devices
video files (avi, mp4, mkv formats supported)
standalone image files (.png, .jpg formats supported)
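
The input kinds listed above could be told apart from a single command-line argument. The following is a minimal sketch under stated assumptions, not the system's actual code: the names InputKind and classifyInput are hypothetical, an all-digit argument is assumed to be a USB camera index, and any http(s) URL is assumed to be an MJPEG stream.

```cpp
#include <algorithm>
#include <cctype>
#include <initializer_list>
#include <string>

// Hypothetical classification of an input argument (illustrative names only).
enum class InputKind { MjpegStream, RtspStream, UsbCamera, VideoFile, ImageFile, Unknown };

// Lower-case copy so that scheme and extension checks are case-insensitive.
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

static bool startsWith(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

static bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

InputKind classifyInput(const std::string& arg) {
    const std::string s = toLower(arg);
    if (s.empty()) return InputKind::Unknown;

    // Assumption: an argument made only of digits is a USB camera index.
    if (std::all_of(s.begin(), s.end(),
                    [](unsigned char c) { return std::isdigit(c) != 0; })) {
        return InputKind::UsbCamera;
    }
    if (startsWith(s, "rtsp://")) return InputKind::RtspStream;
    // Assumption: any http(s) URL is treated as an MJPEG stream.
    if (startsWith(s, "http://") || startsWith(s, "https://")) return InputKind::MjpegStream;

    // Container and image formats named in the list of supported inputs.
    for (const char* ext : {".avi", ".mp4", ".mkv"}) {
        if (endsWith(s, ext)) return InputKind::VideoFile;
    }
    for (const char* ext : {".png", ".jpg"}) {
        if (endsWith(s, ext)) return InputKind::ImageFile;
    }
    return InputKind::Unknown;
}
```

For example, classifyInput("0") yields InputKind::UsbCamera and classifyInput("rtsp://host/stream") yields InputKind::RtspStream. OpenCV's cv::VideoCapture can be constructed either from a device index or from a file name or URL, so a classification like this mainly decides which constructor to call and whether a still image should be read instead.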

Outputs:

Processed video frame
The faces in the frame (bounding boxes)
For each face:
- Unique Tracking ID (when processing a video file, the same ID on each frame belongs to the same person)
- 5 basic facial landmark points
- 14 other facial landmark points
- 4 eyelid landmark points
- 4 iris landmark points
- 4 pupil landmark points
The system can also write the processed video to a video file.
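
The per-frame outputs listed above could be held in a structure like the one below. This is only a sketch of one possible layout: all type and field names (Point2f, BoundingBox, FaceResult, FrameResult, findFace) are hypothetical and do not come from the system itself. The landmark groups are stored as vectors so that the sketch does not hard-code any point counts.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical 2D point in pixel coordinates.
struct Point2f {
    float x = 0.0f;
    float y = 0.0f;
};

// Hypothetical axis-aligned face bounding box in pixel coordinates.
struct BoundingBox {
    int x = 0;
    int y = 0;
    int width = 0;
    int height = 0;
};

// Everything reported for one face in one frame.
struct FaceResult {
    std::uint64_t trackingId = 0;           // same ID on each frame = same person
    BoundingBox box;                        // face location in the frame
    std::vector<Point2f> basicLandmarks;    // basic facial landmark points
    std::vector<Point2f> otherLandmarks;    // other facial landmark points
    std::vector<Point2f> eyelidLandmarks;   // eyelid landmark points
    std::vector<Point2f> irisLandmarks;     // iris landmark points
    std::vector<Point2f> pupilLandmarks;    // pupil landmark points
};

// All faces found in a single processed frame.
struct FrameResult {
    std::vector<FaceResult> faces;
};

// Looks up a face by tracking ID; returns nullptr if that person is not in the frame.
const FaceResult* findFace(const FrameResult& frame, std::uint64_t trackingId) {
    for (const FaceResult& face : frame.faces) {
        if (face.trackingId == trackingId) {
            return &face;
        }
    }
    return nullptr;
}
```

Because the tracking ID stays the same across frames of a video, a lookup such as findFace(frame, id) is enough to follow one person's landmarks from frame to frame.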

Face landmark detector:

Average sample error: 5.58927 pixels
Inference time using CPU: 2 ms (on an HP Laptop 15-DA0042NH with an Intel(R) Core(TM) i7-8550U CPU)
Returns these points:
0. the tip of the nose
1. right corner of right eye
2. upper eyelid of right eye
3. left corner of right eye
4. lower eyelid of right eye
5. center of right iris
6. right corner of left eye
7. upper eyelid of left eye
8. left corner of left eye
9. lower eyelid of left eye
10. center of left iris
11. right corner of the mouth
12. upper part of the lip
13. left corner of the mouth
14. lower part of the lip
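
The point order above can be mirrored in code so that callers use names instead of raw indices. The sketch below assumes the 15 points arrive as a fixed-size array in exactly this order; the enum, the type names, and the two helper functions are hypothetical and only illustrate how the indices might be used.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical 2D point in pixel coordinates.
struct Point2f {
    float x;
    float y;
};

// Indices follow the order of the face landmark list (0 to 14).
enum FaceLandmark : std::size_t {
    NoseTip = 0,
    RightEyeRightCorner = 1,
    RightEyeUpperLid = 2,
    RightEyeLeftCorner = 3,
    RightEyeLowerLid = 4,
    RightIrisCenter = 5,
    LeftEyeRightCorner = 6,
    LeftEyeUpperLid = 7,
    LeftEyeLeftCorner = 8,
    LeftEyeLowerLid = 9,
    LeftIrisCenter = 10,
    MouthRightCorner = 11,
    UpperLip = 12,
    MouthLeftCorner = 13,
    LowerLip = 14,
    FaceLandmarkCount = 15
};

using FaceLandmarks = std::array<Point2f, FaceLandmarkCount>;

// Euclidean distance between two points, in pixels.
float pointDistance(const Point2f& a, const Point2f& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Distance between the two iris centers, in pixels.
float interIrisDistance(const FaceLandmarks& pts) {
    return pointDistance(pts[RightIrisCenter], pts[LeftIrisCenter]);
}

// Lip gap divided by mouth width; returns 0 when the mouth width is zero.
float mouthOpenRatio(const FaceLandmarks& pts) {
    const float width = pointDistance(pts[MouthRightCorner], pts[MouthLeftCorner]);
    if (width <= 0.0f) {
        return 0.0f;
    }
    return pointDistance(pts[UpperLip], pts[LowerLip]) / width;
}
```

Dividing the lip gap by the mouth width makes the second measure independent of how large the face appears in the image.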

Eye landmark detector:

Average sample error: 1.52895 pixels
Inference time using CPU: 2 ms (on an HP Laptop 15-DA0042NH with an Intel(R) Core(TM) i7-8550U CPU)
Returns these points:
0. right corner of the eye
1. upper eyelid
2. left corner of the eye
3. lower eyelid
4. right side of the iris
5. upper part of the iris
6. left side of the iris
7. lower part of the iris
8. right side of the pupil
9. upper part of the pupil
10. left side of the pupil
11. lower part of the pupil
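
With four iris points and four pupil points per eye, pupil dilation can be estimated as a pupil-to-iris diameter ratio. The sketch below is one possible way to compute it and is not necessarily how the system does it: the enum and function names are hypothetical, the 12 points are assumed to arrive in the order listed above, and each diameter is approximated as the mean of the horizontal and vertical extents.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical 2D point in pixel coordinates.
struct Point2f {
    float x;
    float y;
};

// Indices follow the order of the eye landmark list (0 to 11).
enum EyeLandmark : std::size_t {
    EyeRightCorner = 0,
    UpperEyelid = 1,
    EyeLeftCorner = 2,
    LowerEyelid = 3,
    IrisRight = 4,
    IrisTop = 5,
    IrisLeft = 6,
    IrisBottom = 7,
    PupilRight = 8,
    PupilTop = 9,
    PupilLeft = 10,
    PupilBottom = 11,
    EyeLandmarkCount = 12
};

using EyeLandmarks = std::array<Point2f, EyeLandmarkCount>;

// Euclidean distance between two points, in pixels.
float pointDistance(const Point2f& a, const Point2f& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Approximates a diameter as the mean of the horizontal and vertical extents.
float meanDiameter(const EyeLandmarks& pts, EyeLandmark right, EyeLandmark top,
                   EyeLandmark left, EyeLandmark bottom) {
    return 0.5f * (pointDistance(pts[right], pts[left]) +
                   pointDistance(pts[top], pts[bottom]));
}

// Pupil diameter divided by iris diameter; returns 0 if the iris is degenerate.
float pupilToIrisRatio(const EyeLandmarks& pts) {
    const float iris = meanDiameter(pts, IrisRight, IrisTop, IrisLeft, IrisBottom);
    if (iris <= 0.0f) {
        return 0.0f;
    }
    const float pupil = meanDiameter(pts, PupilRight, PupilTop, PupilLeft, PupilBottom);
    return pupil / iris;
}
```

A ratio is used instead of the raw pupil diameter in pixels because the pixel size of the eye changes with the distance to the camera, while the iris diameter of a given person changes very little. Comparing the ratio between the two eyes, or over time for one eye, is then meaningful without camera calibration.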

The demo video was recorded on an HP Laptop 15-DA0042NH (Processor: Intel(R) Core(TM) i7-8550U CPU, RAM: 8 GB).
The system used 600 MB of RAM, and CPU usage was 50% during the recording.
The input video was captured using a Xiaomi CMSXJ22A web camera. The input resolution was 1080p.
During recording, the system ran at about 55 FPS. When processing a single face, the system can maintain this speed on this hardware. When processing multiple faces, the system may be slower. The visualization was added to the video afterwards; it can also be rendered live, but this may slow down processing.

The system is written entirely in C++ and uses the following libraries/technologies:

OpenCV
Dlib