Dlib Converter 2.0: Solving Face Normalization Through Classification
Can we use dlib to classify the face of our Mindtwins with a high enough degree of accuracy to build a library of commonalities?
I need to answer these questions on a single frame:
Whose face is it?
What percentage is the mouth open (and likewise eyes open, eyebrows up, mouth width, etc.), based on an existing library of data on that person?
Whose face is it?
I set out to build a classifier for dlib that would help better understand the data in a given frame. There is a lot of prior research on this, so I decided to lean on a few common approaches as well as try a few undocumented, experimental ones.
I wanted to be able to run multiple tests and variations on classification concepts in order to evaluate and score the results. Not every approach is stable in every case.
Consider measuring the distance between the eyes, for example. That measurement will work well if the face is always frontal, but the moment the person’s head turns laterally, the measurement is compromised. That’s assuming the points are correctly placed to begin with, of course.
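To make the head-turn problem concrete, here is a minimal sketch of how the apparent eye distance shrinks with yaw. It assumes an orthographic projection (no perspective) and a made-up true distance of 60 units; the function name is hypothetical and not part of the converter.

```python
import math

def projected_eye_distance(true_distance, yaw_degrees):
    """Apparent inter-eye distance in the image for a head turned by yaw_degrees.

    Assumes an orthographic projection and eyes that lie on a line
    perpendicular to the camera axis when the head is frontal.
    """
    return true_distance * math.cos(math.radians(yaw_degrees))

# Print the apparent distance for a few head turns.
for yaw in (0, 15, 30, 45):
    print(yaw, round(projected_eye_distance(60.0, yaw), 1))
```

Under these assumptions, a 30 degree turn already shortens the measurement by about 13%, which is why a raw eye distance is only trustworthy on frontal frames.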
Why not use an existing open source solution?
The needs of this classifier are pretty specific and twofold. First, we need to identify a face to know which calibration library to use - except we actually always know who is in the shot, so identifying a face only really serves as a proof of concept of the classifier. Second, if the classifier is successful, meaning that we have greater than 99% accuracy (we do), then we take it to the next level to solve the actual need set forth by this initiative: measuring the bounds of the face using known data.
How do camera distance and lens focal length affect dlib calculations?
Same person, different lenses
In the above image, we have Reggie with two different lens configurations. I don’t know which lens was used in either shot, but it’s clear that a wider lens was used for the image on the right and a longer lens for the image on the left. You can tell from the bulbous look of Reggie’s face on the right, compared to the left. They almost look like two different people.
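The effect can be sketched with a simple pinhole camera model. The head geometry below (eyes 64 mm apart, ears 150 mm apart and 100 mm farther from the camera) and both camera setups are made-up illustrative numbers, not measurements of Reggie or of the actual shots.

```python
def project_x(x_mm, z_mm, focal_mm):
    """Pinhole projection of a horizontal offset x at depth z."""
    return focal_mm * x_mm / z_mm

def eye_to_ear_ratio(camera_distance_mm, focal_mm):
    # Hypothetical head geometry: eyes 64 mm apart, ears 150 mm apart and
    # 100 mm farther from the camera than the eyes.
    eye_span = 2 * project_x(32.0, camera_distance_mm, focal_mm)
    ear_span = 2 * project_x(75.0, camera_distance_mm + 100.0, focal_mm)
    return eye_span / ear_span

# A wide lens used close up versus a long lens used from farther away.
close_wide = eye_to_ear_ratio(300.0, 24.0)
far_long = eye_to_ear_ratio(2000.0, 160.0)
print(round(close_wide, 3), round(far_long, 3))
```

In this model the same head gives an eye-to-ear ratio of roughly 0.57 up close and 0.45 from far away. The focal length cancels out of the ratio, so it is the camera distance that a wider lens encourages, rather than the lens itself, that changes the proportions.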
So how can we produce an accurate classification of facial measurements if the measurements change based on lenses?
One way would be to fit an existing 3D mesh of Reggie using a camera solver, like solvePnP. This work has been largely explored by Jesse and Murilo, I believe, so I don’t want to spend too much time exploring this.
Another (less accurate) solution would be to take lots of different measurements and produce a score that can be compared against a known mean.
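One way to turn "lots of measurements compared against a known mean" into a single score is a root-mean-square z-score per person. This is a sketch under that assumption, not necessarily how the converter scores frames; the function names and all sample values are invented for illustration.

```python
import math
import statistics

def fit_profile(samples):
    """Mean and standard deviation per measurement from a person's known clips."""
    profile = {}
    for name in samples[0]:
        values = [s[name] for s in samples]
        profile[name] = (statistics.mean(values), statistics.stdev(values))
    return profile

def distance_to_profile(measurements, profile):
    """Root-mean-square z-score: 0 means exactly at this person's mean."""
    z_squared = [
        ((measurements[name] - mean) / std) ** 2
        for name, (mean, std) in profile.items()
    ]
    return math.sqrt(sum(z_squared) / len(z_squared))

def classify(measurements, profiles):
    """Pick the person whose profile is closest to the measurements."""
    return min(profiles, key=lambda who: distance_to_profile(measurements, profiles[who]))

# Made-up measurements for two people, three known clips each.
profiles = {
    "person_a": fit_profile([
        {"t_ratio": 1.90, "eye_size_x": 0.30},
        {"t_ratio": 1.94, "eye_size_x": 0.32},
        {"t_ratio": 1.92, "eye_size_x": 0.31},
    ]),
    "person_b": fit_profile([
        {"t_ratio": 2.20, "eye_size_x": 0.26},
        {"t_ratio": 2.26, "eye_size_x": 0.27},
        {"t_ratio": 2.23, "eye_size_x": 0.25},
    ]),
}

frame = {"t_ratio": 1.95, "eye_size_x": 0.30}
print(classify(frame, profiles))
```

Dividing by the standard deviation keeps a measurement with a naturally large spread from dominating the score.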
Categorize, Normalize and Cluster
Since we have some information on the clips we record, I wanted to write a classifier that would leverage that information, but only if we knew we would always have it - like ‘who’ the Mindtwin belongs to. In the case of Mark, all clips are recorded by him. That isn’t the case with Reggie, although nearly all of them were recorded by him.
Classifications
t_ratio = Classification(None, "The ratio of the distance between the eyes over the ridge height")
eye_distance_spread = Classification(None, "The distance between both eyes, normalized by ridge height")
slope_norm_right = Classification(None, "The normalized slope of the right eye")
slope_norm_left = Classification(None, "The normalized slope of the left eye")
eye_slope = Classification(None, "Eye rotation, normalized. 0.5 = even")
nose_tri_distance = Classification(None, "The distance from each eye to the nose tip, averaged")
eye_size_x = Classification(None, "The width of the eyes, averaged")
ridge_height = Classification(None, "The height of the nose from the tip to the center of the eyes")
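As a rough illustration of how a few of these could be computed, the sketch below derives ridge_height, t_ratio, nose_tri_distance and eye_size_x from a list of 68 (x, y) points. It assumes the common 68-point landmark layout (eyes at indices 36-41 and 42-47, nose tip at 30) and uses eye centroids as the eye positions; the helper names are hypothetical and the Classification class itself is not reproduced here.

```python
import math

# Index ranges from the widely used 68-point landmark layout (an assumption).
RIGHT_EYE = range(36, 42)
LEFT_EYE = range(42, 48)
NOSE_TIP = 30

def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def centroid(points, indices):
    """Average position of the selected landmarks."""
    xs = [points[i][0] for i in indices]
    ys = [points[i][1] for i in indices]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def measure(points):
    right = centroid(points, RIGHT_EYE)
    left = centroid(points, LEFT_EYE)
    nose = points[NOSE_TIP]
    eye_mid = ((right[0] + left[0]) / 2, (right[1] + left[1]) / 2)
    ridge_height = dist(nose, eye_mid)
    return {
        "ridge_height": ridge_height,
        "t_ratio": dist(right, left) / ridge_height,
        "nose_tri_distance": (dist(right, nose) + dist(left, nose)) / 2,
        # Eye width measured corner to corner, averaged over both eyes.
        "eye_size_x": (dist(points[36], points[39]) + dist(points[42], points[45])) / 2,
    }

# Synthetic landmarks: only the indices used above carry meaningful values.
points = [(0.0, 0.0)] * 68
for i in RIGHT_EYE:
    points[i] = (30.0, 40.0)
for i in LEFT_EYE:
    points[i] = (70.0, 40.0)
points[36], points[39] = (20.0, 40.0), (40.0, 40.0)
points[42], points[45] = (60.0, 40.0), (80.0, 40.0)
points[NOSE_TIP] = (50.0, 70.0)

m = measure(points)
print(m)
```

With a real dlib shape, the points list could be built as [(p.x, p.y) for p in shape.parts()]. Dividing by ridge_height is what makes t_ratio independent of how large the face appears in the frame.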
https://en.wikipedia.org/wiki/K-means_clustering
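For reference, a bare-bones version of the k-means loop from the linked article looks like this. How clustering is actually wired into the classifier is not described here, so the two-feature points and the starting centroids below are purely illustrative.

```python
def kmeans(points, centroids, iterations=20):
    """Plain k-means on tuples of floats, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Made-up (t_ratio, eye_size_x) pairs for clips from two people.
points = [
    (1.90, 0.30), (1.94, 0.32), (1.92, 0.31),
    (2.20, 0.26), (2.26, 0.27), (2.23, 0.25),
]
centroids, clusters = kmeans(points, [points[0], points[3]])
print(centroids)
print([len(c) for c in clusters])
```

Because the features have very different ranges, normalizing them before clustering matters; otherwise the largest-valued measurement decides the clusters on its own.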
Results & More Results with Machine Learning
With the most basic of settings, when analyzing 1081 clips (all of Mark’s, and about 1000 of Obama), the classifier correctly identified 1052 clips and misclassified 29, for an accuracy of 97.32%. Before digging in further, why might 29 clips have failed?
Does including more dlib frames increase success?
Does smoothing the dlib points first increase success?
Does weighting the classification methods increase success?
Are any of the classification methods too unreliable to be considered?
Do any methods produce the exact same results as other methods?
What other methods might be considered?
Let’s use machine learning to find out…
We have, at the moment, 8 classification methods. The above results assume an equal weighting of all of those approaches, but they couldn’t possibly all be equally effective. So the next step is to narrow down the results of each method to identify which ones are more successful and which ones are less successful.
Classifications: ridge_height, eye_slope, eye_distance_spread, t_ratio, nose_tri_distance, eye_size_x, slope_norm_left, slope_norm_right
Total clips: 1081
Good clips: 1052
Bad clips: 29
Success: 97.3172987974%
Classifications: ridge_height
Total clips: 1081
Good clips: 1052
Bad clips: 29
Success: 97.3172987974%
Classifications: eye_slope
Total clips: 1081
Good clips: 1064
Bad clips: 17
Success: 98.4273820537%
Classifications: eye_distance_spread
Total clips: 1081
Good clips: 1052
Bad clips: 29
Success: 97.3172987974%
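Runs like the ones above can be produced by scoring the same clips once per method subset. The sketch below assumes each method votes for the person whose mean value is nearest and that the majority wins; that voting rule, the function names and the toy data are assumptions for illustration, not the converter's actual logic.

```python
from collections import Counter

def vote(clip, methods, means):
    """Each method votes for the person whose mean value is nearest."""
    votes = Counter(
        min(means, key=lambda who: abs(clip["values"][m] - means[who][m]))
        for m in methods
    )
    return votes.most_common(1)[0][0]

def evaluate(clips, methods, means):
    """Count correctly and incorrectly classified clips for a set of methods."""
    good = sum(vote(c, methods, means) == c["label"] for c in clips)
    return {
        "total": len(clips),
        "good": good,
        "bad": len(clips) - good,
        "success": 100.0 * good / len(clips),
    }

METHODS = ["ridge_height", "eye_slope", "t_ratio"]

# Made-up per-person means and four labelled clips.
means = {
    "a": {"ridge_height": 30.0, "eye_slope": 0.50, "t_ratio": 1.92},
    "b": {"ridge_height": 34.0, "eye_slope": 0.56, "t_ratio": 2.23},
}
clips = [
    {"label": "a", "values": {"ridge_height": 30.5, "eye_slope": 0.51, "t_ratio": 1.93}},
    {"label": "a", "values": {"ridge_height": 33.0, "eye_slope": 0.52, "t_ratio": 1.95}},
    {"label": "b", "values": {"ridge_height": 34.2, "eye_slope": 0.55, "t_ratio": 2.20}},
    {"label": "b", "values": {"ridge_height": 33.5, "eye_slope": 0.57, "t_ratio": 2.25}},
]

# Score all methods together, then each method on its own.
for subset in [METHODS] + [[m] for m in METHODS]:
    print(", ".join(subset), evaluate(clips, subset, means))
```

Comparing the single-method rows against the combined row shows which methods pull their weight, and two methods with identical good/bad counts on every run are candidates for being redundant.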
WIP - To Be Continued…
T Ratio
a_job_provides_a_sense_of_place (obama) Successfully classified as obama with 100% confidence
aardvark (obama) Successfully classified as obama with 87% confidence
******************************************************
aaron_hello (obama) Incorrectly classified as markiplier with 62% confidence
******************************************************
aaron_whats_up (obama) Successfully classified as obama with 100% confidence
are_you_alright (markiplier) Successfully classified as markiplier with 100% confidence
are_you_okay (markiplier) Successfully classified as markiplier with 100% confidence
awesome_2 (markiplier) Successfully classified as markiplier with 100% confidence
bring_it_on (markiplier) Successfully classified as markiplier with 100% confidence
bye (markiplier) Successfully classified as markiplier with 100% confidence