Do you know how face recognition works in Python, and how Python software development helps with face detection and recognition? Read this post to the end to find out.
In the last few years, we have seen explosive growth of Python in several fields: web application development, REST API creation, automation of various processes (both technical and non-technical), financial asset management, and so on. Apart from these, Python has also become a de facto standard for building applications with some sort of pattern recognition capability. Two examples of this type of application are vehicle number plate detection/reading and face detection/recognition.
Technology in these fields has come a long way, and even though I wouldn’t call it mature, it is certainly good enough to deliver high-quality results in certain situations. In this article, I evaluate a few such libraries and then conclude with a working example of facial recognition using Python.
The primary libraries that come to my mind when I consider the domain of facial detection and recognition (in Python) are OpenCV, face_recognition, and FaceNet. While all of them can use some form of deep learning (OpenCV, for example, can run TensorFlow models through its dnn module), the easiest to use is face_recognition. Of course, the list certainly doesn’t end here, but in my experience, quite a large number of face recognition solutions are based on OpenCV (or one of its variants).
Before we see the capabilities of OpenCV, let’s first take a look at its history.
So, let us consider OpenCV. This library has been around for a long time. Intel Research launched it in 1999 for use in some of its own projects and made it open source so that a community could grow around it and help take the initiative to the next level. It is essentially a C++ library, and most programmers who used it in its early days were very strong C++ buffs. However, that is no longer the case.
Python has made its way into this wonderful tool, and we now have Python modules for it (for both 2.x and 3.x); creating and manipulating images with them is almost child’s play (as long as the ‘child’ knows Python). For example, consider the case of breaking a video up into its individual frames. Here is a little Python script that does just that:
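A minimal sketch of such a script, assuming the opencv-python package (imported as cv2); the output folder name and the frame_path helper are hypothetical choices of mine. It opens the video with cv2.VideoCapture, reads it frame by frame, and writes each frame as a JPEG with cv2.imwrite:

```python
import os


def frame_path(output_dir: str, index: int) -> str:
    """Build a zero-padded output file name, e.g. frames/frame_00042.jpg."""
    return os.path.join(output_dir, f"frame_{index:05d}.jpg")


def extract_frames(video_path: str, output_dir: str = "frames") -> int:
    """Save every frame of video_path as a JPEG in output_dir and return how many were written."""
    # Requires the opencv-python package; imported here so frame_path works without it.
    import cv2

    os.makedirs(output_dir, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    if not capture.isOpened():
        raise IOError(f"Cannot open video file: {video_path}")

    count = 0
    try:
        while True:
            ok, frame = capture.read()  # ok becomes False when no more frames can be decoded
            if not ok:
                break
            cv2.imwrite(frame_path(output_dir, count), frame)
            count += 1
    finally:
        capture.release()  # free the file handle and decoder
    return count
```

Calling extract_frames("video.mp4"), where video.mp4 is a placeholder for your own file, fills a frames folder with one image per decoded frame and returns the frame count.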
To run the above program, you need an mp4 video file; replace the file name used in the script with the path to yours. You also need to install the OpenCV library (on Ubuntu or Debian, you can use apt-get; on CentOS, use yum). Then, you need to install the Python bindings for OpenCV, which you should be able to do using pip (the package is called opencv-python).
The next program I’m going to demonstrate is a bit more complex: it performs face identification using Python’s face_recognition module. First, install the module with the command ‘pip install face_recognition’. I would strongly suggest that you do this in a Python virtual environment. The ‘face_recognition’ module uses the dlib library, which is pretty decent as far as recognition accuracy is concerned. OpenCV’s accuracy is not as good; in my experience, you need to make sure that the lighting conditions and image quality are good if you want to get a match using OpenCV.
Before we get into the code, let us just go through the basics of how facial recognition is done using dlib.
First, we need to give the system some reference photographs of the person(s) we are trying to identify. How does it work behind the scenes? When you supply the program with sample photographs, it computes 128 measurements for the face of each individual whose photograph you have provided (dlib derives these measurements with a neural network). When you then provide a face that has to be matched against the samples, it computes the same 128 measurements for the face in the test photograph.
These 128 values make up the face’s encoding, which is stored as a NumPy array. Next, the program computes the Euclidean distance between the two encodings (the sample and the test photograph), a value that in practice usually falls between 0 and 1. A value of ‘0’ means an exact match, whereas a value close to ‘1’ means no match at all, so lower values are better matches. The cut-off you choose for accepting a match is called the threshold. Since the computation yields a single numeric value, it is up to the person trying to match images to decide whether she/he would like to consider that value a match.
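To make the distance concrete, here is a small NumPy sketch that compares one 128-value encoding against several known ones. As far as I know, it computes the same quantity as face_recognition.face_distance; the function name face_distances is hypothetical:

```python
import numpy as np


def face_distances(known_encodings, candidate_encoding) -> np.ndarray:
    """Euclidean distance from one encoding to each known encoding.

    known_encodings: sequence or array of shape (n, 128)
    candidate_encoding: array of shape (128,)
    Returns an array of n distances; 0.0 means identical encodings.
    """
    known = np.asarray(known_encodings, dtype=float)
    if known.size == 0:
        return np.empty(0)
    candidate = np.asarray(candidate_encoding, dtype=float)
    # Broadcasting subtracts the candidate from every row; the norm over axis 1
    # gives one distance per known encoding.
    return np.linalg.norm(known - candidate, axis=1)
```

For example, face_distances([enc_a, enc_b], enc_test) returns two distances, one per reference photograph. The names enc_a, enc_b, and enc_test are placeholders; with face_recognition installed, such arrays would come from face_recognition.face_encodings.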
In my experience, a distance below 0.4 may be considered a probable match. However, note that this method also yields (quite) a few false positives. For a person operating such a system, the program can be written so that matches with a distance greater than 0.4 are not listed. This allows the operator to concentrate on a small subset of images that are likely to match, making the task easier.
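The operator-facing filter described above can be sketched in a few lines of plain Python. The 0.4 cut-off is the figure from my experience rather than a library default (face_recognition’s own compare_faces uses a tolerance of 0.6), and the function name probable_matches is hypothetical:

```python
from typing import List, Sequence, Tuple


def probable_matches(
    names: Sequence[str], distances: Sequence[float], threshold: float = 0.4
) -> List[Tuple[str, float]]:
    """Return (name, distance) pairs whose distance does not exceed threshold, closest first."""
    # Anything above the threshold is dropped, so the operator never sees it.
    kept = [(name, float(dist)) for name, dist in zip(names, distances) if dist <= threshold]
    return sorted(kept, key=lambda pair: pair[1])
```

Lowering the threshold shrinks the list the operator has to review, at the cost of possibly hiding a genuine match.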
Well, enough talk; let us look at some code. We will go through the important lines in detail, so don’t worry. Implementations of some functions are not displayed here for brevity, but you should be able to understand what they do from their names and the explanations I provide.
In the above code, we store the face parameters in pickled files. That is not a good idea if you want to scale at some point; the idea here is to describe the process, and I find pickling a convenient method for demonstration. We get the unpickled data from the function “read_pickled_data” and iterate through its entries, each of which contains some metadata (such as the image name and the category/subcategory to which the image belongs) as well as the encoding of the image itself. The main part comes after this. We append the encodings to a list named “all_known_encodings”, which is then tested against the target “requested_encoding” (defined earlier in the code, which is not shown here for brevity). The line
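Since the implementation of “read_pickled_data” is not shown, here is a hypothetical version, together with a helper that splits each record into metadata and an encoding. The record layout (a list of dicts holding an "encoding" key next to fields such as "image_name" and "category") is an assumption for illustration, not the original format:

```python
import pickle
from typing import Any, Dict, List, Tuple

import numpy as np

Record = Dict[str, Any]


def read_pickled_data(path: str) -> List[Record]:
    """Hypothetical loader: return the list of records stored in a pickle file."""
    # pickle.load can execute arbitrary code; only load files you created yourself.
    with open(path, "rb") as handle:
        return pickle.load(handle)


def collect_encodings(records: List[Record]) -> Tuple[List[Record], List[np.ndarray]]:
    """Split records into metadata dicts and a parallel list of encodings."""
    metadata: List[Record] = []
    all_known_encodings: List[np.ndarray] = []
    for record in records:
        # Everything except the encoding is treated as metadata (image name, category, ...).
        metadata.append({key: value for key, value in record.items() if key != "encoding"})
        all_known_encodings.append(np.asarray(record["encoding"], dtype=float))
    return metadata, all_known_encodings
```

Used together, collect_encodings(read_pickled_data("faces.pkl")) would return the metadata and the “all_known_encodings” list in the same order; faces.pkl is a placeholder file name.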
It is the crux of the program: it determines how closely the requested image encoding matches each of the encodings in “all_known_encodings”. To understand the following lines, you should have some understanding of how NumPy works. In the subsequent lines, a list of “indices” is computed and iterated over, and each image in the list is given a score based on how similar its facial features are to the requested face. We consider only the top 5 images here, since the ones after them are unlikely to show significant similarity.
Finally, we do some bookkeeping and store the matches as a JSON string in the variable “matched_data”.
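The matching, ranking, and bookkeeping steps can be sketched as follows, assuming a metadata list and “all_known_encodings” built as described above. Ranking by Euclidean distance and keeping the top 5 follow the description; the score formula (1 minus the distance, floored at 0) and the function name top_matches are hypothetical stand-ins for the code that is not shown:

```python
import json
from typing import Any, Dict, List, Sequence

import numpy as np


def top_matches(
    metadata: Sequence[Dict[str, Any]],
    all_known_encodings: Sequence[np.ndarray],
    requested_encoding: np.ndarray,
    top_n: int = 5,
) -> str:
    """Rank known faces by distance to requested_encoding and return the best top_n as JSON."""
    if len(all_known_encodings) == 0:
        return "[]"
    known = np.asarray(all_known_encodings, dtype=float)
    distances = np.linalg.norm(known - np.asarray(requested_encoding, dtype=float), axis=1)
    indices = np.argsort(distances)[:top_n]  # positions of the closest encodings, best first

    matches: List[Dict[str, Any]] = []
    for rank, index in enumerate(indices, start=1):
        distance = float(distances[index])
        entry = dict(metadata[index])  # copy so the caller's metadata is not modified
        entry.update(
            rank=rank,
            distance=round(distance, 4),
            score=round(max(0.0, 1.0 - distance), 4),  # hypothetical similarity score
        )
        matches.append(entry)
    return json.dumps(matches)
```

Assigning the result to “matched_data” gives the JSON string described above, with one object per match carrying the original metadata plus its rank, distance, and score.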
I would suggest that you take a look at the NumPy documentation to understand what I described above since, without that knowledge, it probably won’t make much sense. I intend to write a blog post on NumPy shortly, and I hope it will give you an idea of how NumPy works and why we need it in cases such as this one.
Tweet by @doctorow (July 25, 2016): “How facial recognition works (and how to hack your own in Python)” https://t.co/a9eCx6Zx64
First of all, this 128-measurement scheme of face recognition has its own issues. The primary one is that it yields a lot of “false positives”. This means it will match images that look very different to the human eye (and are therefore not matches at all), simply because the data fits the conditions imposed by the algorithm. This is a serious concern for law enforcement agencies, which collect images of thousands of individuals and try to match them against a set that may contain a similar number of images.
It becomes impossible to look at the report of every match and find out whether the results are correct. Of course, this is easier than manually comparing every image with the target set, but it is still a difficult job. Hence, over the years, several improvements have been made. Notably, keeping the threshold value at 0.2 (or somewhere near it) helps in this scenario.
I have a hypothesis, but I haven’t tested it yet. What if we took more points on certain parts of the face (such as the eyes, ears, nose, and lips), computed the distances between them, and then compared the ratios of these distances with those of other target faces? In absolute terms, the distance between two points on one of these facial features does change with time, but the ratios should remain the same (perhaps not exactly, but within 0.1%, which would be good enough). When I have some data and code to prove (or disprove) this point, I’ll explain it in more detail in a subsequent post.
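As a rough illustration of the idea (not a tested result), the sketch below divides every pairwise landmark distance by the distance between the eyes, so that uniformly scaling or shifting a face leaves the ratios unchanged. The landmark names, the choice of reference pair, and the 0.1% tolerance (rel_tol=0.001) are all assumptions:

```python
import math
from itertools import combinations
from typing import Dict, Tuple

Point = Tuple[float, float]
Ratios = Dict[Tuple[str, str], float]


def distance_ratios(
    landmarks: Dict[str, Point], reference: Tuple[str, str] = ("left_eye", "right_eye")
) -> Ratios:
    """Divide every pairwise landmark distance by the reference distance (here: eye to eye)."""
    ref = math.dist(landmarks[reference[0]], landmarks[reference[1]])
    return {
        (a, b): math.dist(landmarks[a], landmarks[b]) / ref
        for a, b in combinations(sorted(landmarks), 2)
    }


def ratios_agree(first: Ratios, second: Ratios, rel_tol: float = 0.001) -> bool:
    """True if both faces have the same landmark pairs and every ratio agrees within rel_tol."""
    return first.keys() == second.keys() and all(
        math.isclose(first[pair], second[pair], rel_tol=rel_tol) for pair in first
    )
```

The landmark coordinates would have to come from a landmark detector; whether real faces keep these ratios stable over the years is exactly the open question of the hypothesis.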
We are moving towards an age when somebody will watch us almost 24 hours a day, and somebody will log our activities on a server somewhere in this world. While that would possibly make this world a more secure place, we will be compromising on privacy. But there is a silver lining. The entities looking at our data will mostly be machines, unless we go and rob a bank or kill a fellow human being, by whatever means and for whatever reason. Such culprits would be easier to find and trace, and that is a huge bonus for the human race in general.
To get something, we need to lose something. In this case, we will lose privacy but gain security. I feel it is a good thing, but I am sure many people think otherwise. However, that time has not yet arrived, and until it does, let’s just enjoy the remaining days of our freedom from the watchful eyes of those machines.