“Can you identify anyone in this photograph?”
A regular feature in Queens Library’s magazine asks readers to “help us solve an archives mystery.” Under the headline, faces stare out from a photograph, captioned with whatever information the archives has–sometimes as little as a city and a decade. It’s crowdsourcing in its barest form, the Hail Mary pass of metadata, and archivists are sure to recognize the problem: unidentified subjects are dead ends.
The problem is pervasive. Archives all over the world hold historic photographs of unknown persons. Alfred Eisenstaedt’s iconic “V-J Day in Times Square” photo depicts a sailor and a nurse in an embrace, and to this day neither subject has been positively identified. Even thrift stores and flea markets have boxes of historic photos, separated from their own pasts. This missing data robs photographs of their power as sources.
How do we reconnect these faces to meaningful descriptions? Up to this point, attempts to do so have been extremely labor-intensive, relying on people’s ability to make sense of one image at a time. The sizeable collection of historic baseball photographs at the Library of Congress has brought out researchers who can sometimes crack a case with their encyclopedic knowledge of uniforms and box scores. Elsewhere, First Nations populations in Canada can follow Project Naming, an effort to restore the identities of thousands of subjects in photographs at Library and Archives Canada. Like the Queens Library feature, it’s an effort to put images in front of readers who might be able to make a connection. These projects do make a dent in the problem, but it’s slow progress. And worse, such efforts are in a race against time. As older generations disappear, they take the keys to our historic collections with them.
Is facial recognition technology a possible solution? Facial-recognition software is becoming a pervasive part of the social web, where it automates tasks such as tagging friends in photos. Most of us have experienced this technology in action, whether we knew it or not: When we post an image to Facebook and the software asks if we’d like to tag a particular person, that’s facial recognition at work. The technology is every bit as imperfect as OCR (optical character recognition) or speech recognition, but it clearly has something to offer. It’s getting good, and it’s becoming accessible to small projects. It is within our power to bring this technology into the archives.
How Does Facial Recognition Work?
Facial recognition works in three stages. First, software has to locate the faces within images. This step is usually called face detection. The most effective approach to the problem today begins by throwing out color information, then comparing sections of the black-and-white image to characteristic patterns of relative light and dark regions that typify faces: shadows in the vicinity of the eyes, the curvature of the sides of the head, and patterns of light and dark around the nose and mouth. If a region of the image conforms reasonably well to one pattern of light and dark blocks, the process goes on to test more such patterns in the same region; if we have a region that could be a pair of eyes, for instance, is there a region beneath it that could be a nose? If enough of the black-and-white patterns are a “close enough” match with the image’s content, we can be reasonably confident that we’ve found a region of the image that contains a face. This process can scan large images quickly, and it’s effective under a variety of lighting conditions and perspectives.
While this process can handle some variation in perspective, faces in full profile require a different set of tests than faces in frontal views do. It’s a reminder that tasks that are intuitive to human beings can be tremendously challenging to automate.
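To make the idea concrete, here is a minimal sketch of this cascade approach using OpenCV’s stock pre-trained classifiers. This is an illustration, not the exact pipeline used in the project described below; the file name and parameter values are placeholders.

```python
# A minimal face-detection sketch using OpenCV's stock Haar cascades.
import cv2

# Pre-trained cascades ship with OpenCV: one for frontal faces and,
# as noted above, a separate one for faces in profile.
frontal = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
profile = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_profileface.xml")

image = cv2.imread("group_portrait.jpg")  # placeholder file name
# Color is discarded first; the cascades work from light/dark contrast alone.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Each call returns (x, y, width, height) rectangles for candidate faces.
faces = frontal.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
profiles = profile.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print(f"Found {len(faces)} frontal and {len(profiles)} profile candidates")
```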
The result of this first step is a set of locations within an image where faces have been detected. With that data, the next step is to find specific landmarks within each face: the centers of the eyes, the corners of noses and mouths, the points of chins, and so on. There is a variety of methods for cataloging these points. The earliest facial recognition algorithms worked almost entirely from the distances between the centers of the eyes and the corners of the mouth, while modern solutions might count dozens or hundreds of landmarks. After correcting for perspective and scale, this step can begin to quantify the geometry that makes a face distinctive. The end result is a quantitative representation of a face.
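For illustration, here is a minimal landmark-extraction sketch using dlib, the library that underlies OpenFace’s alignment step. The image name is a placeholder, and dlib’s 68-point model file must be downloaded separately from dlib.net.

```python
# Locating facial landmarks with dlib's 68-point shape predictor.
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = dlib.load_rgb_image("portrait.jpg")  # placeholder file name
for rect in detector(image):
    shape = predictor(image, rect)
    # shape.part(i) is the (x, y) position of landmark i. In the 68-point
    # scheme, points 36-47 outline the eyes, 48-67 the mouth, and 8 the chin.
    print(shape.part(36), shape.part(45), shape.part(8))
```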
The last step is to compare these representations. Different technologies use different systems of representation and comparison, but the answer to the question “are these the same person?” is ultimately a measure of the distance between two representations. Distances below a certain threshold are likely matches, and distances above it are less likely to be. Notice that this is a spectrum, and it’s subject to all the hazards of image quality, unusual perspectives, obstructions like sunglasses and facial hair, and so on. In the end, all of this is just to let us ask the question “are these similar faces?”
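In code, that comparison can be as simple as a distance between two numeric vectors. The sketch below assumes each face has already been reduced to a fixed-length embedding; the threshold is illustrative, since every model and dataset calls for its own tuning.

```python
# Comparing two face representations as vectors (a hedged sketch).
import numpy as np

def face_distance(a, b):
    """Euclidean distance between two embeddings: smaller means more similar."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

MATCH_THRESHOLD = 0.9  # illustrative value, not a universal constant

def likely_same_person(a, b, threshold=MATCH_THRESHOLD):
    return face_distance(a, b) < threshold
```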
Part of what makes facial recognition work in social media is the rich set of data that is available. Your Facebook friends are a finite set of named individuals, and each has dozens if not hundreds of photos from which to build a confident match. Every selfie snapped and posted to social media helps improve the technology. A 2015 paper on the FaceNet technology developed at Google boasts over 97% accuracy in recognizing faces in a standardized dataset. But this dataset is built of modern-day images, with many people represented in dozens of high-quality images, including a variety of perspectives, lighting situations, and so on. Clearly, the same breadth of data is not available in most archival collections. Often we’re lucky if two images of a subject exist at all, and with photographs, prints, and scans of varying quality, archival images are of less consistent quality than even cell phone snapshots today.
It may come as a surprise that color, one of the most obvious changes in photography in the past century, is mostly irrelevant to facial recognition. While the human eye is good at adjusting to new lighting situations, color is wildly inconsistent from one photograph to another. The prevailing solutions to facial recognition problems work with the broad contrasts of light and dark to make guesses about the geometry of faces, so face detection begins by converting an image to black and white. This is a rare piece of good news for historic images, which are monochrome to begin with.
Facial recognition also benefits from machine learning, in which developers build stronger statistical models as they assemble larger datasets of known matches. This process of training the software to reject false positives and false negatives depends on quality metadata, and social media offers an enormous dataset of known, human-verified matches between faces in images. In many cases, when you click a button to confirm an automatic suggestion, you are helping to refine the statistical model. So, the technology we see in social media is tuned to a very particular kind of dataset.
Putting it to Work in the Archives
Can this technology help us with historical photos? The answer is a qualified yes, but let’s start from the beginning.
I began by researching the state of open-source facial recognition. There is competition between proprietary solutions, but there is also a strong community of open-source developers working on the problem. The two camps share code, projects, and people, so the difference between open-source and proprietary solutions is a difference of components, not of foundations. A number of solutions build on OpenCV, a more general toolkit for computer vision tasks like detecting objects and geometric features in images. OpenBR, a library of code for biometrics applications, includes well-regarded facial recognition features. OpenFace, another facial recognition library, uses sophisticated machine-learning technology and achieves very good benchmarks against standard datasets. It also integrates readily with Python, a scripting language popular with academics for its wide compatibility and ease of use. After a few days of experimenting with the available software, I made a fresh 64-bit Linux install on a laptop, designed a simple MySQL database to store my data, and started compiling OpenFace.
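The database design itself was simple. For readers who want to experiment, a minimal sketch of the kind of tables such a project needs might look like the following; sqlite3 stands in for MySQL here so the example runs without a database server, and all table and column names are hypothetical.

```python
# A hypothetical minimal schema: images, the faces found in them, and the
# pairwise comparisons between faces.
import sqlite3

conn = sqlite3.connect("faces.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS images (
    id         INTEGER PRIMARY KEY,
    path       TEXT NOT NULL,   -- source scan on disk
    collection TEXT             -- e.g. 'A.M. Kendrick Collection'
);
CREATE TABLE IF NOT EXISTS faces (
    id        INTEGER PRIMARY KEY,
    image_id  INTEGER REFERENCES images(id),
    x INTEGER, y INTEGER, w INTEGER, h INTEGER,  -- bounding box
    embedding BLOB              -- serialized numeric representation
);
CREATE TABLE IF NOT EXISTS matches (
    face_a   INTEGER REFERENCES faces(id),
    face_b   INTEGER REFERENCES faces(id),
    distance REAL               -- lower = more similar
);
""")
conn.commit()
```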
Obviously, I wasn’t the first person to take on the problem. A search feature on the BBC website applies facial recognition to help users search its database of photographs for particular people. MyHeritage.com, a genealogy-focused website, offers users the ability to upload images of themselves and search for similar faces in a database of historic portraits. A project at UC Riverside is even exploring facial recognition as a way to analyze paintings and sculptures. The Invisible Australians project, led by Drs. Kate Bagnall and Tim Sherratt, also deserves mention. While it works through face detection rather than facial recognition, the project specifically addresses questions of adding value to the archival record by putting faces in front of researchers.
For my project, I started with a set of my own family vacation photos; once I was confident I had a handle on the way the software would behave, I set out to find some real-world data to work with. The Washington State Archives, Digital Archives, where I was interning at the time, makes thousands of photographs available to the public through its website. These represent everything from recent professional photography to scrapbook pages. And while many of these images feature unidentified subjects, they have all been professionally cataloged, so every image has at least some metadata.
I was immediately drawn to a series of yearbook-style photographs from the Washington State House of Representatives dating all the way back to 1889. Since these were faces with names attached and were certain to feature many subjects in multiple years, they made a convenient dataset for testing. I downloaded images from a handful of years and ran them through the simple scripts I had written to detect and compare faces. By querying my SQL database of information about the images and the faces within them, I could verify that some of the known positive results were identified correctly, and that many of the non-matching faces were likewise correct. There were a lot of incorrect comparisons, too. The low resolution of the scanned images and the quality of the original 1890s photographs were probably contributing factors. So, I suspect, were the gigantic, face-obscuring mustaches that were popular among Washington’s early legislators–one example of the way the technology is conditioned to twenty-first century norms.
Of the seventy-seven faces in the 1899 House of Representatives photo page, sixty-six were correctly identified as faces (two of the undetected faces were in profile, which I wasn’t trying to detect). The face detector also mis-identified three non-face objects as faces. Of the correctly detected faces where matching faces were detected, seven were correctly matched (true positives), and twenty-seven were incorrectly matched (false positives). In the algorithm’s defense, I should mention that I couldn’t tell Representative Chrisman of the 1899 legislature from Representative Kohler of the 1897 legislature, either. From that small sample of low-quality images, that’s still a 21 percent rate of correctly finding a match. These early results were mixed, but it was encouragement enough to keep going.
The Digital Archives is also home to the A.M. Kendrick Collection. Kendrick was a photographer who documented life in the farm towns of the Columbia Basin for more than 50 years, and the collection contains thousands of photographs. The city of Ritzville, Washington, is a particular focus. This collection makes a good test case for a number of reasons: Ritzville had relatively little migration during the decades pictured in the collection, and it was a small enough community that the networks of acquaintanceship might be well-represented in the photographs, which are heavy on group portraits from social events. The same people appear across multiple photographs–in a class photo, for instance, then as a bus driver a decade later, and in a wedding photograph a few years after that. It also helps that Kendrick himself was a skilled photographer who produced high-quality prints, and that the collection was digitized at high resolutions. In many cases, Kendrick had also donated several different takes of the same portrait. These images, like the state house yearbooks, made for a convenient sample of known positive matches. I downloaded a selection of group portraits from the collection and applied the same process I had applied to the early Washington state house yearbooks, and found much better rates of correct identification. Then, with help from the staff at the Digital Archives, I collected group portraits from the collection in bulk. Soon, my dataset grew from hundreds of faces to thousands.
Designing the Archival Facial Identification Database
By this time I was also starting to build a user interface around the database I was generating. The dataset is inherently visual, and the end goal was to produce something for researchers, not just programmers. One of the first features of the interface, inspired by Bagnall and Sherrat’s Invisible Australians project, was simply a page of faces. (You can see it on the live site by clicking the Browse Faces link in the menu bar.) As part of the face detection process, I was extracting a cropped image of each face at a standardized size. Browsing the collection as a set of faces, rather than of photographs in their original context, was a surprising reorientation of the dataset–to work with faces is ultimately to work with people, and it is a strange experience to look at the faces of hundreds of unknown subjects. The experience brought into sharper focus the challenge of placing these people at the center of the research process.
My other design requirement was that the application should be able to ingest new images or collections with minimal preparation. Behind the scenes, the application is built from the ground up for incremental additions. As you can see from the sample dataset, the A.M. Kendrick Collection and the 1890s Washington state legislators’ yearbooks can live side by side in the same dataset. (Whether it adds research value is another question.) By the nature of the application, the number of comparisons grows quadratically with the number of faces–every new face must be compared with every face already stored–and this becomes the major limitation to the overall size of the dataset. Building a global database of archival subjects with billions of potential comparisons isn’t in the scope of this particular design. But running on a $200 laptop, I can add new images to the dataset and run comparisons to more than three thousand identified faces in a matter of minutes.
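A sketch of that incremental design, with hypothetical names: each new face is compared against every embedding already stored, so adding the nth face costs n-1 comparisons, and a dataset of n faces costs n(n-1)/2 comparisons in total.

```python
# Incremental ingestion (a hedged sketch): compare each new face against
# every stored face, then add it to the pool. In the real application the
# embeddings and matches would live in the database.
import numpy as np

known_embeddings = []  # one fixed-length vector per detected face

def ingest_face(embedding, threshold=0.9):
    """Return (index, distance) pairs for likely matches, then store the face."""
    emb = np.asarray(embedding)
    distances = [float(np.linalg.norm(emb - np.asarray(known)))
                 for known in known_embeddings]
    matches = [(i, d) for i, d in enumerate(distances) if d < threshold]
    known_embeddings.append(emb)
    return sorted(matches, key=lambda pair: pair[1])  # best matches first
```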
The most important design challenge was a user interface that would let researchers browse the collection by those human identities. I began by making the faces within each portrait clickable links, then built up a working interface that would let users navigate from one image to another. In the current design, when a user clicks a face within a full image, all of the likely matches to that face are displayed, as cropped images of faces, to the side of the source image. This list is sorted by the strength of the match, so when there are multiple possible matches, the likeliest will appear at the top of the list. The user can then click any one of those faces to navigate to the full image in which it appears. For example, a researcher could navigate to a portrait from a wedding, click a face, then get a list of similar faces. Clicking one of those faces would then navigate to another full image where the same subject appeared. My hope is that the interface encourages wandering. Technically, it’s not so different from navigating an archival collection by tags or subject headings. But following an individual person through the network of his friends and neighbors carries an element of personal connection.
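Behind those clickable faces is little more than a sorted query. Assuming a matches table like the hypothetical one sketched earlier, the lookup might be as simple as:

```python
# A hypothetical lookup for the interface: given a clicked face, return its
# likely matches with the strongest (smallest-distance) matches first.
import sqlite3

def likely_matches(conn, face_id, limit=10):
    return conn.execute(
        """SELECT face_b, distance
             FROM matches
            WHERE face_a = ?
            ORDER BY distance ASC
            LIMIT ?""",
        (face_id, limit),
    ).fetchall()
```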
There are clear false positives, too. The quality of matches, in general, correlates with the quality of the source images, but sometimes the algorithms simply mis-identify faces. Alongside these cases, though, there are a number of comparisons where we simply can’t be sure. The software is intended to spot similar faces, which is not necessarily the same thing as finding a person. In the Ritzville data, a number of matches could be the same person at different ages, could be close relatives, or could be pure coincidence. The facial recognition feature on MyHeritage.com, among other applications, works from the assumption that people with similar faces are likely to be related. While I’m not aware of any systematic test of this assumption, it passes the test of common sense. A researcher, in any case, might want to make that determination for herself. So rather than applying a threshold for match quality that would eliminate false positives, I aimed for a middle road. Real people are still better at identifying faces than software is (most of the time), so if a researcher gets to decide which matches are meaningful from a set of likely matches, I consider that a success.
And this is one important place where my project pulls away from what Facebook or that facial search feature on the BBC site are doing. Those applications try to connect an image to a known identity: a name, or a Facebook friend. That’s a good design for them, because it lets the software make lots of assumptions before it has to do the more processor-intensive work; you’re comparing faces within a small, well-defined set. In archives, though, that’s limiting. In a donated collection, we might not know who any of the subjects are at all. So instead of trying to match faces to names, my hope was to widen the field and compare all of the faces to each other. That’s heavy data: in this slice of the A.M. Kendrick collection, several thousand faces make for about seven million comparisons (the vast majority of which are clear non-matches). But what we get is a tremendous set of information about context. It’s the whole network of people showing up in each other’s family picnics, weddings, legion halls, and anything else where someone snapped a picture. When all of these isolated photos start to link up, we can begin to see the social networks behind the pictures. It’s contextual metadata, and it’s a huge leap ahead of just handing a researcher a box of photos.
Unlike the labeled images in the 1890s state house yearbooks, the people in this collection are mostly unnamed, so there’s no definitive way to measure the accuracy of matches. But there’s no question that the results are stronger. In one set of images, a photographer donated ten different shots from one group portrait session at a Ritzville church. In this one (admittedly low-hanging) test case, all seventeen subjects are correctly matched across all ten of the images. The dataset contains a great number of true positives. It also includes several intriguingly similar faces, which might be relatives or a subject photographed at different ages. Importantly, many comparisons catch a subject in very different photographic settings: a family portrait and a candid outdoor snapshot, for instance. The software succeeds in connecting the faces between images, sometimes creating metadata about people where none existed before.
That’s not to say that these subjects have to remain nameless. In the bulk downloads I pulled from the collections, I found a handful of scanned high school yearbooks, which the software dutifully scanned and compared to the rest of the collection. These images make powerful nodes in a web of interconnected images: suddenly, the woman on the right in the wedding photo might have a name, a year of graduation, and a whole roster of classmates. It’s a clear example of the potential to bring existing metadata together with orphaned images.
In the interest of assessment, I have left all of the mis-identifications and other false findings in the dataset. There are plenty to see, from objects mis-identified as faces to matches that are clearly not the same person. As I noted elsewhere, image quality can be a factor. The algorithm performs especially poorly with small children, too. Characteristics that are intuitive to the human eye, like age or gender, might fall completely outside the software’s metrics. For a public-facing application, archives staff could groom data before providing access to it.
Facial recognition software alone cannot solve the problem. 1930s Ritzville is not Facebook, and the problems of rescuing indigenous heritage, just for one example, go beyond data processing. And the results, from this very preliminary project, are far from the confidence we see in top-tier applications like Facebook. But for all its imperfections, this dataset offers ample cause for optimism. Facial recognition software can give photographs context, which can open doors to new research directions.
There are several ways to improve the facial recognition itself. Developers could use existing metadata to rule out unlikely matches based on date, location, or other factors. Just like in social media applications, the statistical models would also benefit from more human input. The existing application uses a pre-built statistical model, trained on the Labeled Faces in the Wild dataset. If we could create a way to put possible matches in front of human beings who could then judge the quality of the matches, we would produce very valuable data. As archival crowd-sourcing projects go, it would be relatively straightforward to design and to use. We might invite volunteers to help us “catch time travelers,” for instance. The results I have now are by no means the best that can be done with existing technology.
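As one concrete (and entirely hypothetical) illustration of the metadata idea above, a filter could discard matches whose photographs could not plausibly show the same living person:

```python
# A hypothetical metadata filter: reject matches whose photo dates are too
# far apart for a single lifetime. Field names are illustrative.
def plausible_match(meta_a, meta_b, max_years=80):
    year_a, year_b = meta_a.get("year"), meta_b.get("year")
    if year_a is None or year_b is None:
        return True  # missing metadata: keep the match for a human to judge
    return abs(year_a - year_b) <= max_years
```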
In the meantime, there are several things archivists can do to have a seat at the table and help steer the development of this technology. First and most fundamentally, facial recognition can only multiply the value of the metadata we have. The value of facial recognition will live or die on the strength of our metadata. Technologies like this should make us more mindful of our cataloging practices, not less so. Any search through a poorly cataloged collection of OCR text will demonstrate the danger of expecting a computer to do an archivist’s job. If you have names, dates, and other data elements, keep them, keep them standardized, and keep them close to the images.
I have said that current facial recognition technology has grown to meet the needs of social media. It might be even more accurate to say it has grown to meet the needs of Labeled Faces in the Wild, a standardized dataset of 13,233 images of known subjects. Published by the University of Massachusetts Amherst since 2007, the dataset has become a widely-used benchmark for new facial recognition software, since developers can use it to quickly compare their findings against a known list of correct matches. All of the images were collected from the internet, many are professional photographs of celebrities, and all are known to be detectable by a widely-used face detection algorithm. There is no definitive list of years when the images were taken, but few of the subjects were born before 1950. Where the images represent history, it is recent history captured on modern photographic technology. The dataset is a valuable tool for developers, and it has helped to keep a diverse palette of faces at the center of this technology. We would be wise to start talking about a Labeled Faces in the Archives dataset to complement it. Such an effort would make it much easier for developers to take on the problems of facial recognition in historic images. It could make the difference between a realistic graduate project and an unfeasible one.
Lastly, the ultimate value of this technology will depend on our ability to reach across archives and make connections between separate collections. That won’t happen without digitization, online access, and sharing. It might be worthwhile to discuss a standard system for comparing face representations through APIs, or it might be more manageable to design a centralized database of facial-recognition data for digital archives in general. In any event, achieving the best results will require that software engineers and archivists work closely together, and bring a willingness to blur the boundaries between their professional specializations.
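To make the API idea concrete, here is one purely hypothetical shape such an exchange could take: an archive accepts a face representation and returns likely matches from its own holdings. The endpoint, payload fields, and response format are all invented for illustration.

```python
# A purely hypothetical cross-archive comparison request.
import requests

payload = {
    # Embeddings are only comparable within a single model, so any standard
    # would have to name the model that produced them.
    "model": "openface-nn4.small2",
    "embedding": [0.031, -0.114],  # truncated for illustration
}
response = requests.post(
    "https://archives.example.org/api/v1/faces/compare", json=payload)
for match in response.json()["matches"]:
    print(match["face_id"], match["distance"], match["image_url"])
```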
What I have built so far is only to test the feasibility of this project. As interesting as the current dataset may be, it is a sample of convenience–to my knowledge, there are no burning questions about identities in the A.M. Kendrick collection. My hope for the project is that in the right hands, it might suggest new solutions to real-world research problems. Because the archives are full of them. Just like OCR or voice recognition, facial recognition has the power to transform the way we provide access to our collections. And just like those technologies, it is only as good as the data it can provide to researchers. With more testing and a little luck, we might someday bring much more powerful tools to bear on the question “do you recognize anyone in this photograph?”