NIST Evaluates Face Recognition Software’s Accuracy for Flight Boarding

The most accurate face recognition algorithms have demonstrated the capability to confirm airline passenger identities while making very few errors, according to recent tests of the software conducted at the National Institute of Standards and Technology (NIST).

The findings, released today as “Face Recognition Vendor Test (FRVT) Part 7: Identification for Paperless Travel and Immigration,” focus on face recognition (FR) algorithms’ performance under a particular set of simulated circumstances: matching images of travellers to previously obtained photos of those travellers stored in a database. This use of FR is currently part of the boarding process for international flights, both to confirm a passenger’s identity for the airline’s flight roster and to record the passenger’s official immigration exit from the United States.

The results indicate that several of the FR algorithms NIST tested could perform the task using a single scan of a passenger’s face with 99.5% accuracy or better, especially if the database contains several images of the passenger. Patrick Grother, a NIST computer scientist and one of the report’s authors, commented:

“We ran simulations to characterise a system that is doing two jobs: identifying passengers at the gate and recording their exit for immigration. We found that accuracy varies across algorithms, but that modern algorithms generally perform better. If airlines use the more accurate ones, passengers can board many flights with no errors.”

Previous FRVT studies have focused on evaluating how algorithms perform one of two different tasks that are among FR’s most common applications. The first task, confirming that a photo matches a different one of the same person, is known as “one-to-one” matching and is commonly used for verification work, such as unlocking a smartphone. The second, determining whether the person in the photo has a match in a large database, is known as “one-to-many” matching.
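The difference between the two tasks can be sketched in a few lines of Python. This is only an illustration, not a description of any algorithm NIST tested: it assumes each face image has already been reduced to a numeric feature vector and that two vectors are compared by cosine similarity, a common approach that the report summary does not confirm. The function names, the toy vectors and the 0.8 threshold are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two feature vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def verify(probe, reference, threshold=0.8):
    """One-to-one: does the probe match this single reference image?"""
    return cosine_similarity(probe, reference) >= threshold

def identify(probe, gallery, threshold=0.8):
    """One-to-many: return the best-matching identity in the gallery,
    or None if no stored image scores at or above the threshold."""
    best_id, best_score = None, -1.0
    for identity, images in gallery.items():
        for image in images:  # an identity may have several stored images
            score = cosine_similarity(probe, image)
            if score > best_score:
                best_id, best_score = identity, score
    return best_id if best_score >= threshold else None

# Toy gallery (hypothetical values): two stored images for one passenger,
# one stored image for another.
gallery = {
    "passenger_a": [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]],
    "passenger_b": [[0.1, 0.9, 0.3]],
}
probe = [0.85, 0.15, 0.15]

print(verify(probe, gallery["passenger_a"][0]))
print(identify(probe, gallery))
print(identify([0.1, 0.1, 0.9], gallery))
```

In this sketch a false negative would be a case where `identify` returns `None` or the wrong identity for a passenger who is in fact in the gallery.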

This latest test concerns a specific application of one-to-many matching in airport transit settings, where travellers’ faces are matched against a database of individuals who are all expected to be present. In this scenario, only a few hundred passengers board a given flight. However, NIST also looked at whether the technology could be viable elsewhere in the airport, specifically in the security line where perhaps 100 times more people might be expected during a certain time window. As with previous studies, the team used software that developers voluntarily submitted to NIST for evaluation. This time, the team only looked at software that was designed to perform the one-to-many matching task, evaluating a total of 29 algorithms.

Among the report’s findings are:

The seven top-performing algorithms can successfully identify at least 99.5% of passengers the first time around if the database contains one image of a passenger. If the database contains a single image of each individual, the study shows that for as many as 428 of 567 simulated flight boarding processes, with each flight carrying 420 passengers, the most accurate FR algorithm can identify passengers for boarding without any false negatives (meaning the software fails to match two images of the same person). Stated in terms of error rates, this corresponds to at least 99.87% of travellers being able to board successfully after presenting themselves one time to the camera. Six additional algorithms give better than 99.5% accuracy.
Performance improves dramatically if the database contains multiple images of a passenger. When the gallery holds an average of six prior images per passenger, all algorithms realise large gains: the most accurate algorithm checks the identities of passengers on 545 of 567 flights without any errors, and at least 18 developers’ algorithms identify more than 99.5% of travellers accurately with a single presentation to the camera.
Demographic differences in the dataset have little effect. The team explored differences in performance on male versus female subjects and also across national origin, which were the two identifiers the photos included. National origin can, but does not always, reflect racial background. Algorithms performed with high accuracy across all these variations. False negatives, though slightly more common for women, were rare in all cases.
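
The flight-level and passenger-level figures in these findings are linked by simple arithmetic, which the following Python sketch illustrates. It assumes, purely for illustration, that every passenger is missed independently and with the same probability p, so that a 420-passenger flight boards with no false negatives with probability (1 - p)^420. That independence assumption and the function names are not taken from the report, and the sketch does not reproduce the report's own calculation.

```python
# Figures quoted in the findings.
FLIGHTS = 567
PASSENGERS_PER_FLIGHT = 420

def error_free_probability(miss_rate, passengers=PASSENGERS_PER_FLIGHT):
    """Chance that a whole flight boards with no false negatives,
    assuming independent errors at one constant per-passenger rate."""
    return (1.0 - miss_rate) ** passengers

def implied_miss_rate(error_free_flights, flights=FLIGHTS,
                      passengers=PASSENGERS_PER_FLIGHT):
    """Per-passenger miss rate implied by the share of error-free flights,
    under the same (hypothetical) independence assumption."""
    share = error_free_flights / flights
    return 1.0 - share ** (1.0 / passengers)

# One gallery image per passenger: 428 of 567 flights had no errors.
single_image = implied_miss_rate(428)
# About six gallery images per passenger: 545 of 567 flights had no errors.
multi_image = implied_miss_rate(545)

print(f"implied miss rate, one image:   {single_image:.4%}")
print(f"implied miss rate, six images:  {multi_image:.4%}")
print(f"error-free flights at 99.5% accuracy: {error_free_probability(0.005):.1%}")
```

Under these assumptions, 428 error-free flights out of 567 imply a per-passenger miss rate of roughly 0.07%, which is consistent with the statement that at least 99.87% of travellers board successfully on the first attempt. The last line shows why whole-flight results are demanding: at exactly 99.5% per-passenger accuracy, only about one flight in eight would board with no errors at all.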

Grother said that the study does not address an important factor: the sort of camera that an FR system uses. Because airport environments differ, and because the cameras themselves operate in different ways, the report offers some guidance for tests that an airline or immigration authority could run to complement the NIST test results. Such tests would provide accuracy estimates that reflect the actual equipment and the environment in which it is used.