Measuring Biometric Unlock Security

Today, biometric-based unlock modalities are evaluated almost solely on the basis of False Accept Rate (FAR), a metric that measures how often a model mistakenly accepts a randomly chosen incorrect input. While this is a useful measure, it does not provide sufficient information to evaluate how well the model stands up to targeted attacks.

Metrics

Android 8.1 introduces two new metrics associated with biometric unlocks that are intended to help device manufacturers evaluate their security more accurately:

  • Imposter Accept Rate (IAR): The chance that a biometric model accepts input that is meant to mimic a known good sample. For example, in the Smart Lock trusted voice (voice unlock) mechanism, this would measure how often someone trying to mimic a user's voice (using a similar tone, accent, etc.) can unlock their device. We call such attacks Imposter Attacks.
  • Spoof Accept Rate (SAR): The chance that a biometric model accepts a previously recorded, known good sample. For example, with voice unlock this would measure the chances of unlocking a user's phone using a recorded sample of them saying "Ok, Google." We call such attacks Spoof Attacks.

Of these, IAR measurements are not useful for every biometric modality. Consider fingerprints, for example. An attacker could create a mold of a user's fingerprint and attempt to use it to bypass the fingerprint sensor, which would count as a spoof attack. However, there isn't a way to mimic a fingerprint that would be accepted as the user's, so there's no clear notion of an imposter attack against fingerprint sensors.

SAR, however, works for every biometric modality.
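All three metrics are accept rates. They differ only in which population of attempts is counted: randomly chosen incorrect inputs for FAR, people mimicking the user for IAR, and recorded or molded known good samples for SAR. The following is a minimal Python sketch, assuming you log raw attempt and accept counts per attack type. The `AttemptLog` class and the counts are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class AttemptLog:
    """Raw counts for one population of unlock attempts (hypothetical helper)."""

    attempts: int
    accepts: int

    def accept_rate(self) -> float:
        """Fraction of attempts that the biometric model accepted."""
        if self.attempts <= 0:
            raise ValueError("need at least one attempt to compute a rate")
        return self.accepts / self.attempts


# Illustrative counts only, not measurements from any real device.
far = AttemptLog(attempts=10_000, accepts=2).accept_rate()  # randomly chosen incorrect inputs
iar = AttemptLog(attempts=500, accepts=10).accept_rate()    # people mimicking the user
sar = AttemptLog(attempts=200, accepts=14).accept_rate()    # recorded or molded known good samples
```

Keeping the raw counts, rather than only the final rate, makes it easy to see how many attempts each estimate rests on.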

Example attacks

The table below lists examples of imposter and spoof attacks for four modalities.

Modality    | Imposter Attack               | Spoof Attack
Fingerprint | N/A                           | Fingerprint + fingerprint mold
Face        | Trying to look like the user  | High-res photo, latex (or other high-quality) face masks
Voice       | Trying to sound like the user | Recording
Iris        | N/A                           | High-res photo + contact lens

Table 1. Example attacks

See Test methodology for details and advice on measuring SAR and IAR for different biometrics.

Strong vs. weak unlocks

The bar for an unlock to be considered strong is a combination of the three accept rates: FAR, IAR, and SAR. Where an imposter attack does not exist, we consider only FAR and SAR.
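The following is a minimal Python sketch of the rule above, in which IAR is skipped when a modality has no imposter attack. It assumes the "combination" can be modeled as one ceiling per metric. That modeling, the `is_strong` function, and the limit values are all hypothetical placeholders. The actual criteria and limits are defined in the Android CDD.

```python
from typing import Optional


def is_strong(far: float, sar: float, iar: Optional[float],
              max_far: float, max_sar: float, max_iar: float) -> bool:
    """Check every applicable accept rate against a ceiling.

    Assumption: the "combination" of rates is modeled as one ceiling per
    metric. The actual criteria and limits are defined in the Android CDD.
    Pass iar=None for modalities with no imposter attack (e.g. fingerprint).
    """
    within_limits = [far <= max_far, sar <= max_sar]
    if iar is not None:
        within_limits.append(iar <= max_iar)
    return all(within_limits)


# Placeholder ceilings for illustration; they are NOT the CDD's values.
LIMITS = dict(max_far=0.001, max_sar=0.05, max_iar=0.05)

print(is_strong(far=0.0001, sar=0.03, iar=None, **LIMITS))  # True: FAR and SAR only
print(is_strong(far=0.0001, sar=0.03, iar=0.2, **LIMITS))   # False: IAR too high
```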

See the Android Compatibility Definition Document (CDD) for the measures to be taken for weak unlock modalities.

Test methodology

Here we explain considerations and offer advice on test setups for measuring the spoof accept rate (SAR) and imposter accept rate (IAR) of biometric unlock modalities. See Metrics for more information on what these metrics mean and why they're useful.

Common considerations

While each modality requires a different test setup, there are a few common aspects that apply to all of them.

Test the actual hardware

Collected SAR/IAR metrics can be inaccurate when biometric models are tested under idealized conditions, or on hardware different from what they will run on in a mobile device. For example, voice unlock models calibrated in an anechoic chamber using a multi-microphone setup behave very differently when used on a single-microphone device in a noisy environment. To capture accurate metrics, carry out tests on an actual device with the hardware installed or, failing that, with the hardware as it would appear on the device.

Use known attacks

Most biometric modalities in use today have been successfully spoofed, and public documentation of the attack methodology exists. Below we provide a brief high-level overview of test setups for modalities with known attacks. We recommend using the setup outlined here wherever possible.

Anticipate new attacks

For modalities where significant new improvements have been made, the test setup document may not contain a suitable setup, and no known public attack may exist. Existing modalities may also need their test setup tuned in the wake of a newly discovered attack. In both cases, you will need to devise a reasonable test setup. Please use the Site Feedback link at the bottom of this page to let us know if you have set up a reasonable mechanism that can be added.

Setups for different modalities

Fingerprint

IAR: Not needed.
SAR:
  • Create fake fingerprints using a mold of the target fingerprint.
  • Measurement accuracy is sensitive to the quality of the fingerprint mold. Dental silicone is a good choice.
  • The test setup should measure how often a fake fingerprint created with the mold is able to unlock the device.

Face and Iris

IAR: The lower bound is captured by SAR, so measuring this separately is not needed.
SAR:
  • Test with photos of the target's face. For iris, zoom in on the face to mimic the distance at which a user would normally use the feature.
  • Photos should be high resolution, otherwise results are misleading.
  • Photos should not be presented in a way that reveals they are images. For example:
    • image borders should not be included
    • if the photo is on a phone, the phone screen/bezels should not be visible
    • if someone is holding the photo, their hands should not be seen
  • For head-on (straight) angles, the photo should fill the sensor's field of view so that nothing outside it can be seen.
  • Face and iris models are typically more permissive when the sample (face, iris, or photo) is at an acute angle relative to the camera, to accommodate a user holding the phone straight in front of them and pointing up at their face. Testing at this angle helps determine whether your model is susceptible to spoofing.
  • The test setup should measure how often an image of the face or iris is able to unlock the device.
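Because models can be more permissive at acute angles, it helps to report SAR separately for each presentation condition instead of as a single pooled number. The following is a minimal Python sketch, assuming each spoof attempt is logged as a (condition, unlocked) pair. The `sar_by_condition` function, the condition labels, and the trial data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def sar_by_condition(trials: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute SAR separately for each presentation condition.

    Each trial is (condition, unlocked): condition is a free-form label
    such as "head-on" or "acute" (hypothetical labels), and unlocked is
    True if the spoof (photo) unlocked the device.
    """
    attempts: Dict[str, int] = defaultdict(int)
    accepts: Dict[str, int] = defaultdict(int)
    for condition, unlocked in trials:
        attempts[condition] += 1
        if unlocked:
            accepts[condition] += 1
    return {c: accepts[c] / attempts[c] for c in attempts}


trials = [("head-on", False), ("head-on", True), ("acute", True), ("acute", True)]
print(sar_by_condition(trials))  # {'head-on': 0.5, 'acute': 1.0}
```

A markedly higher rate for the acute condition would indicate that the model's tolerance for off-angle samples is also opening it to spoofing.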

Voice

IAR:
  • Test using a setup where participants hear a positive sample and then try to mimic it.
  • Test the model with participants across genders and with different accents to ensure coverage of edge cases where some intonations/accents have a higher FAR.
SAR:
  • Test with recordings of the target's voice.
  • The recording needs to be of a reasonably high quality, or the results will be misleading.
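To find the edge cases mentioned above, where some intonations or accents are accepted more often, you can break IAR down by participant group and flag any group above the pooled rate. The following is a minimal Python sketch, assuming each mimicry attempt is logged as a (group, accepted) pair. The `iar_outliers` function, the group labels, and the session data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def iar_outliers(attempts: Iterable[Tuple[str, bool]]) -> Tuple[float, Dict[str, float]]:
    """Return the pooled IAR and the groups whose IAR exceeds it.

    Each attempt is (group, accepted), where group is a hypothetical label
    such as "female/accent-A" and accepted is True if the mimicry unlocked
    the device.
    """
    tried: Counter = Counter()
    accepted: Counter = Counter()
    for group, ok in attempts:
        tried[group] += 1
        accepted[group] += int(ok)
    pooled = sum(accepted.values()) / sum(tried.values())
    per_group = {g: accepted[g] / tried[g] for g in tried}
    return pooled, {g: r for g, r in per_group.items() if r > pooled}


sessions = [("male/accent-A", True), ("male/accent-A", False),
            ("female/accent-B", False), ("female/accent-B", False)]
print(iar_outliers(sessions))  # (0.25, {'male/accent-A': 0.5})
```

Comparing against the pooled rate is only a simple heuristic. Small groups give noisy rates, so a flagged group is a prompt for more testing rather than a conclusion.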