Measuring Biometric Unlock Security

To be considered compatible with Android, device implementations must meet the requirements presented in the Android Compatibility Definition Document (CDD). The Android 10 CDD evaluates the security of a biometric implementation using architectural security and spoofability.

  • Architectural security: How resilient a biometric pipeline is against kernel or platform compromise. A pipeline is considered secure if kernel and platform compromises do not confer the ability to either read raw biometric data or inject synthetic data into the pipeline to influence the authentication decision.
  • Spoofability: Measured by the biometric's Spoof Acceptance Rate (SAR), a metric introduced in Android 9 to measure how resilient a biometric is against a dedicated attacker. When measuring biometrics, follow the protocols described below.

Android uses three types of metrics to measure the strength of a biometric implementation.

  • Spoof Accept Rate (SAR): The chance that a biometric model accepts a previously recorded, known good sample. For example, with voice unlock this would measure the chance of unlocking a user's phone using a recorded sample of them saying "Ok, Google". We call such attacks Spoof Attacks.
  • Imposter Accept Rate (IAR): The chance that a biometric model accepts input that is meant to mimic a known good sample. For example, in the Smart Lock trusted voice (voice unlock) mechanism, this would measure how often someone trying to mimic a user's voice (using a similar tone and accent) can unlock their device. We call such attacks Imposter Attacks.
  • False Accept Rate (FAR): How often a model mistakenly accepts a randomly chosen incorrect input. While this is a useful measure, it does not provide sufficient information to evaluate how well the model stands up to targeted attacks.
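
The three rates share the same arithmetic (accepted inputs divided by presented inputs) and differ only in which kind of input is presented. The sketch below illustrates that with a trial log; the names `InputKind`, `Trial`, `accept_rate`, and `summarize` are hypothetical and not part of any Android API.

```python
from dataclasses import dataclass
from enum import Enum


class InputKind(Enum):
    RANDOM = "random"      # randomly chosen incorrect input (FAR)
    IMPOSTER = "imposter"  # a person mimicking a known good sample (IAR)
    SPOOF = "spoof"        # a recording or replica of a known good sample (SAR)


METRICS = {"FAR": InputKind.RANDOM, "IAR": InputKind.IMPOSTER, "SAR": InputKind.SPOOF}


@dataclass(frozen=True)
class Trial:
    kind: InputKind
    accepted: bool


def accept_rate(trials: list[Trial], kind: InputKind) -> float:
    """Fraction of trials of one input kind that the model accepted."""
    relevant = [t for t in trials if t.kind is kind]
    if not relevant:
        raise ValueError(f"no {kind.value} trials recorded")
    return sum(t.accepted for t in relevant) / len(relevant)


def summarize(trials: list[Trial]) -> dict[str, float]:
    """Compute each metric that has trials; IAR is skipped when no imposter attack applies."""
    present = {t.kind for t in trials}
    return {name: accept_rate(trials, kind) for name, kind in METRICS.items() if kind in present}


trials = [Trial(InputKind.RANDOM, False)] * 3 + [Trial(InputKind.RANDOM, True)]
trials += [Trial(InputKind.SPOOF, True), Trial(InputKind.SPOOF, False)]
rates = summarize(trials)
print(rates)  # {'FAR': 0.25, 'SAR': 0.5}
```

Skipping absent input kinds mirrors the rule that only FAR and SAR are considered when an imposter attack does not exist for a modality.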

Trust agents

Android 10 changes how Trust Agents behave. Trust Agents can't unlock a device; they can only extend the unlock duration for a device that is already unlocked. Trusted face is deprecated in Android 10.

Tiered Authentication

Biometric security is classified using the results from the architectural security and spoofability tests. A biometric implementation can be classified as either Strong, Weak, or Convenience. Each tier is described below.

Strong
  • Metrics: SAR: 0-7%, FAR: 1/50k, FRR: 10%
  • Biometric pipeline: Secure
  • Constraints:
    • 72 hours before fallback to primary authentication (such as PIN, pattern, or password)
    • Can expose an API to applications (for example, via integration with the BiometricPrompt or FIDO2 APIs)
    • Must submit BCR

Weak (new devices)
  • Metrics: SAR: 7-20%, FAR: 1/50k, FRR: 10%
  • Biometric pipeline: Secure
  • Constraints:
    • 24 hours before fallback to primary authentication
    • 4-hour idle timeout OR 3 incorrect attempts before fallback to primary authentication
    • Can integrate with BiometricPrompt, but cannot integrate with Keystore (for example, to release app auth-bound keys)
    • Must submit BCR

Weak (upgrading devices)
  • Metrics: SAR: 7-20%, FAR: 1/50k, FRR: 10%
  • Biometric pipeline: Insecure or Secure
  • Constraints:
    • Strongly recommend 24 hours before fallback to primary authentication
    • Strongly recommend 4-hour idle timeout OR 3 incorrect attempts before fallback to primary authentication
    • Can integrate with BiometricPrompt if the pipeline is Secure, but cannot integrate with Keystore (for example, to release app auth-bound keys)
    • Must submit BCR

Convenience (new devices)
  • Metrics: SAR: >20%, FAR: 1/50k, FRR: 10%
  • Biometric pipeline: Insecure or Secure
  • Constraints:
    • 24 hours before fallback to primary authentication
    • 4-hour idle timeout OR 3 incorrect attempts before fallback to primary authentication
    • Cannot expose an API to applications
    • Temporary tier that could go away after Android 10 (Q)

Convenience (upgrading devices)
  • Metrics: SAR: >20%, FAR: 1/50k, FRR: 10%
  • Biometric pipeline: Insecure or Secure
  • Constraints:
    • Strongly recommend 24 hours before fallback to primary authentication
    • Strongly recommend 4-hour idle timeout OR 3 incorrect attempts before fallback to primary authentication
    • Cannot expose an API to applications
    • Temporary tier that could go away after Android 10 (Q)
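
The SAR bands and pipeline requirements of the tiers can be read as a small decision procedure. The sketch below is a hypothetical helper (`classify_tier` is not an Android API) that models only the SAR and pipeline columns; FAR/FRR targets and the fallback and API constraints are not checked. It assumes each SAR band is an upper bound (so 7% counts as Strong, matching the "less than or equal to 7%" wording used for fingerprints, and a better SAR can still land in a lower tier when the pipeline rules out a higher one).

```python
from enum import Enum


class Tier(Enum):
    STRONG = "Strong"
    WEAK = "Weak"
    CONVENIENCE = "Convenience"


def classify_tier(sar: float, pipeline_secure: bool, new_device: bool) -> Tier:
    """Map a measured SAR (as a fraction) and the pipeline result to a tier.

    Only the SAR bands and the pipeline column are modeled here.
    """
    if not 0.0 <= sar <= 1.0:
        raise ValueError("sar must be a fraction between 0 and 1")
    # Strong: SAR of at most 7% and a Secure pipeline.
    if sar <= 0.07 and pipeline_secure:
        return Tier.STRONG
    # Weak: SAR of at most 20%; new devices need a Secure pipeline,
    # upgrading devices may have an Insecure one.
    if sar <= 0.20 and (pipeline_secure or not new_device):
        return Tier.WEAK
    # Everything else, including insecure pipelines on new devices.
    return Tier.CONVENIENCE


print(classify_tier(0.12, pipeline_secure=True, new_device=True).value)  # Weak
```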

Strong vs. weak vs. convenience modalities

The bar for an unlock modality rating is a combination of the three accept rates: FAR, IAR, and SAR. Where no imposter attack exists, only FAR and SAR are considered.
See the Android Compatibility Definition Document (CDD) for the measures to be taken for all unlock modalities.

Face and iris authentication

Evaluation process

The evaluation process is made up of two phases. The calibration phase determines the optimal presentation attack for a given authentication solution (that is, the calibrated position). The test phase uses the calibrated position to perform multiple attacks and counts how many times the attack succeeds.

It is important to determine the calibrated position first, because SAR should only be measured using attacks against the system's greatest point of weakness.

Calibration process

There are three parameters for face and iris authentication that need to be optimized so that the testing phase uses the most effective attack.

Face

  • The presentation medium is the actual output media for the spoof. The following media are currently considered in scope:
    • 2D
      • Printed photos
      • Photos on a monitor or a phone display
      • Videos on a monitor or a phone display
    • 3D
      • 3D printed masks
  • The presentation format relates to further manipulation of the medium or the environment, in a way that aids spoofing. Here are some examples of manipulation to try:
    • Folding printed photos slightly so that they curve at the cheeks (thus slightly mimicking depth) can sometimes vastly aid breaking 2D face authentication solutions.
    • Varying lighting conditions is an example of modifying the environment to aid spoofing.
    • Smudging or dirtying the lens slightly.
    • Changing the orientation of the phone between portrait and landscape modes to see if that affects spoofability.
  • Performance across subject diversity (or the lack of it) is especially relevant to machine learning based authentication solutions. Testing the calibration flow across subject genders and ethnicities can often reveal substantially worse performance for segments of the global population and is an important parameter to calibrate in this phase.

Spoof testing is intended to test whether or not a system accepts a valid replay or presentation attack. The presentation medium must be good enough to pass as a valid biometric claim during a biometric verification process if anti-spoof or presentation attack detection (PAD) was not implemented or was disabled. A presentation medium that cannot pass a biometric verification process without anti-spoof or PAD functionality is invalid as a spoof, and all tests using that medium are invalid. Testers should demonstrate that the presentation media, or artefacts, used in their tests satisfy this criterion.

Iris

  • The presentation medium is the actual output media for the spoof. The following media are currently considered in scope:
    • Printed photos of faces clearly showing the iris
    • Photos or videos of faces on a monitor or phone display that clearly show the iris
    • Prosthetic eyes
  • The presentation format relates to further manipulation of the medium or the environment, in a way that aids spoofing. For example, placing a contact lens over a printed photo or over the display of a photo/video of the eye helps fool some iris classification systems and can help improve the rate of bypass of iris authentication systems.
  • Performance across subject diversity is especially relevant to machine learning based authentication solutions. With iris based authentication, different iris colors can have different spectral characteristics, and testing across different colors can highlight performance issues for segments of the global population.

Test Phase

The test phase is where the resilience of a solution is actually measured using the optimized presentation attack from the previous phase.

Counting attempts in the test phase

A single attempt is counted as the window between presenting a face (real or spoofed) and receiving some feedback from the phone (either an unlock event or a user-visible message). Any tries where the phone is unable to get enough data to attempt a match should not be included in the total number of attempts used to compute SAR.
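
The counting rule can be expressed as a filter over a log of outcomes. This is a minimal sketch assuming the tester labels each try with one of three hypothetical outcome values (`UNLOCKED`, `REJECTED`, `INSUFFICIENT_DATA`); how a given device signals "not enough data" varies and is not specified here.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    UNLOCKED = "unlocked"           # device unlocked
    REJECTED = "rejected"           # match attempted, user-visible rejection shown
    INSUFFICIENT_DATA = "no_match"  # phone could not get enough data to attempt a match


def count_attempts(outcomes: list[Outcome]) -> tuple[int, int]:
    """Return (attempts, successes) for the SAR computation.

    Tries labeled INSUFFICIENT_DATA are dropped, per the counting rule.
    """
    tally = Counter(outcomes)
    attempts = tally[Outcome.UNLOCKED] + tally[Outcome.REJECTED]
    return attempts, tally[Outcome.UNLOCKED]


log = [Outcome.UNLOCKED, Outcome.INSUFFICIENT_DATA, Outcome.REJECTED, Outcome.REJECTED]
print(count_attempts(log))  # (3, 1)
```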

Evaluation protocol

Enrollment

Before starting the calibration phase for either face or iris authentication, navigate to the device settings and remove all existing biometric profiles. After all existing profiles have been removed, enroll a new profile with the target face or iris that will be used for calibration and testing. Enroll the new face or iris profile in a brightly lit environment, with the device situated directly in front of the target face at a distance of 20 cm to 80 cm.

Calibration phase

Prepare the presentation medium.

Face

  • Take a high quality photo or video of the enrolled face under the same lighting conditions, angle, and distance as the enrollment flow.
  • For physical printouts:
    • Cut along the outline of the face, creating a paper mask of sorts.
    • Bend the mask at both cheeks to mimic the curvature of the target face.
    • Cut eye-holes in the 'mask' to show the tester's eyes; this is useful for solutions that look for blinking as a means of liveness detection.
  • Try the suggested presentation format manipulations to see if they affect the chances of success during the calibration phase.

Iris

  • Take a high-resolution photo or video of the enrolled face, clearly showing the iris under the same lighting conditions, angle, and distance as the enrollment flow.
  • Try with and without contact lenses over the eyes to see which method increases spoofability

Conducting the calibration phase

Reference positions
  • Reference position: The reference position is determined by placing the presentation medium at an appropriate distance (20-80 cm) in front of the device so that the medium is clearly visible in the device's view but anything else being used (such as a stand for the medium) is not visible.
  • Horizontal reference plane: While the medium is in the reference position the horizontal plane between the device and the medium is the horizontal reference plane.
  • Vertical reference plane: While the medium is in the reference position the vertical plane between the device and the medium is the vertical reference plane.
Figure 1: Reference planes
Vertical arc

Determine the reference position, then test the medium in a vertical arc, keeping the same distance from the device as in the reference position. Raise the medium in the same vertical plane to create a 10-degree angle between the device and the horizontal reference plane, and test the face unlock.

Continue to raise and test the medium in 10-degree increments until the medium is no longer visible in the device's field of view. Record any positions that successfully unlocked the device. Repeat this process, moving the medium in a downward arc below the horizontal reference plane. See the figure below for an example of the arc tests.

Horizontal arc

Return the medium to the reference position, then move it along the horizontal plane to create a 10-degree angle with the vertical reference plane. Perform the vertical arc test with the medium in this new position. Continue moving the medium along the horizontal plane in 10-degree increments and perform the vertical arc test in each new position.

Figure 2: Testing along the vertical and horizontal arc

The arc tests need to be repeated in 10 degree increments for both the left and right side of the device as well as above and below the device.

The position that yields the most reliable unlocking results is the calibrated position for 2D spoofing.
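
The arc sweep amounts to visiting a grid of (horizontal, vertical) angular offsets and keeping the one with the best unlock rate. The sketch below is illustrative only: `arc_angles`, `sweep_positions`, and `calibrated_position` are hypothetical helpers, the angle limits are assumed symmetric and a multiple of 10 degrees (in practice you stop when the medium leaves the field of view), and "most reliable" is interpreted as the highest unlock rate over repeated tries at a position. The fallback to the reference position (0, 0) follows the testing-phase rule for when no calibrated position can be established.

```python
from itertools import product

STEP_DEG = 10  # arc increment from the protocol


def arc_angles(max_deg: int, step: int = STEP_DEG) -> list[int]:
    """Offsets from the reference plane, -max_deg..+max_deg (assumes max_deg % step == 0)."""
    return list(range(-max_deg, max_deg + 1, step))


def sweep_positions(max_h_deg: int, max_v_deg: int) -> list[tuple[int, int]]:
    """Every (horizontal, vertical) offset visited: a vertical arc at each horizontal step."""
    return list(product(arc_angles(max_h_deg), arc_angles(max_v_deg)))


def calibrated_position(results: dict[tuple[int, int], tuple[int, int]]) -> tuple[int, int]:
    """Return the position with the highest unlock rate.

    `results` maps (h_deg, v_deg) -> (unlocks, tries). Falls back to the
    reference position (0, 0) if no position unlocked the device.
    """
    best, best_rate = (0, 0), 0.0
    for position, (unlocks, tries) in results.items():
        rate = unlocks / tries if tries else 0.0
        if rate > best_rate:
            best, best_rate = position, rate
    return best


print(len(sweep_positions(30, 20)))  # 35
results = {(0, 0): (0, 5), (10, -20): (3, 5), (-10, 10): (1, 5)}
print(calibrated_position(results))  # (10, -20)
```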

Calibration phase for 3D

The calibration for 3D medium is identical to the calibration phase for 2D medium except using a 3D printed medium (such as a mask). Follow the instructions for 2D medium calibration with the 3D medium and determine the calibrated position.

Testing diversity

It's possible for face and iris models to perform differently across genders and ethnicities. Calibrate presentation attacks across a variety of faces to maximize the chances of uncovering gaps in performance.

Testing phase

At the end of the calibration phase there should be two calibrated positions, one for testing 2D spoofability and one for testing 3D spoofability. If a calibrated position can't be established, use the reference position instead. The test methodology is the same for 2D and 3D testing and is straightforward.

  • Across E enrolled faces, where E >= 10 and the enrollments include at least 5 unique faces (this implies that a minimal test would enroll each of the 5 unique faces twice):
    • Enroll the face or iris.
    • Using the calibrated position from the previous phase, perform U unlock attempts, where U >= 10, counting attempts as described in the previous section. Record the number of successful unlocks S_i for enrollment i.
    • Measure the SAR (separately for 2D and 3D) as:

$$\mathrm{SAR} = \frac{\sum_{i=1}^{E} S_i}{E \cdot U}$$

Where:

  • E = the number of enrollments
  • U = the number of unlock attempts per enrollment
  • S_i = the number of successful unlocks for enrollment i
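
The SAR formula translates directly into code. The sketch below is a hypothetical helper (`spoof_accept_rate` is not an existing API) that computes the sum of S_i divided by E times U and rejects inputs that fall short of the protocol's minimums (E >= 10, at least 5 unique faces, U >= 10). Call it once with the 2D results and once with the 3D results.

```python
def spoof_accept_rate(successes: list[int], attempts_per_enrollment: int, unique_faces: int) -> float:
    """SAR = sum(S_i) / (E * U), enforcing the protocol's minimums.

    successes[i] is S_i, the successful unlocks for enrollment i; E = len(successes).
    """
    e, u = len(successes), attempts_per_enrollment
    if e < 10:
        raise ValueError("the protocol requires E >= 10 enrollments")
    if unique_faces < 5:
        raise ValueError("the protocol requires at least 5 unique faces")
    if u < 10:
        raise ValueError("the protocol requires U >= 10 unlock attempts per enrollment")
    if any(not 0 <= s <= u for s in successes):
        raise ValueError("each S_i must be between 0 and U")
    return sum(successes) / (e * u)


sar_2d = spoof_accept_rate([0, 1, 0, 0, 2, 0, 0, 1, 0, 0], attempts_per_enrollment=10, unique_faces=5)
print(sar_2d)  # 0.04
```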

Iterations and time required to gain statistically valid samples of error rates (95% confidence assumed for all rows, large N). Total time assumes 30 seconds per attempt and 5 subjects.

Margin of error | Test iterations required per subject | Total time
1% | 9604 | 400 hours
2% | 2401 | 100 hours
3% | 1068 | 44.5 hours
5% | 385 | 16.0 hours
10% | 97 | 4.0 hours

We recommend targeting a 5% margin of error, which, for a measured SAR at the 7% threshold, gives a true error rate in the population of 2% to 12%.
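
The sketch below recomputes the iteration and time figures, assuming the standard normal-approximation sample size n = z²·p(1 − p)/e² with z = 1.96 and the worst case p = 0.5, rounded up to a whole attempt. These assumptions are ours; the text only states 95% confidence and large N. The function names are hypothetical. Exact fractions are used so that floating-point error cannot push an integral result up by one when rounding up.

```python
import math
from fractions import Fraction

Z_95 = Fraction(196, 100)  # two-sided z-score for 95% confidence


def iterations_per_subject(margin_pct: int, p: Fraction = Fraction(1, 2)) -> int:
    """Normal-approximation sample size n = z^2 * p * (1 - p) / e^2, rounded up.

    p = 1/2 is the worst case (largest n).
    """
    e = Fraction(margin_pct, 100)
    return math.ceil(Z_95**2 * p * (1 - p) / e**2)


def total_hours(margin_pct: int, subjects: int = 5, seconds_per_attempt: int = 30) -> float:
    """Wall-clock time for all subjects at a fixed time per attempt."""
    return iterations_per_subject(margin_pct) * subjects * seconds_per_attempt / 3600


for m in (1, 2, 3, 5, 10):
    print(f"{m}%: {iterations_per_subject(m)} iterations, {total_hours(m):.1f} hours")
```

Halving the margin of error quadruples the required iterations, which is why the 1% row is impractical for routine testing.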

Scope

This process is set up to test the resilience of face authentication primarily against facsimiles of the target user's face. It does not address non-facsimile-based attacks such as using LEDs, or patterns that act as master prints. While these have not yet been shown to be effective against depth-based face authentication systems, nothing conceptually prevents them from being effective, and future research may well show this to be the case. If that happens, this protocol will be revised to include measuring resilience against these attacks.

Fingerprint authentication

In Android 10, the bar is set at a minimum resilience to fake fingerprints, as measured by a Spoof Acceptance Rate (SAR) less than or equal to 7%. A brief rationale for the 7% threshold can be found in this blog post.

Evaluation process

The evaluation process is made up of two phases. The calibration phase determines the optimal presentation attack for a given fingerprint authentication solution (that is, the calibrated position). The test phase uses the calibrated position to perform multiple attacks and evaluates the number of times the attack was successful.

Calibration process

There are three parameters for fingerprint authentication that need to be optimized so that the testing phase uses the most effective attack.

  • The presentation medium is the actual output media for the spoof; printed fingerprints and molded replicas are both examples of presentation media. The following spoof materials are strongly recommended:
    • Optical
      • Copy Paper/Transparency with non-conductive ink
      • Knox Gelatin
      • Latex Paint
      • Elmer's Glue All
    • Capacitive
      • Copy Paper/Transparency with conductive ink
      • Knox Gelatin
      • Elmer's Carpenter's Interior Wood Glue
      • Elmer's Glue All
      • Latex Paint
    • Ultrasonic
      • Copy Paper/Transparency with non-conductive ink
      • Knox Gelatin
      • Elmer's Carpenter's Interior Wood Glue
      • Elmer's Glue All
      • Latex Paint
  • The presentation format relates to further manipulation of the medium or the environment, in a way that aids spoofing. For example, retouching or editing a high resolution image of a fingerprint prior to creating the 3D replica.
  • Performance across subject diversity is especially relevant to tuning the algorithm. Testing the calibration flow across subject genders and ethnicities can often reveal substantially worse performance for segments of the global population and is an important parameter to calibrate in this phase.

Testing diversity

It's possible for fingerprint readers to perform differently across genders and ethnicities. A small percentage of the population has fingerprints that are difficult to recognize, so a variety of fingerprints should be used to determine the optimal parameters for recognition and for spoof testing.

Testing process

The test phase is where the resilience of a solution is actually measured. Testing should be done in a non-cooperative manner, meaning that fingerprints are collected by lifting them off another surface rather than by having the target actively participate in collecting their fingerprint (for example, by making a mold).

Counting attempts in the test phase

A single attempt is counted as the window between presenting a fingerprint (real or spoofed) to the sensor and receiving some feedback from the phone (either an unlock event or a user-visible message).

Any tries where the phone is unable to get enough data to attempt a match should not be included in the total number of attempts used to compute SAR.

Evaluation protocol

Enrollment

Before starting the calibration phase for fingerprint authentication, navigate to the device settings and remove all existing biometric profiles. After all existing profiles have been removed, enroll a new profile with the target fingerprint that will be used for calibration and testing. Follow all the on-screen directions until the profile has been successfully enrolled.

Calibration phase

Ultrasonic

This is similar to the calibration phases for optical and capacitive sensors, but uses both a printout and a 3D mold of the target user's fingerprint.

  • Lift a latent copy of the fingerprint off a surface.
  • Test with a presentation medium like printouts
    • Place the lifted fingerprint on the sensor
  • Test with a 3D mold. Spoofed fingerprints must be created using at least 4 different materials (such as gelatin, silicone, wood glue) for each person.
    • Create a mold of the fingerprint
    • Place the molded fingerprint on the sensor

Optical

Calibrating for optical involves lifting a latent copy of the target fingerprint, for example by using fingerprint powder or by printing copies of a fingerprint. It may also include manually retouching the fingerprint image to achieve a better spoof.

Capacitive

Calibrating for capacitive involves the same steps described above for optical calibration but after the latent copy of the target fingerprint has been made, a mold of the fingerprint is made.

Testing phase

  • Get at least 10 unique people to enroll one fingerprint using the same parameters used when calculating the FRR/FAR
  • Create at least 4 fake fingerprints with at least 4 different spoof materials for each person using the methods described above
  • Try each of the different spoofs across 5 unlock attempts per person
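
The testing-phase minimums can be turned into a concrete trial schedule. The sketch below is a hypothetical planner (`plan_trials` and the constants are our names) that copies the recommended materials per sensor type from the calibration list. It assumes one spoof per material and reads the last bullet as 5 unlock attempts with each spoof, so the minimum plan is 10 people × 4 materials × 5 attempts = 200 attempts. Materials outside the recommended list are accepted, since the list is a strong recommendation rather than a requirement.

```python
from itertools import product
from typing import Optional

# Recommended spoof materials per sensor type, copied from the calibration list.
RECOMMENDED_MATERIALS = {
    "optical": [
        "Copy Paper/Transparency with non-conductive ink",
        "Knox Gelatin",
        "Latex Paint",
        "Elmer's Glue All",
    ],
    "capacitive": [
        "Copy Paper/Transparency with conductive ink",
        "Knox Gelatin",
        "Elmer's Carpenter's Interior Wood Glue",
        "Elmer's Glue All",
        "Latex Paint",
    ],
    "ultrasonic": [
        "Copy Paper/Transparency with non-conductive ink",
        "Knox Gelatin",
        "Elmer's Carpenter's Interior Wood Glue",
        "Elmer's Glue All",
        "Latex Paint",
    ],
}
MIN_PEOPLE = 10
MIN_MATERIALS = 4
ATTEMPTS_PER_SPOOF = 5  # assumption: 5 unlock attempts with each spoof


def plan_trials(people: list[str], sensor: str,
                materials: Optional[list[str]] = None) -> list[tuple[str, str, int]]:
    """List (person, material, attempt_number) for every unlock attempt to run."""
    chosen = list(materials) if materials is not None else RECOMMENDED_MATERIALS[sensor][:MIN_MATERIALS]
    if len(set(people)) < MIN_PEOPLE:
        raise ValueError(f"need at least {MIN_PEOPLE} unique people")
    if len(set(chosen)) < MIN_MATERIALS:
        raise ValueError(f"need at least {MIN_MATERIALS} different spoof materials")
    return list(product(people, chosen, range(1, ATTEMPTS_PER_SPOOF + 1)))


plan = plan_trials([f"person-{i}" for i in range(10)], "capacitive")
print(len(plan))  # 200
print(plan[0])  # ('person-0', 'Copy Paper/Transparency with conductive ink', 1)
```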

Iterations and time required to gain statistically valid samples of error rates (95% confidence assumed for all rows, large N). Total time assumes 30 seconds per attempt and 5 subjects.

Margin of error | Test iterations required per subject | Total time
1% | 9604 | 400 hours
2% | 2401 | 100 hours
3% | 1068 | 44.5 hours
5% | 385 | 16.0 hours
10% | 97 | 4.0 hours

We recommend targeting a 5% margin of error, which, for a measured SAR at the 7% threshold, gives a true error rate in the population of 2% to 12%.

Scope

This process is set up to test the resilience of fingerprint authentication primarily against facsimiles of the target user's fingerprint. The testing methodology is based on current material costs, availability, and mold-making technology. This protocol will be revised to include measuring resilience against new materials and techniques as they become practical to execute.

Common considerations

While each modality requires a different test setup, there are a few common aspects that apply to all of them.

Test the actual hardware

Collected SAR/IAR metrics can be inaccurate when biometric models are tested under idealized conditions and on different hardware than they would actually run on in a mobile device. For example, voice unlock models that are calibrated in an anechoic chamber using a multi-microphone setup behave very differently when used on a single-microphone device in a noisy environment. To capture accurate metrics, carry out tests on an actual device with the hardware installed or, failing that, with the hardware as it would appear on the device.

Use known attacks

Most biometric modalities in use today have been successfully spoofed, and public documentation of the attack methodology exists. Below we provide a brief high-level overview of test setups for modalities with known attacks. We recommend using the setup outlined here wherever possible.

Anticipate new attacks

For modalities where significant new improvements have been made, the test setup document may not contain a suitable setup, and no known public attack may exist. Existing modalities may also need their test setup tuned in the wake of a newly discovered attack. In both cases you will need to come up with a reasonable test setup. Please use the Site Feedback link at the bottom of this page to let us know if you have set up a reasonable mechanism that can be added.

Setups for different modalities

Fingerprint

IAR: Not needed.
SAR:
  • Create fake fingerprints using a mold of the target fingerprint.
  • Measurement accuracy is sensitive to the quality of the fingerprint mold. Dental silicone is a good choice.
  • The test setup should measure how often a fake fingerprint created with the mold is able to unlock the device.

Face and Iris

IAR: The lower bound is captured by SAR, so measuring it separately is not needed.
SAR:
  • Test with photos of the target's face. For iris, the face will need to be zoomed in to mimic the distance a user would normally use the feature.
  • Photos should be high resolution, otherwise results are misleading.
  • Photos should not be presented in a way that reveals they are images. For example:
    • image borders should not be included
    • if the photo is on a phone, the phone screen/bezels should not be visible
    • if someone is holding the photo, their hands should not be seen
  • When the photo is presented head-on, it should fill the sensor's field of view so that nothing outside it can be seen.
  • Face and iris models are typically more permissive when the sample (face/iris/photo) is at an acute angle with respect to the camera (mimicking a user holding the phone straight in front of them and pointing it up at their face). Testing at this angle helps determine whether your model is susceptible to spoofing.
  • The test setup should measure how often an image of the face or iris is able to unlock the device.

Voice

IAR:
  • Test using a setup where participants hear a positive sample and then try to mimic it.
  • Test the model with participants across genders and with different accents to ensure coverage of edge cases where some intonations/accents have a higher FAR.
SAR:
  • Test with recordings of the target's voice.
  • The recording needs to be of a reasonably high quality, or the results will be misleading.