Deployment guide

This platform requires you to deploy the infrastructure to your Google Cloud project.

Get the source code

The source code for the Cloud Telemetry Simulation platform is hosted on sdv.googlesource.com, which requires authentication as described in Access tooling repositories.

To access the source code, clone the Cloud Telemetry Simulation repository:

git clone https://sdv.googlesource.com/external/cloud_telemetry_simulation-external

Prerequisites

To deploy the platform, ensure you meet the following prerequisites:

  • A Google Cloud project with Billing enabled.
  • Web demo security: If you deploy the Web Demo, you must configure an OAuth 2.0 Client ID in Google Cloud APIs & Services > Credentials to secure the App Engine application and restrict access to authorized Google Accounts.
  • Software Defined Vehicle (SDV) build artifacts: You must supply your own compiled SDV image artifacts. The following files are not provided in this repository:
    • cvd-host_package.tar.gz
    • sdv_core_cf-img-<version>.zip
  • Permissions: The user or service account running Terraform must have sufficient permissions to create the resources defined in the configuration (for example, Project Editor, or a custom role with permissions for Compute Engine, Cloud Functions, Identity and Access Management, Cloud Storage, and other necessary services).
  • Tools:
    • Google Cloud CLI (gcloud CLI)
    • Terraform (the version used in the repository)
    • Docker
    • Go (the version used for the orchestrator functions in the repository)
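
Before deploying, it can help to confirm that the tools listed above are on your PATH. The following is a minimal sketch, not part of the repository: the helper name missing_tools is hypothetical, and the check does not verify that the installed versions match the ones the repository expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): report which commands
# are not available on PATH. Prints one missing name per line and
# returns 1 if any command is missing, 0 otherwise.
missing_tools() {
  local status
  local tool
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}

# Example: check the tools named in the prerequisites.
# missing_tools gcloud terraform docker go
```

If the function prints nothing, all of the named tools were found.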

Deploy the Google Cloud infrastructure

Deploying the simulation platform involves two main steps: using Terraform to deploy the core infrastructure to Google Cloud, and building and pushing the simulation agent Docker image to Artifact Registry. This section guides you through deploying the infrastructure.

In the snippets on this page, replace the uppercase placeholders (ENVIRONMENT, TF_BUCKET_NAME, PROJECT_ID, REGION, and ZONE) with your own values. To deploy the infrastructure:

  1. Configure the Terraform backend: Create a file named environments/ENVIRONMENT/backend.hcl to specify where Terraform stores its state file in Cloud Storage.

    # environments/ENVIRONMENT/backend.hcl
    bucket = "TF_BUCKET_NAME"
    prefix = "sdv-telemetry-simulation"
    
  2. Configure project variables: Create a file named environments/ENVIRONMENT/variables.tfvars with your project's details.

    # environments/ENVIRONMENT/variables.tfvars
    project_id         = "PROJECT_ID"
    default_region     = "REGION"
    default_zone       = "ZONE"
    agent_docker_image = "REGION-docker.pkg.dev/PROJECT_ID/sim-agents/simulation-agent"
    
    # Security: Map logical tags to SHA256 digests (optional)
    image_fingerprints = {
      "latest" = "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
      "stable" = "sha256:88d4266fd4e6338d13b845fcf289579d209c897823b9217da3e161936f031589"
    }
    
    # Parallel Execution Limit (Default: 5)
    max_concurrent_simulations = 5
    
  3. Apply the Terraform configuration: Navigate to the infrastructure directory, then initialize and apply the configuration:

    # Initialize Terraform with your backend configuration
    terraform init -backend-config=environments/ENVIRONMENT/backend.hcl
    
    # (Optional) Preview the changes
    terraform plan -var-file=environments/ENVIRONMENT/variables.tfvars
    
    # Apply the changes to deploy the infrastructure
    terraform apply -var-file=environments/ENVIRONMENT/variables.tfvars
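
The two configuration files from steps 1 and 2 can also be generated from a script. The following is a minimal sketch under stated assumptions: the function name write_env_config is hypothetical, it must be run from the directory that contains environments/, and it writes only the required variables (the optional image_fingerprints map and max_concurrent_simulations are left out).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): write backend.hcl and
# variables.tfvars for one environment under ./environments/.
# Usage: write_env_config ENVIRONMENT TF_BUCKET_NAME PROJECT_ID REGION ZONE
write_env_config() {
  local env="$1"
  local bucket="$2"
  local project="$3"
  local region="$4"
  local zone="$5"
  local dir="environments/${env}"
  mkdir -p "$dir"

  # Backend settings: where Terraform stores its state in Cloud Storage.
  cat > "${dir}/backend.hcl" <<EOF
bucket = "${bucket}"
prefix = "sdv-telemetry-simulation"
EOF

  # Project variables, using the image path layout shown in step 2.
  cat > "${dir}/variables.tfvars" <<EOF
project_id         = "${project}"
default_region     = "${region}"
default_zone       = "${zone}"
agent_docker_image = "${region}-docker.pkg.dev/${project}/sim-agents/simulation-agent"
EOF
}

# Example with illustrative values:
# write_env_config dev my-tf-state-bucket my-project us-central1 us-central1-a
```

Review the generated files before you run terraform init, because the script overwrites any existing files with the same names.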
    

Build and push the simulation agent image

The simulation agent runs the simulation on the Compute Engine virtual machine (VM). You build it with your SDV artifacts and push it to Artifact Registry.

To build and push the simulation agent image:

  1. Place artifacts: Copy your cvd-host_package.tar.gz and sdv_core_cf-img-<version>.zip files into the simulation-agent/sdv-image-resources/ directory.

  2. Build and push: Navigate to the simulation-agent directory, then build and push the image. Replace the image path with the one you configured in your variables.tfvars file.

    # Example using the path from the .tfvars example above
    export AGENT_IMAGE="REGION-docker.pkg.dev/PROJECT_ID/sim-agents/simulation-agent:latest"
    
    # Build the image
    docker build -t "$AGENT_IMAGE" .
    
    # Push the image to Artifact Registry
    docker push "$AGENT_IMAGE"
    
  3. Update fingerprints: After pushing a new image, you might need to get its SHA256 digest and update the image_fingerprints map in your variables.tfvars file, then rerun terraform apply.

    # Get the digest using gcloud
    gcloud container images describe "$AGENT_IMAGE" --format="value(image_summary.digest)"
    

Your Cloud Telemetry Simulation platform is now deployed and ready to accept simulation requests.
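
When you update fingerprints, a small check on the digest format can catch copy-and-paste mistakes before they reach variables.tfvars. The following is a minimal sketch: the helper name fingerprint_entry is hypothetical, and it only validates the shape of the digest (sha256: followed by 64 lowercase hexadecimal characters), not that the digest belongs to your image.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): validate a digest and
# print it as one entry for the image_fingerprints map.
# Usage: fingerprint_entry TAG DIGEST
fingerprint_entry() {
  local tag="$1"
  local digest="$2"
  # Reject anything that is not "sha256:" plus exactly 64 hex characters.
  if ! printf '%s' "$digest" | grep -Eq '^sha256:[0-9a-f]{64}$'; then
    echo "invalid digest: ${digest}" >&2
    return 1
  fi
  printf '"%s" = "%s"\n' "$tag" "$digest"
}

# Example, reusing the gcloud command from step 3:
# fingerprint_entry latest "$(gcloud container images describe "$AGENT_IMAGE" \
#   --format="value(image_summary.digest)")"
```

Paste the printed line into the image_fingerprints map, then rerun terraform apply.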

Operations and troubleshooting

This solution relies on Google Cloud's built-in tools for observability. It consumes compute resources only while handling a request and while a simulation is running.

Cost management

The architecture is designed to be cost-effective by using serverless and ephemeral resources. Costs are primarily driven by:

  • Compute Engine: Billed for the time the simulation VMs are running. Using Spot VMs can significantly reduce this cost.
  • Cloud Functions: Billed per invocation.
  • Cloud Storage: Billed for storing input and output files and logs.
  • Firestore: Billed for reads, writes, and data storage.

Observability

All components are integrated with Google Cloud's operations suite.

  • Logs Explorer: This is your primary tool for troubleshooting. You can filter logs by resource:
    • Cloud Functions: Check logs for the receive-request or schedule-simulation functions to debug orchestration issues.
    • Compute Engine: Check VM instance logs for startup or shutdown problems.
    • Simulation agent: The agent running inside the Docker container forwards its logs to Logs Explorer. Filter by the VM instance name to see detailed simulation progress.
  • Cloud Storage: For completed simulations, the logcat and bugreport files from the Cuttlefish device are uploaded to the simulation's output directory in the Cloud Storage bucket, providing deep insight into the Android environment's behavior.
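
The resource filters described above can also be applied from the command line with gcloud logging read. The following is a sketch under stated assumptions: the helper names are hypothetical; the function filter assumes the orchestrator functions are 2nd gen and therefore log under the cloud_run_revision resource type (1st gen functions use resource.type="cloud_function" with resource.labels.function_name instead); and the VM filter matches the instance name as free text because the exact log label the agent uses is not documented here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the repository) that build Cloud Logging
# filter strings.

# ASSUMPTION: 2nd gen function, logged as a Cloud Run revision.
# Usage: function_log_filter FUNCTION_NAME
function_log_filter() {
  printf 'resource.type="cloud_run_revision" AND resource.labels.service_name="%s"' "$1"
}

# ASSUMPTION: the instance name appears somewhere in each log entry, so a
# free-text match on the name narrows results to one VM.
# Usage: vm_log_filter INSTANCE_NAME
vm_log_filter() {
  printf 'resource.type="gce_instance" AND "%s"' "$1"
}

# Examples (require an authenticated gcloud CLI):
# gcloud logging read "$(function_log_filter schedule-simulation)" --limit=20
# gcloud logging read "$(vm_log_filter INSTANCE_NAME)" --limit=50
```

The same filter strings can be pasted into the Logs Explorer query field.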

Service accounts

Terraform creates several service accounts to enable a secure, least-privilege environment. Key service accounts include:

  1. Execution identity (VM):

    • simulation-agent:
      • Attached to: The Compute Engine VMs running the simulation.
      • Role: Allows the VM to upload results and signal completion.
      • Permissions:
        • roles/storage.objectUser: Reads inputs and uploads artifacts (logs, reports) to Cloud Storage.
        • roles/run.invoker: Authenticates and invokes the finish-simulation function.
  2. Orchestration identities (functions):

    • read-simulations-function:
      • Attached to: The read-simulation Cloud Function.
      • Permissions:
        • roles/datastore.user: Reads simulation and running-vm records in Firestore.
    • receive-request-function:

      • Attached to: The receive-request Cloud Function.
      • Permissions:
        • roles/datastore.user: Creates new PENDING simulation records in Firestore.
        • roles/storage.objectUser: Verifies the existence of input files in Cloud Storage.
    • scheduler-function:

      • Attached to: The schedule-simulation Cloud Function.
      • Permissions:
        • roles/pubsub.subscriber: Pulls messages from the simulation queue.
        • roles/datastore.user: Performs atomic reads and writes to the running-vms counter.
        • roles/compute.instanceAdmin.v1: Creates and starts Compute Engine VMs.
        • roles/iam.serviceAccountUser: Allows the function to attach the simulation-agent service account to the VMs it creates.
    • simulation-finisher-function:

      • Attached to: The finish-simulation Cloud Function.
      • Permissions:
        • roles/compute.instanceAdmin.v1: Deletes the VM after execution completes.
        • roles/datastore.user: Updates the simulation status to COMPLETED or FAILED.
    • delete-simulation-function:

      • Attached to: The delete-simulation Cloud Function.
      • Permissions:
        • roles/compute.instanceAdmin.v1: Force-deletes virtual machines during cancellation.
        • roles/datastore.user: Updates the status for canceled jobs.
  3. Trigger identities:

    • scheduler-trigger:

      • Used by: Eventarc (events) and Cloud Scheduler triggers.
      • Permissions: roles/eventarc.eventReceiver and roles/run.invoker to trigger the orchestrator functions.
    • cleanup-scheduler:

      • Used by: The Cloud Scheduler cron job for cleanup.
      • Permissions: roles/run.invoker to trigger the cleanup logic.

Managing Identity and Access Management policies for these service accounts is the primary way to control access and permissions within the system.
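
To review what a service account can do after deployment, you can list the roles bound to it at the project level. The following is a minimal sketch: the helper names are hypothetical, and it assumes the service account IDs match the names listed above and follow the standard ACCOUNT_ID@PROJECT_ID.iam.gserviceaccount.com email format. Roles granted on individual resources (for example, on a single bucket or function) do not appear in the project policy, so an empty result does not prove that an account has no access.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the repository).

# Build the email address of a user-managed service account.
# Usage: sa_email ACCOUNT_ID PROJECT_ID
sa_email() {
  printf '%s@%s.iam.gserviceaccount.com' "$1" "$2"
}

# List the project-level roles bound to one service account.
# Requires an authenticated gcloud CLI with permission to read the
# project's IAM policy.
# Usage: list_sa_roles ACCOUNT_ID PROJECT_ID
list_sa_roles() {
  local member
  member="serviceAccount:$(sa_email "$1" "$2")"
  gcloud projects get-iam-policy "$2" \
    --flatten="bindings[].members" \
    --filter="bindings.members:${member}" \
    --format="value(bindings.role)"
}

# Example:
# list_sa_roles simulation-agent PROJECT_ID
```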