Azure DevOps Agents using Managed Identities

The currently documented way of authenticating the Azure DevOps Agent executable to Azure DevOps expects you to generate a Personal Access Token (PAT) with permissions to manage Agent Pool resources. In this post we'll take a look at how to avoid this by using Managed Identities instead.

Word of warning: This functionality is undocumented and might break at some point. However, we've been running this without issues for several months now.

Before we start, check out the Dockerfile build here to get an idea of how the Azure DevOps agent is run. We'll be making our changes to the start.sh file. Note that in my case the Dockerfile already installs a version of the Azure DevOps agent executable, so it is not downloaded during the start.sh run.

Sometime in the last couple of years, Azure DevOps added a somewhat undocumented feature: Entra ID OAuth token support for its REST endpoints. In practice, this means that any endpoint that requires a PAT should also accept a bearer token. This is what our solution here is based on.
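To make the difference concrete, here is a minimal sketch of how the two kinds of credentials end up in the Authorization header of a REST call. The function name azdo_auth_header and the organization name my-org are made up for illustration; the only assumptions are that PATs go in as Basic auth with an empty user name and Entra ID tokens as a Bearer token.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Authorization header for an Azure DevOps REST call.
# $1: "pat" (Basic auth with an empty user name) or "bearer" (Entra ID access token)
# $2: the PAT or the access token
azdo_auth_header() {
  local kind="$1" secret="$2"
  case "$kind" in
    pat)
      # A PAT is sent as base64(":<pat>"); tr removes any line wrapping from base64
      printf 'Authorization: Basic %s' "$(printf ':%s' "$secret" | base64 | tr -d '\n')"
      ;;
    bearer)
      printf 'Authorization: Bearer %s' "$secret"
      ;;
    *)
      echo "unknown auth kind: $kind" 1>&2
      return 1
      ;;
  esac
}

# Usage (not run here; "my-org" is a placeholder organization):
#   curl -s -H "$(azdo_auth_header bearer "$token")" \
#     "https://dev.azure.com/my-org/_apis/distributedtask/pools?api-version=7.1"
```

Swapping "bearer" for "pat" in the usage line is the whole difference between the two approaches from the caller's side.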

So let's start making some edits. I want to keep the image usable with both a PAT and a Managed Identity, so I'm using an environment variable, MANAGED_IDENTITY_OBJECT_ID, to decide what to do. If it is present, we use the Managed Identity.

## This snippet handles setting the token

if [ -z "$AZP_TOKEN_FILE" ]; then
  # Check if MANAGED_IDENTITY_OBJECT_ID is present
  if [ -n "$MANAGED_IDENTITY_OBJECT_ID" ]; then
    response=$(curl -s "$IDENTITY_ENDPOINT?api-version=2019-08-01&resource=499b84ac-1321-427f-aa17-267ca6975798&object_id=$MANAGED_IDENTITY_OBJECT_ID" -H "X-IDENTITY-HEADER: $IDENTITY_HEADER")
    # "// empty" stops jq from printing the literal string "null" when the field is missing
    AZP_TOKEN=$(echo "$response" | jq -r '.access_token // empty')

    if [ -z "$AZP_TOKEN" ]; then
      echo 1>&2 "error: failed to retrieve token using MANAGED_IDENTITY_OBJECT_ID"
      exit 1
    fi
  elif [ -z "$AZP_TOKEN" ]; then
    # No Managed Identity configured, so a PAT in AZP_TOKEN is required
    echo 1>&2 "error: missing AZP_TOKEN environment variable"
    exit 1
  fi

  AZP_TOKEN_FILE=/azp/.token
  echo -n "$AZP_TOKEN" > "$AZP_TOKEN_FILE"
fi

unset AZP_TOKEN

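The things to note here:

- The resource ID 499b84ac-1321-427f-aa17-267ca6975798 identifies Azure DevOps itself, so it is the same for every organization.
- IDENTITY_ENDPOINT and IDENTITY_HEADER are not set by us. Azure injects them into the container on services such as App Service and Container Apps when a Managed Identity is assigned.
- The token is written to the same token file a PAT would be, so the rest of the script keeps passing it to the agent with --auth PAT.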
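One detail worth a closer look is pulling the token out of the response. With plain jq -r '.access_token', a response without that field produces the string "null", which an emptiness check will not catch. Below is a small sketch of a stricter extraction; extract_access_token is a hypothetical helper name, not something from the original start.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the access token from an identity endpoint response.
# Returns non-zero if the response is not JSON or has no access_token field.
extract_access_token() {
  local response="$1" token
  # "// empty" makes jq print nothing (rather than "null") for a missing field;
  # "|| token=" covers the case where jq cannot parse the response at all
  token=$(printf '%s' "$response" | jq -r '.access_token // empty' 2>/dev/null) || token=""
  if [ -z "$token" ]; then
    echo "error: response did not contain an access token" 1>&2
    return 1
  fi
  printf '%s' "$token"
}

# Usage (not run here):
#   AZP_TOKEN=$(extract_access_token "$response") || exit 1
```

This keeps the error path in one place, so both the startup and the cleanup code could share it.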

Next we need to do something similar in the cleanup, as the token acquired at startup might have expired by the time the agent is removed. (This might not work 100%, as I've had some lingering offline agents in my pools.)

cleanup() {
  if [ -n "$AZP_PLACEHOLDER" ]; then
    echo 'Running in placeholder mode, skipping cleanup'
    return
  fi
  if [ -e config.sh ]; then
    print_header "Cleanup. Removing Azure Pipelines agent..."

    if [ -n "$MANAGED_IDENTITY_OBJECT_ID" ]; then
      # Request a fresh token, as the one from startup may have expired
      response=$(curl -s "$IDENTITY_ENDPOINT?api-version=2019-08-01&resource=499b84ac-1321-427f-aa17-267ca6975798&object_id=$MANAGED_IDENTITY_OBJECT_ID" -H "X-IDENTITY-HEADER: $IDENTITY_HEADER")
      # "// empty" stops jq from printing the literal string "null" when the field is missing
      AZP_TOKEN=$(echo "$response" | jq -r '.access_token // empty')

      if [ -z "$AZP_TOKEN" ]; then
        echo 1>&2 "error: failed to retrieve token using MANAGED_IDENTITY_OBJECT_ID"
        exit 1
      fi

      AZP_TOKEN_FILE=/azp/.token
      echo -n "$AZP_TOKEN" > "$AZP_TOKEN_FILE"
    fi

    # If the agent has some running jobs, the configuration removal process will fail.
    # So, give it some time to finish the job.
    while true; do
      ./config.sh remove --unattended --auth PAT --token "$(cat "$AZP_TOKEN_FILE")" && break

      echo "Retrying in 30 seconds..."
      sleep 30
    done
  fi
}

You can view the full file here
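The cleanup function fetches a new token unconditionally, which is the simplest option. If you would rather refresh only when needed, the identity endpoint response should also carry an expires_on field (Unix seconds) next to access_token. Assuming it does, a check could look roughly like this; token_needs_refresh and the 300 second margin are my own illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (return 0) if the token should be refreshed.
# $1: expires_on as Unix seconds (assumed to come from the token response)
# $2: safety margin in seconds (default 300)
# $3: current time as Unix seconds (defaults to now; handy for testing)
token_needs_refresh() {
  local expires_on="$1" margin="${2:-300}" now="${3:-$(date +%s)}"
  # Refresh once we are within the margin of the expiry time, or past it
  [ $((expires_on - margin)) -le "$now" ]
}

# Usage (not run here):
#   expires_on=$(echo "$response" | jq -r '.expires_on // 0')
#   if token_needs_refresh "$expires_on"; then
#     : # request a new token here
#   fi
```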

Changes needed in Azure DevOps

If you run the container without doing the following steps, you might be told that no Agent Pool with the given name was found. This was a bit puzzling at first, but it turns out that the Managed Identity needs a Basic license in Azure DevOps before it can read the pools, regardless of its other permissions. Adding the identity to Azure DevOps can sometimes be a bit difficult, as the search does not always find it, so I recommend doing this step first.

We still need to give the identity some permissions, namely the same ones the PAT would need: Read and Manage Agent Pools, and the Administrator permission on the pool itself at the Organization level (I don't think the Project level is needed).
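If you want to check whether the identity can actually see the pool before starting an agent, you can ask the agent pools REST endpoint for the pool by name using the same kind of token. This is only a sketch: pool_visible is a hypothetical helper, my-org is a placeholder, and I'm assuming the response contains a count field as list responses from Azure DevOps usually do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check whether a token can see an agent pool by name.
# $1: organization URL, e.g. https://dev.azure.com/my-org (placeholder)
# $2: pool name
# $3: Entra ID access token
# Prints the number of matching pools; returns non-zero if none are visible.
pool_visible() {
  local org_url="$1" pool="$2" token="$3" count
  # -G turns the --data-urlencode values into a URL-encoded query string
  count=$(curl -s -G "$org_url/_apis/distributedtask/pools" \
    -H "Authorization: Bearer $token" \
    --data-urlencode "poolName=$pool" \
    --data-urlencode "api-version=7.1" | jq -r '.count // 0' 2>/dev/null) || count=0
  echo "${count:-0}"
  [ "${count:-0}" -gt 0 ]
}

# Usage (not run here):
#   pool_visible "https://dev.azure.com/my-org" "Default" "$AZP_TOKEN" \
#     || echo "pool not visible: check the license and pool permissions" 1>&2
```

A result of 0 with a valid token is the same symptom as the "no Agent Pools found" message, so it points at the license or the pool permissions rather than at the token request.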

And that should be all! Now if you start the agent with the MANAGED_IDENTITY_OBJECT_ID environment variable present, you should be good to go!