Rotating NATS Certificate Authorities

The following strategy rotates the NATS CA and the NATS-related certificates across the director, health monitor, NATS server, and all deployed VMs. See Components of Bosh for more information on core components.

Preconditions

  • Director is in a healthy state.
  • All VMs in all deployments are in the running state.
  • Take note of any ignored VMs. They will be omitted from the VM recreation steps.
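One way to check these preconditions is to list instance details per deployment. The sketch below assumes the bosh CLI is installed and already targeted at the director; the function name and the deployment names are hypothetical. It assumes the detailed instance table includes the "Process State" and "Ignore" columns.

```shell
# Sketch: show detailed instance state for each named deployment.
# Assumption: the bosh CLI is installed and targeted at the director.
# Deployment names are passed as arguments; nothing is discovered automatically.
show_instance_state() {
  local deployment
  for deployment in "$@"; do
    echo "== ${deployment} =="
    # --details adds per-instance columns; review "Process State" and "Ignore".
    bosh -d "${deployment}" instances --details
  done
}

# Example (hypothetical deployment names):
#   show_instance_state cf zookeeper
```

Any instance that is not running, or that is marked as ignored, should be noted before starting Step 1.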

Step 1: Update the director, health monitor, and NATS server jobs, to introduce the new CA.

$ bosh create-env ~/workspace/bosh-deployment/bosh.yml \
 --state ./state.json \
 -o ~/workspace/bosh-deployment/[IAAS]/cpi.yml \
 -o add-new-ca.yml \
 -o ... additional opsfiles \
 --vars-store ./creds.yml \
 -v ... additional vars
  • New variables are added to generate the new NATS CA and the corresponding NATS-related certificates signed by it. Not all of these newly generated certificates are used in this step; some are only used in the following steps.
  • The director and health monitor jobs are given two CA certificates to trust when communicating with the NATS server, by concatenating the old and new NATS CAs: ((nats_server_tls.ca))((nats_server_tls_2.ca)). This allows the director and health monitor to trust a certificate presented by the NATS server whether it was signed by the old CA or the new CA. See limitations.
  • The director and health monitor jobs are updated to use new client certificates signed by the new CA. These client certificates are used for mutual TLS (mTLS) communication with the NATS server.
  • The NATS server continues to use the old certificate (signed by the old NATS CA) to serve TLS connections. The NATS server is given the concatenated CAs from above so that it can verify client certificates (for mTLS) signed by either the old CA or the new CA.
  • Each VM/agent continues to use its old client certificate to communicate with the NATS server.

Warning

In the operations file add-new-ca.yml below, the nats_server_tls_2 certificate is generated with the internal_ip as its only Subject Alternative Name. Remember to add any other SANs that may be necessary for your environment.

add-new-ca.yml

---
- type: replace
  path: /instance_groups/name=bosh/properties/nats/tls/ca?
  value: ((nats_server_tls.ca))((nats_server_tls_2.ca))

- type: replace
  path: /instance_groups/name=bosh/properties/nats/tls/client_ca?
  value:
    certificate: ((nats_ca_2.certificate))
    private_key: ((nats_ca_2.private_key))

- type: replace
  path: /instance_groups/name=bosh/properties/nats/tls/director?
  value:
    certificate: ((nats_clients_director_tls_2.certificate))
    private_key: ((nats_clients_director_tls_2.private_key))

- type: replace
  path: /instance_groups/name=bosh/properties/nats/tls/health_monitor?
  value:
    certificate: ((nats_clients_health_monitor_tls_2.certificate))
    private_key: ((nats_clients_health_monitor_tls_2.private_key))

- type: replace
  path: /variables/-
  value:
    name: nats_ca_2
    type: certificate
    options:
      is_ca: true
      common_name: default.nats-ca.bosh-internal

- type: replace
  path: /variables/-
  value:
    name: nats_server_tls_2
    type: certificate
    options:
      ca: nats_ca_2
      common_name: default.nats.bosh-internal
      alternative_names: [((internal_ip))]
      extended_key_usage:
      - server_auth

- type: replace
  path: /variables/-
  value:
    name: nats_clients_director_tls_2
    type: certificate
    options:
      ca: nats_ca_2
      common_name: default.director.bosh-internal
      extended_key_usage:
      - client_auth

- type: replace
  path: /variables/-
  value:
    name: nats_clients_health_monitor_tls_2
    type: certificate
    options:
      ca: nats_ca_2
      common_name: default.hm.bosh-internal
      extended_key_usage:
      - client_auth
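Once the create-env run for Step 1 has generated the new variables, it can be useful to confirm that a new certificate really chains to the new CA and carries the expected SANs. The following is a minimal sketch, assuming openssl is installed; the function names and PEM file names are hypothetical, and the certificates are assumed to have been extracted from the vars-store with bosh interpolate.

```shell
# Sketch: confirm a certificate chains to a CA and list its SANs.
# Assumption: openssl is installed; all file names are hypothetical.

# Succeeds only if cert_file verifies against ca_file.
cert_signed_by() {
  local ca_file="$1"
  local cert_file="$2"
  openssl verify -CAfile "${ca_file}" "${cert_file}" >/dev/null 2>&1
}

# Prints the Subject Alternative Name entries of a certificate, if any.
cert_sans() {
  openssl x509 -in "$1" -noout -text \
    | grep -A1 'Subject Alternative Name' \
    | tail -n 1
}

# Example, extracting values from the vars-store first:
#   bosh int ./creds.yml --path /nats_ca_2/certificate         > new-ca.pem
#   bosh int ./creds.yml --path /nats_server_tls_2/certificate > new-server.pem
#   cert_signed_by new-ca.pem new-server.pem && echo "signed by new CA"
#   cert_sans new-server.pem
```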

Step 2: Recreate all VMs, for each deployment.

Deployed VMs need to be recreated in order to receive new client certificates signed by the new CA. They also receive a new list of CAs to trust (the old and new CA certificates, concatenated) when communicating with the NATS server. Recreating the VMs is crucial for the NATS CA rotation.

To recreate the deployed VMs, please check the output of bosh recreate -h for options.
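As an illustration, recreating every deployment in turn could be scripted along these lines. This is a sketch only: the function name and deployment names are hypothetical, it assumes the bosh CLI is targeted at the director, and it uses no recreate options beyond the defaults.

```shell
# Sketch: recreate every VM in each named deployment, one deployment at a time.
# Assumption: the bosh CLI is targeted at the director.
recreate_deployments() {
  local deployment
  for deployment in "$@"; do
    echo "Recreating VMs in ${deployment}"
    # -n answers the confirmation prompt.
    # Stop at the first failure so it can be investigated before continuing.
    bosh -n -d "${deployment}" recreate || return 1
  done
}

# Example (hypothetical deployment names):
#   recreate_deployments cf zookeeper
```

Ignored VMs are omitted from the recreation, so they need to be handled separately.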

Step 3: Update the director, health monitor, and NATS server jobs, to remove references for the old NATS CA and certificates signed by it.

$ bosh create-env ~/workspace/bosh-deployment/bosh.yml \
 --state ./state.json \
 -o ~/workspace/bosh-deployment/[IAAS]/cpi.yml \
 -o remove-old-ca.yml \
 -o ... additional opsfiles \
 --vars-store ./creds.yml \
 -v ... additional vars
  • The nats.tls.ca property is updated to remove the old CA from the concatenated CAs.
  • The director and health monitor continue to only use new client certificates (for mTLS) that were signed by the new NATS CA. Also, in this step the director and health monitor will start to ONLY trust NATS server certificates that were signed by the new CA.
  • The NATS server is updated to use a new certificate (used to serve TLS connections) signed by the new NATS CA. Also, in this step the NATS server will start to ONLY trust client certificates (for mTLS) that were signed by the new CA.
  • All components now communicate using the new CA.

remove-old-ca.yml

---
- type: replace
  path: /instance_groups/name=bosh/properties/nats/tls/ca?
  value: ((nats_server_tls_2.ca))

- type: replace
  path: /instance_groups/name=bosh/properties/nats/tls/server?
  value:
    certificate: ((nats_server_tls_2.certificate))
    private_key: ((nats_server_tls_2.private_key))

- type: replace
  path: /instance_groups/name=bosh/properties/nats/tls/client_ca?
  value:
    certificate: ((nats_ca_2.certificate))
    private_key: ((nats_ca_2.private_key))

- type: replace
  path: /instance_groups/name=bosh/properties/nats/tls/director?
  value:
    certificate: ((nats_clients_director_tls_2.certificate))
    private_key: ((nats_clients_director_tls_2.private_key))

- type: replace
  path: /instance_groups/name=bosh/properties/nats/tls/health_monitor?
  value:
    certificate: ((nats_clients_health_monitor_tls_2.certificate))
    private_key: ((nats_clients_health_monitor_tls_2.private_key))

- type: replace
  path: /variables/-
  value:
    name: nats_ca_2
    type: certificate
    options:
      is_ca: true
      common_name: default.nats-ca.bosh-internal

- type: replace
  path: /variables/-
  value:
    name: nats_server_tls_2
    type: certificate
    options:
      ca: nats_ca_2
      common_name: default.nats.bosh-internal
      alternative_names: [((internal_ip))]
      extended_key_usage:
      - server_auth

- type: replace
  path: /variables/-
  value:
    name: nats_clients_director_tls_2
    type: certificate
    options:
      ca: nats_ca_2
      common_name: default.director.bosh-internal
      extended_key_usage:
      - client_auth

- type: replace
  path: /variables/-
  value:
    name: nats_clients_health_monitor_tls_2
    type: certificate
    options:
      ca: nats_ca_2
      common_name: default.hm.bosh-internal
      extended_key_usage:
      - client_auth

Step 4: Recreate all VMs, for each deployment.

Recreating all VMs removes the old NATS CA reference from their agent settings. To recreate the deployed VMs, please check the output of bosh recreate -h for options.

Step 5: Clean-up

So that future updates to the BOSH director do not rely on the transitional ops files created above (add-new-ca.yml and remove-old-ca.yml), we recommend performing the manual steps below to clean up the vars-store file (typically named creds.yml):

  1. Create a backup of the vars-store file

  2. Remove old certificate values from the vars-store file

  3. Rename the newly generated NATS-related variables in the vars-store file to the old variable names. In the sample above, nats_ca_2, nats_server_tls_2, nats_clients_director_tls_2, and nats_clients_health_monitor_tls_2 become nats_ca, nats_server_tls, nats_clients_director_tls, and nats_clients_health_monitor_tls respectively.

  4. Delete the add-new-ca.yml and remove-old-ca.yml ops files, which are not needed anymore.
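The backup and rename steps above could be scripted roughly as follows. This is a hedged sketch, not an official procedure: the function name is hypothetical, it assumes each variable is a top-level key in the vars-store file, and it assumes the old entries were already removed by hand (step 2).

```shell
# Sketch of clean-up steps 1 and 3: back up the vars-store, then rename
# the "_2" variables to the original names.
# Assumption: variables are top-level keys and old entries are already removed.
rename_nats_vars() {
  local store="$1"
  local name
  for name in nats_ca nats_server_tls nats_clients_director_tls \
              nats_clients_health_monitor_tls; do
    # Refuse to continue if an old key is still present; it would be duplicated.
    if grep -q "^${name}:" "${store}"; then
      echo "old entry ${name} still present in ${store}" >&2
      return 1
    fi
  done
  # Step 1: keep a backup next to the original file.
  cp "${store}" "${store}.bak" || return 1
  # Step 3: strip the _2 suffix from the four top-level keys only.
  sed -E 's/^(nats_ca|nats_server_tls|nats_clients_director_tls|nats_clients_health_monitor_tls)_2:/\1:/' \
    "${store}.bak" > "${store}"
}

# Example:
#   rename_nats_vars ./creds.yml
```

The check runs before the backup is written so that a second, accidental run does not overwrite the backup with the already renamed file.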

Warning

If you do not perform the clean-up procedure, you must ensure that the ops files (add-new-ca.yml and remove-old-ca.yml) are used every time create-env is executed going forward, which can be unsustainable. Removing the ops files without the clean-up would revert to the old CA, which can lead to unresponsive agents on existing and newly created VMs.

Limitations

A dependency in older versions of the director and health monitor cannot verify against multiple CAs: only the first certificate in the bundle is considered for verification. For this reason, the old and new CAs must be concatenated in this specific order: old_ca+new_ca. If the order is reversed, only new_ca is considered, so verification of the old certificate that the NATS server still presents to its clients fails, and mTLS between the director and the NATS server, and between the health monitor and the NATS server, breaks.
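Because the order matters, it may be worth checking a concatenated bundle before deploying it. The sketch below assumes openssl is installed and that the bundle and CA have been written to PEM files; the function name and file names are hypothetical. It relies on openssl x509 reading only the first PEM certificate in a file.

```shell
# Sketch: check that a concatenated CA bundle starts with the expected CA.
# Assumption: openssl is installed; all file names are hypothetical.
bundle_starts_with() {
  local bundle="$1"
  local expected_first="$2"
  local actual
  local wanted
  # openssl x509 reads only the first certificate in the bundle file.
  actual="$(openssl x509 -in "${bundle}" -noout -fingerprint -sha256)" || return 1
  wanted="$(openssl x509 -in "${expected_first}" -noout -fingerprint -sha256)" || return 1
  [ "${actual}" = "${wanted}" ]
}

# Example:
#   cat old-ca.pem new-ca.pem > bundle.pem
#   bundle_starts_with bundle.pem old-ca.pem && echo "order is old_ca+new_ca"
```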

Visualization of the Steps

image