Kubernetes

Advanced

Posit Package Manager can be configured to run on Azure in a Kubernetes cluster with Azure Kubernetes Service (AKS) for a non-air-gapped environment. In this architecture, Package Manager can handle a large number of users and benefits from deploying across multiple availability zones.

This configuration is suitable for teams of hundreds of data scientists who want or require multiple availability zones.

Most companies don’t need to run in this configuration unless they have many concurrent package uploads/downloads or are required to run across multiple availability zones for compliance reasons. For smaller teams without these requirements, the single server architecture of Package Manager is more suitable.

Architecture Overview

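This Posit Package Manager implementation deploys the application in an AKS cluster following the Kubernetes installation instructions. It additionally leverages Azure Database for PostgreSQL - Flexible Server for the metadata database, an Azure Files NFS file share for package storage, and the Managed NGINX Application Routing addon for ingress.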

Architecture Diagram

Diagram of the Package Manager configuration running on Kubernetes in Azure with AKS

Kubernetes Cluster

The Kubernetes cluster should be provisioned using Azure Kubernetes Service (AKS). The cluster should run across multiple availability zones.

Nodes

We recommend three worker nodes across multiple availability zones. We have tested with Standard D8 v5 instances (8 vCPUs, 32 GiB memory) for each node, and with that sizing the cluster can serve 30 million package installs per month, or roughly one million package installs per day. This configuration can also handle 100 Git builders building packages from Git repositories concurrently.

Note

Each Posit Package Manager user could download dozens or hundreds of packages a day. There are other usage patterns, such as an admin uploading local packages or the server building packages for Git builders, but package installations give a good indication of the load and throughput this configuration can handle.

This reference architecture does not assume autoscaling node pools; it assumes a fixed number of nodes in your node pool. However, it is safe for Posit Package Manager pods to run on autoscaling nodes. If a pod is evicted from a node during a scale-down event, any long-running jobs in progress (e.g. Git builders) are restarted on a different pod, because all long-running jobs are tracked externally in the database.

Database

This configuration uses Azure Database for PostgreSQL - Flexible Server on a Standard D4ds v4 instance (4 vCPUs, 16 GiB memory) with 128 GiB of storage and zone-redundant high availability.

Zone-redundant high availability runs the Azure Database instance in an active/passive configuration across two availability zones, with automatic failover to the standby when the primary instance goes down.

The Azure Database instance should be configured with an empty Postgres database for the Posit Package Manager metadata. To handle a higher number of concurrent users, the configuration option PostgresPool.MaxOpenConnections should be increased to 50.

This database sizing is generous: in our testing, the Postgres database handled one million package installs per day without exceeding 10-20% CPU utilization.
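
As a sketch of how these database settings might be passed through the Helm chart, the fragment below assumes the chart renders its config map into the Package Manager configuration file (verify this against the chart README for your chart version). The server name, user, and database name are placeholders, and the password is deliberately omitted so it can be supplied from a Kubernetes Secret rather than stored in values.yaml.

```yaml
# values.yaml (sketch): PostgreSQL settings passed through the chart's config map
config:
  Database:
    Provider: postgres
  Postgres:
    # Placeholder Flexible Server host, user, and database name.
    # Flexible Server requires TLS connections by default, hence sslmode=require.
    URL: "postgres://ppm_user@my-ppm-db.postgres.database.azure.com:5432/packagemanager?sslmode=require"
  PostgresPool:
    # Increased to handle a higher number of concurrent users
    MaxOpenConnections: 50
```
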

Storage

An Azure Files NFS file share is used to store data about packages and sources, as well as cached metadata to decrease response times for requests.

We have provisioned a 1000 GiB NFS file share on Azure Premium SSDs with zone-redundant storage (ZRS). ZRS synchronously replicates data across three availability zones within a single region.

The NFS file share should also be configured with the recommended mount options for NFS Azure file shares: nconnect=4, noresvport, actimeo=30, and lookupcache=pos.
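
One way to provision a share like this from inside the cluster is a StorageClass for the Azure Files CSI driver (file.csi.azure.com). The sketch below assumes dynamic provisioning using the driver's protocol and skuName parameters; the class name is hypothetical, and Premium_ZRS is only available in regions that support zone-redundant premium file shares.

```yaml
# StorageClass (sketch): Azure Files NFS share on premium zone-redundant storage
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-premium-zrs-nfs   # hypothetical name
provisioner: file.csi.azure.com
parameters:
  protocol: nfs          # NFS share instead of the default SMB
  skuName: Premium_ZRS   # SSD-backed, replicated across three availability zones
allowVolumeExpansion: true
reclaimPolicy: Retain    # keep package data if the claim is deleted
mountOptions:
  # Recommended mount options for NFS Azure file shares
  - nconnect=4
  - noresvport
  - actimeo=30
  - lookupcache=pos
```

The chart's shared storage claim could then reference this class, for example with sharedStorage.storageClassName set to azurefile-premium-zrs-nfs and sharedStorage.requests.storage set to 1000Gi. These value names are assumed to match the rstudio-pm chart, so confirm them in its values.yaml before use.
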

Networking

Posit Package Manager should be deployed in an AKS cluster with ingress provided by the Managed NGINX Application Routing addon, which is the recommended way to configure ingress for AKS.
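
When the application routing addon is enabled, it creates an IngressClass named webapprouting.kubernetes.azure.com. A minimal sketch of chart values that route traffic through it follows; the hostname is a placeholder, and the shape of ingress.hosts (a host with a list of path strings) is an assumption that may differ between chart versions.

```yaml
# values.yaml (sketch): route traffic through the managed NGINX ingress controller
ingress:
  enabled: true
  # IngressClass created by the AKS application routing addon
  ingressClassName: webapprouting.kubernetes.azure.com
  hosts:
    - host: packagemanager.example.com   # placeholder hostname
      paths:
        - /
```
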

Configuration Details

The configuration of Package Manager is managed through the official Helm chart: https://github.com/rstudio/helm/tree/main/charts/rstudio-pm. For complete details, refer to the Kubernetes installation steps.

Replicas

This reference architecture uses three replicas for the Posit Package Manager service. To ensure that each replica runs on a different node, set topologySpreadConstraints:

values.yaml
```yaml
replicas: 3

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      # The helm chart will add this label to the pods automatically
      app.kubernetes.io/name: rstudio-pm
```
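
Because the cluster spans multiple availability zones, you may also want replicas balanced across zones if the node pool ever grows beyond one node per zone. The sketch below adds a second constraint on the well-known topology.kubernetes.io/zone node label, which AKS sets on nodes in zone-enabled node pools. It is written as a soft preference (ScheduleAnyway), so the node-level rule remains the hard requirement.

```yaml
# values.yaml (sketch): spread replicas across nodes and availability zones
replicas: 3

topologySpreadConstraints:
# Hard rule: at most one replica more on any node than on another
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app.kubernetes.io/name: rstudio-pm
# Soft rule: the scheduler favors balanced zones but still places a replica if it cannot
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app.kubernetes.io/name: rstudio-pm
```
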

Resiliency and Availability

This configuration of Posit Package Manager is comparable to what has been deployed for the Posit Public Package Manager service. Because Public Package Manager is a publicly available service, its architecture is tested by the R and Python communities that use it, and it serves many more users than any private Posit Package Manager instance. The current uptime for the Posit Public Package Manager service can be found on the status page.

FAQ

See the Frequently Asked Questions page for answers to general questions about Posit Package Manager.