AWS EKS Cluster

Posit Package Manager can run on AWS in an EKS cluster for a non-air-gapped environment. In this architecture, Package Manager can handle a large number of users and benefits from being deployed across multiple availability zones.

This configuration is suitable for teams of hundreds of data scientists who want or require multiple availability zones.

Most companies don’t need this configuration unless they have many concurrent package uploads and downloads or must run across multiple availability zones for compliance reasons. For small teams without these requirements, the single server architecture of Package Manager is more suitable.

Architecture Overview

This Posit Package Manager implementation deploys the application in an EKS cluster following the Kubernetes installation instructions. It additionally leverages:

  • An AWS Application Load Balancer (ALB) for ingress.
  • Three EKS nodes running in a node group across multiple availability zones.
  • An S3 bucket for Posit Package Manager’s object storage.
  • An RDS instance across two availability zones that includes a Postgres database for Posit Package Manager metadata.

Architecture Diagram

Diagram of Package Manager configuration running on an EKS cluster in AWS

Sizing and Performance

Nodes

Posit Package Manager can be run on three nodes in a node group across multiple availability zones. We tested with a c6a.4xlarge instance (16 vCPUs, 32 GiB memory) for each node; in that setup, Package Manager can serve 30 million package installs per month, or 1 million package installs per day. This configuration can also handle 100 Git builders concurrently building packages from Git repositories.
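To put the tested figures in more familiar units, the following is a rough back-of-the-envelope sketch. It assumes a 30-day month, load spread evenly over the day, and traffic shared equally by the three nodes. Real traffic is burstier, so treat the results as averages, not peak rates.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def installs_per_second(installs_per_day: float) -> float:
    """Average install rate, assuming load is spread evenly over the day."""
    return installs_per_day / SECONDS_PER_DAY


monthly_installs = 30_000_000  # tested figure quoted in this section
days_per_month = 30            # assumption: a 30-day month
node_count = 3                 # nodes in the node group

daily_installs = monthly_installs / days_per_month
avg_rate = installs_per_second(daily_installs)
# Assumption: the load balancer spreads requests evenly across nodes.
per_node_rate = avg_rate / node_count

print(f"{daily_installs:,.0f} installs/day")
print(f"{avg_rate:.1f} installs/s on average across the cluster")
print(f"{per_node_rate:.1f} installs/s on average per node")
```

Under these assumptions, 1 million installs per day works out to about 11.6 installs per second across the cluster.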

Note

Each Posit Package Manager user could be downloading dozens or hundreds of packages a day. There are also other usage patterns such as an admin uploading local packages or the server building packages for Git builders, but package installations give a good idea of what load and throughput this configuration can handle. This is the configuration that the Posit Public Package Manager service currently runs, so we don’t anticipate any individual customer needing to scale beyond this configuration.

Database

This configuration uses RDS with Postgres on a db.t3.xlarge instance with 100 GB of storage across two availability zones. This is a very generous configuration: in our testing, the Postgres database handled more than 1,000,000 package installs per day without CPU utilization exceeding the 10-20% range.

Storage

The S3 bucket is used to store data about packages and sources, as well as cached metadata to decrease response times for requests. An S3 bucket with default settings is sufficient for this Posit Package Manager configuration.

Configuration Details

EKS Cluster

The EKS cluster requires the following configuration:

  • Shared encryption keys for every node
  • Shared configuration file for every node
  • All the necessary versions of R and Python (if using Git building functionality)
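One way to confirm that every node really sees the same configuration file (or the same encryption key) is to compare content digests. The sketch below is illustrative only: the node names and file contents are hypothetical, and how the bytes are collected from each node is left out. It reports any node whose content differs from the majority.

```python
import hashlib
from collections import Counter


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(content).hexdigest()


def find_mismatched_nodes(content_by_node: dict[str, bytes]) -> list[str]:
    """Return the nodes whose content differs from the most common content."""
    if not content_by_node:
        return []
    digests = {node: sha256_hex(data) for node, data in content_by_node.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(node for node, d in digests.items() if d != majority_digest)


# Hypothetical sample data: node-c carries a stale copy of the config file.
shared = b"[PostgresPool]\nMaxOpenConnections = 50\n"
stale = b"[PostgresPool]\nMaxOpenConnections = 10\n"
configs = {"node-a": shared, "node-b": shared, "node-c": stale}

mismatched = find_mismatched_nodes(configs)
print(mismatched or "all nodes share the same file")
```

Comparing digests instead of raw bytes means a key never needs to be printed or logged in order to check it.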

Networking

Posit Package Manager should be deployed in an EKS cluster with the control plane and node group in a private subnet. Ingress should use an Application Load Balancer in a public subnet. The deployment should run across multiple availability zones.
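The layout described here can be expressed as a small sanity check. This is a minimal sketch, not an AWS API client: the `Subnet` type, the subnet names, and the availability zone names are all hypothetical, and the data would have to come from your own inventory of the VPC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subnet:
    name: str
    availability_zone: str
    public: bool


def check_layout(node_subnets, alb_subnets, min_zones=2):
    """Return a list of problems with the subnet layout (empty if none)."""
    problems = []
    for subnet in node_subnets:
        if subnet.public:
            problems.append(f"node subnet {subnet.name} is public; expected private")
    for subnet in alb_subnets:
        if not subnet.public:
            problems.append(f"load balancer subnet {subnet.name} is private; expected public")
    for label, group in (("node group", node_subnets), ("load balancer", alb_subnets)):
        zones = {subnet.availability_zone for subnet in group}
        if len(zones) < min_zones:
            problems.append(
                f"{label} spans {len(zones)} availability zone(s); expected at least {min_zones}"
            )
    return problems


# Hypothetical layout: private node subnets and public ALB subnets in two zones.
nodes = [Subnet("private-a", "us-east-1a", False), Subnet("private-b", "us-east-1b", False)]
alb = [Subnet("public-a", "us-east-1a", True), Subnet("public-b", "us-east-1b", True)]

for problem in check_layout(nodes, alb) or ["layout looks consistent"]:
    print(problem)
```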

S3

An S3 bucket with default settings is sufficient for this Posit Package Manager configuration. S3 can also be used with KMS for client-side encryption.

RDS

The RDS instance should be configured with an empty Postgres database for the Posit Package Manager metadata. To handle a higher number of concurrent users, the configuration option PostgresPool.MaxOpenConnections should be increased to 50.
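When raising the pool size, it can help to check the total against the database's connection limit. The sketch below assumes that each Package Manager replica keeps its own pool, so that PostgresPool.MaxOpenConnections applies per replica. That behavior, the database limit used here, and the number of reserved connections are all assumptions to verify against your own deployment; the actual limit can be read from Postgres with `SHOW max_connections;`.

```python
def total_pool_connections(replicas: int, max_open_connections: int) -> int:
    """Worst-case number of connections opened by all replicas together."""
    return replicas * max_open_connections


def fits_database_limit(
    replicas: int,
    max_open_connections: int,
    db_max_connections: int,
    reserved_connections: int = 10,
) -> bool:
    """Check that all pools fit, leaving some connections for admin tasks."""
    available = db_max_connections - reserved_connections
    return total_pool_connections(replicas, max_open_connections) <= available


replicas = 3    # assumption: one Package Manager process per node
pool_size = 50  # PostgresPool.MaxOpenConnections value recommended above
db_limit = 200  # hypothetical value; read the real one from your database

total = total_pool_connections(replicas, pool_size)
ok = fits_database_limit(replicas, pool_size, db_limit)
print(f"{total} pooled connections, fits within limit: {ok}")
```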

Resiliency and Availability

This configuration of Posit Package Manager has been deployed on the Posit Public Package Manager service. As a publicly available service, the architecture is tested by the R and Python communities that use it. Public Package Manager is used by many more users than any private Posit Package Manager instance. The current uptime for the Posit Public Package Manager service can be found on the status page.