Building Git Packages#
Package Manager can build R and Python source packages from a Git repository. Two Git source types are used for building from a Git repository:
- git: Used to point to R package repositories
- git-python: Used to point to Python package repositories
This document describes how to create and edit Git builders for both R and Python packages. Most of the information applies to both git and git-python sources; each section notes where that is not the case.
Git Builder Prerequisites#
R Prerequisites#
R packages can come in multiple formats:
- Source: A collection of directories and files containing source code.
- Bundle: A specially created tar file containing bundled source code. The result of R CMD build.
- Binary: A binary file specific to an operating system and architecture, containing compiled source code. Not an executable. The result of R CMD INSTALL.
In some configurations, Posit Package Manager needs to run R in order to transform a package from one state to another. Because of this, Package Manager requires an installation of R to use git sources to build R packages with Git builders.
R Installation#
In most cases, Posit recommends you install R from pre-compiled binaries.
To install from pre-compiled binaries, follow the instructions at Install R.
Alternatively, you can install R from source by following the instructions at Install R from source.
Package Manager supports two ways of discovering R installations: through explicit configuration and through automatic detection.
Explicit R Configuration (recommended)#
You can specify the installation of R with the Server.RVersion setting in the Package Manager configuration file, for example /opt/R/4.3.2.
Replace /opt/R/4.3.2 with the path to your R installation. Use the path to the R installation directory, not the path to the binary (do not include /bin/R). While multiple versions of R may be installed on the server, only one version of R may be specified for use by Package Manager.
If the RVersion field is included, it must be valid and must appear only once in the configuration file. Check the server log after starting and stopping the Package Manager process for messages relevant to the R configuration.
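As a quick pre-flight check before setting Server.RVersion, the sketch below tests whether a candidate path is an installation directory rather than the binary itself. It assumes the layout implied above, where the R executable sits at bin/R under the installation directory. The function name is hypothetical and is not part of Package Manager.

```shell
# Hypothetical helper: succeed only if "$1" looks like an R installation
# directory, i.e. a directory that contains an executable bin/R.
is_r_install_dir() {
    dir=${1%/}                      # tolerate a trailing slash
    [ -d "$dir" ] && [ -x "$dir/bin/R" ]
}

# Usage: /opt/R/4.3.2 is the example path used in this section.
if is_r_install_dir "/opt/R/4.3.2"; then
    echo "looks like an R installation directory"
else
    echo "not an R installation directory (did you include /bin/R?)"
fi
```

Passing the binary path (for example /opt/R/4.3.2/bin/R) fails the directory test, which is the mistake this section warns about.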
Automatically Detecting R#
If Server.RVersion is not set, Package Manager attempts to automatically detect an R installation on the server. Automatic R detection is disabled if Server.RVersion is configured.
Python Prerequisites#
Python packages have several formats for describing the metadata of a package. This is often stored in a setup.py or pyproject.toml file and needs to be parsed to properly build the package. Because of this, Package Manager requires the following when using a git-python source with Python package repositories:
- An installation of Python on the system
- The build and virtualenv modules for building the source distribution.
Python Installation#
In most cases, Posit recommends you install Python from pre-compiled binaries.
To install from pre-compiled binaries, follow the instructions at Install Python.
Alternatively, you can install Python from source by following the instructions at Install Python from source.
Package Manager supports two ways of discovering Python installations: through explicit configuration and through automatic detection.
Explicit Python Configuration (recommended)#
You can specify the installation of Python with the Server.PythonVersion setting in the Package Manager configuration file, for example /usr/bin/python.
Replace /usr/bin/python with the path to your Python executable.
While multiple versions of Python may be installed on the server, only one version of Python may be specified for use by Package Manager.
If the Server.PythonVersion field is included, it must be valid and must appear only once in the configuration file. Check the server log after starting and stopping the Package Manager process for messages relevant to the Python configuration.
Automatically Detecting Python#
If Server.PythonVersion is not set, Package Manager attempts to automatically detect a Python installation on the server. Automatic detection is disabled if Server.PythonVersion is configured.
Required Modules#
The build and virtualenv modules must be installed so that a Git builder can fetch necessary dependencies and build the source distribution in an isolated environment. Both are available from PyPI and can be installed with pip.
If they are not installed in a location where python can call them with python -m build or python -m virtualenv, then Git builders for Python will be disabled.
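To confirm ahead of time that both modules are callable the way this section describes, a small check along the following lines may help. It is a sketch only: the function name is hypothetical, it assumes both modules accept a --version flag when run with python -m, and it does not reflect how Package Manager performs its own detection.

```shell
# Hypothetical helper: succeed only if the given Python executable can run
# both required modules via "python -m <module>".
python_can_build() {
    py=$1
    for mod in build virtualenv; do
        # --version is used here only as a cheap "is the module callable" probe.
        "$py" -m "$mod" --version >/dev/null 2>&1 || {
            echo "missing module: $mod" >&2
            return 1
        }
    done
}

# Usage (install first if the check fails; assumes pip is available):
#   python -m pip install build virtualenv
#   python_can_build python && echo "Git builders for Python can run"
```

Run the check with the same executable that Server.PythonVersion points to, since modules installed for a different interpreter will not be found.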
Using Git Builders#
Package Manager defines a git-builder as an entity that watches a Git endpoint, whether remote (e.g., git@github.com:user/example.git) or local (e.g., file:///path/to/local/git/repo), for changes and builds R package bundles and Python source distributions. In Package Manager, "Git" sources refers broadly to both the git and git-python source types; which one is used depends on whether the Git repository holds an R or a Python package.
An administrator follows these steps:
- Ensure the necessary prerequisites for R and/or Python have been configured, as described in Git builder prerequisites above.
- Create a git (R) or git-python (Python) source.
- Create a git-builder for the source, specifying whether to watch for commits to a Git branch or tags in a Git repository. The endpoint can be: HTTP, SSH (see below), or a local file path (see below). See the rspm create git-builder command for full details, e.g., how to track a specific branch.
- Based on the selection specified with the rspm create git-builder command, Package Manager clones the Git endpoint and runs a job to transform the Git clone into a package bundle for R packages, or a source distribution for Python packages. The package is made available to any repositories subscribing to the source.
- Package Manager polls the Git endpoint to watch for either new commits or new tags (based on the selection specified with the rspm create git-builder command). If an update is available, Package Manager automatically pulls the new changes and launches another job. The job creates a package bundle for R packages, or a source distribution for Python packages, from the latest Git clone and updates the package available in the git or git-python source. For R packages, previous versions are archived. All Python package versions remain available in the source.
- Users can now install the packages with the proper installation tool for the corresponding language. For R, packages are installed from the repository via install.packages, not devtools. Python packages are installed via pip.
For a specific example building an R package, reference the Git Builders for R Quickstart Guide. For a specific example building a Python package, reference the Git Builders for Python Quickstart Guide.
Building Package Vignettes for R#
R packages with git sources can optionally attempt to build package vignettes. By default, the Git builders for R packages use the --no-build-vignettes option to build packages and bypass the complexity associated with additional software and system dependencies. If you want to enable vignette building and are willing to manage the required dependencies yourself, set the Git.AttemptVignettes option to true. When enabled, the builders will attempt to build the vignettes for R packages.
If the Git.AttemptVignettes setting is enabled but the required dependencies are missing, or any other issue occurs during the vignette building process, the builders automatically fall back to the --no-build-vignettes option. This ensures that the build process continues even if vignettes cannot be built.
Pulling Python Dependencies from a PyPI Repository#
By default, git-python Git builders pull the dependencies needed to build a Python package from https://pypi.org. To pull from a different PyPI repository, set the Git.PyPIRepoURL config option.
When this option is set, Git builders pull the Python package dependencies from the defined URL instead of PyPI. This is also helpful if you would like to pull from a specific snapshot date.
Git Logging on the Server#
Server log messages related to this component can be shown by enabling debug logging. More information about activating debug logging is in the configuration appendix.
Accessing Restricted Git Endpoints Using Git Credentials#
If Git builders require authentication to read a repository, Package Manager can use SSH keys or HTTPS credentials to authenticate against the endpoint.
Importing an SSH key#
Begin by creating an SSH key and granting the SSH key access to the Git endpoint. Although Package Manager allows the use of SSH keys with no passphrase, it is still recommended to use a strong SSH key with a passphrase. The specific steps for granting access will depend on your Git provider.
Once you have the path to the SSH key, use the rspm import ssh-key command to name and securely store the SSH key for later use by Package Manager. If desired, you can now remove the SSH key file from your file system, as it is loaded into the Package Manager database as encrypted text. Multiple keys can be imported, but each must have a unique name. This name is the --name argument you pass to the rspm import ssh-key command, not the filename of the key.
Because the use of a password for the key requires writing the password to a file on disk, there is some risk of leaking the password while it is on disk. To mitigate this risk, encrypt the password using the rspm encrypt command prior to writing it to a file. Package Manager accepts passwords in the file as either plaintext or encrypted text, and decrypts the password as necessary to unlock your SSH key.
To use the newly imported SSH key with a new Git builder, specify the key name with the --credential flag in the rspm create git-builder command.
Importing an HTTPS credential#
HTTPS credentials consist of a username and password. Where two-factor authentication is used, you will often need a personal access token (PAT) that serves as the password. To import an HTTPS credential, use the rspm import https-credential command.
Note that using this command requires you to input the password/PAT on the command line, which may be stored in your command history. To avoid this possible credential leak, encrypt your password using the rspm encrypt command, and pass the encrypted string to the rspm import https-credential command instead.
To use the newly imported credential with a new Git builder, specify the credential name with the --credential flag in the rspm create git-builder command.
Git Credential Security#
Package Manager encrypts and stores imported credentials in the metadata database. Any person (by default, members of the rstudio-pm unix group) with access to the admin CLI can:
- Associate an imported credential with a Git builder using the rspm create git-builder command.
- List the names of available Git credentials using the rspm list git-credentials command.
Users cannot access the contents of the credential, nor is the credential available for arbitrary actions. We recommend granting SSH keys and HTTPS credentials imported to Package Manager limited read-only access to only the endpoints you wish to expose as R packages.
Changing credentials for a Git builder#
Credentials may be rotated either by creating a new credential and editing an existing git-builder:
$ rspm import ssh-key --name=[key name] --path=[/path/to/key]
$ rspm edit git-builder --name=[git-builder name] --source=[git source] --new-credential=[key name]
Or you can update a credential in place by running the rspm edit ssh-key or rspm edit https-credential commands. Note that this changes the credential for all the Git builders using that credential name.
If you would like to change the URL type of a Git builder from https to git or vice-versa, ensure that you pass a credential that matches the new URL type.
$ rspm edit git-builder --name=[git-builder name] --source=[git source] --new-url=git@github.com:somebody/something --new-credential=[name of ssh key]
Package Manager changes credentials only when explicitly instructed. Therefore, when you change a URL without changing the credential, Package Manager continues using the prior credential. To remove the association of the credential with the git-builder, use the --remove-credential flag of the rspm edit git-builder command, listed under Editing Git builders below.
Accessing Git Endpoints on a Local File System#
If a Git repository is available locally, you can point a Git builder at this location so that it updates when changes are made on a branch or tag (based on the selection specified with the rspm create git-builder command).
To do this, the Git.AllowFileURLs configuration option must first be set to true. Once the option is set, create a Git builder with the rspm create git-builder command, passing a file:// URL (e.g., file:///path/to/local/git/repo) as the --url value.
Note
Git must be installed on the server if you need support for file:// URLs.
Commits vs Tags#
A package based on a Git endpoint can be configured to watch one of two types of changes: "commits" or "tags." In short, "commits" watches for changes to a specified Git branch, whereas "tags" watches for new tags in the whole Git repository.
Commit mode is recommended for bleeding edge repositories, whereas tag mode is suitable for exposing stable releases of packages.
A Git source can support different packages with different modes. However, a given package can only have one mode in a source. If you would like to surface the same package in both commit and tag mode, you must create two Git sources.
Using commits and tags has slightly different behavior for R and Python, outlined below.
Commits and Tags Behavior for R Packages#
- Commits: Package Manager will update the package any time new commits are discovered in a branch. In this mode, Package Manager automatically modifies the package's version, assigning a unique version number to each build. The version number is created based on the commit time-stamp and is designed to avoid conflicts with the version scheme used by the package author. For example, if the DESCRIPTION file for a package indicates a version of 1.1-3, the automatic version number would be: 1.1-3.0.0.0.1537204599. If the author updates the package with a new commit, but keeps the version in the DESCRIPTION file the same, the new automatic version number would reflect the new commit time-stamp, e.g., 1.1-3.0.0.0.1537218677. This process ensures that users of the package always get the correct behavior from install.packages, with newer commits being associated with a semantically higher version number.
Tip
The above version behavior for "commits" triggers may be overridden by using the Git.ForceDescriptionVersion configuration option. This will force all packages built by commits in a branch to use the exact version in the DESCRIPTION file.
- Tags: Package Manager will update the package any time a new Git tag is discovered. In this mode, Package Manager retains the version specified in the package's DESCRIPTION file. This mode is designed to work when a Git tag is used to indicate a package release. Note: The name of the tag must match the version in the DESCRIPTION file. For example, if your package's DESCRIPTION file has Version: 5.4.2, your tag must be either 5.4.2 or v5.4.2. If two tags reference the same version, preference is given to the newer tag. If a newer tag references an older version than a prior tag, the new tag is built as an archived package. If a tag is removed from a Git endpoint, any packages already built for that tag remain.
Tip
If you wish to build packages where the DESCRIPTION file version does not match the tag, use the Git.AllowTagVersionMismatch configuration option. When Git.AllowTagVersionMismatch is true, the version from the DESCRIPTION file will be used.
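The two R rules above can be sketched in a few lines of shell. This only mirrors the examples given in this section: the function names are hypothetical, and treating the ".0.0.0." padding as fixed for every version shape is an assumption, not documented behavior.

```shell
# Commit mode: append ".0.0.0.<commit timestamp>" to the DESCRIPTION version.
# This reproduces the example in this section; the fixed padding is an assumption.
commit_build_version() {
    printf '%s.0.0.0.%s\n' "$1" "$2"
}

# Tag mode: a tag matches when it equals the version, optionally prefixed by "v".
tag_matches_version() {
    tag=$1 version=$2
    [ "${tag#v}" = "$version" ]
}

commit_build_version "1.1-3" 1537204599      # prints 1.1-3.0.0.0.1537204599
tag_matches_version "v5.4.2" "5.4.2" && echo "tag matches"
```

Because the timestamp is the last component, a later commit with the same DESCRIPTION version yields a larger final component, which is what lets install.packages treat it as newer.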
Commits and Tags Behavior for Python Packages#
- Commits: Package Manager will update the package any time new commits are discovered in a branch. In this mode, Package Manager uses the version produced by python -m build and overwrites the package any time a new commit is pushed. For example, if a package has version 1.4.4 and a new commit is pushed, the existing package for version 1.4.4 is replaced with a build of the latest commit.
- Tags: Package Manager will update the package any time a new Git tag is discovered. In this mode, Package Manager retains the version built from python -m build. Unlike Git builders for R packages, the tag is not required to match the version being built for Python packages. Because of this, the Git.AllowTagVersionMismatch option has no effect for Python Git builders. The highest version built will always be the latest available. If a new tag references an old version, it will be built and can be downloaded, but will not be served as the latest available version.
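The "highest version wins" rule for Python tags can be illustrated with a version sort. This is an approximation for illustration only: sort -V is not Python's version ordering (pre-release and local version labels may sort differently), and the function name is hypothetical.

```shell
# Pick the highest version from a list read on stdin, one version per line.
# sort -V ("version sort") approximates, but is not identical to, the
# ordering Python tools apply to package versions.
latest_version() {
    sort -V | tail -n 1
}

# A tag built later for 1.9.2 does not displace 1.10.0 as the latest.
printf '%s\n' 1.4.4 1.10.0 1.9.2 | latest_version    # prints 1.10.0
```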
Git Cloning Depth#
By default, Git builders shallow clone a repository to make cloning more efficient. If a package requires a deeper commit history, use the --clone-depth=[depth] flag when creating or editing the Git builder. Set the flag value to the required cloning depth. The default cloning depth is 1, which means that only the latest commit is cloned. If a full clone is necessary, setting the --clone-depth flag to 0 clones all of the commits on the branch.
Tip
If a Python package is not building the latest version, try setting --clone-depth=0. Some Python packages require the whole repository to be cloned to determine the version to build; setting --clone-depth to 0 resolves this by cloning the entire depth of the repository.
Git Submodules#
If your Git repository includes Git submodules that are required for building the R package, use the --recurse-submodules=[depth] flag when creating or editing the Git builder. Set the flag value to the required submodule recursion depth. The default recursion depth is 0, which means that no submodules will be included when cloning. If the Git builder includes credentials, the same credentials will be used for checking out the Git submodules.
Git Builder Naming and Multiple Builders for the same Git Repo#
For git sources, when you create a Git builder without providing a name, the Git builder is named after the R package by default. For git-python sources, the --name flag is required because the Python package name cannot be determined prior to building.
If you need to add multiple Git builders for the same Git repo to the same Git source, use the --name flag to provide a custom name for each additional Git builder. For example, you could create Git builders to follow two branches:
rspm create source --name=git --type=git
rspm create repo --name=git
rspm subscribe --repo=git --source=git
# Build "main" branch
rspm create git-builder --source=git --url https://github.com/rstudio/plumber --branch=main --wait
# Build "v044" branch
rspm create git-builder --source=git --url https://github.com/rstudio/plumber --branch=v044 --name=plumber-044 --wait
Note
When multiple Git builders are enabled for the same repository using the same git source for R packages, the latest version will always be marked as "latest", and all others will be considered "archived," regardless of when the Git builder was created or when the build was run.
Git Directories#
By default, packages are built from the Git root directory. If the package lives in a different location, specify it using the --sub-dir flag when adding a Git package.
Managing Packages from Git#
The git-builder (described above) watches the Git endpoint for changes, automatically handling package updates and archives. There might be cases where you wish to remove packages, or to stop package building altogether.
Packages can be removed at any time using the rspm remove command.
To stop automatic package building, but keep the existing packages, use the rspm delete git-builder command. To resume package building, simply create a new git-builder with the same metadata.
# To remove previously-built packages from git:
$ rspm remove --source=[name of source] --name=[name of package and scope]
# To stop automatic package building, but keep the packages:
$ rspm delete git-builder --name=[name of package] --source=[name of source]
To view information about the current Git endpoints that are being tracked, use the rspm list git-builders command, shown under Tracking Changes and Errors below.
Editing Git builders#
Git builders have a few fields that may be edited: credentials, URLs, sub directories, and branches (for "commits" triggers). The Git builders cannot be changed from SSH to HTTP URLs or vice versa.
$ rspm edit git-builder --name=[git-builder name] --source=[git source name] --new-url=[HTTP/SSH URL]
$ rspm edit git-builder --name=[git-builder name] --source=[git source name] --new-credential=[credential name]
$ rspm edit git-builder --name=[git-builder name] --source=[git source name] --new-branch=[branch name]
$ rspm edit git-builder --name=[git-builder name] --source=[git source name] --new-sub-dir=[sub directory path]
$ rspm edit git-builder --name=[git-builder name] --source=[git source name] --remove-credential
$ rspm edit git-builder --name=[git-builder name] --source=[git source name] --remove-sub-dir
Combining Packages from Git with Other Package Sources#
Local packages cannot be added manually to a git source, but a repository can surface packages from a git source alongside local packages and CRAN packages by subscribing to multiple sources. Take care when managing a repository's subscriptions, as order is important. Refer to the Multiple Sources section for more details.
Polling Frequency#
You can control how frequently Package Manager checks for updates using the Git.PollInterval configuration field. If multiple commits occur between checks, Package Manager will create a single version representing all of the changes. If multiple tags are created or removed between checks, Package Manager will build each tag individually, automatically archiving tags representing older versions of the package.
Repository Versioning is identical in all source types, including git sources.
Tracking Changes and Errors#
If a repository subscribes to a Git source, you can view the Git source's history in the Activity Log. For a git source, the Activity Log will identify each change to an R package including the new version, and a message will indicate the associated Git tag or commit as appropriate. For git-python sources, an entry will only show that a package was built, but not the specific package and version.
If an error is encountered attempting to clone, poll, or bundle a package, the Activity Log will record the attempt and include a message with the CLI command to be run to view a full error log.
You can also use the following Package Manager CLI commands to quickly check your active Git builders and view the logs:
$ rspm list git-builders
<< Git Builders:
<< - [git package name]
<< Source: [source name]
<< URL: [source url]
<< Trigger: [git package trigger]
<< Key: none
$ rspm list git-builds --source=[source name] --name=[git package name]
<< Git Builds:
<< - [git package name]
<< Transaction ID: [transaction ID]
<< SHA: [SHA]
<< Tag: [tag]
<< Status: [job status]
<< Time: [time of run]
<< Only showing latest build, for more builds use the --count and --page flags
<< For more information run: rspm logs --transaction-id=[transaction ID]
Package Manager automatically tries to build updates from a Git source three times. If the build fails more than three times, the update causing the failure is ignored. New updates are still discovered and built.
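The bounded-retry behavior described above follows a common pattern, sketched below. This is an illustration only, not Package Manager's implementation: the function name is hypothetical, and the sketch treats the limit as three attempts in total.

```shell
# Run a command up to max_attempts times; return non-zero if every attempt fails.
try_build() {
    max_attempts=$1; shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed" >&2
        attempt=$((attempt + 1))
    done
    return 1    # caller skips this update and waits for the next one
}

# Usage with a stand-in command; "false" always fails, so the update is skipped.
try_build 3 false || echo "update ignored after 3 failed attempts"
```

Giving up on a single bad update, rather than retrying indefinitely, is what allows later commits or tags to be discovered and built normally.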
To retry a failed update, or to force a Git builder to rebuild the latest package version, use the rspm rerun command:
$ rspm rerun git-builder \
--name=[package name] \
--source=[source name] \
--tag=[tag to rebuild, only required if the build trigger is tags]
You can also rerun commit-based Git builders to rebuild the package based on a specific Git SHA:
$ rspm rerun git-builder \
--name=[package name] \
--source=[source name] \
--commit-sha=[Git SHA for commit to rebuild]
To aid in debugging, it can help to view output from the Git commands that are run as well as output from the SSH connection when applicable. To enable debugging, refer to the Logging.SystemLogLevel configuration property in the configuration appendix. To enable the debug log temporarily without restarting the server, use the rspm config log command.
Process Management#
Refer to the Process Management section for information on how Package Manager securely runs R and Python processes when building packages for Git sources.