Python Packaging and Distribution
Warning: More information is needed to complete this guideline.
Examples of Python packaging and distribution options and how to use them.
Purpose
Warning: Need to add an explanation of how this guideline supports DS workflows, meets internal and external policies, and aids in collaboration and our overall success.
Python Packaging Background
Python packaging has modernized a lot since the release of Python 3 in 2008. It is still changing, and there is no universally agreed-upon standard for packaging, so we expect these guidelines to change as the ecosystem continues to evolve. In short, at the time of writing, the Python community is attempting to centralize project configuration into a single pyproject.toml file that is supported by many build tools. Unfortunately, this process is not complete. Setuptools, which is probably still the most common build tool used by Python developers (alternatives include Flit and Poetry), still uses the legacy setup.py file to enable editable installs. All other setuptools configuration can be expressed in setup.cfg, setuptools' own declarative configuration file, which pyproject.toml has since superseded for other build tools. Hopefully all these tools will someday centralize on pyproject.toml, but for now, to support setuptools, we need all three configuration files. See the example project structure below.
Resources
Some resources describing how Python packaging arrived at its current state:
Nomenclature
Working definitions
Package: A directory containing Python modules and an __init__.py file.
Subpackage: A package directory containing an __init__.py file, which is itself contained inside an enclosing package.
Module: A Python file that can be imported, possibly as part of a package or subpackage.
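To make these working definitions concrete, the sketch below builds a throwaway layout in a temporary directory and imports each piece. The names mypkg, sub, helpers, and tools are hypothetical, and the example assumes regular packages with __init__.py files, as in the definitions above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout:
#   mypkg/__init__.py       -> package
#   mypkg/helpers.py        -> module inside the package
#   mypkg/sub/__init__.py   -> subpackage (a package inside a package)
#   mypkg/sub/tools.py      -> module inside the subpackage
root = Path(tempfile.mkdtemp())
(root / "mypkg" / "sub").mkdir(parents=True)
(root / "mypkg" / "__init__.py").write_text("")
(root / "mypkg" / "helpers.py").write_text("GREETING = 'hello from a module'\n")
(root / "mypkg" / "sub" / "__init__.py").write_text("")
(root / "mypkg" / "sub" / "tools.py").write_text("def double(x):\n    return 2 * x\n")

# Make the temporary directory importable, as site-packages is for installed packages.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

pkg = importlib.import_module("mypkg")
helpers = importlib.import_module("mypkg.helpers")
tools = importlib.import_module("mypkg.sub.tools")

# Packages (and subpackages) have a __path__ attribute; plain modules do not.
print(hasattr(pkg, "__path__"), hasattr(helpers, "__path__"))  # True False
print(helpers.GREETING)  # hello from a module
print(tools.double(21))  # 42
```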
Package Structure
Packages can be structured in various ways, but some general practices have emerged as the most readable.
From the repo root directory:
.
├── CHANGES.md # Changes log, e.g. between versions
├── LICENSE.txt # License file
├── README.md # Readme file for display on upstream git server (e.g. Bitbucket) and in building documentation
├── build # Build artifacts (ignored by git)
│ ├── bdist.macosx-10.15-x86_64
│ └── lib
├── data # Directory for data required by package
│ └── naif0012.tls
├── dist # Build artifacts, pushed to PyPI by twine for distribution (ignored by git)
│ ├── lasp_datetime-0.1.dev5+gbea8efc.d20210430-py3-none-any.whl
│ └── lasp_datetime-0.1.dev5+gbea8efc.d20210430.tar.gz
├── lasp_datetime # Package root
│ ├── __init__.py
│ ├── constants.py # Example of a module
│ ├── conversions
│ ├── core.py
│ ├── leapsecond.py
│ ├── utils.py
│ └── version.py
├── lasp_datetime.egg-info # Build artifacts (ignored by git)
│ ├── PKG-INFO
│ ├── SOURCES.txt
│ ├── dependency_links.txt
│ ├── requires.txt
│ └── top_level.txt
├── pyproject.toml # Unified configuration file, used by setuptools, Poetry, Flit, and many others; allows flexibility in build tools
├── setup.cfg # Setuptools-specific configuration file (will eventually be replaced by pyproject.toml)
├── setup.py # Legacy setuptools script, for supporting editable installs only
├── tests # Tests package root directory (may be excluded from distributions via setup.cfg)
│ ├── __init__.py
│ ├── test_constants.py # Example test module. Should be named `test_xyz.py` when testing `xyz.py` module
│ ├── test_conversions
│ ├── test_core.py
│ ├── test_leapsecond.py
│ ├── test_utils.py
│ └── test_version.py
└── venv # Project virtual environment (ignored by git). May be located elsewhere but is most easily managed in the repo directory.
    ├── bin
    ├── include
    ├── lib
    └── pyvenv.cfg
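The tree follows the convention that each module xyz.py has a matching tests/test_xyz.py. A small script such as the sketch below can flag modules that lack a test module. The function name and the top-level-only matching are assumptions for illustration; subpackages such as conversions/ would need their own handling.

```python
import tempfile
from pathlib import Path
from typing import List


def modules_missing_tests(package_dir: Path, tests_dir: Path) -> List[str]:
    """Return top-level module names in package_dir that have no tests/test_<name>.py."""
    modules = {p.stem for p in package_dir.glob("*.py") if p.name != "__init__.py"}
    # "test_core.py" -> "core"
    tested = {p.stem[len("test_"):] for p in tests_dir.glob("test_*.py")}
    return sorted(modules - tested)


# Throwaway layout mirroring part of the tree above.
root = Path(tempfile.mkdtemp())
(root / "lasp_datetime").mkdir()
(root / "tests").mkdir()
for name in ("__init__.py", "constants.py", "core.py", "utils.py"):
    (root / "lasp_datetime" / name).touch()
for name in ("__init__.py", "test_constants.py", "test_core.py"):
    (root / "tests" / name).touch()

print(modules_missing_tests(root / "lasp_datetime", root / "tests"))  # ['utils']
```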
Configuration
Configuration depends partly on which build tool you wish to use. We cover configuration for a project built with setuptools, which has long been the best-supported Python build tool (though others are becoming popular).
setup.py
You have almost certainly seen this file before. It is the legacy configuration file for setuptools: it traditionally contained all the metadata for a Python project and was executed during installation with a command like python setup.py install. These days, it is only necessary to support editable installs (pip install -e .) and can be reduced to the following stub, with all remaining configuration placed in the declarative files setup.cfg and pyproject.toml.
#! /usr/bin/env python
"""Bare bones setup script. The sole purpose of this script is to support editable pip installs for development"""
import setuptools
if __name__ == "__main__":
    setuptools.setup()
setup.cfg
This is the declarative successor to setup.py: all the metadata that once lived in setup.py can now be placed here. For a library, it also takes over the role of requirements.txt for declaring dependencies (see the install_requires keyword). Someday it is likely to be superseded by pyproject.toml. Documentation on the format is available at https://setuptools.readthedocs.io/en/latest/userguide/declarative_config.html
An example of the file's contents is below:
[metadata]
name = lasp_datetime
author = Gavin Medley, Brandon Stone
author_email = Gavin.Medley@lasp.colorado.edu, Brandon.Stone@lasp.colorado.edu
license = Copyright 2018 Regents of the University of Colorado. All rights reserved.
license_file = LICENSE.txt
url = https://bitbucket.lasp.colorado.edu/projects/SDS/repos/py_datetime/browse
description = Python implementation of LASP's heritage idl_datetime library
long_description = file: README.md
long_description_content_type = text/markdown
keywords = astronomy, astrophysics, cosmology, space, science, units, time
classifiers =
    Intended Audience :: Science/Research
    Natural Language :: English
    Topic :: Scientific/Engineering
    Topic :: Scientific/Engineering :: Astronomy
    Programming Language :: Python :: 3
    Operating System :: MacOS :: MacOS X
    Operating System :: POSIX :: Linux
platforms =
    Operating System :: MacOS :: MacOS X
    Operating System :: POSIX :: Linux

[options]
# We set packages to find: to automatically find all sub-packages
packages = find:
install_requires =
    numpy
python_requires = >=3.8, <4

[options.packages.find]
exclude =
    tests
    tests.*

[options.extras_require]
dev =
    build
    coverage
    pylint
    pytest
    twine
test =
    coverage
    pylint
    pytest
build =
    build
    twine
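setup.cfg uses INI syntax, and multi-line values (classifiers, install_requires, extras) only work when their continuation lines are indented. setuptools reads the file with its own loader, but the standard library's configparser can read the same syntax, which makes it handy for a quick check. The sketch below parses a trimmed, hypothetical subset of the example above and turns multi-line values into lists.

```python
import configparser
from typing import List

# Trimmed, hypothetical subset of the setup.cfg example above.
SETUP_CFG = """\
[metadata]
name = lasp_datetime

[options]
packages = find:
install_requires =
    numpy
python_requires = >=3.8, <4

[options.extras_require]
test =
    coverage
    pylint
    pytest
"""


def as_list(value: str) -> List[str]:
    """Turn an indented multi-line value into a list, dropping empty lines."""
    return [line.strip() for line in value.splitlines() if line.strip()]


config = configparser.ConfigParser()
config.read_string(SETUP_CFG)

print(as_list(config["options"]["install_requires"]))     # ['numpy']
print(as_list(config["options.extras_require"]["test"]))  # ['coverage', 'pylint', 'pytest']
print(config["options"]["python_requires"])               # >=3.8, <4
```

If the indentation under install_requires is removed, configparser no longer sees numpy as part of that value and raises a parsing error instead.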
pyproject.toml
In the current state of Python packaging, pyproject.toml is primarily used to specify which build backend to use when installing and preparing packages for distribution (e.g. setuptools, Poetry, Flit, or others). pip reads this file and acts according to the metadata specified there. This enables functionality that setuptools alone has never handled well, such as declaring packages that are required for building (but not using) the package being developed. For example, setuptools_scm is a library that determines the package version by introspecting the local git repo. It is needed only to build the package, not to use it, so we specify it here rather than in setup.cfg.
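[build-system]
# Minimum requirements for the build system to execute.
requires = ["setuptools>=45", "wheel", "setuptools_scm[toml]>=6.0"]
build-backend = "setuptools.build_meta"

# With setuptools_scm 6.x, the presence of this section is what enables version inference.
[tool.setuptools_scm]
# Optional: also write the version into a module (path assumed from the project tree above).
write_to = "lasp_datetime/version.py"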
The pyproject.toml file is also used by many other Python tools as a source of configuration. See the Awesome pyproject.toml list for projects currently using this file for configuration: https://github.com/carlosperate/awesome-pyproject
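A build frontend such as pip or build starts by reading the [build-system] table: it installs the listed requires into an isolated environment and then calls the named build backend. The sketch below performs only that first reading step with the standard library's tomllib (Python 3.11+; on older versions the third-party tomli package provides the same API). It is an illustration, not how pip itself is implemented.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # older Pythons: pip install tomli
    import tomli as tomllib

PYPROJECT = """\
[build-system]
requires = ["setuptools>=45", "wheel", "setuptools_scm[toml]>=6.0"]
build-backend = "setuptools.build_meta"
"""


def read_build_system(text):
    """Return (build requirements, build backend) declared in pyproject.toml text."""
    build_system = tomllib.loads(text).get("build-system", {})
    return build_system.get("requires", []), build_system.get("build-backend")


requires, backend = read_build_system(PYPROJECT)
print(requires)  # ['setuptools>=45', 'wheel', 'setuptools_scm[toml]>=6.0']
print(backend)   # setuptools.build_meta
```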
Build Tools
Part of the current revolution in Python packaging is the goal of making Python projects build-tool-agnostic. That is, the community is trying to agree on one or a few metadata configuration files that many build tools can read, so that developers can build their projects with whatever tool they prefer.
setuptools
This could be considered the legacy build tool for Python projects, but it is still the most widely used and the one most people are familiar with. It is so ubiquitous that, until Python 3.12, it was one of only two packages installed by default in virtual environments created with venv (the other being pip itself). Setuptools uses setup.py or setup.cfg (or both).
Documentation: https://setuptools.readthedocs.io/en/latest/
Poetry
Poetry is a popular, newer Python build and dependency-management tool. It uses pyproject.toml for configuration. IMAP SDC and some SWxTREC projects use Poetry. For an example, see the IMAP SDC infrastructure repository, which includes an example pyproject.toml and pre-commit tools that update poetry.lock and generate a requirements.txt file for use in AWS Lambda functions. There is also an overview document on using Poetry.
Documentation: https://python-poetry.org/docs/
Flit
Flit is a lightweight build tool that, like Poetry, is configured through pyproject.toml.
Documentation: https://flit.readthedocs.io/en/latest/
Distribution
Generating Distribution Archives Using a Build Tool
Tutorial on generating distribution archives: https://packaging.python.org/tutorials/packaging-projects/#generating-distribution-archives
How distribution archives are generated depends on the build tool you choose. With the PyPA build tool, it looks like this:
# First, ensure the build module is installed from PyPI with
# pip install build
# Then
python -m build
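By default, python -m build writes two archives to dist/: a source distribution (.tar.gz) and a wheel (.whl), as in the dist/ directory of the project tree above. A wheel's filename encodes its name, version, and compatibility tags; py3-none-any marks a pure-Python wheel usable on any platform. The sketch below splits such a filename into its fields. The WheelName type and function are illustrative only; in real code, the third-party packaging library's packaging.utils.parse_wheel_filename is the more complete option.

```python
from pathlib import PurePath
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(path: str) -> WheelName:
    """Split {distribution}-{version}(-{build tag})?-{python}-{abi}-{platform}.whl.

    The optional build tag is dropped in this sketch.
    """
    name = PurePath(path).name
    if not name.endswith(".whl"):
        raise ValueError(f"not a wheel file: {path}")
    parts = name[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {path}")
    distribution, version, *_build_tag, python_tag, abi_tag, platform_tag = parts
    return WheelName(distribution, version, python_tag, abi_tag, platform_tag)


wheel = parse_wheel_filename("dist/lasp_datetime-0.1.dev5+gbea8efc.d20210430-py3-none-any.whl")
print(wheel.version)                                        # 0.1.dev5+gbea8efc.d20210430
print(wheel.python_tag, wheel.abi_tag, wheel.platform_tag)  # py3 none any
```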
Uploading Artifacts to LASP Package Index
The LASP PyPI is hosted on our Nexus artifact repository at https://artifacts.pdmz.lasp.colorado.edu/#browse/browse:lasp-pypi
Documentation on uploading python build artifacts to Nexus can be found here: https://confluence.lasp.colorado.edu/x/WQ96Aw
Versioning
Versioning can be managed in many ways, as long as version identifiers comply with PEP 440 (https://www.python.org/dev/peps/pep-0440/). We suggest using a library such as setuptools_scm, which introspects the local git repo and uses the latest tag to create a version identifier. During the build process, that version is injected into the package metadata and can optionally also be written to a version.py file so it remains accessible to the library internally.
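A version such as 0.1.dev5+gbea8efc.d20210430 (from the dist/ directory in the project tree above) is typical setuptools_scm output between tags: a development release of 0.1, plus a local version label after the + that by default encodes the commit hash and, for a dirty working tree, the date. The sketch below checks whether a string is in PEP 440's canonical form. The public-version regex is the canonical-form check from PEP 440's appendix; the local-label pattern is our own simplification. Valid but non-normalized spellings such as v1.2 or 1.0-beta are rejected here.

```python
import re

# Canonical public version regex from PEP 440 (Appendix B).
CANONICAL_PUBLIC = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$"
)
# Assumed pattern for a normalized local label: lowercase alphanumeric segments joined by dots.
LOCAL_LABEL = re.compile(r"^[a-z0-9]+(\.[a-z0-9]+)*$")


def is_canonical_version(version: str) -> bool:
    """Return True if `version` is a PEP 440 version in canonical (normalized) form."""
    public, plus, local = version.partition("+")
    if not CANONICAL_PUBLIC.match(public):
        return False
    # No "+" means no local label; otherwise the label must be non-empty and normalized.
    return not plus or bool(LOCAL_LABEL.match(local))


for v in ("0.1.dev5+gbea8efc.d20210430", "1.2.0", "1.0rc1", "v1.2", "1.0-beta"):
    print(v, is_canonical_version(v))
```

For full parsing and version comparison, the third-party packaging library's packaging.version.Version is the usual tool; it normalizes spellings like 1.0-beta and raises packaging.version.InvalidVersion for strings that are not valid PEP 440.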
Options
The options for Python packaging and distribution that we often see used at LASP are:
Packaging for PyPI (pip install)
PyPI resources
Built-In (build + twine)
Warning: Need to add introductory paragraph that summarizes Built-In
How to use Built-In
Python Packaging User Guide: https://packaging.python.org/en/latest/
The following tutorial is fairly complete and also includes instructions for using various other build tools: https://packaging.python.org/en/latest/tutorials/packaging-projects/
Built-In resources
Setuptools Example – Library Package
setup.py
"""
Setup file for the science data processing pipeline.
The only required fields for setup are name, version, and packages. Other fields to consider (from looking at other
projects): keywords, include_package_data, requires, tests_require, package_data
"""
from setuptools import setup, find_packages
# Reads the requirements file
with open('requirements.txt') as f:
requirements = f.read().splitlines()
setup(
name='my_py_library',
version='0.1.0',
author='Jane Doe, John Doe, This is just a str',
author_email='jane.doe@lasp.colorado.edu',
description='Science data processing pipeline for the instrument',
long_description=open('README.md', 'r').read(), # Reads the readme file
python_requires='>=3.8, <4',
url='https://some-git.url',
classifiers=[
"Natural Language :: English",
"Topic :: Scientific/Engineering",
"Topic :: Scientific/Engineering :: Astronomy",
"Programming Language :: Python :: 3.8",
"Operating System :: MacOS :: MacOS X",
"Operating System :: POSIX :: Linux",
],
packages=find_packages(exclude=('tests', 'tests.*')),
package_data={
"my_py_library": [
"some_necessary_config_data.json",
"calibration_data/*"
]
},
py_modules=['root_level_module_name',],
install_requires=requirements,
entry_points={
'console_scripts': [
'run-processing=my_py_library.cli:main', # package.module:function
]
}
)
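The console_scripts entry point above tells pip to generate a run-processing command that imports my_py_library.cli and calls main() with no arguments, passing its return value to sys.exit. The sketch below shows what such a hypothetical cli.py module might look like; the argument names are assumptions. Accepting an optional argv keeps main() easy to call from tests.

```python
# my_py_library/cli.py -- hypothetical module targeted by the console_scripts entry point
import argparse
from typing import List, Optional


def main(argv: Optional[List[str]] = None) -> int:
    """Entry point for the `run-processing` command.

    The installed console script calls main() with no arguments, so argv is None
    and argparse reads sys.argv[1:]; the return value becomes the exit status.
    """
    parser = argparse.ArgumentParser(
        prog="run-processing",
        description="Run the (hypothetical) science data processing pipeline.",
    )
    parser.add_argument("input_file", help="path to an input data file")
    parser.add_argument("-v", "--verbose", action="store_true", help="print progress messages")
    args = parser.parse_args(argv)

    if args.verbose:
        print(f"Processing {args.input_file}")
    # ... call into the library's processing code here ...
    return 0
```

The same module:function target works for Poetry's [tool.poetry.scripts] table, such as the sdp script in the Poetry example.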
Publish to PyPI - Poetry
How to Publish to PyPI from Poetry
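poetry lock  # Resolve dependencies and write poetry.lock
poetry install  # Install the project and its dependencies
poetry version  # Show the current version (e.g. `poetry version patch` bumps it)
poetry build  # Build the sdist and wheel into dist/
export POETRY_PYPI_TOKEN_PYPI=<token-copied-from-pypi-account>
poetry publish # You will be prompted for your PyPI credentials if you don't provide the environment variable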
Poetry Project Configuration Example – Library Package
pyproject.toml
# pyproject.toml
# See: https://python-poetry.org/docs/pyproject/
[tool.poetry]
name = "my_python_package"
version = "0.1.0"
description = "Science data processing library and applications for some instrument."
authors = [  # Alphabetical
    "Jane Doe <jane.doe@lasp.colorado.edu>",
    "John Doe <john.doe@lasp.colorado.edu>"
]
# Configure private PyPI repo to download packages
[[tool.poetry.source]]
name = "lasp-pypi" # This name will be used in the configuration to retrieve the proper credentials
url = "https://artifacts.pdmz.lasp.colorado.edu/repository/lasp-pypi/simple" # URL used to download your private packages
# Dependency specification for core package
[tool.poetry.dependencies]
python = "^3.9"
astropy = "^4.2.1"
h5py = "^3.3.0"
numpy = "^1.21.0"
spiceypy = "^4.0.1"
lasp-packets = "1.2"
requests = "^2.26.0"
SQLAlchemy = "^1.4.27"
psycopg2 = "^2.9.2"
cloudpathlib = {extras = ["s3"], version = "^0.6.2"}
# Development dependencies
[tool.poetry.dev-dependencies]
pytest-cov = "^2.12.1"
pylint = "^2.9.3"
responses = "^0.14.0"
pytest-randomly = "^3.10.2"
moto = {extras = ["s3"], version = "^2.2.16"}
# Script entrypoints to put in installed bin directory
[tool.poetry.scripts]
sdp = 'my_python_package.cli:main'
# Poetry boilerplate
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
Packaging for Conda (conda install)
Warning: Need a volunteer to expand on Conda
How to install and use Conda
https://conda.io/projects/conda-build/en/latest/user-guide/tutorials/build-pkgs.html
Conda Develop: There is a conda subcommand called conda develop, but it is not actively maintained. The conda maintainers recommend using pip install -e . to install a package in editable (development) mode. See: https://github.com/conda/conda-build/issues/1992
Useful Links
Here are some helpful resources:
Credit: Content taken from a Confluence guide written by Gavin Medley