# Python Packaging and Distribution
> > **Warning:** More information is needed to complete this guideline.
Examples of Python packaging and distribution options and how to use them.
## Purpose
> **Warning** Need to add an explanation of how this guideline supports DS workflows, meets internal and external
> policies, and aids in collaboration and our overall success
## Python Packaging Background
Python packaging has modernized a lot since the release of Python 3 in 2008. It is still changing and there is no
universally agreed upon standard for packaging. We expect these guidelines to change as the ecosystem continues to
evolve.
In short, at the time of writing, the python community is attempting to centralize project configuration into a single
pyproject.toml file that is supported by many build tools. Unfortunately this process is not complete. Setuptools,
which is probably still the most common build tool used by python developers (alternatives include Flit and Poetry),
still uses the legacy setup.py file to enable editable installs. All other setuptools configuration can be represented
in setup.cfg, which was the setuptools solution that has been superseded by pyproject.toml for other build tools.
Hopefully someday all these tools will centralize on pyproject.toml but for now, to support setuptools, we need all
three configuration files. See the example project structure below for an example.
## Resources
Some resources that describe the path by which we arrived where we are:
- [Python Packaging Authority (PyPA) tutorial on packaging using setuptools][1]
- [What the Heck is pyproject.toml?][2]
- [Github discussion on pyproject.toml support][3]
- [PEP 508 -- Dependency specification for Python Software Packages][4]
- [PEP 518 -- Specifying Minimum Build System Requirements for Python Projects][5]
- [PEP 621 -- Storing project metadata in pyproject.toml][6]
- [Dependency specification in pyproject.toml based on PEP 508][7]
- [Setuptools setup.cfg documentation][8]
- [Poetry documentation (an optional build tool)][9]
- [Flit documentation (an optional build tool)][10]
[1]: https://packaging.python.org/tutorials/packaging-projects/
[2]: https://snarky.ca/what-the-heck-is-pyproject-toml/
[3]: https://github.com/pypa/setuptools/issues/1688/
[4]: https://www.python.org/dev/peps/pep-0508/
[5]: https://www.python.org/dev/peps/pep-0518/
[6]: https://www.python.org/dev/peps/pep-0621/
[7]: https://www.python.org/dev/peps/pep-0631/
[8]: https://setuptools.readthedocs.io/en/latest/userguide/declarative_config.html
[9]: https://python-poetry.org/
[10]: https://flit.readthedocs.io/en/latest/
## Nomenclature
### Working definitions
Package: A directory containing python modules and an __init__.py file.
Subpackage: A package directory containing an __init__.py file, which is itself contained inside an enclosing package.
Module: A python file that can be imported, possibly as part of a package or subpackage.
### Package Structure
Packages can be structured in various ways but some general practices have emerged as the most readable.
From the repo root directory:
```text
.
├── CHANGES.md # Changes log, e.g. between versions
├── LICENSE.txt # License file
├── README.md # Readme file for display on upstream git server (e.g. Bitbucket) and in building documentation
├── build # Build artifacts (ignored by git)
│ ├── bdist.macosx-10.15-x86_64
│ └── lib
├── data # Directory for data required by package
│ └── naif0012.tls
├── dist # Build artifacts, pushed to PyPI by twine for distribution (ignored by git)
│ ├── lasp_datetime-0.1.dev5+gbea8efc.d20210430-py3-none-any.whl
│ └── lasp_datetime-0.1.dev5+gbea8efc.d20210430.tar.gz
├── lasp_datetime # Package root
│ ├── __init__.py
│ ├── constants.py # Example of a module
│ ├── conversions
│ ├── core.py
│ ├── leapsecond.py
│ ├── utils.py
│ └── version.py
├── lasp_datetime.egg-info # Build artifacts (ignored by git)
│ ├── PKG-INFO
│ ├── SOURCES.txt
│ ├── dependency_links.txt
│ ├── requires.txt
│ └── top_level.txt
├── pyproject.toml # Unified configuration file, used by setuptools, poetry, flit, and many others. Allows
# flexibility in build tools.
├── setup.cfg # Setuptools-specific configuration file (will eventually be replaced by pyproject.toml)
├── setup.py # Legacy setuptools script, for supporting editable installs only
├── tests # Tests package root directory (may be excluded from distributions via setup.cfg)
│ ├── __init__.py
│ ├── test_constants.py # Example test module. Should be named `test_xyz.py` when testing `xyz.py` module
│ ├── test_conversions
│ ├── test_core.py
│ ├── test_leapsecond.py
│ ├── test_utils.py
│ └── test_version.py
└── venv # Project virtual environment (ignored by git). May be located elsewhere but
# most easily managed in the repo directory.
├── bin
├── include
├── lib
└── pyvenv.cfg
```
## Configuration
Configuration depends partly on which build tool you wish to use. We will cover configuration for a project that is
built with setuptools, which has long been the best supported python build tool (though others are starting to become
popular).
### setup.py
You have almost certainly seen this before. This is the legacy configuration file for setuptools. It traditionally
contained all the metadata for a python project and was executed during installation with something like python setup.py
install. These days, this file is only necessary to support editable installs (pip install -e .) and can be reduced to
the following stub, with all remaining configuration placed in declarative files, setup.cfg and pyproject.toml.
```python
#! /usr/bin/env python
"""Bare bones setup script. The sole purpose of this script is to support editable pip installs for development"""
import setuptools
if __name__ == "__main__":
setuptools.setup()
```
### setup.cfg
This is the declarative successor to setup.py. All the same metadata that once existed in setup.py can now be placed
here. This file also supersedes requirements.txt (see the install_requires keyword). Someday this is likely to be
superseded by pyproject.toml. Documentation on format exists here:
[https://setuptools.readthedocs.io/en/latest/userguide/declarative_config.html]
(https://setuptools.readthedocs.io/en/latest/userguide/declarative_config.html)
**An example file contents is below:**
```python
[metadata]
name = lasp_datetime
author = Gavin Medley, Brandon Stone
author_email = Gavin.Medley@lasp.colorado.edu, Brandon.Stone@lasp.colorado.edu
license = Copyright 2018 Regents of the University of Colorado. All rights reserved.
license_file = LICENSE.txt
url = https://bitbucket.lasp.colorado.edu/projects/SDS/repos/py_datetime/browse
description = Python implementation of LASP's heritage idl_datetime library
long_description = file: README.md
long_description_content_type = text/markdown
keywords = astronomy, astrophysics, cosmology, space, science, units, time
classifiers =
Intended Audience :: Science/Research
Natural Language :: English
Topic :: Scientific/Engineering
Topic :: Scientific/Engineering :: Astronomy
Programming Language :: Python :: 3
Operating System :: MacOS :: MacOS X
Operating System :: POSIX :: Linux
platforms =
Operating System :: MacOS :: MacOS X
Operating System :: POSIX :: Linux
[options]
# We set packages to find: to automatically find all sub-packages
packages = find:
install_requires =
numpy
python_requires = >=3.8, <4
[options.packages.find]
exclude =
tests
tests.*
[options.extras_require]
dev =
build
coverage
pylint
pytest
twine
test =
coverage
pylint
pytest
build =
build
twine
```
### pyproject.toml
In the current state of python packaging, pyproject.toml is primarily for specifying which build backend to use when
installing and preparing packages for distribution (e.g. setuptools vs poetry vs flit vs others). pip reads this file
and acts according to the metadata specified here. This allows additional functionality that has never been provided
directly by setuptools, such as the ability to specify packages that are required for building (but not using) the
package being developed. For example, setuptools_scm is a library for detecting package versioning by introspecting
the local git repo, but it is not necessary for using the package, only for building it so we specify it here rather
than in setup.cfg.
```python
[build-system]
# Minimum requirements for the build system to execute.
requires = ["setuptools>=45", "wheel", "setuptools_scm[toml]>=6.0"]
build-backend = "setuptools.build_meta"
```
The pyproject.toml file is also used by many other python libraries as a source of configuration information. See this
Awesome pyproject.toml page for a list of projects currently using this file for configuration:
[https://github.com/carlosperate/awesome-pyproject](https://github.com/carlosperate/awesome-pyproject)
## Build Tools
Part of the current revolution in python packaging is a goal of making python build-tool-agnostic. That is, the
community is trying to agree on one or just a few metadata configuration files that can be read by many build tools so
that developers can build their projects with whatever tool they prefer.
### setuptools
This could be considered the legacy build tool for python projects but it is still the most widely used and what most
people are familiar with. It is so ubiquitous that it is one of only two packages that are installed by default in pip
virtual environments (with the other being pip itself). Setuptools uses setup.py or setup.cfg (or both).
Documentation: [https://setuptools.readthedocs.io/en/latest/](https://setuptools.readthedocs.io/en/latest/)
### Poetry
Poetry might be the trendiest python build tool out there. It uses pyproject.toml for configuration. IMAP SDC and some
SWxTREC projects use Poetry. For an example, you can look to the IMAP SDC infrastructure repository, which has an
example of a pyproject.toml, and pre-commit tools to update poetry.lock and generate a requirements.txt file for use
in AWS Lambdas. There is also an overview document on using Poetry.
Documentation: [https://python-poetry.org/docs/](https://python-poetry.org/docs/)
### Flit
Flit appears to be a lightweight tool that leverages pyproject.toml similar to Poetry.
Documentation: [https://flit.readthedocs.io/en/latest/](https://flit.readthedocs.io/en/latest/)
## Distribution
### Generating Distribution Archives Using a Build Tool
Tutorial on generating distribution
archives: [https://packaging.python.org/tutorials/packaging-projects/#generating-distribution-archives]
(https://packaging.python.org/tutorials/packaging-projects/#generating-distribution-archives)
Depending on the build tool you choose, generating distribution archives will be managed differently. For the PyPA
build tool, it may look like:
```python
# First, ensure the build module is installed from PyPI with
# pip install build
# Then
python -m build
```
### Uploading Artifacts to LASP Package Index
The LASP PyPI is hosted on our Nexus artifact repository,
at [https://artifacts.pdmz.lasp.colorado.edu/#browse/browse:lasp-pypi]
(https://artifacts.pdmz.lasp.colorado.edu/#browse/browse:lasp-pypi)
Documentation on uploading python build artifacts to Nexus can be found
here: [https://confluence.lasp.colorado.edu/x/WQ96Aw](https://confluence.lasp.colorado.edu/x/WQ96Aw)
### Versioning
Versioning can be managed in many ways as long as it is kept PEP
440 ([https://www.python.org/dev/peps/pep-0440/](https://www.python.org/dev/peps/pep-0440/)). The
suggested way is to use a library such as setuptools_scm, which introspects the local git repo and finds the latest tag
from which to create a version identifier. During the build process, that version is injected into the metadata for the
package and optionally also written to a version.py file so it remains accessible to the library internally.
## Options
The options for Python packaging and distribution that we often see used at LASP are:
- [PyPI](#packaging-for-pypi--pip-install-)
- [Conda](#packaging-for-conda--conda-install-)
## Packaging for PyPI (`pip install`)
### PyPI resources
- [PyPI Help Page](https://pypi.org/help/)
- [Setting up a PyPI account](https://pypi.org/account/register/)
- [Getting a PyPI access token](https://pypi.org/help/#apitoken)
### Built-In (`build` + `twine`)
> **Warning**: Need to add introductory paragraph that summarizes Built-In
#### How to use Built-In
Python Packaging User Guide: https://packaging.python.org/en/latest/
The link below is a fairly complete tutorial. There are also instructions there for using various other build tools:
https://packaging.python.org/en/latest/tutorials/packaging-projects/
#### Built-In resources
- [Python Packaging User Guide](https://packaging.python.org/en/latest/)
#### Setuptools Example – Library Package
setup.py
```python
"""
Setup file for the science data processing pipeline.
The only required fields for setup are name, version, and packages. Other fields to consider (from looking at other
projects): keywords, include_package_data, requires, tests_require, package_data
"""
from setuptools import setup, find_packages
# Reads the requirements file
with open('requirements.txt') as f:
requirements = f.read().splitlines()
setup(
name='my_py_library',
version='0.1.0',
author='Jane Doe, John Doe, This is just a str',
author_email='jane.doe@lasp.colorado.edu',
description='Science data processing pipeline for the instrument',
long_description=open('README.md', 'r').read(), # Reads the readme file
python_requires='>=3.8, <4',
url='https://some-git.url',
classifiers=[
"Natural Language :: English",
"Topic :: Scientific/Engineering",
"Topic :: Scientific/Engineering :: Astronomy",
"Programming Language :: Python :: 3.8",
"Operating System :: MacOS :: MacOS X",
"Operating System :: POSIX :: Linux",
],
packages=find_packages(exclude=('tests', 'tests.*')),
package_data={
"my_py_library": [
"some_necessary_config_data.json",
"calibration_data/*"
]
},
py_modules=['root_level_module_name',],
install_requires=requirements,
entry_points={
'console_scripts': [
'run-processing=my_py_library.cli:main', # package.module:function
]
}
)
```
### Publish to PyPI - Poetry
[Poetry Build and Publish Docs](https://python-poetry.org/docs/cli/#build)
How to Publish to PyPI from Poetry
```bash
poetry lock
poetry install
poetry version
poetry build
PYPI_USERNAME=__token__
PYPI_TOKEN=
poetry publish # You will be prompted for your PyPI credentials if you don't provide the environment variables
```
#### Poetry Project Configuration Example – Library Package
pyproject.toml
```toml
# pyproject.toml
# See: https://python-poetry.org/docs/pyproject/
[tool.poetry]
name = "my_python_package"
version = "0.1.0"
description = "Science data processing library and applications for some instrument."
authors = [ # Alphabetical
"Jane Doe ",
"John Doe "
]
# Configure private PyPI repo to download packages
[[tool.poetry.source]]
name = "lasp-pypi" # This name will be used in the configuration to retrieve the proper credentials
url = "https://artifacts.pdmz.lasp.colorado.edu/repository/lasp-pypi/simple" # URL used to download your private packages
# Dependency specification for core package
[tool.poetry.dependencies]
python = "^3.9"
astropy = "^4.2.1"
h5py = "^3.3.0"
numpy = "^1.21.0"
spiceypy = "^4.0.1"
lasp-packets = "1.2"
requests = "^2.26.0"
SQLAlchemy = "^1.4.27"
psycopg2 = "^2.9.2"
cloudpathlib = {extras = ["s3"], version = "^0.6.2"}
# Development dependencies
[tool.poetry.dev-dependencies]
pytest-cov = "^2.12.1"
pylint = "^2.9.3"
responses = "^0.14.0"
pytest-randomly = "^3.10.2"
moto = {extras = ["s3"], version = "^2.2.16"}
# Script entrypoints to put in installed bin directory
[tool.poetry.scripts]
sdp = 'my_python_package.cli:main'
# Poetry boilerplate
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
```
## Packaging for Conda (`conda install`)
> **Warning**: Need a volunteer to expand on Conda
### How to install and use Conda
https://conda.io/projects/conda-build/en/latest/user-guide/tutorials/build-pkgs.html
> Conda Develop:
> There is a conda subcommand called `conda develop`, but it is not actively maintained. The maintainers of
conda recommend using `pip install` to install an editable package in development mode.
> See: https://github.com/conda/conda-build/issues/1992
## Useful Links
Here are some helpful resources:
- [Python Packaging User's Guide](https://packaging.python.org/en/latest/)
- [The Hitchhiker's Guide to Python - Packaging your Code](https://docs.python-guide.org/shipping/packaging/)
- [The Sheer Joy of Packaging](https://python-packaging-tutorial.readthedocs.io/en/latest/index.html)
- [Package Python Projects the Proper Way with Poetry](https://hackersandslackers.com/python-poetry-package-manager/)
- [Poetry Documentation](https://python-poetry.org/docs/)
- [Setuptools Documentation](https://setuptools.pypa.io/en/latest/)
- [Building conda packages from scratch](https://conda.io/projects/conda-build/en/latest/user-guide/tutorials/build-pkgs.html)
Credit: Content taken from a Confluence guide written by Gavin Medley