Python
This note covers key concepts and how-to guides for Python programming.
Packages
What is a Python package?
A Python package is a collection of Python modules organized in a specific way. It allows for the logical organization of Python code and makes it easier to manage and distribute.
Traditionally, a Python package must include an __init__.py file in its directory. This file can be empty, but its presence tells Python that the directory should be treated as a regular package. Since Python 3.3 (PEP 420), a directory without __init__.py can still be imported as a namespace package, but including __init__.py is still recommended for tool compatibility and to define package-level variables or initialization code.
__init__.py is required if you want Python to recognize a directory as a package in versions before Python 3.3, or if you need to execute initialization code when the package is imported. It is not needed for namespace packages, which are created by omitting it so that several directories can contribute to the same package name. It remains useful for compatibility with tools and older codebases.
A subpackage is a package that exists within another package. It is simply a subdirectory containing its own __init__.py file, allowing for hierarchical organization of modules and packages.
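The difference between a regular package, a subpackage, and a namespace package can be checked directly. The following is a minimal sketch, not a canonical recipe: the names demo_pkg, demo_ns, and the helper functions are hypothetical and exist only for this illustration. It builds a throwaway tree in a temporary directory and imports from it.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_tree(root: Path) -> None:
    """Create a regular package, a subpackage, and a namespace package under root."""
    # Regular package: the directory has an __init__.py, which runs on import.
    pkg = root / "demo_pkg"
    (pkg / "sub").mkdir(parents=True)
    (pkg / "__init__.py").write_text("GREETING = 'hello from demo_pkg'\n")
    # Subpackage: a nested directory with its own __init__.py.
    (pkg / "sub" / "__init__.py").write_text("")
    (pkg / "sub" / "tools.py").write_text("def double(x):\n    return 2 * x\n")

    # Namespace package: a directory with modules but no __init__.py.
    ns = root / "demo_ns"
    ns.mkdir()
    (ns / "plugin.py").write_text("NAME = 'plugin'\n")


def import_demo(root: Path) -> dict:
    """Import the demo packages from root and report what Python found."""
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        tools = importlib.import_module("demo_pkg.sub.tools")
        demo_ns = importlib.import_module("demo_ns")
        plugin = importlib.import_module("demo_ns.plugin")
        return {
            "greeting": demo_pkg.GREETING,  # set by demo_pkg/__init__.py
            "doubled": tools.double(21),
            # A regular package's __file__ points at its __init__.py;
            # a namespace package has no such file.
            "regular_has_file": demo_pkg.__file__ is not None,
            "namespace_has_file": getattr(demo_ns, "__file__", None) is not None,
            "plugin_name": plugin.NAME,
        }
    finally:
        sys.path.remove(str(root))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_demo_tree(root)
    report = import_demo(root)
    print(report)
```

The report shows that the package-level variable defined in __init__.py is available right after import, and that only the regular package is backed by an __init__.py file.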
What is a package repository?
A package repository is a centralized place where Python packages are stored so that users can find and install them.
Repositories such as PyPI make it easy to share, install, and manage packages, which lets you reuse the same package across projects.
Common Python package repositories include:
- PyPI (Python Package Index): The official and most widely used repository for Python packages.
- Anaconda Cloud: A repository for packages tailored for scientific computing and the Anaconda distribution.
- Private Repositories: Organizations can host their own private package repositories for internal use, using tools like devpi or Warehouse.
Create a Python package
At its simplest, a Python package is just a directory containing Python code files and an __init__.py file. This structure allows Python to recognize the directory as a package and import its modules.
To share Python packages, standardized distribution formats are used. The wheel format is specified in PEP 427, and source distributions are covered by related PEPs such as PEP 517 and PEP 625. These formats are accepted by PyPI and installable via pip. The main formats are .tar.gz (source distribution, or sdist) and .whl (wheel, a built distribution). For conda, the formats are .tar.bz2 and .conda.
Build tools such as setuptools, build, flit, and poetry are designed to create PyPI-compatible distributions in these formats, making it easy to publish and install packages.
Example: a simple repository for developing a Python package
my_package/
├── my_package/
│   ├── __init__.py
│   └── module.py
├── pyproject.toml
├── README.md
├── LICENSE
└── tests/
    └── test_module.py
See flat vs src layout in the documentation for more information.
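To try the layout above without creating the files by hand, a short script can scaffold it. This is only a sketch: the file contents are placeholders chosen for illustration, and scaffold is a hypothetical helper, not part of any packaging tool.

```python
import tempfile
from pathlib import Path

# Files from the example layout; the contents are illustrative placeholders.
LAYOUT = {
    "my_package/__init__.py": "",
    "my_package/module.py": "def hello():\n    return 'hello'\n",
    "tests/test_module.py": (
        "from my_package.module import hello\n\n\n"
        "def test_hello():\n"
        "    assert hello() == 'hello'\n"
    ),
    "pyproject.toml": "",
    "README.md": "# my_package\n",
    "LICENSE": "",
}


def scaffold(project_root: Path) -> list[str]:
    """Create the flat-layout skeleton under project_root and return the files written."""
    for relative, content in LAYOUT.items():
        target = project_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
    return sorted(
        path.relative_to(project_root).as_posix()
        for path in project_root.rglob("*")
        if path.is_file()
    )


# Demo: scaffold into a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp) / "my_package")
    for path in created:
        print(path)
```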
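This structure uses pyproject.toml for configuration, which is now the standard configuration file for Python packaging: PEP 518 introduced it for declaring build requirements in [build-system], and PEP 621 standardized the [project] metadata table. Below is a simple example of what pyproject.toml might look like: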
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "my_package"
version = "0.1.0"
description = "A simple example Python package."
authors = [
{ name = "Your Name", email = "your@email.com" }
]
dependencies = [
"requests>=2.0.0"
]
readme = "README.md"
license = { file = "LICENSE" }
The key properties of pyproject.toml, including the standard interface between frontends (pip, build) and backends (setuptools, flit, poetry, etc.) and the project metadata information, are defined in the specification.
pyproject.toml can also include additional sections for tool-specific configuration (e.g., for linters, formatters, or test runners).
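To see how a frontend might read these tables, the sketch below parses a pyproject.toml string with the standard-library tomllib module (Python 3.11+; on older interpreters it assumes the tomli backport is installed). The [tool.example_tool] table and the summarize helper are hypothetical and stand in for any tool-specific section.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the tomli backport is installed

PYPROJECT = """
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "my_package"
version = "0.1.0"
description = "A simple example Python package."
authors = [{ name = "Your Name", email = "your@email.com" }]
dependencies = ["requests>=2.0.0"]
readme = "README.md"
license = { file = "LICENSE" }

# Hypothetical tool-specific section, for illustration only.
[tool.example_tool]
line_length = 100
"""


def summarize(text: str) -> dict:
    """Pull the build backend, core metadata, and tool sections out of a pyproject.toml string."""
    data = tomllib.loads(text)
    build_system = data["build-system"]
    project = data["project"]
    return {
        "backend": build_system["build-backend"],
        "build_requires": build_system["requires"],
        "name": project["name"],
        "version": project["version"],
        "dependencies": project.get("dependencies", []),
        "tools": sorted(data.get("tool", {})),
    }


summary = summarize(PYPROJECT)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The [build-system] table tells the frontend which backend to call, while [project] carries the metadata that ends up in the built distribution.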
Names in Python package development
Package name: This is the name of the Python package as used in import statements. For example, if your package is structured as my_package/, you would import it in Python as import my_package. This name should be unique within your project and follow Python naming conventions (lowercase, underscores allowed).
Project name: This is the name specified in pyproject.toml under [project] (name = "my_package"). It is the name shown on PyPI and used for distribution. It can contain dashes and underscores, and PyPI compares it case-insensitively. For example, pip install my-package installs the project named my-package from PyPI, but you may import it as import my_package in Python.
Repository name: This is the name of your source code repository (e.g., on GitHub, GitLab). It is independent of the package and project name, but for clarity, it is often similar or identical. For example, your repository might be github.com/yourusername/my-package.
Import name: The name used in import statements in Python code. It must match the directory/module name inside your package.
Pip install name: The name used with pip install to fetch the package from PyPI. This is the name field in pyproject.toml and can differ from the import name (e.g., pip install my-package vs import my_package).
Tips:
- Keep the import name and project name similar to avoid confusion, but remember that dashes (-) are not allowed in import names.
- PyPI normalizes project names: my-package, my_package, and My_Package are considered the same project.
- Choose a unique project name to avoid conflicts on PyPI.
- Repository name is for your own organization and does not affect installation or import.
- Always check that your pyproject.toml name matches your intended PyPI project name.
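The normalization rule mentioned in the tips is specified in PEP 503: runs of dashes, underscores, and periods collapse to a single dash, and the result is lowercased. A minimal sketch follows; the function names are chosen here for illustration.

```python
import re


def normalize_project_name(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def same_project(first: str, second: str) -> bool:
    """Return True if two spellings refer to the same project after normalization."""
    return normalize_project_name(first) == normalize_project_name(second)


for spelling in ("my-package", "my_package", "My_Package", "my.package"):
    print(spelling, "->", normalize_project_name(spelling))

print(same_project("my-package", "My_Package"))
```

Checking a candidate name against its normalized form before publishing helps avoid picking a name that collides with an existing project under a different spelling.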
Publishing a package to PyPI
PyPI supports two main distribution formats for Python packages: sdist (source distribution) and wheel (built distribution).
- sdist: The original format for distributing Python packages, containing the source code and metadata. It is created using tools like setuptools or python -m build. Users install from an sdist by building the package locally.
- wheel: Introduced in PEP 427 (2012), wheel is a built distribution format that installs faster because no build step runs at install time. Wheels are built using tools like python -m build or pip wheel.
To build both formats, use:
python -m build
This will generate .tar.gz (sdist) and .whl (wheel) files in the dist/ directory.
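The .whl file names produced by the build follow the naming convention from PEP 427: distribution, version, an optional build tag, then the Python, ABI, and platform tags. The sketch below splits a file name into those parts. It is a simplified illustration with hypothetical helper names; for real tooling, the packaging library's packaging.utils.parse_wheel_filename is the more robust choice.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into the components defined by PEP 427."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)


def is_pure_python(wheel: WheelName) -> bool:
    """A wheel tagged none-any has no ABI or platform requirement."""
    return wheel.abi_tag == "none" and wheel.platform_tag == "any"


parsed = parse_wheel_filename("my_package-0.1.0-py3-none-any.whl")
print(parsed)
print(is_pure_python(parsed))
```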
Understanding Wheel Files
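A wheel (.whl) file is essentially a ZIP archive with a .whl extension. You can extract it using standard ZIP tools to inspect its contents, which typically include the package's Python source files, any compiled extension modules, and a .dist-info directory holding the metadata. Bytecode (.pyc) files are normally not shipped in the wheel; the installer generates them at install time.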
Example:
# Extract a wheel file (unzip reads .whl archives directly, so no rename is needed)
unzip my_package-0.1.0-py3-none-any.whl -d my_package_extracted/
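The same inspection can be done from Python with the standard zipfile module. In the sketch below, make_stand_in_wheel writes a tiny archive that only mimics a wheel's layout so the example runs without a real build. It is not an installable wheel, and both helper names are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def wheel_contents(wheel_path: Path) -> list[str]:
    """List the archive members of a wheel using the standard zipfile module."""
    with zipfile.ZipFile(wheel_path) as archive:
        return sorted(archive.namelist())


def make_stand_in_wheel(directory: Path) -> Path:
    """Write a small ZIP that mimics a wheel's layout (illustration only)."""
    path = directory / "my_package-0.1.0-py3-none-any.whl"
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("my_package/__init__.py", "")
        archive.writestr("my_package/module.py", "def hello():\n    return 'hello'\n")
        # Metadata lives in the .dist-info directory.
        archive.writestr(
            "my_package-0.1.0.dist-info/METADATA",
            "Metadata-Version: 2.1\nName: my_package\nVersion: 0.1.0\n",
        )
        archive.writestr("my_package-0.1.0.dist-info/WHEEL", "Wheel-Version: 1.0\n")
        archive.writestr("my_package-0.1.0.dist-info/RECORD", "")
    return path


with tempfile.TemporaryDirectory() as tmp:
    wheel_path = make_stand_in_wheel(Path(tmp))
    members = wheel_contents(wheel_path)
    for member in members:
        print(member)
```

Pointing wheel_contents at a real file from dist/ works the same way, since any wheel is a valid ZIP archive.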
Downloading and Inspecting Source Distributions
To download a package’s source distribution (sdist) without installing it, use:
pip download --no-binary :all: --no-deps package_name
This downloads the .tar.gz file to your current directory. You can extract and inspect it:
tar -xzf package_name-1.0.0.tar.gz
cd package_name-1.0.0/
Inside, you’ll find the source code, pyproject.toml or setup.py, and other package files.
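An sdist can also be inspected from Python without extracting it, using the standard tarfile module. As a sketch under stated assumptions, make_stand_in_sdist below creates a small .tar.gz that imitates an sdist's layout so the example is self-contained. The file contents and helper names are hypothetical.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def sdist_members(sdist_path: Path) -> list[str]:
    """List the files inside a .tar.gz source distribution without extracting it."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        return sorted(archive.getnames())


def make_stand_in_sdist(directory: Path) -> Path:
    """Write a small .tar.gz that imitates an sdist layout (illustration only)."""
    path = directory / "package_name-1.0.0.tar.gz"
    files = {
        "package_name-1.0.0/pyproject.toml": "[project]\nname = 'package_name'\nversion = '1.0.0'\n",
        "package_name-1.0.0/PKG-INFO": "Metadata-Version: 2.1\nName: package_name\nVersion: 1.0.0\n",
        "package_name-1.0.0/package_name/__init__.py": "",
    }
    with tarfile.open(path, "w:gz") as archive:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return path


with tempfile.TemporaryDirectory() as tmp:
    sdist_path = make_stand_in_sdist(Path(tmp))
    names = sdist_members(sdist_path)
    for name in names:
        print(name)
```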
Building a Wheel from Source Distribution
You can build a wheel from a downloaded sdist using pip:
pip wheel package_name-1.0.0.tar.gz
Or, extract the sdist and build from the directory:
tar -xzf package_name-1.0.0.tar.gz
cd package_name-1.0.0/
python -m build --wheel
python -m build writes the .whl file to the dist/ directory, while pip wheel writes it to the current directory unless you pass --wheel-dir.
Uploading to PyPI
To upload your package to PyPI, use twine:
pip install twine
twine upload dist/*
Twine securely uploads your distributions to PyPI. Alternatives like flit or poetry also support building and publishing packages.
Conda Package Formats
Conda uses its own package formats (.tar.bz2 and .conda) for distributing packages via Anaconda Cloud and other conda channels. Conda packages are built with the conda-build tool and can bundle compiled binaries and non-Python dependencies. They are built either per platform or as platform-independent noarch packages.
- .tar.bz2: The original conda package format, containing metadata and files compressed with bzip2.
- .conda: A newer, more efficient format introduced for faster installs and smaller package sizes.
Conda packages are published to channels (e.g., defaults, conda-forge) and installed using conda install.