Python Best Practices

Some Python terminology that a user might encounter, particularly when working through this Python guide.

Purpose

This document is intended to provide a guide for LASP Python developers. In particular, we hope that any libraries that are intended to be shared at LASP will adhere to these conventions. We recognize that individual projects may need to violate some of these principles, but we strive to keep them general enough that exceptions are rare.

General

Values

“Build tools for others that you want to be built for you.” - Kenneth Reitz
“Simplicity is always better than functionality.” - Pieter Hintjens
“Fit the 90% use-case. Ignore the nay sayers.” - Kenneth Reitz
“Beautiful is better than ugly.” - PEP 20
Build for open source (even for closed source projects)

Guidelines

“Explicit is better than implicit” - PEP 20
“Readability counts.” - PEP 20
“Anybody can fix anything.” - Khan Academy Development Docs
Fix each broken window (bad design, wrong decision, or poor code) as soon as it is discovered.
“Now is better than never.” - PEP 20
Test ruthlessly. Write docs for new features.
Even more important than Test-Driven Development: Human-Driven Development

Style

Naming

Variables, functions, methods, packages, modules
- lower_case_with_underscores
Classes and Exceptions
- CapWords
Protected methods and internal functions
- _single_leading_underscore(self, …)
Private methods
- __double_leading_underscore(self, …)
Constants
- ALL_CAPS_WITH_UNDERSCORES

Naming Guidelines

Avoid one-letter variables (esp. l, O, I)

Exception: In very short blocks, when the meaning is clearly visible from the immediate context

Fine:

for e in elements:
    e.mutate()

Avoid redundant labeling

Yes:

import audio

core = audio.Core()
controller = audio.Controller()

No:

import audio

core = audio.AudioCore()
controller = audio.AudioController()

Prefer “reverse notation”

Yes:

elements = ...
elements_active = ...
elements_defunct = ...

No:

elements = ...
active_elements = ...
defunct_elements = ...

Other Guidelines

Avoid getter and setter methods

Yes:

person.age = 42

No:

person.set_age(42)
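
Plain attributes do not rule out validation later. As a minimal sketch, assuming a hypothetical Person class whose age must be non-negative, the built-in property decorator keeps `person.age = 42` as the public interface while still checking the value:

```python
class Person:
    """Hypothetical class used only for illustration."""

    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age  # Goes through the property setter below

    @property
    def age(self) -> int:
        return self._age

    @age.setter
    def age(self, value: int) -> None:
        if value < 0:
            raise ValueError(f"age must be non-negative, got {value}")
        self._age = value


person = Person("Ada", 36)
person.age = 42  # Callers still use plain attribute assignment
```

Because callers never used a setter method, the validation can be added without changing any calling code.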

Return only one type from functions

Prefer raising exceptions to returning multiple types.

Yes:

def get_user_name(id: int) -> str:
    """Retrieve username from database by ID"""
    record = db.get_user(id)
    if record:
        return record.name
    else:
        raise RecordNotFoundError(f"No record found for user ID={id}")

No:

def get_user_name(id: int) -> str:
    """Retrieve username from database by ID"""
    record = db.get_user(id)
    if record:
        return record.name
    else:
        return None

Indentation

Use 4 spaces–never tabs. Enough said.

Imports

Import entire modules instead of individual symbols within a module. For example, for a top-level module canteen that has a file canteen/sessions.py,

Yes:

import canteen
import canteen.sessions
from canteen import sessions

No:

from canteen import get_user  # Symbol from canteen/__init__.py
from canteen.sessions import get_session  # Symbol from canteen/sessions.py

Exception: For third-party code where documentation explicitly says to import individual symbols.

Rationale: Avoids circular imports.

Put all imports at the top of the file in three sections, each separated by a blank line, in this order:

System imports
Third-party imports
Local source tree imports

Rationale: Makes it clear where each module is coming from.

Documentation

Follow PEP 257’s docstring guidelines. reStructuredText and Sphinx can help to enforce these standards.

Use one-line docstrings for obvious functions:

"""Return the pathname of ``foo``."""

Multiline docstrings should include:

Summary line
Use case, if appropriate
Args
Return type and semantics, unless None is returned

"""Train a model to classify Foos and Bars.

Usage::

    >>> import klassify
    >>> data = [("green", "foo"), ("orange", "bar")]
    >>> classifier = klassify.train(data)

:param train_data: A list of tuples of the form ``(color, label)``.
:rtype: A :class:`Classifier <Classifier>`
"""

Notes:

Use imperative verbs (“Return”) rather than descriptive ones (“Returns”).
Document init methods in the docstring for the class.

class Person:
    """A simple representation of a human being.

    :param name: A string, the person's name.
    :param age: An int, the person's age.
    """
    def __init__(self, name, age):
        self.name = name
        self.age = age

Line Lengths

Don’t stress over it. 80-100 characters is fine.

Use parentheses for line continuations:

wiki = (
    "The Colt Python is a .357 Magnum caliber revolver formerly manufactured "
    "by Colt's Manufacturing Company of Hartford, Connecticut. It is sometimes "
    'referred to as a "Combat Magnum". It was first introduced in 1955, the '
    "same year as Smith & Wesson's M29 .44 Magnum."
)

Anti-Patterns

These are not limited to a language, or even to the software itself. They may manifest during the planning process, implementation, or even years down the road due to iterative changes. If you see an anti-pattern, make an effort to fix it. Now is better than never.

Main Causes of Anti-Patterns

Haste – When project deadlines are tight, budgets are cut, or team sizes are reduced, we tend to ignore good practices under the pressure.
Apathy – Developers who really don’t care about the problem or the solution will almost always produce a poor design.
Ignorance – When a developer either lacks knowledge of the domain or of the technology being used, that ignorance will result in anti-patterns being introduced.

God Objects (aka “The Blob”)

A class or package in your system that does far too much. It becomes the catch-all for any code that the developer is not sure where to put, or is too lazy to create a new class or package for. The reverse can also happen: developers put code somewhere else simply because that location is smaller and easier to work with, even if it is not the correct one. This anti-pattern is usually caused by a lack of proper object-oriented design skills on a team.

How to avoid it:
- Code reviews or pair programming.
- If you can’t describe the scope of a class’s functionality with a single sentence, then it has too much responsibility.

Lava Flows

Lava Flows occur when code has been around for so long that people are afraid to modify it. This often happens because the original authors/maintainers have left and there is no one who fully groks that area of the code. Some warning signs are big chunks of commented code with notes like “FIXME: This doesn’t appear to be used, commenting it out”.

Copy on Write Code (Parallel Protectionism)

Similar to Lava Flows, this tends to occur when developers aren’t sure of the consequences of modifying areas of a codebase. Instead of trusting regression tests or digging into the scope, they simply copy the code so that their changes don’t interfere with existing functionality. That is, the original code is used as is until it needs to be modified, then it gets copied for modification (copy-on-write).

Method Container Objects

This is most common when coming to Python from Java, where everything must be an object. If a class contains many class methods and not much else, you may have a Method Container object and you can probably make all the methods into functions within a module.
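
As an illustration only, the sketch below shows a hypothetical GeometryUtils class that holds no state and exists just to group a function, followed by the same behavior as a plain module-level function. All names here are invented for the example.

```python
import math


# Before: a class used only as a namespace for functions (hypothetical example)
class GeometryUtils:
    @staticmethod
    def circle_area(radius: float) -> float:
        return math.pi * radius ** 2


# After: the same behavior as a plain function, e.g. in a geometry.py module
def circle_area(radius: float) -> float:
    """Return the area of a circle with the given radius."""
    return math.pi * radius ** 2
```

The module itself already acts as the namespace, so callers can write `geometry.circle_area(2.0)` without an intermediate class.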

Tramp Data

Named for a parameter that tramps from function to function in the code base, this is less of an anti-pattern and more of a code smell that indicates poor design decisions. It occurs when a parameter is passed several levels deep into the stack without being used by the intermediate functions. Often it is used as a (better) alternative to a global variable but it indicates that there is a poor division of responsibility in the codebase.
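
A small sketch of what tramp data can look like, using invented function names: `precision` is only needed by format_value, yet build_report and format_rows must both accept it and pass it along. One possible refactoring (an assumption, not the only option) is to let the object that needs the value own it.

```python
# Before: `precision` tramps through build_report and format_rows untouched
def format_value(value: float, precision: int) -> str:
    return f"{value:.{precision}f}"


def format_rows(values: list[float], precision: int) -> list[str]:
    return [format_value(v, precision) for v in values]


def build_report(values: list[float], precision: int) -> str:
    return "\n".join(format_rows(values, precision))


# After: the object that needs `precision` owns it
class ValueFormatter:
    def __init__(self, precision: int):
        self.precision = precision

    def format(self, value: float) -> str:
        return f"{value:.{self.precision}f}"


def build_report_with(values: list[float], formatter: ValueFormatter) -> str:
    return "\n".join(formatter.format(v) for v in values)
```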

Testing

Strive for 100% code coverage, but don’t get obsessed over the coverage score. Useful Python testing libraries are unittest and pytest. The pytest test runner can run unittest tests but not vice versa.

General Testing Guidelines

Use long, descriptive names. This often obviates the need for docstrings in test methods.
Tests should be isolated. Don’t interact with a real database or network. Use a separate test database that gets torn down (Docker is great for this) or use mock objects.
Prefer factories to fixtures.
Never let incomplete tests pass, or you risk forgetting about them. Instead, add a placeholder like assert False, "TODO: finish me". Pytest offers an xfail marker (pytest.mark.xfail) for tests that are expected to fail; they are still reported separately from passed tests.
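
For projects that stay on the standard library, unittest has a comparable decorator, unittest.expectedFailure. The sketch below uses an invented test name and runs the test case directly so the result can be inspected; it is an illustration, not a recommended project layout.

```python
import unittest


class TestPersonAge(unittest.TestCase):
    @unittest.expectedFailure
    def test_age_is_rounded_down(self):
        # Placeholder until the real test is written; reported as an
        # expected failure rather than as a pass
        self.fail("TODO: finish me")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPersonAge)
result = unittest.TestResult()
suite.run(result)
```

The unfinished test shows up in `result.expectedFailures`, so it stays visible in the test report until someone completes it.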

Unit Tests

Focus on one tiny bit of functionality. Mock out everything else.
Should be fast, but a slow test is better than no test.
It often makes sense to have one testcase class for a single class or model.

import unittest
import factories

class PersonTest(unittest.TestCase):
    def setUp(self):
        self.person = factories.PersonFactory()

    def test_has_age_in_dog_years(self):
        self.assertEqual(self.person.dog_years, self.person.age / 7)
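
To illustrate "mock out everything else", here is a minimal sketch using unittest.mock from the standard library. The function greeting_for and the `db` object with a get_user method are hypothetical and stand in for real application code.

```python
import unittest
from types import SimpleNamespace
from unittest import mock


def greeting_for(user_id, db):
    """Hypothetical function under test: build a greeting from a DB record."""
    record = db.get_user(user_id)
    return f"Hello, {record.name}!"


class GreetingTest(unittest.TestCase):
    def test_greeting_includes_user_name(self):
        # The database is replaced by a mock, so no real connection is made
        fake_db = mock.Mock()
        fake_db.get_user.return_value = SimpleNamespace(name="Ada")

        self.assertEqual(greeting_for(7, fake_db), "Hello, Ada!")
        fake_db.get_user.assert_called_once_with(7)
```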

Functional / Integration Tests

Functional tests are higher level tests that are closer to how an end-user would interact with your application. They are typically used for web and GUI applications.

Write tests as scenarios. Testcase and test method names should read like a scenario description.
Use comments to write out stories, before writing the test code.

import unittest

class TestAUser(unittest.TestCase):

    def test_can_write_a_blog_post(self):
        # Goes to her dashboard
        ...
        # Clicks "New Post"
        ...
        # Fills out the post form
        ...
        # Clicks "Submit"
        ...
        # Can see the new post
        ...

Notice how the testcase and test method read together like “Test A User can write a blog post”.

Exception Handling

For more complete documentation, see: https://docs.python.org/3/tutorial/errors.html

When to Handle an Exception – Common Examples

The basic rule of thumb is to handle exceptions that you expect to arise during normal runtime but allow the program to continue on a useful path. Different parts of a codebase may have different context about how to handle errors. Low level functions should rarely do much error handling to keep them as general as possible. Higher level abstractions are more likely to live in a context where assumptions can be made about which exceptions are recoverable.
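
A minimal sketch of this division of responsibility, using invented function names: the low-level parser lets ValueError propagate, while the higher-level caller knows that a malformed reading is expected in its context and can be skipped.

```python
import logging

logger = logging.getLogger(__name__)


def parse_temperature(raw: str) -> float:
    """Low level: no error handling, so callers decide what a failure means."""
    return float(raw)  # Raises ValueError on malformed input


def average_temperature(raw_readings: list[str]) -> float:
    """Higher level: here a bad reading is expected and can be skipped."""
    readings = []
    for raw in raw_readings:
        try:
            readings.append(parse_temperature(raw))
        except ValueError:
            logger.warning("Skipping malformed reading %r", raw)
    if not readings:
        raise ValueError("No valid readings to average")
    return sum(readings) / len(readings)
```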

Parsing Messy Data

Without guarantees about the integrity of the data, we frequently need a program to get as much out of the data as possible. In this example, we log exceptions raised by parse_csv_record and simply continue on.

Resilient Data Processing:

with open(file) as csv_file:
    parsed_data = []
    n_parsing_failures = 0
    n_parsed_records = 0
    for line in csv_file:  # Iterate over the file one line at a time
        try:
            parsed_data.append(parse_csv_record(line))
            n_parsed_records += 1
        except CsvRecordParsingError as csv_err:
            n_parsing_failures += 1
            logger.exception(csv_err)  # Logs exception, including stack trace
if n_parsing_failures:
    logger.info(
        f"{n_parsing_failures} parsing failures encountered in {file}. "
        f"{n_parsed_records} successfully parsed."
    )

Connection Retries

No Web API is 100% reliable. Connection errors do occur. Whenever you make a request to a Web API, it’s a good idea to check for timeout errors and other transient connection problems (for example, 408 or 429 responses) and retry the request. For example:

Connection Retry:

def resilient_push_data(payload: dict, n_tries: int = 3):
    while n_tries:
        n_tries -= 1
        try:
            response = push_data_to_server(data={"payload": payload})
            return response
        except TimeoutError:
            logger.error(f"Failed to push data. Connection timed out. {n_tries} tries remaining.")
        except ConnectionError:
            logger.error(f"Failed to push data. Server refused the request. {n_tries} tries remaining.")
    raise DataPushError("Failed to push data.")  # Handle this higher up in the stack, if necessary

Multiprocessing

When spawning processes for handling parallel workloads, you typically want to know what happens in those processes, including exceptions. If an exception goes uncaught in a child process started with Process, the child dies and the parent is never told what occurred. Catch exceptions inside the child and send them back to the parent explicitly.

import logging
import os
from multiprocessing import Pipe, Process

logger = logging.getLogger(__name__)

def child_process_function(datum, pipe):
    try:
        result = process_datum(datum)  # The per-item work, defined elsewhere
    except Exception as unexpected_error:
        result = unexpected_error
        logger.exception(unexpected_error)
    pipe.send({"pid": os.getpid(), "result": result})
    pipe.close()

def process_data(data):
    processes = []
    receiver_pipes = []
    for datum in data:
        receiver, sender = Pipe(duplex=False)
        process = Process(target=child_process_function, args=(datum, sender))
        processes.append(process)
        receiver_pipes.append(receiver)
        process.start()
        sender.close()  # Close the parent's copy so recv() raises EOFError if the child dies
    for receiver in receiver_pipes:
        try:
            msg = receiver.recv()  # Receive before join() so a large result cannot block the child
            logger.info(msg)
        except EOFError:
            # This should really never happen unless the child process dies before it can communicate.
            logger.error("Failed to communicate with a child process. You should probably check the logs for that process.")
    for p in processes:
        p.join()

Custom Exceptions

https://docs.python.org/3/tutorial/errors.html#user-defined-exceptions

In general, ValueError should be the default built-in exception. Most other built-in exception types have specific meanings within the Python standard library. If you wish to impart more specific information, define custom exceptions as follows:

class CsvRecordParsingError(Exception):
    """Exception raised when a single CSV record fails to parse"""
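
Custom exceptions can also share a base class and carry structured details. The sketch below uses invented names (PipelineError, RecordParsingError) to show one way this might look; the attributes are assumptions chosen for illustration.

```python
class PipelineError(Exception):
    """Base class for errors raised by a hypothetical data pipeline."""


class RecordParsingError(PipelineError):
    """Raised when a single record fails to parse."""

    def __init__(self, line_number: int, reason: str):
        super().__init__(f"line {line_number}: {reason}")
        self.line_number = line_number
        self.reason = reason


try:
    raise RecordParsingError(12, "expected 5 fields, found 3")
except PipelineError as err:  # Catches every pipeline-specific error
    message = str(err)
```

A shared base class lets callers handle all errors from one library with a single except clause while still distinguishing specific failures when needed.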

Exception Handling as Control Flow

In most cases, using exception handling as control flow is an anti-pattern and can be rewritten more clearly by checking assumptions before your try clause. However, there are cases where it’s useful such as when an external state cannot be determined without calling functions that raise exceptions.

x = MaybeContainsData()

def gross_costly_control_flow(x):
    try:
        parse_data(x)  # Costly
    except NoDataError as no_data:
        x.add_data()  # Costly
        gross_costly_control_flow(x)
    except DataMalformed as bad_data:
        x.fix_data()  # Costly
        gross_costly_control_flow(x)

There are a few code smells above, but it’s a contrived example. If your code looks like this, seriously consider refactoring it:

x = MaybeContainsData()

if not x.contains_data():  # Cheap
    x.add_data()
if not x.data_valid():  # Cheap
    x.fix_data()

parse_data(x)  # Any exceptions raised now should be unexpected since we dealt with the cases we understand

Chaining Exceptions

Sometimes it’s useful to raise one exception from another in order to add context. In this case, the stack trace will indicate that the TotalFailureError was raised as a result of the UnderstoodButNotRecoverableError:

try:
    do_a_thing()
except UnderstoodButNotRecoverableError as really_bad_error:
    raise TotalFailureError(
        "A bad thing happened. Unfortunately, we can't do anything useful except tell you about it."
    ) from really_bad_error
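
The original exception stays reachable on the new one through its `__cause__` attribute. A self-contained sketch, with ConfigError and load_port invented for the example:

```python
class ConfigError(Exception):
    """Hypothetical application-level error."""


def load_port(settings: dict) -> int:
    try:
        return int(settings["port"])
    except KeyError as missing_key:
        raise ConfigError("No port configured") from missing_key


try:
    load_port({})
except ConfigError as config_err:
    cause = config_err.__cause__  # The original KeyError
```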

The Finally Clause

The finally clause runs at the end of your exception handling, regardless of whether an exception was raised or handled. It should only be used for cleanup and should not be used for continuing a lengthy program execution.

try:
    do_a_thing()
except TimeoutError as timeout_error:
    handle_timeout_condition()
finally:
    clean_up_db_connections()
    flush_logs_to_log_server()
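
To make the execution order concrete, the following sketch records each step in a list. The run_job function and its simulated timeout are assumptions made for illustration.

```python
events = []


def run_job(should_fail: bool) -> str:
    try:
        events.append("start")
        if should_fail:
            raise TimeoutError("simulated timeout")
        return "ok"
    except TimeoutError:
        events.append("handled timeout")
        return "timed out"
    finally:
        events.append("cleanup")  # Runs on both paths, even after `return`


status_ok = run_job(should_fail=False)
status_failed = run_job(should_fail=True)
```

In both calls the cleanup step is recorded last, after the try or except body has already chosen its return value.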

Useful Links

PEP 8 (Python’s Official Style Guide)

Acronyms

PEP = Python Enhancement Proposal

Credit

Content taken from a Confluence guide written by Gavin Medley