# Python Best Practices

Some Python terminology that a user might encounter, particularly when working through this Python guide.

## Purpose

This document is intended to provide a guide for LASP Python developers. In particular, we hope that any libraries that
are intended to be shared at LASP will adhere to these conventions. We recognize that individual projects may need to
violate some of these principles but we strive to keep them general enough that few exceptions will be made.

---

## General

### Values

- "Build tools for others that you want to be built for you." - Kenneth Reitz
- "Simplicity is always better than functionality." - Pieter Hintjens
- "Fit the 90% use-case. Ignore the nay sayers." - Kenneth Reitz
- "Beautiful is better than ugly." - [PEP 20](https://peps.python.org/pep-0020/)
- Build for open source (even for closed source projects)

### Guidelines

- "Explicit is better than implicit" - [PEP 20](https://peps.python.org/pep-0020/)
- "Readability counts." - [PEP 20](https://peps.python.org/pep-0020/)
- "Anybody can fix anything." - Khan Academy Development Docs
- Fix each broken window (bad design, wrong decision, or poor code) as soon as it is discovered.
- "Now is better than never." - [PEP 20](https://peps.python.org/pep-0020/)
- Test ruthlessly. Write docs for new features.
- Even more important that Test-Driven Development--Human-Driven Development

---

## Style

### Naming

- Variables, functions, methods, packages, modules
  - lower_case_with_underscores
- Classes and Exceptions
  - CapWords
- Protected methods and internal functions
  - _single_leading_underscore(self, ...)
- Private methods
  - __double_leading_underscore(self, ...)
- Constants
  - ALL_CAPS_WITH_UNDERSCORES

#### Naming Guidelines

##### Avoid one-letter variables (esp. l, O, I)

Exception: In very short blocks, when the meaning is clearly visible from the immediate context

**Fine:**

```python
for e in elements:
    e.mutate()
```

##### Avoid redundant labeling

**Yes:**

```python
import audio

core = audio.Core()
controller = audio.Controller()
```

**No:**

```python
import audio

core = audio.AudioCore()
controller = audio.AudioController()
```

##### Prefer "reverse notation"

**Yes:**

```python
elements = ...
elements_active = ...
elements_defunct = ...
```

**No:**

```python
elements = ...
active_elements = ...
defunct_elements ...
```

#### Other Guidelines

##### Avoid getter and setter methods

**Yes:**

```python
person.age = 42
```

**No:**

```python
person.set_age(42)
```

##### Return only one type from functions

Prefer raising exceptions to returning multiple types.

**Yes:**

```python
def get_user_name(id: int) -> str
    """Retrieves username from database by ID"""
    record = db.get_user(id)
    if record:
        return record.name
    else:
        raise RecordNotFoundError(f"No record found for user ID={id}")
```

**No:**

```python
def get_user_name(id: int) -> str
    """Retrieves username from database by ID"""
    record = db.get_user(id)
    if record:
        return record.name
    else:
        return None
```

##### Indentation

Use 4 spaces--never tabs. Enough said.

##### Imports

Import entire modules instead of individual symbols within a module. For example, for a top-level module canteen that
has a file canteen/sessions.py,

**Yes:**

```python
import canteen
import canteen.sessions
from canteen import sessions
```

**No:**

```python
from canteen import get_user  # Symbol from canteen/__init__.py
from canteen.sessions import get_session  # Symbol from canteen/sessions.py
```

Exception: For third-party code where documentation explicitly says to import individual symbols.
Rationale: Avoids circular imports. See here.
Put all imports at the top of the page with three sections, each separated by a blank line, in this order:

1. System imports
2. Third-party imports
3. Local source tree imports

Rationale: Makes it clear where each module is coming from.

##### Documentation

Follow [PEP 257's](https://peps.python.org/pep-0257/) docstring guidelines. reStructured Text and Sphinx can help to
enforce these standards.
Use one-line docstrings for obvious functions:

```python
"""Return the pathname of ``foo``."""
```

Multiline docstrings should include:

- Summary line
- Use case, if appropriate
- Args
- Return type and semantics, unless ```None``` is returned

```python
"""Train a model to classify Foos and Bars.

Usage::

    >>> import klassify
    >>> data = [("green", "foo"), ("orange", "bar")]
    >>> classifier = klassify.train(data)

:param train_data: A list of tuples of the form ``(color, label)``.
:rtype: A :class:`Classifier <Classifier>`
"""
```

Notes:

- Use action words ("Return") rather than descriptions ("Returns").
- Document __init__ methods in the docstring for the class.

```python
class Person(object):
    """A simple representation of a human being.

    :param name: A string, the person's name.
    :param age: An int, the person's age.
    """
    def __init__(self, name, age):
        self.name = name
        self.age = age
```

##### Line Lengths

Don't stress over it. 80-100 characters is fine.

Use parentheses for line continuations:

```python
wiki = (
    "The Colt Python is a .357 Magnum caliber revolver formerly manufactured "
    "by Colt's Manufacturing Company of Hartford, Connecticut. It is sometimes "
    'referred to as a "Combat Magnum". It was first introduced in 1955, the '
    "same year as Smith & Wesson's M29 .44 Magnum."
)
```

---

## Anti-Patterns

These are not limited to a language, or even to the software itself. They may manifest during the planning process,
implementation, or even years down the road due to iterative changes. If you see an anti-pattern, make an effort to fix
it. Now is better than never.

### Main Causes of Anti-Patterns

1. **Haste** – When project deadlines are tight, budgets are cut, team sizes are reduced, in these pressure situations
   we tend to ignore good practices.
2. **Apathy** – Developers who really don’t care about the problem or the solution will almost always produce a poor
   design.
3. **Ignorance** – When a developer either lacks knowledge of the domain or of the technology being used, that ignorance
   will result in anti-patterns being introduced.

### God Objects (aka "The Blob")

A class or package in your system that does far too much. The catch-all for any code where the developer is not sure
where to put it, or is just too lazy to create a new class or package. Also what can happen is developers will put code
somewhere else simply because it is smaller and easier to work with, even if it is not the correct location. This
anti-pattern is usually caused by a lack of proper object-oriented design skills on a team.

- How to avoid it:
  - Code reviews or pair programming.
  - If you can’t describe the scope of a class's functionality with a single sentence, then it has too much
  responsibility.

### Lava Flows

Lava Flows occur when code has been around for so long that people are afraid to modify it. This often happens because
the original authors/maintainers have left and there is no one who fully groks that area of the code. Some warning signs
are big chunks of commented code with notes like "FIXME: This doesn't appear to be used, commenting it out".

### Copy on Write Code (Parallel Protectionism)

Similar to Lava Flows, this tends to occur when developers aren't sure of the consequences of modifying areas of a
codebase. Instead of trusting regression tests or digging into the scope, they simply copy the code so that their
changes don't interfere with existing functionality. That is, the original code is used as is until it needs to be
modified, then it gets copied for modification (copy-on-write).

### Method Container Objects

This is most common when coming to Python from Java, where everything must be an object. If a class contains many
class methods and not much else, you may have a Method Container object and you can probably make all the methods
into functions within a module.

### Tramp Data

Named for a parameter that tramps from function to function in the code base, this is less of an anti-pattern and
more of a code smell that indicates poor design decisions. It occurs when a parameter is passed several levels deep
into the stack without being used by the intermediate functions. Often it is used as a (better) alternative to a global
variable but it indicates that there is a poor division of responsibility in the codebase.

---

## Testing

Strive for 100% code coverage, but don't get obsessed over the coverage score.
Useful python testing libraries are `unittest` and `pytest`. The `pytest` `testrunner` can run
`unittest` tests but not vice versa.

### General Testing Guidelines

- Use long, descriptive names. This often obviates the need for docstrings in test methods.
- Tests should be isolated. Don't interact with a real database or network. Use a separate test database that gets torn
  down (Docker is great for this) or use mock objects.
- Prefer factories to fixtures.
- Never let incomplete tests pass, else you run the risk of forgetting about them. Instead, add a placeholder like
  assert False, "TODO: finish me". Pytest offers an xfail decorator to mark tests that are expected to fail (but they
  still show up separately from passed tests).

### Unit Tests

- Focus on one tiny bit of functionality. Mock out everything else.
- Should be fast, but a slow test is better than no test.
- It often makes sense to have one testcase class for a single class or model.

```python
import unittest
import factories

class PersonTest(unittest.TestCase):
    def setUp(self):
        self.person = factories.PersonFactory()

    def test_has_age_in_dog_years(self):
        self.assertEqual(self.person.dog_years, self.person.age / 7)
```

### Functional / Integration Tests

Functional tests are higher level tests that are closer to how an end-user would interact with your application. They
are typically used for web and GUI applications.

- Write tests as scenarios. Testcase and test method names should read like a scenario description.
- Use comments to write out stories, before writing the test code.

```python
import unittest

class TestAUser(unittest.TestCase):

    def test_can_write_a_blog_post(self):
        # Goes to the her dashboard
        ...
        # Clicks "New Post"
        ...
        # Fills out the post form
        ...
        # Clicks "Submit"
        ...
        # Can see the new post
        ...
```

Notice how the testcase and test method read together like "Test A User can write a blog post".

---

## Exception Handling

For more complete documentation,
see: [https://docs.python.org/3/tutorial/errors.html](https://docs.python.org/3/tutorial/errors.html)

### When to Handle an Exception – Common Examples

The basic rule of thumb is to handle exceptions that you expect to arise during normal runtime but allow the program to
continue on a useful path.
Different parts of a codebase may have different context about how to handle errors. Low level functions should rarely
do much error handling to keep them as general as possible. Higher level abstractions are more likely to live in a
context where assumptions can be made about which exceptions are recoverable.

#### Parsing Messy Data

Without guarantees about the integrity of a data, we frequently need a program to get as much out of the data as
possible. In this example, we log exceptions raised by parse_csv_record and simply continue on.

**Resilient Data Processing:**

```python
with open(file) as csv:
    new_line = csv.readline()
    parsed_data = []
    n_parsing_failures = 0
    n_parsed_records = 0
    while new_line:
        try:
            parsed_data.append(parse_csv_record(line))
            n_parsed_records += 1
        except CsvRecordParsingError as csv_err:
            n_parsing_failures += 1
            logger.exception(csv_err)  # Logs exception, including stack trace
        new_line = csv.readline()
if n_parsing_failures:
    logger.info(f"{n_parsing_failures} parsing failures encountered in {file}. {n_parsed_records} successfully parsed.")
```

#### Connection Retries

No Web API is 100% reliable. Connection errors do occur. Whenever you are making a request to a Web API, it's always a
good idea to check for timeout errors and other common connection problems (4XX responses) and retry the request. For
example

**Connection Retry:**

```python
def resilient_push_data(payload: dict, n_tries: int = 3):
    while n_tries:
        n_tries -= 1
        try:
            response = push_data_to_server(data={"payload": True})
            return response
        except TimeoutError as timeout_err:
            logger.error(f"Failed to push data. Connection timed out. {n_tries} remaining.")
        except ConnectionError is conn_err:
            logger.error("Failed to push data. Server refused the request.")
    raise DataPushError(f"Failed to push data.")  # Handle this higher up in the stack, if necessary
```

#### Multiprocessing

When spawning process pools for handling parallel workloads, you typically want to know what happens in those
processes, including exceptions. If an exception is raised in a subprocess, you lose control over that process and
cannot notify the parent process what occurred.

```python
from multiprocessing import Pipe, Process

def child_process_function(data, pipe):
    try:
        result = process_data(data)
    except Exception as unexpected_error:
        result = unexpected_error
        process_logger.exception(unexpected_error)
    pipe.send({"pid": os.getpid(), "result": result})

def process_data(data):
    processes = []
    receiver_pipes = []
    for datum in data:
        receiver, sender = Pipe()
        process = Process(target=child_process_function, args=(datum, sender))
        processes.append(process)
        receiver_pipes.append(receiver)
        process.start()
    for p in processes:
        p.join()
    for receiver in receiver_pipes:
        try:
            msg = receiver.recv()
            logger.info(msg)
        except Exception as comms_error:  # This should really never happen unless your child process blocks forever
        or dies before it can communicate for some reason.
            print("Failed to communicate with a child process. You should probably check the logs for that process.")
```

### Custom Exceptions

[https://docs.python.org/3/tutorial/errors.html#user-defined-exceptions](https://docs.python.org/3/tutorial/errors.html#user-defined-exceptions)

In general, ValueError should be the default built in exception. Most other built-in exception types have specific
meanings within the Python standard library. If you wish to impart more specific information, define custom exceptions
as follows:

```python
import Exception

class CsvRecordParsingException(Exception):
    """Exception raised when a single CSV record fails to parse"""
    pass
```

### Exception Handling as Control Flow

In most cases, using exception handling as control flow is an anti-pattern and can be rewritten more clearly by
checking assumptions before your `try`  clause. However, there are cases where it's useful such as when an external
state
cannot be determined without calling functions that raise exceptions.

```python
x = MaybeContainsData()

def gross_costly_control_flow(x):
    try:
        parse_data(x)  # Costly
    except NoDataError as no_data:
        x.add_data()  # Costly
        gross_costly_control_flow(x)
    except DataMalformed as bad_data:
        x.fix_data()  # Costly
        gross_costly_control_flow(x)
```

There are a few code smells above, but it's a contrived example. If your code looks like this, seriously consider
refactoring it:

```python
x = MaybeContains_data()

if not x.contains_data():  # Cheap
    x.add_data()
if not x.data_valid():  # Cheap
    x.fix_data()

parse_data(x)  # Any exceptions raised now should be unexpected since we dealt with the cases we understand
```

### Chaining Exceptions

Sometimes it's useful to raise one exception from another in order to add context. In this case, the stack trace will
indicate that the `TotalFailureError` was raised as a result of the `UnderstoodButNotRecoverableError`

```python
try:
    do_a_thing()
except UnderstoodButNotRecoverableError as really_bad_error:
    raise TotalFailureError("A bad thing happened. Unfortunately, we can't do anything useful except tell you about
    it.") from really_bad_error
```

### The Finally Clause

At the end of all your exception handling, regardless of whether things were handled, you can still do stuff. This
should only be used for cleanup and should not be used for continuing a lengthy program execution.

```python
try:
    do_a_thing()
except TimeoutError as timeout_error:
    handle_timeout_condition()
finally:
    clean_up_db_connections()
    flush_logs_to_log_server()
```

---

## Useful Links

[PEP 8 (Python's Official Style Guide)](https://peps.python.org/pep-0008/)

---

## Acronyms

- **PEP** = Python Enhancement Proposals

---

## Credit

Content taken from a Confluence guide written by Gavin Medley