At Spring, we maintain a large Python monorepo with complete Mypy coverage configured under Mypy’s strictest available settings. In short, that means every function signature is annotated and implicit Any conversions are disallowed.
Lines-of-code is admittedly a poor measure, but as a rough approximation: our monorepo contains over 300,000 lines of Python, with about half of that constituting our core data platform, and the other half, end-user code written by data scientists and machine learning researchers. My unsubstantiated guess is that this is one of the most comprehensively-typed Python codebases out there for its size.
We first adopted Mypy in July 2019, achieved complete type coverage about a year later, and have been happy Mypy users ever since.
A few weeks back, I had a brief conversation with Leo Boytsov and Erik Bernhardsson on Twitter about Python typing — then I saw Will McGugan praising types too. Since Mypy is a key part of how we ship and iterate on Python code at Spring, I wanted to write about our experience using it at-scale over the past few years.
TL;DR: while Mypy adoption comes at a cost (via upfront and ongoing investment, a learning curve, etc.), I’ve found it invaluable for maintaining a large Python codebase. Mypy may not be for everyone, but it is for me.
What is Mypy?
(If you’re already familiar with Mypy, skip ahead.)
Mypy is a static type checker for Python. If you’ve written any Python 3, you may have noticed that Python supports type annotations, like this:
def greeting(name: str) -> str:
    return 'Hello ' + name
Python standardized this type annotation syntax via PEP 484, which was proposed in 2014 and accepted in 2015. While these annotations are part of the language, Python (and its associated first-party tooling) doesn’t actually use them to enforce type safety.
Instead, type checking is implemented via third-party tools. Mypy is one such tool. Facebook’s Pyre is another — though in my experience, Mypy is more popular (Mypy has more than twice as many stars on GitHub, it’s the default type-checker used in Pants, etc.). IntelliJ has their own type checker, which powers the type inference you see in PyCharm. In each case, these tools describe themselves as “PEP 484-compliant”, in that they work with the type annotations defined by the language itself.
In other words: Python saw its responsibility as defining the syntax and semantics of its type annotations (though PEP 484 itself was heavily inspired by existing versions of Mypy), but intentionally left it to third-party tools to enforce those semantics.
Note that when you use a tool like Mypy, you’re running it out-of-band from Python itself; i.e., you run mypy path/to/file.py, and Mypy spits out any inferred violations. (Python exposes but does not leverage those type annotations at runtime.)
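To make that last point concrete, here is a minimal sketch (the function name double is made up for illustration) showing that the interpreter stores annotations but never checks them:

```python
def double(value: int) -> int:
    """Annotated to take and return an int."""
    return value * 2


# The annotations are available for inspection at runtime...
annotations = double.__annotations__

# ...but nothing enforces them: this call runs and repeats the string.
# A type checker such as Mypy would flag the argument as incompatible.
result = double("ab")
```

Only an out-of-band run of a type checker would complain about the second call; the interpreter itself is happy to execute it.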
(As an aside: in writing this post, I learned that Mypy started out with very different goals, comparable to projects like PyPy. PEP 484 didn’t exist (it was inspired by Mypy!), so Mypy defined its own syntax that diverged from Python and implemented its own runtime (i.e., Mypy code was executed through Mypy). One of Mypy’s goals at the time was to improve performance by leveraging static types, encouraging immutability, and more — and they explicitly eschewed compatibility with CPython. Mypy switched to a Python-compatible syntax in 2013, and PEP 484 came out in 2015. (I believe the “use static typing to speed up Python” concept became Mypyc, which is still an active project, and is used to compile Mypy itself.))
Integrating Mypy at Spring
We introduced Mypy to our codebase in July 2019 (#1724). When the proposal first dropped, we found ourselves talking through two main sources of hesitation internally:
- Although Mypy was first introduced at PyCon Finland in 2012, and they began releasing PEP 484-compatible versions in early 2015, it still felt like a fairly new tool — at least to us. No one on the team had used it before, despite having worked in some fairly large Python codebases (at Khan Academy and elsewhere). I’d worked in one codebase that showed signs of having enabled Mypy at some point in the past, but had since given up on it.
- Like other incremental type checkers (e.g., Flow), the value of Mypy compounds over time, as more and more of your codebase is annotated. While Mypy can and will catch bugs with even minimal annotations, the more you’re able to invest in annotating your codebase, the more valuable it becomes.
Despite these reservations, we decided to give Mypy a shot. Internally, our engineering culture tended towards a strong preference for static typing (apart from Python, we write a lot of Rust and TypeScript). So, we were primed to want to use Mypy. Beyond that, I was personally willing to champion the tool and invest in extending our code coverage.
We started by typing a handful of files. A year later, we finished typing our existing code (#2622), and were able to upgrade to the strictest Mypy settings available (most importantly, disallow_untyped_defs, which requires all function signatures to be annotated), which we’ve maintained ever since. (The team at Wolt has a nice post on what they call the “Professional-grade mypy configuration”, which is, coincidentally, what we use.)
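As a rough illustration of what disallow_untyped_defs means in practice, consider the sketch below. Both function names are hypothetical, and the exact wording of the Mypy message may vary between versions:

```python
# Flagged under disallow_untyped_defs: Mypy reports something like
# "Function is missing a type annotation" for this definition.
def scale_untyped(value, factor):
    return value * factor


# Accepted: every parameter and the return value are annotated.
def scale(value: float, factor: float) -> float:
    return value * factor
```

Both functions behave identically at runtime; the setting only changes what the type checker is willing to accept.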
Reflections
Overall: I have a very positive opinion of Mypy. As a developer of core infrastructure (i.e., the common libraries used across services and across the team), I find it invaluable.
I’ll use it again for any and all future Python projects.
Benefits
Zulip wrote a nice blog post back in 2016 on the benefits of using Mypy (that post is also linked from the Mypy docs).
I don’t want to rehash all the benefits of static typing (it is good), but I will briefly double down on a few that they called out in their post:
1. Improved readability: with type annotations, code tends towards self-documenting (and the accuracy of that documentation can be statically enforced, unlike in a docstring).
2. Catching bugs: it’s true! Mypy does catch bugs. All the time.
3. Refactoring with confidence: this has been the single most impactful benefit of using Mypy. With extensive Mypy coverage, I can confidently ship changes that span hundreds or even thousands of files. This is, of course, related to (2): most of the bugs we catch with Mypy are bugs that we iron out while refactoring.
The value of (3) is hard to overstate. It’s no exaggeration to say that there are classes of changes that I’ve shipped ten, or even a hundred times faster with the help of Mypy.
Though it’s entirely subjective, in writing this post I realized that I trust Mypy. Not to the same degree as, say, the OCaml compiler, but still — it’s completely changed my relationship to maintaining Python code, and I can’t imagine going back to a world without annotations.
Pain points
The Zulip post similarly highlights the pain points that they experienced in their Mypy migration (interactions with linters, import cycles).
Candidly, the pain points that I’ve felt with Mypy differed from those outlined in the Zulip article. I’d bucket them into three categories:
- Lack of type annotations for external libraries
- The Mypy learning curve
- Fighting the type system
Let’s review them one-at-a-time:
1. Lack of type annotations for external libraries
The first and most important pain point is that most of the third-party Python libraries we pull in are either untyped or otherwise PEP 561-non-compliant. In practice, that means any reference to an external library gets interpreted as Any, which weakens your type coverage significantly.
Whenever we add a third-party library to our environment, we include an entry in our mypy.ini, which tells Mypy to ignore the absence of type annotations for those modules (the rare exception being libraries that are typed or provide type stubs):
[mypy-altair.*]
ignore_missing_imports = True
[mypy-apache_beam.*]
ignore_missing_imports = True
[mypy-bokeh.*]
ignore_missing_imports = True
...
Because of this escape hatch, even coarse object annotations don’t work as you might expect. For example, Mypy allows this:
import pandas as pd


def return_data_frame() -> pd.DataFrame:
    """Mypy interprets pd.DataFrame as Any, so returning a str is fine!"""
    return "Hello, world!"
Beyond third-party libraries, we’ve also had some bad luck with the Python standard library itself. For example, functools.lru_cache does have a type annotation in typeshed, but for complicated reasons, it doesn’t preserve the underlying function signature, so any function decorated with @functools.lru_cache sheds the type annotations on its parameters.
For example, Mypy allows this:
import functools


@functools.lru_cache
def add_one(x: float) -> float:
    return x + 1


add_one("Hello, world!")
The situation with third-party libraries is improving. NumPy, for example, started exposing types in version 1.20. Pandas, too, has a set of public typing stubs, but they’re marked as incomplete. (Adding stubs to those libraries is very much non-trivial; it’s a huge effort!) As another data point, I saw Wolt’s Python project template on Twitter recently, and that too includes types-by-default.
So, types are becoming less rare. But in my experience, I’m surprised when we add a dependency that does have type annotations. It still feels like the exception, not the rule.
2. The Mypy learning curve
This is entirely anecdotal, but most folks that join Spring (and have written Python in the past) haven’t used Mypy before, though they’re typically aware of and familiar with Python’s type annotation syntax.
Similarly, in candidate interviews, folks are often unfamiliar with the typing module beyond the annotation syntax itself. I typically show candidates a snippet that uses typing.Protocol as part of a broader technical discussion, and I can’t recall any candidates having seen that specific construct before (which, of course, is totally fine!). But it says something about the popularity of typing in the ecosystem right now.
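For readers who haven’t come across it, typing.Protocol enables structural ("duck") typing: any class with matching methods satisfies the protocol, with no inheritance required. The following is a minimal sketch, not the snippet from our interviews; SupportsClose, Resource, and shutdown are names invented for illustration:

```python
from typing import Protocol


class SupportsClose(Protocol):
    """Anything with a close() method that returns None."""

    def close(self) -> None:
        ...


class Resource:
    """Does not inherit from SupportsClose, but matches its shape."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def shutdown(item: SupportsClose) -> None:
    # Mypy accepts any argument whose methods match the protocol.
    item.close()


resource = Resource()
shutdown(resource)
```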
So, when we onboard team members, Mypy tends to be a new tool, something they have to learn. While the basics of the type annotation syntax are straightforward, we do regularly hear questions like, “Why doesn’t Mypy like this?”, “Why is Mypy failing here?”, etc.
For example, this is something that typically has to be explained:
from typing import Optional

if condition:
    value: str = "Hello, world"
else:
    # Not ok -- we declared `value` as `str`, and this is `None`!
    value = None

...

if condition:
    value: str = "Hello, world"
else:
    # Not ok -- we already declared the type of `value`.
    value: Optional[str] = None

...

# This is ok!
if condition:
    value: Optional[str] = "Hello, world"
else:
    value = None
Or, another source of confusion:
from typing import Literal


def my_func(value: Literal['a', 'b']) -> None:
    ...


for value in ('a', 'b'):
    # Not ok -- `value` is `str`, not `Literal['a', 'b']`.
    my_func(value)
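One possible workaround for the Literal loop, sketched here under the assumption that you control the declaration of the iterable, is to annotate the tuple explicitly so that Mypy should keep the literal element types. The Choice alias and the seen list are hypothetical additions for illustration:

```python
from typing import List, Literal, Tuple

Choice = Literal["a", "b"]

seen: List[str] = []


def my_func(value: Choice) -> None:
    seen.append(value)


# Declaring the element type up front stops the literals from being
# widened to plain `str` when the tuple is iterated.
choices: Tuple[Choice, ...] = ("a", "b")

for value in choices:
    my_func(value)
```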
When explained, the “why” of these examples makes sense, but I can’t deny that people on the team lose time getting familiar with Mypy. And anecdotally, we’ve heard from folks on the team that PyCharm’s type-assistance (even with ample static typing) just feels less helpful and complete than (e.g.) what you get with TypeScript from the same IDE. Unfortunately, this is just the cost of doing business with Mypy.
Beyond the learning curve, there’s also the ongoing overhead of annotating functions and variables. In the past, I’ve suggested relaxing our Mypy rules for certain “kinds” of code (like exploratory data analysis) — and yet, the sentiment on the team is that the annotations are worthwhile, which has been cool to see.
3. Fighting the type system
There are a couple things I try to avoid when writing code for Mypy, lest I find myself fighting the type system: taking code that I know works, and coercing Mypy to accept it.
First, @overload, from the typing module: very powerful, but hard to get right. Of course, if I need to overload a method, I’ll use it, but, like I said, I prefer to avoid it if I can. The basics are straightforward:
from typing import Optional, overload


@overload
def clean(s: str) -> str:
    ...


@overload
def clean(s: None) -> None:
    ...


def clean(s: Optional[str]) -> Optional[str]:
    if s:
        return s.strip().replace("\u00a0", " ")
    else:
        return None
But often, we want to do things like “return a different type based on a boolean flag, with a default”, which requires this incantation:
from typing import Any, Iterable, Literal, Mapping, Optional, overload


@overload
def lookup(
    paths: Iterable[str], *, strict: Literal[False]
) -> Mapping[str, Optional[str]]:
    ...


@overload
def lookup(
    paths: Iterable[str], *, strict: Literal[True]
) -> Mapping[str, str]:
    ...


@overload
def lookup(
    paths: Iterable[str]
) -> Mapping[str, Optional[str]]:
    ...


def lookup(
    paths: Iterable[str], *, strict: Literal[True, False] = False
) -> Any:
    pass
Even this is a hack: you can’t pass a bool into lookup; you have to pass a literal True or False.
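A commonly suggested way to soften that limitation is to add one more overload that accepts a plain bool and returns the wider type. The sketch below is an assumption about how that could look, not the code we run; the _STORE dictionary and the body of lookup are invented so that the example executes:

```python
from typing import Dict, Iterable, Literal, Mapping, Optional, overload

# Hypothetical backing data, only here to make the sketch runnable.
_STORE: Dict[str, str] = {"a": "1"}


@overload
def lookup(
    paths: Iterable[str], *, strict: Literal[True]
) -> Mapping[str, str]:
    ...


@overload
def lookup(
    paths: Iterable[str], *, strict: Literal[False] = ...
) -> Mapping[str, Optional[str]]:
    ...


@overload
def lookup(
    paths: Iterable[str], *, strict: bool
) -> Mapping[str, Optional[str]]:
    # Fallback for callers that only have a runtime bool.
    ...


def lookup(
    paths: Iterable[str], *, strict: bool = False
) -> Mapping[str, Optional[str]]:
    found = {path: _STORE.get(path) for path in paths}
    if strict:
        missing = [path for path, value in found.items() if value is None]
        if missing:
            raise KeyError(missing)
    return found


flag = bool(1)  # a bool that is not a literal at the call site
loose = lookup(["a", "b"])
exact = lookup(["a"], strict=True)
dynamic = lookup(["a"], strict=flag)
```

The trade-off is that callers passing a non-literal bool get the less precise Optional return type, which seems like the honest answer when the flag is only known at runtime.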
Similarly, I’ve run into issues in the past with using @typing.overload vs. @overload, using @overload with class methods, and so on.
Second, TypedDict, again from the typing module: can be useful, but tends to produce awkward code. For example, you can’t destructure a TypedDict (it always has to be constructed from literal keys), so this doesn’t work:
from typing import TypedDict


class Point(TypedDict):
    x: float
    y: float


a: Point = {"x": 1, "y": 2}

# error: Expected TypedDict key to be string literal
b: Point = {**a, "y": 3}
In practice, it’s just very hard to do Pythonic things with TypedDict objects. I end up tending towards either dataclass or typing.NamedTuple objects instead.
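As a sketch of the dataclass alternative, the same "copy with one field changed" operation can be written with dataclasses.replace. This mirrors the Point example above and is only one way of doing it:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Point:
    x: float
    y: float


a = Point(x=1, y=2)

# Build a copy of `a` with `y` overridden; `a` itself is left untouched.
b = replace(a, y=3)
```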
Third, decorators. The Mypy docs have a standard suggestion for signature-preserving decorators and decorator factories. And while it’s quite advanced, it does work:
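from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


def decorator(func: F) -> F:
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return func(*args, **kwargs)

    # The cast tells Mypy that `wrapper` has the same signature as `func`.
    return cast(F, wrapper)


@decorator
def f(a: int) -> str:
    return str(a)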
However, I find that doing anything fancy with decorators (and, in particular, anything non-signature-preserving) leads to code that’s hard to type or rife with casts. Which might be a good thing! Mypy has certainly changed how I write Python: clever code is harder to type properly, and so I try to avoid writing clever code. (The other fear with decorators is the @functools.lru_cache problem I raised above: since a decorator ultimately defines an entirely new function, you’re opening the door to significant and surprising errors, if you happen to annotate your code incorrectly.)
I feel similarly about circular imports — the need to import types, to use as annotations, can create circular imports that you’d otherwise avoid (which the Zulip team highlighted as a pain point too). And while circular imports are a pain point with Mypy, they’re often indicative of a design flaw in the system or code itself, which Mypy forces me to reckon with.
In my experience, though, even experienced Mypy users will find themselves making a correction or two to otherwise-working code prior to passing type-checks.
(By the way: Python 3.10 contains a great improvement to the decorator situation via ParamSpec.)
Tips & tricks
Finally, a couple tricks I’ve found useful in working with Mypy.
reveal_type
Adding reveal_type anywhere will cause Mypy to display the inferred type of a variable when type-checking the file. This is very, very, very useful.
The most basic example would be:
# No need to import anything. Just call `reveal_type`.
# Your editor will flag it as an undefined reference -- just ignore that.
x = 1
reveal_type(x) # Revealed type is "builtins.int"
reveal_type is especially useful when you start dealing with generics, as it can help you understand how generics are being “filled in”, whether your types are getting narrowed, and so on.
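A small sketch of the generics case follows. The first helper is a made-up example, and the TYPE_CHECKING guard is just one way to keep the file runnable while the reveal_type calls are in place:

```python
from typing import TYPE_CHECKING, Sequence, TypeVar

T = TypeVar("T")


def first(items: Sequence[T]) -> T:
    """Return the first element, preserving its type for the checker."""
    return items[0]


number = first([1, 2, 3])
name = first(["a", "b"])

if TYPE_CHECKING:
    # This branch is skipped at runtime, so the undefined name is never
    # looked up; Mypy still evaluates it and reports the inferred types.
    reveal_type(number)  # Revealed type is "builtins.int"
    reveal_type(name)  # Revealed type is "builtins.str"
```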
Mypy as a library
Mypy can be used as a runtime library!
We have an internal workflow orchestration library that looks a bit like Flyte or Prefect. The details aren’t critical, but notably, it is fully typed — so we can statically enforce the type-safety of runnable tasks as they’re chained together.
Getting that typing right was very challenging. To ensure that it remains intact, and isn’t poisoned by a stray Any, we actually wrote unit tests that call Mypy on a set of files, and assert that the errors raised by Mypy match an expected set of violations:
import os

from mypy import api


# A method on a unittest.TestCase subclass.
def test_check_function(self) -> None:
    result = api.run(
        [
            os.path.join(
                os.path.dirname(__file__),
                "type_check_examples/function.py",
            ),
            "--no-incremental",
        ],
    )

    # api.run returns a (stdout, stderr, exit_status) tuple.
    actual = result[0].splitlines()
    expected = [
        # fmt: off
        'type_check_examples/function.py:14: error: Incompatible return value type (got "str", expected "int")',  # noqa: E501
        'type_check_examples/function.py:19: error: Missing positional argument "x" in call to "__call__" of "FunctionPipeline"',  # noqa: E501
        'type_check_examples/function.py:22: error: Argument "x" to "__call__" of "FunctionPipeline" has incompatible type "str"; expected "int"',  # noqa: E501
        'type_check_examples/function.py:25: note: Revealed type is "builtins.int"',  # noqa: E501
        'type_check_examples/function.py:28: note: Revealed type is "builtins.int"',  # noqa: E501
        'type_check_examples/function.py:34: error: Unexpected keyword argument "notify_on" for "options" of "Expression"',  # noqa: E501
        'pipeline.py:307: note: "options" of "Expression" defined here',  # noqa: E501
        "Found 4 errors in 1 file (checked 1 source file)",
        # fmt: on
    ]

    self.assertEqual(actual, expected)
GitHub Issues
When I find myself wondering how to resolve a certain typing issue, I inevitably find myself in Mypy’s GitHub Issues (more-so than, e.g., Stack Overflow). It’s probably the best source of typing knowledge for Mypy-specific workarounds and How-To’s. You’ll typically find tips and suggestions from the core team (including Guido) on notable issues.
The main downside is that each comment in a GitHub Issue represents commentary from a specific moment in time — what was a limitation in 2018 may’ve been resolved, what was a workaround last year may have a new best practice. So be sure to read through Issue history with that in mind.
typing-extensions
While the typing module tends to see a bunch of improvement with each Python release, some features are backported via the typing-extensions module.
For example, while we’re only on Python 3.8, we make use of ParamSpec for the aforementioned workflow orchestration library via typing-extensions. (Unfortunately, PyCharm doesn’t seem to support the ParamSpec syntax via typing-extensions and marks it as an error, but oh well.) Features that depend on syntactical changes to the language itself are, of course, not available via typing-extensions.
NewType
There are a bunch of useful helpers in the typing module, but NewType is one of my favorites.
NewType lets you create distinct types from existing types. As an example, you could use NewType to demarcate fully-qualified Google Cloud Storage URLs, in lieu of a plain str, like:
from typing import NewType

GCSUrl = NewType("GCSUrl", str)


def download_blob(url: GCSUrl) -> None:
    ...


# Incompatible type "str"; expected "GCSUrl"
download_blob("gs://my_bucket/foo/bar/baz.jpg")

# Ok!
download_blob(GCSUrl("gs://my_bucket/foo/bar/baz.jpg"))
By indicating the intent to callers of download_blob, we’ve made the function self-documenting.
I find NewType to be especially useful for converting primitives like str and int into semantically meaningful types.
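The same idea applies to int. In this sketch (UserId, OrderId, and describe_user are hypothetical names), two identifiers share a runtime representation but stay distinct to the type checker:

```python
from typing import NewType

UserId = NewType("UserId", int)
OrderId = NewType("OrderId", int)


def describe_user(user_id: UserId) -> str:
    return f"user-{user_id}"


user_id = UserId(42)
order_id = OrderId(42)

label = describe_user(user_id)

# describe_user(order_id) would run, but Mypy rejects it: an OrderId is
# not a UserId, even though both are plain ints underneath.
runtime_type = type(user_id)
```

At runtime NewType adds essentially nothing: calling UserId(42) just returns the int, so the distinction exists only for the type checker.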
Performance
Mypy’s performance has never been a major problem for us. Mypy persists type-checking results to a cache, which speeds up repeated invocations (per the docs: “Mypy performs type checking incrementally, reusing results from previous runs to speed up successive runs”).
Running mypy on our largest service takes ~50-60 seconds with a cold cache, and ~1-2 seconds with a warm cache.
There are at least two ways to speed up Mypy, both of which leverage this concept (we don’t use either of them internally):
- The Mypy daemon, which runs Mypy persistently in the background, allowing it to retain cache state in memory. Although Mypy caches results to disk between runs, the daemon is indeed much faster. (We used the Mypy daemon for a while by default, but I disabled it after we experienced some flakiness with the shared state — I can’t remember the specifics.)
- Shared remote caches. As mentioned above, Mypy caches type-checking results to disk between runs, but if you’re running Mypy from a fresh machine or a fresh container (as on CI), you won’t see any benefit from that caching. The solution is to seed your disk with a recent cache state (i.e., warm the cache). The Mypy docs outline this process, but it’s fairly involved, and the specifics will depend on your own setup. We’ll probably enable this eventually for our own CI systems; we just haven’t gotten around to it.
Conclusion
Mypy has had a significant impact on our ability to ship code with confidence. Though I’ve tried to be honest in assessing the costs associated with adopting it, I’d do it all over again.
Beyond the value of the tool itself, I find Mypy to be an extremely impressive project, and I’m grateful to the maintainers for all the work they’ve put into it over the years. With each subsequent Mypy and Python release, we’re seeing noticeable improvements to the typing module, the annotation syntax, and Mypy itself. (Examples include: the new union types syntax (X | Y), ParamSpec, and TypeAlias, all of which were included in Python 3.10.)
If you have any questions, reactions, etc., find me on Twitter.
Published on August 21, 2022.