Contents of Part II
In this part, we will implement graph data structure using classes and interfaces, and discuss when it’s worth overruling type hints.
We need to make a few design choices.
First, should we require the user to provide unique and hashable node objects, or should we accept any values and wrap them into
class Node ourselves? Let’s do the safer thing and wrap values into Node instances: this protects us in case the values could become non-unique or non-hashable in the future.1
Second, should we store adjacency sets inside or outside
Node? I have a slight personal preference for nodes knowing their adjacency information, since it occasionally allows us to use one function argument (a node) rather than two (a node and a graph).2
Third, should we separate Graph and Node interfaces from their implementation? For example, we can have abstract base classes
IMutableGraph from which all concrete implementations would inherit. Apart from making the design cleaner, it should help generate more precise type hints (e.g., a traversal function may work on any object that implements
IGraph, since it doesn’t need to mutate anything). This is a good idea, but let’s do without interfaces for now; we’ll add them later.
Our first implementation looks like this:
A couple of minor type checking points:
- Using None as default parameter value automatically adds
Optionalto the stated argument type.
- We provided type hints for instance attributes in the class definition body of
Graph; this is a good practice, but it is also acceptable to annotate their types inside
__init__(or not at all if mypy can infer them)
Type System Limitations
Now, just like in Part I, let’s again add reverse adjacency information to
Graph in order to make iteration through incoming edges faster. The new class
ReversibleGraph should share a big part of its implementation with
Graph, so it would make sense to derive it from
Graph. We also want to reuse all of the global functions since they should work without change for both implementations.
However, if we follow this plan, our code will not type check. There are two reasons for that.
Liskov Substitution Principle
Python type hint system follows the so-called Liskov substitution principle, or LSP:
Y, it should *always* be safe to substitute an instance of
Xin place of an instance of
Therefore, when mypy sees that
ReversibleGraph inherits from
Graph, it analyzes the code to see whether it is indeed safe to use a
ReversibleGraph in place of
Graph. It turns out that it’s not safe, and so the type check fails.
To see why it’s not always safe, consider this function:
def f(g: Graph[int], node: Node[int]) -> None:
if node.value > 0:
If we try to call it with
g of type
node of type
Node, it would fail in runtime when the
ReversibleGraph.remove_node tries to access the non-existent
To summarize, mypy thinks inheritance is serious business, and that it represents a relationship with some strict guarantees (LSP). We, on the other hand, wanted to use inheritance just to help with code reuse. A feature is in the works that would let the programmer turn LSP on or off as desired; but for now, we have to find another solution.
We could satisfy mypy by using composition instead of inheritance (that is, by wrapping
Graph instance inside
ReversibleGraph instance). But let’s not do that because this will make the code less logical and more verbose. Instead, let’s simply mark the lines about which mypy complains with
type: ignore. Those directives are very precise: they suppress error messages for that line, but the type checker still parses those lines, and uses information it learned from them to type check the rest of the program.
Types Outside the Type System
Since we were careful enough to make
ReversibleGraph support the same API, a function like
read_graph should work without change for both. However, its type annotation is not obvious.
The first argument can be any class that supports the mutable graph API, that is
IGraphMutable. The return type should be based on the value of the first argument; for example, if we call
read_graph(ReversibleGraph, ...), the type checker should conclude that the return type is
ReversibleGraph. (We can’t set return type to
IGraph since that would prevent us from using the reverse adjacency included in
If we write:
def read_graph(cls: Type[IGraph], s: Iterable[str], node_type: Callable[[str], T]) -> ???:
we can’t specify the return type.
Perhaps we can use a type variable to achieve the desired result:
# G is a type variable that must be a subtype of GraphInterface
T = TypeVar('T')
G = TypeVar('G', bound=IGraph[T])
def read_graph(Type[G[T]], ...) -> G[T]: ...
This would have worked if
IGraph wasn’t generic; unfortunately, type variables cannot be bound by a generic type5. We also cannot provide type arguments to type variables. The above code would not even pass mypy’s syntax check.
The problem is simple: the types we’ve been trying to define do not exist in mypy’s type system. Such types require more sophisticated language implementation and more learning effort to use, and they are usually found only in the more advanced functional languages (such as Haskell and Scala).
In my opinion, the best solution here is to make class
Node non-generic by changing the type of its
value attribute to
Any6. This means we no longer need a generic class for
Node, and so we can use type variable to provide type hints for graph functions. Now is the time to add the interface classes as we discussed earlier; it’s easier to do so without dealing with generics.
Our final implementation even reuses the test functions:
Note that we didn’t bother trying to prevent assignment to attributes in the immutable interfaces; for example, even though
IGraph is an (immutable)
AbstractSet, users can assign a new container to
@property could have insured against such accidental assignment, it’s probably not worth the runtime cost.7
Our API for a directed graph is so limited that it fits an undirected graph as is; we can always separate out the interfaces later, when the need arises.
Conceptually, an undirected graph is equivalent to a directed graph that satisfies two constraints:
- each edge has a corresponding edge in the opposite direction
- there are no loops (i.e., edges with the same tail and head)
Therefore, one way to implement an undirected is to reuse our (directed)
Graph implementation. We just need to modify the
remove_edge methods to ensure those constraints are never violated:
It might seem wasteful to store each edge twice. However, we can’t really improve on that. Let’s say we define order on nodes, and store each edge only once, in the adjacency set of the lower-valued node. In this case, iterating through neighbors would be ridiculously slow since neighbors may end up stored on any node.
Class-based representation keeps the API clean and stable.
However, as we make our approach more general, type hints become more complex, and at some point we run into the limitations of python type system.
By its nature, static type checking provides insurance against certain bugs at the cost of constraints on the developer. Very powerful type systems (such as in Scala, Haskell, Idris) are extremely flexible; but they are also slightly harder to learn, debug, and implement. Simpler type systems (e.g., in Java, C#, C++, and python) are less flexible, and will occasionally get in the way of the developers.
I don’t recommend non-trivial refactoring just to satisfy the type checker, unless it also improves code quality. Often, it’s better to just overrule mypy using:
Anywhich is compatible with anything
# type: ignorecomment which suppresses type checker errors in the current line
- casts, which overrides the type that mypy inferred for a variable
Part I – Representing graphs as dictionaries or sets
Part II – Representing graphs as classes
Part III – Implementing graph traversal
Part IV – Trees and binary trees
- Non-hashable will be easily caught by runtime. Non-unique would result in a hard-to-detect bug: the graph will treat multiple nodes with the same value as if they are the same, which means the bug might be triggered only by certain inputs.
- If desired, we could even store a reference to the graph inside each node, to make nodes completely self-sufficient.
- Another way to phrase it is that inheritance implies subtyping.
- You might say that you would never call this function with such mismatched arguments, but it doesn’t matter: LSP requires that this substitution is *always* safe, not “safe as long as you’re careful”.
- If we left out
TypeVar, it would have meant
GraphInterface[Any], thus giving up some of the type safety.
- We give up type safety in the least complicated part of the graph implementation.
- At present, there’s no way to make mypy enforce constant vs non-constant distinction in the nested structure.