Optional properties in JSON

I had to work with JSON data where many properties were optional. In other words, a dictionary at any level could have missing keys, and the corresponding value should be treated as missing. For example, for input data {"k1": {"k2": {"k3": 1}}}, a nested look-up of "k1", "m2", "n3" should result in a missing value.

The built-in dictionary get method would work if keys were only missing at the lowest level:
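A minimal sketch of that limitation, using the sample input from above (the variable names are only for illustration):

```python
json_obj = {"k1": {"k2": {"k3": 1}}}

# Key missing at the lowest level: get quietly returns None.
leaf_missing = json_obj["k1"]["k2"].get("n3")

# Key missing one level higher: the plain index lookup fails first.
error = None
try:
    json_obj["k1"]["m2"].get("n3")
except KeyError as exc:
    error = exc
```
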

We couldn’t use an if statement or try/except, since most of the lookups occur inside expressions. The straightforward approach of checking for the presence of the key at each level is quite verbose:
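As an illustration, one way to write such a check as a single expression might look like this (a sketch, not necessarily the exact form used in the original code):

```python
json_obj = {"k1": {"k2": {"k3": 1}}}

# Every level is tested explicitly before the actual lookup happens.
value = (
    json_obj["k1"]["m2"]["n3"]
    if "k1" in json_obj
    and "m2" in json_obj["k1"]
    and "n3" in json_obj["k1"]["m2"]
    else None
)
```
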

Since values are known to be dictionaries, and empty dictionaries won’t have the key we’re looking up, we could shorten it by relying on the short-circuiting of the logical and operator:
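A sketch of the short-circuit version, assuming every intermediate value is either a dictionary or absent:

```python
json_obj = {"k1": {"k2": {"k3": 1}}}

# `and` stops at the first falsy operand (None or an empty dict),
# so the later index lookups only run when the earlier keys exist.
missing = (
    json_obj.get("k1")
    and json_obj["k1"].get("m2")
    and json_obj["k1"]["m2"].get("n3")
)

found = (
    json_obj.get("k1")
    and json_obj["k1"].get("k2")
    and json_obj["k1"]["k2"].get("k3")
)
```

Note that a missing value can come back as either None or an empty dictionary, depending on where the chain stopped, so callers should test for falsiness of intermediate results rather than compare with None.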

This is fine if it only needs to be written in a couple of places. But if it’s a common theme in the data, this code is still unattractive and error-prone, as it’s repeated over and over.

Replacing the dictionaries with defaultdict is no good: it would not work if nesting depth is not fixed and would add expensive noise to the object in the form of empty dictionaries with every lookup attempt.
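To illustrate the noise, here is a small sketch using a recursive defaultdict (the tree helper is a hypothetical name, not part of the original code):

```python
from collections import defaultdict


def tree():
    """Return a defaultdict whose missing keys create nested defaultdicts."""
    return defaultdict(tree)


data = tree()
data["k1"]["k2"]["k3"] = 1

keys_before = sorted(data["k1"])
value = data["k1"]["m2"]["n3"]  # a mere lookup...
keys_after = sorted(data["k1"])  # ...has inserted "m2" into the data
```

The lookup also returns an empty defaultdict rather than a recognizable missing value, so the caller cannot tell it apart from a genuinely empty dictionary.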

We could flatten the JSON structure:
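One possible way to flatten, sketched under the assumption that leaves are keyed by the tuple of keys and list indexes leading to them (the flatten helper and the tuple-key convention are illustrative choices):

```python
def flatten(obj, prefix=()):
    """Map each leaf value to the tuple of keys/indexes that reaches it."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        flat.update(flatten(value, prefix + (key,)))
    return flat


flat = flatten({"k1": {"k2": {"k3": 1}}, "items": [{"id": 7}]})

found = flat.get(("k1", "k2", "k3"))
missing = flat.get(("k1", "m2", "n3"))
list_leaf = flat.get(("items", 0, "id"))
```

With this layout a single get call handles a missing key at any depth. Empty dictionaries and lists simply disappear from the flattened result.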

However, this means we can no longer pass around intermediate dictionaries and lists. If we only need the leaves and don’t mind the overhead of flattening each JSON object, it’s an acceptable solution.

A simpler solution that doesn’t torture the JSON structure and adds no overhead is to follow the example of unittest.mock.MagicMock, which silently accepts every request without actually doing any real work:

For our immediate use case, we only need get and __getitem__ (for list index lookup), but I also added a __getattr__ method for use below with member access. Also, we really only need a single instance of this class, but it’s safer to define __eq__: this way, we don’t have to worry about accidentally creating and comparing multiple instances.

If we want even more syntactic sugar, so we can write json_obj["k1"]["m2"]["n3"], we could convert all the dictionaries inside the JSON object to instances of a custom dict subclass. At this point, we’re really creating a mini-DSL, so we might as well allow attribute lookup in dictionaries: json_obj.k1.m2.n3:
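A sketch of how this could be done. The names OptionalDict and convert are hypothetical, and the block repeats a compact version of the NA placeholder so that it runs on its own.

```python
class _NAType:
    """Compact missing-value placeholder (assumed shape)."""

    def get(self, key, default=None):
        return self

    def __getitem__(self, key):
        return self

    def __getattr__(self, name):
        if name.startswith("__"):
            raise AttributeError(name)
        return self

    def __eq__(self, other):
        return isinstance(other, _NAType)

    def __hash__(self):
        return hash(_NAType)


NA = _NAType()


class OptionalDict(dict):
    """Hypothetical dict subclass: missing keys yield NA, not KeyError."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent.
        return NA

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails, so real dict
        # methods such as items() or get() keep working.
        if name.startswith("__"):
            raise AttributeError(name)
        return self[name]


def convert(obj):
    """Recursively replace plain dicts with OptionalDict instances."""
    if isinstance(obj, dict):
        return OptionalDict((key, convert(value)) for key, value in obj.items())
    if isinstance(obj, list):
        return [convert(item) for item in obj]
    return obj


json_obj = convert({"k1": {"k2": {"k3": 1}}})
found = json_obj["k1"]["k2"]["k3"]
missing_by_key = json_obj["k1"]["m2"]["n3"]
missing_by_attr = json_obj.k1.m2.n3
```

Two caveats with this sketch: keys that clash with dict method names (such as "items") are only reachable with the bracket syntax, and the inherited get method still returns None for a missing key because it does not go through __missing__.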

This is a lot less intrusive than flattening the JSON structure, but it still adds a modest runtime overhead: both at the initial conversion and on subsequent lookups (only if the key is missing).

It wouldn’t be hard to return NA on non-existent indexes in lists, but if that’s the correct semantics in our domain, we probably should be using dictionaries instead of lists in the first place.

Update: a couple of good solutions were suggested in response to this post on Reddit. One is:
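The sketch below assumes the suggestion was to chain get calls with an empty dictionary as the default, which is what the remark about {} and [] points to; the exact form of the Reddit suggestion may have differed.

```python
json_obj = {"k1": {"k2": {"k3": 1}}}

# Each intermediate get falls back to {}, so the next get is always valid.
missing = json_obj.get("k1", {}).get("m2", {}).get("n3")
found = json_obj.get("k1", {}).get("k2", {}).get("k3")
```
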

I think it works best if there are no lists; with lists, the switching between {} and [] might become somewhat easy to mess up.

The other is to define a function that can be used like this: json_get(json_obj, "k1", "k2", "n3"). It’s good if every property is optional; it won’t be able to express that some properties are required, such as in json_obj.get("k1", NA)["m2"].get("m3", NA).
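A minimal sketch of such a function, assuming a missing key or index at any step should produce a default value (the default parameter and the set of caught exceptions are my assumptions):

```python
def json_get(obj, *keys, default=None):
    """Follow keys/indexes into obj; return default if any step is missing."""
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            # KeyError: missing dict key; IndexError: list index out of range;
            # TypeError: wrong kind of key, or indexing a non-container.
            return default
    return obj


json_obj = {"k1": {"k2": {"k3": 1}}, "items": [{"id": 7}]}

found = json_get(json_obj, "k1", "k2", "k3")
missing = json_get(json_obj, "k1", "k2", "n3")
list_leaf = json_get(json_obj, "items", 0, "id")
out_of_range = json_get(json_obj, "items", 5, "id")
```
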