By Mateusz Ciszczoń and Konrad Hałas
The dacite Python library, which was originally created by FLYR staff engineer Konrad Hałas, is widely used within our app platform to convert dictionary data into data classes. Read on to learn dacite’s purpose and how we recently managed to significantly improve its performance and memory usage.
Passing plain dictionaries between your functions or methods as data containers isn’t the best practice. Of course, you can always create a custom class instead, but that is overkill if you only want to group a few fields within a single object.
Fortunately, Python has a useful solution to this problem: data classes. With the @dataclass decorator, you can easily create a new custom type with a list of given fields in a declarative manner. Data classes support type hints by design.
However, even when using data classes, you have to create their instances somehow. In many such cases, your input is a dictionary – it can be a payload from an HTTP request or raw data from a database. If you want to convert those dictionaries into data classes, dacite is your best friend.
The dacite library was originally created to simplify the creation of type-hinted data transfer objects (DTOs), which can cross the boundaries in the application architecture.
It’s important to mention that dacite is not a data validation library. There are dozens of awesome data validation projects, and it doesn’t make sense to duplicate this functionality within dacite. However, it is easy to combine dacite with one of the data validation libraries due to dacite’s simple interface.
The project was created back in 2018 by one of our staff engineers, Konrad Hałas, as an extraction of some utility functions from a few of his other projects. After some polishing, the library was released in its first version. In the past few years, it has managed to gain some attention, with 1.3k stars and 1.7k usages, according to statistics provided by GitHub on the repository’s page.
Some of the most interesting usages of dacite include:
- Apache’s Submarine,
- OpenAI’s GPT Discord Bot,
- Facebook’s Cicero research project,
- Facebook’s ThreatExchange tools,
- pypa’s sample project repository,
- Riffusion (a Stable Diffusion for music generation),
- and many more interesting use cases.
While the library was only lightly maintained in recent months, we have now dedicated some time to working on it. We started by cleaning up the Issues tab, fixing all of the issues marked as bugs, and introducing some enhancements, most notably support for Python versions 3.10 and 3.11. We have also worked heavily on the performance of the library.
Using dacite in FLYR Marketing Technology
We love data classes and use them a lot in the FLYR martech codebase. We also process tons of JSON data. Our system deals with multiple third-party APIs, which are mostly JSON-based. We also use MongoDB as a database for denormalized results of our computations, and MongoDB uses JSON (or to be more precise, BSON) under the hood. It’s also not that uncommon for us to store JSON data within MySQL JSON fields.
As you can see, we deal with “raw” data almost everywhere in our system, and we use dacite to transform all these Python dictionaries into data classes.
To make it a little bit easier for us, we implemented a Serializable class, which we use as a base class for almost all of our data classes. It uses dacite with a specific configuration. For example, it knows how to convert the ISO date format used by MongoDB to a Python datetime object.
As previously mentioned, we use dacite rather extensively throughout our app – thus any performance issues that the dacite library might have, even small ones, will be multiplied and will negatively impact the performance of the marketing technology platform as a whole. This is why we have set a goal to significantly improve dacite’s performance. But in order to improve something, one has to be able to benchmark it, and ideally also compare the results between changes applied to the code.
Before we made any changes, we set out to find the tool to measure and compare performance. Since our test suite is written using Pytest, Pytest-benchmark seemed like a natural choice. It is very easy to integrate because it offers a simple interface in the form of a benchmark() function. It also generates a handy table in the terminal every time you run your tests, so you can easily see whether that little change you’ve just made affected performance.
Pytest-benchmark also exports performance data into JSON files for future reference. These files can be compared later, using the pytest-benchmark compare command. Finally, these JSON reports can also be used to generate a simple webpage with charts presenting the performance changes of your library from commit to commit. We will discuss the details later on, but all of these perks solidified our choice of benchmarking library.
To accurately measure the performance, we have prepared a completely separate suite of integration tests. We did not measure the performance of some internal functions and methods, instead deciding to test the from_dict method, which is the main entry point of the library. We have prepared data classes and fixtures that make use of all of the various features and options of dacite.
Here is an example of a benchmarking test we created:
Now, when we run pytest -k "test_basic_scenario" our test will run, and we’ll also see this in our terminal:
Here we can see instant feedback regarding how much time our tested function took to finish running. Pytest-benchmark accounts for the potential instability of the testing environment, which is why it doesn’t run the tested function just once; it runs it multiple times and computes statistics over those runs.
In our example, the tested function has been called 2,713 times, and the performance of each of these runs has been calculated and taken into account. That’s why you can see the Min and Max time of the execution, as well as the Mean and Median composites, which are usually more useful. The OPS column is also useful when comparing performance between runs.
Now that we had our way of benchmarking the performance of dacite (and a few benchmarking scenarios written), it was time to actually start improving the library’s performance.
Since the codebase is not large, we knew there would probably not be a single pain point that we can fix. Instead, any significant performance gain would be a result of many smaller improvements. As we’ll see later on, we were only partially right.
Let’s start with the smaller improvements first. In dacite, we have a rather complicated function called is_instance, which checks whether given input data (value) conforms to a provided type definition (type_):
As you can see, the logic runs a number of checks to determine whether the value matches one kind of type or another, and only then, in the final else branch, does it check the value against simple types (int, float, complex). When a simple type is provided, our logic has to go through all of these unnecessary conditional statements before reaching the final else branch.
The solution was rather simple. Moving the check for simple types to the top of the is_instance function significantly improved the performance of that function:
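The idea can be illustrated with a heavily simplified sketch. This is not dacite's actual is_instance function: the name is_instance_sketch and the small set of supported types are assumptions, and it requires Python 3.8 or newer for get_origin and get_args. The point is only the ordering, with the cheap check for plain classes placed before the generic-type branches.

```python
from typing import Any, List, Optional, Union, get_args, get_origin


def is_instance_sketch(value: Any, type_: Any) -> bool:
    # Fast path first: for plain classes such as int or str a single
    # isinstance call settles the question without any typing inspection.
    try:
        if isinstance(value, type_):
            return True
    except TypeError:
        pass  # subscripted generics such as List[int] cannot be used here

    # Slower checks for typing constructs follow.
    if type_ is Any:
        return True
    origin = get_origin(type_)
    if origin is Union:  # also covers Optional[X]
        return any(is_instance_sketch(value, arg) for arg in get_args(type_))
    if origin is list:
        (item_type,) = get_args(type_)
        return isinstance(value, list) and all(
            is_instance_sketch(item, item_type) for item in value
        )
    return False


print(is_instance_sketch(1, int))
print(is_instance_sketch([1, "a"], List[int]))
```

Since most fields in typical data classes are simple types, the common case now returns after the first check.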
Another performance improvement was made inside the main from_dict function. Let’s inspect it quickly:
Pay attention to the copying part:
The logic was originally implemented this way in order to achieve a clean code solution that’s easy to read and work with. Since a few functions have been receiving the field variable as an argument and checking its .type attribute, it was natural to just copy the field and set the type attribute.
But this slowed down the execution of the code – not only was it more processing power-hungry, but it also required higher RAM usage by having to store the copied field in memory.
The solution was simple and straightforward. We refactored some of the functions that relied on the field.type attribute, and instead changed their interface to also accept an additional field_type argument. We then eliminated the copy.copy(field) logic.
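The difference between the two approaches can be sketched as follows. The helper names (describe_field, describe, walk_with_copy, walk_without_copy) and the Point class are hypothetical and exist only to show the pattern; they are not functions from dacite.

```python
import copy
from dataclasses import Field, dataclass, fields
from typing import Any, List, get_type_hints


@dataclass
class Point:
    x: int
    y: int


def describe_field(field: Field) -> str:
    # Old-style helper: expects the resolved type on field.type.
    return f"{field.name}: {field.type.__name__}"


def describe(field: Field, field_type: Any) -> str:
    # New-style helper: the resolved type is passed as a separate argument.
    return f"{field.name}: {field_type.__name__}"


def walk_with_copy(data_class: type) -> List[str]:
    hints = get_type_hints(data_class)
    result = []
    for field in fields(data_class):
        field = copy.copy(field)        # one extra object per field, per call
        field.type = hints[field.name]  # overwrite the type on the copy
        result.append(describe_field(field))
    return result


def walk_without_copy(data_class: type) -> List[str]:
    hints = get_type_hints(data_class)
    # No copy: the original Field object is reused and never mutated.
    return [describe(field, hints[field.name]) for field in fields(data_class)]


print(walk_with_copy(Point))
print(walk_without_copy(Point))
```

Both functions produce the same result, but the second one allocates nothing per field, which matters when the loop runs for every object being built.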
Applying these fixes did improve the performance of dacite somewhat, but that just wasn’t good enough. We wanted to find additional ways to speed up the library.
Caching to the rescue
As we were aiming to improve dacite’s performance by a factor of 5 to 10, or more, there was one more solution that could help us achieve our goal: caching.
The underlying logic of the dacite library is rather simple: iterate over all the fields in the provided data class, check their types by performing a multitude of checks, and – depending on the result of these checks – instantiate the value. A lot of these tasks are repetitive and will often come down to just calling some functions with similar or the same arguments. Introducing caching is a great way to speed up the execution of such functions because they won’t have to be evaluated time and time again. Instead, a cached value will be returned in a timely manner.
We decided to use the standard LRU cache mechanism that is present in the functools module. We added a new dacite.cache module:
The main point of interest is the cache function, which accepts a function as an argument and caches that function. It can also be used as a @cache decorator, which we will see in just a moment. The three other functions are just utilities to allow for easy cleaning of the cache, as well as changing the maxsize parameter to adjust RAM usage.
Once we had this caching mechanism ready, we applied it to relevant functions and function calls. We’ve decorated most of the functions in the dacite/types.py file, as well as wrapped a few function calls within the from_dict function itself.
After applying the cache function:
With the fixes and the caching added, we’ve been able to increase the performance of the from_dict function 5 to 10 times – sometimes even more, depending on the particular use case.
Below is a comparison of our performance suite. Runs marked with 0001_6eecf29 are before any performance improvements were applied, while runs 0002_c831d57 have all the improvements in place.
Integrating with the CI
We now had a suite of performance tests that helped us accurately measure and improve the performance of our library. That’s a great achievement already, but we wanted to take it one step further and integrate benchmarking into our Continuous Integration process on GitHub, in order to be able to trace any unwanted performance degradations in the future.
There is an easy way to integrate benchmarking results generated by pytest-benchmark with GitHub Actions: GitHub Action for Continuous Benchmarking. It can take benchmark reports generated by a multitude of tools for different languages (Python, Java, JS, Go, and more), save them for further reference (e.g. to fail a CI job if the performance has deteriorated too much), and even generate a nice GitHub Page that displays how the performance has changed between commits.
In our case, adding it was very simple. In our definition of the GitHub Action that runs some tests and linters for Pull Requests, we’ve added:
Thanks to this configuration, the job for the Pull Request will fail if dacite’s performance degrades past the 130% threshold relative to master.
For commits and merges made to the master branch, inputs.publish_performance will be set to true, and the benchmark results will be published to a GitHub Page as a chart: https://konradhalas.github.io/dacite/performance/3.11/. You can also look up the performance of other Python versions by changing 3.11 to anything between 3.7 and 3.11. Each benchmark suite (i.e. each test) gets its own chart:
Some additional performance tips for using data classes
While improving dacite’s performance, we also found two optimizations, using slots and dropping frozen data classes, that you can apply to your own projects regardless of whether you use dacite.
In FLYR martech, we use dacite heavily for instantiating and passing around Data Transfer Objects, which are internally represented as data classes. In recent Python versions (3.10 and higher), a new slots argument has been added to the @dataclass decorator. Without going too deeply into the implementation details, declaring @dataclass(slots=True) ensures that only fields declared in the data class can be provided, or it throws an AttributeError:
However, by declaring the slots, the Python interpreter is able to make some significant memory optimizations, thus reducing RAM usage. Our codebase had many data classes to go through and add this slots=True parameter to, but the effort was well worth it. Not only did RAM usage decrease, but the time it takes to process such data classes also dropped noticeably. Here is a working example:
In our tests, the a instance uses up 648 bytes of memory if no slots=True is provided, but with slots enabled, it only uses 328 bytes – almost 50% less! However, since we are operating on small values anyway, adding slots to your data classes will only speed up your library if it processes huge amounts of data in the form of data classes. In our case, the difference in RAM usage was negligible.
Frozen data classes
Another improvement that we wanted to introduce was getting rid of frozen=True for our data classes. Unfortunately, we could not replace our frozen data classes with regular ones, but your situation may be different, and if so, you could achieve a significant speed boost.
Let’s first compare two data classes: A and B. The only difference between them is that the former is frozen, while the latter is not.
To test the performance of both the A and B data classes, we measured how much time it takes to create 10 000 000 instances of these classes with this data input:
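A scaled-down version of such a measurement can be sketched with timeit. The field names, the input data, and the number of runs are assumptions chosen so that the script finishes quickly; they are not the values used in our measurement.

```python
import timeit
from dataclasses import dataclass


@dataclass(frozen=True)
class A:
    x: int
    y: str
    z: float


@dataclass
class B:
    x: int
    y: str
    z: float


# Hypothetical input data.
data = {"x": 1, "y": "text", "z": 1.5}

RUNS = 100_000  # far fewer than 10 000 000, to keep the sketch fast

# Frozen classes assign fields through object.__setattr__ in __init__,
# which is where the extra cost comes from.
frozen_seconds = timeit.timeit(lambda: A(**data), number=RUNS)
regular_seconds = timeit.timeit(lambda: B(**data), number=RUNS)

print(f"frozen:  {frozen_seconds:.3f} s")
print(f"regular: {regular_seconds:.3f} s")
```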
The results are significant, as removing the frozen=True from the data class makes it as much as two times faster. The exact speed increase depends on the data class itself, the data, and Python version, but that’s quite an improvement in any case.
But there is the additional guarantee that comes with the frozen data class; it cannot be modified in any way. To keep the guarantee in the test environment, while dropping it on production code for performance reasons, you can create your custom decorator wrapper for the data class. This way you can alter its behavior so that the data classes are frozen during testing, and not frozen while running on production. If your business logic tries to modify the frozen data class somewhere along the way, there should be an exception thrown.
Here is our simple decorator, which we called dto:
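A decorator of this kind can be sketched in a few lines. The way IS_TESTING is read here, from an environment variable with the same name, is an assumption for the sake of a self-contained example, as is the Price class.

```python
import os
from dataclasses import dataclass

# Assumption: the flag comes from an environment variable named IS_TESTING.
IS_TESTING = os.environ.get("IS_TESTING", "false").lower() == "true"


def dto(cls):
    """Create a data class that is frozen only when running tests."""
    return dataclass(frozen=IS_TESTING)(cls)


@dto
class Price:
    amount: int
    currency: str


price = Price(amount=100, currency="USD")
print(price, "frozen:", Price.__dataclass_params__.frozen)
```

With the flag enabled, any attempt to assign to a field raises dataclasses.FrozenInstanceError, so accidental mutations surface in the test suite while production code skips the overhead.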
The IS_TESTING variable should come from the environment configuration and will be either True or False, depending on the conditions described above.
If you’re using mypy for static type checking, please be aware that it does not handle data class aliases properly and will report type errors. You can read more on the topic in the mypy docs.
Impact on FLYR’s MarTech platform
With all of the fixes and changes to the dacite library that we described above, it is finally time to see how the performance of our FLYR martech platform was impacted.
In order to be able to accurately measure the influence of the improved dacite library on our system, we’ve chosen a single API endpoint, which serves heavy data loads. The data comes from MongoDB and is parsed by dacite before being served in the response. We also decided to run the performance testing and compare the results in a local environment to avoid the results being affected by the network transportation layer.
With everything ready and prepared, we created some fixture data to mimic our real production data and started testing the response time of the aforementioned API endpoint. For that, we used a tool called Locust, which allows for easy performance testing of APIs and other HTTP resources. It can simulate multiple users trying to use a web app simultaneously (you can simulate hundreds or thousands of users) and will register the time it took for the webpage/API endpoint to load or return a response.
We ran the Locust benchmark for 120 seconds, starting with one user requesting our API endpoint and adding another user each second, up to 20 users. The pre- and post-improvement comparisons were really satisfying. We managed to lower the response time of our API endpoint by up to 10 times, with the average falling from 1.2 seconds to less than 0.3 seconds.
Since our system has many more similar APIs to the one which we’ve checked, where we query some data from MongoDB and parse it using dacite before serving the response, the performance improvement will be felt throughout the entire system.
Summary and Conclusions
Our FLYR marketing technology team loves open source and uses it extensively. We are pleased to have had the opportunity to work on the dacite library as part of our daily job responsibilities. In this way, we both improved the performance of our platform and provided added value to the open-source community using dacite in their own projects.
In our case, where we rely on dacite heavily and use it to process a lot of data, the improved performance of the new version is perceptible throughout our platform: our APIs respond faster, and many tasks finish quicker than before.
Keep in mind that dacite is a free and open-source library, so if you’d like to help by fixing bugs, adding features, or even improving the performance even further, please do! Visit the repository at https://github.com/konradhalas/dacite and look around the open issues. Any help will be welcomed.
We also hope that this article will be a good starting point for anyone thinking about improving the performance of their own Python libraries and projects. With the help of tools like pytest-benchmark and GitHub Action for Continuous Benchmarking, it will be both easy and productive.