A Deep Dive into Table Partitioning, Part 4: How the default partition saved the day
By Derk van Veen, Database Engineer
In the previous blog posts, we discussed the ins and outs of table partitioning in Postgres. We touched on different types of partitioning, how to partition tables of several terabytes, and how to automate partition maintenance.
Today, we'll dive deep into another story where partitioning saved the day at Adyen. While partitioning is a great strategy for dealing with scaling issues, it can become a problem on its own. Nonetheless, once it's well understood and experimented with over time, it can offer a powerful set of tools that were not available before.
It's good to mention that all the practices we've recommended so far should be followed for a "boring" life (remember, boring is good in our world). However, sometimes we must work around the best practices and push our knowledge to its limits. Sometimes we need to get creative; this story is a great example of when we did so.
The Good, the Bad and the Ugly - The DEFAULT partition
We have touched on the DEFAULT partition in all the previous blogs, and it was all for this moment. Based on all that information, one should never create a DEFAULT partition, since it's "the root of all evil". "Boring" is what we want, and the DEFAULT partition is far from boring. That's why, at Adyen, we spent great effort ensuring all our tables are free of this evil. And we did so! But… (it wouldn't be a blog without a "but" now, would it?)
… there comes a day in an Adyener's life when they face a challenge that forces them to think outside the box; and to think outside the box, you need to forget about rules and best practices and put your deep knowledge to the test.
This day happened about one year ago, when we brought the DEFAULT partition back to life for what we call a good use case. We still don't know whether it's the best solution, but we can't think of a better one to date. To explain this use case, we need to go way back, to the beginnings of Adyen.
Database Sharding
Adyen did not always have scaling problems. Things were quite "boring" in the early days: the one database, one server type of days. All the data was close to each other, and all the code was close to the data. It was a simple world.
As you scale, you need to adapt. Vertical scalability has its well-known limits, and when the database became a bottleneck, we understood it was time to apply database sharding.
In simple words, sharding splits a large database into smaller, more manageable shards. Each shard holds a subset of the data, and the shards are usually distributed across multiple servers.
This approach allows horizontal scaling, and we could then split the load between different shards. But we all know there's no free lunch. Like any good thing, it comes with a cost: besides maintenance, your application must route queries to the correct database shard. Although sharding added complexity, it enabled us to take payment processing to the next level and continue to scale horizontally whenever we want to.
However, there's one last question to answer: what do we do with queries that need (some summarized) data from all the shards? We are talking about products that used a single database as the single source of truth; think of reports with all the data available in a single place, where it could easily be transformed, compressed, or summarized. Surely, we could run queries in parallel across all shards, combine the results, and transform the data the way we want, but this is not always possible due to the nature of the queries. Furthermore, our goal was to reduce traffic on the payment clusters. We had products that needed to use the database in a "data warehouse" style, running heavy queries and generating reports.
So, to split the load even more, we created an aggregated (and summarized) version of the original data from all shards and combined it into a new database dedicated to such data warehouse operations. Here's a simple diagram that illustrates what this setup looked like:
This was a great setup for many years, but we knew it would only scale for so long. You can see it too, right? Even with a summarized version of the data, the aggregated database would eventually grow larger than we would like. And that is without even taking into account the new products we have spawned over the years.
For such data warehouse applications, we recommend sticking to the best practices and technologies specialized for them: go towards the Big Data world and explore OLAP technologies, or Hadoop if you need parallel computation. Postgres will still perform well, but once you start asking yourself these questions, maybe it's time to look elsewhere.
However, this is not the topic of discussion for this blog. Let's return to Postgres and unravel the problem partitioning brings in this setup.
How to partition an aggregate table with minimal downtime
Let's recap how we partition an existing table, from the second blog post in this series, Table Partitioning at Adyen:
1. Rename the existing table to <table>_mammoth.
2. Create a table with the original table name.
3. Add the original table as the first child.
4. Add at least one more partition to accommodate the incoming data.
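In SQL, these steps look roughly like this. It is a minimal sketch with illustrative names and boundaries (a table tableA, range-partitioned on a made-up column shard_key); the real tables have far more columns:

    -- 1. Rename the existing table; it becomes the "mammoth".
    ALTER TABLE tableA RENAME TO tableA_mammoth;

    -- 2. Create a partitioned table with the original name.
    CREATE TABLE tableA (
        shard_key bigint NOT NULL,
        payload   text
    ) PARTITION BY RANGE (shard_key);

    -- 3. Attach the original table as the first child, covering the range
    --    of the data it already holds.
    ALTER TABLE tableA ATTACH PARTITION tableA_mammoth
        FOR VALUES FROM (1000) TO (2000);

    -- 4. Add at least one more partition for the incoming data.
    CREATE TABLE tableA_p2000 PARTITION OF tableA
        FOR VALUES FROM (2000) TO (3000);

This works nicely as long as the existing data fits one contiguous range.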
The problem is still hidden in the details. Let's partition this table together. We have the following tables and data (for simplicity, we only show three shards):
All shards maintain a non-overlapping shard key, so we know which shard the data comes from, even when aggregated.
The aggregated table A contains data from all the shards, meaning its data is sparsely distributed over the partitioning key because of the per-shard key ranges. So, as the diagram illustrates, what partitioning range can we use to attach tableA_mammoth?
We need to specify a partitioning range when attaching it to its parent.
Data will still come in from different shards, meaning the new partitions must accommodate all these ranges from all different shards.
(In case you were wondering already:) We can NOT attach tableA_mammoth from the global MIN to the global MAX (on the diagram, from 1000 to 3999). This would cause all the data (including future data) from all the current shards to land on this partition.
And NO, your partitions need to be mutually exclusive; otherwise, Postgres will fail to attach a partition.
For future partitions and data, we can still keep the original boundaries from the shard table. We can simply maintain the same ranges per shard key.
However, we still cannot attach the current partition since the data is sparse. What do we do? Do we move the data around so it accommodates the ranges we want? Remember, this table has been gathering data for many years, so we already have a 20TB+ table. Are we sure we want to move this data around?
Do we give up partitioning this table? Definitely not! We just get creative!
How the default partition saved the day
The solution goes against our best practices: we attach tableA_mammoth as a DEFAULT partition, so we do not specify any range.
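Instead of step 3 from the recap (attaching with an explicit range), the attach becomes a one-liner. Sticking with the illustrative names from the sketch above:

    -- No FOR VALUES clause: the mammoth becomes the catch-all partition.
    ALTER TABLE tableA ATTACH PARTITION tableA_mammoth DEFAULT;

Every existing row simply stays where it is, and the sparse distribution of shard keys is no longer a problem, because a DEFAULT partition has no boundaries to declare.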
It seems like the DEFAULT partition can save us in this particular use case. But if we want to go down this road, we must do it properly. We have a few extra challenges to conquer:
What happens if we attach a new partition to the parent table?
One would expect the partition to be attached without any issues. But if we try, we'll see that the actual ATTACH command hangs. Why?
When you try to ATTACH a partition, Postgres will not simply do it because you say so. Postgres needs to validate that the DEFAULT partition does not already contain rows that belong in the partition you're trying to ATTACH; otherwise, rows could end up in the wrong partition. Hence, Postgres attempts a full table scan on the DEFAULT partition, which is unrealistic for a TB-sized table.
We have explained this problem thoroughly in our previous article, Maintenance Under Pressure: "Before adding a new partition to a partitioned table with a default partition, this default partition might contain data that could belong to the added partition. For example, the default partition might contain value 35 at the moment a new partition is added for values from 30 to 40. So, naturally, the database has to check this to prevent partition constraint violations. As you might expect, the only way to check this is to do a full table scan on the default partition with an ACCESS EXCLUSIVE lock on the default partition to stop new data coming in. Now our safety net is suddenly preventing us from easily adding new partitions to our partitioned table. Because of it, we now need an ACCESS EXCLUSIVE lock every time when attaching new partitions. So much about minimizing locks!"
For this reason, we need to create CHECK constraints before attaching a partition. We need to do exactly the same for the DEFAULT partition and ensure this partition behaves like a frozen partition rather than a safety net. We need to prevent Postgres from scanning the whole table, and we need to do it with great caution! That's why we use our own partitioning functions, which are battle-tested and, if you forgot, public on our GitHub repository.
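As a rough illustration of the idea (again with illustrative names and boundaries, assuming the frozen data lives between 1000 and 3999 and new partitions only cover higher ranges; in practice our partitioning functions take care of this), we describe the content of the DEFAULT partition with a CHECK constraint, so Postgres can prove without scanning that a new partition cannot clash with it:

    -- Add the constraint as NOT VALID first; this only needs a brief lock.
    ALTER TABLE tableA_mammoth
        ADD CONSTRAINT tableA_mammoth_frozen
        CHECK (shard_key >= 1000 AND shard_key < 4000) NOT VALID;

    -- Validating still reads the whole table, but it does not block reads
    -- and writes the way a plain ADD CONSTRAINT would, and it only has to
    -- happen once.
    ALTER TABLE tableA_mammoth VALIDATE CONSTRAINT tableA_mammoth_frozen;

    -- With a validated constraint in place, adding a partition for a range
    -- outside 1000-3999 no longer requires a scan of the DEFAULT partition.
    CREATE TABLE tableA_p4000 PARTITION OF tableA
        FOR VALUES FROM (4000) TO (5000);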
What about LOCKs? Will this CHECK constraint enable us to freely attach future partitions?
In our previous article, we slipped in what we call a spoiler question.
Spoiler question: Do you think this is all the locking that is involved?
What do you think? We now have non-overlapping CHECK constraints on the DEFAULT partition, the new partition is waiting to be attached, and we have prevented Postgres from doing a full table scan on the DEFAULT partition. So when we try to ATTACH, everything is in order. Wrong! There is one detail we cannot ignore: Postgres still needs a LOCK on the DEFAULT partition to update its partition constraint. What does this look like?
Although our DEFAULT partition is, in fact, frozen and no new data can come in, Postgres still has to make that explicit, by updating the DEFAULT partition's constraint, every time a new partition is attached.
So we have arrived at the biggest drawback of this whole partitioning effort: every time we ATTACH a new partition, we need a (very short) lock on the DEFAULT partition. We can't change the rules about locking, but what we can do is minimize the probability that the lock will be disruptive. We know for a fact that the most recent data is the data we need the most for daily operations, and if we need a lock on a table, it is easier when that table contains mostly older data. For this reason, we created partitions for over a year of incoming data in advance. By the time we need to create new partitions again, the data in the DEFAULT partition is over a year old and no longer heavily used. To ensure we don't disrupt the application when adding new partitions, we use the lock, release, and retry strategy described in our previous article, which makes the whole process easy and fast.
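A bare-bones sketch of that strategy (the real retry logic lives in our partitioning functions; the timeout value is purely illustrative): keep the lock attempt short, fail fast instead of queueing behind application traffic, and try again a moment later.

    -- Give up quickly if the (very short) lock on the DEFAULT partition is
    -- not immediately available, instead of waiting behind running queries.
    SET lock_timeout = '100ms';

    -- Retried by the calling code until it succeeds:
    CREATE TABLE tableA_p5000 PARTITION OF tableA
        FOR VALUES FROM (5000) TO (6000);

    -- If the lock cannot be acquired in time, the statement aborts with a
    -- lock timeout error, nothing stays blocked for long, and we simply
    -- retry a moment later.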
We started with a problem, exhausted all the solutions following the best practices, got creative, pushed our Postgres knowledge to its limits (and beyond), and came up with a solution that, one year later, has led to no problems. We count this one as a success story!
Have we missed anything? Ahh, yes… learning! We have learnt so much about Postgres along the way! Building a solid knowledge base on the technologies we use is extremely valuable at Adyen. Undoubtedly, this helped us solve the current problem, but it will also help us with the next partitioning case, the next locking issue, and so on. It's a marathon of learning, and however much we understand and work with Postgres, there's still so much left to discover. And we are happy to take on this journey!