This blog is an adaptation from a conference talk given earlier this year. If you prefer a video format, have a look there!
My wife and I are quite different in our shopping habits. She likes analysing the market trends and she’s good at deciding what she likes. She’s even better at deciding whether to buy or not - disclaimer: usually she does buy it, and then our entrance at home looks like a package pick-up point (not true, but also not entirely not-true).
I, however, have a different poison. When I am looking for something, say a new set of headphones, I analyse and analyse again, and at some point I conclude that a certain product is probably the best fit for what I am looking for. Then I check, yet again, the distance to the second choice, and after that I might end up going back to the fundamental question of whether I actually needed it in the first place.
I am quite frugal, so if I am not very enthusiastic about the top choice I will probably end up discarding it (and bloating those non-conversion metrics for the A/B test behind the ecommerce site, leaving the team scratching their heads about what is wrong with their website).
In my defence I will say that if I am sure I need something and there is a clear market winner, I buy immediately and I do not think about it anymore.
The point is, choosing what to buy and when to buy it is quite consequential. Choosing your tech stack is a somewhat similar problem. You need to understand and reflect very well on a number of things:
Whether you actually need it, or whether it is a fun, exciting but limited-value exercise (hello ChatGPT demos 🙊)
How do you sweep and track the market and assess which tool or framework to adopt?
Should you build or buy?
How fast do you need it, and what are you sacrificing to get it sooner?
Without thorough consideration of the above, a leadership team (often proud of their choice and/or otherwise invested in it) may send the team spiralling downwards in terms of productivity and motivation. That’s why it is important to give it the right amount of love, and eventually make a well-informed choice together with the technical experts.
It is remarkably difficult if you end up in a situation where there’s a clear need for something better than what you have now, but no industry standard or obvious choice. That’s where my shopper persona would collapse, and a Feature Store, or should I say Platform (we will come to that later), is a prime example of this sort of conundrum. At Adyen we have gone through this exercise a number of times and we have learned, sometimes the hard way, how to go about these choices. It really boils down to two things that we embrace in our ethos: iteration and control.
Our first attempt to build a Feature Store
Let me use an example of a use case for a feature store at Adyen. Every payment that goes through Adyen - and we do a lot of those (860 billion USD in 2022) - undertakes a journey where a number of decisions are made through an inference service fueled by a machine learning model that we have trained and deployed. We have a few instances of those services with different purposes: our risk system (is the transaction fraud or legit), our authentication service (should we authenticate the user, and if so in which way), our routing algorithm, our transaction optimiser, our retry logic and others.
We are talking about a service that can take several thousand requests per second per model and respond in less than 100 ms. The final goal of all these models is to land as many good transactions as possible in the most efficient way, without ending up in a chargeback, a retry, or higher costs.
Now, these models need features, of course. They need features both at training time and at scoring time.
Let’s zoom in on the risk system. The service was initially built on rules drawing from three different data sources:
Block/allow look-up-tables, powered by PostgreSQL.
Velocity database, powered by PostgreSQL, able to provide information such as how many times this card has been used in the last minute (a minimal lookup is sketched right after this list).
Our “shopper” database, also powered by PostgreSQL.
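To make the velocity idea concrete, here is a minimal, hypothetical lookup against such a PostgreSQL-backed velocity table; the connection details, table and column names are illustrative, not our actual schema.

```python
# Hypothetical sketch: count how often a card was used in the last minute.
# Table and column names (card_events, card_id, event_time) are illustrative.
import psycopg2

conn = psycopg2.connect("dbname=velocity user=risk")  # assumed connection details
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT count(*)
        FROM card_events
        WHERE card_id = %s
          AND event_time > now() - interval '1 minute'
        """,
        ("card_123",),
    )
    uses_last_minute = cur.fetchone()[0]
print(uses_last_minute)
```

In practice you would likely keep pre-aggregated counters rather than counting rows at request time, but the shape of the question is the same.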
The “shopper” database deserves its own paragraph. We have used, and maybe even abused, PostgreSQL in a very beautiful way: to identify shoppers in real time. In the end, this system implements an elegant and simple graph algorithm, identifying communities of attributes (cards, emails, …) that relate to the same person. It does that very efficiently and very fast, but it has its own complications around flexibility and scalability. My colleague Burak wrote a great article about it [1].
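The real system lives in PostgreSQL, as Burak’s article explains; purely to illustrate the idea, here is a toy union-find sketch where attributes seen together on a transaction get linked, and each connected component ends up representing one shopper. Everything here (class, method and attribute names) is illustrative, not Adyen’s implementation.

```python
# Toy sketch of the shopper-identification idea: a union-find over attributes.
class ShopperGraph:
    """Union-find over transaction attributes; each component is one shopper."""

    def __init__(self):
        self.parent = {}

    def _find(self, attr):
        # Walk up to the representative of the component, compressing the path.
        self.parent.setdefault(attr, attr)
        while self.parent[attr] != attr:
            self.parent[attr] = self.parent[self.parent[attr]]
            attr = self.parent[attr]
        return attr

    def link(self, attr_a, attr_b):
        # Attributes appearing on the same transaction belong to the same shopper.
        root_a, root_b = self._find(attr_a), self._find(attr_b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def shopper_of(self, attr):
        return self._find(attr)


graph = ShopperGraph()
graph.link("card:A", "email:alice@example.com")
graph.link("email:alice@example.com", "card:B")
assert graph.shopper_of("card:B") == graph.shopper_of("card:A")
```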
When introducing a new approach based on a supervised classifier, we thought: hey, let’s include features about the merchant, that’ll boost AUCPR. Small note: in fintech jargon, a merchant is a seller or a company, take Uber or Spotify for example. In that sense, we’d include as features datapoints like the size of the merchant, the country of the merchant, how long we have been processing for that merchant, the authorisation rate of that merchant across different sliding windows, and so on.
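To make that concrete, here is a hypothetical PySpark sketch of one such feature, a merchant’s authorisation rate over a 7-day sliding window; the table name, column names and window length are assumptions for illustration, not our actual feature definitions.

```python
# Hypothetical sliding-window merchant feature computed with PySpark.
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()
transactions = spark.table("transactions")  # assumed table: one row per payment

seven_days = 7 * 24 * 3600  # window length in seconds
window_7d = (
    Window.partitionBy("merchant_id")
    .orderBy(F.col("event_time").cast("long"))
    .rangeBetween(-seven_days, 0)
)

merchant_features = transactions.withColumn(
    "merchant_auth_rate_7d",
    F.avg(F.col("is_authorised").cast("double")).over(window_7d),
).withColumn(
    "merchant_txn_count_7d",
    F.count("*").over(window_7d),
)
```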
However, many of these example features are slow-moving, medium-cardinality, high-data-volume features. And PostgreSQL wasn’t gonna cut it in terms of crunching all that volume. So we added a new database called “Feature Store” (read that as Dr. Evil with his famous quote hands).
We took advantage of the fact that we were already sending all our transaction events to our Big Data Platform via Kafka. We collect all this info in the form of Hive tables and then use Spark to crunch it, all beautifully orchestrated through Airflow. We had all this information and tooling available in our Big Data Platform because that’s where we train our models. We just needed an abstraction layer to define the features (we chose Feast), which we then deployed on another PostgreSQL instance in the real-time flow (we love PostgreSQL, in case you haven’t figured this out yet). The ML artifacts are deployed to our real-time platform through our wrapper around MLflow, called Alfred, which allows us to stage their rollout through ghost, test, canary-live and default-live modes.
The final picture of our first crack at a “Feature Store” looks like this.
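For a flavour of what the Feast abstraction layer looks like, here is a minimal sketch of an entity and a feature view for merchant features, roughly following a recent version of Feast’s Python API; the entity, source path, field names and TTL are illustrative assumptions, and our actual definitions (and Feast version) differ.

```python
# Illustrative Feast definitions; names, source and TTL are assumptions.
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

merchant = Entity(name="merchant", join_keys=["merchant_id"])

merchant_stats_source = FileSource(
    path="data/merchant_stats.parquet",  # would point at the batch-computed tables
    timestamp_field="event_timestamp",
)

merchant_stats = FeatureView(
    name="merchant_stats",
    entities=[merchant],
    ttl=timedelta(days=1),
    schema=[
        Field(name="merchant_auth_rate_7d", dtype=Float32),
        Field(name="merchant_txn_count_7d", dtype=Int64),
    ],
    source=merchant_stats_source,
)
```

Running `feast apply` registers definitions like these, and materialisation then pushes the computed values into the online store that the real-time flow reads from.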
And happy we were ever after, thinking that we had a “Feature Store”. And by “ever after” I mean a couple of months.
The build-vs-buy tradeoff
Hidden in all this there’s something that might have passed unnoticed: the redux of a very important lesson that we have learned through the years, our build-vs-buy trade-off.
There are great vendors that promise and deliver a seamless turn-key experience: it’s fast and it just “works”. It’s a very appealing choice, and I can only show my honest respect for these startups: they are like rain in the middle of a drought for a lot of companies that want to get going instantly with Feature Stores, MLOps, Experimentation, Data Governance and any other sweet problem to solve. And they do a good job at it.
At Adyen, we like to stay in control and understand what happens under the hood. We also have a very solid principle: we won’t use a vendor for anything touching our core business. In this case, processing a payment is indeed core business and we don’t want to introduce a dependency on a third-party. That has two important implications that are worth calling out:
First, we do use vendors, but we are critical about which part of the system they impact. On the one hand we buy our laptops from a vendor - it wouldn’t be optimal if we had a team building laptops. On the other hand, because we are set on building for the long term, we believe in controlling our whole supply chain (take SpaceX, which even procures its own screws).
Second, if we don’t use vendors, do we then build in-house? We have made that choice in the past and, in hindsight, it was a mistake. Picture a top-performing engineer, machete in their teeth, mumbling a classic “hold my beer” and then proceeding to build from scratch something that already exists because “it will only take me a week to do it better”.
We have been there, and after the first month I can guarantee the fun is over. Looking ahead at the feature-parity roadmap sinks you into despair, the operational debt and preventable bugs itch more than usual, and you end up going to bed every night thinking “why did we do this”.
So what’s the answer?
Well, open source.
We use open source as much as possible to build our infra and rails, and then we build on top of it our core business. We also contribute back by merging PRs and adding new features that we found useful. At the end of the day, the internet runs on open source.
Iterating on the Feature Store
I already gave away one bit of our ethos: strong control of our dependencies, fueled by long-term thinking. A second big trait of our way of thinking is the iteration culture.
At some point when building software, and even hardware systems, someone figured out that working in waterfall contracts doesn’t really help, and that working in an agile way instead gets you further and faster, and is also more fun. The point of agile is not to adopt scrum. The point of agile is to embrace that the MVP is minimum (and therefore rough and barely presentable) and also viable (it works, it’s not a WIP commit), and that from there onwards you have to iterate, and quickly.
We took a cold look at what we had built and proudly called a “feature store”, tried to remove any emotional attachment to it, and ended up concluding that it wasn’t actually great. We also concluded that we might want to do some soul-searching and write a requirement list for what we wanted the whole thing to do.
We ended up with a letter to Santa with everything we wanted. At least from there we could make a conscious choice about what we would not get, given the cost and the possibilities.
Feature Parity: the features and values on the training and scoring flows must be identical.
Retrieval latency: we need the inference service to work under 100 ms, so there’s that.
Recency: the features should not be stale; we should be able to refresh them very fast.
Cardinality: we want to be able to store billions of features.
Distributed: we need instances of the feature store around the world, because we process globally.
Storage / Scalability: our transaction volume grows quite a lot every year, and we build for 20x.
Availability / Uptime: we need the system to be there 99.99999% of the time.
Self-service: ideally we want data scientists to be able to help themselves when prototyping and deploying new features.
Complex calculation: some of these features can be complex to compute, and the platform should be able to handle that.
Feature diversity: it’d be great if we didn’t have to maintain three different databases and we just had one endpoint with all sorts of data inside.
After seeing that list we thought “wow, it’s a long list”, but we also figured that there was an underlying difference between a pure storage place and a place where things are computed. And that’s also when we read Chip Huyen’s fantastic article on Feature Platforms [2], and it was one of those click/a-ha moments, when you confidently say out loud “we were building a feature platform, not a feature store” and you can hear the non-existent triumphant music behind you.
The main difference lies in facilitating the computation of features, on top of the storing and serving that are indeed captured under the definition of a feature store.
The new blueprint
Based on this we also saw the need for a system that spans two different platforms: our real-time platform (where payments, KYC, payouts, refunds and financial interactions with the world happen) and our big data platform (where we crunch the data). That’s not a surprise, given that you have needs in two very different flows: your inference flow in real time and your training flow offline.
We needed an abstraction layer to glue both systems together, and we chose to keep using Feast, the open-source package that allows data scientists and engineers to define features once and ensure consistency across the two environments. We also evaluated LinkedIn’s Feathr, but we deemed it too opinionated and opted for the openness of Feast.
The general idea behind the syncing between the two environments is that some features will be computed in the real-time flow, stored in hot storage, and synced back to cold storage (the big data platform), while the slow-moving features will be computed on the big data platform, stored there (cold storage), and synced to hot storage for inference.
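Assuming Feast stays as the glue, the cold-to-hot direction is essentially a materialisation of the batch-computed features into the online store, and the inference flow then reads them back with a low-latency lookup. The sketch below is illustrative, with hypothetical feature and entity names.

```python
# Illustrative cold -> hot sync and online read, using Feast's Python SDK.
from datetime import datetime

from feast import FeatureStore

store = FeatureStore(repo_path=".")  # hypothetical local feature repository

# Push the latest batch-computed (slow-moving) feature values to hot storage.
store.materialize_incremental(end_date=datetime.utcnow())

# At inference time, the real-time platform reads from hot storage.
features = store.get_online_features(
    features=[
        "merchant_stats:merchant_auth_rate_7d",
        "merchant_stats:merchant_txn_count_7d",
    ],
    entity_rows=[{"merchant_id": "merchant_123"}],
).to_dict()
```

The hot-to-cold direction (streaming features computed in the real-time flow landing back in the big data platform) would go the other way, for example over the same Kafka pipeline that already carries the transaction events.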
While the batch computation engine and storage were already there (Spark for computation, Hive/Delta for storage), we still had a few choices to make regarding the online flow. For stream computing we are inclined to use Apache Flink, but we can decide that later. For the storage layer we hit a dilemma across a few contenders: Redis, Cassandra, CockroachDB and sweet old PostgreSQL.
Here is where I will circle back to where I started: you need to make good decisions, and that probably means involving technical experts. Even then, you also want to be able to tap into the wider organisation to make sure that you are not too biased or forgetting anything. That’s why we have a TechRadar procedure where engineers can share ideas for technology contenders, spar, benchmark and eventually decide which to adopt in a “makes sense” fashion.
We decided on Cassandra. Redis’s in-memory storage makes it quite expensive at the cardinality we are looking for, CockroachDB is really keen on read-write consistency at the expense of speed, and PostgreSQL, well, didn’t cut it for our needs.
We are still evaluating choices for the online computing engine (as said, Flink looks good) and for feature monitoring, where we might just use the monitoring stack already available, largely consisting of Prometheus, Elastic and Grafana.
That’s an honest look at where we are today. We are making an informed choice and not shooting from the hip. We have determined what we need, we have also determined what is important to us and what we are willing to pay for it. We have analysed the market and open-source offering and are back to our beloved execution mode.
Even if there’s no clear and obvious choice, my shopper persona is still happily going through this procurement journey and enjoying the benefits of learning, discovering the possibilities and deciding. Because if we don’t get it right at first, we will build, fail, learn and iterate.
Want to learn more? Watch this recording of the conference talk that led to this article.