The serverless way to serve billions of requests

Vijay Bantanur

December 3, 2019

Ever wonder what it was like to wait for a monthly statement to arrive to find any erroneous transactions on your account? Maybe you lived through it and still have the papercut scars to prove it. The digital revolution has made it possible to access your information when and where you need it and we are not too far from a futuristic world where relevant information comes to you as it happens and you don't need to go looking out for it. Capital One is on a journey to make almost everything we do real time, adapting cloud computing and big data engineering tools to bring a real time aspect to our solutions.

A team of smart engineers, product owners and scrum masters in my group have been working to bring personalized, real time, and relevant insights to customers. We look out for transactions that are not typical for our customers’ spending behavior; like restaurant transactions having an unusually high tip, an increase in recurring bill, start of a new free trial, multiple duplicate transactions, etc.

Over the years at Capital One, the Spark framework evolved into the technology of choice for high volume real-time streaming and batch needs. With great computational power comes higher operation and maintenance cost and we started seeing the pain of operating Spark infrastructure for real time streaming needs. So my team took on the challenge to find a simpler, low maintenance, yet highly scalable pattern and designed a Serverless Streaming solution around it.

Why Apache Spark is not a silver bullet for all real time streaming use-cases.

According to Databricks blog, Apache Spark is one of the fastest open source engines to handle high volume batch and streaming data. It's a bit of a no brainer. Why would we think of using something else in light of such outstanding performance?

Graph with blue bars detailing 40 Core Throughput on 10 Worker Nodes

Source: Databricks blog (https://databricks.com/blog/2017/10/11/benchmarking-structured-streaming-on-databricks-runtime-against-state-of-the-art-streaming-systems.html)

However, let's evaluate this in context of your application needs. For instance:

What are the typical records per second load your application has? If it is less than a couple thousand then Spark may be overkill.
To achieve high resilience, what is the minimum number of driver and worker containers your application needs to have? If you are on cloud think about distributing it across multiple availability zones and regions.
What about the number of containers required to install resource managers like Zookeeper to maintain Spark clusters?
Have you thought about region failures for your cloud or data center? If so, you might have experienced the architectural complexity of making Spark clusters available across regions.
Do you have a constant load all through the day? Awesome, it's good for your business. However, most real time systems have a cyclical load on a daily, weekly and yearly basis so make sure to think about your system utilization during off peak hours.
Spark jobs can fail anytime. Have you thought about error handling, monitoring, and automatic recoverability of jobs? It's a sizable developer effort to achieve this.
Don't forget about the developer cost, it's the biggest of all. Think about all the special skill sets engineers need in order to develop on Spark infrastructure such as programing languages like scala or Python, scripting knowledge to install and manage spark infrastructure, fine tuning Spark cache and slave nodes etc.

Although Apache Spark has real impressive results, a serverless streaming solution may be a better choice if you are concerned about the operational overhead of Apache Spark. In reality, most of the real time streaming use cases have a load way less than 1000 transactions per second. You don't need to deal with the infrastructure complexity of Spark to solve such a small load. Instead, use serverless streaming to simplify your code and drastically reduce cost and complexity.

What’s in a well architected streaming solution?

Having implemented several streaming related use-cases, I think the ideal streaming solution should address the following requirements:

Scaling: In modern application architecture, auto scaling should be one of the basic design considerations. In the era of cloud computing, you can get unlimited computing as you need it, hence there is no need to scale it for peak loads and pay extra. Although you can plan for major cyclical loads, it's hard to optimize it by the minute in traditional server based infrastructure. Ideally, applications should have the capability to auto heal and auto scale in the event of major spikes in requests.

Throttling: Streaming applications are typically designed to ingest thousands of requests per second and eventually filter to a more manageable scale. When unexpected spikes hit, streaming applications can scale out, but downstream blocking IOs (APIs, DB, etc) may not be able to. Because of this throttling becomes one of the fundamental requirements for any streaming application. Remember - your system is only as good as the weakest link.

Fault Tolerance: Invariably applications are connected with other resources like APIs, Databases etc.. It's difficult to avoid failures in the dependent systems, but at the same time you also want to safeguard your application against these issues. In a streaming application, fault tolerance is one of the critical requirements as you don't want to lose your data when your backend system is down.

Reusability: It's very common to focus on reuse verses recreating the solution. Degree of reuse depends on modularity and size of the component with microservices being the best example of reuse. By making the building blocks of streaming solutions’ small and configurable, we can emphasize component reuse across multiple applications.

Monitoring: Well, I have to admit that one of the best feelings is when I know what exactly is happening with my application. Imagine millions of messages/events are flowing through your application and you have the ability to trace each and every message and understand what exactly happened to that message. This becomes even more crucial when you are building critical customer facing applications and are required to find out what exactly happened to a specific customer event. So monitoring is very crucial for synchronous or asynchronous systems.

How did we build our serverless streaming architecture?

Given the above, how did we build our solution? Our serverless streaming architecture was modeled after event driven microservice architecture where each microservice is connected to one another using a message bus.

red and yellow organization chart showing connection between microservices and message bus

Event driven architecture inherently provides all the capabilities we need from a streaming solution. By overlaying managed services provided by cloud providers with event driven architecture, one can build serverless streaming solution.

If you can map the above pattern with managed services like AWS Lambda as a microservice and AWS Kinesis as message bus, you can achieve the event driven architecture using the serverless stacks.

We divided the overall architecture into three layers -Source, Sink and Processing.

Source: In this layer, microservices are only responsible for pulling the data from the sources.Think about this more as the entry point of the event into your streaming application. Example: Reading the events from the Kafka cluster.
Processing: This layer is responsible for processing the event which you got from the source layer. You can also think of this as a layer where you can have your application specific logic. Example: Filtering the events or calling the API to make decisions on the event. You can have one or more processing layers to map, reduce, or enhance your messages.
Sink: This is the last layer in your application where you are taking the final action on the event. Example: Storing the events into the data store or triggering other processes by making API calls.

Below is the mapping from Message Driven Architecture to the AWS Services.

diagram with purple, orange, blue, and green squares

From the above diagram, you may feel that there are lots of repetitive actions, specifically from Lambda to writing/reading from the Kinesis. Well, you can be creative and can build some kind of library for repetitive functionality.

At Capital One, we did exactly that and built our internal SDK to abstract the repetitive tasks. Our SDK has the below features:

Read and write from message bus: Writing and Reading the events from the message bus(Kinesis or others in future).
Exception handling and retries: There are two main retry categories, blocking and non-blocking. You can have the blocking errors retry when your backend application is failing until it is back. Non-blocking retries will be used when you only want to retry the specific event and it doesn't have any impact on the other events.
Secret management: This feature will be needed when you don't want to rely on storing the credentials on your serverless function. You can take your pick of enterprise secret management tools and integrate them as part of your library
Monitoring: We created our customized message envelope which has the metadata which helped us to track every message. The SDK can take that overhead from the developer and can insert/remove the envelope on entry/exit of each microservice.
Logging: For achieving a uniform experience across all the microservices you can build in a logging pattern in the SDK.
Message deduplication: As we know, most of the distributed fast data systems guarantee at-least one delivery. When you want to filter out the duplicate messages, you can think about abstracting it out as part of the library. You can use hashing or other methodologies to implement message deduplication with sub milliseconds latency.

Let’s look at how this solution implements the requirements of our ideal streaming solution

As we discussed earlier, any serverless streaming solution would need to address scaling, throttling, reusability, fault tolerance, and monitoring. So how did this stake up?

Scaling: This architectural pattern, in combination with scalable cloud, services inherently makes this possible.

Pattern uses Lambdas to implement microservices and connected via Kinesis. We only need to scale Lambdas that have high TPS, and as the messages get filtered out adjust the scale configuration accordingly.
By design, serverless functions are auto scalable. Example: If you are using Lambdas and Kinesis, you can scale the Kinesis if your message throughput increases from 2MB/Sec to 4 MB/Sec, which will also scale the Lambda functions that are connected to the Kinesis.

Throttling: The fundamental function of throttling is based on needing to hold your request if your input request rate is much higher than what your downstreams can support. The persistence nature of the message bus can help us here since you can only pick the number of messages which you can handle at one time and hold others. Example: If you are using Kinesis as a message bus you can specify the batch size which you can handle in your function.

Reusability: If we can build the source and sink microservices in such a way that they do not have any business functionality and are configuration driven then multiple teams can use them to consume events. Example: If you can build the source function to consume events from Kafka that allow for configuration on things like topic name, broker address etc, any team can take that function and deploy into their stack as per their needs without needing to make any code change.

The above can help us to achieve the code level reusability. The other reusability is reusability of the flow itself. If the message bus you selected for your architecture is a pub-sub based bus then you can have multiple subscribers to the same events. Example: You can just fan out your event to the two microservice without writing the single additional code.

Fault Tolerance: Again, the message bus can rescue us here. Think about if you are having the errors from your backend services, you can hold all/failed messages into the message bus and retry until your backend calls start succeeding.

Monitoring: Logging metadata payload as a part of SDK can help achieve uniform logs across different functions. You can also build one reusable function which can forward your logs to the preferred monitoring solution.

flow chart showing multicolored product logos

It sounds like serverless streaming solution is the silver bullet and I don't need Spark.

Not really. Apache Spark is a distributed computing platform and really shines for large scale distributed data processing loads. Spark remains the tool of choice when it comes to high volume computing and batch processing where data and compute functions can be distributed and performed in parallel. Typical examples include the heavy computing needs of machine learning use cases, the map/reduce paradigm involving several hundred files, or long running processes that deal with petabytes of data, etc. Spark can also be the tool of choice for the real time streaming world as well, but only if the volume is very high, running in hundreds of thousands of transactions per second.

Across Capital One we use multiple flavors of big data engineering tools. In my group, I use serverless streaming for high volume use cases such as generating meaningful alerts based on customers’ transactions at thousands of events per second and for low volume events like card reissues that run in tens of events per second. I also use Spark to process large transaction files to generate customers’ spend profiles using machine learning models. It all depends on specific needs.

Vijay Bantanur, Director, Software Engineering

Vijay Bantanur is a Director, Software Engineering at Capital One