Imagine that you’re a cloud architect at a large enterprise. You’re responsible for providing architectural patterns for using Serverless technologies that are performant, cost-effective, and secure. Now imagine that your cloud footprint already includes thousands of applications and cloud resources. Where and how does Serverless fit? What patterns can bear the constraints induced by such a scale? Which serverless offerings do you choose?
These are the questions that we embarked on a multi-year journey to answer at Capital One. Our journey began as a natural evolution from our experiences with containerization. While containers opened the door to reducing operational burden and gaining efficiency, Serverless heralded the dawn of an entirely new age. In this new age, the possibilities appeared endless: not just static websites, but also data analytics, identity management, batch processing, business intelligence, machine learning, and more could be enabled by Serverless.
In a company full of diverse needs, the appetite for Serverless was vast, and it was critical for us to address as many use cases as possible in ways that could meet our functional, technical, and security requirements.
On this journey, we partnered with several internal engineering teams and our partners at AWS, and together we learned what worked well and what didn’t. One area of focus was defining patterns that leveraged event-based architecture. Event-based architecture suits Serverless for a variety of reasons:
- Asynchronous calls enable decoupled systems.
- Events are immutable, persistent, and shareable.
- Failures are isolated, making the system highly resilient.
- Each component scales effectively.
- The system is highly observable and extensible.
- Components can be released independently.
- Components can be optimized independently.
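To make these properties concrete, here is a minimal sketch of an event-driven serverless consumer, assuming an SQS-triggered Lambda function. The event shape follows the standard SQS batch format; the function name and the per-record business logic are hypothetical.

```python
import json


def handler(event, context):
    """Minimal event-driven consumer sketch.

    Each SQS record carries an immutable, self-contained event, so the
    producer and this consumer stay decoupled: neither needs to know the
    other exists, and each record is processed independently, so one bad
    record does not take down the whole batch.
    """
    results = []
    for record in event.get("Records", []):
        payload = json.loads(record["body"])
        # Hypothetical business logic: act on each event independently.
        results.append({"id": payload.get("id"), "status": "processed"})
    return {"processed": len(results), "results": results}
```

Because the function only depends on the event payload, it can be released and scaled independently of whatever produces the events.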
Our focus narrowed to areas such as:
- Static Website Hosting - Cloud-native static website hosting, served privately.
- API Server - Serving API responses in a serverless fashion.
- Data Processing - ETL and batch jobs for terabytes of data.
- Rules Engine - Orchestrating workflows against various event triggers.
- CDN Customization - Adding logic to CDN routing.
- Multi-Region Resiliency - Enabling high availability using serverless offerings in multiple regions.
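As one illustration of the API Server pattern, here is a sketch of a Lambda function fronted by an Application Load Balancer. The ALB delivers the HTTP request as an event and expects a response object with `statusCode`, `headers`, and `body` fields; the handler name and the greeting logic are hypothetical.

```python
import json


def api_handler(event, context):
    """Serve an API response from Lambda behind an ALB (sketch).

    The ALB passes query string parameters in the event; the response
    shape below is what the ALB target integration expects back.
    """
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "statusDescription": "200 OK",
        "isBase64Encoded": False,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```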
While we realized several benefits, such as efficient scaling, improved resiliency, and reduced cost, we also learned several lessons along the way.
- Need to optimize deployment architecture - Getting a serverless application to work is a victory in itself. Further benefits can be realized by optimizing how the components are deployed and interact; for example, co-locating service instantiations can reduce latency.
- Need to enable observability - Sometimes the performance of queries or functions could be slow, and without effective tooling, the root cause of that latency was a mystery. Agent-based strategies are not feasible for serverless monitoring, and cloud-native metrics are helpful indicators but don’t always pinpoint a root cause. Monitoring and logging strategies therefore need to be defined and implemented up front.
- Limitations in routing - Using ALBs to support a serverless functions architecture is beneficial, but they do not allow the same level of routing that a traditional NGINX or Apache httpd would.
- Scaling can cause issues - Lambda functions can scale very well in response to a surge in demand. However, if all of the scaled-out functions call the same downstream endpoint, a successful scaling event can produce a distributed denial of service (DDoS) outcome for that endpoint.
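The observability lesson above can be partly addressed in code. One common approach, sketched below under the assumption that logs flow from stdout to a centralized log store (as Lambda does with CloudWatch Logs), is to emit structured JSON log lines carrying a correlation ID so that a single request can be traced across functions. The helper name and fields are hypothetical.

```python
import json
import time


def log_event(level, message, correlation_id, **fields):
    """Emit one structured JSON log line (sketch).

    The correlation_id ties together log lines produced by different
    functions and services while handling the same request, which helps
    pinpoint where latency is introduced.
    """
    record = {
        "timestamp": time.time(),
        "level": level,
        "message": message,
        "correlation_id": correlation_id,
        **fields,  # arbitrary structured context, e.g. duration_ms
    }
    print(json.dumps(record))  # stdout is shipped to the log store
    return record
```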
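To guard against the self-inflicted DDoS scenario above, the scaling of calls to a shared downstream endpoint has to be capped somewhere. AWS offers function-level reserved concurrency for this; a complementary, in-process approach is a token-bucket throttle inside each function instance, sketched below with hypothetical rate parameters.

```python
import time


class TokenBucket:
    """Token-bucket throttle (sketch).

    Even when Lambda scales out, each instance caps the rate at which it
    calls the shared downstream endpoint, so a scaling surge degrades into
    throttled calls instead of overwhelming the dependency.
    """

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec      # tokens refilled per second
        self.capacity = burst         # maximum burst size
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self):
        """Return True if a downstream call may proceed now."""
        now = time.monotonic()
        # Refill tokens based on elapsed time, up to the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller would check `bucket.allow()` before each downstream request and back off (or drop the event back onto a queue) when it returns False.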
We will be sharing more lessons learned at AWS re:Invent in a session called “Serverless at Scale” in hopes that others may learn from our work in this area. While your scale may or may not match our scale, the lessons learned and best practices are universal. We hope you can join us on Thursday, December 5th at 12:15pm PT at the Venetian, Level 4, Marcello 4403, and walk away with practical knowledge for how to apply Serverless at Scale to your needs.