Enhancing serverless observability with Python

Key strategies and tools to leverage Python for effective serverless observability.

Python’s vibrant ecosystem continues to power the serverless computing space. Across each of the major cloud providers, and across workloads from serverless functions to big data, Python is the preferred language for many developers. In this article, we’re going to focus exclusively on serverless functions and an oft-cited challenge: observability.

What it means to be serverless

Serverless cloud computing applications involve a trade-off, where developers give up control of the compute resources that underlie their code executions and hand that control to the cloud infrastructure provider. This shift allows developers to focus on the value their code delivers, rather than on the operational requirements associated with running servers as virtual machines. The model is also appealing because billing is on demand and based on compute time, which may be more cost-effective. Meanwhile, the provider manages all hypervisor, operating system and runtime upgrades to ensure that the developers’ code remains secure, compliant and isolated.

Given that all major providers support Python as a native runtime, developing serverless applications powered by Python just makes sense!

How to observe serverless applications

A common concern when deciding to adopt a serverless architecture is observability. Without the ability to use secure shell (SSH) to access the underlying operating system, it can seem difficult to:

  • Access your log files on the server

  • Access top-style data (process-level CPU and memory usage) about your application performance

  • Aggregate business context about what your application is doing

  • Interrogate your application to see what happened in any specific transaction

It may be surprising to learn that all of these tasks are achievable within the serverless stack, and that implementing comprehensive observability in a serverless manner is a better practice than logging into an instance or container interactively.

Serverless observability starts with thoughtful application design. An observable system requires as much planning and design as test-driven development or ensuring that the business logic is handled correctly. Unlike most design considerations, though, this one requires very little new code: serverless platforms have evolved to the point where, in many cases, existing code only needs to be instrumented.
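To make "only needs to be instrumented" concrete, here is a minimal sketch using nothing but the standard library. The `instrumented` decorator, the `events` list and the `handler(event, context)` signature are all assumptions made for illustration; in a real function the events would go to a logging or metrics backend rather than a list.

```python
import functools
import time


def instrumented(sink):
    """Decorator factory: appends one event per call to `sink` (a plain list here)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            outcome = "error"  # assume failure until the call returns normally
            try:
                result = func(*args, **kwargs)
                outcome = "success"
                return result
            finally:
                # Runs on both success and failure, so every call is recorded.
                sink.append({
                    "function": func.__name__,
                    "outcome": outcome,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                })

        return wrapper

    return decorator


events = []  # hypothetical stand-in for an observability backend


@instrumented(events)
def handler(event, context):
    # Business logic is unchanged; the decorator adds the observability.
    return {"statusCode": 200, "body": event["body"]}


response = handler({"body": "ok"}, None)
```

The design choice worth noticing is that the handler body contains no observability code at all, which is what keeps the instrumentation cheap to add to existing functions.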

Understanding logs, metrics and telemetry in serverless observability

Just as we write modern code using behavior-driven development (BDD) and test-driven development (TDD), we use Python observability-driven development (ODD) to build modern distributed applications, including those based on serverless technology. This means that we need to instrument code in a meaningful way and design the windows into our application’s behavior intentionally.

This can be broken down into three basic components: logs, metrics and telemetry.

Let’s consider a simple API for scheduling payments. The application should be able to:

  • Log an identifying attribute of the account.

  • Emit a metric to indicate a successful payment was scheduled.

  • Send telemetry to identify which parts of the scheduling process are being invoked in real time (e.g., a commit to the database and a message to queue up the payment when able).

Figure: The observability-driven development cycle: instrument the code, measure and analyze findings, improve with changes.

Logs

Logs are the most familiar part of this trio to most software engineers. For almost 25 years, standard logging formats have been published in documents such as RFC 3164 (since obsoleted by RFC 5424) for the Syslog protocol, and virtually all modern languages have logging libraries that incorporate these conventions, including log levels. The Python standard library’s logging module is no exception. Logs are statements of fact, such as the line of code where an error was encountered, what happened to the customer experience or simply that a specific part of the application fired.

Logs have become such a standard monitoring tool in our development toolbox that they are often the first answer to any observability question. This reliance on logs can be likened to the adage “When you’re comfortable with the hammer, everything starts looking like a nail.” However, while logs are crucial, relying solely on them, particularly as a way to aggregate data, can lead to problems that are difficult to solve.
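For the payment-scheduling example, a log line that carries an identifying attribute of the account might look like the sketch below. It uses only the standard library’s logging module; the `JsonFormatter` class, the `payments` logger name and the `account_id` field are assumptions chosen for illustration, and the in-memory stream stands in for wherever a serverless platform would collect output.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so a log backend can index its fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Hypothetical business context, attached by the caller via `extra=`.
            "account_id": getattr(record, "account_id", None),
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("payments")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called more than once
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def schedule_payment(logger, account_id, amount_cents):
    # A statement of fact: this part of the application fired for this account.
    logger.info("payment scheduled", extra={"account_id": account_id})
    return {"status": "scheduled", "amount_cents": amount_cents}


buf = io.StringIO()  # stand-in for the platform's log stream
log = build_logger(buf)
result = schedule_payment(log, "acct-123", 2500)
```

Emitting one JSON object per line, rather than free-form text, is what lets a log backend filter by `account_id` without fragile string parsing.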

Metrics

Metrics are numeric measurements taken over time. Examples include the duration of function execution, the number of executions that timed out or business metrics like the outcome of the execution. Performance metrics are powerful because they allow you to summarize and analyze data by using simple sums and averages as well as descriptive statistical measurements such as percentile values.
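The sums, averages and percentiles mentioned above can be sketched in a few lines. The `InMemoryMetrics` class, the metric names and the sample durations are all hypothetical; a real serverless function would send these datapoints to a metrics backend, which would do the aggregation.

```python
import math
from collections import defaultdict


class InMemoryMetrics:
    """Hypothetical stand-in for a metrics backend: keeps datapoints per metric name."""

    def __init__(self):
        self.points = defaultdict(list)

    def emit(self, name, value):
        self.points[name].append(value)

    def total(self, name):
        return sum(self.points[name])

    def mean(self, name):
        values = self.points[name]
        return sum(values) / len(values)

    def percentile(self, name, p):
        """Nearest-rank percentile: smallest value with at least p% of points at or below it."""
        ordered = sorted(self.points[name])
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]


metrics = InMemoryMetrics()

# Made-up execution durations in milliseconds; one call is a slow outlier.
for duration_ms in [120, 95, 110, 2300, 105, 98, 101, 99, 130, 102]:
    metrics.emit("DurationMs", duration_ms)
    metrics.emit("PaymentScheduled", 1)  # business metric: one per success

scheduled = metrics.total("PaymentScheduled")  # 10
average = metrics.mean("DurationMs")           # 326.0
p50 = metrics.percentile("DurationMs", 50)     # 102
p99 = metrics.percentile("DurationMs", 99)     # 2300
```

In this sample the single slow call pulls the average to 326 ms while the median stays at 102 ms, which is why percentile values are usually more informative than averages for latency.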

Telemetry

Telemetry, on the other hand, provides actionable insights into code execution for each API call made or the methods invoked, depending on the code’s instrumentation. The scale of telemetry sets it apart. While telemetry measurements are valuable for any given serverless function, they become more insightful when they are taken as distributed telemetry. This approach brings together a full user flow with the same level of visibility and the ability to drill down in each specific area.
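As a rough illustration of the idea, the sketch below records one timed span for each step of the payment-scheduling flow and tags every span with a shared trace ID, which is what allows spans from different functions to be stitched into one user flow. The `Tracer` class, the span names and the `trace-abc` ID are assumptions for illustration only; a production tracing SDK would also export the spans to a backend.

```python
import time
import uuid
from contextlib import contextmanager


class Tracer:
    """Hypothetical minimal tracer that collects spans in memory."""

    def __init__(self, trace_id=None):
        # Reusing an upstream trace ID is what makes the telemetry distributed.
        self.trace_id = trace_id or uuid.uuid4().hex
        self.spans = []
        self._stack = []

    @contextmanager
    def span(self, name):
        record = {
            "trace_id": self.trace_id,
            "name": name,
            "parent": self._stack[-1] if self._stack else None,
            "error": None,
        }
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield record
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            record["duration_ms"] = (time.perf_counter() - start) * 1000
            self._stack.pop()
            self.spans.append(record)  # spans are stored in completion order


def schedule_payment(tracer):
    with tracer.span("schedule_payment"):
        with tracer.span("db.commit"):
            pass  # placeholder for the database write
        with tracer.span("queue.send"):
            pass  # placeholder for queueing the payment message


tracer = Tracer(trace_id="trace-abc")  # ID assumed to arrive from the caller
schedule_payment(tracer)
span_names = [span["name"] for span in tracer.spans]
```

Each span records its parent, so a backend can rebuild the call tree and let you drill down from the whole request into the database commit or the queue message.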

Adopting Python observability-driven development for serverless success

Serverless observability is not just a possibility; it’s an essential practice for modern application development. By leveraging Python’s robust ecosystem and the inherent capabilities of serverless infrastructure, developers can create highly observable systems that offer deep insights into application performance and behavior.

Designing for comprehensive observability is achievable and highly beneficial. By ensuring your serverless applications are observable from the start, you can save time and add significant value to your projects. This approach enhances both quality and functionality. 

Adopting Python ODD allows you to design your applications with built-in monitoring, logging and telemetry, ensuring you can proactively identify and address issues. This approach not only improves the reliability and performance of your serverless functions but also enhances your ability to deliver high-quality software rapidly.

As the serverless landscape continues to evolve, integrating an observability solution from the start will become increasingly important for your cloud environment. The strategies and practices discussed here will help you navigate the complexities of serverless observability, ensuring your applications are resilient, efficient and ready for future challenges.

Check out our full PyCon 2024 presentation: ‘Python Powered Serverless Observability’

For an even deeper dive into Python and how it powers the majority of the serverless world, check out what my colleague Brian McNamara, distinguished engineer, and I said during our PyCon 2024 speaking session. In our session, you’ll explore the community libraries that exist to improve application observability, including a step-by-step instrumentation of code. 

After watching this session recording, you’ll walk away with a clear understanding of how to design observability into your serverless development, as well as some fundamental tools that will enable you to effectively scale your services.

Dan Furman, Distinguished Engineer

Dan is a solutions architect, open source enthusiast, and cloud native advocate. With 15 years of experience, he is inspired by the innovation, speed, and trends that become best practices across programming languages. Dan's on a mission to make software delivery approachable, strategic, cost effective, and timely by thoughtfully building our toolbox.