AWS Lambda Java tutorial: Reducing cold starts

From turtle to hurtle

Turtles are oft-maligned as the slowpokes of the animal kingdom, and as a Java engineer in the cloud I feel a sort of spiritual connection to these animals. Join me on this journey and we’ll take our AWS Lambda Java functions from Bertie “the speedster” tortoise to the Zond 5 tortoises, the first animals to fly around the Moon.

Hi, I’m Sean O’Toole. I’m a lead software engineer at Capital One in the UK. I attended AWS re:Invent at the end of 2019 and learned a lot of interesting things related to my role as a backend engineer working in the cloud. I will be writing a series of posts about the cloud engineering practices I learned about at re:Invent, to hopefully instill some of these best practices in the Java community at large. In this first installment - an AWS Lambda Java tutorial - we’re going to look at AWS Lambda and Java best practices to help improve performance around cold start boot times.

A lot of work has already been put in by AWS engineers to help with this problem, such as provisioned concurrency, but we should still endeavour to write more cloud-appropriate code on our side to get the best performance we can. In this AWS Lambda tutorial, we’re going to look at 11 best practices to make our Java code more cloud-appropriate when we’re working in a resource-constrained environment like lambda. This includes reducing dependencies, utilising more lambda-friendly libraries, reducing reflection, and a couple of lambda-specific tips and tricks to get better performance. So without further ado, let's dive right in and start taking your application from turtle to hurtle.

What's the Problem with Java and AWS Lambda?

Java in lambda can be slow. It's pretty well known that Java has a particularly tough time with AWS Lambda cold start execution times, something that has been covered in AWS blogs and is widely documented elsewhere. This behaviour is present even when controlling for things such as creating elastic network interfaces when running in a VPC - though this in itself has been improved recently by AWS changes to VPC networking.

Imagine a big lumbering turtle, covered in ice, waking up and still groggy from hibernation (brumation), and you’re pretty close to what an AWS Lambda cold start feels like using Java. A lot of this comes down to the way that we write our lambda functions using Java. There are things we have come to rely on or take for granted - things such as infrastructure with seemingly limitless CPU and memory, cheap reflection, and a single start-up for the lifetime of the service - that don’t hold true in the lambda environment. If we work within the constraints of the system we’re dealing with and make some improvements with those constraints in mind, we can go from a turtle-on-ice to a turtle on roller skates and say goodbye to sluggish cold-starts.

What's the Solution? Try These AWS Lambda Java Best Practices

Well, we're in luck, because there are a whole bunch of things we can do to improve the performance of our Java functions. In this tutorial I'll walk through 11 AWS Lambda Java best practices and try to give some real-world examples of how they could work.

Measure Current Performance

The first thing you're going to want to do is measure your current performance. Unfortunately, it isn’t quite as simple as grabbing a caliper in one hand and a turtle in the other. Instead, you can do this by deploying your function to AWS, invoking it, and then measuring how long the invocation took to finish. Much like the aforementioned caliper, there are tools available to measure the performance of your lambda. One such tool is Gatling, which you can use to invoke your lambda a bunch of times and then look at the response times in the report it produces, which includes a nice visualisation alongside the raw numbers.

[Figure: Gatling report I generated, showing response-time bar charts and pie charts]
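If you just want a rough number without setting up a load-testing tool, the AWS CLI can invoke the function directly - a minimal sketch, assuming a deployed function named my-function:

    $ aws lambda invoke --function-name my-function --payload '{}' response.json

Each invocation also writes a REPORT line to CloudWatch Logs with the billed duration and memory used, and on cold starts it includes the JVM initialisation time as Init Duration.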

There's also an AWS instrumentation tool called X-Ray that is brilliant. The basic idea is that you instrument your code with X-Ray calls and then examine the results in the AWS Console to see how long things are taking. There are a number of built-in recorders that can monitor HTTP calls, AWS SDK calls, etc., and the tool gives you automatic visualisation of the JVM initialisation time when instrumenting lambda functions.
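If you want to time something the built-in recorders don't cover, you can wrap it in a subsegment yourself. Here's a minimal sketch using the aws-xray-recorder-sdk-core library - the OrderRepository class, the subsegment name, and the DynamoDB lookup it stands in for are all hypothetical:

    import com.amazonaws.xray.AWSXRay;
    import com.amazonaws.xray.entities.Subsegment;

    public class OrderRepository {

        public String loadOrder(String orderId) {
            // Timings for this subsegment show up in the X-Ray console
            // alongside the automatic JVM initialisation timings.
            Subsegment subsegment = AWSXRay.beginSubsegment("load-order");
            try {
                return fetchFromDynamo(orderId);
            } catch (Exception e) {
                subsegment.addException(e);
                throw e;
            } finally {
                AWSXRay.endSubsegment();
            }
        }

        private String fetchFromDynamo(String orderId) {
            return "order-" + orderId; // stand-in for a real DynamoDB call
        }
    }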

Other than this there are third parties that provide similar functionality such as AppDynamics and DataDog that are worth looking into.

Allocate More Memory

Elephants may never forget, but a 200-year-old tortoise has definitely seen some stuff. If you allocate more memory to your lambda function, it'll run faster. Okay, blog post over, can I stop typing now?

Alright, so this one is a bit of a cop-out. But if you find that your 128 MB lambda function is taking too long to start up and execute, you can change it to a 3008 MB lambda function and it'll run a lot faster. This is due to Lambda's linear resource scaling model: CPU power is allocated in proportion to the amount of memory configured, so more memory means more CPU, which means faster functions.

Unfortunately this also means more money. The downside to just scaling all your 128 MB lambda functions to 3008 MB is that it can also cost around 25x as much to run (depending on the number and duration of requests). This may sound pretty bad, but it might not be as huge a deal as it seems - it all depends on your traffic profile. On my team, we have a service with a traffic profile of around 300,000 calls a month, with 99% of calls taking less than 100ms to complete. With this profile, moving from 128 MB to 3008 MB would increase our monthly bill from around $0.06 of compute charges to around $1.47. A big increase in relative terms, but realistically not a big deal.
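For anyone who wants to check my working: Lambda compute is billed per GB-second, and at the time of writing the on-demand rate is roughly $0.0000166667 per GB-second (ignoring the small per-request charge). Our service's profile works out as:

    300,000 requests × 0.1 s × 0.125 GB  ≈  3,750 GB-seconds ≈ $0.06/month
    300,000 requests × 0.1 s × 2.9375 GB ≈ 88,125 GB-seconds ≈ $1.47/month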

However, if your traffic profile is closer to that of a service owned by one of our sister teams, you could be looking at a much larger bill. Their traffic profile is closer to 400,000,000 requests per month, each taking around 300ms - that takes you from $2,500 a month with 128 MB of memory to a staggering $58,750. Additionally, you might have hundreds of services on lambda with traffic profiles between these two extremes, so keeping memory sizes low is definitely in your best interests.

Reduce Dependencies

Reducing the number of dependencies in your deployed jar will have a positive impact on the cold start time of your function. Originally I thought this was related to the size of the jar: more dependencies means packaging those dependencies in your jar, which means a bigger jar file that, in theory, takes longer to pull down and is therefore slower. It turns out that whilst that's true, it has a negligible impact in the grand scheme of things.

The jar is cached pretty much straight away, regardless of cold starts, and the network IO never really becomes a problem until you have a ridiculously large file. The actual problem with having a lot of dependencies is the number of classes you have to load.

Let’s take a sample SpringBoot application that just returns “Hello World” utilising the SpringBoot Getting Started guide.

This project loads ~5,900 classes at startup, and another ~200 upon making the first call. If you’re like me, you’ve never had a need to really look into this. On my laptop it takes about 1.59 seconds to start up the server and accept HTTP calls, and jstat reports all classes had finished loading in about 1.8 seconds. This could be even quicker if you’re running on a big EC2 server with plenty of CPU and memory available.

What we’re missing from this is the fact that ~6,000 classes to serve an HTTP request is a ridiculously large number. For perspective, the basic Java runtime that gets loaded if you run a “Hello World” jar is about 460 classes. The bare minimum to run a lambda using the aws-lambda-java-core dependency adds another 9 classes, so we’re at fewer than 500 classes for a fully functional lambda function. When this is hooked up using API Gateway or an Application Load Balancer, you’re able to serve an HTTP request all the same.
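To make that concrete, here's roughly what that bare-minimum function looks like - a complete, deployable handler whose only dependency is aws-lambda-java-core (the class name is mine):

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;

    // A fully functional lambda: no framework, and fewer than 500 classes
    // loaded including the Java runtime itself.
    public class HelloHandler implements RequestHandler<String, String> {
        @Override
        public String handleRequest(String input, Context context) {
            return "Hello World";
        }
    }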

You can do this experiment yourself using jstat as mentioned before, which will give you the number of classes loaded and how long it took:

    $ java -jar <your-application>.jar
    $ jstat -class <pid> 1000

Alternatively you could use a more fully fledged tool like jVisualVM and perform a heap dump, which gives you a rundown of the classes loaded and how many instances of them are currently being used:

    $ jvisualvm --openpid <pid>

You could even use some verbose class logging when starting your jar, which gives you a list of all the individual classes that you’ve loaded. This way is great for picking up on which dependencies are costing you the most classes:

    $ java -cp your.jar -verbose:class MainClass | grep Loaded > loaded.txt

From this file you can perform some simple grep commands to figure out which dependencies are costing you the most.

For a practical example, let’s take the simple SpringBoot app from earlier. I’ll split the command and output line by line rather than doing a magic one-liner, so it’s easier to follow along.

    $ java -cp build/libs/springboot-app.jar -verbose:class org.springframework.boot.loader.JarLauncher > startup.txt
    $ cat startup.txt | grep Loaded | sort > loaded-classes.txt

This will give you a text file containing around 6,000 lines of classes you’ve loaded from starting up your service which we can inspect to get some data. For example, how many of those classes are direct Spring dependencies?

    $ cat loaded-classes.txt | grep org.springframework | wc -l
    2496

Ouch! That’s a lot of classes.

Now, I’m not advocating for completely re-inventing the wheel everywhere and never using another dependency again. But taking the time to properly inspect the dependencies you’re bringing in, and weighing the benefit of each against the class-loading cost and latency it adds, is something we should all be doing.

FUN TURTLE FACT: On average, sea turtles lay 110 eggs in a nest, and lay between 2 and 8 nests a season. They don’t parent their hatchlings. Talk about reducing dependencies!

 

Use AWS SDK v2

Did you know that a new version of the AWS SDK was published at the end of 2018? I hadn’t used it until recently and was surprised to find out how much it could help in this situation. V1 of the SDK was first available in March 2010, and whilst it continues to receive updates even today, the API and underlying infrastructure are still very similar to how they were built ten years ago. V2 was released for general consumption more recently, in November 2018, and is much better optimised for serverless platforms like lambda. Some of the benefits of using V2 over V1 are that it contains fewer dependencies, allows for non-blocking IO, and has better configuration options than the original library, including the ability to customise the HTTP library it uses via its pluggable HTTP layer.

Don't forget to exclude transitive dependencies that you aren't using - V2 brings in both the Netty and Apache HTTP clients, which you typically won't need for most use cases; an exclusion like the sketch below keeps them out of your jar.
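For example, in Gradle - a sketch using the Groovy DSL, with an illustrative version number (netty-nio-client, apache-client, and url-connection-client are the real SDK v2 module names):

    dependencies {
        implementation('software.amazon.awssdk:dynamodb:2.17.100') {
            // Drop the heavyweight HTTP clients we won't be using.
            exclude group: 'software.amazon.awssdk', module: 'netty-nio-client'
            exclude group: 'software.amazon.awssdk', module: 'apache-client'
        }
        // The lightweight JDK-based client instead (see the next section).
        implementation 'software.amazon.awssdk:url-connection-client:2.17.100'
    }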

Still using V1? Amazon provides a migration guide for updating your code to use V2 of the library instead of V1.

 

Use a Basic HTTP Client

The AWS SDK ships with a bunch of different HTTP client libraries that can be used to make SDK calls. These are all-singing, all-dancing libraries that can do connection pooling and a whole bunch of other stuff that's really useful in scenarios like a long-lived server process.

Lambda isn't one of those scenarios. Realistically, most of the time you're going to make one or perhaps a few HTTP calls and then return a result to your client. You're unlikely to ever get a chance to use the other connections you've spun up, as a lambda function can only handle a single incoming request at a time - any concurrent requests go to a separate lambda instance (with its own connection pool). And by the time you've passed back your response you'll have finished with the connection you were using, so it's available for the next request to this lambda anyway.

The built-in Java HTTP client should be good enough for pretty much every use case in a lambda function, so use it where you can unless you have a compelling reason to use something more complex.
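With the AWS SDK v2, that means the url-connection-client module, which wraps the JDK's own HttpURLConnection - a minimal sketch:

    import software.amazon.awssdk.http.urlconnection.UrlConnectionHttpClient;
    import software.amazon.awssdk.services.dynamodb.DynamoDbClient;

    // No connection-pool machinery to initialise and far fewer classes
    // to load than the Netty or Apache clients.
    DynamoDbClient dynamo = DynamoDbClient.builder()
            .httpClient(UrlConnectionHttpClient.builder().build())
            .build();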

Fully Specify AWS SDK Clients

Okay, so you've moved from V1 to V2 of the AWS SDK, but there are still more improvements you can make. One of these is to fully specify the configuration for the individual SDK clients rather than using the auto-discovery you get as part of the provider chains.

You have control of the environment and your lambda function, so you should know where your credentials are coming from, the region you're running in, the service endpoint for the AWS service you're using, etc. By specifying these up front, you ensure the SDK does no more work than necessary during initialisation. A good example: if you don't specify an endpoint override, the SDK reads and parses a big JSON file containing all the endpoints for all services in all regions using Jackson - so you're spending IO and reflection to work out something you already knew.
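Putting that together, a fully specified client might look like this - a sketch assuming a DynamoDB client in eu-west-1 (Lambda always supplies credentials through environment variables, so the environment provider is safe to pin here):

    import java.net.URI;
    import software.amazon.awssdk.auth.credentials.EnvironmentVariableCredentialsProvider;
    import software.amazon.awssdk.regions.Region;
    import software.amazon.awssdk.services.dynamodb.DynamoDbClient;

    DynamoDbClient dynamo = DynamoDbClient.builder()
            // Pin the region: no environment/profile auto-discovery needed.
            .region(Region.EU_WEST_1)
            // Lambda injects credentials as environment variables.
            .credentialsProvider(EnvironmentVariableCredentialsProvider.create())
            // Skip reading and parsing the bundled endpoints JSON.
            .endpointOverride(URI.create("https://dynamodb.eu-west-1.amazonaws.com"))
            .build();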

Remove Expensive Dependency Injection Frameworks

Whilst injections might be good for turtles, they’re not quite as healthy for Java in the lambda environment. A lot of the most commonly used dependency injection frameworks can be expensive to run. They'll use classpath scanning and reflection to create a whole bunch of objects when you ask them to and link them all together. This doesn't play nicely in lambda, as reflection is really slow when memory and CPU are limited.

If you're using a reflection based dependency injection framework, you've got two options for speed increases:

  1. Move to a framework that isn't reflection based - such as Dagger. This uses annotations to pre-generate Java source code at compile time so you don't have to do any reflection or bytecode generation later on- which is a lot faster.
  2. Remove the dependency injection framework altogether. This might sound drastic, but stay with me. If your lambda function is small enough (which it should be), the benefits you get from having a whole framework for dependency injection aren't as obvious as you might think. A three-tier SpringBoot application with dependencies multiple layers deep definitely benefits from having a dependency injection framework handle everything for you. But if we’re writing small lambda functions that don’t have several layers, a DI framework is a lot less useful. A useful lambda could have a hierarchy as small as Handler > Service > DynamoClient, which is easy to manage by hand: just initialise the objects yourself and pass them into constructors, as in the sketch after this list.
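Here's what that hand-wired chain might look like - a sketch where Request, Response, and OrderService are hypothetical application classes:

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;
    import software.amazon.awssdk.services.dynamodb.DynamoDbClient;

    public class Handler implements RequestHandler<Request, Response> {

        private final OrderService service;

        public Handler() {
            // Handler > Service > DynamoClient, wired by hand: no classpath
            // scanning, no reflection, no framework classes to load.
            DynamoDbClient dynamo = DynamoDbClient.builder().build();
            this.service = new OrderService(dynamo);
        }

        @Override
        public Response handleRequest(Request request, Context context) {
            return service.process(request);
        }
    }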

Eliminate Reflection

Mirror, mirror, on the lake, what other changes shall I make? Get rid of reflection, or at least do your very best to reduce it where possible. As mentioned above, reflection is really slow in memory-constrained environments such as lambda. This means you should definitely try to avoid doing it yourself where possible, maybe even going as far as using different libraries than the ones we might regularly reach for without a second thought. Dependency injection is an obvious one from the previous point, and we’ve gone over some options for that. There are other areas, such as JSON marshalling and unmarshalling with Jackson, that can sometimes be replaced in the same way by substituting code generation libraries such as Moshi.

This can be easier said than done, as the AWS SDK itself uses Jackson to perform its own unmarshalling. However, you might be able to get away with using something different in your own code. In some of our services we’ve had good experiences using Moshi for our ALB request and response marshalling and unmarshalling, implementing RequestStreamHandler and operating directly on input and output streams, which is faster than Jackson marshalling. A minimal version of that pattern looks something like the sketch below.
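A sketch of the stream-handler pattern, with hypothetical Request and Response types (note the Moshi adapters are created once, at initialisation time):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestStreamHandler;
    import com.squareup.moshi.JsonAdapter;
    import com.squareup.moshi.Moshi;
    import okio.BufferedSink;
    import okio.Okio;

    public class StreamHandler implements RequestStreamHandler {

        private static final Moshi MOSHI = new Moshi.Builder().build();
        private static final JsonAdapter<Request> REQUESTS = MOSHI.adapter(Request.class);
        private static final JsonAdapter<Response> RESPONSES = MOSHI.adapter(Response.class);

        @Override
        public void handleRequest(InputStream in, OutputStream out, Context context) throws IOException {
            // Read straight off the stream: no intermediate Jackson tree.
            Request request = REQUESTS.fromJson(Okio.buffer(Okio.source(in)));
            Response response = process(request);
            try (BufferedSink sink = Okio.buffer(Okio.sink(out))) {
                RESPONSES.toJson(sink, response);
            }
        }

        private Response process(Request request) {
            return new Response(); // stand-in for the real business logic
        }
    }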

Initialise Dependencies At Initialisation Time

You know the story about the tortoise and the hare? How even though the hare started off fast he got complacent and ended up losing the race? Well Lambda functions aren’t like that at all- the faster you start the faster you finish, and there’s a little trick you can use to make your tortoise start the race like a hare.

Let’s start with the theory. Lambda functions go through two stages when they're invoked: initialisation and runtime. Initialisation is only run when the lambda function is starting without an execution context - this is what we call a "cold start." After that, Lambda will try to reuse the execution context and therefore only run the runtime stage. The initialisation stage is responsible for everything that makes your function code invokable - JVM startup, object initialisation, etc. - everything required so that your handle method can be invoked in the runtime stage.

A less advertised fact of this two-stage approach is that you get boosted access to the CPU during initialisation, which is then throttled down for runtime. This means that any expensive operations are better off done at initialisation time, as they will complete more quickly with the access to more CPU.

In the real world this probably won't be an issue for most implementations - anything created as part of object instantiation happens at initialisation time. This includes static fields and blocks, instance fields and blocks, and constructor invocation. Most of the time we’ll be creating our dependencies at initialisation time, right? But it’s not an uncommon pattern to defer object creation, lazily loading something on first invocation because it’s expensive - inadvertently leaving it to be created while under CPU throttling, which might make it end up taking longer than if we’d just created it up front.
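As a sketch of the difference, with DynamoDbClient standing in for any expensive-to-create dependency (Request and Response are hypothetical, as before):

    public class Handler implements RequestHandler<Request, Response> {

        // Created while the function initialises, with boosted CPU available.
        private static final DynamoDbClient EAGER = DynamoDbClient.builder().build();

        private DynamoDbClient lazy;

        @Override
        public Response handleRequest(Request request, Context context) {
            if (lazy == null) {
                // Anti-pattern in lambda: this now runs under throttled CPU,
                // so it can take longer than it would have at init time.
                lazy = DynamoDbClient.builder().build();
            }
            // ...
            return new Response();
        }
    }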

Prime Dependencies At Initialisation Time

This one will probably be more useful than the previous step. After we've initialised our SDK clients and gotten them set up during the initialise stage, that ought to mean that we can use them in the runtime stage without issue, right?

Not so much. A lot of the AWS SDKs are lazily loaded- so even if you initialise the DynamoDB client beforehand, a lot of the actual initialisation won't happen until you come to make your GetItem or PutItem call. For example, PutItem in DynamoDB can take nine seconds to initialise Jackson marshallers, initiate connections, etc. the first time you call it when running with a small memory footprint.

You can bypass this expensive runtime initialisation by using the trick from above, but this time we're going to be making "priming" method calls rather than just instantiating objects. This doesn't feel nice, but if you move that DynamoDB PutItem call into the initialise stage, it can take as little as 700ms with access to the boosted CPU, and then subsequent calls in the runtime stage will have a primed client ready to go.

It might be worth building this into your “healthcheck” calls: doing a GetItem call with your DynamoDB client for a known item will both prime your client and check your connection to DynamoDB, making for a more useful healthcheck.
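A sketch of the priming trick, assuming a DynamoDB-backed function (the table name and key are hypothetical):

    import java.util.Map;

    import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
    import software.amazon.awssdk.services.dynamodb.model.GetItemRequest;

    public class Handler implements RequestHandler<Request, Response> {

        private static final DynamoDbClient DYNAMO = DynamoDbClient.builder().build();

        static {
            // Priming call made at initialisation time: forces the SDK to set up
            // its lazy marshallers and open a connection under boosted CPU.
            DYNAMO.getItem(GetItemRequest.builder()
                    .tableName("orders")
                    .key(Map.of("id", AttributeValue.builder().s("known-item").build()))
                    .build());
        }

        // ... handleRequest then uses the primed DYNAMO client ...
    }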

Use Native Executables via GraalVM

This section definitely deserves a deeper treatment than this blog post will give it, but one of the big takeaways for future development was to use a tool such as GraalVM in conjunction with Micronaut or Quarkus to produce native executables that run directly on the underlying OS rather than executing on a JVM. It looks reasonably complicated to set up and there's a lot of information I won't be able to go through here, but it could be something interesting to look at further. This approach isn't exactly mature yet, so I would exercise caution before using it in a production service - but things are moving quickly in this area and it’s definitely something to pay attention to.

Resources to learn more:

  1. The links above to the individual products contain Getting Started guides that will get you going relatively quickly.
  2. This blog post dives into creating a HelloWorld application using Quarkus, which can get you up and running in a matter of minutes.
  3. This post on Opsgenie goes in depth on the benefits of GraalVM, with an example of native Java running in AWS Lambda using the Golang runtime.

Conclusion

There are plenty of best practices we can follow to reduce AWS Lambda cold start times when using Java. We’ve covered areas such as measuring your current performance, reducing the number and the complexity of your dependencies, and the importance of reducing reflection. Additionally, we’ve briefly touched on some things like Quarkus and GraalVM that warrant deeper dives in the future.

I hope you’ve enjoyed this AWS Lambda Java tutorial. To learn more about the things we’ve covered, I recommend watching this AWS re:Invent session on best practices for AWS Lambda and Java. It's pretty neat - you should check it out. But sadly, there aren’t any turtle references.


Sean O'Toole, Dev Engineer III, UK Tech Software Engineering

I am a keen technologist and I run a website that aims to teach TDD and Agile methodologies using katas (http://agilekatas.co.uk), which gets over 500 visits a month from all over the world and has numerous solutions posted on GitHub by myself and others. I also contribute to open source software hosted on GitHub (https://github.com/SeanOToole/), which aims to help ease development for Java and Hybris developers. I have a number of personal projects in areas I am interested in - I have developed games for PC and mobile, and also utility apps for Android.
