Spark tuning: executor optimization for performance
Learn how Spark executor tuning improves performance with fat, thin and optimal executors for efficient applications.
Introduction to Spark executor tuning
Working with Spark can be intimidating for new users: there are distributed computing concepts to understand, parameters to tune and environments to configure, all on top of writing the application code itself. It’s easy to use surface-level Spark knowledge to create an application that works, but developing efficient applications and achieving real performance improvements require a deeper dive into the details of Spark. One such detail is Spark executor tuning.
Spark drivers, workers and executors
Spark is an open source unified analytics engine for distributed processing of large volumes of data. Computations are split across the nodes of a cluster, enabling parallel processing and, in turn, faster execution. Each Spark application runs on a driver node and a set of worker nodes.
The role of the driver in Spark applications
The driver is the “brain,” running the main method of the application. It is responsible for building execution plans and orchestrating the computational tasks performed: it analyzes, schedules and sends tasks to the worker nodes.
The role of worker nodes in Spark clusters
In contrast, the workers are the “brawn.” Each worker carries out computation tasks and returns the results to the driver.
The role of executors in Spark jobs
Spark jobs are broken down into a series of stages, each composed of tasks that run in parallel on worker node executors. A task is the smallest unit of work in a Spark application. A Spark executor is a process that runs on a worker node in a cluster and executes tasks assigned to it by the driver. Executors perform the actual data computations of a Spark application.
Each executor is allocated a certain amount of memory and CPU cores for storing data and performing computation tasks. By default, Spark creates a single executor for each worker node in a cluster, but users can change the number of executors and the memory and CPU allocated to each one. Depending on the application, tuning these values can often improve performance.
Details for executors can be configured by using the following Spark parameters when setting up an application: --num-executors, --executor-cores and --executor-memory.
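These parameters can be passed to spark-submit or set programmatically through the equivalent Spark properties. Here is a minimal PySpark sketch of the latter; the values are placeholders, not recommendations:

```python
from pyspark import SparkConf

# The spark-submit flags above map to these Spark properties
# (placeholder values, not recommendations):
conf = (
    SparkConf()
    .set("spark.executor.instances", "4")  # --num-executors
    .set("spark.executor.cores", "4")      # --executor-cores
    .set("spark.executor.memory", "8g")    # --executor-memory
)
```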
Fat vs. thin vs. optimal executors in Spark
Executors in Spark can be configured in different ways—fat, thin or optimally sized—each with trade-offs in performance, cost and fault tolerance.
Fat executors in Spark
Since Spark creates one executor for each worker by default, these executors are “fat.” They contain all the CPU cores and memory available to the worker node. Fat executors can be beneficial for certain use cases, such as when an application is processing a large amount of data or when managing several executors becomes a concern.
- Since a fat executor has more cores, more tasks can be run in parallel (typically 1 core = 1 task), which can improve application performance.
- An additional benefit of fat executors is enhanced data locality, as there is a greater chance of data being processed on a node where it is already stored. This reduces the amount of network traffic (data sent between workers), speeding up the application.
While there are benefits to using fat executors, there are potential downsides:
- Because all the memory and CPU for a worker sit on one executor, resources can be underutilized: if the running tasks don’t need every core or all of the memory, the excess sits idle and cannot be allocated elsewhere.
- Fat executors have lower fault tolerance, as all the resources for a worker are contained in a single executor. If that one executor fails, the worker’s entire share of the work (and any cached data) is lost and must be recomputed.
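To make the fat case concrete, here is a hedged sketch for a hypothetical worker node with 16 cores and 64 GB of RAM; the numbers are illustrative only:

```python
from pyspark import SparkConf

# Hypothetical "fat" layout: one executor owns an entire 16-core / 64 GB
# worker (in practice, you would leave headroom for the OS and overhead).
fat_conf = (
    SparkConf()
    .set("spark.executor.cores", "16")
    .set("spark.executor.memory", "64g")
)
```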
Thin executors in Spark
In direct contrast to fat executors are thin executors. Thin executors are minimally sized, oftentimes containing a single CPU core (or a small number of cores) and a fraction of the memory available to a worker node.
- Similar to fat executors, thin executors also increase parallelism, in this case because there are more executors available.
- There is better fault tolerance due to the number of executors available in the cluster. If one executor fails, only a small slice of the work is lost, and because each thin executor holds relatively little data, recomputation is cheap.
Thin executors are not perfect either, and there are negatives:
- A high amount of network traffic occurs between the driver and executors on each worker node. Since thin executors have less memory and fewer CPU cores, there will be more data sent across more executors as the driver assigns tasks and receives the results from workers.
- For similar reasons, there is reduced data locality when using thin executors. More data is spread over more executors, and each executor has a smaller amount of memory, preventing it from storing a large quantity of data partitions locally.
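For contrast, here is a hedged sketch of a thin layout on the same hypothetical 16-core / 64 GB worker; again, the numbers are illustrative only:

```python
from pyspark import SparkConf

# Hypothetical "thin" layout: sixteen single-core executors, each holding
# a small slice of the node's memory (roughly 64 GB / 16, minus headroom).
thin_conf = (
    SparkConf()
    .set("spark.executor.cores", "1")    # 1 core = 1 task at a time
    .set("spark.executor.memory", "3g")
)
```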
Optimally sized executors in Spark
As the name implies, optimally sized executors are configured to contain the ideal amount of memory and CPU cores for each executor on a worker node. Optimal executors are the Goldilocks solution: not too big, not too small, just right. This can lead to improved application performance by potentially reducing the run time of the application while better utilizing the resources configured. Optimal executors are determined by following the rules below.
Rules for sizing optimal Spark executors
Sizing Spark executors correctly requires following a few best-practice rules (a helper applying them is sketched after this list):
- On each worker node, leave out 1 CPU core and 1 GB of RAM for the operating system.
- Remove 1 executor (or 1 core and 1 GB of RAM) at the cluster level to account for resource management (for example, the YARN ApplicationMaster).
- When calculating executor memory, set aside room for the memory overhead of internal system processes: max(384 MB, 10% of executor memory).
- Ideally, have 3-5 CPU cores on each executor.
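These rules are simple enough to automate. The following is a minimal Python sketch that applies them to a cluster’s specs; the size_executors function is our own illustration, not a Spark API:

```python
def size_executors(nodes, cores_per_node, ram_gb_per_node, cores_per_executor=5):
    """Apply the sizing rules above. Illustrative helper, not a Spark API."""
    # Rule 1: reserve 1 core and 1 GB of RAM per node for the OS.
    cluster_cores = nodes * (cores_per_node - 1)
    cluster_ram_gb = nodes * (ram_gb_per_node - 1)

    # Rule 2: reserve 1 more core and 1 more GB at the cluster level
    # for resource management.
    cluster_cores -= 1
    cluster_ram_gb -= 1

    # Rule 4: group the remaining cores into executors of 3-5 cores each.
    num_executors = cluster_cores // cores_per_executor

    # Rule 3: subtract memory overhead of max(384 MB, 10% of executor memory).
    raw_gb = cluster_ram_gb / num_executors
    overhead_gb = max(0.375, 0.10 * raw_gb)  # 384 MB is roughly 0.375 GB
    return num_executors, cores_per_executor, int(raw_gb - overhead_gb)

# The example cluster from the next section: 5 nodes, 12 cores and 48 GB each.
print(size_executors(5, 12, 48))  # -> (10, 5, 21)
```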
Example Spark configuration across 5 worker nodes
Given what we now know about executors, let’s walk through an example. We’ll use a sample Spark configuration with 5 worker nodes, each containing 12 CPU cores and 48 GB of RAM, for a cluster total of 60 cores and 240 GB of RAM.
After following rule 1, we are left with 11 cores and 47 GB of RAM on each worker node.
Across the cluster (all 5 nodes), we are left with 55 cores and 235 GB of RAM.
Now, we follow rule 2. We could remove one full executor (which would likely be better in a thin executor case, where we have many executors), but for this example, we will remove 1 GB of RAM and 1 core at the cluster level, leaving 54 cores and 234 GB of RAM.
We want to use optimally sized executors across these 5 nodes and need to take rule 4 into account, leveraging 3-5 cores per executor. We’ll choose 5 cores per executor, giving us 54 / 5 = 10 executors (rounding down), or 2 per worker node, with 234 GB / 10 = 23.4 GB of RAM each. Applying rule 3, we set aside max(384 MB, 10% of 23.4 GB) = 2.34 GB for memory overhead, leaving roughly 21 GB per executor.
The Spark configurations set for this example would be as follows:
- --num-executors = 10
- --executor-cores = 5
- --executor-memory = 21 GB
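In PySpark form, a minimal sketch of the same settings could look like this (the application name is a placeholder):

```python
from pyspark.sql import SparkSession

# Equivalent to:
#   spark-submit --num-executors 10 --executor-cores 5 --executor-memory 21g ...
spark = (
    SparkSession.builder
    .appName("optimally-sized-executors")  # placeholder name
    .config("spark.executor.instances", "10")
    .config("spark.executor.cores", "5")
    .config("spark.executor.memory", "21g")
    .getOrCreate()
)
```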
In relation to the whole cluster, this works out to 2 executors per worker node, each with 5 cores and roughly 21 GB of usable memory, with the remaining cores and memory reserved for the OS, resource management and memory overhead.
Conclusion: Spark executor tuning matters
Spark executor tuning matters because performance, cost efficiency and reliability all hinge on how executors are configured. Hopefully, this article has provided a deeper understanding of what Spark executors are and how tuning them can lead to better performance and potential cost savings. As with many concepts in Spark, executor tuning is not an exact science and will likely require some trial and error. Use the example above as a template to crunch the numbers and find the optimal configuration for your own application. Enjoy tuning!

