Spark tuning: executor optimization for performance

Learn how Spark executor tuning improves performance with fat, thin and optimal executors for efficient applications.

Introduction to Spark executor tuning

Working with Spark can be intimidating for new users. There are distributed computing concepts to understand, parameters to tune and environments to configure, on top of writing the application code itself! It’s easy to use surface-level Spark knowledge to create an application that works, but developing efficient applications and achieving real performance improvements require a deeper dive into the details of Spark. One such case is Spark executor tuning.

Spark drivers, workers and executors

Spark is an open source unified analytics engine for distributed processing of large volumes of data. Computations are split across a cluster of machines, enabling parallel processing and, in turn, faster execution. Each Spark application runs on a driver node and a series of worker nodes.

The role of the driver in Spark applications

The driver is the “brain,” running the main method of the application. The driver is responsible for building execution plans and orchestrating the computational tasks performed. It analyzes, schedules and sends tasks to worker nodes.

The role of worker nodes in Spark clusters

In contrast, the workers are the “brawn.” Each worker carries out computation tasks and returns the results to the driver.

Diagram of a Spark job with driver and worker nodes running parallel tasks across executors in two stages.

The role of executors in Spark jobs

Spark jobs are broken down into a series of stages, each composed of tasks that run in parallel on worker node executors. A task is the smallest unit of work in a Spark application. A Spark executor is a process that runs on a worker node in a cluster and executes tasks assigned to it by the driver. Executors perform the actual data computations of a Spark application. 

Each executor is allocated a certain amount of memory and CPU cores for storing data and performing computation tasks. By default, Spark creates a single executor for each worker node in a cluster, but users can change the number of executors and the memory and CPU allocated to each executor. This can often lead to improved performance depending on the application.

Details for executors can be configured by using the following Spark parameters when setting up an application: --num-executors, --executor-cores and --executor-memory.
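As a sketch, a hypothetical application might be submitted with these flags (the script name and the specific values are illustrative placeholders, not recommendations):

```shell
# Submit an application with explicit executor sizing.
# my_app.py and the values shown are placeholders.
spark-submit \
  --num-executors 10 \
  --executor-cores 5 \
  --executor-memory 21G \
  my_app.py
```

The same settings can also be supplied as the configuration properties spark.executor.instances, spark.executor.cores and spark.executor.memory.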

Fat vs. thin vs. optimal executors in Spark

Executors in Spark can be configured in different ways—fat, thin or optimally sized—each with trade-offs in performance, cost and fault tolerance.

Fat executors in Spark

Since Spark creates one executor for each worker by default, these executors are “fat.” They contain all the CPU cores and memory available to the worker node. Fat executors can be beneficial for certain use cases, such as when an application is processing a large amount of data or when the overhead of managing many executors becomes a concern.

  • Since a fat executor has more cores, more tasks can be run in parallel (typically 1 core = 1 task), which can improve application performance.
  • An additional benefit of fat executors is enhanced data locality, as there is a greater chance of data being processed on a node where it is already stored. This reduces the amount of network traffic (data sent between workers), speeding up the application.

While there are benefits to using fat executors, there are potential downsides:

  • As all the memory and CPU for a worker sit on one executor, there is potential for resources being underutilized if some cores or portions of memory remain unused.
  • Fat executors have a lower fault tolerance in the event of an error, as all the resources for a worker are contained on a single executor. If that one executor goes down, the whole worker goes down.

Thin executors in Spark

In direct contrast to fat executors are thin executors. Thin executors are minimally sized, oftentimes containing a single CPU core (or a small number of cores) and a fraction of the memory available to a worker node.

  • Similar to fat executors, thin executors also increase parallelism, in this case because there are more executors available.
  • There is better fault tolerance due to the number of executors available in the cluster. If an executor goes down, it’s not the end of the world: the amount of data being processed on each executor is small due to its limited memory, so recomputation is easier.

Thin executors are not perfect either, and there are negatives:

  • A high amount of network traffic occurs between the driver and executors on each worker node. Since thin executors have less memory and fewer CPU cores, there will be more data sent across more executors as the driver assigns tasks and receives the results from workers.
  • For similar reasons, there is reduced data locality when using thin executors. More data is spread over more executors, and each executor has a smaller amount of memory, preventing it from storing a large quantity of data partitions locally.

Optimally sized executors in Spark

As the name implies, optimally sized executors are configured to contain the ideal amount of memory and CPU cores for each executor on a worker node. Optimal executors are the Goldilocks solution: not too big, not too small, just right. This can lead to improved application performance by potentially reducing the run time of the application while better utilizing the resources configured. Optimal executors are determined by following the rules below.

Rules for sizing optimal Spark executors

Sizing Spark executors correctly requires following a few best-practice rules:

  1. Leave out 1 CPU core and 1 GB of RAM for the operating system per worker node.
  2. Remove 1 executor (or 1 core and 1 GB of RAM) at the cluster level to account for resource management.
  3. When calculating executor memory, leave out a certain amount to account for the memory overhead of internal system processes. The amount to leave out is MAX (384 MB, 10% of executor memory).
  4. Ideally, have 3-5 CPU cores on each executor.
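As a sketch, the four rules above can be turned into a small sizing helper. This is a hypothetical function (not part of Spark), assuming a homogeneous cluster and 5 cores per executor by default:

```python
def size_executors(nodes, cores_per_node, ram_gb_per_node, cores_per_executor=5):
    """Apply the four executor-sizing rules to a homogeneous cluster."""
    # Rule 1: reserve 1 core and 1 GB of RAM per node for the operating system.
    usable_cores = (cores_per_node - 1) * nodes
    usable_ram_gb = (ram_gb_per_node - 1) * nodes
    # Rule 2: reserve 1 core and 1 GB at the cluster level for resource management.
    usable_cores -= 1
    usable_ram_gb -= 1
    # Rule 4: divide the remaining cores into executors of 3-5 cores each.
    num_executors = usable_cores // cores_per_executor
    # Rule 3: subtract overhead of MAX(384 MB, 10% of executor memory).
    raw_mem_gb = usable_ram_gb / num_executors
    overhead_gb = max(0.384, 0.10 * raw_mem_gb)
    executor_mem_gb = int(raw_mem_gb - overhead_gb)
    return num_executors, cores_per_executor, executor_mem_gb
```

For example, size_executors(5, 12, 48) returns (10, 5, 21): 10 executors with 5 cores and 21 GB of memory each.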

Example Spark configuration across 5 worker nodes

Given what we now know about executors, let’s walk through an example. We’ll use a sample Spark configuration with 5 worker nodes, each containing 12 CPU cores and 48 GB of RAM. Our base cluster looks like this:

Spark cluster with 5 worker nodes, each containing 12 CPU cores and 48 GB of RAM.

After following rule 1, we are left with 11 cores and 47 GB of RAM on each worker node.

Spark worker nodes after applying rule 1, each with 11 cores and 47 GB RAM available for executors.

Across the cluster (all 5 nodes), we have:

Spark cluster totals across 5 nodes showing 235 GB RAM (47 GB each) and 55 cores (11 cores each).

Now, we follow rule 2. We could remove one full executor (which would likely be better for a thin executor case where we have many executors), but for this example, we will remove 1 GB of RAM and 1 core at the cluster level.

Spark cluster totals after applying rule 2: 234 GB RAM (minus 1 GB) and 54 cores (minus 1 core) across executors.

We want to use optimally sized executors across these 5 nodes and need to take into account rule 4, leveraging 3-5 cores per executor. We’ll choose 5 cores per executor, which gives us 10 executors across the cluster:

Spark executor sizing example showing 5 cores per executor and ~21 GB memory per executor after overhead.

The Spark configurations set for this example would be as follows:

Spark config example showing 10 executors, 5 cores per executor and 21 GB memory per executor.
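The arithmetic behind these numbers can be checked in a few lines of Python (the variable names here are illustrative):

```python
nodes, cores_per_node, ram_gb_per_node = 5, 12, 48

# Rule 1: reserve 1 core and 1 GB of RAM per node for the OS.
cores_avail = (cores_per_node - 1) * nodes       # 55 cores
ram_avail = (ram_gb_per_node - 1) * nodes        # 235 GB

# Rule 2: reserve 1 core and 1 GB at the cluster level.
cores_avail -= 1                                 # 54 cores
ram_avail -= 1                                   # 234 GB

# Rule 4: 5 cores per executor.
num_executors = cores_avail // 5                 # 10 executors

# Rule 3: subtract overhead of MAX(384 MB, 10% of executor memory).
mem_per_executor = ram_avail / num_executors     # 23.4 GB
overhead = max(0.384, 0.10 * mem_per_executor)   # 2.34 GB
executor_memory = int(mem_per_executor - overhead)

print(num_executors, executor_memory)            # prints: 10 21
```

This matches the configuration above: 10 executors, 5 cores per executor and 21 GB of memory per executor.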

The following is a visualization of what the executors would look like in relation to the whole cluster, with each executor in purple:

Optimally sized Spark executors across 5 worker nodes, each with 12 CPU cores and 48 GB RAM split into two executors.

Conclusion: Spark executor tuning matters

Spark executor tuning matters because performance, cost efficiency and reliability all hinge on how executors are configured. We hope this article has provided a deeper understanding of what Spark executors are and how tuning them can lead to better performance and potential cost savings. As with many concepts in Spark, executor tuning is not an exact science and will likely require trial and error. This article should serve as an example that you can use to crunch the numbers and find the optimal configuration for your application. Enjoy tuning!


This blog was co-authored by Rudra Sinha, Senior Manager; Andrew Baak, Principal Associate and Tatum Bair, Principal Associate

Rudra Sinha, Andrew Baak and Tatum Bair work in the data engineering space and boast extensive experience in data engineering and technical leadership. As a group of data engineering leaders, their passion is to share their deep data engineering knowledge, acquired through extensive research, proofs of concept and countless implementations.