Code Walked Out of a Spaghetti Bar…

Using instrumentation to detangle information by leveraging data collectors and log aggregators


I think we can all say that eating spaghetti is better than dealing with spaghetti code. So let’s tackle a tasty example of how instrumentation can help you unravel this situation.

🍝🍝🍝

Picture a developer writing an online spaghetti shopping cart application. The following line of code can be used to log a successful order:

    LOGGER.info("The spaghetti order was successfully placed");
  

Or what about this one to log an exception:

    ...
catch(SomeException ex) {
  LOGGER.error(“Something broke the meatball!!!”, ex);
....
  

This is what I like to call log it and forget it. The good news here is that the developer is logging...the not so good news is whether they can answer the following questions:

  • How many spaghetti orders were placed in the past 45 minutes (and they can’t query a production database)?
  • How many times was the meatball broken in the past 24 hours?

🍝🍝🍝

 Although the scenarios painted above might be trivial, it highlights the importance of instrumentation in your software application through the power of data collectors and log aggregators.

What is Instrumentation?

In their documentation, Oracle defined instrumentation as follows:

Instrumentation is the addition of byte-codes to methods for the purpose of gathering data to be utilized by tools. Since the changes are purely additive, these tools do not modify application state or behavior. Examples of such benign tools include monitoring agents, profilers, coverage analyzers, and event loggers.

As defined above, instrumentation can be applied in a lot of ways. This article is going to focus on the use case of event loggers, including some of the best practices within the industry.

Once Upon a Legacy System

With the popularity of microservices architecture, it’s likely that at some point in time you’ve been asked to abstract a feature from a legacy system into a microservice.  If your new microservice relies on other downstream microservices to fully perform its function it’s likely you’ve had questions about how to handle monitoring and alerting with regards to those downstream services. They probably went something like this:

  • “Should we be alerted if downstream services A,B,C,D,and E went down or just A,C, and E?”
  • “Should we be alerted at all if any of the downstream services are out for x amount of time?”
  • “How often do these downstream services go down?”
  • “How many successful calls has this microservice handled in x amount of time?”

Sounds familiar, right? Maybe you’ve used a state design pattern and queueing infrastructure so your new microservice was able to achieve stability regardless of the status of the downstream services. However, this does not negate the fact that your microservice, although stable, would not be able to perform its function if any of the downstream services were out. Given that this is the nature of microservices architecture, you’ll need to build the service to address these known scenarios.

Instrumentation should not interfere with the behavior of your code, but rather seamlessly extract information from it. Ensuing are some steps to implement this.

Log Event Data Model

Your log event data model is the building block that ensures you are capturing the vital information of different events happening in your code. It gives your logging event structure that can be easily parsed by data collectors. We used a hierarchical based model where the parent class has all the common information that every log event should have in our use case.

Going back to our favorite spaghetti shop to highlight how this was accomplished:

    public class EventLogRecord {
    private final String appName = AppConstants.APP_NAME;
    private final String logMessage;
    private final EventLogType eventLogType;

    public EventLogRecord (EventLogType eventLogType, String logMessage) {
        this.eventLogType = eventLogType;
       this.logMessage = logMessage;
    }

// getters 
...
}
  

Each event log is associated with an enum type:

    public enum EventLogType {
    DownStreamServiceCalled,
    ExceptionHappened,
    SpaghettiOrder,
    GeneralLogging
}
  

Any logging event can then inherit from the parent class and then implement any field it plans to support. In our example, the spaghetti order log event will have a data model like this:

    public class SpaghettiOrderEventLogRecord extends EventLogRecord {
	private final boolean orderSuccessful;
	public SpaghettiOrderEventLogRecord(EventLogType eventLogType, String logMessage, boolean orderSuccessful) {

		super(eventLogType, logMessage);
		this.orderSuccessful = orderSuccessful;
	}

	// getters
}
  

And the log event for a downstream service call would look like this:

    public class DownStreamServiceEventLogRecord extends EventLogRecord {
	private final Integer httpStatus;

	public DownStreamServiceEventLogRecord(EventLogType eventLogType, String logMessage, Integer httpStatus) {

		super(eventLogType, logMessage);
		this.httpStatus = httpStatus;
	}

	// getters
}
  

Then the exception log event would look like this:

    public class ExceptionEventLogRecord extends EventLogRecord {
	private final String stackTrace;
	public ExceptionEventLogRecord(EventLogType eventLogType, String logMessage, String stackTrace) {

		super(eventLogType, logMessage);
		this.stackTrace = stackTrace;
	}

	// getters
}
  

The data models above are just to highlight the trivial case of this article’s example. In a real world application, factory methods would be used to create these data models. At the point of logging these events, a JSON mapper like gson or jackson mapper can be used to log the event for a data collector to parse the object.

    ...
LOGGER.info(gson.toJson(eventLogRecord));
...
  

What is a Data Collector?

Data collectors are used to parse your log files and have them shipped to a NoSQL database, streaming service, or log aggregator. Fluentd and Logstash are examples of data collectors that parse log files. Both are equally capable in their abilities, however Fluentd has more plugins and is still compatible with Elasticsearch if you do not want to go full native with ELK stack.

There are a variety of ways to parse your log data and grok patterns are one of the most popular ways to do so. Grok patterns are essentially an abstracted regular expression that is used to find any specified pattern in your logs. Using our fictional spaghetti example, the following can be a log output for a successful spaghetti order:

    2020-04-09 08:13:30.736  INFO 11532 --- [nio-8443-exec-6] e.t.n.e.d.l.a.impl.DownStreamClass     : {"appName":"Spag-R-Us", "logMessage":"Spaghetti order was successful", "eventLogType":"SpaghettiOrder", "orderSuccessful": true}
  

The following grok pattern can then be applied by the data collector against the log output:

    %{TIMESTAMP_ISO8601:logDate}. %{LOGLEVEL:LogLevel}%{DATA:className} : %{GREEDYDATA:logMessageBody}
  

After applying the grok pattern the log output is transformed into this:

    {
  "logDate": [
    "2020-04-09 08:13:30.736"
  ],
  "LogLevel": [
    "INFO"
  ],
  "className": [
    " 11532 --- [nio-8443-exec-6] e.t.n.e.d.l.a.impl.DownStreamClass    "
  ],
  "logMessageBody": [
    "{"appName":"Spag-R-Us", "logMessage":"Spaghetti order was successful", "eventLogType":"SpaghettiOrder", "orderSuccessful": true}"
  ]
}
  

Then finally, the logMessageBody object is parsed into a JSON object by the data collector before being passed down to the log aggregator:

    {
	"appName":"Spag-R-Us",
	"logMessage":"Spaghetti order was successful",
	"eventLogType":"SpaghettiOrder",
	"orderSuccessful":true
}
  

At this point, we now have quantifiable data that the log aggregator can use for metrics calculations and further analysis.

What is a Log Aggregator?

A log aggregator is a software tool that collects logs all across a platform and stores them in a centralized location for a variety of purposes including alerting, troubleshooting or even machine learning. Elasticsearch and Graylog are examples of open source log aggregation software used by enterprises. In my team’s work we used Elasticsearch for log aggregation. Based on the log event data model we used, we were able to write Elasticsearch queries that we used for dashboard displays and monitoring alerting.

All this brings us back to the two questions that was asked in our spaghetti scenario

  • How many spaghetti orders were placed in the past 45 minutes (and they can’t query a production database)?
  • How many times was the meatball broken in the past 24 hours?

The first question is easily answered by designing a dashboard in the log aggregator that uses a query similar to this (implementation will vary for each log aggregator):

    COUNT("eventLogType":"SpaghettiOrder" AND "orderSuccessful": true)
  

And all log aggregators have an adjustable timespan that will produce results that occurred within the specified timespan. In our example, the timespan will be set for any occurrence in the past 45 minutes. This allows the query to be dynamic and easily viewable for any specified time period.

The second question as well has a very similar solution. The following query can be used, (implementation will vary for each log aggregator):

    COUNT("eventLogType":"ExceptionHappened" AND "logMessage": “Something broke the meatball!!!”)
  

Just like the first question, you can dynamically apply a timespan variable, in this case it will be set to 24 hours. Additionally, most log aggregators support query visualization like pie charts and bar charts which provide live updates as logs come in.

Twirling It All Back

The major points involved in logging instrumentation can be summarised as follows:

  • Build a data model to catalog the various events you expect to happen in your application.
  • Configure your data collector to parse your log and extract the data model you designed.
  • Configure your log aggregator to ingest data from your data collector.
  • Analyze, build dashboards and twirl.

Just like grandma’s spaghetti, the secret sauce to instrumentation is practice and seeing it be done firsthand. It might feel like extra coding, but the benefits pay off when you are able to provide valuable information without changing a line of code.

🍝🍝🍝


Anthony Okocha, Senior Software Engineer, Capital One

Software development is more than a career for me, it has been my hobby and passion in one form or another for the past 15 years. From development, testing and implementation, the ability of my team to reach our set goals in each of these phases is always paramount. Waving a dead chicken while standing on one foot (figuratively speaking of course) hoping your code compiles is one way to help your team, or you can just go the traditional route and do whatever it takes as a team member to ensure a project is completed...no poultry needed. In my roles as a team member or team leader, I have used humor as a means of effective communication and motivation. I do not believe my humor will get me a spot on Saturday Night Live, (SNL....it's IT we have to abbreviate), I do hope to write code that will one day control some of their comedy skits.


DISCLOSURE STATEMENT: © 2020 Capital One. Opinions are those of the individual author. Unless noted otherwise in this post, Capital One is not affiliated with, nor endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are property of their respective owners.

Related Content