software engineering Oct 15, 2018

JSON Processing Pipelines with gron

As a polyglot programmer I strive to always employ the simplest approach and the best tools for the job. I have parsed JSON in Java, Python, and Go, but I think too many times we ignore the UNIX/Linux tools such as sed, awk, cut, etc. Too many programmers write hulking data parsers that are just overkill. With gron transformations, I find it easier to utilize these strong UNIX/Linux text editing, manipulation, and filtering tools.

While jq is powerful at parsing known JSON structures, its major shortcoming is that it requires one to know the JSON structure being parsed. gron is less restrictive and can be combined easily with the above Linux tools to build very powerful parsing pipelines, without having to know exactly where to expect a particular structure or value.

Installing gron

Instructions can be found here for installing gron. I used brew install gron, and then, for reasons that will be apparent later, I added the following alias:

alias norg="gron --ungron"
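In bash, aliases are not expanded in non-interactive shells by default, so a script that relies on norg may need a function instead. Below is a minimal sketch of that idea, assuming gron is on the PATH; the function just reuses the alias name and is not part of gron itself.

```shell
# Drop the alias if this shell already has it, so the name is free for a function.
unalias norg 2>/dev/null || true

# Function form of the norg alias: pass any arguments straight to gron --ungron.
norg() {
  gron --ungron "$@"
}
```

With this in place, a script can end a pipeline with norg exactly as the interactive examples do.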

Make JSON greppable

Obviously, being text-based, JSON is already “greppable”. However, the strength of gron comes from its ability to split JSON into lines of what is referred to as “discrete assignments”.

Given the JSON snippet below (from an AWS EC2 CLI call):

{  
     "Reservations": [  
         {  
             "OwnerId": "<OWNER_ID>",   
             "ReservationId": "<RES_ID>",   
             "Groups": [],   
             "Instances": [  
                 {  
                     "Monitoring": {  
                         "State": "disabled"  
                     },   
                     "PublicDnsName": "",   
                     "State": {  
                         "Code": 16,   
                         "Name": "running"  
                     },   
                     "EbsOptimized": false,   
                     "LaunchTime": "2016-08-31T22:39:37.000Z",   
                     "PublicIpAddress": "<PUBLIC_IP>",   
                     "PrivateIpAddress": "<PRIVATE_IP>",   
                     "ProductCodes": [],   
                     "VpcId": "<VPC_ID>",   
                     "StateTransitionReason": "",   
                     "InstanceId": "<ID>",   
                     "ImageId": "<AMI_ID>",   
                     "PrivateDnsName": "<PRIVATE_DNS_NAME>",   
                     "KeyName": "<KEY_NAME>",   
                     "SecurityGroups": [...

gron will parse (cat ~/ec2.json | gron) and convert the JSON into lines of discrete assignments:

json = {};  
json.Reservations = [];  
json.Reservations[0] = {};  
json.Reservations[0].Groups = [];  
json.Reservations[0].Instances = [];  
json.Reservations[0].Instances[0] = {};  
json.Reservations[0].Instances[0].AmiLaunchIndex = 0;  
json.Reservations[0].Instances[0].Architecture = "x86_64";  
json.Reservations[0].Instances[0].BlockDeviceMappings = [];  
json.Reservations[0].Instances[0].BlockDeviceMappings[0] = {};  
json.Reservations[0].Instances[0].BlockDeviceMappings[0].DeviceName = "/dev/xvda";  
json.Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs = {};  
json.Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.AttachTime = "2016-08-21T22:00:41.000Z";  
json.Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.DeleteOnTermination = true;  
json.Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.Status = "attached";  
json.Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.VolumeId = "<VOL_ID>";  
json.Reservations[0].Instances[0].ClientToken = "<CLIENT_TOKEN>";  
json.Reservations[0].Instances[0].EbsOptimized = false;  
json.Reservations[0].Instances[0].Hypervisor = "xen";  
json.Reservations[0].Instances[0].ImageId = "<AMI_ID>";  
json.Reservations[0].Instances[0].InstanceId = "<ID>";  
json.Reservations[0].Instances[0].InstanceType = "t2.small";  
json.Reservations[0].Instances[0].KeyName = "<KEY_NAME>";  
json.Reservations[0].Instances[0].LaunchTime = "2016-08-31T22:39:37.000Z";  
json.Reservations[0].Instances[0].Monitoring = {};  
json.Reservations[0].Instances[0].Monitoring.State = "disabled";  
json.Reservations[0].Instances[0].NetworkInterfaces = [];  
json.Reservations[0].Instances[0].NetworkInterfaces[0] = {};  
json.Reservations[0].Instances[0].NetworkInterfaces[0].Association = {};  
json.Reservations[0].Instances[0].NetworkInterfaces[0].Association.IpOwnerId = "<OWNER_ID>";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].Association.PublicDnsName = "";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].Association.PublicIp = "<PUBLIC_IP>";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment = {};  
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment.AttachTime = "2016-08-21T22:00:40.000Z";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment.AttachmentId = "<ENI_ID>";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment.DeleteOnTermination = true;  
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment.DeviceIndex = 0;  
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment.Status = "attached";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].Description = "Primary network interface";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].Groups = [];  
json.Reservations[0].Instances[0].NetworkInterfaces[0].Groups[0] = {};  
json.Reservations[0].Instances[0].NetworkInterfaces[0].Groups[0].GroupId = "<SG_ID>";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].Groups[0].GroupName = "Bastion";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].MacAddress = "<MAC_ADDRESS>";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].NetworkInterfaceId = "<ENI_ID>";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].OwnerId = "<OWNER_ID>";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddress = "<PRIVATE_IP>";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses = [];  
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0] = {};  
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].Association = {};  
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].Association.IpOwnerId = "<OWNER_ID>";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].Association.PublicDnsName = "";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].Association.PublicIp = "<PUBLIC_IP>";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].Primary = true;  
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].PrivateIpAddress = "<PRIVATE_IP>";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].SourceDestCheck = true;  
json.Reservations[0].Instances[0].NetworkInterfaces[0].Status = "in-use";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].SubnetId = "<SUBNET_ID>";  
json.Reservations[0].Instances[0].NetworkInterfaces[0].VpcId = "<VPC_ID>";  
json.Reservations[0].Instances[0].Placement = {};  
json.Reservations[0].Instances[0].Placement.AvailabilityZone = "us-east-1a";  
json.Reservations[0].Instances[0].Placement.GroupName = "";  
json.Reservations[0].Instances[0].Placement.Tenancy = "default";  
json.Reservations[0].Instances[0].PrivateDnsName = "<DNS_NAME>";  
json.Reservations[0].Instances[0].PrivateIpAddress = "<PRIVATE_IP>";  
json.Reservations[0].Instances[0].ProductCodes = [];  
json.Reservations[0].Instances[0].PublicDnsName = "";  
json.Reservations[0].Instances[0].PublicIpAddress = "<PUBLIC_IP>";  
json.Reservations[0].Instances[0].RootDeviceName = "/dev/xvda";  
json.Reservations[0].Instances[0].RootDeviceType = "ebs";  
json.Reservations[0].Instances[0].SecurityGroups = [];...
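Because every line carries its full path, the assignments can also be used to discover the shape of an unfamiliar document. The sketch below is one way to do that with sed and sort; the three sample lines are hand-written in gron's format for illustration (the "stopped" value and the second reservation are made up), and in practice the input would come from gron.

```shell
# Hand-written sample in gron's output format (illustrative values).
sample='json.Reservations[0].Instances[0].State.Name = "running";
json.Reservations[1].Instances[0].State.Name = "stopped";
json.Reservations[1].Instances[0].InstanceType = "t2.micro";'

# Drop the " = value;" part, replace numeric array indexes with [],
# and keep each distinct path once.
paths=$(printf '%s\n' "$sample" |
  sed -e 's/ = .*$//' -e 's/\[[0-9][0-9]*]/[]/g' |
  sort -u)

printf '%s\n' "$paths"
```

The result is a short list of the distinct paths present in the data, which can then guide the grep patterns used in the pipelines that follow.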

Munging gron Output Through Command Line Pipelining

JSON is more compact than the gron output and is better suited to structuring data for transport and integration. While more verbose, the gron output is a more usable format for text searching, filtering, and manipulation with Linux tools such as grep, cut, sed, and awk. For example, consider the following commands:

$ cat ~/ec2.json | gron | grep AvailabilityZone
json.Reservations[0].Instances[0].Placement.AvailabilityZone = "us-east-1a";

The above command “pipeline” searches the gronned JSON for the text “AvailabilityZone” and returns the matching discrete assignment line.

$ cat ~/ec2.json | gron | grep AvailabilityZone | cut -d\" -f2
us-east-1a

The above pipeline extracts the AvailabilityZone value via the Linux cut command.
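A plain grep matches the search text anywhere on the line, so a short key such as State would also match Monitoring.State and StateTransitionReason in the output shown earlier. The sketch below is one possible tightening: a small helper (gron_value is a hypothetical name, not a gron feature) that uses sed to match only paths ending in the given key and to print the quoted value. It assumes string values and simple key names without regex metacharacters.

```shell
# Print the string value of every gron assignment whose path ends in .KEY.
# Reads gron output on stdin; gron_value is an illustrative helper name.
gron_value() {
  key="$1"
  # -n suppresses normal output; the p flag prints only lines that matched.
  sed -n "s/^.*\.${key} = \"\(.*\)\";\$/\1/p"
}

# Sample lines copied from the gron output shown earlier.
printf '%s\n' \
  'json.Reservations[0].Instances[0].Placement.AvailabilityZone = "us-east-1a";' \
  'json.Reservations[0].Instances[0].Placement.Tenancy = "default";' |
  gron_value AvailabilityZone
```

In a real pipeline the printf would be replaced by gron ~/ec2.json.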

$ cat ~/ec2s.json | gron | grep InstanceId | cut -d\" -f2
…
<ID_1>
<ID_2>
<ID_3>
…

The above pipeline pulls all the EC2 instance IDs from the AWS EC2 CLI output and creates a list of IDs.
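Once the values are a plain list, the usual counting tools apply as well. As a sketch of that idea, the pipeline below tallies instance types with sort and uniq; the sample lines are hand-written in gron's format for illustration, and the final awk step only tidies the layout of the counts.

```shell
# Hand-written sample in gron's output format (illustrative values).
sample='json.Reservations[0].Instances[0].InstanceType = "t2.small";
json.Reservations[1].Instances[0].InstanceType = "t2.micro";
json.Reservations[2].Instances[0].InstanceType = "t2.small";'

# Extract the values, count duplicates, and list the most common type first.
tally=$(printf '%s\n' "$sample" |
  grep '\.InstanceType = ' |
  cut -d\" -f2 |
  sort | uniq -c | sort -rn |
  awk '{ print $2 ": " $1 }')

printf '%s\n' "$tally"
```

For the sample above this lists t2.small with a count of 2, followed by t2.micro with a count of 1.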

Transforming JSON with gron and ungron (a.k.a. norg)

Earlier, I defined the norg alias, which points to gron's --ungron option. With this option, gron transforms discrete assignments back into JSON. Consider the commands below:

Note: cat was removed and gron was called directly.

$ gron ~/ec2s.json | grep InstanceId | norg
...
{
      "Instances": [
        {
          "InstanceId": "<ID>"
        }
      ]
    },
    {
      "Instances": [
        {
          "InstanceId": "<ID>"
        }
      ]
    },
...

The above pipeline grons the JSON, greps for the InstanceId field, and then converts the lines of discrete assignments

(json.Reservations[999].Instances[0].InstanceId = "<ID>";)

from the grepped gron output back into usable and simplified JSON.

$ gron ~/ec2s.json | egrep InstanceId\|ImageId | norg
...
    {
      "Instances": [
        {
          "ImageId": "<AMI_ID>",
          "InstanceId": "<ID>"
        }
      ]
    },
    {
      "Instances": [
        {
          "ImageId": "<AMI_ID>",
          "InstanceId": "<ID>"
        }
      ]
    },
...

The above pipeline adds ImageId to the transformed JSON using egrep. (Yes, I know GNU has deprecated egrep in favor of grep -E.)
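For readers who prefer the non-deprecated spelling, here is a sketch of the same filter written with grep -E. The pattern is also anchored to the end of the path, which is a stylistic choice of this sketch rather than something the original command does; the sample lines are copied from the gron output shown earlier.

```shell
# Sample lines in gron's output format, taken from the listing above.
sample='json.Reservations[0].Instances[0].ImageId = "<AMI_ID>";
json.Reservations[0].Instances[0].InstanceId = "<ID>";
json.Reservations[0].Instances[0].InstanceType = "t2.small";'

# Quoting the pattern avoids backslash-escaping the | for the shell.
filtered=$(printf '%s\n' "$sample" | grep -E '\.(InstanceId|ImageId) = ')

printf '%s\n' "$filtered"
```

The filtered lines can be piped to norg in the same way as the egrep output.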

sed

sed is a powerful stream editor and is handy for running find-and-replace operations on text.

$ gron ~/ec2s.json | egrep InstanceId\|ImageId\|InstanceType | sed -e 's/Instances/node/g;s/ImageId/ami/g;s/InstanceType/type/g;s/InstanceId/id/g' | norg
...
{
      "node": [
        {
          "ami": "<AMI_ID>",
          "id": "<ID>",
          "type": "t2.small"
        }
      ]
    },
    {
      "node": [
        {
          "ami": "<AMI_ID>",
          "id": "<ID>",
          "type": "t2.micro"
        }
      ]
    },
...

The above pipeline adds stream editing with sed to perform multiple inline string replacements.
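One thing to keep in mind is that the sed expressions above replace text anywhere on the line, so a value that happens to contain "Instances" would be rewritten too. The sketch below shows one way to limit the renames to the path side of each assignment using awk. The rename_keys name is hypothetical, and the approach assumes the first " = " on a line separates the path from the value.

```shell
# Rename keys only in the path portion of each gron assignment (stdin to stdout).
rename_keys() {
  awk '{
    i = index($0, " = ")            # position of the path/value separator
    if (i == 0) { print; next }     # leave unexpected lines untouched
    path = substr($0, 1, i - 1)
    rest = substr($0, i)
    gsub(/Instances/, "node", path)
    gsub(/ImageId/, "ami", path)
    gsub(/InstanceType/, "type", path)
    gsub(/InstanceId/, "id", path)
    print path rest
  }'
}

# The value below contains the word Instances and should be left alone.
printf '%s\n' 'json.Reservations[0].Instances[0].KeyName = "Instances-key";' |
  rename_keys
```

In the full pipeline, rename_keys would take the place of the sed step, just before norg.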

$ gron ~/ec2s.json | egrep InstanceId\|ImageId\|InstanceType | sed -e 's/Instances/node/g;s/ImageId/ami/g;s/InstanceType/type/g;s/InstanceId/id/g' | norg | tr -d '\n' | sed "s/ //g"
...
{"node":[{"ami":"<AMI_ID>","id":"<ID>","type":"t2.small"}]},{"node":[{"ami":"<AMI_ID>","id":"<ID>","type":"t2.micro"}]},
...

The above pipeline adds the translate command, tr, to remove newline characters, and then another sed command to remove the remaining whitespace. This is handy for minifying JSON files, as long as no string value contains spaces, because the sed expression removes those too.
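The effect of the final sed step on string values is easy to check: it removes every space, including those inside quoted strings. The small sketch below runs the same tr and sed steps on a hand-written JSON fragment that reuses the Description value from the earlier gron output.

```shell
# Apply the same newline and space stripping to a fragment whose value has spaces.
minified=$(printf '%s\n' '{' '  "Description": "Primary network interface"' '}' |
  tr -d '\n' |
  sed 's/ //g')

printf '%s\n' "$minified"
# prints {"Description":"Primarynetworkinterface"}
```

For data like this, a JSON-aware minifier (for example jq -c, if jq is installed) is the safer choice, since it leaves string contents intact.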

Summary

gron converts structured JSON into lines of discrete assignments. That transformation makes it possible to pipe the text through native tools like grep and sed to perform powerful text manipulation. Once manipulated, the discrete assignments can be transformed back into JSON via gron -u (or gron --ungron). This makes gron a complement to existing tools like grep and sed for munging (a.k.a. manipulating) JSON data.

Jimmy Ray
Lead Software Engineer

DISCLOSURE STATEMENT: These opinions are those of the author. Unless noted otherwise in this post, Capital One is not affiliated with, nor is it endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are the ownership of their respective owners. This article is © 2018 Capital One.