Handling JSON data with jq

CE West 2020-03-03 3 min read

For a new project I’ve unwillingly transitioned from the safety and comfort of CSV files to a scary new world of JSON files.

jq is a command-line JSON processor that was recommended to me by a colleague. Being written in C with no dependencies, it’s very easy to install.

Here’s what won me over:

jq is like sed for JSON data - you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.

Swoon!

A jq program/command filters a stream of JSON data in a way that should be familiar to users of sed, awk and grep. By default, the output to standard out is also nicely formatted JSON.

As an example, let’s get some information from the PDB about my favourite entry, 1CEW:

clare$ curl --request GET https://www.rcsb.org/pdb/json/describeMol?structureId=1cew 
{"id":"1CEW","polymerDescriptions":[{"entityNr":"1","length":"108","chain":{"id":"I"},"polymerDescription":{"description":"CYSTATIN"}}]}

Nobody has time to read unformatted JSON, so let’s pipe that through the simplest jq programme – jq '.' – which parses the JSON and makes no change other than formatting it nicely:

clare$ curl --request GET https://www.rcsb.org/pdb/json/describeMol?structureId=1cew | jq '.'
{
  "id": "1CEW",
  "polymerDescriptions": [
    {
      "entityNr": "1",
      "length": "108",
      "chain": {
        "id": "I"
      },
      "polymerDescription": {
        "description": "CYSTATIN"
      }
    }
  ]
}

We can retrive the values of keys that we care about using ‘.field’, which produces the value at the key ‘field’, or null if there is no value present:

clare$ curl --request GET https://www.rcsb.org/pdb/json/describeMol?structureId=1cew > test.json
clare$ cat test.json | jq '.id'
"1CEW"
clare$ cat test.json | jq '.name'
null

We can also filter for particular fields that we care about, outputting in JSON format:

clare$ cat test.json | jq '{id}'
{
  "id": "1CEW"
}

jq commands can be piped together using `|`

`.a | .b | .c` or, equivalently, `a.b.c`

For example, to get the length of the first entity, we can filter for `polymerDescription` and then filter for the field `length`

```console
clare$ cat test.json | jq '.polymerDescriptions[] | .length'
"108"
clare$ cat test.json | jq '.polymerDescriptions[].length'
"108"

You can index or slice arrays, for example:

  • .[0] – first element in the array
  • .[2] – third element in the array
  • .[-1] – last element
  • .[1-5] – second to fifth elements

For example, the result of our API request is only one JSON object, if it was many we could use jq '.[0]' to only get the first.

To iterate over the elements and perform the commands on each individually, use: .[]

Using jq, you can therefore easily manipulate JSON data including filtering, renaming, creating and subsetting fields:

clare$ cat test.json | jq '{id, protein: "Chicken egg white cystatin", chain: .polymerDescriptions[0].chain.id, length: .polymerDescriptions[0].length}'
{
  "id": "1CEW",
  "protein": "Chicken egg white cystatin",
  "chain": "I",
  "length": "108"
}

Maybe you want to extract all the objects from a large JSON file with a particular field value:

clare$ jq 'select(.thing=="thingofinterest")' large.json

If you’re saving to a file, you may prefer to use the option --compact-output/-c to generate a more compact output in which each JSON object is on a single line.

See the jq manual for more information and lots of other cool things you can do!