Introduction
I love and hate jq at the same time, and I'm not the only one. The language is too hard to grasp for an occasional user like me, who usually reaches for Python to deal with JSON. And yet jq is very useful in constrained environments, or when you need to quickly grab some value from a JSON file without any further processing.
Another language usable in a constrained environment is awk, and I love it. So why not use jq just to generate a lines-and-fields oriented representation of the JSON that awk would be happy with? Is it really a good idea? xml2 has been doing exactly that for almost two decades, so let's try to do the same with jq.
The data and the mission
We'll use this data as an example:
{
  "students": [
    {"name":"Alice","age":32,"notes":[7,6,10],"misc":{"hobbies":"gardening"}},
    {"name":"Bob","age":42,"notes":[4,9,10],"misc":{"single":false}}
  ],
  "school": "Hackers' school",
  "class": null
}
... and we'll create a small report from it with awk. We could have done it in pure jq, but this is only for demo purposes ;)
The jq part
I'll go progressively from JSON to an awk-friendly dataset, which should also make a decent intro to jq if you're not familiar with it.
Getting the school and class
jq uses . to filter on keys, so we end up with this:
$ jq -r '.school, .class' test.json
Hackers' school
null
Humph. null is pretty generic; we need an alternative value whenever a value is false or null. In jqlang, the alternative operator is //:
$ jq -r '.school, .class // "Unknown Class"' test.json
Hackers' school
Unknown Class
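Note that // binds tighter than the comma, so in the command above it only applies to .class, not to .school. Also, only null and false trigger the alternative; a quick check (-n runs the filter on a null input instead of reading a file):
$ jq -n 'null // "fallback", false // "fallback", 0 // "fallback"'
"fallback"
"fallback"
0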
Getting the names and ages
jq uses [] as the array indicator. You can then append .<keyname> to get the values of child keys.
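For instance, to list only the names from our test.json:
$ jq -r '.students[].name' test.json
Alice
Bob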
Then you're disappointed when you try to combine everything this way:
jq -r '.school, .class // "Unknown Class",
       .students[].name, .students[].age' test.json
Hackers' school
Unknown Class
Alice
Bob
32
42
That's going to be annoying to parse: the names and the ages are not interleaved. So we need to "combine" values per student. Here is something better:
jq -r '.school, .class // "Unknown Class",
       (.students[] | .name, .age)' test.json
Hackers' school
Unknown Class
Alice
32
Bob
42
What are these parentheses? Without them, the pipe would apply to everything on its left, .school and .class included, instead of just .students[]. The pipe | operator chains filters, feeding the output of one into the next; here we're asking for something like:
foreach s in students:
print(s["name"])
print(s["age"])
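The parenthesized expression also works on its own, which may make the pipe easier to see:
$ jq -r '.students[] | .name, .age' test.json
Alice
32
Bob
42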
Getting the notes
Representing an array as a flat value is tricky:
jq -r '.school, .class // "Unknown Class",
       (.students[] | .name, .age,
        (.notes | join("\u0000")))' test.json | cat -v
Hackers' school
Unknown Class
Alice
32
7^@6^@10
Bob
42
4^@9^@10
Here I'm applying a filter that joins all the array values with the NUL character, which is why I'm piping through cat -v: it shows those NUL bytes as ^@.
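NUL is only used because it's extremely unlikely to show up in the data; join accepts any separator string though, which is handier for debugging:
$ jq -r '.students[0].notes | join("-")' test.json
7-6-10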
Getting the misc part
Unlike the other parts, the data here is heterogeneous, so we'll use to_entries[] to get keys and values:
jq -r '.school, .class // "Unknown Class",
       (.students[] | .name, .age,
        (.notes | join("\u0000")),
        (.misc | to_entries[] | .key, .value | tostring))' \
    test.json | cat -v
Hackers' school
Unknown Class
Alice
32
7^@6^@10
hobbies
gardening
Bob
42
4^@9^@10
single
false
jq filters on keys and displays values by default; to_entries[] lets you temporarily change that paradigm and work on the keys themselves. The values under misc can be of any JSON type (a boolean for Bob), so the tostring filter normalizes them to strings. Right now keys and values end up on separate lines, but that's OK for now.
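To see what to_entries actually produces before we pick .key and .value out of it:
$ jq -c '.students[1].misc | to_entries' test.json
[{"key":"single","value":false}]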
Preparing the output for awk
In jqlang, you can interpolate strings with \(variable_or_expression). We also mark the end of each student record, because a student could have several misc entries (or a missing one):
jq -r '.school, .class // "Unknown Class",
       (.students[] |
         "name\u0000\(.name)",
         "age\u0000\(.age)",
         "notes\u0000\(.notes | join("\u0000"))",
         (.misc | to_entries[] |
           "misc\u0000\(.key)\u0000\(.value|tostring)"),
         "__END_RECORD__")' \
    test.json | cat -v
Hackers' school
Unknown Class
name^@Alice
age^@32
notes^@7^@6^@10
misc^@hobbies^@gardening
__END_RECORD__
name^@Bob
age^@42
notes^@4^@9^@10
misc^@single^@false
__END_RECORD__
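If the \(...) syntax is new to you: it works on any expression, and non-string results are converted to their JSON text for you. A minimal example on our data:
$ jq -r '"\(.students[0].name) is \(.students[0].age)"' test.json
Alice is 32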
Finally display the results with awk
This part assumes you're familiar with awk.
jq -r '.school, .class // "Unknown Class",
       (.students[] |
         "name\u0000\(.name)",
         "age\u0000\(.age)",
         "notes\u0000\(.notes | join("\u0000"))",
         (.misc | to_entries[] |
           "misc\u0000\(.key)\u0000\(.value|tostring)"),
         "__END_RECORD__")' \
    test.json | awk -F '\0' '
    # The first two lines are the school and the class
    NR == 1 {print "School:", $0; next}
    NR == 2 {print "Class:", $0; next}
    # Line 3 is already a student field: print the header and fall through
    NR == 3 {print "Students:\n"}
    # Accumulate the fields of the current student record
    $1 == "name"  {name = $2}
    $1 == "age"   {age = $2}
    $1 == "notes" {
        total = 0
        for (i = 2; i <= NF; i++) {
            notes = notes " " $i
            total += $i
        }
        average = total / (NF - 1)
    }
    $1 == "misc" {
        misc = misc "\n\t\t" $2 ":" $3
    }
    # End of a student record: print it, then reset the accumulators
    $1 == "__END_RECORD__" {
        printf("\t%s (%d)", name, age)
        printf("\t\tNotes:%s (average: %f)", notes, average)
        printf("\n\tMisc:")
        print misc "\n"
        name = ""
        age = 0
        notes = ""
        average = 0
        misc = ""
    }
'
School: Hackers' school
Class: Unknown Class
Students:

	Alice (32)		Notes: 7 6 10 (average: 7.666667)
	Misc:
		hobbies:gardening

	Bob (42)		Notes: 4 9 10 (average: 7.666667)
	Misc:
		single:false
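One caveat on the chaining itself: feeding NUL bytes through awk worked here, but not every awk implementation is NUL-clean. A more conservative sketch, assuming the ASCII unit separator (\u001f) never appears in your data, only swaps the separator:
jq -r '.students[] | "name\u001f\(.name)"' test.json \
    | awk -F "$(printf '\037')" '{print $1 "=" $2}'
name=Alice
name=Bob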
Conclusion
You are better off with Nushell
xml2 is way easier to use with awk, and no json2 has been created yet; but with JSONPath getting popular, it may happen one day (it's trickier to implement than you might think, I tried).
So, sure, you can chain jq and awk, but I don't think it's very efficient unless you add some real value on top. Can we consider the average as high added value? ... ahahah :P
I would not recommend it for complex, highly nested JSON data though.