Introduction
I love and hate jq at the same time, and I'm not the only one. The language is too hard to grasp for an occasional user like me, who usually reaches for Python to deal with JSON. And yet jq is very useful in constrained environments, or when you need to quickly grab some value from a JSON file without any further processing.
Another language usable in a constrained environment is awk, and I love it. So why not use jq just to generate a lines-and-fields oriented representation of the JSON that awk would be happy with? Is it really a good idea? xml2 has been doing exactly that for almost two decades, so let's try to do the same with jq.
The data and the mission
We'll use this data as an example:
{
  "students": [
    {"name":"Alice","age":32,"notes":[7,6,10],"misc":{"hobbies":"gardening"}},
    {"name":"Bob","age":42,"notes":[4,9,10],"misc":{"single":false}}
  ],
  "school": "Hackers' school",
  "class": null
}
... and we'll create a small report from it with awk. We could have done it in pure jq, but this is only for demo purposes ;)
The jq part
I'll go progressively from JSON to an awk-friendly dataset, which should also make a decent intro to jq if you're not familiar with it.
Getting the school and class
jq uses . to filter on keys, so we end up with this:
$ jq -r '.school, .class' test.json
Hackers' school
null
Humph. null is pretty generic; we need an alternative value whenever a value is false or null. In jqlang, the alternative operator is //:
$ jq -r '.school, .class // "Unknown Class"' test.json
Hackers' school
Unknown Class
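Note that // binds tighter than the comma, so in the command above it only applies to .class, not to .school. Also, only null and false trigger the alternative; a quick check (-n runs the filter on a null input instead of reading a file):
$ jq -n 'null // "fallback", false // "fallback", 0 // "fallback"'
"fallback"
"fallback"
0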
Getting the names and ages
jq uses [] as the array indicator. You can then append .<keyname> to get the values of child keys.
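For instance, to list only the names from our test.json:
$ jq -r '.students[].name' test.json
Alice
Bob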
Then you're disappointed when you try to combine everything this way:
jq -r '.school, .class // "Unknown Class",
       .students[].name, .students[].age' test.json
Hackers' school
Unknown Class
Alice
Bob
32
42
That's going to be annoying to parse: the names and the ages are not interleaved. So we need to "combine" values per student. Here is something better:
jq -r '.school, .class // "Unknown Class",
       (.students[] | .name, .age)' test.json
Hackers' school
Unknown Class
Alice
32
Bob
42
What are these parentheses? Without them, the pipe would apply to everything on its left, .school and .class included, instead of just .students[]. The pipe | operator chains filters, feeding the output of one into the next; here we're asking for something like:
foreach s in students:
print(s["name"])
print(s["age"])
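The parenthesized expression also works on its own, which may make the pipe easier to see:
$ jq -r '.students[] | .name, .age' test.json
Alice
32
Bob
42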
Getting the notes
Representing an array as a flat value is tricky:
jq -r '.school, .class // "Unknown Class",
       (.students[] | .name, .age,
        (.notes | join("\u0000")))' test.json | cat -v
Hackers' school
Unknown Class
Alice
32
7^@6^@10
Bob
42
4^@9^@10
Here I'm applying a filter that joins all the array values with the NUL character, which is why I'm piping through cat -v: it shows those NUL bytes as ^@.
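NUL is only used because it's extremely unlikely to show up in the data; join accepts any separator string though, which is handier for debugging:
$ jq -r '.students[0].notes | join("-")' test.json
7-6-10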
Getting the misc part
Unlike the other parts, the data here is heterogeneous, so we'll use to_entries[] to get keys and values:
jq -r '.school, .class // "Unknown Class",
       (.students[] | .name, .age,
        (.notes | join("\u0000")),
        (.misc | to_entries[] | .key, .value | tostring))' \
    test.json | cat -v
Hackers' school
Unknown Class
Alice
32
7^@6^@10
hobbies
gardening
Bob
42
4^@9^@10
single
false
jq filters on keys and displays values by default; to_entries[] lets you temporarily change that paradigm and work on the keys themselves. The values under misc can be of any JSON type (a boolean for Bob), so the tostring filter normalizes them to strings. Right now keys and values end up on separate lines, but that's OK for now.
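To see what to_entries actually produces before we pick .key and .value out of it:
$ jq -c '.students[1].misc | to_entries' test.json
[{"key":"single","value":false}]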
Preparing the output for awk
In jqlang, you can interpolate strings with \(variable_or_expression). We also mark the end of each student record, because a student could have several misc entries (or a missing one):
jq -r '.school, .class // "Unknown Class",
       (.students[] |
         "name\u0000\(.name)",
         "age\u0000\(.age)",
         "notes\u0000\(.notes | join("\u0000"))",
         (.misc | to_entries[] |
           "misc\u0000\(.key)\u0000\(.value|tostring)"),
         "__END_RECORD__")' \
    test.json | cat -v
Hackers' school
Unknown Class
name^@Alice
age^@32
notes^@7^@6^@10
misc^@hobbies^@gardening
__END_RECORD__
name^@Bob
age^@42
notes^@4^@9^@10
misc^@single^@false
__END_RECORD__
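If the \(...) syntax is new to you: it works on any expression, and non-string results are converted to their JSON text for you. A minimal example on our data:
$ jq -r '"\(.students[0].name) is \(.students[0].age)"' test.json
Alice is 32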
Finally display the results with awk
This part assumes you're familiar with awk.
jq -r '.school, .class // "Unknown Class",
       (.students[] |
         "name\u0000\(.name)",
         "age\u0000\(.age)",
         "notes\u0000\(.notes | join("\u0000"))",
         (.misc | to_entries[] |
           "misc\u0000\(.key)\u0000\(.value|tostring)"),
         "__END_RECORD__")' \
    test.json | awk -F '\0' '
    # The first two lines are the school and the class
    NR == 1 {print "School:", $0; next}
    NR == 2 {print "Class:", $0; next}
    # Line 3 is already a student field: print the header and fall through
    NR == 3 {print "Students:\n"}
    # Accumulate the fields of the current student record
    $1 == "name"  {name = $2}
    $1 == "age"   {age = $2}
    $1 == "notes" {
        total = 0
        for (i = 2; i <= NF; i++) {
            notes = notes " " $i
            total += $i
        }
        average = total / (NF - 1)
    }
    $1 == "misc" {
        misc = misc "\n\t\t" $2 ":" $3
    }
    # End of a student record: print it, then reset the accumulators
    $1 == "__END_RECORD__" {
        printf("\t%s (%d)", name, age)
        printf("\t\tNotes:%s (average: %f)", notes, average)
        printf("\n\tMisc:")
        print misc "\n"
        name = ""
        age = 0
        notes = ""
        average = 0
        misc = ""
    }
'
School: Hackers' school
Class: Unknown Class
Students:

	Alice (32)		Notes: 7 6 10 (average: 7.666667)
	Misc:
		hobbies:gardening

	Bob (42)		Notes: 4 9 10 (average: 7.666667)
	Misc:
		single:false
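One caveat on the chaining itself: feeding NUL bytes through awk worked here, but not every awk implementation is NUL-clean. A more conservative sketch, assuming the ASCII unit separator (\u001f) never appears in your data, only swaps the separator:
jq -r '.students[] | "name\u001f\(.name)"' test.json \
    | awk -F "$(printf '\037')" '{print $1 "=" $2}'
name=Alice
name=Bob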
Conclusion
You are better off with Nushell
xml2 is way easier to use with awk, and no json2 has been created yet; but with JSONPath getting popular, it may happen one day (it's trickier to implement than you might think, I tried).
So, sure, you can chain jq and awk, but I don't think it's very efficient unless you add some real value on top. Can we consider the average as high added value? ... ahahah :P
I would not recommend it for complex, highly nested JSON data though.