Maps¶
Miller data types are listed on the Data types page; here we focus specifically on maps.
On the whole, maps are as in most other programming languages. However, following the Principle of Least Surprise and aiming to reduce keystroking for Miller's most-used streaming-record-processing model, there are a few differences as noted below.
Types of maps¶
Map literals are written in curly braces with string keys any Miller data type (including other maps, or arrays) as values. Also, integers may be given as keys although they'll be stored as strings.
mlr -n put ' end { x = {"a": 1, "b": {"x": 2, "y": [3,4,5]}, 99: true}; dump x; print x[99]; print x["99"]; } '
{ "a": 1, "b": { "x": 2, "y": [3, 4, 5] }, "99": true } true true
As with arrays and argument-lists, trailing commas are supported:
mlr -n put ' end { x = { "a" : 1, "b" : 2, "c" : 3, }; print x; } '
{ "a": 1, "b": 2, "c": 3 }
The current record, accessible using $*
, is a map.
mlr --csv --from example.csv head -n 2 then put -q ' dump $*; print "Color is", $*["color"]; '
{ "color": "yellow", "shape": "triangle", "flag": "true", "k": 1, "index": 11, "quantity": 43.6498, "rate": 9.8870 } Color is yellow { "color": "red", "shape": "square", "flag": "true", "k": 2, "index": 15, "quantity": 79.2778, "rate": 0.0130 } Color is red
The collection of all out-of-stream variables, @*
, is a map.
mlr --csv --from example.csv put -q ' begin { @last_rates = {}; } @last_rates[$shape] = $rate; @last_color = $color; end { dump @*; } '
{ "last_rates": { "triangle": 5.8240, "square": 8.2430, "circle": 8.3350 }, "last_color": "purple" }
Also note that several built-in functions operate on maps and/or return maps.
Insertion order is preserved¶
Miller maps preserve insertion order. So if you write @m["y"]=7
and then @m["x"]=3
then any loop over
the map @m
will give you the kays "y"
and "x"
in that order.
String keys, with conversion from/to integer¶
All Miller map keys are strings. If a map is indexed with an integer for either
read or write (i.e. on either the right-hand side or left-hand side of an
assignment) then the integer will be converted to/from string, respectively. So
@m[3]
is the same as @m["3"]
. The reason for this is for situations like
operating on all records where it's important to
let people do @records[NR] = $*
.
Auto-create¶
Indexing any as-yet-assigned local variable or out-of-stream variable results in auto-create of that variable as a map variable:
mlr --csv --from example.csv put -q ' # You can do this but you do not need to: # begin { @last_rates = {} } @last_rates[$shape] = $rate; end { dump @last_rates; } '
{ "triangle": 5.8240, "square": 8.2430, "circle": 8.3350 }
This also means that auto-create results in maps, not arrays, even if keys are integers.
If you want to auto-extend an array, initialize it explicitly to []
.
mlr --csv --from example.csv head -n 4 then put -q ' begin { @my_array = []; } @my_array[NR] = $quantity; @my_map[NR] = $rate; end { dump } '
{ "my_array": [43.6498, 79.2778, 13.8103, 77.5542], "my_map": { "1": 9.8870, "2": 0.0130, "3": 2.9010, "4": 7.4670 } }
Auto-deepen¶
Similarly, maps are auto-deepened: you can put @m["a"]["b"]["c"]=3
without first setting @m["a"]={}
and @m["a"]["b"]={}
. The reason for this
is for doing data aggregations: for example if you want compute keyed sums, you
can do that with a minimum of keystrokes.
mlr --icsv --opprint --from example.csv put -q ' @quantity_sum[$color][$shape] += $rate; end { emit @quantity_sum, "color", "shape"; } '
color shape quantity_sum yellow triangle 9.8870 yellow circle 12.572000000000001 red square 17.011 red circle 2.9010 purple triangle 14.415 purple square 8.2430
Looping¶
See single-variable for-loops and key-value for-loops.
Map-valued fields in CSV files¶
See the flatten/unflatten page.