Skip to content
Quick links:   Flags   Verbs   Functions   Glossary   Release docs

Miller in 10 minutes

Miller verbs

Let's take a quick look at some of the most useful Miller verbs -- file-format-aware, name-index-empowered equivalents of standard system commands.

For most of this section we'll use our example.csv.

mlr cat is like system cat (or type on Windows) -- it passes the data through unmodified:

mlr --csv cat example.csv
color,shape,flag,k,index,quantity,rate
yellow,triangle,true,1,11,43.6498,9.8870
red,square,true,2,15,79.2778,0.0130
red,circle,true,3,16,13.8103,2.9010
red,square,false,4,48,77.5542,7.4670
purple,triangle,false,5,51,81.2290,8.5910
red,square,false,6,64,77.1991,9.5310
purple,triangle,false,7,65,80.1405,5.8240
yellow,circle,true,8,73,63.9785,4.2370
yellow,circle,true,9,87,63.5058,8.3350
purple,square,false,10,91,72.3735,8.2430

But mlr cat can also do format conversion -- for example, you can pretty-print in tabular format:

mlr --icsv --opprint cat example.csv
color  shape    flag  k  index quantity rate
yellow triangle true  1  11    43.6498  9.8870
red    square   true  2  15    79.2778  0.0130
red    circle   true  3  16    13.8103  2.9010
red    square   false 4  48    77.5542  7.4670
purple triangle false 5  51    81.2290  8.5910
red    square   false 6  64    77.1991  9.5310
purple triangle false 7  65    80.1405  5.8240
yellow circle   true  8  73    63.9785  4.2370
yellow circle   true  9  87    63.5058  8.3350
purple square   false 10 91    72.3735  8.2430

mlr head and mlr tail count records rather than lines. Whether you're getting the first few records or the last few, the CSV header is included either way:

mlr --csv head -n 4 example.csv
color,shape,flag,k,index,quantity,rate
yellow,triangle,true,1,11,43.6498,9.8870
red,square,true,2,15,79.2778,0.0130
red,circle,true,3,16,13.8103,2.9010
red,square,false,4,48,77.5542,7.4670
mlr --csv tail -n 4 example.csv
color,shape,flag,k,index,quantity,rate
purple,triangle,false,7,65,80.1405,5.8240
yellow,circle,true,8,73,63.9785,4.2370
yellow,circle,true,9,87,63.5058,8.3350
purple,square,false,10,91,72.3735,8.2430
mlr --icsv --ojson tail -n 2 example.csv
{
  "color": "yellow",
  "shape": "circle",
  "flag": "true",
  "k": 9,
  "index": 87,
  "quantity": 63.5058,
  "rate": 8.3350
}
{
  "color": "purple",
  "shape": "square",
  "flag": "false",
  "k": 10,
  "index": 91,
  "quantity": 72.3735,
  "rate": 8.2430
}

You can sort on a single field:

mlr --icsv --opprint sort -f shape example.csv
color  shape    flag  k  index quantity rate
red    circle   true  3  16    13.8103  2.9010
yellow circle   true  8  73    63.9785  4.2370
yellow circle   true  9  87    63.5058  8.3350
red    square   true  2  15    79.2778  0.0130
red    square   false 4  48    77.5542  7.4670
red    square   false 6  64    77.1991  9.5310
purple square   false 10 91    72.3735  8.2430
yellow triangle true  1  11    43.6498  9.8870
purple triangle false 5  51    81.2290  8.5910
purple triangle false 7  65    80.1405  5.8240

Or, you can sort primarily alphabetically on one field, then secondarily numerically descending on another field, and so on:

mlr --icsv --opprint sort -f shape -nr index example.csv
color  shape    flag  k  index quantity rate
yellow circle   true  9  87    63.5058  8.3350
yellow circle   true  8  73    63.9785  4.2370
red    circle   true  3  16    13.8103  2.9010
purple square   false 10 91    72.3735  8.2430
red    square   false 6  64    77.1991  9.5310
red    square   false 4  48    77.5542  7.4670
red    square   true  2  15    79.2778  0.0130
purple triangle false 7  65    80.1405  5.8240
purple triangle false 5  51    81.2290  8.5910
yellow triangle true  1  11    43.6498  9.8870

If there are fields you don't want to see in your data, you can use cut to keep only the ones you want, in the same order they appeared in the input data:

mlr --icsv --opprint cut -f flag,shape example.csv
shape    flag
triangle true
square   true
circle   true
square   false
triangle false
square   false
triangle false
circle   true
circle   true
square   false

You can also use cut -o to keep specified fields, but in your preferred order:

mlr --icsv --opprint cut -o -f flag,shape example.csv
flag  shape
true  triangle
true  square
true  circle
false square
false triangle
false square
false triangle
true  circle
true  circle
false square

You can use cut -x to omit fields you don't care about:

mlr --icsv --opprint cut -x -f flag,shape example.csv
color  k  index quantity rate
yellow 1  11    43.6498  9.8870
red    2  15    79.2778  0.0130
red    3  16    13.8103  2.9010
red    4  48    77.5542  7.4670
purple 5  51    81.2290  8.5910
red    6  64    77.1991  9.5310
purple 7  65    80.1405  5.8240
yellow 8  73    63.9785  4.2370
yellow 9  87    63.5058  8.3350
purple 10 91    72.3735  8.2430

Even though Miller's main selling point is name-indexing, sometimes you really want to refer to a field name by its positional index. Use $[[3]] to access the name of field 3 or $[[[3]]] to access the value of field 3:

mlr --icsv --opprint put '$[[3]] = "NEW"' example.csv
color  shape    NEW   k  index quantity rate
yellow triangle true  1  11    43.6498  9.8870
red    square   true  2  15    79.2778  0.0130
red    circle   true  3  16    13.8103  2.9010
red    square   false 4  48    77.5542  7.4670
purple triangle false 5  51    81.2290  8.5910
red    square   false 6  64    77.1991  9.5310
purple triangle false 7  65    80.1405  5.8240
yellow circle   true  8  73    63.9785  4.2370
yellow circle   true  9  87    63.5058  8.3350
purple square   false 10 91    72.3735  8.2430
mlr --icsv --opprint put '$[[[3]]] = "NEW"' example.csv
color  shape    flag k  index quantity rate
yellow triangle NEW  1  11    43.6498  9.8870
red    square   NEW  2  15    79.2778  0.0130
red    circle   NEW  3  16    13.8103  2.9010
red    square   NEW  4  48    77.5542  7.4670
purple triangle NEW  5  51    81.2290  8.5910
red    square   NEW  6  64    77.1991  9.5310
purple triangle NEW  7  65    80.1405  5.8240
yellow circle   NEW  8  73    63.9785  4.2370
yellow circle   NEW  9  87    63.5058  8.3350
purple square   NEW  10 91    72.3735  8.2430

You can find the full list of verbs at the Verbs Reference page.

Filtering

You can use filter to keep only records you care about:

mlr --icsv --opprint filter '$color == "red"' example.csv
color shape  flag  k index quantity rate
red   square true  2 15    79.2778  0.0130
red   circle true  3 16    13.8103  2.9010
red   square false 4 48    77.5542  7.4670
red   square false 6 64    77.1991  9.5310
mlr --icsv --opprint filter '$color == "red" && $flag == true' example.csv

Computing new fields

You can use put to create new fields which are computed from other fields:

mlr --icsv --opprint put '
  $ratio = $quantity / $rate;
  $color_shape = $color . "_" . $shape
' example.csv
color  shape    flag  k  index quantity rate   ratio              color_shape
yellow triangle true  1  11    43.6498  9.8870 4.414868008496004  yellow_triangle
red    square   true  2  15    79.2778  0.0130 6098.292307692308  red_square
red    circle   true  3  16    13.8103  2.9010 4.760530851430541  red_circle
red    square   false 4  48    77.5542  7.4670 10.386259541984733 red_square
purple triangle false 5  51    81.2290  8.5910 9.455127458968688  purple_triangle
red    square   false 6  64    77.1991  9.5310 8.099790158430384  red_square
purple triangle false 7  65    80.1405  5.8240 13.760388049450551 purple_triangle
yellow circle   true  8  73    63.9785  4.2370 15.09995279679018  yellow_circle
yellow circle   true  9  87    63.5058  8.3350 7.619172165566886  yellow_circle
purple square   false 10 91    72.3735  8.2430 8.779995147397793  purple_square

When you create a new field, it can immediately be used in subsequent statements:

mlr --icsv --opprint --from example.csv put '
  $y = $index + 1;
  $z = $y**2 + $k;
'
color  shape    flag  k  index quantity rate   y  z
yellow triangle true  1  11    43.6498  9.8870 12 145
red    square   true  2  15    79.2778  0.0130 16 258
red    circle   true  3  16    13.8103  2.9010 17 292
red    square   false 4  48    77.5542  7.4670 49 2405
purple triangle false 5  51    81.2290  8.5910 52 2709
red    square   false 6  64    77.1991  9.5310 65 4231
purple triangle false 7  65    80.1405  5.8240 66 4363
yellow circle   true  8  73    63.9785  4.2370 74 5484
yellow circle   true  9  87    63.5058  8.3350 88 7753
purple square   false 10 91    72.3735  8.2430 92 8474

For put and filter we were able to type out expressions using a programming-language syntax. See the Miller programming language page for more information.

Handling field names with spaces

Sometimes our data has field names with spaces, dots, etc in them.

For verbs such as sort, cut, etc., no special syntax is necessary:

mlr --csv cat spaces.csv
Zone,Total MWh
14,27.2
17,39.8
24,7.4
30,50.5
mlr --c2p sort -nr 'Total MWh' spaces.csv
Zone Total MWh
30   50.5
17   39.8
14   27.2
24   7.4

For put and filter expressions, use ${...}:

mlr --c2p put '${Total KWh} = ${Total MWh} * 1000' spaces.csv
Zone Total MWh Total KWh
14   27.2      27200
17   39.8      39800
24   7.4       7400
30   50.5      50500

See also the section on field names.

Multiple input files

Miller takes all the files from the command line as an input stream. But it's format-aware, so it doesn't repeat CSV header lines. For example, with input files data/a.csv and data/b.csv, the system cat command will repeat header lines:

cat data/a.csv
a,b,c
1,2,3
4,5,6
cat data/b.csv
a,b,c
7,8,9
cat data/a.csv data/b.csv
a,b,c
1,2,3
4,5,6
a,b,c
7,8,9

However, mlr cat will not:

mlr --csv cat data/a.csv data/b.csv
a,b,c
1,2,3
4,5,6
7,8,9

Chaining verbs together

Often we want to chain queries together -- for example, sorting by a field and taking the top few values. We can do this using pipes:

mlr --csv sort -nr index example.csv | mlr --icsv --opprint head -n 3
color  shape  flag  k  index quantity rate
purple square false 10 91    72.3735  8.2430
yellow circle true  9  87    63.5058  8.3350
yellow circle true  8  73    63.9785  4.2370

This works fine -- but Miller also lets you chain verbs together using the word then. Think of this as a Miller-internal pipe that lets you use fewer keystrokes:

mlr --icsv --opprint sort -nr index then head -n 3 example.csv
color  shape  flag  k  index quantity rate
purple square false 10 91    72.3735  8.2430
yellow circle true  9  87    63.5058  8.3350
yellow circle true  8  73    63.9785  4.2370

As another convenience, you can put the filename first using --from. When you're interacting with your data at the command line, this makes it easier to up-arrow and append to the previous command:

mlr --icsv --opprint --from example.csv sort -nr index then head -n 3
color  shape  flag  k  index quantity rate
purple square false 10 91    72.3735  8.2430
yellow circle true  9  87    63.5058  8.3350
yellow circle true  8  73    63.9785  4.2370
mlr --icsv --opprint --from example.csv \
  sort -nr index \
  then head -n 3 \
  then cut -f shape,quantity
shape  quantity
square 72.3735
circle 63.5058
circle 63.9785

Sorts and stats

Now suppose you want to sort the data on a given column, and then take the top few in that ordering. You can use Miller's then feature to pipe commands together.

Here are the records with the top three index values:

mlr --icsv --opprint sort -nr index then head -n 3 example.csv
color  shape  flag  k  index quantity rate
purple square false 10 91    72.3735  8.2430
yellow circle true  9  87    63.5058  8.3350
yellow circle true  8  73    63.9785  4.2370

Lots of Miller commands take a -g option for group-by: here, head -n 1 -g shape outputs the first record for each distinct value of the shape field. This means we're finding the record with highest index field for each distinct shape field:

mlr --icsv --opprint sort -f shape -nr index then head -n 1 -g shape example.csv
color  shape    flag  k  index quantity rate
yellow circle   true  9  87    63.5058  8.3350
purple square   false 10 91    72.3735  8.2430
purple triangle false 7  65    80.1405  5.8240

Statistics can be computed with or without group-by field(s):

mlr --icsv --opprint --from example.csv \
  stats1 -a count,min,mean,max -f quantity -g shape
shape    quantity_count quantity_min quantity_mean     quantity_max
triangle 3              43.6498      68.33976666666666 81.229
square   4              72.3735      76.60114999999999 79.2778
circle   3              13.8103      47.0982           63.9785
mlr --icsv --opprint --from example.csv \
  stats1 -a count,min,mean,max -f quantity -g shape,color
shape    color  quantity_count quantity_min quantity_mean      quantity_max
triangle yellow 1              43.6498      43.6498            43.6498
square   red    3              77.1991      78.01036666666666  79.2778
circle   red    1              13.8103      13.8103            13.8103
triangle purple 2              80.1405      80.68475000000001  81.229
circle   yellow 2              63.5058      63.742149999999995 63.9785
square   purple 1              72.3735      72.3735            72.3735

If your output has a lot of columns, you can use XTAB format to line things up vertically for you instead:

mlr --icsv --oxtab --from example.csv \
  stats1 -a p0,p10,p25,p50,p75,p90,p99,p100 -f rate
rate_p0   0.0130
rate_p10  2.9010
rate_p25  4.2370
rate_p50  8.2430
rate_p75  8.5910
rate_p90  9.8870
rate_p99  9.8870
rate_p100 9.8870

Unicode and internationalization

While Miller's function names, verb names, online help, etc. are all in English, Miller supports UTF-8 data. For example:

cat παράδειγμα.csv
χρώμα,σχήμα,σημαία,κ,δείκτης,ποσότητα,ρυθμός
κίτρινο,τρίγωνο,αληθινό,1,11,43.6498,9.8870
κόκκινο,τετράγωνο,αληθινό,2,15,79.2778,0.0130
κόκκινο,κύκλος,αληθινό,3,16,13.8103,2.9010
κόκκινο,τετράγωνο,ψευδές,4,48,77.5542,7.4670
μοβ,τρίγωνο,ψευδές,5,51,81.2290,8.5910
κόκκινο,τετράγωνο,ψευδές,6,64,77.1991,9.5310
μοβ,τρίγωνο,ψευδές,7,65,80.1405,5.8240
κίτρινο,κύκλος,αληθινό,8,73,63.9785,4.2370
κίτρινο,κύκλος,αληθινό,9,87,63.5058,8.3350
μοβ,τετράγωνο,ψευδές,10,91,72.3735,8.2430
mlr --c2p filter '$σχήμα == "κύκλος"' παράδειγμα.csv
χρώμα   σχήμα  σημαία  κ δείκτης ποσότητα ρυθμός
κόκκινο κύκλος αληθινό 3 16      13.8103  2.9010
κίτρινο κύκλος αληθινό 8 73      63.9785  4.2370
κίτρινο κύκλος αληθινό 9 87      63.5058  8.3350
mlr --c2p sort -f σημαία παράδειγμα.csv
χρώμα   σχήμα     σημαία  κ  δείκτης ποσότητα ρυθμός
κίτρινο τρίγωνο   αληθινό 1  11      43.6498  9.8870
κόκκινο τετράγωνο αληθινό 2  15      79.2778  0.0130
κόκκινο κύκλος    αληθινό 3  16      13.8103  2.9010
κίτρινο κύκλος    αληθινό 8  73      63.9785  4.2370
κίτρινο κύκλος    αληθινό 9  87      63.5058  8.3350
κόκκινο τετράγωνο ψευδές  4  48      77.5542  7.4670
μοβ     τρίγωνο   ψευδές  5  51      81.2290  8.5910
κόκκινο τετράγωνο ψευδές  6  64      77.1991  9.5310
μοβ     τρίγωνο   ψευδές  7  65      80.1405  5.8240
μοβ     τετράγωνο ψευδές  10 91      72.3735  8.2430
mlr --c2p put '$форма = toupper($форма); $длина = strlen($цвет)' пример.csv
цвет       форма       флаг   κ  индекс количество скорость длина
желтый     ТРЕУГОЛЬНИК истина 1  11     43.6498    9.8870   6
красный    КВАДРАТ     истина 2  15     79.2778    0.0130   7
красный    КРУГ        истина 3  16     13.8103    2.9010   7
красный    КВАДРАТ     ложь   4  48     77.5542    7.4670   7
фиолетовый ТРЕУГОЛЬНИК ложь   5  51     81.2290    8.5910   10
красный    КВАДРАТ     ложь   6  64     77.1991    9.5310   7
фиолетовый ТРЕУГОЛЬНИК ложь   7  65     80.1405    5.8240   10
желтый     КРУГ        истина 8  73     63.9785    4.2370   6
желтый     КРУГ        истина 9  87     63.5058    8.3350   6
фиолетовый КВАДРАТ     ложь   10 91     72.3735    8.2430   10

File formats and format conversion

Miller supports the following formats:

  • CSV (comma-separared values)
  • TSV (tab-separated values)
  • JSON (JavaScript Object Notation)
  • PPRINT (pretty-printed tabular)
  • XTAB (vertical-tabular or sideways-tabular)
  • NIDX (numerically indexed, label-free, with implicit labels "1", "2", etc.)
  • DKVP (delimited key-value pairs).

What's a CSV file, really? It's an array of rows, or records, each being a list of key-value pairs, or fields: for CSV it so happens that all the keys are shared in the header line and the values vary from one data line to another.

For example, if you have:

shape,flag,index
circle,1,24
square,0,36

then that's a way of saying:

shape=circle,flag=1,index=24
shape=square,flag=0,index=36

Other ways to write the same data:

CSV                   PPRINT
shape,flag,index      shape  flag index
circle,1,24           circle 1    24
square,0,36           square 0    36

JSON                  XTAB
{                     shape circle
  "shape": "circle",  flag  1
  "flag": 1,          index 24
  "index": 24         .
}                     shape square
{                     flag  0
  "shape": "square",  index 36
  "flag": 0,
  "index": 36
}

DKVP
shape=circle,flag=1,index=24
shape=square,flag=0,index=36

Anything we can do with CSV input data, we can do with any other format input data. And you can read from one format, do any record-processing, and output to the same format as the input, or to a different output format.

How to specify these to Miller:

  • If you use --csv or --json or --pprint, etc., then Miller will use that format for input and output.
  • If you use --icsv and --ojson (note the extra i and o) then Miller will use CSV for input and JSON for output, etc. See also Keystroke Savers for even shorter options like --c2j.

You can read more about this at the File Formats page.

If all record values are numbers, strings, etc., then converting back and forth between CSV and JSON is a matter of specifying input-format and output-format flags:

mlr --json cat example.json
{
  "color": "yellow",
  "shape": "triangle",
  "flag": "true",
  "k": 1,
  "index": 11,
  "quantity": 43.6498,
  "rate": 9.8870
}
{
  "color": "red",
  "shape": "square",
  "flag": "true",
  "k": 2,
  "index": 15,
  "quantity": 79.2778,
  "rate": 0.0130
}
mlr --ijson --ocsv cat example.json
color,shape,flag,k,index,quantity,rate
yellow,triangle,true,1,11,43.6498,9.8870
red,square,true,2,15,79.2778,0.0130

However, if JSON data has map-valued or array-valued fields, Miller gives you choices on how to convert these to CSV columns. For example, here's some JSON data with map-valued fields:

cat data/server-log.json
{
  "hostname": "localhost",
  "pid": 12345,
  "req": {
    "id": 6789,
    "method": "GET",
    "path": "api/check",
    "host": "foo.bar",
    "headers": {
      "host": "bar.baz",
      "user-agent": "browser"
    }
  },
  "res": {
    "status_code": 200,
    "header": {
      "content-type": "text",
      "content-encoding": "plain"
    }
  }
}

We can convert this to CSV, or other tabular formats:

mlr --ijson --ocsv cat data/server-log.json
hostname,pid,req.id,req.method,req.path,req.host,req.headers.host,req.headers.user-agent,res.status_code,res.header.content-type,res.header.content-encoding
localhost,12345,6789,GET,api/check,foo.bar,bar.baz,browser,200,text,plain
mlr --ijson --oxtab cat data/server-log.json
hostname                    localhost
pid                         12345
req.id                      6789
req.method                  GET
req.path                    api/check
req.host                    foo.bar
req.headers.host            bar.baz
req.headers.user-agent      browser
res.status_code             200
res.header.content-type     text
res.header.content-encoding plain

These transformations are reversible:

mlr --ijson --oxtab cat data/server-log.json | mlr --ixtab --ojson cat
{
  "hostname": "localhost",
  "pid": 12345,
  "req": {
    "id": 6789,
    "method": "GET",
    "path": "api/check",
    "host": "foo.bar",
    "headers": {
      "host": "bar.baz",
      "user-agent": "browser"
    }
  },
  "res": {
    "status_code": 200,
    "header": {
      "content-type": "text",
      "content-encoding": "plain"
    }
  }
}

See the flatten/unflatten page for more information.

Choices for printing to files

Often we want to print output to the screen. Miller does this by default, as we've seen in the previous examples.

Sometimes, though, we want to print output to another file. Just use > outputfilenamegoeshere at the end of your command:

mlr --icsv --opprint cat example.csv > newfile.csv
# Output goes to the new file;
# nothing is printed to the screen.
cat newfile.csv
color  shape    flag     index quantity rate
yellow triangle true     11    43.6498  9.8870
red    square   true     15    79.2778  0.0130
red    circle   true     16    13.8103  2.9010
red    square   false    48    77.5542  7.4670
purple triangle false    51    81.2290  8.5910
red    square   false    64    77.1991  9.5310
purple triangle false    65    80.1405  5.8240
yellow circle   true     73    63.9785  4.2370
yellow circle   true     87    63.5058  8.3350
purple square   false    91    72.3735  8.2430

Other times we just want our files to be changed in-place: just use mlr -I:

cp example.csv newfile.txt
cat newfile.txt
color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
red,square,true,15,79.2778,0.0130
red,circle,true,16,13.8103,2.9010
red,square,false,48,77.5542,7.4670
purple,triangle,false,51,81.2290,8.5910
red,square,false,64,77.1991,9.5310
purple,triangle,false,65,80.1405,5.8240
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
purple,square,false,91,72.3735,8.2430
mlr -I --csv sort -f shape newfile.txt
cat newfile.txt
color,shape,flag,index,quantity,rate
red,circle,true,16,13.8103,2.9010
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
red,square,true,15,79.2778,0.0130
red,square,false,48,77.5542,7.4670
red,square,false,64,77.1991,9.5310
purple,square,false,91,72.3735,8.2430
yellow,triangle,true,11,43.6498,9.8870
purple,triangle,false,51,81.2290,8.5910
purple,triangle,false,65,80.1405,5.8240

Also using mlr -I you can bulk-operate on lots of files: e.g.:

mlr -I --csv cut -x -f unwanted_column_name *.csv

If you like, you can first copy off your original data somewhere else, before doing in-place operations.

Lastly, using tee within put, you can split your input data into separate files per one or more field names:

mlr --csv --from example.csv put -q 'tee > $shape.".csv", $*'
cat circle.csv
color,shape,flag,k,index,quantity,rate
red,circle,true,3,16,13.8103,2.9010
yellow,circle,true,8,73,63.9785,4.2370
yellow,circle,true,9,87,63.5058,8.3350
cat square.csv
color,shape,flag,k,index,quantity,rate
red,square,true,2,15,79.2778,0.0130
red,square,false,4,48,77.5542,7.4670
red,square,false,6,64,77.1991,9.5310
purple,square,false,10,91,72.3735,8.2430
cat triangle.csv
color,shape,flag,k,index,quantity,rate
yellow,triangle,true,1,11,43.6498,9.8870
purple,triangle,false,5,51,81.2290,8.5910
purple,triangle,false,7,65,80.1405,5.8240
Back to top