Special symbols and formatting

How can I handle commas-as-data in various formats?

CSV handles this well and by design:

cat commas.csv
Name,Role
"Xiao, Lin",administrator
"Khavari, Darius",tester

Likewise JSON:

mlr --icsv --ojson cat commas.csv
[
{
  "Name": "Xiao, Lin",
  "Role": "administrator"
},
{
  "Name": "Khavari, Darius",
  "Role": "tester"
}
]

For Miller's XTAB there is no escaping for carriage returns, but commas work fine:

mlr --icsv --oxtab cat commas.csv
Name Xiao, Lin
Role administrator

Name Khavari, Darius
Role tester

But for key-value-pair and index-numbered formats, commas are the default field separator, and -- as of Miller 5.4.0 anyway -- these formats have no CSV-style double-quote handling. So commas within the data look like delimiters:

mlr --icsv --odkvp cat commas.csv
Name=Xiao, Lin,Role=administrator
Name=Khavari, Darius,Role=tester

One solution is to use a different delimiter, such as a pipe character:

mlr --icsv --odkvp --ofs pipe cat commas.csv
Name=Xiao, Lin|Role=administrator
Name=Khavari, Darius|Role=tester

To be extra-sure to avoid data/delimiter clashes, you can also use control characters as delimiters -- here, control-A:

mlr --icsv --odkvp --ofs '\001'  cat commas.csv | cat -v
Name=Xiao, Lin^ARole=administrator
Name=Khavari, Darius^ARole=tester
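
When such output is read back, the same separator has to be named on the input side. A minimal sketch, assuming the pipe alias is accepted by --ifs just as it is by --ofs above; the sample file is recreated inline so the commands are self-contained:

```shell
# Recreate the sample input used above.
cat > commas.csv <<'EOF'
Name,Role
"Xiao, Lin",administrator
"Khavari, Darius",tester
EOF

# Write DKVP with a pipe separator, then read it back with the same separator.
mlr --icsv --odkvp --ofs pipe cat commas.csv \
  | mlr --idkvp --ifs pipe --ojson cat
```

If the round trip works as intended, the names come back with their commas intact, since the comma is no longer a delimiter anywhere in the pipeline.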

How can I handle field names with special symbols in them?

Simply surround the field names with curly braces:

echo 'x.a=3,y:b=4,z/c=5' | mlr put '${product.all} = ${x.a} * ${y:b} * ${z/c}'
x.a=3,y:b=4,z/c=5,product.all=60

How can I put single quotes into strings?

This is a little tricky due to the shell's handling of quotes. For simplicity, let's first put an update script into a file:

cat data/single-quote-example.mlr
$a = "It's OK, I said, then 'for now'."
echo a=bcd | mlr put -f data/single-quote-example.mlr
a=It's OK, I said, then 'for now'.

So: Miller's DSL uses double quotes for strings, and you can put single quotes (or backslash-escaped double-quotes) inside strings, no problem.

Without putting the update expression in a file, it's messier:

echo a=bcd | mlr put '$a="It'\''s OK, I said, '\''for now'\''."'
a=It's OK, I said, 'for now'.

The idea is that the outermost single quotes protect the put expression from the shell, and the double quotes within them are for Miller. To get a single quote in the middle, you need to put it outside the shell's single-quoting, as a backslash-escaped \'. The pieces are the following, all concatenated together; each piece other than \' is itself wrapped in single quotes on the command line:

  • $a="It
  • \'
  • s OK, I said, (with a trailing space)
  • \'
  • for now
  • \'
  • ."
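
The concatenation can be checked without Miller by letting the shell print the assembled string. A minimal sketch; the variable name put_expr is only for illustration:

```shell
# Each '...' piece is literal text; each \' between them is one single quote.
put_expr='$a="It'\''s OK, I said, '\''for now'\''."'

# Print what the shell would hand to mlr as the put expression.
printf '%s\n' "$put_expr"
```

This prints $a="It's OK, I said, 'for now'." which is the expression Miller receives.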

How to escape '?' in regexes?

One way is to use square brackets; an alternative is to use simple string-substitution rather than a regular expression.

cat data/question.dat
a=is it?,b=it is!
mlr --oxtab put '$c = gsub($a, "[?]"," ...")' data/question.dat
a is it?
b it is!
c is it ...
mlr --oxtab put '$c = ssub($a, "?"," ...")' data/question.dat
a is it?
b it is!
c is it ...

The ssub and gssub functions exist precisely for this reason: so you don't have to escape anything.
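
The difference shows up with any regex metacharacter, not just '?'. A minimal sketch with '.', which a regular expression treats as "any character"; the input record is made up for illustration:

```shell
# gsub treats "." as a regex (any character); gssub treats it as a literal dot.
echo 'a=a.b.c' | mlr put '$b = gsub($a, ".", "-"); $c = gssub($a, ".", "-")'
```

Here b comes out as five dashes, since every character matches the regex, while c is a-b-c, since only the literal dots are replaced.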

Latin-1 and UTF-8 character encodings

The ssub and gssub functions are also handy for dealing with non-UTF-8 strings such as Latin 1, since Go's regexp library -- which Miller uses -- requires UTF-8 strings. For example:

mlr -n put 'end {
  name = "Ka\xf0l\xedn og \xdeormundr";
  name = gssub(name, "\xde", "\u00de");
  name = gssub(name, "\xf0", "\u00f0");
  name = gssub(name, "\xed", "\u00ed");
  print name;
}'
Kaðlín og Þormundr

More generally, though, we have the DSL functions latin1_to_utf8 and utf8_to_latin1 and the verbs latin1-to-utf8 and utf8-to-latin1. The former let you fix encodings on a field-by-field level; the latter, for all records (with less keystroking). (Latin 1 is also known as ISO/IEC 8859-1.)
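
A minimal sketch of both forms, reusing the name from the gssub example above. It assumes a Miller version that ships these functions and verbs; the one-field record fed to the verb is made up for illustration:

```shell
# DSL function: convert one value (here, a string literal holding Latin-1 bytes).
mlr -n put 'end {
  print latin1_to_utf8("Ka\xf0l\xedn og \xdeormundr");
}'

# Verb: convert every record. The octal escapes \360 and \355 make printf
# emit the Latin-1 bytes for ð and í.
printf 'name=Ka\360l\355n\n' | mlr latin1-to-utf8
```

If the conversion behaves as described, both commands print the names with their non-ASCII letters as valid UTF-8, without a separate gssub call per character.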

Inputs such as English or German text are convertible from Latin-1 to UTF-8, since Latin-1 already contains the German characters. In the other direction, English and German pangrams are convertible from UTF-8 to Latin-1, but a Russian one is not, since Latin-1 doesn't contain the Russian alphabet.

How to apply math to regex output?

  • Use parentheses for capture groups
  • Use \1, \2, etc. to refer to the captures
  • The matched patterns are strings, so cast them to int or float

See also the page on regular expressions.

echo "a=14°45'" | mlr put '$a =~"^([0-9]+)°([0-9]+)" {$degrees = float("\1") + float("\2") / 60}'
a=14°45',degrees=14.75