DSL syntax¶
Expression formatting¶
Multiple expressions may be given, separated by semicolons, and each may refer to the ones before:
ruby -e '10.times{|i|puts "i=#{i}"}' | mlr --opprint put '$j = $i + 1; $k = $i +$j'
i j  k
0 1  1
1 2  3
2 3  5
3 4  7
4 5  9
5 6  11
6 7  13
7 8  15
8 9  17
9 10 19
Newlines within the expression are ignored, which can help increase legibility of complex expressions:
mlr --opprint put '
  # Here is how to make a comment
  $nf       = NF;
  $nr       = NR;
  $fnr      = FNR;
  $filenum  = FILENUM;
  $filename = FILENAME
' data/small data/small2
a   b   i     x                    y                    nf nr fnr filenum filename
pan pan 1     0.346791             0.726802             5  1  1   1       data/small
eks pan 2     0.758679             0.522151             5  2  2   1       data/small
wye wye 3     0.204603             0.338318             5  3  3   1       data/small
eks wye 4     0.381399             0.134188             5  4  4   1       data/small
wye pan 5     0.573288             0.863624             5  5  5   1       data/small
pan eks 9999  0.267481232652199086 0.557077185510228001 5  6  1   2       data/small2
wye eks 10000 0.734806020620654365 0.884788571337605134 5  7  2   2       data/small2
pan wye 10001 0.870530722602517626 0.009854780514656930 5  8  3   2       data/small2
hat wye 10002 0.321507044286237609 0.568893318795083758 5  9  4   2       data/small2
pan zee 10003 0.272054845593895200 0.425789896597056627 5  10 5   2       data/small2
Anything from a # character to end of line is a code comment.
mlr --opprint filter '($x > 0.5 && $y < 0.5) || ($x < 0.5 && $y > 0.5)' \
  then stats2 -a corr -f x,y \
  data/medium
x_y_corr
-0.7479940285189345
Expressions from files¶
The simplest way to enter expressions for put and filter is between single quotes on the command line (quoting rules differ on Windows). For example:
mlr --from data/small put '$xy = sqrt($x**2 + $y**2)'
a=pan,b=pan,i=1,x=0.346791,y=0.726802,xy=0.805298171415408
a=eks,b=pan,i=2,x=0.758679,y=0.522151,xy=0.9209970096813562
a=wye,b=wye,i=3,x=0.204603,y=0.338318,xy=0.3953750836016352
a=eks,b=wye,i=4,x=0.381399,y=0.134188,xy=0.40431623334340655
a=wye,b=pan,i=5,x=0.573288,y=0.863624,xy=1.036583592538489
mlr --from data/small put '
  func f(a, b) {
    return sqrt(a**2 + b**2)
  }
  $xy = f($x, $y)
'
a=pan,b=pan,i=1,x=0.346791,y=0.726802,xy=0.805298171415408
a=eks,b=pan,i=2,x=0.758679,y=0.522151,xy=0.9209970096813562
a=wye,b=wye,i=3,x=0.204603,y=0.338318,xy=0.3953750836016352
a=eks,b=wye,i=4,x=0.381399,y=0.134188,xy=0.40431623334340655
a=wye,b=pan,i=5,x=0.573288,y=0.863624,xy=1.036583592538489
You may, though, find it convenient to put expressions into files for reuse, and read them using the -f option. For example:
cat data/fe-example-3.mlr
func f(a, b) {
  return sqrt(a**2 + b**2)
}
$xy = f($x, $y)
mlr --from data/small put -f data/fe-example-3.mlr
a=pan,b=pan,i=1,x=0.346791,y=0.726802,xy=0.805298171415408
a=eks,b=pan,i=2,x=0.758679,y=0.522151,xy=0.9209970096813562
a=wye,b=wye,i=3,x=0.204603,y=0.338318,xy=0.3953750836016352
a=eks,b=wye,i=4,x=0.381399,y=0.134188,xy=0.40431623334340655
a=wye,b=pan,i=5,x=0.573288,y=0.863624,xy=1.036583592538489
If you have some of the logic in a file and you want to write the rest on the command line, you can use the -f and -e options together:
cat data/fe-example-4.mlr
func f(a, b) { return sqrt(a**2 + b**2) }
mlr --from data/small put -f data/fe-example-4.mlr -e '$xy = f($x, $y)'
a=pan,b=pan,i=1,x=0.346791,y=0.726802,xy=0.805298171415408
a=eks,b=pan,i=2,x=0.758679,y=0.522151,xy=0.9209970096813562
a=wye,b=wye,i=3,x=0.204603,y=0.338318,xy=0.3953750836016352
a=eks,b=wye,i=4,x=0.381399,y=0.134188,xy=0.40431623334340655
a=wye,b=pan,i=5,x=0.573288,y=0.863624,xy=1.036583592538489
A suggested use-case here is defining functions in files, and calling them from command-line expressions.
Another suggested use-case is putting default parameter values in files, e.g. begin{@count=is_present(@count)?@count:10} in the file, which you can override by preceding it with begin{@count=40} using -e.
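The default-value pattern can be sketched as follows. This is a minimal illustration, not an example from the Miller distribution: the file name count-default.mlr, the scratch directory, and the $count output field are all hypothetical.

```shell
# Scratch directory for the example; the file name below is hypothetical.
tmpdir=$(mktemp -d)

# The file supplies a default: @count becomes 10 only if nothing set it earlier.
cat > "$tmpdir/count-default.mlr" <<'EOF'
begin {
  @count = is_present(@count) ? @count : 10
}
$count = @count
EOF

# No override: only the file's begin block runs, so the default applies.
default_out=$(echo 'x=1' | mlr put -f "$tmpdir/count-default.mlr")
echo "$default_out"

# Override: -e is given before -f, so @count is already set
# when the file's begin block checks is_present(@count).
override_out=$(echo 'x=1' | mlr put -e 'begin { @count = 40 }' -f "$tmpdir/count-default.mlr")
echo "$override_out"

rm -r "$tmpdir"
```

Given the evaluation order described here, the first command should print x=1,count=10 and the second x=1,count=40. Placing the -e after the -f would not work as an override, since the file's begin block would already have assigned the default.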
Moreover, you can have one or more -f expressions (maybe one function per file, for example) and one or more -e expressions on the command line. If you mix -f and -e then the expressions are evaluated in the order encountered.
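As a rough sketch of mixing the two options, the following combines one -f file with two -e expressions, where the second -e depends on a field assigned by the first. The file name sumsq.mlr, the function name sumsq, and the field names h and twice are hypothetical. The trailing semicolon on the first -e is a defensive choice: it is harmless because semicolons are separators, and it keeps the statements apart however the pieces are joined.

```shell
# Scratch directory for the example; the file name below is hypothetical.
tmpdir=$(mktemp -d)

# The file holds only a function definition.
cat > "$tmpdir/sumsq.mlr" <<'EOF'
func sumsq(a, b) {
  return a**2 + b**2
}
EOF

# The first -e assigns $h using the function from the file;
# the second -e reads $h, so it must come after the first.
mixed_out=$(echo 'x=3,y=4' | mlr put \
  -f "$tmpdir/sumsq.mlr" \
  -e '$h = sumsq($x, $y);' \
  -e '$twice = $h * 2')
echo "$mixed_out"

rm -r "$tmpdir"
```

With the input x=3,y=4 this should print x=3,y=4,h=25,twice=50.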
Semicolons, commas, newlines, and curly braces¶
Miller uses semicolons as statement separators, not statement terminators. This means you can write:
mlr put 'x=1'
mlr put 'x=1;$y=2'
mlr put 'x=1;$y=2;'
mlr put 'x=1;;;;$y=2;'
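To see that these forms are interchangeable, the following sketch runs the same two assignments with no trailing semicolon, with a trailing semicolon, and with repeated semicolons. The input record x=1 and the field names y and z are illustrative only.

```shell
# Each expression assigns $y and $z; only the semicolon placement differs.
sep_out=$(
  for expr in '$y = 2; $z = 3' '$y = 2; $z = 3;' '$y = 2;;;; $z = 3;'; do
    echo 'x=1' | mlr put "$expr"
  done
)
echo "$sep_out"
```

All three invocations should print the same record, x=1,y=2,z=3, since the extra semicolons only separate empty statements.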
Semicolons are optional after closing curly braces (which close conditionals and loops as discussed below).
echo x=1,y=2 | mlr put 'while (NF < 10) { $[NF+1] = ""} $foo = "bar"'
x=1,y=2,3=,4=,5=,6=,7=,8=,9=,10=,foo=bar
echo x=1,y=2 | mlr put 'while (NF < 10) { $[NF+1] = ""}; $foo = "bar"'
x=1,y=2,3=,4=,5=,6=,7=,8=,9=,10=,foo=bar
Semicolons are required between statements even if those statements are on separate lines. Newlines are for your convenience but have no syntactic meaning: line endings do not terminate statements. For example, adjacent assignment statements must be separated by semicolons even if those statements are on separate lines:
mlr put '
  $x = 1
  $y = 2 # Syntax error
'

mlr put '
  $x = 1;
  $y = 2 # This is OK
'
Trailing commas are allowed in function/subroutine definitions, function/subroutine callsites, and map literals. This is intended for (although not restricted to) the multi-line case:
mlr --csvlite --from data/a.csv put '
  func f(
    num a,
    num b,
  ): num {
    return a**2 + b**2;
  }
  $* = {
    "s": $a + $b,
    "t": $a - $b,
    "u": f(
      $a,
      $b,
    ),
    "v": NR,
  }
'
s,t,u,v
3,-1,5,1
9,-1,41,2
Bodies for all compound statements must be enclosed in curly braces, even if the body is a single statement:
mlr put 'if ($x == 1) $y = 2' # Syntax error
mlr put 'if ($x == 1) { $y = 2 }' # This is OK
Bodies for compound statements may be empty:
mlr put 'if ($x == 1) { }' # This no-op is syntactically acceptable