DSL errors and transparency
Handling for data errors
By default, Miller doesn't stop data processing for a single cell error. For example:
mlr --csv --from data-error.csv cat
x
1
2
3
text
4
mlr --csv --from data-error.csv put '$y = log10($x)'
x,y
1,0
2,0.3010299956639812
3,0.4771212547196624
text,(error)
4,0.6020599913279624
If you do want to stop processing, though, you have three options. The first is the mlr -x flag:
mlr -x --csv --from data-error.csv put '$y = log10($x)'
x,y
1,0
2,0.3010299956639812
3,0.4771212547196624
mlr: data error at NR=4 FNR=4 FILENAME=data-error.csv
mlr: field y: log10: unacceptable type string with value "text"
mlr: exiting due to data error.
The second is to put -x into your ~/.mlrrc file.
The third is to set the MLR_FAIL_ON_DATA_ERROR environment variable, which makes mlr -x implicit.
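In a script, the fail-fast options are mostly useful because the exit status can be checked. The following is a minimal sketch, not a definitive recipe. It assumes that Miller exits with a nonzero status after a data error, and that setting MLR_FAIL_ON_DATA_ERROR to 1 is enough to enable the behavior; the value 1 and the output file names are illustrative assumptions.

```shell
# Recreate the sample input shown above.
printf 'x\n1\n2\n3\ntext\n4\n' > data-error.csv

# Fail fast for a single invocation with the -x flag, and keep the exit status.
flag_status=0
mlr -x --csv --from data-error.csv put '$y = log10($x)' > with-flag.csv || flag_status=$?

# Fail fast through the environment instead.
# Assumption: the value "1" enables the behavior.
env_status=0
MLR_FAIL_ON_DATA_ERROR=1 mlr --csv --from data-error.csv put '$y = log10($x)' > with-env.csv || env_status=$?

echo "exit status with -x: $flag_status"
echo "exit status with MLR_FAIL_ON_DATA_ERROR: $env_status"
```

Capturing the status with `|| flag_status=$?` keeps the script usable under `set -e`, where an unguarded failing command would stop the script.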
Common causes of syntax errors
As soon as you have a programming language, you start having the problem "What is my code doing, and why?" This includes getting syntax errors -- which are always annoying -- as well as the even more annoying problem of a program that parses without syntax error but doesn't do what you expect.
The syntax-error message gives you line/column position for the syntax that couldn't be parsed. The cause may be clear from that information, or perhaps not. Here are some common causes of syntax errors:
Don't forget ; at end of line, before another statement on the next line.
Miller's DSL lacks the ++ and -- operators.
Curly braces are required for the bodies of if, while, and for blocks, even when the body is a single statement.
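The pitfalls above can be seen together in one small program. This is an illustrative sketch: the file name syntax-demo.mlr and the variable i are made up for the example, and += stands in for a C-style increment.

```shell
# Write a small DSL program to a file; the quoted 'EOF' keeps the shell
# from expanding anything inside it.
cat > syntax-demo.mlr <<'EOF'
end {
  i = 0;           # a semicolon separates this statement from the next one
  i += 1;          # there is no increment operator, so use += (or i = i + 1)
  if (i == 1) {    # braces are required even for a one-statement body
    print i;
  }
}
EOF

# -n means no input is read, so only the end block runs.
mlr -n put -f syntax-demo.mlr || echo "mlr reported an error; see the message above" >&2
```

Removing the semicolon after `i = 0`, or the braces around `print i`, should produce a syntax error whose line and column point at the spot where parsing stopped.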
The -v option to mlr put and mlr filter prints abstract syntax trees for your code. While not all details here will be of interest to everyone, this makes questions such as operator precedence completely unambiguous.
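For example, a precedence question such as "does 2 + $x * 3 multiply first?" can be checked by printing the parse tree. This sketch assumes the -v flag of mlr put; the expression is made up for illustration.

```shell
# Hypothetical expression: is it parsed as 2 + ($x * 3) or (2 + $x) * 3?
expression='$y = 2 + $x * 3'

# -n reads no input, so only the parsed program is printed.
mlr -n put -v "$expression" || echo "mlr reported an error; see the message above" >&2
```

In the printed tree, the operator nested deepest is applied first, so the multiplication should appear beneath the addition.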
Please see type-checking for type declarations and type-assertions you can use to make sure expressions and the data flowing through them are evaluating as you expect. I made them optional because one of Miller's important use-cases is being able to say simple things like mlr put '$y = $x + 1' myfile.dat with a minimum of punctuational bric-a-brac -- but for programs over a few lines long, I generally find that the more type-specification, the better.
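As a small sketch of what type-specification can look like, the program below uses typed local declarations and two of the asserting_ functions. The file name typed-demo.mlr and the variables n and s are illustrative only; see the type-checking page for the authoritative list of keywords and functions.

```shell
# Write a small typed DSL program to a file.
cat > typed-demo.mlr <<'EOF'
end {
  int n = 3;                    # typed local: assigning a non-int value is an error
  str s = "abc";                # typed local: must hold a string
  print asserting_int(n) + 1;   # asserting_int returns n if it is an int, else errors
  print asserting_string(s);    # same idea for strings
}
EOF

# -n means no input is read, so only the end block runs.
mlr -n put -f typed-demo.mlr || echo "mlr reported an error; see the message above" >&2
```

Changing the first declaration to `int n = "three";` should make Miller stop with a type error rather than carry on with an unexpected value.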