Glossary

$*

All key-value pairs in the current record, as a map.

For example, if myfile.csv has header line a,b,c, and the third line after the header is 7,8,9, then the third record processed by Miller will be the ordered list of key-value pairs a=7, b=8, c=9, and $* will be (using JSON formatting) {"a": 7, "b": 8, "c": 9 }.

@*

All out-of-stream variables, as a map. Synonymous with all.

For example, if out-of-stream variables @count = 3 and @sum = 55 have been assigned, then @* will be (using JSON formatting) {"count": 3, "sum": 55}.

absent

The data type obtained from accessing a missing key, e.g. $x when the current record has no field named x. See the null-data page.

all

All out-of-stream variables, as a map. Synonymous with @*.

array

A list of values, indexable by integers starting with 1 for the first value.

auxents

Stands for auxiliary entry points. These are effectively separate programs, but bundled together inside the Miller executable for convenience. For example, mlr termcvt converts from CR-LF to LF format or vice versa. See the auxiliary-commands page for more information.

begin

A keyword in the Miller programming language indicating the start of a begin block within an instance of the put or filter verb. See begin/end blocks.

block

A group of statements between { and } in the Miller programming language, including if-statement bodies, for-loop bodies, begin-block bodies, end-block bodies, etc.

bool

A keyword for type declaration, used for variables taking boolean (true/false) values.

break

Used for exiting a for-loop or while-loop earlier than its top-of-loop continuation expression would have specified.

BZIP2 / .bz2

A data-compression format supported by Miller. Files compressed using BZIP2 compression normally end in .bz2.

call

A keyword used for invoking a user-defined subroutine.

colorization

Miller uses configurable colors for some output to the terminal. See the output-colorization page for more information.

compression

A technique for having disk files take up less space. See the compressed-data page for information on how Miller handles this.

continue

Used for jumping to the next iteration of a for-loop or while-loop without executing the remaining loop-body statements on the current iteration.

CSV

Stands for comma-separated values. A popular file format for tabular data, which Miller supports.

Cygwin

A collection of GNU and open-source tools which provide functionality similar to a Linux distribution on Windows. Miller can run inside Cygwin, but does not need to. See Miller on Windows.

data line

Any line after the first (header) line of a CSV or TSV file. The header line contains the keys for all records in the file; data lines contain values to be zipped together with those keys to form records. See record.

Note that a data line can span more than one line of text, in the sense that it can contain embedded newlines within double quotes: see also RFC 4180 and the Miller CSV documentation.

delimiter

A delimiter is something that goes in between each item in a list of things. For example, writing an array as [1,2,3,4,5], we can say that the comma character delimits the list items.

More specifically, in terms of Miller file formats, delimiter can be used as a synonym for separator.

division

Miller uses pythonic division for quotients of integers, with the exception that integer divided by integer is integer (not float) if the quotient can be represented exactly as an integer.
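The rule above can be modelled in a few lines of Python. This is only a sketch of the stated rule, not Miller's implementation: the function name miller_style_divide is hypothetical, and division by zero and integer overflow are deliberately left out.

```python
def miller_style_divide(a: int, b: int):
    """Hypothetical helper modelling the int/int rule described above."""
    if b == 0:
        # Assumption: zero denominators are out of scope for this sketch.
        raise ZeroDivisionError("this sketch does not model division by zero")
    if a % b == 0:
        # The quotient is exactly representable as an integer: keep it an int.
        return a // b
    # Otherwise behave like Python's true division and return a float.
    return a / b


print(miller_style_divide(6, 2))  # 3
print(miller_style_divide(7, 2))  # 3.5
```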

DKVP

Stands for delimited key-value pairs. A Miller-specific file format, with each line of a file being of the form x=1,y=2,z=3. For historical reasons, this is Miller's default format unless flags such as --csv are supplied. You can also make CSV your default format using a .mlrrc file.
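As a rough illustration of the line shape x=1,y=2,z=3, here is a minimal Python sketch that splits one DKVP line into an ordered map. The function name parse_dkvp_line is hypothetical; the parameter names ifs and ips borrow the glossary's IFS and IPS terms. The sketch assumes every pair contains the pair separator and keeps all values as strings, so it does not reproduce Miller's type inference.

```python
def parse_dkvp_line(line: str, ifs: str = ",", ips: str = "=") -> dict:
    """Split one DKVP line into an insertion-ordered dict of strings."""
    record = {}
    for pair in line.split(ifs):
        # partition() splits on the first occurrence of the pair separator.
        key, _, value = pair.partition(ips)
        record[key] = value
    return record


print(parse_dkvp_line("x=1,y=2,z=3"))  # {'x': '1', 'y': '2', 'z': '3'}
```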

do

A keyword which is used to indicate the start of a do-while loop in the Miller programming language.

DSL

Stands for domain-specific language. The Miller programming language is embedded within the put and filter verbs. It's a language with its own syntax and semantics; the Miller executable does not embed, say, Python or Lua as a language for put and filter statements. This makes the Miller programming language an embedded domain-specific language, or domain-specific language, or (more briefly) a DSL.

dump

A keyword in the Miller programming language which is used for printing variables to the screen (namely, to stdout). Largely synonymous with print, except that print with no arguments prints nothing, while dump with no arguments displays all currently defined out-of-stream variables.

See also dump statements.

edump

Same as dump, except it prints to stderr rather than stdout.

See also dump statements.

elif

A keyword which is used to indicate the else-if-part of an if-statement in the Miller programming language. In some languages this is elsif or else if; in Miller's programming language, elif.

else

A keyword which is used to indicate the else-part of an if-statement in the Miller programming language. See also elif.

emit, emitf, emitp

Three keywords in the Miller programming language for injecting new records into the record stream using the put or filter verbs.

See also the emit-statements section.

empty

Refers to the string with zero characters. For example, in a CSV file with header line a,b,c and data ,, the three fields are empty; with data 1,2, the first two fields (a and b) are not empty and the third field c is empty.
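The two data lines in this example can be checked with Python's standard csv module, used here purely as a stand-in reader rather than as Miller itself. The helper name empty_fields is hypothetical.

```python
import csv
import io


def empty_fields(record):
    """Return the keys whose values are the zero-length string."""
    return [key for key, value in record.items() if value == ""]


# Header line a,b,c followed by the two data lines from the text above.
text = "a,b,c\n,,\n1,2,\n"
records = list(csv.DictReader(io.StringIO(text)))

print(empty_fields(records[0]))  # ['a', 'b', 'c']
print(empty_fields(records[1]))  # ['c']
```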

end

A keyword in the Miller programming language indicating the start of an end block within an instance of the put or filter verb. See begin/end blocks.

ENV

A keyword in the Miller programming language for accessing a readable/writable map of environment variables.

eprint

Same as print, except it prints to stderr rather than stdout.

eprintn

Same as printn, except it prints to stderr rather than stdout.

false

A keyword in the Miller programming language for the boolean literal; signified by False in Python; in some languages (such as C) signified by the zero integer value.

field

A single key-value pair within a record.

file format

A standard way for encoding information within a text file. Examples include CSV, TSV, and JSON. See the file-formats page for information on which file formats Miller handles.

FILENAME

A built-in variable in the Miller programming language referring to the name of the current file being processed as Miller streams through your data.

See the section on built-in variables.

FILENUM

A built-in variable in the Miller programming language referring to the one-up index of the current file being processed as Miller streams through your data.

See the section on built-in variables.

filter

Along with put, one of the Miller verbs which use the Miller programming language.

Also, a keyword which you can use within put statements: see the page on DSL filter statements.

See the DSL overview.

flatten

To convert map-valued and/or array-valued fields to something representable in CSV and other non-JSON file formats -- either by JSON-stringifying them or by key spreading. See the flatten/unflatten page.
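Key spreading can be sketched in Python as follows. This is an illustration under stated assumptions, not Miller's code: the function name flatten_record is hypothetical, the "." separator and the one-up array positions are assumptions about the flattened key names, and writing empty maps or arrays as JSON text is a choice made for this sketch.

```python
import json


def flatten_record(record: dict, sep: str = ".") -> dict:
    """Spread nested maps/arrays into top-level keys such as 'values.y.1'."""
    flat = {}

    def visit(prefix, value):
        if isinstance(value, (dict, list)) and len(value) == 0:
            # Assumption of this sketch: empty containers become JSON text.
            flat[prefix] = json.dumps(value)
        elif isinstance(value, dict):
            for key, inner in value.items():
                visit(f"{prefix}{sep}{key}", inner)
        elif isinstance(value, list):
            # Assumption: array positions are numbered one-up.
            for position, inner in enumerate(value, start=1):
                visit(f"{prefix}{sep}{position}", inner)
        else:
            flat[prefix] = value

    for key, value in record.items():
        visit(key, value)
    return flat


print(flatten_record({"id": 1, "values": {"x": 3, "y": [4, 5]}}))
# {'id': 1, 'values.x': 3, 'values.y.1': 4, 'values.y.2': 5}
```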

See also unflatten.

float

A floating-point number as a value in Miller records, and in the Miller programming language. Floats interconvert seamlessly with integers using Miller's arithmetic rules, so usually you only need to think of numbers, rather than ints and floats separately.

Also, float is a keyword for type declaration.

FNR

Like NR but resets to 1 at the start of each file in the input stream. If you have mlr ... a.csv b.csv where a.csv has 10 records and b.csv has 20, then FNR will be 10 on the last record of a.csv, then it will have value 1 on the first record of b.csv.

See also the section on built-in variables.

for

A keyword which is used to indicate the start of a for-loop in the Miller programming language.

format

See file format.

func

A keyword used for defining a user-defined function in the Miller programming language.

funct

A type declaration used for local variables, function arguments, and function return values which are (named) user-defined functions or (unnamed) function literals.

See the variables page for examples.

function

A bit of callable code in the Miller programming language which takes zero or more arguments, and optionally returns a value.

See the page on built-in functions to see functions which are present in Miller.

See the page on user-defined functions for how to write your own functions.

function literal

A function without a name, like func(a,b) {return a + 2*b + 7}, assigned to a local variable or passed to a higher-order function like apply or sort. See the section on function literals.

GZIP / .gz

A data-compression format supported by Miller. Files compressed using GZIP compression normally end in .gz.

hashmap

See map.

header line

The first line of a CSV or TSV file. It contains the keys for all records in the file; subsequent lines contain values to be zipped together with those keys to form records. See record.

Note that a header line can span more than one line of text, in the sense that it can contain embedded newlines within double quotes: see also RFC 4180 and the Miller CSV section.

heterogeneity

Referring to data where not all records have the same keys, in the same order. See the record-heterogeneity page.

higher-order function

A function which takes another function as an argument, such as select or apply. See the page on higher-order functions.

homogeneity

Referring to data where all records have the same keys, in the same order. See the record-heterogeneity page.

if

A keyword which is used to indicate the start of an if-statement in the Miller programming language.

IFS

Stands for input field separator. See the separators page.

in

A keyword in the Miller programming language for single-variable for-loops and key-value for-loops.

in-place

Indicates that a file will be modified after processing. Miller's default mode is to read one or more files (or standard input on a pipe) and to write to standard output. This normally goes to the terminal, but can be redirected to another pipe, or an output file -- for example, mlr --csv sort myfile.csv (prints sorted output to the terminal), mlr --csv sort myfile.csv | some-other-command, or mlr --csv sort myfile.csv > newfile.csv. In all these cases, the original myfile.csv is left unmodified. But using Miller's -I flag, we can update the original file: e.g. mlr -I --csv sort myfile.csv won't print the sorted output to the terminal, but rather will write it back to myfile.csv.

See also the section on in-place mode.

int

A 64-bit signed integer as a value in Miller records, and in the Miller programming language. Ints interconvert seamlessly with floats using Miller's arithmetic rules, so usually you only need to think of numbers, rather than ints and floats separately.

Also, int is a keyword for type declaration.

IPS

Stands for input pair separator. See the separators page.

IRS

Stands for input record separator. See the separators page.

JSON

Stands for JavaScript object notation. A popular file format for tabular data supported by Miller.

JSON Lines

A file format related to JSON, supported by Miller. Key points are that every record is an object written on a single line, and that the records do not need to be wrapped in an outermost list. This format helps people interoperate with non-JSON-aware tools in the Unix toolkit, which generally operate on lines.

key

The string index in a map. Also, the name of a field in a record.

keyword

A reserved name in the Miller programming language which you can't use for any other purpose. For example, if, for, and while are keywords; trying to define a local variable if = 3 will result in a parse error.

line

A subsequence of a text file in between line-ending symbols such as the special linefeed character. Tools in the Unix toolkit generally operate on lines; Miller is designed to do that (using the NIDX format flags), as well as non-line-oriented formats such as CSV, TSV, JSON, and others.

local variable

A variable in the Miller programming language whose extent is limited to the expression in which it appears; contrast out-of-stream variables which endure across the entire record stream. See the section on local variables.

manpage / manual page

A form of on-line help which is common in Unix-like operating systems, including MacOS and BSD variants.

If you've installed Miller using your system's package-install tools (versus say building Miller from source), you can probably see Miller's manual page using man mlr at a terminal prompt. Regardless, you can find the same content within this documentation site.

map

A data structure in the Miller programming language containing an ordered sequence of key-value pairs. See the maps page for more information.

Note that Miller operates on records by treating them as maps.

.mlrrc

A file you can create, nominally in your home directory, to customize the default flag-settings used by Miller. For example, while Miller's default file format is DKVP, you can make the default format be CSV so that instead of mlr --csv sort myfile.csv you can simply do mlr sort myfile.csv. See the customization page.

MSYS2

MSYS2 is a collection of tools and libraries providing an easy-to-use environment for building, installing and running native Windows software. Miller on Windows no longer requires this as of Miller version 6.

M_E

A built-in variable in the Miller programming language referring to the mathematical constant e. The M is for math.

M_PI

A built-in variable in the Miller programming language referring to the mathematical constant π. The M is for math.

NF

Stands for number of fields. A read-only built-in variable in the Miller programming language which shows the number of fields in the current record.

NIDX

Stands for numerically indexed. This is a format directive telling Miller to process files one line at a time, splitting lines into fields, with the resulting fields indexed one-up as in the Unix toolkit.

See also the file-formats page.

NR

Stands for number of records. Unlike NF, which is the total number of fields within the current record, NR is a running count: since Miller is streaming, the NR built-in variable counts the number of records seen so far, counting upward from one. So, on the first record NR has value 1, on the second record NR has value 2, and so on.

This increments a total count across files, so if you have mlr ... a.csv b.csv where a.csv has 10 records and b.csv has 20, then NR will be 30 on the last record of b.csv.
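The relationship between NR and FNR across two input files can be traced with a small Python sketch. The function name number_records and the in-memory stand-ins for a.csv and b.csv are hypothetical; only the counting behavior described in the NR and FNR entries is modelled.

```python
def number_records(files):
    """Yield (filename, NR, FNR) for each record of each file, in order.

    files is a list of (filename, list_of_records) pairs.
    """
    nr = 0  # running count across all files
    for filename, records in files:
        fnr = 0  # restarts for each file
        for _ in records:
            nr += 1
            fnr += 1
            yield filename, nr, fnr


# Stand-ins for a.csv (10 records) and b.csv (20 records).
files = [("a.csv", [{}] * 10), ("b.csv", [{}] * 20)]
counters = list(number_records(files))

print(counters[9])   # ('a.csv', 10, 10)
print(counters[10])  # ('b.csv', 11, 1)
print(counters[-1])  # ('b.csv', 30, 20)
```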

See also FNR.

See also the section on built-in variables.

null

This term is used in various programming languages to indicate the absence of something: for example, neither true nor false, but rather unspecified or no data available here. Miller has more than one kind: see the page on null/empty/absent data.

num

The num keyword is used for type declaration in the Miller programming language. The num type encompasses both int and float. Ints and floats interconvert seamlessly using Miller's arithmetic rules, so usually you only need to think of numbers, rather than ints and floats separately.

OFS

Stands for output field separator. See the separators page.

one-up

A way of indexing arrays. If x=["a", "b", "c"], then using one-up indexing, x[1] is "a", x[2] is "b", and x[3] is "c". Miller uses one-up indexing. Contrast zero-up indexing.
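For readers coming from a zero-up language, the difference can be shown with a short Python sketch. Python lists are zero-up, so the hypothetical helper one_up_get subtracts one before indexing. It covers only positive, in-range positions and is not a model of how Miller treats other index values.

```python
def one_up_get(array, position):
    """Return the element at a one-up position (1 is the first element)."""
    if not 1 <= position <= len(array):
        # Assumption: anything outside 1..len(array) is out of scope here.
        raise IndexError(f"position {position} is outside 1..{len(array)}")
    return array[position - 1]


x = ["a", "b", "c"]
print(one_up_get(x, 1), one_up_get(x, 2), one_up_get(x, 3))  # a b c
print(x[0], x[1], x[2])                                      # a b c (zero-up)
```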

See also the arrays page, as well as the page on differences from other programming languages.

oosvar

A whimsical shorthand for out-of-stream variable.

OPS

Stands for output pair separator. See the separators page.

ORS

Stands for output record separator. See the separators page.

Out-of-stream variable

Variables, prefixed with the @ sigil, which persist their values across multiple records in the Miller programming language. See out-of-stream variables for more information.

PPRINT

A Miller-specific file format for key-value pairs, with columns vertically aligned for easy visual scanning.

print

A keyword in the Miller programming language for printing things to the terminal, with final newline printed for you.

See also printn which does not insert the final newline.

See also emit which inserts new records into the record stream.

printn

A keyword in the Miller programming language for printing things to the terminal, with no final newline printed for you.

See also print which does insert the final newline.

put

Along with filter, one of the Miller verbs which use the Miller programming language.

See the DSL overview.

ragged

Referring to data where not all records have the same number of keys, particularly in a malformed-CSV context. See the record-heterogeneity page.

record

An ordered list of key-value pairs.

Miller's fundamental streaming operation is to read one record at a time from input file(s) you specify, using some input format; transforming those records using one or more verbs you specify; then printing them out in some output format.

For CSV files, each record gets its keys from the file's header line, zipped together with values from a given data line. For example, if myfile.csv has header line a,b,c, and the third line after the header is 7,8,9, then the third record processed by Miller will be the ordered list of key-value pairs a=7, b=8, c=9.
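The zipping of header keys with data-line values can be sketched in Python. The function name zip_record is hypothetical, and the sketch makes simplifying assumptions: it splits naively on commas (no quoting or embedded newlines), keeps values as strings, and treats a key/value count mismatch as an error.

```python
def zip_record(header_line: str, data_line: str, sep: str = ",") -> dict:
    """Pair each header key with the value in the same position."""
    keys = header_line.split(sep)
    values = data_line.split(sep)
    if len(keys) != len(values):
        # Assumption: ragged lines are simply rejected in this sketch.
        raise ValueError("data line length does not match header length")
    # dict preserves insertion order, giving an ordered list of pairs.
    return dict(zip(keys, values))


print(zip_record("a,b,c", "7,8,9"))  # {'a': '7', 'b': '8', 'c': '9'}
```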

For JSON files, each record is a JSON object which isn't nested inside another one.

See also the Miller command structure page.

rectangular

Referring to data where all records have the same keys, in the same order. Synonymous with homogeneous. See the record-heterogeneity page.

REPL

Stands for read-evaluate-print loop, such as when you invoke python with no arguments: a place where you can type 1+2 and get 3. Miller has a REPL you can use.

return

A keyword in the Miller programming language which is used for returning control from a function to its caller, optionally returning a value from the function.

semicolon

Semicolons are used to delimit statements in the Miller programming language.

separator

Used in two senses:

(1) In some programming languages, such as C, C++, and Java, semicolons are required after every statement; in others such as Python, they're not required at all; in yet others, they're required in between statements but are optional after the last. Miller is in the third category, so we can say that semicolons are separators, not terminators, within the Miller programming language.

(2) Refers to character sequences which separate records from one another (like newlines, sometimes), fields from one another (like commas in CSV), and keys from values in key-value pairs (= or :, perhaps). See the separators page for more information.

sparse

Referring to data where not all records have the same keys. See the record-heterogeneity page.

stderr

A keyword in the Miller programming language for print, dump, and tee statements indicating that data are to be sent to the standard error.

stdout

A keyword in the Miller programming language for print, dump, and tee statements indicating that data are to be sent to the standard output.

str

A keyword for type declaration, indicating that a variable is intended to be of type string.

streaming

Refers to operations which can be done a record at a time, so (a) output is produced as input records arrive, before end of input stream, and (b) memory usage is typically bounded. The latter means that a streaming processor can operate on data files larger than system memory. Most of Miller's operations are streaming; some (such as sort) need to see all data before producing any output, and are non-streaming. Please see the page on Streaming processing and memory usage.
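The difference between a streaming and a non-streaming operation shows up in the order of reads and writes. The Python sketch below uses generators to log that order; all names (source, passthrough, sort_by_x, run) are hypothetical, and the sketch illustrates the general idea rather than Miller's internals.

```python
def source(values, log):
    """Produce records one at a time, logging each read."""
    for value in values:
        log.append(f"read {value}")
        yield {"x": value}


def passthrough(records):
    # Streaming: each record is handed on as soon as it is read.
    for record in records:
        yield record


def sort_by_x(records):
    # Non-streaming: sorted() must consume the whole input first.
    yield from sorted(records, key=lambda record: record["x"])


def run(stage, values):
    """Run one stage over the values and return the read/write log."""
    log = []
    for record in stage(source(values, log)):
        log.append(f"write {record['x']}")
    return log


print(run(passthrough, [3, 1, 2]))
# ['read 3', 'write 3', 'read 1', 'write 1', 'read 2', 'write 2']
print(run(sort_by_x, [3, 1, 2]))
# ['read 3', 'read 1', 'read 2', 'write 1', 'write 2', 'write 3']
```

In the first log, writes interleave with reads, so only one record needs to be held at a time; in the second, every read happens before the first write.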

subr

A keyword used for defining a user-defined subroutine.

subroutine

A user-definable bit of code in the Miller programming language, intended to be called for its side effects rather than for returning a value.

tee

In Unix-like and other systems, a tee is a command which reads standard input and writes to both standard output and a specified file -- duplicating its output. The name comes from the T-splitter used in plumbing whose shape looks like the capital letter T.

One particular use-case is to snapshot data at an intermediate point in a processing pipeline -- e.g. thing1 | thing2 | tee output2.dat | thing3 | thing4.

Miller has a tee in two places: (1) a verb you can insert into a Miller then-chain, and (2) an output statement in the Miller programming language. Using the latter, you have the additional option of using a tee-to file name which is variable, perhaps depending on the current record. For example, if you have a large file with an id column, you can split it into several files, one for each distinct id. See the section on tee statements for an example.
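The idea of a tee whose destination depends on the current record can be sketched in Python. This is not Miller syntax: the function name tee_by_field, the ".out" naming scheme, and the use of in-memory lists instead of real files are all assumptions made for illustration.

```python
def tee_by_field(records, field, sinks):
    """Copy each record to a sink chosen by one of its fields, then pass it on.

    sinks is a dict mapping a (hypothetical) output name to a list of records.
    """
    for record in records:
        name = f"{record[field]}.out"  # destination varies per record
        sinks.setdefault(name, []).append(record)
        yield record  # the main stream continues unchanged


records = [
    {"id": "a", "v": 1},
    {"id": "b", "v": 2},
    {"id": "a", "v": 3},
]
sinks = {}
passed_on = list(tee_by_field(records, "id", sinks))

print(sorted(sinks))       # ['a.out', 'b.out']
print(len(sinks["a.out"]))  # 2
print(len(passed_on))       # 3
```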

terminator

Used in two senses:

(1) Refers to whichever character sequence terminates a line of text, such as newline/linefeed (LF) or a carriage-return-linefeed pair (CR/LF). See also https://en.wikipedia.org/wiki/Newline.

(2) In some programming languages, such as C, C++, and Java, semicolons are required after every statement; in others such as Python, they're not required at all; in yet others, they're required in between statements but are optional after the last. Miller is in the third category, so we can say that semicolons are separators, not terminators, within the Miller programming language.

toolkit

See Unix toolkit.

true

A keyword in the Miller programming language for the boolean literal; signified by True in Python; in some languages (such as C) signified by non-zero integer values.

TSV

Stands for tab-separated values. A popular file format for tabular data, which Miller supports.

UDF

A user-defined function in the Miller programming language.

unflatten

To undo the flatten operation, restoring map-valued and/or array-valued fields encoded in CSV and other non-JSON file formats for JSON output. See the flatten/unflatten page.

Unix toolkit

The term Unix toolkit refers to a collection of command-line programs present in Unix and Unix-like operating systems, BSD variants, MacOS, etc. Examples include awk, sed, grep, cat, and cut. Common characteristics include processing data files one line at a time, reading input from one or more files, reading input from standard input if no files are specified, writing output to standard output, and connecting the output of one program to the input of another using pipes. Miller is designed explicitly to work well in this paradigm alongside items in the Unix toolkit. Moreover, several of Miller's verbs are designed to imitate some of the programs in the Unix toolkit, but with ability to operate on richer file formats such as CSV, TSV, JSON, and others.

unnamed function

See function literal.

unset

A keyword in the Miller programming language for removing the definition of a local or out-of-stream variable, or for removing a key from the current record.

See the DSL unset statements page.

unsparse

Transforming data so that all records have the same keys, by filling in default values. See the record-heterogeneity page.
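A minimal Python sketch of this transformation follows. The function name unsparse_records is hypothetical, and using the empty string as the default fill value is an assumption of the sketch. Note that it needs two passes over the data (one to collect keys, one to fill), so it is non-streaming.

```python
def unsparse_records(records, fill=""):
    """Give every record the union of all keys, filling gaps with a default."""
    records = list(records)

    # First pass: collect keys in first-seen order.
    keys = []
    seen = set()
    for record in records:
        for key in record:
            if key not in seen:
                seen.add(key)
                keys.append(key)

    # Second pass: rebuild each record with every key present.
    return [{key: record.get(key, fill) for key in keys} for record in records]


print(unsparse_records([{"a": 1}, {"b": 2}, {"a": 3, "c": 4}]))
# [{'a': 1, 'b': '', 'c': ''}, {'a': '', 'b': 2, 'c': ''}, {'a': 3, 'b': '', 'c': 4}]
```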

value

The thing indexed by a key in a map. Miller values take one of Miller's data types. See also record.

var

A keyword for type declaration. It means a variable can have any type, which in itself is not useful; its usefulness comes from letting you declare a new variable, in an inner scope, of the same name as another in an outer scope.

variable

A way to access data by name within the Miller programming language. See the DSL variables page.

verb

One of the ways you ask Miller to transform your data as it processes it. Many of Miller's verbs such as sort and cut are file-format-aware analogues of tools in the Unix toolkit.

See the List of verbs page.

while

A keyword which is used to indicate the start of a while-loop, and also used in do-while loops, in the Miller programming language.

XTAB

Stands for transposed tabular. A Miller-specific file format for key-value pairs: it's a vertical-tabular format useful for looking at files with a large number of columns. Example: mlr --icsv --oxtab head -n 1 widefile.csv.

zero-up

A way of indexing arrays. If x=["a", "b", "c"], then using zero-up indexing, x[0] is "a", x[1] is "b", and x[2] is "c". Miller uses one-up indexing.

See also the arrays page, as well as the page on differences from other programming languages.

ZLIB / .z

A data-compression format supported by Miller. Files compressed using ZLIB compression normally end in .z.