Quick links:   Flags   Verbs   Functions   Glossary   Release docs

# Sorting

Miller gives you three ways to sort your data:

## Sorting records: the sort verb

The `sort` verb (see its documentation for more information) reorders entire records within the data stream. You can sort lexically (with or without case-folding) or numerically, ascending or descending; and you can sort primary by one column, then secondarily by another, etc.

Input data:

```mlr --c2p cat example.csv
```
```color  shape    flag  k  index quantity rate
yellow triangle true  1  11    43.6498  9.8870
red    square   true  2  15    79.2778  0.0130
red    circle   true  3  16    13.8103  2.9010
red    square   false 4  48    77.5542  7.4670
purple triangle false 5  51    81.2290  8.5910
red    square   false 6  64    77.1991  9.5310
purple triangle false 7  65    80.1405  5.8240
yellow circle   true  8  73    63.9785  4.2370
yellow circle   true  9  87    63.5058  8.3350
purple square   false 10 91    72.3735  8.2430
```

Sorted numerically ascending by rate:

```mlr --c2p sort -n rate example.csv
```
```color  shape    flag  k  index quantity rate
red    square   true  2  15    79.2778  0.0130
red    circle   true  3  16    13.8103  2.9010
yellow circle   true  8  73    63.9785  4.2370
purple triangle false 7  65    80.1405  5.8240
red    square   false 4  48    77.5542  7.4670
purple square   false 10 91    72.3735  8.2430
yellow circle   true  9  87    63.5058  8.3350
purple triangle false 5  51    81.2290  8.5910
red    square   false 6  64    77.1991  9.5310
yellow triangle true  1  11    43.6498  9.8870
```

Sorted lexically ascending by color; then, within each color, numerically descending by quantity:

```mlr --c2p sort -f color -nr quantity example.csv
```
```color  shape    flag  k  index quantity rate
purple triangle false 5  51    81.2290  8.5910
purple triangle false 7  65    80.1405  5.8240
purple square   false 10 91    72.3735  8.2430
red    square   true  2  15    79.2778  0.0130
red    square   false 4  48    77.5542  7.4670
red    square   false 6  64    77.1991  9.5310
red    circle   true  3  16    13.8103  2.9010
yellow circle   true  8  73    63.9785  4.2370
yellow circle   true  9  87    63.5058  8.3350
yellow triangle true  1  11    43.6498  9.8870
```

## Sorting fields within records: the sort-within-records verb

The `sort-within-records` verb (see its documentation for more information) leaves records in their original order in the data stream, but reorders fields within each record. A typical use-case is for given all records the same column-ordering, in particular for converting JSON to CSV (or other tabular formats):

```cat data/sort-within-records.json
```
```{
"a": 1,
"b": 2,
"c": 3
}
{
"b": 4,
"a": 5,
"c": 6
}
{
"c": 7,
"b": 8,
"a": 9
}
```
```mlr --ijson --opprint cat data/sort-within-records.json
```
```a b c
1 2 3

b a c
4 5 6

c b a
7 8 9
```
```mlr --ijson --opprint sort-within-records data/sort-within-records.json
```
```a b c
1 2 3
5 4 6
9 8 7
```

## The sort function by example

• It returns a sorted copy of an input array or map.
• Without second argument, uses the natural ordering.
• With second which is string, takes sorting flags from it: `"f"` for lexical or `"c"` for case-folded lexical, and/or `"r"` for reverse/descending.
```mlr -n put '
end {
# Sort array with natural ordering
print sort([5,2,3,1,4]);
}
'
```
```[1, 2, 3, 4, 5]
```
```mlr -n put '
end {
# Sort array with reverse-natural ordering
print sort([5,2,3,1,4], "r");
}
'
```
```[5, 4, 3, 2, 1]
```
```mlr -n put '
end {
# Sort array with custom function: natural ordering
print sort([5,2,3,1,4], func(a,b) { return a <=> b});
}
'
```
```[1, 2, 3, 4, 5]
```
```mlr -n put '
end {
# Sort array with custom function: reverse-natural ordering
print sort([5,2,3,1,4], func(a,b) { return b <=> a});
}
'
```
```[5, 4, 3, 2, 1]
```
```mlr -n put '
end {
# Sort map with natural ordering on keys
print sort({"c":2, "a": 3, "b": 1});
}
'
```
```{
"a": 3,
"b": 1,
"c": 2
}
```
```mlr -n put '
end {
# Sort map with reverse-natural ordering on keys
print sort({"c":2, "a": 3, "b": 1}, "r");
}
'
```
```{
"c": 2,
"b": 1,
"a": 3
}
```
```mlr -n put '
end {
# Sort map with custom function: natural ordering on values
print sort({"c":2, "a": 3, "b": 1}, func(ak,av,bk,bv){return av <=> bv});
}
'
```
```{
"b": 1,
"c": 2,
"a": 3
}
```
```mlr -n put '
end {
# Sort map with custom function: reverse-natural ordering on values
print sort({"c":2, "a": 3, "b": 1}, func(ak,av,bk,bv){return bv <=> av});
}
'
```
```{
"a": 3,
"c": 2,
"b": 1
}
```

In the rest of this page we'll look more closely at these variants.

## Simple sorting of arrays

Using the `sort` function, you can get a copy of an array, sorted by its values -- optionally, with reversed order, and/or lexical/case-folded sorting. The first argument is an array to be sorted. The optional second argument is a string containing any of the characters `n` for numeric (the default anyway), `f` for lexical, or `c` for case-folded lexical, and `r` for reverse. Note that `sort` does not modify its argument; it returns a sorted copy.

Also note that all the flags to `sort` allow you to operate on arrays which contain strings, floats, and booleans; if you need to sort an array whose values are themselves maps or arrays, you'll need `sort` with function argument as described further down in this page.

```cat data/sorta-example.csv
```
```key,values
alpha,4;6;1;5
beta,7;9;9;8
gamma,11;2;1;12
```

Default sort is numerical ascending:

```mlr --c2p --from data/sorta-example.csv put '
\$values = splita(\$values, ";");
\$values = sort(\$values);        # default flags
\$values = joinv(\$values, ";");
'
```
```key   values
alpha 1;4;5;6
beta  7;8;9;9
gamma 1;2;11;12
```

Use the `"r"` flag for reverse, which is numerical descending:

```mlr --c2p --from data/sorta-example.csv put '
\$values = splita(\$values, ";");
\$values = sort(\$values, "r");   # 'r' flag for reverse sort
\$values = joinv(\$values, ";");
'
```
```key   values
alpha 6;5;4;1
beta  9;9;8;7
gamma 12;11;2;1
```

Use the `"f"` flag for lexical ascending sort (and `"fr"` would lexical descending):

```mlr --c2p --from data/sorta-example.csv put '
\$values = splita(\$values, ";");
\$values = sort(\$values, "f");   # 'f' flag for lexical sort
\$values = joinv(\$values, ";");
'
```
```key   values
alpha 1;4;5;6
beta  7;8;9;9
gamma 1;11;12;2
```

Without and with case-folding:

```cat data/sorta-example-text.csv
```
```key,values
alpha,cat;bat;Australia;Bavaria;apple;Colombia
alpha,cat;bat;Australia;Bavaria;apple;Colombia
```
```mlr --c2p --from data/sorta-example-text.csv put '
\$values = splita(\$values, ";");
if (NR == 1) {
\$values = sort(\$values, "f"); # 'f' flag for (non-folded) lexical sort
} else {
\$values = sort(\$values, "c"); # 'c' flag for case-folded lexical sort
}
\$values = joinv(\$values, ";");
'
```
```key   values
alpha Australia;Bavaria;Colombia;apple;bat;cat
alpha apple;Australia;bat;Bavaria;cat;Colombia
```

## Simple sorting of maps within records

Using the `sort` function, you can sort a map by its keys.

Since `sort` only gives you options for sorting a map by its keys, if you want to sort a map by its values you'll need `sort` with function argument as described further down in this page.

Also note that, unlike the `sort-within-record` verb with its `-r` flag, `sort` doesn't recurse into submaps and sort those.

```cat data/server-log.json
```
```{
"hostname": "localhost",
"pid": 12345,
"req": {
"id": 6789,
"method": "GET",
"path": "api/check",
"host": "foo.bar",
"host": "bar.baz",
"user-agent": "browser"
}
},
"res": {
"status_code": 200,
"content-type": "text",
"content-encoding": "plain"
}
}
}
```
```mlr --json --from data/server-log.json put '
\$req = sort(\$req);      # Ascending here
\$res = sort(\$res, "r"); # Descending here
'
```
```{
"hostname": "localhost",
"pid": 12345,
"req": {
"host": "bar.baz",
"user-agent": "browser"
},
"host": "foo.bar",
"id": 6789,
"method": "GET",
"path": "api/check"
},
"res": {
"status_code": 200,
"content-type": "text",
"content-encoding": "plain"
}
}
}
```

## Simple sorting of maps across records

As discussed in the page on operating on all records, while Miller is normally streaming (we operate on one record at a time), we can accumulate records in an array-valued or map-valued out-of-stream variable, then operate on that record-list in an `end` block. This includes the possibility of accumulating records in a map, then sorting the map.

Using the `f` flag we're sorting the map keys (1-up NR) lexically, so we have 1, then 10, then 2:

```mlr --icsv --opprint --from example.csv put -q '
begin {
@records = {};  # Define as a map
}
\$nr = NR;
@records[NR] = \$*; # Accumulate
end {
@records = sort(@records, "f");
for (_, record in @records) {
emit1 record;
}
}
'
```
```color  shape    flag  k  index quantity rate   nr
yellow triangle true  1  11    43.6498  9.8870 1
purple square   false 10 91    72.3735  8.2430 10
red    square   true  2  15    79.2778  0.0130 2
red    circle   true  3  16    13.8103  2.9010 3
red    square   false 4  48    77.5542  7.4670 4
purple triangle false 5  51    81.2290  8.5910 5
red    square   false 6  64    77.1991  9.5310 6
purple triangle false 7  65    80.1405  5.8240 7
yellow circle   true  8  73    63.9785  4.2370 8
yellow circle   true  9  87    63.5058  8.3350 9
```

## Custom sorting of arrays within records

Using the `sort` function, you can sort an array by its values, using another function (which you specify -- see the page on user-defined functions) for comparing elements.

• Your function must take two arguments, which will range over various pairs of values in your array;
• It must return a number which is negative, zero, or positive depending on whether you want the first argument to sort less than, equal to, or greater than the second, respectively.

For example, let's use the following input data. Instead of having an array, it has some semicolon-delimited data in a field which we can split and sort:

```cat data/sortaf-example.csv
```
```key,values
alpha,5;2;8;6;1;4;9;10;3;7
```

In the following example we sort data in several ways -- the first two just recaptiulate (for reference) what `sort` with default flags already does; the third is novel:

```mlr --icsv --ojson --from data/sortaf-example.csv put '

# Same as sort(\$values)
func forward(a,b) {
return a <=> b
}

# Same as sort(\$values, "r")
func reverse(a,b) {
return b <=> a
}

# Custom sort
func even_then_odd(a,b) {
ax = a % 2;
bx = b % 2;
if (ax == bx) {
return a <=> b
} elif (bx == 1) {
return -1
} else {
return 1
}
}

split_values = splita(\$values, ";");
\$forward = sort(split_values, forward);
\$reverse = sort(split_values, reverse);
\$even_then_odd = sort(split_values, even_then_odd);
'
```
```{
"key": "alpha",
"values": "5;2;8;6;1;4;9;10;3;7",
"forward": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
"reverse": [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
"even_then_odd": [2, 4, 6, 8, 10, 1, 3, 5, 7, 9]
}
```

## Custom sorting of arrays across records

As noted above, we can use the operating-on-all-records paradigm to accumulate records in an array-valued or map-valued out-of-stream variable, then operate on that record-list in an `end` block. This includes the possibility of accumulating records in an array, then sorting the array.

Note that here the array elements are maps, so the `a` and `b` arguments to our functions are maps -- and we have to access the `index` field using either `a["index"]` and `b["index"]`, or (using the dot operator for indexing) `a.index` and `b.index`.

```mlr --icsv --opprint --from example.csv put -q '
# Sort primarily ascending on the shape field, then secondarily
# descending numeric on the index field.
func cmp(a, b) {
cmp1 = a.shape <=> b.shape;
if (cmp1 != 0) {
return cmp1
} else {
return b.index <=> a.index;
}
}
begin {
@records = [];  # Define as an array, else auto-create will make a map
}
@records[NR] = \$*; # Accumulate
end {
@records = sort(@records, cmp);
for (record in @records) {
emit1 record;
}
}
'
```
```color  shape    flag  k  index quantity rate
yellow circle   true  9  87    63.5058  8.3350
yellow circle   true  8  73    63.9785  4.2370
red    circle   true  3  16    13.8103  2.9010
purple square   false 10 91    72.3735  8.2430
red    square   false 6  64    77.1991  9.5310
red    square   false 4  48    77.5542  7.4670
red    square   true  2  15    79.2778  0.0130
purple triangle false 7  65    80.1405  5.8240
purple triangle false 5  51    81.2290  8.5910
yellow triangle true  1  11    43.6498  9.8870
```

## Custom sorting of maps within records

Using the `sort` function, you can sort a map using a function which you specify (see the page on user-defined functions) for comparing keys and/or values.

• Your function must take four arguments, which will range over various pairs of key-value pairs in your map;
• It must return a number which is negative, zero, or positive depending on whether you want the first argument to sort less than, equal to, or greater than the second, respectively.

For example, we can sort ascending or descending by map key or map value:

```mlr -n put -q '
func f1(ak, av, bk, bv) {
return ak <=> bk
}
func f2(ak, av, bk, bv) {
return bk <=> ak
}
func f3(ak, av, bk, bv) {
return av <=> bv
}
func f4(ak, av, bk, bv) {
return bv <=> av
}
end {
x = {
"c":1,
"a":3,
"b":2,
};

print sort(x, f1);
print sort(x, f2);
print sort(x, f3);
print sort(x, f4);
}
'
```
```{
"a": 3,
"b": 2,
"c": 1
}
{
"c": 1,
"b": 2,
"a": 3
}
{
"c": 1,
"b": 2,
"a": 3
}
{
"a": 3,
"b": 2,
"c": 1
}
```

## Custom sorting of maps across records

We can modify our above example just a bit, where we accumulate records in a map rather than an array. Here the map keys will be `NR` values `"1"`, `"2"`, etc.

Why would we do this? When we're operating across all records and keeping all of them -- densely -- accumulating them in an array is fine. If we're only taking a subset -- sparsely -- and we want to retain the original `NR` as keys, using a map is handy, since we don't need continguous keys.

```mlr --icsv --opprint --from example.csv put -q '
# Sort descending numeric on the index field
func cmp(ak, av, bk, bv) {
return bv.index <=> av.index
}
begin {
@records = {};  # Define as a map
}
@records[NR] = \$*; # Accumulate
end {
@records = sort(@records, cmp);
for (_, record in @records) {
emit1 record;
}
}
'
```
```color  shape    flag  k  index quantity rate
purple square   false 10 91    72.3735  8.2430
yellow circle   true  9  87    63.5058  8.3350
yellow circle   true  8  73    63.9785  4.2370
purple triangle false 7  65    80.1405  5.8240
red    square   false 6  64    77.1991  9.5310
purple triangle false 5  51    81.2290  8.5910
red    square   false 4  48    77.5542  7.4670
red    circle   true  3  16    13.8103  2.9010
red    square   true  2  15    79.2778  0.0130
yellow triangle true  1  11    43.6498  9.8870
```