
The Miller Agent Skill

As of Miller version 6.20, released in July 2026, there are two main ways to get your AI to know about a software tool (Miller, or others): agent skills, and MCP. (See Miller and AI for an introduction.)

Miller ships a built-in Agent Skill -- a single SKILL.md file -- inside the mlr executable, so agents that read skills directly from disk (Claude Code, and other tools that support the Agent Skills format) can discover and drive Miller without scraping help text or guessing at flags.

The skill is plain markdown with a YAML frontmatter header, placed where your agent already looks for skills. The agent reads it into context once, the same way it reads any other instructions, and from then on it runs mlr commands via whatever shell-executing tool it already has.
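
To make the frontmatter idea concrete, here is a minimal shell sketch that pulls a single-line field such as name out of a skill file's header. The frontmatter_field helper and the sample file are hypothetical, written only for illustration; they are not part of Miller, and the awk logic handles simple key: value lines only, not the folded description block.

```shell
# Print the value of a single-line key from the YAML frontmatter of a skill file.
# Usage: frontmatter_field FILE KEY   (hypothetical helper, not a Miller feature)
frontmatter_field() {
  awk -v key="$2" '
    # A bare "---" line opens the frontmatter; the second one closes it.
    $0 == "---" { fences++; if (fences == 2) exit; next }
    # Inside the frontmatter, match the key at the start of the line.
    fences == 1 && index($0, key ":") == 1 {
      sub(/^[^:]*:[ \t]*/, "")   # drop the key, the colon, and any padding
      print
      exit
    }
  ' "$1"
}

# Hypothetical sample: just enough of a skill file to exercise the function.
sample=$(mktemp)
cat > "$sample" <<'EOF'
---
name: miller
description: >
  Drive Miller (mlr) to process CSV/TSV/JSON/etc. data.
---

# Miller agent playbook
EOF

frontmatter_field "$sample" name
# prints: miller
```

An agent does something richer than this when it loads a skill, but the shape is the same: a small YAML header that identifies the skill, followed by ordinary markdown.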

Here's what the skill file looks like:

mlr skill print | head -n 15

---
name: miller
description: >
  Drive Miller (mlr) to process CSV/TSV/JSON/etc. data. Use when constructing
  mlr command lines: discover capabilities from the catalog rather than
  guessing, learn the data's shape before writing expressions, validate DSL
  before running, and recover from failures via structured errors.
---

# Miller agent playbook

Miller (`mlr`) is a command-line data processor for CSV, TSV, JSON, JSON
Lines, and other tabular/record formats, with SQL-like verbs (`cut`, `sort`,
`join`, `stats1`, ...) and an awk-like DSL (`put`, `filter`).

For more background on the mlr commands the agent runs on your behalf, please see Miller AI internals.

Setup

Write the skill file to Claude Code's personal skills directory (do this before starting your claude session):

mlr skill install ~/.claude/skills/miller

Wrote /Users/kerl/.claude/skills/miller/SKILL.md

For Codex and Gemini:

mlr skill install ~/.agents/skills/miller

With no argument, install writes to .claude/skills/miller/SKILL.md under the current directory instead. This is handy for a project-scoped skill checked into that project's repo rather than one installed for every project on your machine:

mlr skill install

Wrote .claude/skills/miller/SKILL.md

There's no "uninstall" subcommand, since install only ever writes one plain file. Removing it is an ordinary file operation:

rm -rf ~/.claude/skills/miller
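
Before removing anything, it can help to see which of the locations mentioned above actually hold a copy. The following is a small sketch, not a Miller feature: find_miller_skills is a hypothetical helper, and its two optional arguments (a home directory and a project directory) exist only so the check can be pointed at a scratch tree.

```shell
# List the documented install locations that currently contain a SKILL.md.
# Usage: find_miller_skills [HOME_DIR] [PROJECT_DIR]   (hypothetical helper)
find_miller_skills() {
  home_dir=${1:-$HOME}
  project_dir=${2:-.}
  for dir in \
    "$home_dir/.claude/skills/miller" \
    "$home_dir/.agents/skills/miller" \
    "$project_dir/.claude/skills/miller"
  do
    # Only report directories where the skill file is really present.
    if [ -f "$dir/SKILL.md" ]; then
      printf '%s\n' "$dir/SKILL.md"
    fi
  done
}

# With no arguments, check your home directory and the current project.
find_miller_skills
```

Each path printed is one plain file; deleting it (or its miller directory) is all that uninstalling means.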

Then -- just interact with your agent as always! When you say something like "describe the data file example.csv", the agent will already know how to use Miller to help answer that question.

What the Miller skill maps to

You don't have to type skill or anything else special in your agent session; rather, you've empowered the agent to discover things about Miller for itself. But if you're curious what's actually placed in front of it:

mlr skill --help

Usage: mlr skill {print|install} [options]
Puts the Miller Agent Skill (SKILL.md) where a coding agent can find it.
This is the same playbook mlr mcp serves as its "miller-playbook"
prompt/resource, packaged for agents that read Agent Skills from disk.

Subcommands:
  print          Write the skill content to stdout.
  install [DIR]  Write DIR/SKILL.md, creating DIR if needed.
                 Default DIR is .claude/skills/miller

 -h or --help   Show this message.

And here's the file itself -- the whole thing, not an excerpt, since this and nothing else is what the agent has to go on:

mlr skill print

---
name: miller
description: >
  Drive Miller (mlr) to process CSV/TSV/JSON/etc. data. Use when constructing
  mlr command lines: discover capabilities from the catalog rather than
  guessing, learn the data's shape before writing expressions, validate DSL
  before running, and recover from failures via structured errors.
---

# Miller agent playbook

Miller (`mlr`) is a command-line data processor for CSV, TSV, JSON, JSON
Lines, and other tabular/record formats, with SQL-like verbs (`cut`, `sort`,
`join`, `stats1`, ...) and an awk-like DSL (`put`, `filter`).

Work this loop. Each step exists to prevent a specific, common failure.

## 1. Discover — never invent names

Everything valid is in the catalog; anything not in the catalog does not
exist. Hallucinated flag/function names are the top failure mode.

- Route an intent: `which` with e.g. `"join two files on a key"` → ranked
candidates. `confident: true` means a name matched; trust the top hit.
- Browse cheaply: `list_capabilities` with `index: true` → every
verb/function/flag/keyword with one-line summaries.
- Drill in: `list_capabilities` with `kind: "verb", names: ["join"]` → the
full entry. Prefer the structured `options` list (flag, arg, type, enum
`values`) when present; `usage_text` is the prose fallback.
- The whole catalog is cacheable against `(mlr_version,
catalog_schema_version)` — re-fetch only when either changes.

## 2. Constrain — learn the data before touching it

Call `describe_data` on the input first. It returns, per field: name, types
seen with counts, occurrence count, null count, cardinality, min/max, and —
for low-cardinality fields — every distinct value.

- Copy field names exactly from `describe_data`; never guess casing or
spelling.
- For flags like `-g` (group-by) and DSL comparisons, use values from the
`values` array, not values you expect to exist.
- Fields whose `count` is less than other fields' are absent in some records:
guard DSL with `is_present($field)`.

## 3. Validate — check DSL before spending a run

Before any `run` that includes `put` or `filter`, call `validate_dsl` with the
expression. Cost: parse-only, no data read. On `valid: false`, the `error`
document has `kind`, `hint`, and `did_you_mean` — apply the hint, don't
re-guess syntax.

## 4. Run — and read errors structurally

Call `run` with argv as a list, one element per shell word (no shell quoting):

{"args": ["--icsv", "--ojson", "cat", "data.csv"]}

Command-line shape rules that prevent most argv errors:

- Main flags (I/O formats etc.) come **before** the verb: `mlr --icsv sort -f name f.csv`.
- Format shorthands: `--icsv --ojson` (separate in/out), `--csv`/`--c2j` etc. (combined).
- Chain verbs with `then`: `["--icsv", "sort", "-f", "k", "then", "head", "-n", "3", "f.csv"]`.
- If a field value being compared in `filter` might collide with a verb flag,
end verb flags with `--` before filenames.
- Inline data goes in `stdin_text`; files go at the end of `args`.

On failure, `exit_code` is nonzero and `error` (when present) carries `kind`,
`hint`, and `did_you_mean` — `hint` is often a corrected command line; prefer
executing it over reasoning from the message. `stdout_truncated: true` means
the output exceeded the server's cap: narrow the query (e.g. `head`, `cut`)
rather than re-running the same command.

## Notes

- `run` cannot execute external commands (DSL `system`/`exec`, piped
redirects, `--prepipe`) unless the server was started with `--allow-shell`;
such calls fail cleanly. It **can** write files via `tee`, `split`, and DSL
output redirects — treat it as a write-capable tool.
- Long inputs: prefer `describe_data` + targeted verbs over dumping whole
files through `run`.
- One record format in, another out: Miller is format-to-format; there is no
separate conversion step.

For an agent working from the skill alone, that playbook is prose rather than a set of callable tools, but it rests on the Miller features documented in the Miller AI internals page.
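
As one concrete reading of step 4 of the playbook, the sketch below turns an ordinary list of shell words into the one-element-per-word args document shown there. The args_json helper is hypothetical and is not part of Miller; it escapes only backslashes and double quotes, so arguments containing newlines or other control characters are out of scope.

```shell
# Print {"args": [...]} with one JSON string per shell word.
# Usage: args_json WORD...   (hypothetical helper, not a Miller feature)
args_json() {
  printf '{"args": ['
  sep=''
  for word in "$@"; do
    # Escape backslashes first, then double quotes, so each word is a valid JSON string.
    escaped=$(printf '%s' "$word" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '%s"%s"' "$sep" "$escaped"
    sep=', '
  done
  printf ']}\n'
}

# Main flags first, then the verb, then the file.
args_json --icsv --ojson cat data.csv
# prints: {"args": ["--icsv", "--ojson", "cat", "data.csv"]}

# Verbs chained with "then"; the filename still goes last.
args_json --icsv sort -f k then head -n 3 f.csv
# prints: {"args": ["--icsv", "sort", "-f", "k", "then", "head", "-n", "3", "f.csv"]}

# A DSL expression is a single word: the shell quotes disappear, JSON escaping takes over.
args_json --icsv --ojson filter '$color == "red"' example.csv
# prints: {"args": ["--icsv", "--ojson", "filter", "$color == \"red\"", "example.csv"]}
```

The last call is the case the "no shell quoting" rule is about: the single quotes belong to the shell, not to Miller, so they never appear in the list.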

What using the Miller skill looks like in practice

There's no server status to check and no tool list to browse -- the skill is just text the agent already has -- so "in practice" mostly looks like an ordinary conversation. Say you're looking at example.csv for the first time:

You: In example.csv, show me the red rows.

Without the skill, a plausible guess for the DSL is $color == "Red" -- and Miller silently returns nothing for it, since the real values are lowercase. With the skill installed, the agent runs mlr --icsv --ojson describe example.csv on your behalf first, sees the real value set for color (yellow, red, purple), and only then answers:

Agent: Four rows have color = red: rows 2, 3, 4, and 6.

The full worked version of this example, including the exact commands run at each step, is in Miller and AI.

A note on sandboxing

The MCP server enforces a sandbox by construction: subprocesses it spawns run with MLR_NO_SHELL=1 unless you start it with --allow-shell, so an agent-constructed command line can't execute external commands even if the agent wanted it to.

The skill file has no equivalent enforcement. It's advisory text, not a wrapper around subprocess execution -- nothing stops an agent from running mlr put 'end{print system("whatever")}' with your full shell permissions if it decides to. If you want that guarantee with the skill alone, set the MLR_NO_SHELL environment variable yourself (or pass --no-shell explicitly), rather than relying on the playbook text for isolation. If you want the enforced version, register the MCP server instead of, or alongside, the skill.
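
A minimal sketch of the environment-variable route follows. It assumes your agent is launched from the same shell and passes its environment down to the commands it runs, which is typical for command-line agents but worth confirming for yours; the value 1 mirrors what the MCP server sets.

```shell
# Export once in the shell that will launch the agent, so that every process
# started from it (including any mlr the agent runs) inherits the variable.
export MLR_NO_SHELL=1

# Quick check that a child process really sees it.
sh -c 'echo "MLR_NO_SHELL=${MLR_NO_SHELL:-unset}"'
# prints: MLR_NO_SHELL=1
```

Putting the export in your shell startup file makes it persistent. An agent with full shell access can still unset the variable, so this narrows the risk without matching the enforcement the MCP server provides.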