T26 — Arity-based command normalization

The ARITY map has 450+ entries, for example npm run → 3, git config → 3, touch → 1. It normalizes "npm run build --watch" to "npm run build" so that the permission dialog can suggest the pattern "npm run build *" to the user.

Group: E (Permission Model) · File: permission/arity.ts · Lines 1–161 · ID: E.3 / T26 · Status: Stable

Overview Permission

Why it matters. When the agent runs "npm run build --watch --verbose", what should the permission dialog show? If it shows the raw string, the user approves a pattern that is too narrow (effectively "this invocation only") and has to approve again whenever the flags change. If it shows "npm run build", the user can approve the pattern "npm run build *", which covers every variant of the same build script. The ARITY map defines how many tokens make up the human-readable command name for 450+ common tools. An accurate suggestion gives both good UX and the right coverage.

UX insight: Users think in terms of commands (npm run build, git commit), not full shell strings. Arity normalization maps the raw shell string onto the user's mental model, and from there to a fitting pattern suggestion.
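To make the coverage difference concrete, the sketch below compares an exact raw-string pattern with the normalized wildcard pattern. `matchesPattern` is a hypothetical matcher written only for this illustration (it is not the T24 implementation), and it assumes that a trailing " *" also matches the bare command with no arguments.

```typescript
// Hypothetical matcher: "*" matches any run of characters, and a trailing
// " *" is treated as optional so the bare command matches as well.
function matchesPattern(pattern: string, command: string): boolean {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except "*"
    .replace(/\*/g, ".*")
  const source = escaped.endsWith(" .*")
    ? escaped.slice(0, -3) + "( .*)?"
    : escaped
  return new RegExp("^" + source + "$").test(command)
}

// Pattern a user would approve if the dialog showed the raw string.
const rawPattern = "npm run build --watch --verbose"
// Pattern suggested after arity normalization.
const normalizedPattern = "npm run build *"

const laterCommand = "npm run build --watch"
const coveredByRaw = matchesPattern(rawPattern, laterCommand)
const coveredByNormalized = matchesPattern(normalizedPattern, laterCommand)
console.log({ coveredByRaw, coveredByNormalized })
```

Under these assumptions the raw pattern stops matching as soon as one flag is dropped, while the normalized pattern still covers the command without spilling over to other scripts such as "npm run test".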

Detailed code analysis Anatomy

ARITY map — longest-prefix match

permission/arity.ts — ARITY map (excerpt)

{`
// Key = command prefix, value = number of tokens in the "human command name"
const ARITY: Record<string, number> = {
  // Single-token commands (arity = 1: keep only the first token)
  touch: 1,  ls: 1,  cat: 1,  echo: 1,  mkdir: 1,  cp: 1,  mv: 1,  rm: 1,

  // Two-token commands (arity = 2)
  "npm install": 2,  "npm ci": 2,
  "git add":     2,  "git status": 2,  "git push": 2,  "git pull": 2,
  "docker run":  2,  "docker ps": 2,

  // Three-token commands (arity = 3)
  "npm run":      3,   // "npm run build --watch" → "npm run build"
  "git config":   3,   // "git config --global user.name" → "git config user.name"
  "pnpm dlx":     3,

  // Four-token commands (arity = 4)
  "npx create-next-app": 4,

  // ... 450+ total
}

export function normalizeCommand(raw: string): string {
  const tokens = tokenize(raw)  // respect quotes

  // Longest prefix match: try 4 tokens, then 3, 2, 1
  for (let n = 4; n >= 1; n--) {
    const prefix = tokens.slice(0, n).join(" ")
    if (ARITY[prefix] !== undefined) {
      // Take arity-many tokens (excluding flags)
      const nonFlags = tokens.filter(t => !t.startsWith("-"))
      return nonFlags.slice(0, ARITY[prefix]).join(" ")
    }
  }

  return tokens[0] ?? ""  // fallback: first token only
}

// Examples:
// normalizeCommand("npm run build --watch")          → "npm run build"
// normalizeCommand("git config --global user.name")  → "git config user.name"
// normalizeCommand("ls -la /home")                   → "ls"
// normalizeCommand("docker compose up -d")           → "docker compose"
//   (assumes a "docker compose": 2 entry, not shown in this excerpt)
`}
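Why the loop runs from the longest prefix down is easiest to see when two entries overlap. The sketch below uses two hypothetical entries (a generic `git: 2` rule is an assumption made for illustration, not a confirmed entry of the real map) and compares longest-first with shortest-first lookup on the same input.

```typescript
// Hypothetical overlapping entries: a generic "git" rule and a more
// specific "git config" rule.
const ARITY: Record<string, number> = { git: 2, "git config": 3 }

function normalize(words: string[], longestFirst: boolean): string {
  const max = Math.min(4, words.length)
  const lengths: number[] = []
  for (let n = 1; n <= max; n++) lengths.push(n)
  if (longestFirst) lengths.reverse()

  for (const n of lengths) {
    const prefix = words.slice(0, n).join(" ")
    // Own-property check so inherited keys like "constructor" never match.
    if (Object.prototype.hasOwnProperty.call(ARITY, prefix)) {
      return words.slice(0, ARITY[prefix]).join(" ")
    }
  }
  return words[0] ?? ""
}

const words = ["git", "config", "user.name", "Alice"]
const longest = normalize(words, true)   // the specific "git config" rule wins
const shortest = normalize(words, false) // the generic "git" rule wins
console.log({ longest, shortest })
```

With longest-first lookup the specific rule decides the arity, so the suggestion keeps the config key; shortest-first would stop at the generic rule and yield a broader pattern.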

From normalize to pattern suggestion

Permission dialog: normalize → suggest pattern

{`
// When the agent requests permission:
const rawCmd     = "npm run build --watch --verbose"
const normalized = normalizeCommand(rawCmd)  // → "npm run build"

// Suggest an "always" pattern to the user:
const suggested = normalized + " *"          // → "npm run build *"

// UI shows:
// ┌───────────────────────────────────────────────┐
// │ Agent wants to run: npm run build --watch ... │
// │                                               │
// │ [once] [always: "npm run build *"] [reject]   │
// └───────────────────────────────────────────────┘
`}

Interaction with other techniques Interaction

T18 (Tree-sitter bash parsing): T18 extracts command names from the AST, and T26 normalizes them. Pipeline: raw bash → tree-sitter → command list → arity normalize → suggest pattern.
T25 (Session state): The normalized pattern is the input to alwaysAllow when the user approves "always". It is broader than the raw string but still reasonable: "npm run build *" covers every npm run build variant.
T24 (Wildcard evaluate): Patterns stored in alwaysAllow are checked by wildcard evaluate (T24) when the agent later runs a similar command.
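The pipeline described above can be sketched end to end as follows. `splitCommands` is a deliberately naive stand-in for the tree-sitter step of T18 (it splits on shell operators and is not quote-aware), the ARITY entries are a small assumed subset, and `suggestPatterns` is a hypothetical name.

```typescript
const ARITY: Record<string, number> = {
  "npm run": 3, "git add": 2, "git commit": 2, ls: 1,
}

function normalizeCommand(raw: string): string {
  const words = raw.trim().split(/\s+/).filter(t => t !== "" && !t.startsWith("-"))
  for (let n = Math.min(4, words.length); n >= 1; n--) {
    const prefix = words.slice(0, n).join(" ")
    if (Object.prototype.hasOwnProperty.call(ARITY, prefix)) {
      return words.slice(0, ARITY[prefix]).join(" ")
    }
  }
  return words[0] ?? ""
}

// Stand-in for T18: split on &&, ||, ; and |. A real parser is needed to
// handle quotes, subshells, and redirections correctly.
function splitCommands(script: string): string[] {
  return script
    .split(/&&|\|\||;|\|/)
    .map(s => s.trim())
    .filter(s => s.length > 0)
}

// raw bash → command list → arity normalize → suggested patterns (deduplicated)
function suggestPatterns(script: string): string[] {
  const patterns = splitCommands(script).map(c => normalizeCommand(c) + " *")
  return patterns.filter((p, i) => patterns.indexOf(p) === i)
}

const suggestions = suggestPatterns(
  "git add -A && git commit -m wip; npm run build --watch"
)
console.log(suggestions)
```

Each suggested pattern would then be offered as the "always" option and, once approved, stored for later wildcard evaluation.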

Failure modes Failure

1. Incomplete ARITY map

A new tool (bun, a deno subcommand, a new CLI) has no entry, so normalization falls back to the first token and suggests a pattern that is too broad (e.g. "bun *" instead of "bun run build *"). A process is needed for adding entries when gaps are discovered.
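One possible way to surface such gaps, sketched under assumptions: have the normalizer report whether an entry actually matched, withhold the "always" suggestion on the fallback path, and count the unknown tool so it can be added later. The names `normalizeWithMatch`, `arityGaps`, and `suggestOrRecordGap` are hypothetical and not part of the source file.

```typescript
const ARITY: Record<string, number> = { "npm run": 3, "git status": 2, ls: 1 }

interface NormalizeResult {
  normalized: string
  matchedPrefix: string | null // null means the first-token fallback was used
}

function normalizeWithMatch(raw: string): NormalizeResult {
  const words = raw.trim().split(/\s+/).filter(t => t !== "" && !t.startsWith("-"))
  for (let n = Math.min(4, words.length); n >= 1; n--) {
    const prefix = words.slice(0, n).join(" ")
    if (Object.prototype.hasOwnProperty.call(ARITY, prefix)) {
      return { normalized: words.slice(0, ARITY[prefix]).join(" "), matchedPrefix: prefix }
    }
  }
  return { normalized: words[0] ?? "", matchedPrefix: null }
}

// Hypothetical gap log: first token → number of times the fallback was hit.
const arityGaps = new Map<string, number>()

function suggestOrRecordGap(raw: string): string | null {
  const result = normalizeWithMatch(raw)
  if (result.matchedPrefix === null) {
    arityGaps.set(result.normalized, (arityGaps.get(result.normalized) ?? 0) + 1)
    return null // no "always" suggestion, so "bun *" is never offered
  }
  return result.normalized + " *"
}

const unknown = suggestOrRecordGap("bun run build")
const known = suggestOrRecordGap("npm run build --watch")
console.log({ unknown, known, gaps: Array.from(arityGaps.entries()) })
```

Whether to hide the "always" option or to fall back to an exact-match pattern for unknown tools is a design choice; the sketch only shows where that decision could hook in.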

2. Shell tricks bypass normalization

sudo npm install, env X=1 git status, bash -c 'npm run ...' → normalization returns the wrapper ("sudo", "env", or "bash") instead of the real command. Pre-processing is needed to strip sudo/env/bash -c before normalizing.
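A minimal sketch of such a pre-processing step is shown below, under stated assumptions: `stripWrappers` is a hypothetical helper, it only handles the wrapper forms listed above without wrapper options (for example `sudo -u user` or `env -i` are not handled), and its tokenizer is a simplified one written for this example.

```typescript
// Simplified tokenizer: a word may contain quoted parts (X="a b" stays one token).
function tokenize(cmd: string): string[] {
  const words = cmd.match(/(?:[^\s'"]+|'[^']*'|"[^"]*")+/g) ?? []
  return words.map(w => w.replace(/'([^']*)'|"([^"]*)"/g, "$1$2"))
}

const ASSIGNMENT = /^[A-Za-z_][A-Za-z0-9_]*=/

// Peel off sudo, env / leading VAR=value, and bash|sh -c '<script>' wrappers.
// maxSteps bounds the work on adversarially nested input.
function stripWrappers(raw: string, maxSteps = 8): string {
  let tokens = tokenize(raw)
  for (let step = 0; step < maxSteps; step++) {
    const head = tokens[0]
    if (head === undefined) break
    if (head === "sudo") {
      tokens = tokens.slice(1)
    } else if (head === "env" || ASSIGNMENT.test(head)) {
      let i = head === "env" ? 1 : 0
      while (i < tokens.length && ASSIGNMENT.test(tokens[i] ?? "")) i++
      tokens = tokens.slice(i)
    } else if ((head === "bash" || head === "sh") && tokens[1] === "-c" && tokens[2] !== undefined) {
      tokens = tokenize(tokens[2]) // re-tokenize the inner script
    } else {
      break
    }
  }
  // Quoting is not preserved; the result is only meant as normalizer input.
  return tokens.join(" ")
}

const examples = [
  "sudo npm install",
  "env X=1 git status",
  "bash -c 'npm run build --watch'",
].map(cmd => stripWrappers(cmd))
console.log(examples)
```

The stripped string would then be passed to the normalizer. Because this runs in a permission path, anything the helper does not recognize is left untouched rather than guessed at.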

3. Aliases are not handled

If the user has aliases such as g = git or n = npm, ARITY does not know about them, so normalization fails. Shell alias expansion is not available in a security-critical permission context.

Comparison with other harnesses Compare

Harness	Command normalization	Pattern suggestion
opencode	ARITY map with 450+ entries, longest-prefix match	✅ normalized + wildcard suffix
Claude Code	Pattern matching, but not arity-aware	⚠️ raw command
Aider	No normalization; shows the raw command	❌
Cline	Shows the raw command	❌
OpenHarness	No normalization	❌

Implementation recipe Recipe

{`
// Minimal ARITY-based normalization
const ARITY: Record<string, number> = {
  // Add entries as you discover new tools
  ls: 1,  cat: 1,  mkdir: 1,  rm: 1,  cp: 1,  mv: 1,
  "npm install": 2,  "npm run": 3,  "npm ci": 2,
  "git add": 2,  "git commit": 2,  "git push": 2,  "git status": 2,
  "git config": 3,  "git rebase": 2,
  "docker run": 2,  "docker compose": 2,
  "pnpm install": 2,  "pnpm run": 3,
}

function tokenize(cmd: string): string[] {
  // Respect quoted strings
  return cmd.match(/'[^']*'|"[^"]*"|\S+/g)
    ?.map(t => t.replace(/^['"]|['"]$/g, "")) ?? []
}

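export function normalizeCommand(raw: string): string {
  const tokens = tokenize(raw)
  const noFlags = tokens.filter(t => !t.startsWith("-"))

  // Longest prefix first: try up to 4 tokens, then 3, 2, 1
  for (let n = Math.min(4, noFlags.length); n >= 1; n--) {
    const prefix = noFlags.slice(0, n).join(" ")
    // Own-property check: `prefix in ARITY` would also match inherited
    // keys such as "constructor" or "toString"
    if (Object.prototype.hasOwnProperty.call(ARITY, prefix)) {
      return noFlags.slice(0, ARITY[prefix]).join(" ")
    }
  }

  // Fallback: first non-flag token, or the first token if all are flags
  return noFlags[0] ?? tokens[0] ?? ""
}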

export function suggestPattern(normalized: string): string {
  return normalized + " *"
}
`}
