T17 — Fuzzy edit matching 3-tier

SimpleReplacer → LineTrimmedReplacer → BlockAnchorReplacer. Khi model copy indentation sai hoặc nhớ code gần đúng — thay vì fail, opencode thử từng tier loose hơn cho đến khi match hoặc hết option.

Nhóm: C — Tool DesignFile: tool/edit.ts · Lines 182–350ID: C.5 / T17Status: Stable

Tổng quan Tool Design

Tại sao quan trọng. Edit tool fail là nguyên nhân phổ biến nhất gây agent stuck. Model đọc file xong, copy old_string vào args nhưng: (1) thêm/bớt leading whitespace, (2) nhớ sai 1-2 ký tự giữa block dài, (3) copy block tương tự nhưng không phải block đúng. Exact match fail → agent phải đọc lại file, thử lại → tốn context + thời gian. 3-tier matching giải quyết từng class lỗi này theo thứ tự strict → loose, chỉ dùng fuzzy khi strict đã fail.

Adaptive threshold: Khi chỉ có 1 candidate, threshold = 0.0 (chấp nhận mọi match). Khi có nhiều candidate, threshold = 0.3 — tránh chọn sai block khi code có nhiều đoạn giống nhau (vd multiple function với cùng structure).

Phân tích code chi tiết Anatomy

3-tier cascade

applyEdit(content, oldStr, newStr) │ ├─► Tier 1: SimpleReplacer │ exact string match (indexOf) │ → fast, no ambiguity │ → fails if any char different │ ├─► Tier 2: LineTrimmedReplacer │ trim() each line before matching │ → catches indentation errors (2 spaces vs 4 spaces vs tab) │ → fails if content chars different │ └─► Tier 3: BlockAnchorReplacer match first + last line as anchors fuzzy Levenshtein on middle lines threshold: 0.0 (single) / 0.3 (multiple candidates) → catches "model remembers block approximately" → throw NoMatchError if all tiers fail

tool/edit.ts — 3-tier cascade

{`
const REPLACERS = [
  SimpleReplacer,       // Tier 1: exact
  LineTrimmedReplacer,  // Tier 2: trim whitespace per line
  BlockAnchorReplacer,  // Tier 3: anchor + Levenshtein
] as const

export async function applyEdit(
  content: string,
  oldStr: string,
  newStr: string,
) {
  for (const Replacer of REPLACERS) {
    const match = Replacer.findMatch(content, oldStr)
    if (match) {
      return Replacer.replace(content, match, newStr)
    }
  }
  throw new NoMatchError(
    "old_string not found in file. Re-read the file and try again with exact content."
  )
}
`}

BlockAnchorReplacer — Levenshtein trên middle lines

BlockAnchorReplacer — chi tiết implementation

{`
class BlockAnchorReplacer {
  static findMatch(content: string, oldStr: string) {
    const oldLines = oldStr.split("\n")
    // Chỉ áp dụng cho block >= 3 dòng
    if (oldLines.length < 3) return null

    const first = oldLines[0].trim()
    const last  = oldLines[oldLines.length - 1].trim()

    // Tìm tất cả cặp (first, last) trong content
    const candidates = findAllAnchorPairs(content, first, last)

    // Score mỗi candidate bằng Levenshtein ratio
    const scored = candidates.map((c) => ({
      ...c,
      score: levenshteinRatio(oldStr, content.slice(c.start, c.end)),
    }))
    scored.sort((a, b) => b.score - a.score)

    // Adaptive threshold: 0.0 khi 1 candidate, 0.3 khi nhiều
    const threshold = scored.length > 1 ? 0.3 : 0.0

    return scored[0]?.score >= threshold ? scored[0] : null
  }
}

// levenshteinRatio = 1.0 - (editDistance / maxLen)
// 1.0 = perfect match, 0.0 = hoàn toàn khác
`}

Generic implementation

Simplified 3-tier implementation

{`
function replaceFlex(content: string, oldStr: string, newStr: string): string {
  const replacers = [exactMatch, trimmedMatch, anchorMatch]

  for (const r of replacers) {
    const m = r.find(content, oldStr)
    if (m) return content.slice(0, m.start) + newStr + content.slice(m.end)
  }

  throw new Error(
    "No match found after 3-tier search. Re-read the file and retry with exact content."
  )
}

function anchorMatch(content: string, oldStr: string) {
  const lines = oldStr.split("\n")
  if (lines.length < 3) return null

  const first = lines[0].trim()
  const last  = lines[lines.length - 1].trim()

  // Find all positions where first and last line anchor match
  const candidates = findAnchorPairs(content, first, last)

  // Score by Levenshtein, return best above threshold
  return bestMatch(candidates, oldStr, content, candidates.length > 1 ? 0.3 : 0.0)
}
`}

Tương tác với kỹ thuật khác Interaction

T16 (Zod validation): Validation xảy ra trước fuzzy matching — nếu args thiếu old_string hoặc new_string, fail sớm với instruction error trước khi thử match.
T18 (Bash tree-sitter): Edit tool là tool khác bash — khi model cần edit file, nó dùng edit tool (không dùng bash+sed). tree-sitter parsing áp dụng riêng cho bash tool.
T6 (Doom loop): Nếu model liên tục gửi cùng old_string sai và edit cứ fail, doom loop detection trigger — hỏi user can thiệp.
T1 (ReAct loop): Khi edit fail (NoMatchError), loop tiếp tục — model nhận error message, đọc lại file, thử lại với old_string chính xác hơn.

Failure modes Failure

1. False positive match

Threshold 0.3 với multiple candidates có thể chọn sai block nếu code base có nhiều đoạn giống nhau (boilerplate, repetitive patterns). Model edit đúng về syntax nhưng sai file/function.

Phòng tránh: Tool description nên dạy model cung cấp đủ context trong old_string — không chỉ 2-3 dòng mà cả surrounding context để anchor match mạnh hơn.

2. Levenshtein O(n×m) với file lớn

File >5000 dòng với nhiều anchor candidates → nhiều Levenshtein call → latency cao. Cần index hoặc limit số candidates trước khi score.

3. Model học cách 'lười'

Khi biết fuzzy matching tồn tại, model có thể gửi old_string ngày càng imprecise, dựa vào tier 3 để match. Dẫn đến tăng risk edit sai âm thầm.

So sánh với các harness khác Compare

Harness	Edit matching strategy	Tiers
opencode	Exact → LineTrimmed → BlockAnchor+Levenshtein	3
Claude Code	Tương tự (có fuzzy matching)	2–3
Aider	Exact match + unified diff fallback	2
Cline	Exact match only, fail if not found	1
OpenHarness	Exact match + regex fallback	2

Implementation recipe Recipe

{`
// Levenshtein distance (simple implementation)
function levenshtein(a: string, b: string): number {
  const m = a.length, n = b.length
  const dp: number[][] = Array.from({ length: m + 1 }, (_, i) =>
    Array.from({ length: n + 1 }, (_, j) => i === 0 ? j : j === 0 ? i : 0)
  )
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      dp[i][j] = a[i-1] === b[j-1]
        ? dp[i-1][j-1]
        : 1 + Math.min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])
    }
  }
  return dp[m][n]
}

function levenshteinRatio(a: string, b: string): number {
  const dist = levenshtein(a, b)
  return 1 - dist / Math.max(a.length, b.length, 1)
}

// 3-tier applyEdit
export function applyEdit(content: string, old: string, next: string): string {
  // Tier 1: exact
  const idx1 = content.indexOf(old)
  if (idx1 !== -1) return content.slice(0, idx1) + next + content.slice(idx1 + old.length)

  // Tier 2: line-trimmed
  const trimmedContent = content.split("\n").map(l => l.trim()).join("\n")
  const trimmedOld     = old.split("\n").map(l => l.trim()).join("\n")
  const idx2 = trimmedContent.indexOf(trimmedOld)
  if (idx2 !== -1) {
    // Map back to original content positions
    const start = mapTrimmedIndex(content, idx2)
    const end   = mapTrimmedIndex(content, idx2 + trimmedOld.length)
    return content.slice(0, start) + next + content.slice(end)
  }

  // Tier 3: block anchor + Levenshtein
  const oldLines = old.split("\n")
  if (oldLines.length >= 3) {
    const first = oldLines[0].trim()
    const last  = oldLines[oldLines.length - 1].trim()
    const candidates = findAnchorPairs(content, first, last)
    const scored = candidates
      .map(c => ({ ...c, score: levenshteinRatio(old, content.slice(c.start, c.end)) }))
      .sort((a, b) => b.score - a.score)
    const threshold = scored.length > 1 ? 0.3 : 0.0
    const best = scored[0]
    if (best && best.score >= threshold) {
      return content.slice(0, best.start) + next + content.slice(best.end)
    }
  }

  throw new Error("old_string not found. Re-read the file and retry.")
}
`}

Tham khảo Refs

Tham khảo