technical10 min read

The Myers Diff Algorithm: How Text Comparison Actually Works

A plain-language explanation of the algorithm behind every diff tool

If you have ever run `git diff`, used a code review tool, or compared two documents in any modern software, you have used the Myers diff algorithm β€” almost certainly without knowing it. Published by Eugene W. Myers in 1986 in the journal Algorithmica, the paper "An O(ND) Difference Algorithm and Its Variations" introduced the algorithm that became the standard for text comparison everywhere from Unix diff to GitHub to LineDiff.

Understanding how the Myers algorithm works will not only satisfy intellectual curiosity β€” it will help you understand why diff tools produce the output they do, and what trade-offs are made in the process.

**What Is a Diff?**

At its most basic, a diff is the answer to this question: given two sequences A and B, what is the minimum set of changes needed to transform A into B? A "change" can be either an insertion (add something that is in B but not in A) or a deletion (remove something that is in A but not in B). Unchanged elements β€” lines that appear in both A and B in the same relative order β€” are called the common subsequence.

The goal of any diff algorithm is to find the Longest Common Subsequence (LCS) of A and B, because once you know the LCS, everything else in A that is not in the LCS must have been deleted, and everything in B that is not in the LCS must have been inserted.

**The Edit Graph**

Myers represents the diff problem visually as a grid, often called the edit graph. Imagine a grid where the x-axis represents positions in sequence A (the original document) and the y-axis represents positions in sequence B (the revised document). You start at the top-left corner (0, 0) and want to reach the bottom-right corner (|A|, |B|).

Moving right one step means deleting one element from A. Moving down one step means inserting one element from B. Moving diagonally β€” right and down simultaneously β€” means keeping one element that is the same in both A and B. This diagonal move is free, in the sense that it does not count as an edit.

Finding the minimum edit distance between A and B is therefore equivalent to finding the shortest path from the top-left to the bottom-right of this grid, where diagonal moves are free and horizontal or vertical moves each cost one edit.

**The Shortest Edit Script**

The sequence of moves along this shortest path is called the Shortest Edit Script (SES). The Myers algorithm finds the SES efficiently without exploring every possible path through the grid β€” which would be computationally intractable for any real-world document.

See it in action β€” try a comparison with sample data instantly.

Try It Now arrow_forward

The key insight in Myers' paper is to search the edit graph in a greedy, diagonal-first manner. Rather than exploring breadth-first from the starting corner, the algorithm traces the furthest-reaching paths for each edit distance d, starting from d=0 and increasing. For each d, it finds the furthest point reachable on each diagonal k (where k = x - y) using at most d edits. Diagonal paths are extended as far as possible for free before counting the next edit.

**O(ND) Complexity**

The algorithm runs in O(ND) time, where N is the sum of the lengths of A and B, and D is the number of edits in the SES. This is near-optimal: when D is small (when the two documents are mostly the same), the algorithm is extremely fast. When D is large (when the two documents are very different), it takes longer β€” but in this case, there is fundamentally more work to do, and no algorithm can avoid it.

In practice, for typical document comparisons β€” two versions of a policy document, two drafts of a contract, two configurations files β€” D is small relative to N, and the Myers algorithm is extremely fast. This is why even large documents (100+ pages) produce results in milliseconds in LineDiff.

**Semantic Cleanup**

The raw output of the Myers algorithm is syntactically optimal β€” it produces the shortest edit script β€” but it is not always semantically optimal for human readers. Consider a diff that shows half a sentence deleted and a new half-sentence inserted, when a human reader would naturally describe this as "the sentence was rewritten." Myers' algorithm does not know what a sentence is; it only knows about the sequence of characters or lines it is given.

Most diff tools apply a semantic cleanup pass after running the core algorithm. This pass reorganizes the raw insertions and deletions to align with natural language boundaries β€” word boundaries, sentence boundaries, paragraph boundaries β€” producing a diff that is easier for humans to interpret without changing the underlying set of edits.

**How LineDiff Implements Myers in Web Workers**

LineDiff runs its diff computation in a dedicated Web Worker thread. This is critical for user experience: the Myers algorithm, even though it is fast, can take tens of milliseconds for large documents. Running it on the main browser thread would block the UI β€” the page would freeze while the diff computed. By offloading the computation to a worker, LineDiff keeps the interface responsive and can even show progressive results as the diff completes.

The worker receives the two documents as strings, tokenizes them into lines (for line-level diffs) or words (for inline diffs), runs the Myers algorithm, applies semantic cleanup, and returns the annotated diff result to the main thread. The entire pipeline typically completes in under 100 milliseconds for documents up to 10,000 lines.

**The AI Layer on Top**

Myers gives you syntactic truth: exactly what characters changed. It does not give you semantic meaning: what those changes mean in context. This is where LineDiff's AI analysis layer adds value.

After the Myers diff is computed, LineDiff optionally sends the change set to an LLM (GPT-4o by default, or your own OpenAI key via BYOK). The AI receives the structured diff β€” not the raw documents β€” and produces a domain-aware analysis: which changes are substantively important, which are cosmetic, and what the overall effect of the revision is.

For a legal document, the AI distinguishes between a comma inserted for grammatical clarity and a clause that materially changes a party's obligations. For a financial report, it flags changes in numerical values versus changes in narrative commentary. For a clinical protocol, it highlights dosing and eligibility changes versus administrative housekeeping.

The Myers algorithm tells you what changed. The AI tells you why it matters. Together, they make LineDiff more than a diff tool β€” they make it a document intelligence platform built on a 40-year-old algorithm that was, and remains, the best solution to one of computer science's most fundamental problems.

Try Free

Every diff tool uses some variant of the Myers algorithm. Learn how it works, why it produces optimal results, and how LineDiff extends it with AI-powered semantic analysis.