Learning Goals

  • β€œI can explain what an edit is (insert, delete, substitute) and what distance means between two words.”
  • β€œI can build and read a cost grid that finds a shortest edit path.”
  • β€œI can justify why there might be more than one optimal path.”

Success Criteria

  • Compute distances for small pairs (3–6 letters) without code.
  • Show at least one optimal path; identify ties and explain them.
  • Check work with the interactive demo and reconcile differences.

TIP

We’ll engage with in-class discussion now to further explore this topic.

Summary

If you were absent from class, or want to recap what we explored, keep reading.

Levenshtein distance

The distance between two words is the smallest number of single-letter edits to change the source into the target:

  • Insert (↕, vertical)
  • Delete (↔, horizontal)
  • Substitute (οΌΌ, diagonal).
    • Matches cost 0; each edit costs 1.

This is named the Levenshtein distance, for β€œthe Soviet mathematician Vladimir Levenshtein, who defined the metric in 1965.” Source

The grid method

Write the source across the top (columns) and the target down the left (rows).

  • Top row (target length = 0): represents turning a source prefix into the empty string β†’ deletions (0..n).
  • Left column (source length = 0): represents building the target from empty β†’ insertions (0..m).
  • Cell rule: min(↑ + 1, ← + 1, β†– + (0 if same letter else 1)).
  • Distance: the number in the bottom-right cell.
  • Draw arrows to the neighbour(s) that achieve the minimum. Ties mean multiple equally short edit scripts exist.

Examples

The method for building out a grid of Levenshtein distances is best understood by exploring examples using the interactive demo.

Please take a look at the following examples – be sure to try the Animate Table Fill button for a visual explanation of how to fill in a table with edit distance values:

  1. SWEAT β†’ SWEET
  2. EAT β†’ PEAT
  3. INDOLENT β†’ REDOLENT
  4. BASKETBALL β†’ BASEBALL

Exercises

Try completing a Levenshtein distance table by hand, then use the interactive demo to check your work:

  1. BICYCLE β†’ RECYCLE
  2. FRANCE β†’ DANCE
  3. CREATIVE β†’ REACTIVE

Key takeaway

  • The distance (bottom-right value) is unique for a given pair, but there can be multiple optimal edit scripts (edit paths through the grid).