Learning Goals
- βI can explain what an edit is (insert, delete, substitute) and what distance means between two words.β
- βI can build and read a cost grid that finds a shortest edit path.β
- βI can justify why there might be more than one optimal path.β
Success Criteria
- Compute distances for small pairs (3β6 letters) without code.
- Show at least one optimal path; identify ties and explain them.
- Check work with the interactive demo and reconcile differences.
TIP
Weβll engage with in-class discussion now to further explore this topic.
Summary
If you were absent from class, or want to recap what we explored, keep reading.
Levenshtein distance
The distance between two words is the smallest number of single-letter edits to change the source into the target:
- Insert (β, vertical)
- Delete (β, horizontal)
- Substitute (οΌΌ, diagonal).
- Matches cost 0; each edit costs 1.
This is named the Levenshtein distance, for βthe Soviet mathematician Vladimir Levenshtein, who defined the metric in 1965.β Source
The grid method
Write the source across the top (columns) and the target down the left (rows).
- Top row (target length = 0): represents turning a source prefix into the empty string β deletions (0..n).
- Left column (source length = 0): represents building the target from empty β insertions (0..m).
- Cell rule:
min(β + 1, β + 1, β + (0 if same letter else 1)). - Distance: the number in the bottom-right cell.
- Draw arrows to the neighbour(s) that achieve the minimum. Ties mean multiple equally short edit scripts exist.
Examples
The method for building out a grid of Levenshtein distances is best understood by exploring examples using the interactive demo.
Please take a look at the following examples β be sure to try the Animate Table Fill button for a visual explanation of how to fill in a table with edit distance values:
Exercises
Try completing a Levenshtein distance table by hand, then use the interactive demo to check your work:
BICYCLE β RECYCLEFRANCE β DANCECREATIVE β REACTIVE
Key takeaway
- The distance (bottom-right value) is unique for a given pair, but there can be multiple optimal edit scripts (edit paths through the grid).