Make prose diff algorithm more iterative, to improve prose diffs for (among…
f36c31c991ca (Unpublished)

Description

Make prose diff algorithm more iterative, to improve prose diffs for (among other things) removed commas

Summary:
Ref T7643. This is a little hard to explain, but previously we would do this:

  • Diff paragraphs.
  • For each different paragraph, diff sentences.
  • For each different sentence, diff characters.

Now, we do this:

  • Diff paragraphs.
  • Collect all the identical, purely added, and purely removed paragraphs and set them aside. We know we should have good diffs for these already.
  • What's left over is sequences of removed/added/changed paragraphs, which we may not have great diffs for yet. Smush these together into big diff blocks.
  • Now, for these blocks, diff sentences.
  • Repeat all of that to diff characters.
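
The steps above can be sketched roughly as follows. This is a minimal Python illustration, not the libphutil implementation: the names `LEVELS` and `prose_diff` are hypothetical, the paragraph and sentence splitters are simplistic regexes, and `difflib.SequenceMatcher` stands in for the real diff engine. Its `replace` opcode already groups each run of adjacent removed/added/changed pieces, which plays the role of the "smush together" step here.

```python
import re
from difflib import SequenceMatcher

# Hypothetical splitters, coarsest first. Each returns pieces that
# concatenate back to the input, so no text is lost between levels.
LEVELS = [
    lambda s: re.findall(r".+?(?:\n\n+|\Z)", s, flags=re.S),      # paragraphs
    lambda s: re.findall(r".+?(?:[.!?]+\s+|\Z)", s, flags=re.S),  # sentences
    list,                                                         # characters
]


def prose_diff(old, new, level=0):
    """Return a list of (op, text) parts, where op is '=', '-' or '+'."""
    a, b = LEVELS[level](old), LEVELS[level](new)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_text, new_text = "".join(a[i1:i2]), "".join(b[j1:j2])
        if tag == "equal":
            # Identical pieces: set aside, the diff is already good.
            parts.append(("=", old_text))
        elif tag == "delete":
            # Purely removed pieces: set aside.
            parts.append(("-", old_text))
        elif tag == "insert":
            # Purely added pieces: set aside.
            parts.append(("+", new_text))
        elif level + 1 < len(LEVELS):
            # A run of changed pieces, treated as one big block and
            # re-diffed at the next finer level.
            parts.extend(prose_diff(old_text, new_text, level + 1))
        else:
            # Finest level reached: emit the change as-is.
            parts.append(("-", old_text))
            parts.append(("+", new_text))
    return parts


if __name__ == "__main__":
    for op, text in prose_diff("Hello, world. Bye.", "Hello world. Bye."):
        print(op, repr(text))
```

In this sketch, a removed comma is reported as a single removed character instead of a whole changed sentence, because the changed block keeps being refined until the character level.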

This seems to pass all the existing unit tests, and it passes new unit tests which I was previously unable to make pass by fiddling with things without changing the algorithm.

Test Plan: Passed existing unit tests. Passed new unit tests.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T7643

Differential Revision: https://secure.phabricator.com/D16839

Details

Provenance
epriestley <git@epriestley.com> Authored on Nov 10 2016, 9:21 PM
Parents
rPHUTILc6634479d0f1: Add a `phutil_person()` wrapper for the string extractor
Branches
Unknown
Tags
Unknown
