@KyleMit
Last active April 24, 2020 20:25
Remove Capitalization Changes

Here's an example of two strings to run through a diffing process:

let oldText = "start case old text"
let newText = "start Case new text"

These get tokenized by each word (separated by a space) and passed into difflib which produces the following result:

let diffs =  [
  "  start",
  "- case",
  "+ Case",
  "- old",
  "+ new",
  "  text"
]

The first character of each item marks the word as deleted (-), added (+), or unchanged ( ); it is followed by a space and then the word itself.

However, a capitalization change should not be represented as a delete + add pair. If consecutive items in the array have the same text under a case-insensitive comparison, they should be treated as the same word, keeping the casing of the added text from the new string.

We can't just convert everything with toLowerCase before comparing, because the final output should preserve the casing of the input strings.

So the output, once transformed, should look like this:

let output = [
  "  start",
  "  Case",
  "- old",
  "+ new",
  "  text"
]

Rules:

  1. Identify consecutive items whose text matches case-insensitively, where one is added and the other is removed (regular duplicate words are allowed)
  2. Combine those items: the winner is the added one (starts with +), updated to indicate no change (starts with a space), and the removed word is dropped from the array
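
One way to read these two rules is as a single pass over adjacent pairs. The sketch below is only an illustration of that reading, not the linked solution: the names `mergeCaseChanges` and `isCaseOnlyPair` are hypothetical, and it assumes the removed and added items always sit directly next to each other in the array.

```javascript
// Hypothetical helper: true when two diff lines are a remove/add pair
// whose text differs only by capitalization.
function isCaseOnlyPair(a, b) {
  const markers = a[0] + b[0]
  const oppositeOps = markers === "-+" || markers === "+-"
  return oppositeOps && a.slice(2).toLowerCase() === b.slice(2).toLowerCase()
}

// Hypothetical helper: collapses case-only pairs into one unchanged item.
function mergeCaseChanges(diffs) {
  const output = []
  for (let i = 0; i < diffs.length; i++) {
    const current = diffs[i]
    const next = diffs[i + 1]
    if (next !== undefined && isCaseOnlyPair(current, next)) {
      // Rule 2: keep the added word's casing, but mark it as unchanged
      const added = current[0] === "+" ? current : next
      output.push("  " + added.slice(2))
      i++ // skip the other half of the pair
    } else {
      output.push(current)
    }
  }
  return output
}

const sample = ["  start", "- case", "+ Case", "- old", "+ new", "  text"]
console.log(mergeCaseChanges(sample))
```

If the differ emits several removals before the matching additions, adjacent-pair matching like this would miss them, so that case would need a different grouping step.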

Sample Code

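As a possible starting point, you can get a more structured representation by mapping each item in the array to an object like the following:

let deltas = diffs.map(d => {
    let delta = d.slice(0, 1) // marker: "+", "-", or " "
    let text = d.slice(2)     // the word, after the marker and the space

    return {
        text,
        added: delta === "+",
        deleted: delta === "-",
        both: delta === " ", // unchanged: present in both strings
    }
})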
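
With that object shape, the pair check and the conversion back to a diff line can be written against named flags instead of string offsets. This is a minimal sketch under that assumption; `isCaseOnlyChange` and `toDiffLine` are hypothetical names, and the two sample objects are written out by hand so the block runs on its own.

```javascript
// Hypothetical helper: one item added, the other deleted, same text ignoring case
function isCaseOnlyChange(a, b) {
  const oppositeOps = (a.added && b.deleted) || (a.deleted && b.added)
  return oppositeOps && a.text.toLowerCase() === b.text.toLowerCase()
}

// Hypothetical helper: turns a delta object back into "<marker> <text>"
function toDiffLine(delta) {
  const marker = delta.added ? "+" : delta.deleted ? "-" : " "
  return `${marker} ${delta.text}`
}

// Hand-written samples in the same shape the map above produces
const removed = { text: "case", added: false, deleted: true, both: false }
const added = { text: "Case", added: true, deleted: false, both: false }

if (isCaseOnlyChange(removed, added)) {
  // The added item wins, re-flagged as unchanged
  const merged = { ...added, added: false, both: true }
  console.log(toDiffLine(merged))
}
```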
KyleMit commented Apr 24, 2020

Here's a working solution, but it feels hacky:
VermontDepartmentOfHealth/covid-bot@d4fa167
