Here's an example of two strings to run through a diffing processes
let oldText = "start case old Text"
let newText = "start Case next Text"
These get tokenized by each word (separated by a space) and passed into difflib which produces the following result:
let diffs = [
" start",
"- case",
"+ Case",
"- old",
"+ new",
" text"
]
Where the first character represents deleted (-
), added (+
), or no change (
), then there's a space, and then the actual text
However, capitalization changes should not be represented as a delete + add operation, so if consecutive items in the arrray have the same text during a case-insensitive comparison, they should be treated as the same word, with the winner coresponding to the added text from the new string.
We can't just covert everything toLowerCase
before comparing, because the final output should preserve the casing of the input strings.
So the output, once transformed, should look like this:
let output = [
" start",
" Case",
"- old",
"+ new",
" text"
]
- Identify consecutive items that have matching case insensitive text where one is added and the other is removed (regular duplicate words are allowed)
- Combine those items, the winner should be the one that has been added (starts with
+
, but updated to indicate no change (starts with
As a possible starting point, you can get strong typing by mapping each item in the array to the following:
let deltas = diffs.map(d=> {
let delta = d.slice(0,1)
let text = d.slice(2)
return {
text,
added: delta === "+",
deleted: delta === "-",
both: delta === " ",
}
})
Here's a working solution, but feels hacky
VermontDepartmentOfHealth/covid-bot@d4fa167