Created
November 28, 2023 08:36
A method to generate ignored indexes and do partial replacement in a regex
// Given an input
const input = "header 123 {#custom-id}";
// Provide a transformation of said input that keeps the same length
// e.g. "capitalizing" a title in a markdown file
const transformedInput = input.toUpperCase();
// However, we don't want to transform anything these regexes match
// e.g. a custom ID
// Each entry is regex source, so a literal `{` or `}` must be escaped:
// with the `u` flag an unescaped brace is a syntax error
const ignored = ["\\{#custom-id\\}"];
// From this, our output should be:
// "HEADER 123 {#custom-id}"
// Below is the implementation
// Collect the index of every character covered by an ignored match
const ignoredIndexes = new Set();
for (const regex of ignored) {
  const matches = input.match(new RegExp(regex, "gmu"));
  if (!matches) continue;
  let lastIndex = 0;
  for (const match of matches) {
    // Locate each match, searching from the end of the previous one
    const matchedIndex = input.indexOf(match, lastIndex);
    lastIndex = matchedIndex + match.length;
    for (let indexOfMatchLength = 0; indexOfMatchLength < match.length; indexOfMatchLength++) {
      ignoredIndexes.add(matchedIndex + indexOfMatchLength);
    }
  }
}
// Rebuild the string: original characters at ignored indexes, transformed ones elsewhere
let output = "";
for (let inputIndex = 0; inputIndex < input.length; inputIndex++) {
  if (ignoredIndexes.has(inputIndex)) {
    output += input[inputIndex];
  } else {
    output += transformedInput[inputIndex];
  }
}
// Finally, output the result:
console.log(output);
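For reuse, the same idea can be wrapped in a function. The following is only a sketch, not the gist's code: the name `transformExceptIgnored` and its signature are hypothetical. It reads each match's position from `matchAll` instead of re-locating the matched text with `indexOf`, which matters when the same text also appears earlier in the string than an anchored match.

```javascript
// Hypothetical helper: name and signature are illustrative, not from the gist.
function transformExceptIgnored(input, transform, ignoredPatterns) {
  const transformed = transform(input);
  // The index-by-index merge below only works if the length is preserved.
  if (transformed.length !== input.length) {
    throw new Error("transform must preserve the input length");
  }

  const ignoredIndexes = new Set();
  for (const source of ignoredPatterns) {
    // matchAll requires the `g` flag and reports each match's own index.
    for (const match of input.matchAll(new RegExp(source, "gmu"))) {
      for (let offset = 0; offset < match[0].length; offset++) {
        ignoredIndexes.add(match.index + offset);
      }
    }
  }

  let output = "";
  for (let i = 0; i < input.length; i++) {
    output += ignoredIndexes.has(i) ? input[i] : transformed[i];
  }
  return output;
}

const result = transformExceptIgnored(
  "header 123 {#custom-id}",
  (text) => text.toUpperCase(),
  ["\\{#custom-id\\}"]
);
console.log(result); // HEADER 123 {#custom-id}
```

With an end-anchored pattern such as `\{#id\}$`, the input `{#id} title {#id}` keeps only the trailing ID untouched, because the position comes from the match itself.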
This is awesome @crutchcorn, why Regex though? It kinda looks like there could have been another way or maybe it's just cos I'm scared of Regex.
@tobySolutions good question! There are a few reasons:
- The package in question already uses regexes extensively for this style of customization
- Regexes can be easily serialized into JSON files, which is where many Remark config files live
- While the demo uses a hardcoded custom ID, I need to be able to catch all custom IDs in my headers via this regex:
/\{\s*#.*?\}\s*$/
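To illustrate the JSON point, here is a rough sketch of that regex stored as a string in a config file and rebuilt at runtime. The config shape and the `ignoredPatterns` option name are assumptions made for the example, not the package's actual options.

```javascript
// Hypothetical config: the `ignoredPatterns` key is an assumption for this sketch.
// In JSON text every regex backslash is doubled.
const configJson = String.raw`{ "ignoredPatterns": ["\\{\\s*#.*?\\}\\s*$"] }`;
const config = JSON.parse(configJson);

// After parsing, the string is plain regex source again.
const [source] = config.ignoredPatterns;
const customId = new RegExp(source, "u");

const headers = [
  "Getting started {#getting-started}",
  "Install { #install }  ",
  "No custom id here",
];

// Collect the matched custom ID for each header, or null when there is none.
const found = headers.map((header) => {
  const match = header.match(customId);
  return match ? match[0] : null;
});

console.log(found);
// [ '{#getting-started}', '{ #install }  ', null ]
```

The pattern tolerates whitespace after the opening brace and after the closing one, and it only matches an ID at the end of the header.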
Regexes aren't all that scary once you get used to them :) I even wrote a guide to them here:
https://unicorn-utterances.com/posts/the-complete-guide-to-regular-expressions-regex
Also, sorry for the disturbance, ChatGPT also pointed out that there might be an issue with how the ignored characters were specified in the code:
https://chat.openai.com/share/fe811ff8-199d-48b1-b064-45144ca4151c
@tobySolutions That's intentional behavior 😊
We don't want to escape the regexes; we want to capture them as written
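A small sketch of why escaping would get in the way: once an entry is escaped, it only matches its own literal text and stops working as a pattern. The `escapeRegExp` helper below is a commonly used escaping snippet included purely for contrast; it is an assumption for this example, and the gist deliberately does not use one.

```javascript
// Common escaping helper, shown for contrast only (not part of the gist).
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// The custom ID pattern from the thread, written as regex source.
const pattern = String.raw`\{\s*#.*?\}\s*$`;
const header = "Title {#any-id}";

// As written, the entry behaves as a pattern and finds any trailing custom ID.
const asWritten = new RegExp(pattern, "u").test(header);

// Escaped, the entry matches only the literal characters of the pattern itself.
const escaped = new RegExp(escapeRegExp(pattern), "u").test(header);
const escapedMatchesItself = new RegExp(escapeRegExp(pattern), "u").test(pattern);

console.log(asWritten); // true
console.log(escaped); // false
console.log(escapedMatchesItself); // true
```

The trade-off is that literal entries have to be written as valid regex source, so characters such as `{` and `}` need a backslash.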
Thank you very much!! @crutchcorn
I plan on contributing this to:
https://github.com/Xunnamius/unified-utils/tree/main/packages/remark-capitalize-headings
As a feature that suits my needs.