Last active
April 28, 2021 03:00
BBEdit & TextWrangler text clean-up script for the Morning Star newspaper
-- By Rob Wells for the Morning Star

on open theStories
    repeat with aStory in theStories
        tell application "TextWrangler"
            open aStory
            tell the front text document
                set encoding to "Unicode (UTF-8)"
                educate quotes with replacing target

                -- Remove hard-wraps (not perfect)
                my grepRep("([-,—:;[:alnum:]]) *\\r *([-,—:;[:alnum:]])", "\\1 \\2")
                -- Important thing to note is that it only finds non-terminal punctuation.
                -- It hits properly formatted bylines, but the byline replace fixes it later.

                my grepRep(" {2,}", " ") -- Multiple spaces to single space
                my grepRep("^ ", "") -- Remove spaces at the start of lines
                my grepRep("\\t+", "") -- Remove tabs

                -- Superscript numbers to quote marks
                my litRep("¹", "’") -- Apostrophe
                my litRep("²", "”") -- Right double
                my litRep("³", "“") -- Left double

                my grepRep(" [–-] ", " — ") -- En-dashes & hyphens to em-dashes
                my grepRep("^• *|^n ", "n") -- Blob-pars (nHeady heady)
                my grepRep("([.0-9]*) *(%|percent)(?!age)", "\\1 per cent") -- "per cent"
                my litRep("...", "…") -- Ellipses

                -- Break byline after name and ensure lower-case 'b'
                -- Check for (rare) 3-word bylines, else handle 2-word bylines
                my grepRep("^(by Our (?:Foreign|News|Sports) Desk|by Morning Star Reporter)(?:[ ,]*)(.*)\\r+", "\\l\\1\\r\\2\\r")
                if the result is 0 then -- 3-word byline not found
                    -- Break two-word byline
                    my grepRep("^(by [-[:alpha:]]+ [-[:alpha:]]+)(?:[ ,]*)(.*)\\r+", "\\l\\1\\r\\2\\r")
                end if

                my grepRep("\\r+\\z", "") -- Delete empty lines at end
                my grepRep("”([[:punct:]])", "\\1”") -- Transpose rdquo with punctuation.
            end tell
        end tell
    end repeat
end open

-- Convenience wrapper functions

on litRep(searchString, replaceString) -- Literal search and replace
    tell application "TextWrangler"
        tell text 1 of text window 1
            replace searchString using replaceString options {search mode:literal, starting at top:true}
        end tell
    end tell
end litRep

on grepRep(searchString, replaceString) -- Grep search and replace
    tell application "TextWrangler"
        tell text 1 of text window 1
            replace searchString using replaceString options {search mode:grep, starting at top:true}
        end tell
    end tell
end grepRep
This script is designed to be saved as a droplet: copy editors drag text files onto it to clean them up before working on them.
Since it consists almost entirely of literal and regex replacements, it is easy to update when you spot new problems cropping up in copy.
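As a rough sketch of what such an update might look like, the two lines below add new clean-up rules. Both rules are hypothetical examples and not part of the original script. They are a fragment rather than a standalone script: they assume they are pasted inside the `tell the front text document` block alongside the existing calls, and they rely on the `grepRep` and `litRep` handlers the script already defines.

```applescript
-- Hypothetical additions (assumptions for illustration, not in the original script).
-- Paste inside the "tell the front text document" block, next to the existing rules.
my grepRep(" +$", "") -- Remove spaces at the end of lines
my litRep(" ,", ",") -- Remove a stray space before a comma
```

Order can matter when adding rules, since each replacement runs over the output of the ones before it. Placing new rules after the hard-wrap and multiple-space rules means they see text that has already been normalised.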
If you don’t want to get your hands dirty with AppleScript, you can build the same clean-up as a BBEdit text factory.