@olets
Forked from robjwells/CopyCleaner.applescript
Created March 26, 2017 03:03
BBEdit & TextWrangler text clean-up script for the Morning Star newspaper
-- By Rob Wells for the Morning Star
on open theStories
    repeat with aStory in theStories
        tell application "TextWrangler"
            open aStory
            tell the front text document
                set encoding to "Unicode (UTF-8)"
                educate quotes with replacing target
                -- Remove hard wraps (imperfect). Lines are joined only where the break
                -- falls between letters, digits or non-terminal punctuation.
                -- This also joins a correctly formatted byline to the line after it;
                -- the byline replacement further down splits them again.
                my grepRep("([-,—:;[:alnum:]]) *\\r *([-,—:;[:alnum:]])", "\\1 \\2")
                my grepRep(" {2,}", " ") -- Multiple spaces to single space
                my grepRep("^ ", "") -- Remove spaces at the start of lines
                my grepRep("\\t+", "") -- Remove tabs
                -- Superscript numbers to quote marks
                my litRep("¹", "’") -- Apostrophe
                my litRep("²", "”") -- Right double
                my litRep("³", "“") -- Left double
                my grepRep(" [–-] ", " — ") -- Spaced en-dashes & hyphens to em-dashes
                my grepRep("^• *|^n ", "n") -- Blob-pars: leading bullet or "n " becomes "n" (nHeady heady)
                my grepRep("([.0-9]*) *(%|percent)(?!age)", "\\1 per cent") -- "%" and "percent" to "per cent" (but not "percentage")
                my litRep("...", "…") -- Three full stops to an ellipsis character
                -- Break byline after name and ensure lower-case 'b'
                -- Check for (rare) 3-word bylines first, else handle 2-word bylines
                my grepRep("^(by Our (?:Foreign|News|Sports) Desk|by Morning Star Reporter)(?:[ ,]*)(.*)\\r+", "\\l\\1\\r\\2\\r")
                -- grepRep returns the number of replacements made, so 0 means no match
                if the result is 0 then -- 3-word byline not found
                    -- Break two-word byline
                    my grepRep("^(by [-[:alpha:]]+ [-[:alpha:]]+)(?:[ ,]*)(.*)\\r+", "\\l\\1\\r\\2\\r")
                end if
                my grepRep("\\r+\\z", "") -- Delete empty lines at end
                my grepRep("”([[:punct:]])", "\\1”") -- Move punctuation that follows a right double quote inside it
            end tell
        end tell
    end repeat
end open
-- Convenience wrapper handlers
-- Each returns the result of the replace command (the number of replacements made),
-- which the byline check in the open handler relies on.
on litRep(searchString, replaceString) -- Literal search and replace
    tell application "TextWrangler"
        tell text 1 of text window 1
            replace searchString using replaceString options {search mode:literal, starting at top:true}
        end tell
    end tell
end litRep
on grepRep(searchString, replaceString) -- Grep search and replace
    tell application "TextWrangler"
        tell text 1 of text window 1
            replace searchString using replaceString options {search mode:grep, starting at top:true}
        end tell
    end tell
end grepRep