@dpinney
Created July 6, 2021 15:43
Convert all RTF and RTFD files to Word DOCX via AppleScript
# We do this with Pages because it's the only thing that can correctly convert rtfd.
# textutil works on rtf files with no text in them but not on rtfd.
set my_paths to {"/path/to/first/file.rtfd", "/path/to/second/file.rtfd"}
repeat with my_path in my_paths
	tell application "Pages"
		set my_file to (my_path as POSIX file)
		set my_name to name of (info for my_file)
		set doc to open my_file
		export doc as Microsoft Word to alias (my_path & ".docx" as POSIX file)
		close doc
		tell application "Finder" to delete my_file
	end tell
end repeat
@AJ-Duncan-Poole

AJ-Duncan-Poole commented Sep 21, 2022

Hello David,

Thank you for this insight. I have searched high and low for a way to batch convert .doc files to .rtf from the terminal while keeping the images and tables in the document. I need this so that a macOS SwiftUI app can access an .rtf file and automatically parse particular data from it.

I came across your script, which I hope can be reversed, i.e. used to go from .doc to .rtf via Pages. I used formatted text as the export format because I came across someone with the following statement: "constant defined in the dictionary for RTF documents is formatted text". Here is my code:

```
set my_paths to {"Users/.../Documents/Test_3.doc", "/Users/.../Documents/Test_4.doc"}

repeat with my_path in my_paths
	tell application "Pages"
		set my_file to (my_path as POSIX file)
		set my_name to name of (info for my_file)
		set doc to open my_file
		export doc as formatted text to alias (my_path & ".rtf" as POSIX file)
		close doc
		tell application "Finder" to delete my_file
	end tell
end repeat
```

I get an error: "The document "Test_3.doc" could not be opened. The file doesn't exist."

And the error in Script Editor reads: "Pages got an error: Can’t make missing value into type document."

Not sure what I am doing wrong...

@dpinney
Author

dpinney commented Sep 21, 2022

Thanks for the feedback!

Looks like your first path "Users/.../Documents/Test_3.doc" is missing a slash at the beginning? I.e. it should start "/Users/..."?
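
A path without the leading slash is relative, so nothing is found at it. A pre-check along these lines could catch that before Pages is involved. This is only a minimal sketch: `pathLooksUsable` is a hypothetical helper name, and the paths are placeholders.

```applescript
-- Hypothetical helper: true only if the path is absolute and an item exists there.
on pathLooksUsable(p)
	if p does not start with "/" then return false
	tell application "System Events" to return (exists disk item p)
end pathLooksUsable

-- Placeholder paths, in the style of the gist above
set my_paths to {"/path/to/first/file.doc", "/path/to/second/file.doc"}

repeat with my_path in my_paths
	-- Dereference the loop variable to get a plain string
	set p to contents of my_path
	if not pathLooksUsable(p) then
		display dialog "Skipping (not an absolute path to an existing item): " & p
	end if
end repeat
```

Running a check like this first gives a clearer message than the "missing value" error that shows up later at the export step.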

@AJ-Duncan-Poole

Fixed with this:

```
set my_paths to {"/Users/.../Documents/Test_3.doc", "/Users/.../Documents/Test_4.doc"}

repeat with my_path in my_paths
	tell application "Pages"
		set my_file to (my_path as POSIX file)
		set my_name to name of (info for my_file)
		set doc to open my_file
		export doc as formatted text to alias (my_path & ".rtfd" as POSIX file)
		close doc
	end tell
end repeat
```

Thanks again for the initial script.

@AJ-Duncan-Poole

AJ-Duncan-Poole commented Sep 22, 2022

Hello David, I'm trying to run the script for all files in an input folder using the following:

set inputFolder to (choose folder with prompt "Select Folder of Word Document files to convert:")
set outputFolder to (choose folder with prompt "Select Folder to save RTFD files")
set inputCount to 0
set outputCount to 0

The problem I'm having is getting Pages to open every file in AllFiles from the inputFolder and convert it to formatted text (rtfd)...
This is my rough attempt (but getting: "Pages got an error: File file :Test_1.doc wasn’t found." number -43 from file ":Test_1.doc" )

tell application "Finder"
	set AllFiles to every file of folder inputFolder
end tell

repeat with f in AllFiles
	tell application "Pages"
		set my_file to (f as POSIX file)
		set my_name to name of (info for my_file)
		set doc to open my_file
		export doc as formatted text to alias (my_path & ".rtfd" as POSIX file)
		close doc
	end tell
end repeat

Do you know what I am doing wrong here?

@dpinney
Author

dpinney commented Sep 22, 2022

Hmm. This is a tough one. My best guess is that Pages doesn't like f defined as type "POSIX file".
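
One thing that might be worth trying, on the assumption that the guess above is right: in the attempt, each `f` is a Finder item reference rather than a path string, and `my_path` is never set inside that loop. A rough sketch that asks Finder for plain aliases and turns them into POSIX path strings first (the `my_paths` name is reused from the earlier script; everything else here is untested against Pages):

```applescript
set inputFolder to (choose folder with prompt "Select Folder of Word Document files to convert:")

tell application "Finder"
	-- "as alias list" returns plain aliases instead of Finder item references
	set AllFiles to (every file of folder inputFolder) as alias list
end tell

-- Build a list of POSIX path strings, the form the earlier working loop expects
set my_paths to {}
repeat with f in AllFiles
	set end of my_paths to POSIX path of (contents of f)
end repeat
```

The resulting `my_paths` list could then feed the `repeat with my_path in my_paths` loop that already worked with hard-coded paths.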

@AJ-Duncan-Poole

I came across a post online that seems to deal with batch conversion https://macvector.com/blog/2014/05/using-applescript-to-batch-convert-files/ and this is the script they put out:

-- Batch convert all MacVector files in a folder into Genbank format in a second folder.
-- Clindley@MacVector.com
-- v0.4
-- 2 April  2014
-- 2 April 2014 added routine to ignore any file other than MacVector NA files.
-- 15 May 2014 concatenate all GB files into a single one. display summary dialog and open MV before starting the conversion
--Add original filename to locus

set inputFolder to (choose folder with prompt "Select Folder of MV files to convert:")
set outputFolder to (choose folder with prompt "Select Folder to save Genbank files") as text -- we need this as text to manipulate later
set inputCount to 0
set outputCount to 0

-- decide whether Fasta or Genbank
display dialog "Do you want Fasta or Genbank output?" buttons {"fasta", "genbank"}
-- now define some variables about the file format.
if the button returned of the result is "fasta" then
	
	set SeqFormat to "fasta"
	set defaultAnswer to "AllFastaFiles.fa"
	set fileExtension to "fa"
else
	set SeqFormat to "genbank"
	set defaultAnswer to "AllFastaFiles.gb"
	set fileExtension to "gb"
end if

display dialog "Do you want a single multiple sequence " & SeqFormat & " file containing all " & SeqFormat & " sequences in " & outputFolder & "?" buttons {"yes", "no"}
if the button returned of the result is "yes" then
	set concatenate to 1
	display dialog "Please enter the Output filename (do not use spaces):" default answer defaultAnswer
	set AllGenbankFilename to text returned of result
	
else
	set concatenate to 0
end if

tell application "Finder"
	set AllFiles to every file of folder inputFolder
end tell

tell application "MacVector.app"
	--open MV to avoid delays opening files if MV is not already open
	activate
end tell

repeat with f in AllFiles
	--create the output filepath and add a suitable file extension
	tell application "Finder"
		--get file  extension of the file
		set mvExtension to the name extension of f
		-- get file path as posix path
		set inputFilePath to name of f
		set AppleScript's text item delimiters to "."
		set outputFilePathBits to text items of inputFilePath
		set last text item of outputFilePathBits to fileExtension
		set outputFileName to outputFilePathBits as text
		--now grab the filename without the extension as the locus variable
		set last text item of outputFilePathBits to ""
		set Locus to outputFilePathBits as text
		
		--now strip the last character which is the dot remaining from the file extension
		set Locuslength to (count of characters of Locus)
		set Locus to text 1 thru (Locuslength - 1) of Locus
		--Now reduce it to 16 characters which is the limit of the Locus field in a Genbank record
		if length of Locus is greater than 16 then
			set Locus to text 1 thru 16 of Locus
		end if
		
		--now remove any spaces from the locus name
		set AppleScript's text item delimiters to " "
		set temp to text items of Locus
		-- display dialog temp
		set AppleScript's text item delimiters to "_"
		set Locus to text items of temp as text
		--display dialog Locus
		
		set outputFilePath to outputFolder & outputFileName as text
		set inputCount to inputCount + 1 -- increment the number of files tested
	end tell
	
	
	if mvExtension = "nucl" then
		tell application "MacVector.app"
			--Now open the file
			open f
			delay 0.3 -- wait a little bit until MV has opened the file
			--now save it as a genbank file
			set docRef to (a reference to the first document)
			if SeqFormat = "fasta" then
				save docRef in outputFilePath as «constant savfFASA»
			else
				save docRef in outputFilePath as «constant savfGENB»
			end if
			close docRef
		end tell
		if concatenate = 1 then
			-- now concatenate the files
			--first we need a UNIX style path for using "cat"
			set UNIXinputFilePath to quoted form of POSIX path of outputFilePath
			set UNIXoutputFolder to outputFolder & AllGenbankFilename
			set UNIXoutputFolder to quoted form of POSIX path of UNIXoutputFolder
			
			--First let's modify the LOCUS to reflect the old filename. The following perl one liner will modify just the first line of the perl script
			do shell script "perl -pi -e 'substr($_, 12, 16)= \"" & Locus & "\" if $. <= 1' " & UNIXinputFilePath
			
			--display dialog "cat " & UNIXoutputFilePath & " >>" & UNIXoutputFolder
			do shell script "cat " & UNIXinputFilePath & " >>" & UNIXoutputFolder
		end if
		set outputCount to outputCount + 1 -- increment the number of files tested
	end if
end repeat

-- convert the count variables to strings so we can display them in the dialogue
set inputCount to inputCount as string
set outputCount to outputCount as string
if concatenate = 1 then
	display dialog inputCount & " files checked and " & outputCount & " files converted to " & SeqFormat & " format. All the files were saved to " & UNIXoutputFolder
else
	display dialog inputCount & " files checked and " & outputCount & " files converted to " & SeqFormat & " format."
end if

I was thinking I could possibly adjust the code here to suit my purposes, just not sure whether this is the answer...

@AJ-Duncan-Poole

I have a possible workaround where Finder gets the items of (choose folder) and I then set my_paths to {items}; however, AppleScript is saying that 'it can't get all the items'.

tell application "Finder" to get items of (choose folder)

set my_paths to {items}

repeat with my_path in my_paths
	tell application "Pages"
		set my_file to (my_path as POSIX file)
		set my_name to name of (info for my_file)
		set doc to open my_file
		export doc as formatted text to alias (my_path & ".rtfd" as POSIX file)
		close doc
	end tell
end repeat

any ideas...

@AJ-Duncan-Poole

I am making headway: I just managed to get it to work when choosing the file:

tell application "Finder" to get POSIX path of (choose file)

get POSIX path of result

set my_paths to {result}

repeat with my_path in my_paths
	tell application "Pages"
		set my_file to (my_path as POSIX file)
		set my_name to name of (info for my_file)
		set doc to open my_file
		export doc as formatted text to alias (my_path & ".rtfd" as POSIX file)
		close doc
	end tell
end repeat

Now I need to find a way to get the POSIX paths of all files in a folder...

@dpinney
Author

dpinney commented Sep 22, 2022

Nothing worse than debugging AppleScript...

Maybe something like this would work?

set folderChoice to POSIX path of (choose folder)
tell application "System Events"
    set myPaths to POSIX path of disk items of folder folderChoice
end tell
repeat with myPath in myPaths
   <existing working code>
end repeat

@AJ-Duncan-Poole

Your code worked... up to a point: Pages stopped running the script when it hit the menacing “.DS_Store”.

I hard-coded the folder and got the same result as your code (with the same error):

tell application "System Events"
	set TitleList to POSIX path of items of folder "/Users/.../Documents/.../Client 1/"
end tell

set my_paths to result

repeat with my_path in my_paths
	tell application "Pages"
		set my_file to (my_path as POSIX file)
		set my_name to name of (info for my_file)
		set doc to open my_file
		export doc as formatted text to alias (my_path & ".rtfd" as POSIX file)
		close doc
	end tell
end repeat
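
Because a single item that Pages cannot open (such as .DS_Store) halts the loop above, one option is to wrap the Pages calls in a `try` block and move on to the next item. This is a sketch only: the paths are placeholders, the `skipped` list is a made-up name, and it assumes Pages raises a script error rather than showing a dialog that blocks the script.

```applescript
-- Placeholder paths; the first stands in for an item Pages cannot open
set my_paths to {"/path/to/folder/.DS_Store", "/path/to/folder/Test_3.doc"}
set skipped to {}

repeat with my_path in my_paths
	-- Dereference the loop variable to get a plain string
	set p to contents of my_path
	try
		tell application "Pages"
			set doc to open (p as POSIX file)
			export doc as formatted text to alias (p & ".rtfd" as POSIX file)
			close doc
		end tell
	on error errMsg
		-- Remember what failed and carry on with the next item
		set end of skipped to p & " (" & errMsg & ")"
	end try
end repeat

-- skipped now lists every item that could not be converted, with its error text
skipped
```

This does not stop hidden files from being attempted; it only keeps one failure from ending the whole batch.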

I saw a post where someone suggested listing only visible files, but I don't know how to integrate this into the existing code:

set TitleList to POSIX path of "/Users/.../Client 1/"
tell application "System Events"
	set allVisibleFiles to files of folder TitleList whose visible is true
end tell

The result excludes the .DS_Store file, but the list holds System Events file references rather than POSIX paths: {file "Macintosh HD:Users:..:Client 1:Test_3.doc" of application "System Events", file...etc.}

@AJ-Duncan-Poole

I GOT IT!!!! WOOOHOOO! So Stoked!

set folderChoice to POSIX path of (choose folder)
tell application "System Events"
	set allVisibleFiles to POSIX path of disk items of folder folderChoice whose visible is true
end tell
repeat with my_path in allVisibleFiles
	tell application "Pages"
		set my_file to (my_path as POSIX file)
		set my_name to name of (info for my_file)
		set doc to open my_file
		export doc as formatted text to alias (my_path & ".rtfd" as POSIX file)
		close doc
	end tell
end repeat

Thank you David for your valuable input! Doing cartwheels in my living room LOL!!!
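
Two possible refinements to the script above, offered as an untested sketch rather than a finished version. First, the filter could also match on file extension, so that visible non-Word items in the folder are left alone (the "doc" and "docx" extensions are an assumption about what the folder contains). Second, because the export path is built by appending ".rtfd", the output for Test_3.doc ends up named Test_3.doc.rtfd; the hypothetical `swapExtension` helper below replaces the extension instead.

```applescript
-- Hypothetical helper: replace the last extension of a path, e.g. ".doc" with ".rtfd"
on swapExtension(p, newExt)
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "."
	set parts to text items of p
	-- Drop the final piece (the old extension) when there is one
	if (count of parts) > 1 then set parts to items 1 thru -2 of parts
	set base to parts as text
	set AppleScript's text item delimiters to oldDelims
	return base & "." & newExt
end swapExtension

set folderChoice to POSIX path of (choose folder)
tell application "System Events"
	-- Visible files only, restricted to the assumed Word extensions
	set docPaths to POSIX path of (files of folder folderChoice whose visible is true and (name extension is "doc" or name extension is "docx"))
end tell

repeat with my_path in docPaths
	set out_path to swapExtension(contents of my_path, "rtfd")
	log out_path -- shows the planned output path in Script Editor's log
end repeat
```

In the working loop, `out_path` could stand in for `my_path & ".rtfd"` on the export line. If the helper is called from inside the `tell application "Pages"` block, it needs to be written as `my swapExtension(...)`.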

@dpinney
Author

dpinney commented Sep 22, 2022

Congrats! Nice hacking!
