@mbohun

mbohun/NOTES.md Secret

Last active November 29, 2019 03:58
PowerShell script to batch convert MS Office files to PDF

INTRO

PREREQUISITES

  • Microsoft Windows 10 (The "real thing" will run on Microsoft Windows Server 2012)
  • Microsoft Office 365 (The "real thing" will use Microsoft Office 2016)

Jenkins build / test env notes

TODO: automate/script the Windows dev box setup (with Vagrant?). The goal is an end-to-end ("deep-slice") test of the "strawman" concept for the whole PDF creation process:

On a Linux Jenkins box:

  • start a Microsoft Windows dev VirtualBox VM
    • with a configured Windows "shared drive" so the Linux host OS (acting here as the "Solaris env") can interact with it
  • a dev/test/mock Windows process (script) will produce the input ("patient record files") in order to trigger and test:
    • the Windows FileWatcher PowerShell script, which parses the patient record input files and creates a patient record sub-dir
    • the conversion of the MS Office files to PDFs, which are written to a shared output dir for the Linux host ("Solaris") to process further (PDF processing and merging).
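
The file-watching step could be sketched roughly as follows. This is a minimal sketch, not the actual FileWatcher script: the inbox directory, the file name `record-0001.doc`, and the source identifier `PatientRecordCreated` are all hypothetical, and the mock producer is simulated with a single `Set-Content` call.

```powershell
# Hypothetical inbox directory; a unique temp dir keeps the sketch self-contained.
$inbox = Join-Path ([System.IO.Path]::GetTempPath()) ("patient_records_in_" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $inbox -Force | Out-Null

$watcher = New-Object System.IO.FileSystemWatcher $inbox
$watcher.EnableRaisingEvents = $true

# Without -Action, events are queued and can be collected with Wait-Event.
Register-ObjectEvent -InputObject $watcher -EventName Created -SourceIdentifier 'PatientRecordCreated' | Out-Null

# Stand-in for the mock producer dropping a patient record file.
$record = Join-Path $inbox 'record-0001.doc'
Set-Content -Path $record -Value 'dummy'

# Wait up to 5 seconds for the "Created" event.
$created = Wait-Event -SourceIdentifier 'PatientRecordCreated' -Timeout 5
$created_path = $null
if ($created) {
    $created_path = $created.SourceEventArgs.FullPath
    Write-Host "New input file: $created_path"   # the conversion step would start here
    Remove-Event -SourceIdentifier 'PatientRecordCreated'
}

# Clean up the subscription, the watcher and the temp directory.
Unregister-Event -SourceIdentifier 'PatientRecordCreated'
$watcher.Dispose()
Remove-Item -Path $inbox -Recurse -Force
```

A long-running watcher would loop on `Wait-Event` (or pass `-Action` to `Register-ObjectEvent`) instead of waiting for one event and exiting.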

The script

This is the original script that was tested to convert the example patient record MS Office files to PDF on a Windows box:

# This script converts all the .doc and .docx files in the `$documents_path` dir to .pdf
#
# It needs proper/robust error handling:
#    - https://stackoverflow.com/questions/16534292/basic-powershell-batch-convert-word-docx-to-pdf
#      NOTE: the 2nd post about crashes, and their ("typical") m$ workaround
#

$documents_path = '.\test_files'

$word_app = New-Object -ComObject Word.Application

# This filter will find .doc as well as .docx documents
Get-ChildItem -Path $documents_path -Filter *.doc? | ForEach-Object {
    $document = $word_app.Documents.Open($_.FullName)
    $pdf_filename = "$($_.DirectoryName)\$($_.BaseName).pdf"
    $document.SaveAs([ref] $pdf_filename, [ref] 17)
    $document.Close()
}

$word_app.Quit()

  • NOTE: where should the configuration live? The MS Office input file patterns (the "Filter") could be configured as an environment variable:
    setx DEHS_PDF_PROCESSOR_INPUT_FILES "*.doc, *.docx, *.xls, *.xlsx, *.ppt, *.rtf"
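
A minimal sketch of how the script might read that variable and turn it into a pattern list for `Get-ChildItem -Include`. The variable name `$include_patterns` is an assumption for illustration. Note that `setx` only affects processes started afterwards, so the sketch sets the value for the current process to stay self-contained.

```powershell
# In a real run the value would come from the environment (set earlier with setx).
# It is set here for the current process only, so the sketch runs on its own.
$env:DEHS_PDF_PROCESSOR_INPUT_FILES = '*.doc, *.docx, *.xls, *.xlsx, *.ppt, *.rtf'

# Split on commas, trim the surrounding spaces, drop empty entries.
$include_patterns = @(
    $env:DEHS_PDF_PROCESSOR_INPUT_FILES -split ',' |
        ForEach-Object { $_.Trim() } |
        Where-Object { $_ -ne '' }
)

# The list can then replace the hard-coded patterns, e.g.:
#   Get-ChildItem -Path "$documents_path\*" -Include $include_patterns
$include_patterns
```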
    
CLEANUP and re-test script notes

# This script converts the MS Office files matched by -Include in the `$documents_path` dir to .pdf
#
# It needs proper/robust error handling:
#    - https://stackoverflow.com/questions/16534292/basic-powershell-batch-convert-word-docx-to-pdf
#      NOTE: the 2nd post about crashes, and their ("typical") m$ workaround
#

$documents_path = '.\test_files'

$word_app = New-Object -ComObject Word.Application

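# The original filter only found .doc and .docx documents:
#Get-ChildItem -Path $documents_path -Filter *.doc? | ForEach-Object {
# NOTE: Word is unlikely to convert .xls/.xlsx/.ppt correctly; those probably
#       need Excel.Application / PowerPoint.Application instead.
Get-ChildItem -Path $documents_path\* -Include *.doc, *.docx, *.xls, *.xlsx, *.ppt, *.rtf  | ForEach-Object {
    $file = $_          # inside catch, $_ is the error record, not the file
    $document = $null   # reset so finally never closes a document from a previous file
    try {
        $document = $word_app.Documents.Open($file.FullName)
        $pdf_filename = "$($file.DirectoryName)\$($file.BaseName).pdf"
        $document.SaveAs([ref] $pdf_filename, [ref] 17)   # 17 = wdFormatPDF
    }
    catch {
        # NOTE: Write some error log info
        Write-Host "An error occurred while converting $($file.FullName):"
        Write-Host $_.Exception.Message
        Write-Host $_.ScriptStackTrace
    }
    finally {
        # NOTE: during testing I got an error, that in turn ended-up:
        # 1. opening 5 instances of word
        # 2. hanging/crashing
        #
        # $document is assigned inside try but is still visible here, because
        # try/catch/finally do not create a new scope. It is $null if Open() failed.
        if ($null -ne $document) { $document.Close() }
    }
}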

$word_app.Quit()

# TODO: HOW are we going to handle errors?

  • NOTE: The script needs proper error handling: try/catch/finally, or at least an if-else that always calls $document.Close(), so that a failure does not leave multiple MS Word instances open and hang the run.
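
Variables assigned inside `try` remain visible in `finally`, but if the open step throws, the variable keeps whatever it held before. The sketch below illustrates this without Word: plain strings stand in for documents, `bad.doc` is a hypothetical file that fails to open, and `$closed` records what would have been closed.

```powershell
# Records every "document" that the finally block closes.
$closed = New-Object System.Collections.Generic.List[string]

foreach ($name in 'good-1.doc', 'bad.doc', 'good-2.doc') {
    $document = $null   # without this reset, 'bad.doc' would close 'good-1.doc' a second time
    try {
        # Stand-in for $word_app.Documents.Open(); fails for one file.
        if ($name -eq 'bad.doc') { throw "cannot open $name" }
        $document = $name
    }
    catch {
        Write-Host "An error occurred: $($_.Exception.Message)"
    }
    finally {
        # Close only what was actually opened in this iteration.
        if ($null -ne $document) { $closed.Add($document) }
    }
}

$closed -join ', '
```

Only the two files that opened successfully are closed, and each exactly once.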

PERMISSIONS

ERROR

PS C:\Users\marti\src> C:\Users\marti\src\convert_msoffice_to_pdf.ps1
File C:\Users\marti\src\convert_msoffice_to_pdf.ps1 cannot be loaded because running scripts is disabled on this system. For more information, see about_Execution_Policies at 
https:/go.microsoft.com/fwlink/?LinkID=135170.
    + CategoryInfo          : SecurityError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : UnauthorizedAccess

EXISTING PERMISSIONS

PS C:\Users\marti\src> Get-ExecutionPolicy -List

        Scope ExecutionPolicy
        ----- ---------------
MachinePolicy       Undefined
   UserPolicy       Undefined
      Process       Undefined
  CurrentUser       Undefined
 LocalMachine       Undefined

ADJUST PERMISSIONS

PS C:\Users\marti\src> Set-ExecutionPolicy -ExecutionPolicy Unrestricted -Scope CurrentUser
PS C:\Users\marti\src> Get-ExecutionPolicy -List

        Scope ExecutionPolicy
        ----- ---------------
MachinePolicy       Undefined
   UserPolicy       Undefined
      Process       Undefined
  CurrentUser    Unrestricted
 LocalMachine       Undefined
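
`Unrestricted` is broader than this script needs. Two narrower options, sketched under the assumption that the script is a local, unsigned file: either allow local scripts for the current user with `RemoteSigned`, or leave the stored policy alone and override it for a single invocation.

```powershell
# Option 1: allow local unsigned scripts for the current user;
# scripts downloaded from the internet must still be signed.
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

# Option 2: keep the stored policy as-is and relax it for one run only.
powershell.exe -ExecutionPolicy Bypass -File C:\Users\marti\src\convert_msoffice_to_pdf.ps1
```

The second form may suit a Jenkins job better, since it changes nothing on the machine.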

RUN

PS C:\Users\marti\src> C:\Users\marti\src\convert_msoffice_to_pdf.ps1
PS C:\Users\marti\src> 

REFERENCES

test script:

# https://stackoverflow.com/questions/16534292/basic-powershell-batch-convert-word-docx-to-pdf
# NOTE: the 2nd post about crashes, and their ("typical") m$ workaround
#

$documents_path = '.\test_files'

# -Force returns the existing directory on a re-run instead of failing and leaving $output_path empty
$output_path = (New-Item -ItemType Directory -Path "out_test" -Force).FullName
#echo "OUTPUT: $output_path"

$word_app = New-Object -ComObject Word.Application

$timestamp_start = Get-Date -Format yyyy-MM-dd_HH_mm_ss

# This filter will find .doc as well as .docx documents
Get-ChildItem -Path $documents_path -Filter *.doc? | ForEach-Object {
    $document = $word_app.Documents.Open($_.FullName)
    #$pdf_filename = "$($_.DirectoryName)\$($_.BaseName).pdf"
    $pdf_filename = "$output_path\$($_.BaseName).pdf"
    #echo "PDF: $pdf_filename"

    $document.SaveAs([ref] $pdf_filename, [ref] 17)
    $document.Close()
}

$timestamp_end = Get-Date -Format yyyy-MM-dd_HH_mm_ss

$word_app.Quit()

echo "START: $timestamp_start"
echo "  END: $timestamp_end"
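
Because the `*.doc?` filter matches both .doc and .docx and every output is named `<BaseName>.pdf`, two inputs with the same base name would overwrite each other in the output dir. A minimal sketch of a pre-flight check, using hypothetical file names (`report.doc`, `report.docx`) rather than the real test files:

```powershell
# Hypothetical input names; in the script these would come from Get-ChildItem.
$input_names = 'report.doc', 'report.docx', 'QBF.doc'

# Group the inputs by the PDF name they would produce and keep groups with more than one member.
$collisions = @(
    $input_names |
        Group-Object -Property { [System.IO.Path]::ChangeExtension($_, '.pdf') } |
        Where-Object { $_.Count -gt 1 }
)

foreach ($c in $collisions) {
    Write-Host "$($c.Name) would be written $($c.Count) times: $($c.Group -join ', ')"
}
```

How to resolve a collision (skip, rename, keep the extension in the output name) is left open here.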

input files:

PS C:\Users\marti\src\test-00> ls -n .\test_files\*.doc
150123_AE408 - Mental Health Intake Assessmen_DDummy.doc
20150212_PM528_Dummy.doc
AD074 - Request for Dental Treatment Locality Restriction.doc
AE819 - International Carriage of Medicines for Personal Use.doc
PM101 - Medical or Dental Fitness Advice 01.doc
PM101 - Medical or Dental Fitness Advice Mon_Aug_2016 06_28_12_903.doc
PM101 - Medical or Dental Fitness Advice Mon_Jan_2016 02_00_22_533.doc
PM101 - Medical or Dental Fitness Advice Mon_Mar_2015 22_31_57_697.doc
PM101 - Medical or Dental Fitness Advice Sun_Jan_2016 07_27_48_713.doc
PM101 - Medical or Dental Fitness Advice Sun_Mar_2015 10_24_20_333.doc
PM101 - Medical or Dental Fitness Advice Thu_Dec_2014 06_17_38_110.doc
PM101 - Medical or Dental Fitness Advice Thu_Oct_2014 03_43_04_393.doc
PM101 - Medical or Dental Fitness Advice Thu_Sep_2014 04_05_42_830.doc
PM101 - Medical or Dental Fitness Advice Tue_Feb_2016 02_13_27_697.doc
PM101 - Medical or Dental Fitness Advice Tue_Feb_2016 02_51_41_873.doc
PM101 - Medical or Dental Fitness Advice Tue_Jul_2014 05_13_27_900.doc
PM101 - Medical or Dental Fitness Advice Tue_Jun_2016 05_55_02_437.doc
PM101 - Medical or Dental Fitness Advice Tue_Mar_2016 02_05_02_803.doc
PM101 - Medical or Dental Fitness Advice Tue_Mar_2016 02_51_37_427.doc
PM101 - Medical or Dental Fitness Advice Tue_Oct_2014 01_34_41_690.doc
PM101 - Medical or Dental Fitness Advice Tue_Sep_2015 03_33_58_470.doc
PM101 - Medical or Dental Fitness Advice Wed_Apr_2015 04_03_22_383.doc
PM101 - Medical or Dental Fitness Advice Wed_Jun_2016 05_03_48_530.doc
PM101 - Medical or Dental Fitness Advice Wed_Mar_2016 23_44_43_833.doc
PM101 - Medical or Dental Fitness Advice Wed_May_2015 01_06_55_320.doc
PM101 - Medical or Dental Fitness Advice Wed_May_2017 16_20_34_833.doc
PM101 - Medical or Dental Fitness Advice Wed_Oct_2014 09_02_50_210.doc
PM518 - MEC Review Record Fri_Aug_2014 07_50_15_450.doc
PM518 - MEC Review Record Mon_Apr_2016 08_01_14_020.doc
PM518 - MEC Review Record Tue_Feb_2015 21_39_39_510.doc
PM518 - MEC Review Record Tue_Jan_2015 05_24_26_160.doc
PM518 - MEC Review Record Tue_Nov_2016 12_24_00_180.doc
PM518 - MEC Review Record Tue_Nov_2016 12_24_00_237.doc
PM518 - MEC Review Record Tue_Nov_2016 12_36_21_527.doc
PM518 - MEC Review Record Tue_Nov_2016 12_36_21_530.doc
PM518 - MEC Review Record Tue_Nov_2016 12_41_22_150.doc
PM527-1 Diagnostic Imaging Request Fri_Apr_2014 11_45_01_473.doc
PM527-1 Diagnostic Imaging Request Fri_Apr_2014 12_02_00_130.doc
PM527-1 Diagnostic Imaging Request Thu_Apr_2014 11_34_26_703.doc
PM527-1 Diagnostic Imaging Request Thu_Apr_2014 11_39_57_100.doc
PM527-1 Diagnostic Imaging Request Thu_Apr_2014 11_42_47_533.doc
PM527-1 Diagnostic Imaging Request Thu_Apr_2014 11_56_52_377.doc
PM527-1 Pathology Request Onboard.doc
PM528 - External Service Provider Request Fri_Aug_2016 01_25_31_137.doc
PM528 - External Service Provider Request Thu_Aug_2016 00_24_42_783.doc
PM528 - External Service Provider Request Thu_Nov_2014 01_11_25_053.doc
PM528 - External Service Provider Request Thu_Oct_2014 00_26_14_940.doc
PM528 - External Service Provider Request Wed_Jun_2016 00_37_08_500.doc
PM528-4 Extension of Episode of Care with External Service Provider.doc
PM532 - Medical Employment Classification (MEC) Advice.doc
PULHEEMS Table.doc
QBF.doc
Temporary Issue Record.doc

run script to convert input files to PDF:

PS C:\Users\marti\src\test-00> .\convert_msoffice_to_pdf.ps1
START: 2019-11-29_14_28_20
  END: 2019-11-29_14_28_47
PS C:\Users\marti\src\test-00>

result: 27 seconds
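
The 27 seconds can be derived from the two printed timestamps rather than by hand. A minimal sketch that parses them with the same format string the script passes to `Get-Date -Format`:

```powershell
$format  = 'yyyy-MM-dd_HH_mm_ss'
$culture = [System.Globalization.CultureInfo]::InvariantCulture

# START / END values copied from the run output.
$start = [datetime]::ParseExact('2019-11-29_14_28_20', $format, $culture)
$end   = [datetime]::ParseExact('2019-11-29_14_28_47', $format, $culture)

# Subtracting two DateTime values yields a TimeSpan.
$elapsed = $end - $start
$elapsed.TotalSeconds   # 27
```

Inside the script itself, keeping the `Get-Date` objects (or using `[System.Diagnostics.Stopwatch]`) would avoid the parsing step and give sub-second resolution.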

output files:

PS C:\Users\marti\src\test-00> ls -n .\out_test
150123_AE408 - Mental Health Intake Assessmen_DDummy.pdf
20150212_PM528_Dummy.pdf
AD074 - Request for Dental Treatment Locality Restriction.pdf
AE819 - International Carriage of Medicines for Personal Use.pdf
PM101 - Medical or Dental Fitness Advice 01.pdf
PM101 - Medical or Dental Fitness Advice Mon_Aug_2016 06_28_12_903.pdf
PM101 - Medical or Dental Fitness Advice Mon_Jan_2016 02_00_22_533.pdf
PM101 - Medical or Dental Fitness Advice Mon_Mar_2015 22_31_57_697.pdf
PM101 - Medical or Dental Fitness Advice Sun_Jan_2016 07_27_48_713.pdf
PM101 - Medical or Dental Fitness Advice Sun_Mar_2015 10_24_20_333.pdf
PM101 - Medical or Dental Fitness Advice Thu_Dec_2014 06_17_38_110.pdf
PM101 - Medical or Dental Fitness Advice Thu_Oct_2014 03_43_04_393.pdf
PM101 - Medical or Dental Fitness Advice Thu_Sep_2014 04_05_42_830.pdf
PM101 - Medical or Dental Fitness Advice Tue_Feb_2016 02_13_27_697.pdf
PM101 - Medical or Dental Fitness Advice Tue_Feb_2016 02_51_41_873.pdf
PM101 - Medical or Dental Fitness Advice Tue_Jul_2014 05_13_27_900.pdf
PM101 - Medical or Dental Fitness Advice Tue_Jun_2016 05_55_02_437.pdf
PM101 - Medical or Dental Fitness Advice Tue_Mar_2016 02_05_02_803.pdf
PM101 - Medical or Dental Fitness Advice Tue_Mar_2016 02_51_37_427.pdf
PM101 - Medical or Dental Fitness Advice Tue_Oct_2014 01_34_41_690.pdf
PM101 - Medical or Dental Fitness Advice Tue_Sep_2015 03_33_58_470.pdf
PM101 - Medical or Dental Fitness Advice Wed_Apr_2015 04_03_22_383.pdf
PM101 - Medical or Dental Fitness Advice Wed_Jun_2016 05_03_48_530.pdf
PM101 - Medical or Dental Fitness Advice Wed_Mar_2016 23_44_43_833.pdf
PM101 - Medical or Dental Fitness Advice Wed_May_2015 01_06_55_320.pdf
PM101 - Medical or Dental Fitness Advice Wed_May_2017 16_20_34_833.pdf
PM101 - Medical or Dental Fitness Advice Wed_Oct_2014 09_02_50_210.pdf
PM518 - MEC Review Record Fri_Aug_2014 07_50_15_450.pdf
PM518 - MEC Review Record Mon_Apr_2016 08_01_14_020.pdf
PM518 - MEC Review Record Tue_Feb_2015 21_39_39_510.pdf
PM518 - MEC Review Record Tue_Jan_2015 05_24_26_160.pdf
PM518 - MEC Review Record Tue_Nov_2016 12_24_00_180.pdf
PM518 - MEC Review Record Tue_Nov_2016 12_24_00_237.pdf
PM518 - MEC Review Record Tue_Nov_2016 12_36_21_527.pdf
PM518 - MEC Review Record Tue_Nov_2016 12_36_21_530.pdf
PM518 - MEC Review Record Tue_Nov_2016 12_41_22_150.pdf
PM527-1 Diagnostic Imaging Request Fri_Apr_2014 11_45_01_473.pdf
PM527-1 Diagnostic Imaging Request Fri_Apr_2014 12_02_00_130.pdf
PM527-1 Diagnostic Imaging Request Thu_Apr_2014 11_34_26_703.pdf
PM527-1 Diagnostic Imaging Request Thu_Apr_2014 11_39_57_100.pdf
PM527-1 Diagnostic Imaging Request Thu_Apr_2014 11_42_47_533.pdf
PM527-1 Diagnostic Imaging Request Thu_Apr_2014 11_56_52_377.pdf
PM527-1 Pathology Request Onboard.pdf
PM528 - External Service Provider Request Fri_Aug_2016 01_25_31_137.pdf
PM528 - External Service Provider Request Thu_Aug_2016 00_24_42_783.pdf
PM528 - External Service Provider Request Thu_Nov_2014 01_11_25_053.pdf
PM528 - External Service Provider Request Thu_Oct_2014 00_26_14_940.pdf
PM528 - External Service Provider Request Wed_Jun_2016 00_37_08_500.pdf
PM528-4 Extension of Episode of Care with External Service Provider.pdf
PM532 - Medical Employment Classification (MEC) Advice.pdf
PULHEEMS Table.pdf
QBF.pdf
Temporary Issue Record.pdf

result: 53 PDF files

CONCLUSION:

27 seconds / 53 files ≈ 0.51 seconds per file (the timestamps have one-second resolution, so further digits are not meaningful)
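
The same arithmetic as a short sketch, with a projection for a larger batch. The batch size of 1000 files and the assumption that conversion time scales linearly with file count are illustrative only and were not measured.

```powershell
$elapsed_seconds = 27.0   # from the START/END timestamps of the test run
$file_count      = 53     # PDFs produced by the test run

$seconds_per_file = [math]::Round($elapsed_seconds / $file_count, 2)
$seconds_per_file   # 0.51

# Hypothetical batch size; assumes time scales linearly with the number of files.
$batch_size        = 1000
$projected_minutes = [math]::Round($batch_size * $elapsed_seconds / $file_count / 60, 1)
$projected_minutes  # 8.5
```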
