Last active
September 7, 2023 18:59
PowerShell function for invoking native (external) programs with a specified character encoding
<#
Prerequisites: PowerShell v3+
License: MIT
Author: Michael Klement <mklement0@gmail.com>
DOWNLOAD and DEFINITION OF THE FUNCTION:
  irm https://gist.github.com/mklement0/ef57aea441ea8bd43387a7d7edfc6c19/raw/Invoke-WithEncoding.ps1 | iex
The above directly defines the function below in your session and offers guidance for making it available in future
sessions too.
DOWNLOAD ONLY:
  irm https://gist.github.com/mklement0/ef57aea441ea8bd43387a7d7edfc6c19/raw > Invoke-WithEncoding.ps1
The above downloads to the specified file, which you then need to dot-source to make the function available
in the current session:
  . ./Invoke-WithEncoding.ps1
To learn what the function does:
  * see the next comment block
  * or, once downloaded and defined, invoke the function with -? or pass its name to Get-Help.
To define an ALIAS for the function, (also) add something like the following to your $PROFILE:
  Set-Alias ien Invoke-WithEncoding
#>
function Invoke-WithEncoding {
  <#
.SYNOPSIS
Invokes a native (external) program with the specified character encoding.
.DESCRIPTION
Invokes a native (external) program using the specified encoding to both
send data to and receive data from it via the pipeline.
Note:
* Even though there's no formal parameter, pipeline input *is* supported.
* However, for technical reasons all pipeline input is *collected in full*
  first, and so is all output.
* This command ensures that decoding of the native program's output into .NET
  strings is performed, as would invariably happen on capturing output or
  piping output to a different command, so that, on Windows, encoding
  mismatches aren't masked by direct-to-console output printing correctly.
The previous encoding settings are restored when this command exits.
.PARAMETER ScriptBlock
The script block containing the native-program call(s) to perform.
Note that if you use the pipeline to pipe text to this command and you
have *multiple* native-program calls, only the *first* one will receive the
input.
.PARAMETER Encoding
The character encoding to use as a temporary override while executing the
command(s).
You may pass a [System.Text.Encoding] instance directly, a code-page number (e.g. 850),
or an encoding name (e.g. 'utf-8').
Additionally, 'ansi' and 'oem' are supported to refer to the system's active ANSI/OEM
code page.
The resulting encoding is temporarily set as follows:
  $OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = <encoding>
Note that $OutputEncoding is also set, to ensure consistency with the console
settings, whereas the default $OutputEncoding value differs, except if you use
PowerShell (Core) 7+ *and* have *system-wide* UTF-8 support enabled (available in
Windows 10).
See the NOTES section (Get-Help -Full) for more information.
.PARAMETER WindowsOnly
Indicates that the encoding should only be applied when running *on Windows*,
which is helpful for programs that only exhibit nonstandard behavior on Windows.
For instance, Python works as expected on Unix-like platforms, but unexpectedly
uses the active *ANSI* rather than OEM code page on Windows.
Using -WindowsOnly allows you to use the same invocation on both platforms,
without the need for a conditional.
.PARAMETER InputObject
An auxiliary parameter to enable input from the pipeline.
Do not use it directly.
.EXAMPLE
Invoke-WithEncoding -Encoding Ansi -WindowsOnly { python -c "print('eé')" }
Calls Python to print an ASCII-range and an accented character, using the ANSI
encoding to decode the output; Python uses that encoding unconditionally, but
only on Windows.
.EXAMPLE
'eé' | Invoke-WithEncoding -Encoding utf8 { node -pe "require('fs').readFileSync(0).toString().trim()" }
Pipes string 'eé' to a Node.js command that simply relays its stdin input to stdout,
using UTF-8 encoding to send input and receive output.
.EXAMPLE
Invoke-WithEncoding -Encoding utf8 { node -pe "'eé'" } | ForEach-Object {
  $_.ToCharArray().ForEach({ '0x' + ([int] $_).ToString('x') + " ($_)" })
}
Calls Node.js to print an ASCII and an accented character, using UTF-8 encoding
to decode the output, which Node.js unconditionally uses, and examines the
output string's Unicode code points in hexadecimal format.
.NOTES
Given that most Unix-like systems nowadays default to UTF-8 encoding, where
no encoding problems are to be expected, this command is primarily useful
on Windows.
To make a console / Windows Terminal window use UTF-8 consistently, run the
following (which you may place in your $PROFILE file):
  $global:OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
For background information, including how to enable UTF-8 system-wide
in Windows 10, see https://stackoverflow.com/a/57134096/45375
#>
  # ALSO STORED AS A GIST AT: https://gist.github.com/mklement0/ef57aea441ea8bd43387a7d7edfc6c19
  [CmdletBinding(PositionalBinding = $false)]
  param(
    [Parameter(Mandatory, Position = 0)] $Encoding, # [System.Text.Encoding] instance, code-page number, or encoding name.
    [Parameter(Mandatory, Position = 1)] [scriptblock] $ScriptBlock,
    [Parameter(ValueFromPipeline)] $InputObject,
    [switch] $WindowsOnly
  )
  Set-StrictMode -Version 1; $ErrorActionPreference = 'Stop'
  # Prevent direct use of -InputObject.
  # Note that mistaken attempts to provide *both* pipeline input and use -InputObject will
  # cause PowerShell itself to complain *for each input object*, with "The input object cannot be bound to any parameters, ..."
  if (-not $MyInvocation.ExpectingInput -and $InputObject) { Throw "Direct use of -InputObject is not supported. Please use the pipeline." }
  # Get the active ANSI and OEM encodings.
  # Note:
  # * On Windows, we query the *registry* to reliably get the *system locale*'s code pages, given that
  #   [cultureinfo]::CurrentCulture.TextInfo.ANSI/OEMCodePage can be *overridden* on a per-user basis
  #   (it reflects the user's / thread's culture).
  # * On Unix, our only option is to use [cultureinfo]::CurrentCulture.TextInfo.ANSI/OEMCodePage
  $ansiEncoding = if ($env:OS -eq 'Windows_NT') {
    [System.Text.Encoding]::GetEncoding([int] (Get-ItemProperty HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP).ACP)
  } else {
    [Text.Encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.ANSICodePage)
  }
  $oemEncoding = if ($env:OS -eq 'Windows_NT') {
    [System.Text.Encoding]::GetEncoding([int] (Get-ItemProperty HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage OEMCP).OEMCP)
  } else {
    [Text.Encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.OEMCodePage)
  }
  # Validate the -Encoding argument, if any:
  if ($null -ne $Encoding -and $Encoding -isnot [System.Text.Encoding]) {
    # As a courtesy, accept 'ANSI' and 'OEM' to represent the active ANSI / OEM encoding.
    if ($Encoding -is [string] -and $Encoding -in 'ansi', 'oem') {
      $Encoding = @{ ansi = $ansiEncoding; oem = $oemEncoding }[$Encoding]
    }
    else {
      # Code-page number or encoding name (e.g., 'unicode', 'utf-8')
      # As a courtesy, also accept 'utf8' instead of 'utf-8', etc.
      # NOTE: UTF-32 is NOT supported: it fails on assigning to [Console]::InputEncoding / [Console]::OutputEncoding
      if ($Encoding -is [string]) { $Encoding = $Encoding -replace '^utf(\d)', 'utf-$1' }
      if ($Encoding -match '^(utf-|unicode$)' -and $Encoding -ne 'utf-7') {
        # !! [System.Text.Encoding]::GetEncoding('utf-.*|unicode') calls return an encoding *with BOM*, which we do NOT want,
        # !! so we explicitly create one without.
        # !! Note: UTF-32 isn't supported anyway, and identifiers such as 'utf-16be' for BE encodings are seemingly not supported.
        $Encoding = switch ($Encoding) {
          'utf-8' { [System.Text.Utf8Encoding]::new() }
          { $_ -in 'unicode', 'utf-16', 'utf-16le' } { [System.Text.UnicodeEncoding]::new($false, $false) }
          default { [System.Text.Encoding]::GetEncoding($Encoding) }
        }
      }
      else {
        $Encoding = [System.Text.Encoding]::GetEncoding($Encoding)
      }
    }
  }
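  # Illustrative check (an added sketch, not part of the original logic): the BOM
  # difference noted above can be observed via GetPreamble(), which returns the
  # BOM bytes that an encoding instance would emit. Run interactively:
  #   [System.Text.Encoding]::GetEncoding('utf-8').GetPreamble().Length         # 3 (with BOM)
  #   [System.Text.UTF8Encoding]::new().GetPreamble().Length                    # 0 (no BOM)
  #   [System.Text.UnicodeEncoding]::new($false, $false).GetPreamble().Length   # 0 (no BOM)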
  $ignoreEncoding = $WindowsOnly -and $env:OS -ne 'Windows_NT'
  try {
    if ($ignoreEncoding) {
      Write-Verbose "Non-Windows platform: ignoring specified encoding, as requested."
    } else {
      Write-Verbose "Temporarily setting encoding to: $($Encoding.WebName)"
      # Save the currently active encodings for later restoration.
      $prevIn, $prevOut = [Console]::InputEncoding, [Console]::OutputEncoding
      # Set in-, output and $OutputEncoding to the specified encoding.
      $OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = $Encoding
    }
    # Note:
    # * Since this is an *advanced* function and there is no process {}
    #   block, $input is an [object[]] array that is empty if there's no
    #   pipeline input.
    if ($Input) {
      # There is pipeline input: we must patch it into the script block.
      # Note: In order to patch the equivalent of `$Input | ...` into
      #       the script block, we simply stringify and invoke the patched
      #       command with Invoke-Expression - hypothetically, state from
      #       the original script block could be lost in this implicit re-creation,
      #       but this is probably not a real-world concern.
      $collectedInput = $Input # We must use an aux. variable, because $Input is redefined in the Invoke-Expression context.
      # !! Do NOT try to force enumeration with @($Input), as we want to preserve *streaming* behavior.
      # Build the patched source text *before* reassigning $ScriptBlock: script blocks aren't closures,
      # so a reference to $ScriptBlock inside the new block would resolve to the new block itself when invoked.
      $patchedSource = '$collectedInput | {0}' -f $ScriptBlock.ToString()
      $ScriptBlock = { Invoke-Expression $patchedSource }
    }
    # * Invoke in a *streaming* manner, so as to also support indefinitely
    #   running external programs that periodically produce output, such as
    #   `mosquitto_sub`
    # * This precludes using $output = & $ScriptBlock
    # * Streaming requires us to output lines *as they're received*,
    #   yet we don't want to just pass them through as-is, as that could cause
    #   the false appearance that everything is fine on Windows with CLIs that
    #   use WriteConsole() with Unicode support when stdout is directly connected
    #   to a console.
    # * Therefore, we must force *decoding* of each line, which we can simply
    #   achieve by enclosing it in (...)
    # * On Unix, it is additionally necessary to restore the original
    #   console output encoding *before* outputting the decoded output - otherwise
    #   even correctly decoded input will *print incorrectly* or vice versa.
    & $ScriptBlock | ForEach-Object {
      # To ensure correct *printing to the terminal* (strictly speaking needed on Unix only),
      # temporarily revert to the original output encoding.
      if (-not $ignoreEncoding) { [Console]::OutputEncoding = $prevOut }
      ($_)
      # Restore the target encoding *for decoding* for the next output line.
      if (-not $ignoreEncoding) { [Console]::OutputEncoding = $Encoding }
    }
  }
  finally { # This should also cover aborting with ^C
    if (-not $ignoreEncoding) {
      # Restore original encodings.
      # Note: No need to restore $OutputEncoding - it was set as a *local*
      #       variable only that will go out of scope automatically.
      [Console]::InputEncoding, [Console]::OutputEncoding = $prevIn, $prevOut
    }
  }
} # end of function
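# Illustrative usage check (an added sketch, not part of the original gist).
# It assumes that Node.js is installed and on the PATH, as in the help examples above.
# Because the `finally` block restores the console encodings, the last line is
# expected to output $true once the function has returned:
#   $before = [Console]::OutputEncoding.WebName
#   Invoke-WithEncoding -Encoding utf8 -Verbose { node -pe "'eé'" }
#   [Console]::OutputEncoding.WebName -eq $before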
# --------------------------------
# GENERIC INSTALLATION HELPER CODE
# --------------------------------
# Provides guidance for making the function persistently available when
# this script is either directly invoked from the originating Gist or
# dot-sourced after download.
# IMPORTANT:
# * DO NOT USE `exit` in the code below, because it would exit
#   the calling shell when Invoke-Expression is used to directly
#   execute this script's content from GitHub.
# * Because the typical invocation is DOT-SOURCED (via Invoke-Expression),
#   do not define variables or alter the session state via Set-StrictMode, ...
#   *except in child scopes*, via & { ... }
if ($MyInvocation.Line -eq '') {
  # Most likely, this code is being executed via Invoke-Expression directly
  # from gist.github.com
  # To simulate for testing with a local script, use the following:
  # Note: Be sure to use a path and to use "/" as the separator.
  #   iex (Get-Content -Raw ./script.ps1)
  # Derive the function name from the invocation command, via the enclosing
  # script name presumed to be contained in the URL.
  # NOTE: Unfortunately, when invoked via Invoke-Expression, $MyInvocation.MyCommand.ScriptBlock
  #       with the actual script content is NOT available, so we cannot extract
  #       the function name this way.
  & {
    param($invocationCmdLine)
    # Try to extract the function name from the URL.
    $funcName = $invocationCmdLine -replace '^.+/(.+?)(?:\.ps1).*$', '$1'
    if ($funcName -eq $invocationCmdLine) {
      # Function name could not be extracted, just provide a generic message.
      # Note: Hypothetically, we could try to extract the Gist ID from the URL
      #       and use the REST API to determine the first filename.
      Write-Verbose -Verbose "Function is now defined in this session."
    }
    else {
      # Indicate that the function is now defined and also show how to
      # add it to the $PROFILE or convert it to a script file.
      Write-Verbose -Verbose @"
Function `"$funcName`" is now defined in this session.
* If you want to add this function to your `$PROFILE, run the following:
  "``nfunction $funcName {``n`${function:$funcName}``n}" | Add-Content `$PROFILE
* If you want to convert this function into a script file that you can invoke
  directly, run:
  "`${function:$funcName}" | Set-Content $funcName.ps1 -Encoding $('utf8' + ('', 'bom')[[bool] (Get-Variable -ErrorAction Ignore IsCoreCLR -ValueOnly)])
"@
    }
  } $MyInvocation.MyCommand.Definition # Pass the original invocation command line to the script block.
}
else {
  # Invocation presumably as a local file after manual download,
  # either dot-sourced (as it should be) or mistakenly invoked directly.
  & {
    param($originalInvocation)
    # Parse this file to reliably extract the name of the embedded function,
    # irrespective of the name of the script file.
    $ast = $originalInvocation.MyCommand.ScriptBlock.Ast
    $funcName = $ast.Find( { $args[0] -is [System.Management.Automation.Language.FunctionDefinitionAst] }, $false).Name
    if ($originalInvocation.InvocationName -eq '.') {
      # Being dot-sourced as a file.
      # Provide a hint that the function is now loaded and provide
      # guidance for how to add it to the $PROFILE.
      Write-Verbose -Verbose @"
Function `"$funcName`" is now defined in this session.
If you want to add this function to your `$PROFILE, run the following:
  "``nfunction $funcName {``n`${function:$funcName}``n}" | Add-Content `$PROFILE
"@
    }
    else {
      # Mistakenly invoked directly.
      # Issue a warning that the function definition didn't take effect and
      # provide guidance for reinvocation and adding to the $PROFILE.
      Write-Warning @"
This script contains a definition for function "$funcName", but this definition
only takes effect if you dot-source this script.
To define this function for the current session, run:
  . "$($originalInvocation.MyCommand.Path)"
"@
    }
  } $MyInvocation # Pass the original invocation info to the helper script block.
}