@mklement0
Last active September 7, 2023 18:59
PowerShell function for invoking native (external) programs with a specified character encoding
<#
Prerequisites: PowerShell v3+
License: MIT
Author: Michael Klement <mklement0@gmail.com>
DOWNLOAD and DEFINITION OF THE FUNCTION:
irm https://gist.github.com/mklement0/ef57aea441ea8bd43387a7d7edfc6c19/raw/Invoke-WithEncoding.ps1 | iex
The above directly defines the function below in your session and offers guidance for making it available in future
sessions too.
DOWNLOAD ONLY:
irm https://gist.github.com/mklement0/ef57aea441ea8bd43387a7d7edfc6c19/raw > Invoke-WithEncoding.ps1
The above downloads to the specified file, which you then need to dot-source to make the function available
in the current session:
. ./Invoke-WithEncoding.ps1
To learn what the function does:
* see the next comment block
* or, once downloaded and defined, invoke the function with -? or pass its name to Get-Help.
To define an ALIAS for the function, (also) add something like the following to your $PROFILE:
Set-Alias ien Invoke-WithEncoding
#>
function Invoke-WithEncoding {
<#
.SYNOPSIS
Invokes a native (external) program with the specified character encoding.
.DESCRIPTION
Invokes a native (external) program using the specified encoding to both
send data to and receive data from it via the pipeline.
Note:
* Even though there's no formal parameter, pipeline input *is* supported.
* However, for technical reasons all pipeline input is *collected in full*
first, and so is all output.
* This command ensures that decoding of the native program output into .NET
strings is performed, as would invariably happen on capturing output or
piping output to a different command, so that, on Windows, encoding
mismatches aren't masked by direct-to-console output printing correctly.
The previous encoding settings are restored when this command exits.
.PARAMETER ScriptBlock
The script block containing the native-program call(s) to perform.
Note that if you use the pipeline to pipe text to this command and you
have *multiple* native-program calls, only the *first* one will receive the
input.
.PARAMETER Encoding
The character encoding to use as a temporary override while executing the
command(s).
You may pass a [System.Text.Encoding] instance directly, a code-page number (e.g. 850),
or an encoding name (e.g. 'utf-8').
Additionally, 'ansi' and 'oem' are supported to refer to the system's active ANSI/OEM
code page.
The resulting encoding is temporarily set as follows:
$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = <encoding>
Note that $OutputEncoding is also set, to ensure consistency with the console
settings, whereas the default $OutputEncoding value differs, except if you use
PowerShell (Core) 7+ *and* have *system-wide* UTF-8 support enabled (available in
Windows 10).
See the NOTES section (Get-Help -Full) for more information.
.PARAMETER WindowsOnly
Indicates that the encoding should only be applied when running *on Windows*,
which is helpful for programs that only exhibit nonstandard behavior on Windows.
For instance, Python works as expected on Unix-like platforms, but unexpectedly
uses the active *ANSI* rather than OEM code page on Windows.
Using -WindowsOnly allows you to use the same invocation on both platforms,
without the need for a conditional.
.PARAMETER InputObject
An aux. parameter to enable input from the pipeline.
Do not use it directly.
.EXAMPLE
Invoke-WithEncoding -Encoding Ansi -WindowsOnly { python -c "print('eé')" }
Calls Python to print an ASCII-range and an accented character, decoding the
output with the ANSI encoding, which Python uses unconditionally, but only on
Windows.
.EXAMPLE
'eé' | Invoke-WithEncoding -Encoding utf8 { node -pe "require('fs').readFileSync(0).toString().trim()" }
Pipes string 'eé' to a Node.js command that simply relays its stdin input to stdout,
using UTF-8 encoding to send input and receive output.
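.EXAMPLE
'eé' | Invoke-WithEncoding -Encoding utf8 { node -pe "require('fs').readFileSync(0).toString().trim()"; node -pe "'done'" }
A sketch of the multiple-call case described under -ScriptBlock, assuming Node.js
is installed: only the first node call receives the piped string 'eé' via stdin;
the second call, which is purely illustrative, runs without pipeline input.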
.EXAMPLE
Invoke-WithEncoding -Encoding utf8 { node -pe "'eé'" } | ForEach-Object {
$_.ToCharArray().ForEach({ '0x' + ([int] $_).ToString('x') + " ($_)" })
}
Calls Node.js to print an ASCII and an accented character, using UTF-8 encoding
to decode the output, which Node.js unconditionally uses, and examines the
output string's Unicode code points in hex. format.
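.EXAMPLE
Invoke-WithEncoding utf8 { node -pe "'eé'" } -Verbose
A sketch of positional use, again assuming Node.js is installed: -Encoding and
-ScriptBlock are declared with positions 0 and 1, so their names may be omitted.
With -Verbose, the command also reports the encoding it temporarily sets.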
.NOTES
Given that most Unix-like systems nowadays default to UTF-8 encoding, where
no encoding problems are to be expected, this command is primarily useful
on Windows.
To make a console / Windows Terminal window use UTF-8 consistently, run the
following (which you may place in your $PROFILE file):
$global:OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
For background information, including how to enable UTF-8 system-wide
in Windows 10, see https://stackoverflow.com/a/57134096/45375
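To check which encodings are currently in effect, a quick sketch for interactive
use is to output the following; after the $PROFILE command above has run, each of
the three values is expected to be 'utf-8':
$OutputEncoding.WebName, [Console]::InputEncoding.WebName, [Console]::OutputEncoding.WebName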
#>
# ALSO STORED AS A GIST AT: https://gist.github.com/mklement0/ef57aea441ea8bd43387a7d7edfc6c19
[CmdletBinding(PositionalBinding = $false)]
param(
[Parameter(Mandatory, Position = 0)] $Encoding, # [System.Text.Encoding] instance, code-page number, or encoding name.
[Parameter(Mandatory, Position = 1)] [scriptblock] $ScriptBlock,
[Parameter(ValueFromPipeline)] $InputObject,
[switch] $WindowsOnly
)
Set-StrictMode -Version 1; $ErrorActionPreference = 'Stop'
# Prevent direct use of -InputObject.
# Note that mistaken attempts to provide *both* pipeline input and use -InputObject will
# cause PowerShell itself to complain *for each input object*, with "The input object cannot be bound to any parameters, ..."
if (-not $MyInvocation.ExpectingInput -and $InputObject) { Throw "Direct use of -InputObject is not supported. Please use the pipeline." }
# Get the active ANSI and OEM encodings.
# Note:
# * On Windows, we query the *registry* to reliably get the *system locale*'s code pages, given that [cultureinfo]::CurrentCulture.TextInfo.ANSI/OEMCodePage can be *overridden* on a per-user basis (they reflect the user's / thread's culture)
# * On Unix, our only option is to use [cultureinfo]::CurrentCulture.TextInfo.ANSI/OEMCodePage
$ansiEncoding = if ($env:OS -eq 'Windows_NT') { [System.Text.Encoding]::GetEncoding([int] (Get-ItemProperty HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP).ACP) } else { [Text.Encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.ANSICodePage) }
$oemEncoding = if ($env:OS -eq 'Windows_NT') { [System.Text.Encoding]::GetEncoding([int] (Get-ItemProperty HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage OEMCP).OEMCP) } else { [Text.Encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.OEMCodePage) }
# Validate the -Encoding argument, if any:
if ($null -ne $Encoding -and $Encoding -isnot [System.Text.Encoding]) {
# As a courtesy, accept 'ANSI' and 'OEM' to represent the active ANSI / OEM encoding.
if ($Encoding -is [string] -and $Encoding -in 'ansi', 'oem') {
$Encoding = @{ ansi = $ansiEncoding; oem = $oemEncoding }[$Encoding]
}
else {
# Code-page number or encoding name (e.g., 'unicode', 'utf-8')
# As a courtesy, also accept 'utf8' instead of 'utf-8', etc.
# NOTE: UTF-32 is NOT supported: it fails on assigning to [Console]::InputEncoding / [Console]::OutputEncoding
if ($Encoding -is [string]) { $Encoding = $Encoding -replace '^utf(\d)', 'utf-$1' }
if ($Encoding -match '^(utf-|unicode$)' -and $Encoding -ne 'utf-7') {
# !! [System.Text.Encoding]::GetEncoding('utf-.*|unicode') calls return an encoding *with BOM*, which we do NOT want.
# !! so we explicitly create one without.
# !! Note: UTF-32 isn't supported anyway, and identifiers such as 'utf-16be' for BE encodings are seemingly not supported.
$Encoding = switch ($Encoding) {
'utf-8' { [System.Text.Utf8Encoding]::new() }
{ $_ -in 'unicode', 'utf-16', 'utf-16le' } { [System.Text.UnicodeEncoding]::new($false, $false) }
default { [System.Text.Encoding]::GetEncoding($Encoding) }
}
}
else {
$Encoding = [System.Text.Encoding]::GetEncoding($Encoding)
}
}
}
$ignoreEncoding = $WindowsOnly -and $env:OS -ne 'Windows_NT'
try {
if ($ignoreEncoding) {
Write-Verbose "Non-Windows platform: ignoring specified encoding, as requested."
} else {
Write-Verbose "Temporarily setting encoding to: $($Encoding.WebName)"
# Save the currently active encodings for later restoration.
$prevIn, $prevOut = [Console]::InputEncoding, [Console]::OutputEncoding
# Set in-, output and $OutputEncoding to the specified encoding.
$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = $Encoding
}
# Note:
# * Since this is an *advanced* function and there is no process {}
# block, $input is an [object[]] array that is empty if there's no
# pipeline input.
if ($Input) {
# There is pipeline input: we must patch it into the script block.
# Note: In order to patch the equivalent of `$Input | ...` into the
# script block, we simply stringify and invoke the patched
# command with Invoke-Expression - hypothetically, state from
# the original script block could be lost in this implicit re-creation,
# but this is probably not a real-world concern.
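$collectedInput = $Input # We must use an aux. variable, because $Input is redefined in the Invoke-Expression context.
# !! Do NOT try to force enumeration with @($Input), as we want to preserve *streaming* behavior.
# Capture the original command text *before* reassigning $ScriptBlock: script blocks don't capture
# variables, so referencing $ScriptBlock inside the wrapper would resolve to the wrapper itself.
$originalCommandText = $ScriptBlock.ToString()
$ScriptBlock = { Invoke-Expression ('$collectedInput | {0}' -f $originalCommandText) }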
}
# * Invoke in a *streaming* manner, so as to also support indefinitely
# running external programs that periodically produce output, such as
# `mosquitto_sub`
# * This precludes using $output = & $ScriptBlock
# * Streaming requires us to output lines *as they're received*,
# yet we don't want to just pass them through as-is, as that could cause
# the false appearance that everything is fine on Windows with CLIs that
# use WriteConsole() with Unicode support when stdout is directly connected
# to a console.
# * Therefore, we must force *decoding* of each line, which we can simply
# achieve by enclosing it in (...)
# * On Unix, it is additionally necessary to restore the original
# console output encoding *before* outputting the decoded output - otherwise
# even correctly decoded input will *print incorrectly* or vice versa.
& $ScriptBlock | ForEach-Object {
if (-not $ignoreEncoding) { [Console]::OutputEncoding = $prevOut } # To ensure correct *printing to the terminal* (strictly speaking needed on Unix only), temporarily revert to the original output encoding.
($_)
if (-not $ignoreEncoding) { [Console]::OutputEncoding = $Encoding } # Restore the target encoding *for decoding* for the next output line.
}
}
finally { # This should also cover aborting with ^C
if (-not $ignoreEncoding) {
# Restore original encodings.
# Note: No need to restore $OutputEncoding - it was set as a *local*
# variable only that will go out of scope automatically.
[Console]::InputEncoding, [Console]::OutputEncoding = $prevIn, $prevOut
}
}
} # end of function
# --------------------------------
# GENERIC INSTALLATION HELPER CODE
# --------------------------------
# Provides guidance for making the function persistently available when
# this script is either directly invoked from the originating Gist or
# dot-sourced after download.
# IMPORTANT:
# * DO NOT USE `exit` in the code below, because it would exit
# the calling shell when Invoke-Expression is used to directly
# execute this script's content from GitHub.
# * Because the typical invocation is DOT-SOURCED (via Invoke-Expression),
# do not define variables or alter the session state via Set-StrictMode, ...
# *except in child scopes*, via & { ... }
if ($MyInvocation.Line -eq '') {
# Most likely, this code is being executed via Invoke-Expression directly
# from gist.github.com
# To simulate for testing with a local script, use the following:
# Note: Be sure to use a path and to use "/" as the separator.
# iex (Get-Content -Raw ./script.ps1)
# Derive the function name from the invocation command, via the enclosing
# script name presumed to be contained in the URL.
# NOTE: Unfortunately, when invoked via Invoke-Expression, $MyInvocation.MyCommand.ScriptBlock
# with the actual script content is NOT available, so we cannot extract
# the function name this way.
& {
param($invocationCmdLine)
# Try to extract the function name from the URL.
$funcName = $invocationCmdLine -replace '^.+/(.+?)(?:\.ps1).*$', '$1'
if ($funcName -eq $invocationCmdLine) {
# Function name could not be extracted, just provide a generic message.
# Note: Hypothetically, we could try to extract the Gist ID from the URL
# and use the REST API to determine the first filename.
Write-Verbose -Verbose "Function is now defined in this session."
}
else {
# Indicate that the function is now defined and also show how to
# add it to the $PROFILE or convert it to a script file.
Write-Verbose -Verbose @"
Function `"$funcName`" is now defined in this session.
* If you want to add this function to your `$PROFILE, run the following:
"``nfunction $funcName {``n`${function:$funcName}``n}" | Add-Content `$PROFILE
* If you want to convert this function into a script file that you can invoke
directly, run:
"`${function:$funcName}" | Set-Content $funcName.ps1 -Encoding $('utf8' + ('', 'bom')[[bool] (Get-Variable -ErrorAction Ignore IsCoreCLR -ValueOnly)])
"@
}
} $MyInvocation.MyCommand.Definition # Pass the original invocation command line to the script block.
}
else {
# Invocation presumably as a local file after manual download,
# either dot-sourced (as it should be) or mistakenly directly.
& {
param($originalInvocation)
# Parse this file to reliably extract the name of the embedded function,
# irrespective of the name of the script file.
$ast = $originalInvocation.MyCommand.ScriptBlock.Ast
$funcName = $ast.Find( { $args[0] -is [System.Management.Automation.Language.FunctionDefinitionAst] }, $false).Name
if ($originalInvocation.InvocationName -eq '.') {
# Being dot-sourced as a file.
# Provide a hint that the function is now loaded and provide
# guidance for how to add it to the $PROFILE.
Write-Verbose -Verbose @"
Function `"$funcName`" is now defined in this session.
If you want to add this function to your `$PROFILE, run the following:
"``nfunction $funcName {``n`${function:$funcName}``n}" | Add-Content `$PROFILE
"@
}
else {
# Mistakenly directly invoked.
# Issue a warning that the function definition didn't take effect and
# provide guidance for reinvocation and adding to the $PROFILE.
Write-Warning @"
This script contains a definition for function "$funcName", but this definition
only takes effect if you dot-source this script.
To define this function for the current session, run:
. "$($originalInvocation.MyCommand.Path)"
"@
}
} $MyInvocation # Pass the original invocation info to the helper script block.
}