@ninmonkey
Last active January 15, 2024 12:20
Inspecting the codepoints and runes of strings - Pwsh | PowerShell

Inspecting Runes in Pwsh

Unicode has a Control Pictures block (U+2400 to U+243F) whose characters safely represent control characters, including the ESC that starts ANSI escape sequences.

The important part is this Text.Rune enumerator, which shifts each control character (0x00 to 0x1f) up by 0x2400 to its visible picture. The ternary operator requires PowerShell 7 or later.

"🐒`a`ts".EnumerateRunes() | ForEach-Object {
    $isCtrlChar = $_.Value -ge 0 -and $_.Value -le 0x1f
    # Shift control characters into the Control Pictures block; pass everything else through
    $isCtrlChar ? [Text.Rune]::new( $_.Value + 0x2400 ) : $_
} | ft -auto

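This outputs the safe value in the Render column (Render and Hex are script properties added by the Update-TypeData calls at the end of this gist):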

IsAscii IsBmp Plane Utf16SequenceLength Utf8SequenceLength  Value Render Hex
------- ----- ----- ------------------- ------------------  ----- ------ ---
  False False     1                   2                  4 128018 🐒     0x1f412
  False  True     0                   1                  3   9223 ␇      0x2407
  False  True     0                   1                  3   9225 ␉      0x2409
   True  True     0                   1                  1    115 s      0x73
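
The range check above only covers U+0000 to U+001F. DEL (U+007F) is also a control character, and it has its own picture at U+2421 rather than at an offset of 0x2400. A minimal sketch of a helper that handles both cases; the name `ConvertTo-ControlPicture` is hypothetical and not part of this gist:

```powershell
# Hypothetical helper (not in the original gist): map one rune to a printable stand-in.
function ConvertTo-ControlPicture {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [Text.Rune]$Rune
    )
    process {
        if ($Rune.Value -le 0x1f) {
            # C0 controls U+0000..U+001F map to U+2400..U+241F
            [Text.Rune]::new($Rune.Value + 0x2400)
        }
        elseif ($Rune.Value -eq 0x7f) {
            # DEL maps to U+2421 SYMBOL FOR DELETE
            [Text.Rune]::new(0x2421)
        }
        else {
            $Rune
        }
    }
}

# Tab and DEL become visible symbols; ordinary letters pass through unchanged.
$safe = -join ("a`tb`u{7f}".EnumerateRunes() |
    ConvertTo-ControlPicture |
    ForEach-Object { $_.ToString() })
$safe
```

Keeping the mapping in one function means the same rule can be reused by both the formatter and the type data, instead of repeating the range check.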

Examples

Pwsh> '👨‍👨‍👦‍👦'.EnumerateRunes() | Ft Value, Render, Hex

 Value Render Hex
 ----- ------ ---
128104 👨     0x1f468
  8205 ‍      0x200d
128104 👨     0x1f468
  8205 ‍      0x200d
128102 👦     0x1f466
  8205 ‍      0x200d
128102 👦     0x1f466

Pwsh> "🐒 hi `n`t`u{0}world"| inspectRunes

0x1f412 = "🐒"
0x20 = " "
0x68 = "h"
0x69 = "i"
0x20 = " "
0xa = "␊"
0x9 = "␉"
0x0 = "␀"
0x77 = "w"
0x6f = "o"
0x72 = "r"
0x6c = "l"
0x64 = "d"

# sample strings used in the sections below
$sample0 = 'hi 🐵🐒 world'
$sample1 = "`0 `u{0} `r`n`t"
$Sample2 = "hi`n`t${fg:red}wor${fg:clear}ld!"
$Sample3 = @(
    'hi'
    $PSStyle.Foreground.FromRgb(233, 45, 65)
    "`a `0 `u{0}"  # mixing in all kinds of control characters
    'world'
    $PSStyle.Reset
    '!'
) -join ''

Finding non-ASCII chars

$sample0.EnumerateRunes()
| ? -not IsAscii  | ft Value, Render, Hex

# output:

 Value Render Hex
 ----- ------ ---
128053 🐡     0x1f435
128018 🐒     0x1f412

Zero-width joiners (U+200D) combine multiple runes into a single grapheme

Pwsh> '👨‍👨‍👦‍👦'.EnumerateRunes() | Ft

IsAscii IsBmp Plane Utf16SequenceLength Utf8SequenceLength  Value Render Hex
------- ----- ----- ------------------- ------------------  ----- ------ ---
  False False     1                   2                  4 128104 👨     0x1f468
  False  True     0                   1                  3   8205 ‍      0x200d
  False False     1                   2                  4 128104 👨     0x1f468
  False  True     0                   1                  3   8205 ‍      0x200d
  False False     1                   2                  4 128102 👦     0x1f466
  False  True     0                   1                  3   8205 ‍      0x200d
  False False     1                   2                  4 128102 👦     0x1f466
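
To make the difference between code units, runes, and graphemes concrete, here is a small sketch that counts the same family emoji three ways. It builds the string from `u{...}` escapes so it does not depend on file encoding. The grapheme count comes from `System.Globalization.StringInfo`; on .NET 5 or later it should treat the whole joined sequence as one text element, but older runtimes may report more, so treat that number as runtime-dependent.

```powershell
# man + ZWJ + man + ZWJ + boy + ZWJ + boy
$family = "`u{1F468}`u{200D}`u{1F468}`u{200D}`u{1F466}`u{200D}`u{1F466}"

# UTF-16 code units: each emoji is a surrogate pair (2), each joiner is 1
$codeUnits = $family.Length

# Unicode scalar values, as enumerated in the table above
$runes = ($family.EnumerateRunes() | Measure-Object).Count

# User-perceived characters (text elements); result depends on the .NET version
$graphemes = [Globalization.StringInfo]::new($family).LengthInTextElements

[pscustomobject]@{
    CodeUnits = $codeUnits   # 11
    Runes     = $runes       # 7
    Graphemes = $graphemes
}
```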

Big one: ensuring ANSI escapes for colors are safe to print

$sample3.EnumerateRunes() | ft Value, Render, Hex, IsAscii

Value Render Hex  IsAscii
----- ------ ---  -------
  104 h      0x68    True
  105 i      0x69    True
   27 ␛      0x1b    True
   91 [      0x5b    True
   51 3      0x33    True
   56 8      0x38    True
   59 ;      0x3b    True
   50 2      0x32    True
   59 ;      0x3b    True
   50 2      0x32    True
   51 3      0x33    True
   51 3      0x33    True
   59 ;      0x3b    True
   52 4      0x34    True
   53 5      0x35    True
   59 ;      0x3b    True
   54 6      0x36    True
   53 5      0x35    True
  109 m      0x6d    True
    7 ␇      0x7     True
   32        0x20    True
    0 ␀      0x0     True
   32        0x20    True
    0 ␀      0x0     True
  119 w      0x77    True
  111 o      0x6f    True
  114 r      0x72    True
  108 l      0x6c    True
  100 d      0x64    True
   27 ␛      0x1b    True
   91 [      0x5b    True
   48 0      0x30    True
  109 m      0x6d    True
   33 !      0x21    True
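
The table shows each rune separately. To get a single string that can be logged or printed without the terminal interpreting the escapes, the mapped runes can be joined back together. A minimal, self-contained sketch that does not rely on the Render type data; the sample string and variable names are illustrative only:

```powershell
# `e is ESC (0x1b); this string would normally print "red" in color
$colored = "hi `e[31mred`e[0m!"

$printable = -join (
    $colored.EnumerateRunes() | ForEach-Object {
        if ($_.Value -le 0x1f) {
            # Replace the control character with its Control Pictures symbol
            [Text.Rune]::new($_.Value + 0x2400).ToString()
        }
        else {
            $_.ToString()
        }
    }
)

$printable   # hi ␛[31mred␛[0m!
```

Because ESC is replaced by ␛ (U+241B), the rest of the sequence is left as ordinary visible text.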

function inspectRunes {
    <#
    .SYNOPSIS
        Minimal sugar to see codepoints
    .EXAMPLE
        $Str1 = '🐒'
        $Str1 | inspectRunes
    #>
    param(
        # By default control chars are replaced with visible symbols, so the output is safe to pipe.
        # Pass this switch to keep them raw.
        [switch]$WithRawCtrlChar
        # ,[switch]$PassThru
    )
    process {
        $_.EnumerateRunes() | ForEach-Object {
            $isCtrlChar = $_.Value -ge 0 -and $_.Value -le 0x1f
            $Rune = ($isCtrlChar -and -not $WithRawCtrlChar) ? [Text.Rune]::new($_.Value + 0x2400) : $_
            '0x{0:x} = "{1}"' -f @(
                $_.Value
                $Rune
            )
        }
    }
}

$updateTypeDataSplat = @{
    TypeName   = 'System.Text.Rune'
    MemberType = 'ScriptProperty'
    MemberName = 'Render'
}
Update-TypeData @updateTypeDataSplat -Force -Value {
    # Render control characters as their Control Pictures equivalents
    $isCtrlChar = $this.Value -ge 0 -and $this.Value -le 0x1f
    if (-not $isCtrlChar) {
        return $this.ToString()
    }
    return [Text.Rune]::new($this.Value + 0x2400).ToString()
}

$updateTypeDataSplat = @{
    TypeName   = 'System.Text.Rune'
    MemberType = 'ScriptProperty'
    MemberName = 'Hex'
}
Update-TypeData @updateTypeDataSplat -Force -Value {
    '0x{0:x}' -f @($this.Value)
}

'sdfds.'.EnumerateRunes() | Ft

$sample0 = 'hi 🐵🐒 world'
$sample1 = "`0 `u{0} `r`n`t"
$Sample2 = "hi`n`t${fg:red}wor${fg:clear}ld!"

# The original called a helper named Label here, which is not defined in this gist
Write-Host 'Rune | ? not IsAscii'
$sample0 | % {
    $_.EnumerateRunes()
}
| ? -not IsAscii | ft