Instantly share code, notes, and snippets.

Embed
What would you like to do?
Convert-FileEncoding and Get-FileEncoding
<#
.SYNOPSIS
Converts files to the given encoding.
Matches the include pattern recursively under the given path.
.EXAMPLE
Convert-FileEncoding -Include *.js -Path scripts -Encoding UTF8
#>
function Convert-FileEncoding([string]$Include, [string]$Path, [string]$Encoding='UTF8') {
$count = 0
Get-ChildItem -Include $Pattern -Recurse -Path $Path `
| select FullName, @{n='Encoding';e={Get-FileEncoding $_.FullName}} `
| where {$_.Encoding -ne $Encoding} `
| % { (Get-Content $_.FullName) `
| Out-File $_.FullName -Encoding $Encoding; $count++; }
Write-Host "$count $Pattern file(s) converted to $Encoding in $Path."
}
# http://franckrichard.blogspot.com/2010/08/powershell-get-encoding-file-type.html
<#
.SYNOPSIS
Gets file encoding.
.DESCRIPTION
The Get-FileEncoding function determines encoding by looking at Byte Order Mark (BOM).
Based on port of C# code from http://www.west-wind.com/Weblog/posts/197245.aspx
.EXAMPLE
Get-ChildItem *.ps1 | select FullName, @{n='Encoding';e={Get-FileEncoding $_.FullName}} | where {$_.Encoding -ne 'ASCII'}
This command gets ps1 files in current directory where encoding is not ASCII
.EXAMPLE
Get-ChildItem *.ps1 | select FullName, @{n='Encoding';e={Get-FileEncoding $_.FullName}} | where {$_.Encoding -ne 'ASCII'} | foreach {(get-content $_.FullName) | set-content $_.FullName -Encoding ASCII}
Same as previous example but fixes encoding using set-content
# Modified by F.RICHARD August 2010
# add comment + more BOM
# http://unicode.org/faq/utf_bom.html
# http://en.wikipedia.org/wiki/Byte_order_mark
#
# Do this next line before or add function in Profile.ps1
# Import-Module .\Get-FileEncoding.ps1
#>
function Get-FileEncoding
{
[CmdletBinding()]
Param (
[Parameter(Mandatory = $True, ValueFromPipelineByPropertyName = $True)]
[string]$Path
)
[byte[]]$byte = get-content -Encoding byte -ReadCount 4 -TotalCount 4 -Path $Path
#Write-Host Bytes: $byte[0] $byte[1] $byte[2] $byte[3]
# EF BB BF (UTF8)
if ( $byte[0] -eq 0xef -and $byte[1] -eq 0xbb -and $byte[2] -eq 0xbf )
{ Write-Output 'UTF8' }
# FE FF (UTF-16 Big-Endian)
elseif ($byte[0] -eq 0xfe -and $byte[1] -eq 0xff)
{ Write-Output 'Unicode UTF-16 Big-Endian' }
# FF FE (UTF-16 Little-Endian)
elseif ($byte[0] -eq 0xff -and $byte[1] -eq 0xfe)
{ Write-Output 'Unicode UTF-16 Little-Endian' }
# 00 00 FE FF (UTF32 Big-Endian)
elseif ($byte[0] -eq 0 -and $byte[1] -eq 0 -and $byte[2] -eq 0xfe -and $byte[3] -eq 0xff)
{ Write-Output 'UTF32 Big-Endian' }
# FE FF 00 00 (UTF32 Little-Endian)
elseif ($byte[0] -eq 0xfe -and $byte[1] -eq 0xff -and $byte[2] -eq 0 -and $byte[3] -eq 0)
{ Write-Output 'UTF32 Little-Endian' }
# 2B 2F 76 (38 | 38 | 2B | 2F)
elseif ($byte[0] -eq 0x2b -and $byte[1] -eq 0x2f -and $byte[2] -eq 0x76 -and ($byte[3] -eq 0x38 -or $byte[3] -eq 0x39 -or $byte[3] -eq 0x2b -or $byte[3] -eq 0x2f) )
{ Write-Output 'UTF7'}
# F7 64 4C (UTF-1)
elseif ( $byte[0] -eq 0xf7 -and $byte[1] -eq 0x64 -and $byte[2] -eq 0x4c )
{ Write-Output 'UTF-1' }
# DD 73 66 73 (UTF-EBCDIC)
elseif ($byte[0] -eq 0xdd -and $byte[1] -eq 0x73 -and $byte[2] -eq 0x66 -and $byte[3] -eq 0x73)
{ Write-Output 'UTF-EBCDIC' }
# 0E FE FF (SCSU)
elseif ( $byte[0] -eq 0x0e -and $byte[1] -eq 0xfe -and $byte[2] -eq 0xff )
{ Write-Output 'SCSU' }
# FB EE 28 (BOCU-1)
elseif ( $byte[0] -eq 0xfb -and $byte[1] -eq 0xee -and $byte[2] -eq 0x28 )
{ Write-Output 'BOCU-1' }
# 84 31 95 33 (GB-18030)
elseif ($byte[0] -eq 0x84 -and $byte[1] -eq 0x31 -and $byte[2] -eq 0x95 -and $byte[3] -eq 0x33)
{ Write-Output 'GB-18030' }
else
{ Write-Output 'ASCII' }
}
@omidkrad

This comment has been minimized.

Show comment
Hide comment
@omidkrad

omidkrad Oct 6, 2015

Very useful script. Only note that it will report any file with no BOM as ASCII.

omidkrad commented Oct 6, 2015

Very useful script. Only note that it will report any file with no BOM as ASCII.

@honzakuzel1989

This comment has been minimized.

Show comment
Hide comment
@honzakuzel1989

honzakuzel1989 Sep 19, 2016

Hi, nice work,
but I think that you have bug in use $Include parameter in Convert-FileEncoding function. The parameter has name $Include but in Get-ChildItem and Write-Host you use undefined variable $Pattern ($Pattern is empty and Get-ChildItem get all items). You should change $Include parameter to $Pattern and use Convert-FileEncoding -Pattern *.js -Path scripts -Encoding UTF8 in example or rename $Pattern to $Include inside Convert-FileEncoding. In my opinion is the second variant cleaner (it is up to you).

honzakuzel1989 commented Sep 19, 2016

Hi, nice work,
but I think that you have bug in use $Include parameter in Convert-FileEncoding function. The parameter has name $Include but in Get-ChildItem and Write-Host you use undefined variable $Pattern ($Pattern is empty and Get-ChildItem get all items). You should change $Include parameter to $Pattern and use Convert-FileEncoding -Pattern *.js -Path scripts -Encoding UTF8 in example or rename $Pattern to $Include inside Convert-FileEncoding. In my opinion is the second variant cleaner (it is up to you).

@TheIncorrigible1

This comment has been minimized.

Show comment
Hide comment
@TheIncorrigible1

TheIncorrigible1 Dec 22, 2017

You should look into the UTF8 spec and other detection algorithms. Relying on a BOM is not accurate.

TheIncorrigible1 commented Dec 22, 2017

You should look into the UTF8 spec and other detection algorithms. Relying on a BOM is not accurate.

@marckassay

This comment has been minimized.

Show comment
Hide comment
@marckassay

marckassay Jan 5, 2018

Here is an alternative, the value of $Encoding may be 'utf-8'. The hyphen is removed (if there is one) for that it may be passed in for instance Out-File.

$File = Get-Item .\src\app\app.module.ts
$Encoding = New-Object -TypeName System.IO.StreamReader -ArgumentList $File.FullName -OutVariable Stream | `
    Select-Object -ExpandProperty CurrentEncoding | `
    Select-Object -ExpandProperty BodyName
$Stream.Dispose()
if ($Encoding.Contains('-')) {
    $Encoding = $Encoding.Replace('-', '')
}

marckassay commented Jan 5, 2018

Here is an alternative, the value of $Encoding may be 'utf-8'. The hyphen is removed (if there is one) for that it may be passed in for instance Out-File.

$File = Get-Item .\src\app\app.module.ts
$Encoding = New-Object -TypeName System.IO.StreamReader -ArgumentList $File.FullName -OutVariable Stream | `
    Select-Object -ExpandProperty CurrentEncoding | `
    Select-Object -ExpandProperty BodyName
$Stream.Dispose()
if ($Encoding.Contains('-')) {
    $Encoding = $Encoding.Replace('-', '')
}
@markwragg

This comment has been minimized.

Show comment
Hide comment
@markwragg

markwragg Jan 9, 2018

From PowerShell core -Encoding Byte is no longer a valid option. It has been replaced by a new parameter named -AsByteStream:

-- https://www.jonathanmedd.net/2017/12/powershell-core-does-not-have-encoding-byte-replaced-with-new-parameter-asbytestream.html

markwragg commented Jan 9, 2018

From PowerShell core -Encoding Byte is no longer a valid option. It has been replaced by a new parameter named -AsByteStream:

-- https://www.jonathanmedd.net/2017/12/powershell-core-does-not-have-encoding-byte-replaced-with-new-parameter-asbytestream.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment