@ScriptAutomate
Forked from mklement0/Out-FileUtf8NoBom.ps1
Last active February 3, 2022 18:03
PowerShell function that emulates Out-File for creating UTF-8-encoded files *without a BOM* (byte-order mark).
<#
Prerequisites: PowerShell version 3 or above.
License: MIT
Author: Michael Klement <mklement0@gmail.com>
#>
function Out-FileUtf8NoBom {
<#
.SYNOPSIS
Outputs to a UTF-8-encoded file *without a BOM* (byte-order mark).
.DESCRIPTION
Mimics the most important aspects of Out-File:
* Input objects are sent to Out-String first.
* -Append allows you to append to an existing file, -NoClobber prevents
overwriting of an existing file.
* -Width allows you to specify the line width for the text representations
of input objects that aren't strings.
However, it is not a complete implementation of all Out-File parameters:
* Only a literal output path is supported, and only as a parameter.
* -Force is not supported.
* Conversely, an extra -UseLF switch is supported for using LF-only newlines.
Caveat: *All* pipeline input is buffered before writing output starts,
but the string representations are generated and written to the target
file one by one.
.NOTES
The raison d'être for this advanced function is that Windows PowerShell
lacks the ability to write UTF-8 files without a BOM: using -Encoding UTF8
invariably prepends a BOM.
Copyright (c) 2017, 2020 Michael Klement <mklement0@gmail.com> (http://same2u.net),
released under the [MIT license](https://spdx.org/licenses/MIT#licenseText).
#>
  [CmdletBinding()]
  param(
    [Parameter(Mandatory, Position=0)] [string] $LiteralPath,
    [switch] $Append,
    [switch] $NoClobber,
    [AllowNull()] [int] $Width,
    [switch] $UseLF,
    [Parameter(ValueFromPipeline)] $InputObject
  )
  #requires -version 3
  # Convert the input path to a full one, since .NET's working dir. usually
  # differs from PowerShell's.
  $dir = Split-Path -LiteralPath $LiteralPath
  if ($dir) { $dir = Convert-Path -ErrorAction Stop -LiteralPath $dir } else { $dir = $pwd.ProviderPath }
  $LiteralPath = [IO.Path]::Combine($dir, [IO.Path]::GetFileName($LiteralPath))
  # If -NoClobber was specified, throw an exception if the target file already
  # exists. -LiteralPath keeps characters such as [ and ] from being treated
  # as wildcards.
  if ($NoClobber -and (Test-Path -LiteralPath $LiteralPath)) {
    Throw [IO.IOException] "The file '$LiteralPath' already exists."
  }
  # Create a StreamWriter object.
  # Note that we take advantage of the fact that the StreamWriter class by default:
  # - uses UTF-8 encoding
  # - without a BOM.
  $sw = New-Object System.IO.StreamWriter $LiteralPath, $Append
  $htOutStringArgs = @{}
  if ($Width) {
    $htOutStringArgs += @{ Width = $Width }
  }
  # Note: By not using begin / process / end blocks, we're effectively running
  #       in the end block, which means that all pipeline input has already
  #       been collected in automatic variable $Input.
  #       We must use this approach, because using | Out-String individually
  #       in each iteration of a process block would format each input object
  #       with an individual header.
  try {
    $Input | Out-String -Stream @htOutStringArgs | ForEach-Object {
      if ($UseLF) {
        $sw.Write($_ + "`n")
      }
      else {
        $sw.WriteLine($_)
      }
    }
  } finally {
    $sw.Dispose()
  }
}
ScriptAutomate commented Feb 3, 2022

For convenience, here's advanced function Out-FileUtf8NoBom, a pipeline-based alternative that mimics Out-File, which means:

  • you can use it just like Out-File in a pipeline.
  • input objects that aren't strings are formatted as they would be if you sent them to the console, just like with Out-File.
  • an additional -UseLF switch allows you to transform Windows-style CRLF newlines to Unix-style LF-only newlines.

Example:

(Get-Content $MyPath) | Out-FileUtf8NoBom $MyPath # Add -UseLF for Unix newlines

Note how (Get-Content $MyPath) is enclosed in (...), which ensures that the entire file is opened, read in full, and closed before sending the result through the pipeline. This is necessary in order to be able to write back to the same file (update it in place).
Generally, though, this technique is not advisable for two reasons: (a) the whole file must fit into memory, and (b) if the command is interrupted partway through the rewrite, data will be lost.
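
If those two drawbacks matter, one possible workaround is to stream the file line by line into a temporary file and only replace the original once the write has finished. The following is a minimal sketch, not part of the gist: the function name Convert-ToUtf8NoBomSafely and the ".tmp" suffix are hypothetical, and it assumes the input is UTF-8 (or carries a BOM that StreamReader can detect). Like the gist's function, it relies on StreamWriter defaulting to UTF-8 without a BOM.

```powershell
# Hypothetical helper (not part of the gist): rewrite a text file as
# BOM-less UTF-8 via a temporary file, reading one line at a time.
function Convert-ToUtf8NoBomSafely {
  param(
    [Parameter(Mandatory)] [string] $LiteralPath,
    [switch] $UseLF
  )
  # Resolve to a full path, since .NET's working dir. may differ from PowerShell's.
  $fullPath = Convert-Path -LiteralPath $LiteralPath
  # Assumption: a sibling file with this name is free to use.
  $tempPath = "$fullPath.tmp"
  $newline = if ($UseLF) { "`n" } else { "`r`n" }

  # StreamReader detects a BOM if present and otherwise assumes UTF-8.
  $reader = New-Object System.IO.StreamReader $fullPath
  # StreamWriter writes UTF-8 without a BOM by default.
  $writer = New-Object System.IO.StreamWriter $tempPath, $false
  try {
    while ($null -ne ($line = $reader.ReadLine())) {
      $writer.Write($line + $newline)
    }
  } finally {
    $writer.Dispose()
    $reader.Dispose()
  }
  # Replace the original only after the temporary file is complete.
  Move-Item -LiteralPath $tempPath -Destination $fullPath -Force
}
```

Usage would look like Convert-ToUtf8NoBomSafely -LiteralPath $MyPath -UseLF. Note that this sketch always ends the file with a newline, and that it handles plain text lines only; it does not format arbitrary objects the way Out-FileUtf8NoBom does.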

Related

Original answer I used to use:

The benefit of the script in this gist is that it also offers -UseLF, which is really helpful, rather than only writing UTF-8 without a BOM.
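
To confirm that a file really came out without a BOM and with LF-only newlines, you can inspect its raw bytes. This is a small sketch under my own assumptions: the function name Get-TextFileByteInfo and its output property names are hypothetical, and it reads the whole file into memory, so it is meant for spot checks on reasonably small files.

```powershell
# Hypothetical helper (not part of the gist): report whether a file starts
# with the UTF-8 BOM (EF BB BF) and whether it contains any CR (0x0D) bytes.
function Get-TextFileByteInfo {
  param([Parameter(Mandatory)] [string] $LiteralPath)

  $fullPath = Convert-Path -LiteralPath $LiteralPath
  $bytes = [IO.File]::ReadAllBytes($fullPath)

  $hasBom = $bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and
            $bytes[1] -eq 0xBB -and
            $bytes[2] -eq 0xBF

  [pscustomobject]@{
    Path              = $fullPath
    HasUtf8Bom        = $hasBom
    HasCarriageReturn = ($bytes -contains 13)
  }
}
```

For a file written with Out-FileUtf8NoBom -UseLF, both HasUtf8Bom and HasCarriageReturn should be False, provided the input strings themselves contained no CR characters.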

@ScriptAutomate

Configure git for more powers

Other information that may be helpful in git repos that are used on Windows systems:

  1. Run this git command in the root of your repo
# Run this on Windows systems with repos that use tools expecting LF, too
git config core.autocrlf input

You can optionally run git config --global core.autocrlf input, too, though this will have git treat all of your repos this way locally on your system.

  2. Create .gitattributes in the root of the repo
# In .gitattributes in root
* text=auto
*.md text eol=lf
*.yml text eol=lf
  3. Refresh your repo files with the new line endings
git add . -u
git commit -m "Saving files before refreshing line endings"
git add --renormalize .
git status # Shows rewritten/normalized files
git commit -m "Normalize all the line endings"
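
After normalizing, git ls-files --eol can show which tracked files still have CRLF in the working tree: for each file it prints the line endings found in the index (i/...) and in the working tree (w/...), plus the applicable attributes. The filter below is a sketch of how that output could be narrowed down from PowerShell; the function name Select-CrlfWorkingTreeEntry is hypothetical and the pattern assumes the usual w/crlf marker in that output.

```powershell
# Hypothetical helper (not part of the gist): pass through only those lines
# of `git ls-files --eol` output whose working-tree column reads w/crlf.
function Select-CrlfWorkingTreeEntry {
  param([Parameter(ValueFromPipeline)] [string] $Line)
  process {
    if ($Line -match '(^|\s)w/crlf(\s|$)') { $Line }
  }
}
```

Run from the root of the repo as: git ls-files --eol | Select-CrlfWorkingTreeEntry. An empty result suggests no tracked file is checked out with CRLF line endings.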

Related

More info: GitHub Docs: Configuring Git to handle line endings
