@angel-vladov
Last active January 9, 2024 15:41
A PowerShell function for reading the content of UTF-8 encoded HTML pages. The built-in Invoke-WebRequest and Invoke-RestMethod do not handle UTF-8 responses properly.
function Read-HtmlPage {
    param ([Parameter(Mandatory=$true, Position=0, ValueFromPipeline=$true)][String] $Uri)
    process {
        # Invoke-WebRequest and Invoke-RestMethod can't work properly with a UTF-8 response, so read the stream directly.
        [Net.HttpWebRequest]$WebRequest = [Net.WebRequest]::Create($Uri)
        [Net.HttpWebResponse]$WebResponse = $WebRequest.GetResponse()
        try {
            # Decode the response body explicitly as UTF-8.
            $Reader = New-Object IO.StreamReader($WebResponse.GetResponseStream(), [Text.Encoding]::UTF8)
            $Response = $Reader.ReadToEnd()
            $Reader.Close()
        }
        finally {
            # Release the connection even if reading fails.
            $WebResponse.Close()
        }
        # Create the document class
        [mshtml.HTMLDocumentClass] $Doc = New-Object -com "HTMLFILE"
        $Doc.IHTMLDocument2_write($Response)
        # Return an HTMLDocumentClass instance, just like Invoke-WebRequest's ParsedHtml
        $Doc
    }
}
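A short usage sketch, assuming Windows PowerShell with the MSHTML COM component available and the function already loaded in the session. The URL is a placeholder, not part of the original gist, and the member names are those commonly used on Invoke-WebRequest's ParsedHtml object:

```powershell
# Placeholder URL, for illustration only.
$Doc = Read-HtmlPage 'https://example.com/'

# Read the page title from the parsed document.
$Doc.title

# List the targets of all links on the page.
$Doc.getElementsByTagName('a') | ForEach-Object { $_.href }
```

The returned object can be queried the same way as the ParsedHtml property, so existing scraping code should need little change beyond swapping the call that fetches the page.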