@thedavecarroll
Last active February 2, 2020 21:36
Test Links in Markdown Files with PowerShell (and Regex)

Updated to properly handle inline links without http/https at the beginning of the link.

Both of these should work now.

this is a [test link](powershell.org)
[GitHub](https://github.com/thedavecarroll){:target="_blank"}

And this shouldn't.

subtle comment on artificial intelligence (AI).
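As a minimal sketch of how an inline-link pattern can accept targets with or without a scheme, consider the following. The pattern and variable names are illustrative assumptions, not the regex the function actually uses.

```powershell
# Hypothetical pattern: a link target must directly follow "](",
# so plain parentheses in prose, such as "(AI)", are ignored.
$inlinePattern = '\[(?<text>[^\]]+)\]\((?<url>[^)\s]+)\)'

$lines = @(
    'this is a [test link](powershell.org)'
    '[GitHub](https://github.com/thedavecarroll){:target="_blank"}'
    'subtle comment on artificial intelligence (AI).'
)

# Collect the url group from every match on every line.
$found = foreach ($line in $lines) {
    foreach ($m in [regex]::Matches($line, $inlinePattern)) {
        $m.Groups['url'].Value
    }
}

$found
```

With this sketch, the first two lines each yield a URL and the third yields nothing, because `(AI)` is not preceded by a bracketed link text.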


thedavecarroll commented Feb 1, 2020

Introduction

I recently needed to verify links in several Markdown files, so I wrote this function.

Knowing that regex would be the way to go, I started building a regex pattern for each link type. Each pattern uses named groups, which makes the matches easier to process. If you haven't used named groups in regex, they are worth checking out.
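For readers new to named groups, here is a small sketch of the idea. The pattern below is an assumption written for this illustration (it targets an inline link with a quoted title) and is not taken from the function itself.

```powershell
# Hypothetical pattern with three named groups: text, url, and title.
$pattern = '\[(?<text>[^\]]+)\]\((?<url>\S+)\s+"(?<title>[^"]*)"\)'
$line    = '[I''m an inline-style link with title](https://www.google.com "Google''s Homepage")'

$match = [regex]::Match($line, $pattern)
if ($match.Success) {
    # Named groups are read by name instead of by position.
    $link = [PSCustomObject]@{
        Text  = $match.Groups['text'].Value
        Url   = $match.Groups['url'].Value
        Title = $match.Groups['title'].Value
    }
}

$link
```

Reading `$match.Groups['url']` instead of `$match.Groups[2]` keeps the processing code stable even if the pattern gains or loses groups later.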

Originally, I checked each line against each of the link types but, during testing, I switched to matching against the entire file. For each match, I then loop through the lines to find the line number where the match starts. I did look into using regex to generate the line numbers, but that's not an easy thing to do (for a regex novice). I already had a headache and that would have compounded it, I'm sure.
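The whole-file match followed by a line lookup could be sketched roughly like this. The sample content, pattern, and variable names are assumptions for illustration; the function's actual implementation may differ.

```powershell
# Sample content standing in for the output of Get-Content -Raw.
$content = @'
# Title

See [Google](https://www.google.com) for details.
Another [link](powershell.org) here.
'@

# Hypothetical inline-link pattern.
$pattern = '\[(?<text>[^\]]+)\]\((?<url>[^)\s]+)\)'
$lines   = $content -split '\r?\n'

$results = foreach ($m in [regex]::Matches($content, $pattern)) {
    # Walk the lines until one contains the matched text.
    for ($i = 0; $i -lt $lines.Count; $i++) {
        if ($lines[$i].Contains($m.Value)) {
            [PSCustomObject]@{
                LineNumber = $i + 1
                Line       = $lines[$i]
                Url        = $m.Groups['url'].Value
            }
            break
        }
    }
}

$results
```

One limitation of this simple lookup: if the same link text appears on several lines, every occurrence is reported against the first line that contains it.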

Output

[PsCustomObject]

Property    Description
Name        name of the file
FullName    full path of the file
LineNumber  the line number where the link was found
Line        the full line in the file where the link was found
LinkType    the link type (see below)
Url         the URL or link
StatusCode  Skipped, or the status code returned from Invoke-WebRequest
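One way the StatusCode value could be produced is sketched below. `Get-LinkStatus` is a hypothetical helper named for this example, and the `'Error'` fallback and the 15-second timeout are assumptions; the original only states that the value is either Skipped or the status returned from Invoke-WebRequest.

```powershell
function Get-LinkStatus {
    # Hypothetical helper, not part of the original function.
    param(
        [Parameter(Mandatory)][string]$Url,
        [Parameter(Mandatory)][string]$LinkType
    )

    # Relative links are not requested; they are reported as Skipped.
    if ($LinkType -eq 'Relative') { return 'Skipped' }

    try {
        $response = Invoke-WebRequest -Uri $Url -UseBasicParsing -TimeoutSec 15 -ErrorAction Stop
        [int]$response.StatusCode
    }
    catch {
        # Failed requests usually carry the HTTP response on the exception.
        $failed = $_.Exception.Response
        if ($failed) { [int]$failed.StatusCode } else { 'Error' }  # 'Error' is an assumed placeholder
    }
}

Get-LinkStatus -Url '../blob/master/LICENSE' -LinkType 'Relative'
```

The try/catch is there because Invoke-WebRequest raises an error for 4xx and 5xx responses, so a broken link would otherwise stop the scan instead of being recorded.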

-Verbose

VERBOSE: Found # links in <full file name>
VERBOSE: Inline(#) : InlineWithTitle(#) : AngleBracket(#) : Reference(#) : Relative(#)
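A summary line in this shape could be assembled along these lines. The `$links` records are made-up stand-ins for the function's results, and the approach is an assumption rather than the author's code.

```powershell
# Hypothetical link records standing in for the function's results.
$links = @(
    [PSCustomObject]@{ LinkType = 'Inline' }
    [PSCustomObject]@{ LinkType = 'Inline' }
    [PSCustomObject]@{ LinkType = 'Reference' }
)

$types  = 'Inline', 'InlineWithTitle', 'AngleBracket', 'Reference', 'Relative'
$counts = foreach ($type in $types) {
    # @() guarantees a Count even when zero or one record matches.
    $n = @($links | Where-Object LinkType -eq $type).Count
    '{0}({1})' -f $type, $n
}

$summary = $counts -join ' : '
Write-Verbose "Found $($links.Count) links"
Write-Verbose $summary
```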

Link Types

Here are the link types that can now be captured and verified, each with sample output.

Inline

[I'm an inline-style link](https://www.google.com)
Name       : MarkdownTest.md
FullName   : D:\Development\PowerShell\MarkdownTest.md
LineNumber : 1
Line       : [I'm an inline-style link](https://www.google.com)
Url        : https://www.google.com
LinkType   : Inline
StatusCode : 200

InlineWithTitle

[I'm an inline-style link with title](https://www.google.com "Google's Homepage")
Name       : MarkdownTest.md
FullName   : D:\Development\PowerShell\MarkdownTest.md
LineNumber : 3
Line       : [I'm an inline-style link with title](https://www.google.com "Google's Homepage")
Url        : https://www.google.com
LinkType   : InlineWithTitle
StatusCode : 200

Reference

[arbitrary case-insensitive reference text]: https://www.mozilla.org
[1]: http://slashdot.org
[link text itself]: http://www.reddit.com
Name       : MarkdownTest.md
FullName   : D:\Development\PowerShell\MarkdownTest.md
LineNumber : 20
Line       : [1]: http://slashdot.org
Url        : http://slashdot.org
LinkType   : Reference
StatusCode : 200

Relative

[I'm a relative reference to a repository file](../blob/master/LICENSE)
Name       : MarkdownTest.md
FullName   : D:\Development\PowerShell\MarkdownTest.md
LineNumber : 7
Line       : [I'm a relative reference to a repository file](../blob/master/LICENSE)
Url        : ../blob/master/LICENSE
LinkType   : Relative
StatusCode : Skipped

AngleBracket

http://www.example.com or <http://www.example.com> and sometimes
Name       : MarkdownTest.md
FullName   : D:\Development\PowerShell\MarkdownTest.md
LineNumber : 16
Line       : http://www.example.com or <http://www.example.com> and sometimes
Url        : http://www.example.com
LinkType   : AngleBracket
StatusCode : 200

RawUrl

I had to remove the RawUrl check from the function because it was producing too many false positives.

HTML Comments

When I added the AngleBracket link type, I failed to consider HTML comments. They are now ignored with an updated Regex for this link type.

Example

<!-- vale Microsoft.We = YES -->
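Excluding HTML comments from an angle-bracket pattern can be done with a negative lookahead, roughly as sketched here. The pattern is an assumption for illustration, not the updated regex from the function, and the third sample line is invented to show a comment without spaces.

```powershell
# Hypothetical pattern: the negative lookahead (?!!--) rejects "<!--".
$anglePattern = '<(?!!--)(?<url>[^<>\s]+)>'

$samples = @(
    'http://www.example.com or <http://www.example.com> and sometimes'
    '<!-- vale Microsoft.We = YES -->'
    '<!--todo-->'   # invented sample: a comment with no spaces inside
)

$urls = foreach ($line in $samples) {
    foreach ($m in [regex]::Matches($line, $anglePattern)) {
        $m.Groups['url'].Value
    }
}

$urls
```

Only the bracketed URL on the first line is captured; both comment lines are skipped because the text after `<` starts with `!--`.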

Summary

If you find links that are not caught, or false positives, please leave a comment with an example of a full line that contains the link. If I have time, I will work on correcting that case.

I hope you found this function useful, and if so, please leave a comment.

Notes

  • This function has been tested with Windows PowerShell 5.1 and PowerShell 7 RC2.
  • Performance issues could arise when scanning a large number of files or files with a lot of content.
  • As with most freely shared code, this comes as-is and without warranty or guarantee.
