@PrateekKumarSingh
Last active March 16, 2018 10:07
<#
.SYNOPSIS
Returns a summary of a text document.
.DESCRIPTION
Returns a summary of the text document passed to it, limited to the chosen word count (default: 100 words).
.PARAMETER File
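Path to the text file whose content should be summarized.
.PARAMETER WordLimit
Maximum number of words allowed in the summary. Defaults to 100.
.PARAMETER FromClipBoard
Summarizes the text currently on the clipboard instead of reading a file.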
.EXAMPLE
PS Root\> Get-Summary -File .\Document.txt
“Since my letter [of October 28], the FBI investigation team has been working round-the-clock to process and review a large volume of emails from a device obtained in connection with an unrelated criminal investigation,” Mr. Comey said. It was reported that there were 6,50,000 emails on that laptop.
Provide the path to a text file and the cmdlet generates a summary. By default, the summary is limited to 100 words or fewer.
.EXAMPLE
PS Root\> Get-Summary -File D:\Document.txt -WordLimit 150
“Since my letter [of October 28], the FBI investigation team has been working round-the-clock to process and review a large volume of emails from a device obtained in connection with an unrelated criminal investigation,” Mr. Comey said. It was reported that there were 6,50,000 emails on that laptop. Democratic leader Nancy Pelosi said that after yet another exhaustive review of emails to and from Ms. Clinton, the FBI has again given her a clean bill of health. “The FBI’s findings from its criminal investigation of Hillary Clinton’s secret email server were a damning and unprecedented indictment of her judgment. The FBI found evidence Clinton broke the law, that she placed highly classified national security information at risk and repeatedly lied to the American people about her reckless conduct,” he said.
You can also pass a value to the '-WordLimit' parameter to increase or decrease the length of the summary.
.EXAMPLE
PS Root\> Get-Summary -File D:\Document.txt -Verbose
“Since my letter [of October 28], the FBI investigation team has been working round-the-clock to process and review a large volume of emails from a device obtained in connection with an unrelated criminal investigation,” Mr. Comey said. It was reported that there were 6,50,000 emails on that laptop.
VERBOSE: Content has been summarized from 875 to 48 Words
Add the '-Verbose' switch to see the summarization ratio, i.e., the number of words in the original against the number of words in the summary.
.EXAMPLE
PS Root\> Get-Summary -FromClipBoard -WordLimit 50
Indian Prime Minister Narendra Modi has won the online reader’s poll for TIME Person of the Year, beating out other world leaders, artists and politicians as the most .
Use the -FromClipBoard switch to summarize the content copied to the clipboard.
.INPUTS
None. You cannot pipe objects to Get-Summary.
.LINK
Get-Content
.LINK
http://RidiCurious.com
.NOTES
Author : Prateek Singh
Twitter : @SinghPrateik
Blog : http://RidiCurious.com
#>
Function Get-Summary
{
    [CmdletBinding()]
    [Alias('Summary')]
    [OutputType([String])]
    Param(
        [Parameter(Position = 0)] [String] $File,
        [Parameter(Position = 1)] [Int] $WordLimit = 100,
        [switch] $FromClipBoard
    )
    Begin
    {
        If ($File)
        {
            $Content = Get-Content $File
        }
        elseif ($FromClipBoard)
        {
            # The Clipboard class lives in the PresentationCore (WPF) assembly
            Add-Type -AssemblyName PresentationCore
            $Content = [Windows.Clipboard]::GetText()
        }
        else
        {
            # Stop here: without input there is nothing for the Process block to summarize
            throw "Please provide a file path or copy content to Clipboard"
        }
    }
    Process
    {
        $TotalWords = 0
        $Summary = @()
        # Pick the highest-ranked sentences until the word limit is reached.
        # The [double] cast makes the sort numeric even if the score arrives as text.
        $BestSentences = Foreach ($Item in (Get-SentenceRank $Content | Sort-Object {[double]$_.SentenceScore} -Descending))
        {
            # Stop as soon as the running word count exceeds the limit
            $TotalWords += $Item.WordCount
            If ($TotalWords -gt $WordLimit)
            {
                break
            }
            else
            {
                $Item
            }
        }
        If ($BestSentences)
        {
            # Rebuild the paragraph with the chosen sentences in their original order
            Foreach ($Best in (($BestSentences | Sort-Object LineNumber).Sentence))
            {
                # Make sure every sentence ends with a period
                If (-not $Best.Trim().EndsWith("."))
                {
                    $Summary += -join ($Best, ".")
                }
                else
                {
                    $Summary += $Best
                }
            }
            [String]$Summary
            Write-Verbose "Content has been summarized from $($Content.Split(" ").Count) to $(([String]$Summary).Split(" ").Count) Words"
        }
        else
        {
            Write-Warning "Word Limit is too small to summarize the document."
        }
    }
    End
    {
    }
}
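Function Get-Intersection($Sentence1, $Sentence2)
{
    # Words that appear in both sentences; @() keeps a single match as a one-element array
    $CommonWords = @(Compare-Object -ReferenceObject $Sentence1 -DifferenceObject $Sentence2 -IncludeEqual | Where-Object {$_.SideIndicator -eq '=='} | Select-Object -ExpandProperty InputObject)
    # Normalize by the average word count of the two sentences.
    # The inner parentheses matter: without them the division runs left to right and gives Count / Sum / 2.
    $CommonWords.Count / (($Sentence1.Count + $Sentence2.Count) / 2)
}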
Function Get-SentenceRank($Content)
{
    # @() keeps a single sentence as a one-element array, so $Sentences[0] is the sentence and not its first character
    $Sentences = @($Content -split [Environment]::NewLine | Where-Object {$_})
    $NoOfSentences = $Sentences.Count
    $CommonContentWeight = New-Object double[] $NoOfSentences
    # Important words: the 10 most frequent words longer than 3 characters, which skips in, on, of, to, by, etc.
    $FrequencyDistribution = $Content.Split(" ") | Where-Object {-not [String]::IsNullOrEmpty($_)} | ForEach-Object {[Regex]::Replace($_, '[^a-zA-Z0-9]', '')} | Group-Object | Sort-Object Count -Descending
    $ImportantWords = $FrequencyDistribution | Where-Object {$_.Name.Length -gt 3} | Select-Object @{n = 'ImportanceWeight'; e = {$_.Count * 0.01}}, @{n = 'ImportantWord'; e = {$_.Name}} -First 10
    Foreach ($i in (0..($NoOfSentences - 1)))
    {
        $ImportanceWeight = 0
        # Strip punctuation from the words of the sentence being scored
        $WordsInReferenceSentence = $Sentences[$i].Split(" ") | ForEach-Object {[Regex]::Replace($_, '[^a-zA-Z0-9]', '')}
        # Score each sentence by the words it shares with the sentences of the document.
        # The more words a sentence has in common with the rest, the better it represents the whole document.
        Foreach ($j in (0..($NoOfSentences - 1)))
        {
            $WordsInDifferenceSentence = $Sentences[$j].Split(" ") | ForEach-Object {[Regex]::Replace($_, '[^a-zA-Z0-9]', '')}
            $CommonContentWeight[$i] = $CommonContentWeight[$i] + (Get-Intersection $WordsInReferenceSentence $WordsInDifferenceSentence)
        }
        Foreach ($Item in ($WordsInReferenceSentence | Select-Object -Unique))
        {
            # Add the word's ImportanceWeight for every important word found in the sentence
            If ($Item -in $ImportantWords.ImportantWord)
            {
                $ImportanceWeight += ($ImportantWords | Where-Object {$_.ImportantWord -eq $Item}).ImportanceWeight
            }
        }
        # Scores are emitted as numbers (not formatted strings) so that Sort-Object orders them numerically
        [PSCustomObject]@{
            LineNumber         = $i
            SentenceScore      = [Math]::Round($CommonContentWeight[$i] + $ImportanceWeight, 3)
            CommonContentScore = [Math]::Round($CommonContentWeight[$i], 3)
            ImportanceScore    = $ImportanceWeight
            WordCount          = ($Sentences[$i].Split(" ")).Count
            Sentence           = $Sentences[$i]
        }
    }
}
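
The sentence-overlap score used for ranking can be tried in isolation. The sketch below is a minimal, standalone illustration and not part of the gist: `Get-OverlapScore` is a hypothetical helper name, and it assumes the intended normalization is the number of shared words divided by the average word count of the two sentences.

```powershell
# Hypothetical helper (an assumption for illustration, not the gist's own function):
# shared words between two word lists, normalized by the average list length.
function Get-OverlapScore {
    param([string[]] $Words1, [string[]] $Words2)

    # -IncludeEqual -ExcludeDifferent returns only the items present in both lists
    $common = @(Compare-Object -ReferenceObject $Words1 -DifferenceObject $Words2 -IncludeEqual -ExcludeDifferent |
            Select-Object -ExpandProperty InputObject)

    # Divide by the average word count of the two lists
    $common.Count / (($Words1.Count + $Words2.Count) / 2)
}

$a = 'the quick brown fox'.Split(' ')
$b = 'the lazy brown dog'.Split(' ')

# "the" and "brown" are shared: 2 / ((4 + 4) / 2) = 0.5
Get-OverlapScore $a $b
```

A sentence that shares many words with the others accumulates a high total across all pairs, which is why it is treated as representative of the document.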