@peaeater
peaeater / csv-split.ps1
Created March 12, 2024 00:27
Splits CSV files with or without header rows; multiline columns containing newlines are handled.
<#
Split csv files with or without header rows, multiline columns with newlines ok.
#>
param (
[string]$in,
[string]$outdir = [System.IO.Path]::GetDirectoryName($in),
[int]$count = 1000,
[string]$delimiter = ',',
[string[]]$header = @()
peaeater / text_suggest_edge.xml
Last active January 4, 2024 14:18
Text suggest edge Solr field type
<fieldType name="text_suggest_edge" class="solr.TextField">
<analyzer type="index">
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="([\.,;:_-])" replacement=" " replace="all"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="30" minGramSize="1"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
</analyzer>
<analyzer type="query">
peaeater / self-signed-cert.ps1
Created April 11, 2017 23:40
Creates a new wildcard self-signed SSL certificate for development purposes. Requires an elevated (administrator) PowerShell session.
# Creates new self-signed certificate for testing purposes
New-SelfSignedCertificate -DnsName "*.domain.local" -FriendlyName "*.domain.local Development Certificate" -CertStoreLocation "cert:\LocalMachine\My" -NotAfter (Get-Date).AddYears(100)
peaeater / optimize-pdf.ps1
Created May 25, 2018 17:47
Downsamples PDFs with Ghostscript.
<#
Downsample PDF and convert to gray if necessary.
Requires Ghostscript (gswin64c).
#>
param (
[string]$indir,
[string]$outdir = $indir,
[string]$gs = "gswin64c",
[string]$dpi = "150"
peaeater / pdf2png.ps1
Last active June 7, 2023 18:26
Converts PDF pages to PNGs with ImageMagick.
# convert pdf to png
# requires imagemagick w/ ghostscript
param (
[Parameter(Mandatory=$true,ValueFromPipeline=$true,Position=0)]
[ValidateScript({[System.IO.Path]::GetExtension($_) -eq ".pdf"})]
[string]$in,
[string]$magick = "C:\utils\imagemagick\ImageMagick-7.1.1-Q16-HDRI\magick.exe"
)
peaeater / ocr.ps1
Last active June 7, 2023 18:16
OCRs image files to plain text with Tesseract.
# ocr tif/png to txt
# requires tesseract
Param(
[string]$ext = "tif",
[string]$indir = ".",
[string]$outdir = $indir,
[string]$tesseract = "C:\utils\tesseract\tesseract.exe"
)
peaeater / restore.ps1
Created December 15, 2016 19:06
PowerShell script to restore a SQL database from a backup file, with progress indicator.
<#
sqlps dependency
If module sqlps does not exist, install from:
Microsoft SQL Server 2016 Feature Pack (https://www.microsoft.com/en-us/download/details.aspx?id=52676)
- SQLSysClrTypes.msi
- SharedManagementObjects.msi
- PowershellTools.msi
#>
param(
peaeater / ftp-dir-to-remote.ps1
Created December 12, 2022 18:03
PowerShell script using WinSCP to sync directories over FTP with a file mask.
<#
Sync a directory to remote server in FTP mode
Peter Tyrrell
#>
param(
[Parameter(Mandatory = $false, Position = 0)]
[string]$logsrc = "",
peaeater / alphanumeric-field-type.xml
Last active September 16, 2022 09:21
Alphanumeric field type for Solr which lowercases, removes leading articles, and forces numbers to sort numerically.
<fieldType name="alphaNumericSort" class="solr.TextField" sortMissingLast="false" omitNorms="true">
<analyzer>
<!-- KeywordTokenizer does no actual tokenizing, so the entire
input string is preserved as a single token
-->
<tokenizer class="solr.KeywordTokenizerFactory"/>
<!-- The LowerCase TokenFilter does what you expect, which can be
useful when you want your sorting to be case insensitive
-->
<filter class="solr.LowerCaseFilterFactory" />
peaeater / text-mirror.ps1
Last active June 22, 2022 18:45
Creates a text file mirror from PDFs. Requires Poppler.
<#
1. Leaf
Given a text file of PDF filenames, extract content from PDFs recursively
and create mirror directory structure for text file outputs.
* Handles filenames with entry separators.
* Ignores PDF older than its text file mirror unless -force param is used.
* Requires poppler pdftotext.exe
.\text-mirror.ps1 -in C:\dev\abc\extract\extracted\pdfs\abc-pdfs-1.txt `