Skip to content

Instantly share code, notes, and snippets.

@Hugoberry
Last active March 20, 2017 23:14
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save Hugoberry/ba0ef542d89e80562718f72e9ce289b7 to your computer and use it in GitHub Desktop.
Save Hugoberry/ba0ef542d89e80562718f72e9ce289b7 to your computer and use it in GitHub Desktop.
Replicating R unz function in Power Query. Based on http://sql10.blogspot.co.uk/2016/06/reading-zip-files-in-powerquery-m.html
(binaryZip,fileName) =>
let
//shorthand
UInt32 = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32,ByteOrder.LittleEndian),
UInt16 = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16,ByteOrder.LittleEndian),
//ZIP file header fixed size structure
Header = BinaryFormat.Record([
MiscHeader = BinaryFormat.Binary(14),
CompressedSize = UInt32,
UncompressedSize = UInt32,
FileNameLen = UInt16,
ExtraFieldLen = UInt16]),
//ZIP file header dynamic size structure
FileData = (h)=> BinaryFormat.Record([
FileName = BinaryFormat.Text(h[FileNameLen]),
ExtraField = BinaryFormat.Text(h[ExtraFieldLen]),
UncompressedData = BinaryFormat.Transform(
BinaryFormat.Binary(h[CompressedSize]),
(x) => try
Binary.Buffer(Binary.Decompress(x, Compression.Deflate))
otherwise null)]),
//Parsing the binary in search for PKZIP header signature
ZipIterator = BinaryFormat.Choice(UInt32, (signature) => if signature <> 0x04034B50
then BinaryFormat.Record([FileName=null])
else BinaryFormat.Choice(Header,(z)=>FileData(z))),
ZipFormat = BinaryFormat.List(ZipIterator),
out = List.Select(ZipFormat(binaryZip), each _[FileName]=fileName)
in
out{0}[UncompressedData]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment