Created
July 2, 2013 15:51
deserializing php in clojure - using instaparse and serialized-php-parser
(ns deserialize-php
  (:require [instaparse.core :as insta]))

(def serialized-php-parser
  (insta/parser
   "
   <S> = expr
   <expr> = (string | integer | double | boolean | null | array)+
   <digit> = #'[0-9]'
   <number> = negative* (decimal-num | integer-num)
   <negative> = '-'
   <integer-num> = digit+
   <decimal-num> = integer-num '.' integer-num
   <zero-or-one> = '0'|'1'
   size = digit+
   key = (string | integer)
   <val> = expr
   array = <'a:'> <size> <':{'> (key val)+ <'}'> <';'>?
   boolean = <'b:'> zero-or-one <';'>
   null = <'N;'>
   integer = <'i:'> number <';'>
   double = <'d:'> number <';'>
   string = <'s:'> <size> <':\\\"'> #'([^\"]|\\.)*' <'\\\";'>
   "))

(defn deserialize-php [data]
  (first (insta/transform
          {:key (fn [k] (if (string? k) (keyword k) k))
           :string str
           :null (constantly nil)
           :boolean (partial = "1")
           :integer (fn [& more] (Integer/parseInt (apply str more)))
           :double (fn [& more] (Double/parseDouble (apply str more)))
           :array (fn [& more]
                    (if (even? (count more))
                      (->> more
                           (partition 2)
                           (map vec)
                           (into {}))
                      (apply vector more)))}
          (serialized-php-parser data))))
@kochb: You are right, this Gist doesn't parse all valid serialized PHP or check for valid lengths etc. It also returns a map and not a vector for arrays. It still has value and works for me in parsing some very specific input. I just tried the alternative SO implementation mentioned above. That implementation currently lacks support for null (sphp-null does not work), so I assume you don't have that in your input. Since I'm not familiar with Parse-EZ I haven't been able to fix it, but I've added a comment describing the issue.
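On the point about arrays coming back as maps: if a vector is wanted for list-like PHP arrays, one option is a small post-processing step. The sketch below is not part of the Gist; `php-array->vector` is a hypothetical helper name, and it assumes a list-like array is one whose keys are exactly the integers 0 to n-1.

```clojure
(defn php-array->vector
  "Hypothetical helper: turns a map keyed by 0..n-1 into a vector of its
  values in key order. Any other map is returned unchanged."
  [m]
  (let [n (count m)]
    ;; Every index from 0 to n-1 must be present as a key; since the map
    ;; has exactly n entries, that means there are no other keys.
    ;; An empty map therefore becomes an empty vector.
    (if (every? #(contains? m %) (range n))
      (mapv m (range n))
      m)))
```

For example, `(php-array->vector {0 "a" 1 "b"})` gives `["a" "b"]`, while a map with keyword keys or gaps in the indices is left as a map.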
With a context-free grammar, such as one written in BNF, it is not possible to decode a length-prefixed string like those used in PHP's serialize notation, because the length prefix, not a delimiter, determines where the string ends.
Credit to A. Webb for suggesting an alternative implementation.
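To illustrate the length-prefix point, here is a minimal hand-written sketch in plain Clojure that reads a single serialized string by counting characters instead of searching for a closing quote. It is not the alternative implementation credited above; `read-php-string` is a hypothetical name, and the sketch assumes an ASCII payload, since PHP's length prefix counts bytes rather than characters.

```clojure
(defn read-php-string
  "Reads one serialized PHP string of the form s:<len>:\"<payload>\"; from
  the start of `input`. Returns [payload remaining-input], or nil when the
  header or the closing delimiter does not match.
  Assumption: ASCII payload, so byte length equals character count."
  [input]
  (when-let [[header len-str] (re-find #"^s:(\d+):\"" input)]
    (let [len   (Long/parseLong len-str)
          start (count header)
          end   (+ start len)]
      ;; Take exactly `len` characters, quotes and semicolons included,
      ;; then require the closing \"; right after the payload.
      (when (and (<= (+ end 2) (count input))
                 (= "\";" (subs input end (+ end 2))))
        [(subs input start end) (subs input (+ end 2))]))))

;; The payload itself contains \"; so a rule that stops at the first
;; quote would cut it short.
(read-php-string "s:5:\"a\";b\"\";i:1;")
```

Because the payload is taken by length, embedded quotes do not end the string early, and a length that does not line up with the closing `";` is rejected rather than silently accepted.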