(ns deserialize-php
  (:require [instaparse.core :as insta]))

;; Instaparse grammar for a subset of PHP's serialize() output.
;; The declared string size is parsed but hidden; it is not used to delimit strings.
(def serialized-php-parser
  (insta/parser
   "
   <S> = expr
   <expr> = (string | integer | double | boolean | null | array)+
   <digit> = #'[0-9]'
   <number> = negative* (decimal-num | integer-num)
   <negative> = '-'
   <integer-num> = digit+
   <decimal-num> = integer-num '.' integer-num
   <zero-or-one> = '0'|'1'
   size = digit+
   key = (string | integer)
   <val> = expr
   array = <'a:'> <size> <':{'> (key val)+ <'}'> <';'>?
   boolean = <'b:'> zero-or-one <';'>
   null = <'N;'>
   integer = <'i:'> number <';'>
   double = <'d:'> number <';'>
   string = <'s:'> <size> <':\\\"'> #'([^\"]|\\.)*' <'\\\";'>
   "))

;; Parse `data` and turn the resulting parse tree into Clojure values.
(defn deserialize-php [data]
  (first (insta/transform
          {:key     (fn [k] (if (string? k) (keyword k) k)) ; string keys become keywords
           :string  str
           :null    (constantly nil)
           :boolean (partial = "1")
           :integer (fn [& more] (Integer/parseInt (apply str more)))
           :double  (fn [& more] (Double/parseDouble (apply str more)))
           ;; An even number of children is read as key/value pairs and becomes a map.
           :array   (fn [& more]
                      (if (even? (count more))
                        (->> more
                             (partition 2)
                             (map vec)
                             (into {}))
                        (apply vector more)))}
          (serialized-php-parser data))))
There seems to be a bug in this library: it can't deserialize strings that contain " characters, e.g.:
s:15:"{"key":"value"}";
I posted a Stack Overflow question seeking guidance. @terjesb, do you happen to see a solution to this problem?
I'm not familiar with PHP, but poking around the web, it looks like PHP strings use escaping to nest double-quotes in double-quoted strings. So, I'd expect it to print more like this:
s:15:"{\"key\":\"value\"}";
Are you sure that
s:15:"{"key":"value"}";
is a valid serialized entity?
$ php -a
Interactive shell
php > echo serialize('{"key": "value"}');
s:16:"{"key": "value"}";
A context-free grammar (the kind written in BNF) cannot decode a length-prefixed string such as those used in PHP's serialize notation: where the string ends depends on the declared length, not on a delimiter.
Credit to A. Webb for suggesting an alternative implementation.
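To illustrate the length-prefix point, here is a minimal sketch in plain Clojure, without a parser library, that reads one serialized string by its declared length. It is not the alternative implementation credited above. `read-php-string` is a hypothetical name, and the sketch assumes single-byte (e.g. ASCII) input so that character counts match PHP's byte counts.

```clojure
(defn read-php-string
  "Reads one serialized PHP string such as s:5:\"hello\"; from the start of
  `input`. Returns [value rest-of-input], or nil if the input is malformed.
  Assumption: the declared length is treated as a character count, which
  only equals PHP's byte count for single-byte data."
  [input]
  (when-let [[prefix len-str] (re-find #"^s:(\d+):\"" input)]
    (let [len   (Integer/parseInt len-str)
          start (count prefix)
          end   (+ start len)]
      ;; The declared length, not a closing quote, decides where the value
      ;; ends, so embedded double quotes need no escaping.
      (when (and (<= (+ end 2) (count input))
                 (= "\";" (subs input end (+ end 2))))
        [(subs input start end) (subs input (+ end 2))]))))

;; Example call, using the string from the PHP session above:
(read-php-string "s:16:\"{\"key\": \"value\"}\";")
```

Because the value is cut out by position, the quotes inside the JSON payload never have to be matched against a closing delimiter, which is the step the regex in the grammar cannot do.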
@kochb: You are right, this Gist doesn't parse all valid serialized PHP or check for valid lengths etc. It also returns a map and not a vector for arrays. It still has value and works for me in parsing some very specific input. I just tried the alternative SO implementation mentioned above. That implementation currently lacks support for null (sphp-null does not work), so I assume you don't have that in your input. Since I'm not familiar with Parse-EZ I haven't been able to fix it, but I've added a comment describing the issue.
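If vector output is wanted for list-like arrays, one possible post-processing step is sketched below. `php-array->clj` is a hypothetical helper, not part of this Gist or of the Stack Overflow implementation. It assumes a "list" is a non-empty map whose keys are exactly the integers 0 to n-1, and it leaves every other map unchanged.

```clojure
(defn php-array->clj
  "Hypothetical helper: converts a map keyed by 0..n-1 into a vector ordered
  by key. Any other map (keyword keys, gaps in the indices, empty) is
  returned unchanged."
  [m]
  (let [n (count m)]
    (if (and (pos? n)
             (every? #(contains? m %) (range n)))
      ;; A map is a function of its keys, so mapv looks up each index in order.
      (mapv m (range n))
      m)))

;; Example calls:
(php-array->clj {0 "a" 1 "b"})
(php-array->clj {:foo 1})
```

PHP arrays are ordered maps, so keeping them as Clojure maps is a defensible default; the conversion is only a convenience for input known to hold sequential lists.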
Make sure you have an up-to-date Clojure when using this library.
It took me a while to catch that; I was stuck on this nondescript error: