Deserializing PHP in Clojure using Instaparse (serialized-php-parser)

deserialize_php.clj
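```clojure
(ns deserialize-php
  (:require [instaparse.core :as insta]))

;; Grammar for a subset of PHP's serialize() format.
;; <val> allows exactly one value per key (using expr here allowed several
;; values and made the grammar ambiguous), and (key val)* accepts empty
;; arrays. Known limitation: the string body is matched by a regex that stops
;; at the first ", and the length prefix is ignored, so strings that contain
;; " do not parse.
(def serialized-php-parser
  (insta/parser
   "
   <S> = expr
   <expr> = (string | integer | double | boolean | null | array)+
   <digit> = #'[0-9]'
   <number> = negative? (decimal-num | integer-num)
   <negative> = '-'
   <integer-num> = digit+
   <decimal-num> = integer-num '.' integer-num
   <zero-or-one> = '0' | '1'
   size = digit+
   key = (string | integer)
   <val> = string | integer | double | boolean | null | array
   array = <'a:'> <size> <':{'> (key val)* <'}'> <';'>?
   boolean = <'b:'> zero-or-one <';'>
   null = <'N;'>
   integer = <'i:'> number <';'>
   double = <'d:'> number <';'>
   string = <'s:'> <size> <':\\\"'> #'([^\"]|\\.)*' <'\\\";'>
   "))

(defn deserialize-php
  "Deserializes the first value in a PHP serialize() string. Arrays become
  maps, and string keys become keywords."
  [data]
  (let [tree (serialized-php-parser data)]
    ;; Fail loudly instead of transforming an instaparse Failure object.
    (when (insta/failure? tree)
      (throw (ex-info "Could not parse serialized PHP" {:failure tree})))
    (first
     (insta/transform
      {:key     (fn [k] (if (string? k) (keyword k) k))
       :string  str
       :null    (constantly nil)
       :boolean (partial = "1")
       ;; PHP integers are 64-bit on 64-bit builds, so parse as long.
       :integer (fn [& chars] (Long/parseLong (apply str chars)))
       :double  (fn [& chars] (Double/parseDouble (apply str chars)))
       ;; Children alternate key, value, key, value, ...
       :array   (fn [& kvs] (apply hash-map kvs))}
      tree))))
```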

Make sure you have an up-to-date Clojure when using this library.

Instaparse requires Clojure v1.5.1 or later.
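
If you are unsure which Clojure version a project actually runs (a dependency can pull in an older one), you can ask the REPL. The sketch below reads Clojure's built-in *clojure-version* map; the helper name clojure-version-at-least? is hypothetical and not part of Instaparse or Clojure.

```clojure
(ns check-clojure-version)

;; Hypothetical helper (not part of Instaparse or Clojure): true when the
;; running Clojure is at least the required [major minor incremental].
(defn clojure-version-at-least?
  [required]
  (let [{:keys [major minor incremental]} *clojure-version*
        current [major minor (or incremental 0)]]
    ;; Vectors of equal length compare element by element.
    (not (neg? (compare current required)))))

;; Instaparse needs Clojure 1.5.1 or later.
(when-not (clojure-version-at-least? [1 5 1])
  (println "Clojure" (clojure-version)
           "is older than 1.5.1; upgrade before using Instaparse."))
```

Run it in the same project (and REPL) that loads Instaparse, since that is the classpath whose Clojure version matters.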

It took me a while to catch that; I was stuck on this nondescript error:

ClassNotFoundException clojure.lang.IHashEq

There seems to be a bug in this library: it can't deserialize strings that contain double-quote (") characters, e.g.:

s:15:"{"key":"value"}";

I posted a Stack Overflow question seeking guidance. @terjesb, do you happen to see a solution to this problem?

I'm not familiar with PHP, but from poking around the web, it looks like PHP uses escaping to nest double quotes inside double-quoted strings. So I'd expect the output to look more like this:
s:15:"{\"key\":\"value\"}";

Are you sure that
s:15:"{"key":"value"}";
is a valid serialized entity?

$ php -a
Interactive shell

php > echo serialize('{"key": "value"}');
s:16:"{"key": "value"}";
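
The session shows the shape of the encoding: the prefix is the string's length in bytes, and the contents are written without any escaping. A minimal sketch of that encoding for a single string, assuming the PHP string holds UTF-8 text (php-serialize-string is a hypothetical name, not a PHP or library function):

```clojure
(ns php-string-sketch
  (:import (java.nio.charset StandardCharsets)))

;; Hypothetical helper mirroring PHP's serialize() output for one string:
;; s:<byte length>:"<contents>";
;; The length counts UTF-8 bytes (an assumption about the source encoding),
;; and embedded " characters are written as-is, without escaping.
(defn php-serialize-string
  [^String s]
  (let [byte-count (count (.getBytes s StandardCharsets/UTF_8))]
    (str "s:" byte-count ":\"" s "\";")))
```

For {"key": "value"} this yields s:16:"{"key": "value"}"; as in the session above; without the space, the same helper produces s:15:"{"key":"value"}";.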

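Using a context-free grammar like BNF, it is not possible to decode a length-prefixed string, such as those used in PHP's serialize notation, because the grammar would have to match the number in the prefix against the length of the text that follows it.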
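
Since the prefix already says how many bytes the string body holds, a reader that is not limited to a context-free grammar can simply take that many bytes instead of searching for a closing quote. The following is a minimal recursive-descent sketch of that idea; it is neither the gist's parser nor A. Webb's Parse-EZ version, and php-unserialize and its helpers are hypothetical names. It assumes the serialized text was produced from UTF-8 strings (PHP's lengths count bytes), handles only N, b, i, d, s and a values (no objects or references), and keeps array keys as strings and longs rather than converting them to keywords.

```clojure
(ns php-unserialize-sketch
  (:import (java.nio.charset StandardCharsets)))

;; Hypothetical hand-written reader for PHP's serialize() format. Each helper
;; takes the input as a byte array plus a position and returns
;; [value next-position].

(defn- expect
  "Checks that the byte at pos is the ASCII character c; returns pos + 1."
  [^bytes bs pos c]
  (when-not (and (< pos (alength bs)) (== (aget bs pos) (int c)))
    (throw (ex-info (str "Expected " c) {:pos pos})))
  (inc pos))

(defn- read-until
  "Returns [text next-pos]: text is the bytes from pos up to the first ASCII
  character stop, and next-pos points just past stop."
  [^bytes bs pos stop]
  (loop [i pos]
    (cond
      (>= i (alength bs)) (throw (ex-info "Unexpected end of input" {:pos pos}))
      (== (aget bs i) (int stop))
      [(String. bs (int pos) (int (- i pos)) StandardCharsets/UTF_8) (inc i)]
      :else (recur (inc i)))))

(defn- read-scalar-text
  "Reads <tag>:<text>; starting at the tag byte."
  [bs pos]
  (read-until bs (expect bs (inc pos) \:) \;))

(defn- parse-php-double [^String t]
  ;; PHP writes INF, -INF and NAN for non-finite floats.
  (case t
    "INF" Double/POSITIVE_INFINITY
    "-INF" Double/NEGATIVE_INFINITY
    "NAN" Double/NaN
    (Double/parseDouble t)))

(declare read-value)

(defn- read-php-string
  "Reads s:<n>:\"<n bytes>\"; starting at the tag byte. The byte count, not a
  closing quote, marks the end, so embedded quotes need no escaping."
  [^bytes bs pos]
  (let [[n-text pos] (read-until bs (expect bs (inc pos) \:) \:)
        n (Long/parseLong n-text)
        start (expect bs pos \")
        end (+ start n)]
    (when (or (neg? n) (> end (alength bs)))
      (throw (ex-info "Bad string length" {:pos start :length n})))
    [(String. bs (int start) (int n) StandardCharsets/UTF_8)
     (expect bs (expect bs end \") \;)]))

(defn- read-php-array
  "Reads a:<n>:{<key><value>...} starting at the tag byte; returns a map."
  [bs pos]
  (let [[n-text pos] (read-until bs (expect bs (inc pos) \:) \:)
        n (Long/parseLong n-text)]
    (loop [i 0, pos (expect bs pos \{), m {}]
      (if (< i n)
        (let [[k pos] (read-value bs pos)
              [v pos] (read-value bs pos)]
          (recur (inc i) pos (assoc m k v)))
        [m (expect bs pos \})]))))

(defn- read-value [^bytes bs pos]
  (when (>= pos (alength bs))
    (throw (ex-info "Unexpected end of input" {:pos pos})))
  (case (char (bit-and (aget bs pos) 0xff))
    \N [nil (expect bs (inc pos) \;)]
    \b (let [[t p] (read-scalar-text bs pos)] [(= t "1") p])
    \i (let [[t p] (read-scalar-text bs pos)] [(Long/parseLong t) p])
    \d (let [[t p] (read-scalar-text bs pos)] [(parse-php-double t) p])
    \s (read-php-string bs pos)
    \a (read-php-array bs pos)
    (throw (ex-info "Unsupported or invalid type tag" {:pos pos}))))

(defn php-unserialize
  "Parses a single value produced by PHP's serialize()."
  [^String s]
  (let [bs (.getBytes s StandardCharsets/UTF_8)
        [v pos] (read-value bs 0)]
    (when (not= pos (alength bs))
      (throw (ex-info "Trailing input" {:pos pos})))
    v))
```

For example, (php-unserialize "s:15:\"{\"key\":\"value\"}\";") returns the string {"key":"value"}, quotes included, which is exactly the input the grammar-based version rejects.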

Credit to A. Webb for suggesting an alternative implementation.

@kochb: You are right, this Gist doesn't parse all valid serialized PHP, nor does it check that lengths are valid. It also returns a map rather than a vector for arrays. It still has value, and it works for me on some very specific input. I just tried the alternative SO implementation mentioned above. It currently lacks support for null (sphp-null does not work), so I assume you don't have nulls in your input. Since I'm not familiar with Parse-EZ, I haven't been able to fix it, but I've added a comment describing the issue.
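
On the map-versus-vector point: PHP uses one array type for both lists and dictionaries, so one option is to post-process each parsed array and turn it into a vector when its keys are exactly 0 to n-1. A sketch of that rule (php-array->clj is a hypothetical name, not part of the gist); note that it treats an empty PHP array as an empty vector, which is a design choice rather than something the format dictates.

```clojure
(ns php-array-sketch)

;; Hypothetical post-processing step (not part of the gist): a parsed PHP
;; array arrives as a map. If its keys are exactly 0..n-1, return a vector
;; ordered by key; otherwise return the map unchanged. {} becomes [].
(defn php-array->clj
  [m]
  (let [n (count m)]
    (if (= (set (keys m)) (set (range n)))
      (mapv m (range n)) ; a map can be called as a function of its keys
      m)))
```

The function only looks at the map it is given; calling it on each map the :array transform produces would convert nested arrays too, because insta/transform rewrites children before their parents.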
