deserializing php in clojure - using instaparse and serialized-php-parser
(ns deserialize-php
  (:require [instaparse.core :as insta]))

;; Grammar for a subset of PHP's serialize() output. Rules and literals
;; wrapped in <...> are matched but hidden from the resulting parse tree.
(def serialized-php-parser
  (insta/parser
    "<S> = expr
     <expr> = (string | integer | double | boolean | null | array)+
     <digit> = #'[0-9]'
     <number> = negative* (decimal-num | integer-num)
     <negative> = '-'
     <integer-num> = digit+
     <decimal-num> = integer-num '.' integer-num
     <zero-or-one> = '0'|'1'
     size = digit+
     key = (string | integer)
     <val> = expr
     array = <'a:'> <size> <':{'> (key val)+ <'}'> <';'>?
     boolean = <'b:'> zero-or-one <';'>
     null = <'N;'>
     integer = <'i:'> number <';'>
     double = <'d:'> number <';'>
     string = <'s:'> <size> <':\\\"'> #'([^\"]|\\.)*' <'\\\";'>"))

(defn deserialize-php [data]
  (first (insta/transform
           {:key     (fn [k] (if (string? k) (keyword k) k))
            :string  str
            :null    (constantly nil)
            :boolean (partial = "1")
            :integer (fn [& more] (Integer/parseInt (apply str more)))
            :double  (fn [& more] (Double/parseDouble (apply str more)))
            :array   (fn [& more]
                       (if (even? (count more))
                         (->> more
                              (partition 2)
                              (map vec)
                              (into {}))
                         (apply vector more)))}
           (serialized-php-parser data))))

kochb commented Aug 27, 2013

Make sure you have an up-to-date Clojure when using this library.

Instaparse requires Clojure v1.5.1 or later.

It took me a while to catch that; I was stuck on this nondescript error:

ClassNotFoundException clojure.lang.IHashEq
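
One way to surface this earlier is to check the running Clojure version before loading the parser. The following is only a sketch: `version-at-least?` is a hypothetical helper, not part of Clojure or Instaparse, and the 1.5.1 threshold is taken from the requirement stated above.

```clojure
;; Hypothetical helper: compares a version map shaped like
;; *clojure-version* against a required [major minor incremental].
(defn version-at-least?
  [v major minor incremental]
  (>= (compare [(:major v) (:minor v) (or (:incremental v) 0)]
               [major minor incremental])
      0))

;; Fail fast with a readable message instead of a ClassNotFoundException.
(when-not (version-at-least? *clojure-version* 1 5 1)
  (throw (IllegalStateException.
           (str "Clojure 1.5.1 or later is required, found "
                (clojure-version)))))
```

Equal-length vectors compare element by element, so 1.10.0 correctly ranks above 1.5.1.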

kochb commented Aug 29, 2013

There seems to be a bug in this library: it can't deserialize strings that contain " characters, e.g.:


I posted a Stack Overflow question seeking guidance. @terjesb, do you happen to see a solution to this problem?

I'm not familiar with PHP, but poking around the web, it looks like PHP strings use escaping to nest double-quotes in double-quoted strings. So, I'd expect it to print more like this:

Are you sure that
is a valid serialized entity?

kochb commented Aug 30, 2013

$ php -a
Interactive shell

php > echo serialize('{"key": "value"}');
s:16:"{"key": "value"}";

kochb commented Aug 31, 2013

A context-free grammar, such as one written in BNF, cannot decode a length-prefixed string like those used in PHP's serialize notation, because the number of characters to read depends on a value parsed earlier in the input.

Credit to A. Webb for suggesting an alternative implementation.
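
To illustrate why the length prefix matters, here is a minimal hand-written sketch that reads one serialized string by honoring its declared length instead of scanning for a closing quote. It is not A. Webb's implementation; `read-php-string` is a hypothetical name, and the sketch assumes one byte per character (PHP counts bytes, so multibyte text would need byte-level handling).

```clojure
;; Hypothetical helper: reads one s:<len>:"<body>"; value from the start
;; of `input`. Returns [body rest-of-input], or nil when the input does
;; not start with a well-formed serialized string.
;; Assumption: one byte per character (ASCII-style input).
(defn read-php-string
  [^String input]
  (when-let [[header len-str] (re-find #"^s:(\d+):\"" input)]
    (let [len   (Long/parseLong len-str)
          start (count header)
          end   (+ start len)]
      ;; The closing "; must sit exactly `len` characters after the
      ;; opening quote; embedded quotes inside the body are ignored.
      (when (and (<= (+ end 2) (count input))
                 (= "\";" (subs input end (+ end 2))))
        [(subs input start end) (subs input (+ end 2))]))))

;; The value from the php -a session above, which the regex-based
;; grammar rejects:
(read-php-string "s:16:\"{\"key\": \"value\"}\";")
```

Returning the unread remainder alongside the body lets a caller continue with the next value, which is how a recursive reader for arrays could be layered on top.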


terjesb commented Oct 1, 2013

@kochb: You are right, this Gist doesn't parse all valid serialized PHP or check for valid lengths etc. It also returns a map and not a vector for arrays. It still has value and works for me in parsing some very specific input. I just tried the alternative SO implementation mentioned above. That implementation currently lacks support for null (sphp-null does not work), so I assume you don't have that in your input. Since I'm not familiar with Parse-EZ I haven't been able to fix it, but I've added a comment describing the issue.
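
If a vector is wanted for list-like PHP arrays, the map returned by the Gist could be post-processed. A minimal sketch, assuming the keys are the integers 0 through n-1; `list-like-map->vector` is a hypothetical helper and not part of the Gist:

```clojure
;; Hypothetical helper: turns a map keyed 0..n-1 into a vector ordered by
;; key; any other value (including maps with other keys) is returned as-is.
(defn list-like-map->vector
  [m]
  (if (and (map? m)
           (= (set (keys m)) (set (range (count m)))))
    (mapv m (range (count m)))
    m))

(list-like-map->vector {0 "a" 1 "b" 2 "c"}) ; list-like keys
(list-like-map->vector {:key "value"})      ; left untouched
```

Note that an empty map also becomes an empty vector under this rule, which may or may not be what a caller wants.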
