deserializing php in clojure - using instaparse and serialized-php-parser
(ns deserialize-php
  (:require [instaparse.core :as insta]))

;; Grammar for the output of PHP's serialize(). Angle brackets hide a rule or
;; token from the parse tree. The size prefix is parsed but never checked.
(def serialized-php-parser
  (insta/parser
    "<S> = expr
     <expr> = (string | integer | double | boolean | null | array)+
     <digit> = #'[0-9]'
     <number> = negative* (decimal-num | integer-num)
     <negative> = '-'
     <integer-num> = digit+
     <decimal-num> = integer-num '.' integer-num
     <zero-or-one> = '0'|'1'
     size = digit+
     key = (string | integer)
     <val> = expr
     array = <'a:'> <size> <':{'> (key val)+ <'}'> <';'>?
     boolean = <'b:'> zero-or-one <';'>
     null = <'N;'>
     integer = <'i:'> number <';'>
     double = <'d:'> number <';'>
     string = <'s:'> <size> <':\\\"'> #'([^\"]|\\.)*' <'\\\";'>"))

(defn deserialize-php [data]
  (first (insta/transform
           {;; String keys become keywords; integer keys are kept as-is.
            :key (fn [k] (if (string? k) (keyword k) k))
            :string str
            :null (constantly nil)
            :boolean (partial = "1")
            :integer (fn [& more] (Integer/parseInt (apply str more)))
            :double (fn [& more] (Double/parseDouble (apply str more)))
            ;; An even number of children is read as key/value pairs and
            ;; turned into a map; anything else falls back to a vector.
            :array (fn [& more]
                     (if (even? (count more))
                       (->> more
                            (partition 2)
                            (map vec)
                            (into {}))
                       (apply vector more)))}
           (serialized-php-parser data))))

kochb commented Aug 27, 2013

Make sure you have an up-to-date Clojure when using this library.

Instaparse requires Clojure v1.5.1 or later.

It took me a while to catch that; I was stuck on this nondescript error:

ClassNotFoundException clojure.lang.IHashEq

kochb commented Aug 29, 2013

There seems to be a bug in this library: it can't deserialize strings that contain " characters, for example:


I posted a Stack Overflow question seeking guidance. @terjesb, do you happen to see a solution to this problem?


I'm not familiar with PHP, but poking around the web, it looks like PHP strings use escaping to nest double-quotes in double-quoted strings. So, I'd expect it to print more like this:

Are you sure that
is a valid serialized entity?

kochb commented Aug 30, 2013
$ php -a
Interactive shell

php > echo serialize('{"key": "value"}');
s:16:"{"key": "value"}";
kochb commented Aug 31, 2013

A context-free grammar, such as one written in BNF, cannot decode a length-prefixed string like those in PHP's serialize notation, because the number of characters to consume depends on a length value read earlier in the input.

Credit to A. Webb for suggesting an alternative implementation.
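
To illustrate the length-prefix point, here is a minimal hand-written sketch that reads a single serialized string by its declared length instead of scanning for a closing quote. It is not A. Webb's implementation and not part of the Gist. The function name `read-php-string` is hypothetical, and the sketch assumes the length counts characters, which only holds for single-byte (for example ASCII) payloads, since PHP reports the length in bytes.

```clojure
;; Hypothetical helper, not part of the Gist or the Stack Overflow answer.
(defn read-php-string
  "Reads one serialized PHP string of the form s:<len>:\"<payload>\"; from the
  start of `input`. Returns [payload remaining-input], or nil when `input`
  does not start with a well-formed string.
  Assumption: <len> is treated as a character count (single-byte payloads)."
  [input]
  (when-let [[header len-digits] (re-find #"^s:(\d+):\"" input)]
    (let [len   (Long/parseLong len-digits)
          start (count header)
          end   (+ start len)]
      ;; The payload is sliced out by length, so quotes inside it are harmless.
      ;; The two characters after the payload must be the closing \"; pair.
      (when (and (<= (+ end 2) (count input))
                 (= "\";" (subs input end (+ end 2))))
        [(subs input start end) (subs input (+ end 2))]))))

;; Usage with the string from the PHP session above, followed by more input:
(read-php-string "s:16:\"{\"key\": \"value\"}\";i:1;")
;; => ["{\"key\": \"value\"}" "i:1;"]
```

Returning the remaining input alongside the payload lets a caller continue with the next serialized value, which is roughly how a hand-written recursive reader for the whole format could be structured.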

terjesb commented Oct 1, 2013

@kochb: You are right, this Gist doesn't parse all valid serialized PHP or check for valid lengths etc. It also returns a map and not a vector for arrays. It still has value and works for me in parsing some very specific input. I just tried the alternative SO implementation mentioned above. That implementation currently lacks support for null (sphp-null does not work), so I assume you don't have that in your input. Since I'm not familiar with Parse-EZ I haven't been able to fix it, but I've added a comment describing the issue.
