Deserializing PHP in Clojure, using Instaparse and serialized-php-parser
(ns deserialize-php
  (:require [instaparse.core :as insta]))

;; Grammar for a subset of PHP's serialize() format.
;; Known limitations: string payloads may not contain double quotes,
;; declared sizes are not checked, and objects/references are unsupported.
(def serialized-php-parser
  (insta/parser
   "
   <S> = expr
   <expr> = value+
   <value> = string | integer | double | boolean | null | array
   <digit> = #'[0-9]'
   <number> = negative? (decimal-num | integer-num)
   <negative> = '-'
   <integer-num> = digit+
   <decimal-num> = integer-num '.' integer-num
   <zero-or-one> = '0'|'1'
   size = digit+
   key = (string | integer)
   <val> = value
   array = <'a:'> <size> <':{'> (key val)* <'}'> <';'>?
   boolean = <'b:'> zero-or-one <';'>
   null = <'N;'>
   integer = <'i:'> number <';'>
   double = <'d:'> number <';'>
   string = <'s:'> <size> <':\\\"'> #'([^\"]|\\.)*' <'\\\";'>
   "))

(defn deserialize-php
  "Parses serialized PHP data and returns the first top-level value.
  Arrays become maps; string keys are converted to keywords."
  [data]
  (first (insta/transform
          {:key     (fn [k] (if (string? k) (keyword k) k))
           :string  str
           :null    (constantly nil)
           :boolean (partial = "1")
           ;; PHP integers are 64-bit on most platforms, so parse as long.
           :integer (fn [& more] (Long/parseLong (apply str more)))
           :double  (fn [& more] (Double/parseDouble (apply str more)))
           ;; Children alternate key, value; pair them up into a map.
           :array   (fn [& more]
                      (if (even? (count more))
                        (->> more
                             (partition 2)
                             (map vec)
                             (into {}))
                        (apply vector more)))}
          (serialized-php-parser data))))
@kochb

Make sure you have an up-to-date version of Clojure when using this library.

Instaparse requires Clojure v1.5.1 or later.

It took me a while to catch that; I was stuck on this nondescript error:

ClassNotFoundException clojure.lang.IHashEq

@kochb

There seems to be a bug in this library: it can't deserialize strings that contain double-quote (") characters, for example:

s:15:"{"key":"value"}";

I posted a Stack Overflow question seeking guidance. @terjesb, do you happen to see a solution to this problem?
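The failure can be reproduced without Instaparse. The snippet below assumes the gist's string terminal reaches java.util.regex as roughly `([^"]|\.)*` once the Clojure string and Instaparse escaping are removed; the namespace and `terminal-match` are hypothetical names. Tried at the start of the payload, the terminal stops at the first embedded double quote, so the `";` the grammar requires next is not there and the parse fails.

```clojure
(ns string-terminal-demo)

;; Approximation of the gist's string terminal: a run of characters that are
;; not a double quote (the \. alternative only adds a literal dot).
(def string-terminal #"([^\"]|\.)*")

;; What is left of s:15:"{"key":"value"}"; once the s:15:" prefix is consumed.
(def remainder "{\"key\":\"value\"}\";")

(defn terminal-match
  "Matches the terminal anchored at the start of s, the way a parser tries a
  terminal at its current position. Returns the matched text or nil."
  [s]
  (let [m (re-matcher string-terminal s)]
    (when (.lookingAt m)
      (.group m))))

;; The match is just "{", so the grammar's closing '";' cannot follow it.
(terminal-match remainder)
```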

@Engelberg

I'm not familiar with PHP, but from poking around the web, it looks like PHP uses escaping to nest double quotes inside double-quoted strings. So I'd expect it to print something more like this:
s:15:"{\"key\":\"value\"}";

Are you sure that s:15:"{"key":"value"}"; is a valid serialized entity?

@kochb
$ php -a
Interactive shell

php > echo serialize('{"key": "value"}');
s:16:"{"key": "value"}";
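The transcript shows that serialize() writes the payload verbatim and relies on the size field alone, so embedded quotes are never escaped. A minimal Clojure mimic of that string case, assuming UTF-8 text (PHP strings are raw bytes, so the size is a byte count, not a character count); `php-serialize-string` is a hypothetical helper, not part of any library:

```clojure
(ns php-serialize-string-sketch
  (:import (java.nio.charset StandardCharsets)))

(defn php-serialize-string
  "Mimics PHP's serialize() for a string, assuming UTF-8 encoding:
  s:<byte length>:\"<payload>\"; with the payload written as-is,
  quotes included."
  [^String s]
  (str "s:" (alength (.getBytes s StandardCharsets/UTF_8)) ":\"" s "\";"))

;; Same input as the php -a session above.
(php-serialize-string "{\"key\": \"value\"}")
```

Because the length is counted in bytes, a single non-ASCII character such as "é" yields `s:2:`.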
@kochb

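It is not possible to decode a length-prefixed string, such as those used in PHP's serialize notation, with a context-free grammar like BNF: where the string ends is determined by the value of its size field, not by a closing delimiter the grammar could match.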

Credit to A. Webb for suggesting an alternative implementation.
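A hand-written reader sidesteps the length-prefix problem by reading the size field first and then taking exactly that many bytes, so embedded quotes need no escaping. The sketch below is neither the gist's grammar nor A. Webb's Parse-EZ version; it is a hypothetical plain-Clojure decoder (`unserialize`, `read-value`, `expect` and `read-until` are illustrative names) that assumes UTF-8 input, covers `N`, `b`, `i`, `d`, `s` and `a` values, and does not handle objects, references, or PHP's `INF`/`NAN` doubles.

```clojure
(ns php-unserialize-sketch
  (:import (java.nio.charset StandardCharsets)))

;; Each reader takes the input as a byte array plus a byte offset and
;; returns [value next-offset].

(defn- expect
  "Checks that the ASCII text lit appears at offset pos; returns the offset after it."
  [^bytes bs pos ^String lit]
  (doseq [i (range (count lit))]
    (let [j (+ pos i)]
      (when-not (and (< j (alength bs))
                     (= (int (.charAt lit (int i))) (aget bs j)))
        (throw (ex-info (str "Expected " (pr-str lit)) {:pos pos})))))
  (+ pos (count lit)))

(defn- read-until
  "Reads ASCII text from pos up to the delimiter character delim.
  Returns [text offset-after-delim]."
  [^bytes bs pos delim]
  (loop [i pos]
    (cond
      (>= i (alength bs)) (throw (ex-info "Unterminated token" {:pos pos}))
      (= (int delim) (aget bs i))
      [(String. bs (int pos) (int (- i pos)) StandardCharsets/US_ASCII) (inc i)]
      :else (recur (inc i)))))

(defn read-value
  "Reads one serialized value starting at byte offset pos; returns [value next-offset]."
  [^bytes bs pos]
  (when (>= pos (alength bs))
    (throw (ex-info "Unexpected end of input" {:pos pos})))
  (case (char (aget bs pos))
    \N [nil (expect bs pos "N;")]
    \b (let [[t p] (read-until bs (expect bs pos "b:") \;)] [(= t "1") p])
    \i (let [[t p] (read-until bs (expect bs pos "i:") \;)] [(Long/parseLong t) p])
    ;; PHP's INF, -INF and NAN spellings are not handled here.
    \d (let [[t p] (read-until bs (expect bs pos "d:") \;)] [(Double/parseDouble t) p])
    \s (let [[t p] (read-until bs (expect bs pos "s:") \:)
             n     (Long/parseLong t)
             start (expect bs p "\"")
             end   (+ start n)]
         ;; The size field says how many bytes to take, so no search for a
         ;; closing quote is needed and embedded quotes are harmless.
         (when (> end (alength bs))
           (throw (ex-info "String runs past end of input" {:pos pos})))
         [(String. bs (int start) (int n) StandardCharsets/UTF_8)
          (expect bs end "\";")])
    \a (let [[t p] (read-until bs (expect bs pos "a:") \:)
             n     (Long/parseLong t)]
         (loop [i 0, p (expect bs p "{"), m {}]
           (if (< i n)
             (let [[k p] (read-value bs p)
                   [v p] (read-value bs p)]
               (recur (inc i) p (assoc m k v)))
             [m (expect bs p "}")])))
    (throw (ex-info "Unsupported type tag" {:pos pos}))))

(defn unserialize
  "Decodes one complete serialized PHP value, assumed to be UTF-8 encoded."
  [^String s]
  (let [bs    (.getBytes s StandardCharsets/UTF_8)
        [v p] (read-value bs 0)]
    (when-not (= p (alength bs))
      (throw (ex-info "Trailing input" {:pos p})))
    v))
```

Arrays come back as maps keyed by strings or longs, close to what the gist returns but without its keyword conversion; nested arrays work because `read-value` calls itself for every key and value.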

@terjesb
Owner

@kochb: You are right, this Gist doesn't parse all valid serialized PHP or check for valid lengths, etc. It also returns a map rather than a vector for arrays. It is still useful, though, and works for me when parsing some very specific input. I just tried the alternative SO implementation mentioned above. That implementation currently lacks support for null (sphp-null does not work), so I assume you don't have nulls in your input. Since I'm not familiar with Parse-EZ, I haven't been able to fix it, but I've added a comment describing the issue.
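Where vector output is preferred for PHP "list" arrays, one option is a post-processing step that converts any map whose keys are exactly 0..n-1. This is a hypothetical helper (`php-list->vector` is an illustrative name), assuming integer keys arrive as Clojure integers, as they do after the gist's `:key` transform; note that it also turns an empty PHP array into `[]`.

```clojure
(ns php-array-as-vector-sketch)

(defn php-list->vector
  "If m is a map whose keys are exactly the integers 0..n-1, returns its
  values as a vector in key order; otherwise returns m unchanged."
  [m]
  (if (and (map? m)
           (= (set (keys m)) (set (range (count m)))))
    (mapv m (range (count m)))
    m))

(php-list->vector {1 "b", 0 "a"})
```

To convert nested arrays as well, the helper could be applied with `clojure.walk/postwalk`, which visits inner maps before outer ones.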
