Post explaining why objects often use less memory than arrays (in PHP)

Why objects (usually) use less memory than arrays in PHP

This is just a small post in response to this tweet by Julien Pauli (who, by the way, is the release manager for PHP 5.5). In the tweet he claims that objects use more memory than arrays in PHP. Even though that can be the case, it's not true in most cases. (Note: This only applies to PHP 5.4 or newer.)

The reason it's easy to assume that objects are larger than arrays is that an object can be seen as an array of properties plus a bit of additional information (like the class it belongs to). And since array + additional info > array, it obviously follows that objects are larger. The thing is that in most cases PHP can optimize the array part of it away. So how does that work?

The key here is that objects usually have a predefined set of keys, whereas arrays don't:

<?php
class Test {
    public $foo, $bar, $baz; // <-- Predefined keys

    public function __construct($foo, $bar, $baz) {
        $this->foo = $foo;
        $this->bar = $bar;
        $this->baz = $baz;
    }
}

$obj = new Test(1, 2, 3);
$arr = ['foo' => 1, 'bar' => 2, 'baz' => 3]; // <-- No predefined keys

Because the object's properties are predefined, PHP no longer has to store the data in a hashtable. Instead it can say that $foo is property 0, $bar is property 1 and $baz is property 2, and then just store the properties in a three-element C array.

This means that PHP only needs one hashtable per class, which does the property-name-to-offset mapping, and can use a memory-efficient C array in the individual objects. Arrays, on the other hand, need a hashtable for every single array.
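To make the idea concrete, here is a minimal C sketch of that layout. It is a toy model, not the real Zend code: toy_class, toy_object, prop_offset and read_prop are hypothetical names, and the class's name-to-offset mapping is a linear scan over a name list rather than the hashtable PHP actually uses. The point is only that each object shrinks to a class pointer plus a plain C array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model (NOT the real Zend structures).
 * The class owns the single name -> offset mapping, shared by all objects.
 * A linear scan stands in for the hashtable PHP uses for this. */
typedef struct {
    const char *const *prop_names; /* declared property names, in order */
    size_t num_props;
} toy_class;

/* Each object stores only its class and a C array of values,
 * indexed by the offsets the class hands out. */
typedef struct {
    const toy_class *ce;
    long *props; /* plays the role of properties_table: one slot per property */
} toy_object;

/* Resolve a property name to its slot index via the class, or -1. */
static long prop_offset(const toy_class *ce, const char *name) {
    for (size_t i = 0; i < ce->num_props; i++) {
        if (strcmp(ce->prop_names[i], name) == 0) {
            return (long)i;
        }
    }
    return -1;
}

/* Read a declared property: one lookup in the class, then direct indexing. */
static long read_prop(const toy_object *obj, const char *name) {
    long off = prop_offset(obj->ce, name);
    assert(off >= 0 && "only declared properties are modeled here");
    return obj->props[off];
}
```

For the Test class above, the class would hold {"foo", "bar", "baz"} once, and each object would hold just a three-slot array such as {1, 2, 3}; prop_offset(ce, "baz") yields 2, and reading $obj->bar becomes props[1].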

To give you some numbers, let's quickly compare the different structures used by arrays and objects.

For arrays there are the HashTable structure (one per array) and the Bucket structure (one per element):

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
} HashTable;

typedef struct bucket {
    ulong h;
    uint nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    const char *arKey;
} Bucket;

Assuming a 64-bit build, the HashTable and the Bucket each use 8*9 + 16 = 88 bytes (the 16 bytes are allocation overhead). Furthermore, each bucket needs an additional 8 bytes for the pointer to it in the arBuckets array (actually it's a bit more due to power-of-two rounding). And due to the allocation overhead for arBuckets, the hashtable gets another 16 bytes extra. All in all, for an array with n elements you need approximately 104 + 96*n bytes (which is a freaking lot if you think about it).
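The byte counts above can be checked with sizeof. The sketch below copies the two structs using stand-in typedefs (z_uint, z_ulong, zend_bool_t and a generic dtor_func_t are assumptions replacing the Zend headers), assumes an LP64 build such as 64-bit Linux or macOS, and takes the 16-byte allocation overhead from the post rather than measuring it. Like the post, it ignores the zvals themselves and the power-of-two rounding of arBuckets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Zend typedefs, assuming an LP64 build. */
typedef uint32_t z_uint;        /* "uint" in the Zend source */
typedef uint64_t z_ulong;       /* "ulong" (unsigned long) on LP64 */
typedef unsigned char zend_bool_t;
typedef void (*dtor_func_t)(void *);

typedef struct bucket {
    z_ulong h;
    z_uint nKeyLength;          /* followed by 4 bytes of padding */
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    const char *arKey;
} Bucket;                       /* 72 bytes = 8*9 */

typedef struct hashtable {
    z_uint nTableSize;
    z_uint nTableMask;
    z_uint nNumOfElements;      /* padding to align the next field */
    z_ulong nNextFreeElement;
    Bucket *pInternalPointer;
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;
    dtor_func_t pDestructor;
    zend_bool_t persistent;
    unsigned char nApplyCount;
    zend_bool_t bApplyProtection; /* tail padded to a multiple of 8 */
} HashTable;                    /* 72 bytes = 8*9 */

enum { ALLOC_OVERHEAD = 16 };   /* per-allocation overhead, as stated in the post */

/* Approximate bytes for a PHP 5.4 array with n elements (zvals not counted). */
static size_t array_bytes(size_t n) {
    size_t table = sizeof(HashTable) + ALLOC_OVERHEAD  /* 72 + 16 = 88 */
                 + ALLOC_OVERHEAD;                     /* arBuckets allocation */
    size_t per_elem = sizeof(Bucket) + ALLOC_OVERHEAD  /* 72 + 16 = 88 */
                    + sizeof(Bucket *);                /* its slot in arBuckets */
    return table + per_elem * n;                       /* 104 + 96*n */
}
```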

For (userland) objects there are also two structures. The first is the object store bucket and the second is the actual zend_object:

typedef struct _zend_object_store_bucket {
    zend_bool destructor_called;
    zend_bool valid;
    zend_uchar apply_count;
    union _store_bucket {
        struct _store_object {
            void *object;
            zend_objects_store_dtor_t dtor;
            zend_objects_free_object_storage_t free_storage;
            zend_objects_store_clone_t clone;
            const zend_object_handlers *handlers;
            zend_uint refcount;
            gc_root_buffer *buffered;
        } obj;
        struct {
            int next;
        } free_list;
    } bucket;
} zend_object_store_bucket;

typedef struct _zend_object {
    zend_class_entry *ce;
    HashTable *properties;   // <-- not usually used
    zval **properties_table;
    HashTable *guards;       // <-- not usually used
} zend_object;

The object store bucket needs 8*8 = 64 bytes (note that there is no 16-byte allocation overhead here, because the object store is mass-allocated). The zend_object needs another 4*8 + 16 = 48 bytes. Furthermore, we need 16 bytes of allocation overhead for the properties_table and then 8 bytes per element in it. (The properties_table is the C array I referred to above; it is what stores the property data.) So what you get in the end is 64 + 48 + 16 + 8*n = 128 + 8*n bytes.
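The same check works for the object side. Again this is a sketch under stated assumptions: an LP64 build, opaque stand-ins (fn_ptr, toy_ht, store_bucket and toy_zend_object are hypothetical names) for the Zend pointer types, since only their 8-byte size matters, and the post's 16-byte allocation overhead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins, assuming an LP64 build; only the sizes matter here. */
typedef unsigned char zend_bool_t;
typedef unsigned char zend_uchar_t;
typedef uint32_t zend_uint_t;
typedef void (*fn_ptr)(void);   /* dtor / free_storage / clone callbacks */
typedef struct toy_ht toy_ht;   /* opaque stand-in for HashTable */

typedef struct {
    zend_bool_t destructor_called;
    zend_bool_t valid;
    zend_uchar_t apply_count;   /* padded to 8 before the union */
    union {
        struct {
            void *object;
            fn_ptr dtor;
            fn_ptr free_storage;
            fn_ptr clone;
            const void *handlers;
            zend_uint_t refcount;   /* followed by 4 bytes of padding */
            void *buffered;
        } obj;
        struct {
            int next;
        } free_list;
    } bucket;
} store_bucket;                 /* 64 bytes = 8*8 */

typedef struct {
    const void *ce;             /* zend_class_entry * */
    toy_ht *properties;         /* not usually used */
    void **properties_table;    /* the per-object C array */
    toy_ht *guards;             /* not usually used */
} toy_zend_object;              /* 32 bytes = 4*8 */

enum { ALLOC_OVERHEAD = 16 };

/* Approximate bytes for an object with n declared properties. */
static size_t object_bytes(size_t n) {
    return sizeof(store_bucket)                     /* 64, mass-allocated */
         + sizeof(toy_zend_object) + ALLOC_OVERHEAD /* 32 + 16 = 48 */
         + ALLOC_OVERHEAD                           /* properties_table allocation */
         + sizeof(void *) * n;                      /* one pointer per property */
}
```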

Now compare those two values: 104 + 96*n for arrays and 128 + 8*n for objects. As you can see, the "base size" for objects is larger, but the per-property cost is twelve times smaller (8 vs. 96 bytes). A few examples with different numbers of properties:

N  | Array (bytes) | Object (bytes)
-----------------------------------
1  |           200 |            136
3  |           392 |            152
10 |          1064 |            208

It should be clear that arrays use quite a bit more memory and the difference gets larger the more properties you have.
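Plugging numbers into the two closed forms reproduces the table. The difference is (104 + 96n) - (128 + 8n) = 88n - 24, which is negative only for n = 0: an empty array (104 bytes) is smaller than an empty object (128 bytes), but from the first property on the array estimate is larger. The small C program below (function names are illustrative) recomputes the rows and searches for that crossover.

```c
#include <assert.h>
#include <stdio.h>

/* Closed-form estimates from the post (64-bit PHP 5.4, declared properties). */
static unsigned long array_bytes(unsigned long n)  { return 104 + 96 * n; }
static unsigned long object_bytes(unsigned long n) { return 128 + 8 * n; }

/* Smallest n at which the array estimate exceeds the object estimate. */
static unsigned long crossover(void) {
    unsigned long n = 0;
    while (array_bytes(n) <= object_bytes(n)) {
        n++;
    }
    return n;
}

/* Print the comparison table for the given property counts. */
static void print_table(const unsigned long *ns, size_t count) {
    assert(ns != NULL || count == 0);
    printf("N  | Array (bytes) | Object (bytes)\n");
    for (size_t i = 0; i < count; i++) {
        printf("%-2lu | %13lu | %14lu\n", ns[i], array_bytes(ns[i]), object_bytes(ns[i]));
    }
}
```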

Note, though, that in the above I have been considering objects with declared properties. PHP also allows "dynamic" properties (which is what stdClass relies on). In this case there is no way around using a hashtable (stored in zend_object.properties). Another case where hashtables are used is when the class uses __get-style magic: these magic property methods use recursion guards, which are stored in the zend_object.guards hashtable.
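The dynamic-property case can be pictured as a fallback path: names the class declares resolve to a fixed slot, and anything else lands in a per-object table that only exists because of that undeclared write. This is a hypothetical model (toy_class, toy_object, write_prop and the fixed-capacity MAX_DYNAMIC list are all assumptions); in PHP the fallback is the zend_object.properties hashtable, and this sketch does not try to mirror how the engine fills it. The recursion guards for __get-style magic are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: declared properties live in a C array indexed via
 * the class's name list; undeclared ones go to a per-object table
 * (a tiny linear list here, a real HashTable in PHP). */
enum { MAX_DYNAMIC = 8 };

typedef struct {
    const char *const *declared;
    size_t num_declared;
} toy_class;

typedef struct {
    const toy_class *ce;
    long *props_table;                 /* declared: one slot each */
    const char *dyn_keys[MAX_DYNAMIC]; /* dynamic: created on demand; keys must outlive the object */
    long dyn_vals[MAX_DYNAMIC];
    size_t num_dynamic;
} toy_object;

static long declared_offset(const toy_class *ce, const char *name) {
    for (size_t i = 0; i < ce->num_declared; i++) {
        if (strcmp(ce->declared[i], name) == 0) {
            return (long)i;
        }
    }
    return -1;
}

/* Write a property: declared names go to their fixed slot,
 * everything else falls back to the per-object dynamic table. */
static void write_prop(toy_object *obj, const char *name, long value) {
    long off = declared_offset(obj->ce, name);
    if (off >= 0) {
        obj->props_table[off] = value;
        return;
    }
    for (size_t i = 0; i < obj->num_dynamic; i++) {
        if (strcmp(obj->dyn_keys[i], name) == 0) {
            obj->dyn_vals[i] = value;  /* update an existing dynamic property */
            return;
        }
    }
    assert(obj->num_dynamic < MAX_DYNAMIC && "toy table is full");
    obj->dyn_keys[obj->num_dynamic] = name;
    obj->dyn_vals[obj->num_dynamic] = value;
    obj->num_dynamic++;
}
```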

Okay, so what do we conclude from this? Some points:

  • Upgrade to PHP 5.4 if you haven't yet! PHP 5.3 doesn't yet have this cool optimization.
  • Declaring properties isn't just a best practice for class design, it will actually also save you a good bit of memory.
  • Not using objects because they are "too heavy on the memory" is dumb. At least if arrays are the alternative.

And two more interesting (or maybe not) facts that are tangentially related:

  • The very same optimization is also used for symbol tables. Most of the time PHP will not actually create hashtables that contain your variables; instead it will just use a C array with the variables. Only if you use things like variable variables will PHP create a real symbol hashtable.
  • When looking up a property PHP often doesn't even have to access the hashtable containing the property-name to offset mappings. The property_info structure that contains the relevant information is polymorphically cached in the op array.
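The property_info caching mentioned in the last point is a form of inline caching. The sketch below shows only the simpler monomorphic variant (one remembered class per access site), not PHP's actual runtime cache; cache_slot, cached_offset and the slow_lookups counter are hypothetical names used for illustration. A cache hit skips the name-based lookup entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *const *names;
    size_t num_props;
} toy_class;

/* One cache slot per property-access site (hypothetical layout):
 * remembers the class seen last time and the offset it resolved to. */
typedef struct {
    const toy_class *cached_ce;
    size_t cached_offset;
} cache_slot;

static size_t slow_lookups = 0; /* counts name-based lookups, for illustration */

static long slow_offset(const toy_class *ce, const char *name) {
    slow_lookups++;
    for (size_t i = 0; i < ce->num_props; i++) {
        if (strcmp(ce->names[i], name) == 0) {
            return (long)i;
        }
    }
    return -1;
}

/* Resolve `name` for an object of class `ce`, consulting the cache first. */
static long cached_offset(cache_slot *slot, const toy_class *ce, const char *name) {
    assert(slot != NULL && ce != NULL && name != NULL);
    if (slot->cached_ce == ce) {
        return (long)slot->cached_offset; /* hit: no name lookup at all */
    }
    long off = slow_offset(ce, name);     /* miss: do the lookup once... */
    if (off >= 0) {
        slot->cached_ce = ce;             /* ...and remember it for next time */
        slot->cached_offset = (size_t)off;
    }
    return off;
}
```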

~nikic

@marijn
marijn commented Feb 22, 2013

This was very insightful, thanks!

@Xeoncross
  • What about lookup cost? (Searching an array hashtable vs searching the object property list to get the id and using that to find the value in the object value array?)
  • What about the number of keys used?
  • How long before someone figures out how to do this with arrays?
@jorisvandesande

Nice write up of some php internals, thanks!

@miraage
miraage commented Feb 26, 2013

Interesting, thank you.

@ThePixelDeveloper

Cool. I'll be linking everyone to this when they bang on about arrays using less memory :P

@barryosull

Great write-up. There's also the issue of using objects as parameters instead of arrays. When you pass an object as a parameter, you actually pass a reference back to the object. When you pass an array as a parameter, PHP duplicates the array, thus doubling the memory the array was using.

@uioreanu

We recently had to downgrade several production servers because of the dramatically worse performance of 5.4.1 compared to previous versions. A simple test script loading a massive array into memory ran 20 times slower on newer PHP versions, which was very disappointing. 5.2.17 is now the "upgraded" version that is still usable.

@MarkBaker

@xeoncross

How long before someone figures out how to do this with arrays?

The SPL already has SplFixedArray.

@francisbesset

👍

@am06
am06 commented Mar 29, 2013

cool!

@tfont
tfont commented Oct 1, 2013

Wicked considerable read! Well elaborated on such a debatable subject.

@weierophinney

@barryosull PHP doesn't automatically copy an array when it is passed to a function. Internally, it uses a "copy-on-write" algorithm -- in other words, it will only copy the array if/when the function modifies it. As such, if you're only reading from the array, there's no additional overhead.
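The copy-on-write behavior described above can be sketched with a reference count: passing the array only shares it, and a write to a shared array is what triggers the copy (often called "separation"). This is a hypothetical illustration (cow_array, cow_new, cow_share, cow_set and cow_release are made-up names); PHP 5 tracks this on zvals via refcount and is_ref rather than on a struct like this.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted array illustrating copy-on-write. */
typedef struct {
    size_t refcount;
    size_t len;
    long *data;
} cow_array;

static cow_array *cow_new(size_t len) {
    cow_array *a = malloc(sizeof *a);
    assert(a != NULL);
    a->data = calloc(len, sizeof *a->data);  /* zero-initialized elements */
    assert(len == 0 || a->data != NULL);
    a->refcount = 1;
    a->len = len;
    return a;
}

/* "Passing" the array to a function: share it and bump the refcount, no copy. */
static cow_array *cow_share(cow_array *a) {
    a->refcount++;
    return a;
}

static void cow_release(cow_array *a) {
    if (--a->refcount == 0) {
        free(a->data);
        free(a);
    }
}

/* Write through *ap, separating (copying) first only if the array is shared. */
static void cow_set(cow_array **ap, size_t i, long value) {
    cow_array *a = *ap;
    assert(i < a->len);
    if (a->refcount > 1) {
        cow_array *copy = cow_new(a->len);
        memcpy(copy->data, a->data, a->len * sizeof *a->data);
        a->refcount--;  /* this holder stops referring to the shared array */
        *ap = copy;
        a = copy;
    }
    a->data[i] = value;
}
```

Reading through a shared cow_array costs nothing extra; only the first cow_set on a shared array pays for a copy, which matches the "only if/when the function modifies it" behavior described above.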

@wolffereast

Clear, concise, and extremely interesting. Thanks for the writeup

@webdevilopers

👍

@mattsparks

Good stuff!

@sc0ttkclark

Just cross-posting this from my other comment on a gist focused on memory+speed.

https://gist.github.com/Thinkscape/1136563#gistcomment-1561237

From PHP 5.6.12

# php -v
PHP 5.6.12 (cli) (built: Aug  6 2015 17:14:56) 
Copyright (c) 1997-2015 The PHP Group
Zend Engine v2.6.0, Copyright (c) 1998-2015 Zend Technologies
    with Zend OPcache v7.0.6-dev, Copyright (c) 1999-2015, by Zend Technologies

### PHP Array memory usage ###
# echo '<?php $s = array(); for($x=0;$x<1000;$x++){ $s[] = array("name"=>"Adam","age"=>35); }; echo memory_get_peak_usage(); ' | php
807336

### PHP ArrayObject() memory usage (no properties defined) ###
# echo '<?php $s = array(); for($x=0;$x<1000;$x++){ $o = new ArrayObject; $o->name = "Adam";  $o->age = 35;  $s[] = $o;} echo memory_get_peak_usage(); ' | php
1144440

### PHP MyArrayObject() memory usage (properties defined) ###
# echo '<?php $s = array(); class MyArrayObject{ public $name, $age; public function __construct( $name, $age ) { $this->name = $name; $this->age = $age; } } for( $x=0;$x<1000;$x++){ $o = new MyArrayObject( 'Adam', 35 );  $s[] = $o;} echo memory_get_peak_usage(); ' | php
583600


### PHP Array memory usage + time (no properties defined) ###
# time echo '<?php $s = array(); for($x=0;$x<100000;$x++){ $s[] = array("name"=>"Adam","age"=>35); }; echo memory_get_peak_usage(); ' | php
58871792
real    0m0.193s
user    0m0.087s
sys 0m0.107s

### PHP ArrayObject() memory usage + time (no properties defined) ###
# time echo '<?php $s = array(); for($x=0;$x<100000;$x++){ $o = new ArrayObject; $o->name = "Adam";  $o->age = 35;  $s[] = $o;} echo memory_get_peak_usage(); ' | php
100801296
real    0m0.320s
user    0m0.153s
sys 0m0.163s

### PHP MyArrayObject() memory usage + time (properties defined) ###
# time echo '<?php $s = array(); class MyArrayObject{ public $name, $age; public function __construct( $name, $age ) { $this->name = $name; $this->age = $age; } } for( $x=0;$x<100000;$x++){ $o = new MyArrayObject( 'Adam', 35 );  $s[] = $o;} echo memory_get_peak_usage(); ' | php
44004544
real    0m2.209s
user    0m1.007s
sys 0m1.190s

We can conclude the following at least in PHP 5.6:

  • Winner for Speed: Arrays are faster than objects (with undefined or defined properties)
  • Winner for Memory: Objects with defined properties use the least memory: roughly half of what objects with undefined properties use, and roughly 25% less than arrays

Ordered fastest to slowest for 100,000 items (using the "sys" time from the runs above):

  • 0.107s - Arrays
  • 0.163s - Objects with undefined properties
  • 1.190s - Objects with defined properties

Ordered Least to Most Memory Usage for 1,000 items:

  • ~583kb - Objects with defined properties
  • ~807kb - Arrays
  • ~1.1mb - Objects with undefined properties

I ran these from the Mac OS X terminal with a local PHP install, so the speeds won't be the same as on a real server under ongoing load.
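One way to read the memory numbers above is the marginal cost per item: subtracting the 1,000-item peak from the 100,000-item peak removes the fixed interpreter baseline. The sketch below does that arithmetic on the quoted memory_get_peak_usage() values (bench_run, runs, per_item and report are illustrative names). It is only an estimate: both figures still include the outer $s array's own per-element cost, which is the same for all three variants, and the allocator's chunk granularity. With these inputs it gives roughly 587 bytes per item for arrays, 1007 for ArrayObject and 439 for the declared-property class.

```c
#include <assert.h>
#include <stdio.h>

/* Peak memory figures quoted in the comment above (PHP 5.6.12, bytes). */
typedef struct {
    const char *label;
    double peak_1k;    /* memory_get_peak_usage() with 1,000 items */
    double peak_100k;  /* memory_get_peak_usage() with 100,000 items */
} bench_run;

static const bench_run runs[] = {
    {"array",                       807336.0,  58871792.0},
    {"ArrayObject (dynamic props)", 1144440.0, 100801296.0},
    {"MyArrayObject (declared)",    583600.0,  44004544.0},
};

/* Marginal bytes per item: the fixed baseline cancels in the difference. */
static double per_item(const bench_run *r) {
    assert(r != NULL);
    return (r->peak_100k - r->peak_1k) / (100000.0 - 1000.0);
}

static void report(void) {
    for (size_t i = 0; i < sizeof runs / sizeof runs[0]; i++) {
        printf("%-28s %7.1f bytes/item\n", runs[i].label, per_item(&runs[i]));
    }
}
```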

@KorvinSzanto

Just as an example of this: after instantiating 10,000 objects with just one property on the class, there's a difference of several megabytes of memory usage between declaring and not declaring that property!

[Screenshots: property not declared vs. property declared]

@Bilge
Bilge commented Jan 13, 2016

Not using objects because they are "too heavy on the memory" is dumb.

I love nikic, he is so kawaii.

@JDGrimes

The speed difference between arrays and objects seems to be much, much smaller in PHP 7.0 (scroll down to the second half of that comment).

@MZAWeb
MZAWeb commented Apr 3, 2016

@nikic is this still true on PHP 7+?

@LC43
LC43 commented Dec 20, 2016

Hi everyone. I'm a bit confused, maybe because I have been up all night coding, but...

I think this

"And as array + additional info > array"

should be "object + additional info", and that

"The thing is that in most cases PHP can optimize the array"

should end in "object" too.

buckets need an additional 8 bytes for a pointer from the arBuckets

I think it should be "hashtables",

right?

thanks,

@LC43
LC43 commented Dec 20, 2016 edited

Also, @MZAWeb, you have probably already found this about the hashtable:

In PHP 7 the value is down to 36 bytes, or 32 bytes for the packed case.

-- https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html
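For comparison with the 96 bytes per element computed in the post, the PHP 7 figures quoted above can be reproduced with a simplified layout: the zval is embedded in the Bucket (16 bytes), followed by the 8-byte hash and the 8-byte key pointer, plus one 4-byte slot in the hash part for non-packed arrays. The structs below are simplified stand-ins (toy_zval, toy_bucket7 and php7_per_element are hypothetical names, not the real definitions) and assume a 64-bit build. Note that the PHP 7 figure already includes the value itself, whereas the post's PHP 5 figure of 96 bytes excludes the separately allocated zval.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a 64-bit PHP 7 zval: an 8-byte value
 * plus two 32-bit fields (type info and an auxiliary slot). */
typedef struct {
    uint64_t value;
    uint32_t u1;
    uint32_t u2;
} toy_zval;                     /* 16 bytes */

/* Bucket layout as described in the linked post: the zval is embedded. */
typedef struct {
    toy_zval val;
    uint64_t h;                 /* hash / integer key */
    void *key;                  /* string key pointer, NULL for integer keys */
} toy_bucket7;                  /* 32 bytes */

/* Per-element cost: the bucket plus one uint32_t slot in the hash part,
 * which packed (list-like) arrays do not need. */
static size_t php7_per_element(int packed) {
    assert(packed == 0 || packed == 1);
    return sizeof(toy_bucket7) + (packed ? 0 : sizeof(uint32_t));
}
```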

@rgmeow
rgmeow commented Jan 18, 2017

The rub, of course, in using objects vs. arrays in PHP in many real-world cases revolves around collections of data. That is to say, if one has a dataset of 1000 complex structures, in OOP these use cases revolve around collections and the ability to select from said collection(s). In PHP one often sees object-oriented code that results in arrays of objects.

Since PHP is built atop C, it will never be able to keep surviving the evolution of the technology itself, the use cases coming, and then some.

The future of the Internet is literally pivoting atop "do anything anywhere on any device", albeit presently in early stages. Whether watching TV, playing complex console games or placing web phone calls, and a ton in between, the ability of computer languages to perform, scale and deliver becomes more and more paramount. Languages built atop general-purpose compiled programming languages will and do suffer, losing efficiency at literally every stage. PHP can never, for example, match C++, which can run on the order of 4000% faster given the same tasking as a straight-up command-line PHP application.

Without enormous changes to PHP providing native executable code that targets runtime server architectures and environments, its future is already sealed. Java has remained king in enterprise IP-enabled applications due to its CPU-targeted runtime JIT engine. Where it has lagged for most start-up developers is the inherent education needed to write good Java code. The Zend team, for example, writes good C code, resulting in PHP being a language capable of moderate application usage.

Years back it was impossible, for example, to write PC games in BASIC, and thus assembler and C++ filled those use cases. Now, proprietary sets of "engines" (frameworks) power much of the console gaming universe, with scripting and glue assemblies providing the ability for high performance on nearly any architecture as the devices become true smart clients and the servers become the runtime architectures.

That is the future.

Windows and the Mac OS are rapidly being moved towards an online client/server-based software architecture enabling device independence.

PHP simply does not have the facets necessary to support this at any form of reasonable scale given a world connected in.

In fact, this is exactly why Microsoft open sourced so much of their prized gem, the .NET framework, and why Mono exists. For nearly 20 years now they have been methodically refining their entire development platform. Visual Basic was essentially shelled over the top of C# to provide a path for businesses who'd relied heavily on VB. C++, a bastion of the core ability to actually target CPU architectures, became an equitable language to help produce their JIT and MSIL runtime environment, while in parallel they developed not only the most refined IDE environment but also the most robust framework that exists in .NET, with C# becoming the flagship.

While for years the mass of the Microsoft environment (and the cost to obtain it) put it out of reach of the lion's share of small startup projects, PHP thrived. It became the tool of choice: easy access, easy to learn, loosely typed, all of it. PHP filled the vacuum, if you will, as the choices were Java, PHP or ASP.NET. The Microsoft platform was bloated, slower and considerably harder to learn than PHP and in many respects even Java.

That has now all changed. The Microsoft platform is faster than both Java and PHP for complex applications. While Java is slightly more memory efficient, that's where it ends. PHP is no longer even in the ballpark in performance or resource usage, even with PHP 7. PHP 7 exists because if Zend did not do something to revamp performance, its fate would be sealed much faster. C# is on the order of 400% faster without attempting to optimize a thing. The main reason: it compiles to MSIL, which the JIT then turns into machine code native to the target processor (server, PC, mobile, etc.). The abilities to use multiple cores, process in parallel and much more now place it at the head of the pack.

This did not happen by happenstance, none of it. Microsoft has been moving this way for nearly 20 years and had broad support from governments across the globe in doing so. Those entities desire one mainstream technology platform, as that affords the capability to control it. Linux was clearly not going away, and that is why the Mono project came to be and why Microsoft open sourced the lion's share of their gem, the .NET framework, which for ages moved ahead as proprietary software with its source code unavailable.

The reality, perhaps sadly or not, is that Microsoft is now literally the only company that has the software technologies to natively bridge all the smart client devices coming forth. When commercial entities come along with better-targeted mousetraps that gain acceptance, not only does Microsoft jump in, but it helps fund them.

NodeJS became a game changer. While PHP coders often say "It does not impact PHP", the reality is that NodeJS is another nail in PHP's coffin. It again moves towards the smart client and efficient server-based architecture. While Zend can attempt to move towards native scalable executables, they are in fact facing a wall they cannot overcome in the actual server software architectures. It's not just PHP but also the server stack. The concepts of, for example, how Apache and similar web server packages work are brick walls for PHP's ability to advance as well.

In the Microsoft universe several highly targeted, high-performance open projects are well underway now, and viable PHP money-making prospects continue to dry up. People point to WordPress, which is an anomaly: an anomaly trying to pivot towards appearing cutting edge while under its hood it struggles in a mass of spaghetti functions built atop how PHP performed some 10+ years ago.

Masses of developers are now trying to monetize what was free, as the dwindling money available from webmasters creates a race to the bottom, while technology moves ahead that will make advanced web publishing as easy as using an old Windows 98 desktop publishing package on steroids, with drag-and-drop assemblies, properties and associations.

The internet is being driven towards a "new internet" where barriers such as languages are breached, with talking websites and unified, nationally regulatable access from any smart client device. While programming materials are readily available to the layman right now, the core transports for large scale are quite proprietary, and the server-based technologies needed to service them are and will be priced out of reach of most small, more or less home-based or handful-of-people operations.

Again, it's not happenstance; these are not random events. There has been a methodical movement going forward, and it continues.

All things change, and on the Internet it's all about the consumer of web pages and the technologies. They will completely embrace the marvels coming, which will make what we have been accustomed to look the way DOS looked next to Windows.
