Skip to content

Instantly share code, notes, and snippets.

@nikic
Last active December 5, 2024 16:13
Show Gist options
  • Save nikic/5015323 to your computer and use it in GitHub Desktop.
Save nikic/5015323 to your computer and use it in GitHub Desktop.
Post explaining why objects often use less memory than arrays (in PHP)

Why objects (usually) use less memory than arrays in PHP

This is just a small post in response to this tweet by Julien Pauli (who by the way is the release manager for PHP 5.5). In the tweet he claims that objects use more memory than arrays in PHP. Even though it can be like that, it's not true in most cases. (Note: This only applies to PHP 5.4 or newer.)

The reason why it's easy to assume that objects are larger than arrays is because objects can be seen as an array of properties and a bit of additional information (like the class it belongs to). And as array + additional info > array it obviously follows that objects are larger. The thing is that in most cases PHP can optimize the array part of it away. So how does that work?

The key here is that objects usually have a predefined set of keys, whereas arrays don't:

<?php
class Test {
    public $foo, $bar, $baz; // <-- Predefined keys
    
    public function __construct($foo, $bar, $baz) {
        $this->foo = $foo;
        $this->bar = $bar;
        $this->baz = $baz;
    }
}

$obj = new Test(1, 2, 3);
$arr = ['foo' => 1, 'bar' => 2, 'baz' => 3]; // <-- No predefined keys

Because the properties for the object are predefined PHP no longer has to store the data in a hashtable, but instead can say that $foo is proprety 0, $bar is proprety 1, $baz is property 2 and then just store the properties in a three-element C array.

This means that PHP only needs one hashtable in the class that does the property-name to offset mapping and uses a memory-efficient C-array in the individual objects. Arrays on the other hand need the hashtable for every array.

To give you some numbers, let's quickly compare the different structures used by arrays and objects.

For arrays there are the HashTable structure (one per array) and the Bucket structure (one per element):

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
} HashTable;

typedef struct bucket {
    ulong h;
    uint nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    const char *arKey;
} Bucket;

Assuming a 64-bit build both the HashTable and the Bucket use 8*9 + 16 = 88 bytes each (the 16 bytes are allocation overhead). Furthermore buckets need an additional 8 bytes for a pointer from the arBuckets array (actually it's a bit more due to power-of-two rounding). And due to the allocation overhead for arBuckets the hashtable get's another 16 bytes extra. All in all, for an array with n elements you need approximately 104 + 96*n bytes (which is a freaking lot if you think about it).

For (userland) objects there are also two structures. The first is the object store bucket and the second is the actual zend_object:

typedef struct _zend_object_store_bucket {
    zend_bool destructor_called;
    zend_bool valid;
    zend_uchar apply_count;
    union _store_bucket {
        struct _store_object {
            void *object;
            zend_objects_store_dtor_t dtor;
            zend_objects_free_object_storage_t free_storage;
            zend_objects_store_clone_t clone;
            const zend_object_handlers *handlers;
            zend_uint refcount;
            gc_root_buffer *buffered;
        } obj;
        struct {
            int next;
        } free_list;
    } bucket;
} zend_object_store_bucket;

typedef struct _zend_object {
    zend_class_entry *ce;
    HashTable *properties;   // <-- not usually used
    zval **properties_table;
    HashTable *guards;       // <-- not usually used
} zend_object;

The object store bucket needs 8*8 = 64 bytes (note that here there are no 16 bytes allocation overhead, because the object store is mass allocated). The zend_object needs another 4*8 + 16 = 48. Furthermore we need 16 bytes as allocation overhead for the properties_table and then 8 bytes per element in it. (The properties_table here obviously is the C-array I referred to above. This is what stores the property data). So what you get in the end is 128 + 8*n.

Now compare those two values: 104 + 96*n for arrays and 128 + 8*n for objects. As you can see the "base size" for objects is larger, but the per-property cost is twelve times smaller. A few examples (with different amount of properties):

N  | Array | Object
------------------
1  |  200  | 136
3  |  392  | 152
10 | 1064  | 208

It should be clear that arrays use quite a bit more memory and the difference gets larger the more properties you have.

Note though that in the above I have been considering objects with declared properties. PHP also allows "dynamic" properties (e.g. what stdClass lives off). In this case there is no way around using a hashtable (stored in zend_object.properties). Another case where hashtables are used is if the class uses __get-style magic. These magic property methods use recursion guards which are stored in the zend_object.guards hashtable.

Okay, so what do we conclude from this? Some points:

  • Upgrade to PHP 5.4 if you haven't yet! PHP 5.3 doesn't yet have this cool optimization.
  • Declaring properties isn't just a best practice for class design, it will actually also save you a good bit of memory.
  • Not using objects because they are "too heavy on the memory" is dumb. At least if arrays are the alternative.

And two more interesting (or maybe not) facts that are tangentially related:

  • The very same optimization is also used for symbol tables. Most of the time PHP will not actually create hashtables that contain your variables, instead it will just use a C-array with the variables. Only if you use things like variable-variables PHP will create a real symbol hashtable.
  • When looking up a property PHP often doesn't even have to access the hashtable containing the property-name to offset mappings. The property_info structure that contains the relevant information is polymorphically cached in the op array.

~nikic

@MZAWeb
Copy link

MZAWeb commented Apr 3, 2016

@nikic is this still true on PHP 7+?

@LC43
Copy link

LC43 commented Dec 20, 2016

Hi everyone. i'm a bit confused, maybe because i have been up all night coding, but...

i think this

"And as array + additional info > array"

should be " object + additional info" and that

"The thing is that in most cases PHP can optimize the array"

should end in "object" too.

buckets need an additional 8 bytes for a pointer from the arBuckets

i think it should be "hashtables",

right?

thanks,

@LC43
Copy link

LC43 commented Dec 20, 2016

also, @MZAWeb, you probably have already found that the hashtable:

In PHP 7 the value is down to 36 bytes, or 32 bytes for the packed case.

-- https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html

@vonhraban
Copy link

I'd love to see similar stats for PHP7

@saurabhchandrapatel
Copy link

great

@drupalmk
Copy link

What About ArrayObject vs array?

@cre-mer
Copy link

cre-mer commented Sep 26, 2019

Great article!

@erlangparasu
Copy link

Thanks!

@deleugpn
Copy link

deleugpn commented Feb 24, 2020

I just found this gist and was running @sc0ttkclark scripts against PHP 7.3 and I see arrays taking LESS memory than objects with defined properties?

/app # php -v
PHP 7.3.14 (cli) (built: Jan 27 2020 21:40:40) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.14, Copyright (c) 1998-2018 Zend Technologies

/app # echo '<?php $s = array(); for($x=0;$x<1000;$x++){ $s[] = array("name"=>"Adam","age"=>35); }; echo memory_get_peak_usage() . PHP_EOL;' | php
424136

/app # echo '<?php $s = array(); class MyArrayObject{ public $name, $age; public function __construct( $name, $age ) { $this->name = $name; $this->age = $age; } } for( $x=0;$x<1000;$x++){ $o
 = new MyArrayObject( "Adam", 35 );  $s[] = $o;} echo memory_get_peak_usage() . PHP_EOL; ' | php
502696

Arrays seem to be spending nearly 15% less memory than the object, which seems to contradict the gist. What changed in recent PHP that made arrays more efficient? Is there a way to get objects to be more efficient?

cc @nikic

@nikic
Copy link
Author

nikic commented Feb 24, 2020

@deleugpn You're using the same array for each element, but different objects. You need to include something dynamic inside the array to measure realistic memory usage.

@deleugpn
Copy link

I came back here to reply to my own comment, but Nikic already beat me to it. I decided to drop the dummy script and run it against my real use case and I saw a ~15% memory usage drop when using Object instead of Array.

@vrkansagara
Copy link

This was very insightful, thanks!

@RichardTMiles
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment