An Introduction on How Variables Are Handled by the PHP Interpreter
December 30, 2012. Posted by Tournas Dimitrios in PHP.
Typically we think of PHP variables as named containers of a value. Their content can be any type of data: a Boolean, an array, an integer, a resource or even a 10-megabit string. We often write code in which the same variable is copied several times while the script runs. A variable can be copied explicitly, by assigning its content to another variable ($a = $b). It is also copied implicitly in some situations: when it is passed into a function, when a function returns it, or when a loop such as foreach iterates over it. By default, PHP variables are assigned by value, unless we override this with an assignment by reference ($a =& $b).
One might say that when a variable is copied (implicitly or explicitly), its entire content is copied into the destination variable. That makes sense, you might say. Well, no. What would happen if every copy of a variable really duplicated its content? Simple: each copy would occupy extra memory, and that can add up to severe duplication and performance problems. PHP solves this “problem” with a copy-on-write strategy. In simple words, data is stored in buckets (zval containers). Each bucket is initially shared by all variables (borrowers) that need a copy of it. This state of affairs lasts until a “borrower” tries to modify its “copy”; at that moment the bucket is duplicated and the duplicate is assigned to that specific borrower. Now the borrower is the owner of the newly created bucket and can modify its contents as it pleases.
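The copy-on-write behaviour can be observed indirectly through memory usage. The script below is only a sketch: the variable names and the 1 MB size are our own choices, and the exact byte counts depend on the PHP version and build, so none are quoted here.

```php
<?php
// Sketch: watch copy-on-write through memory_get_usage().
// Names and sizes are illustrative assumptions.

$original = str_repeat('x', 1024 * 1024);   // a string of 1 MB, built at run time
$before   = memory_get_usage();

$copy = $original;                           // by-value assignment: the bucket is shared
$afterAssign = memory_get_usage();

$copy .= '!';                                // first write by the borrower: the bucket is duplicated
$afterWrite = memory_get_usage();

printf("cost of the assignment:  %d bytes\n", $afterAssign - $before);
printf("cost of the first write: %d bytes\n", $afterWrite - $afterAssign);
```

The assignment should cost next to nothing, while the first write should cost roughly the size of the string, because that is the moment the borrower receives its own bucket.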
PHP implements this intuitive technique by separating the variable name (also called a symbol) from its value (content). All symbols are stored in a symbol table, of which there is one per scope. There is a scope for the main script (i.e., the one requested through the browser), as well as one for every function or method. Content is stored in a container called a “zval”, which also stores three additional pieces of meta-information:
- type: the type of the data (Boolean, int, string and so on).
- is_ref: a Boolean that indicates whether the container belongs to a reference set, that is, whether it was bound with a “copy-by-reference” assignment. From this flag the PHP interpreter can tell the two “variable models” apart (assign-by-value, assign-by-reference).
- refcount: a number that represents how many variable names (symbols) point to this one zval container.
What PHP actually does is separate the variable (name) from its value (content); a variable name is nothing more than a pointer to a container. Each time a variable is copied, only its pointer is copied, not the zval container (of course, only up to the point where the borrower wants to modify that copy).
Just a reminder: when a variable is copied with the “copy-by-reference” assignment, it shares a common bucket with equal rights. The copy-on-write strategy applies only to copies made with the “copy-by-value” assignment.
Later on, 10 screenshots will clarify all these concepts. A couple of paragraphs of “boring” theory are absolutely necessary before these screenshots make any sense to you. Also, knowing how PHP handles variables “behind the scenes” might prove a time saver when working with debugging tools like Xdebug.
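The difference between the two kinds of copy is easy to see from the outside, without looking at any internals. In this small sketch the variable names and strings are our own; the comments state the values each variable holds afterwards.

```php
<?php
$x = 'original';
$y = $x;      // copy-by-value: $y borrows the bucket until somebody writes
$z =& $x;     // copy-by-reference: $z and $x share one bucket with equal rights

$y = 'changed through y';   // affects $y only
$z = 'changed through z';   // affects $x as well

var_dump($x);   // holds 'changed through z'
var_dump($y);   // holds 'changed through y'
var_dump($z);   // holds 'changed through z'
```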
Let’s recap:
- We have variables (symbols) that act as pointers to zval containers. Each zval container is referenced by one or more pointers (depending on how many times a variable was copied).
- Initializing a variable actually creates a symbol (pointer) with a corresponding container (zval). Each subsequent copy of that variable only copies its pointer and gives it a new name.
- Each zval container stores an “is_ref” indicator (Boolean) that designates how the copy was made. If the indicator is FALSE (0), the PHP interpreter applies the copy-on-write strategy. A TRUE value designates the container as a “common bucket”.
- Each zval container also stores a “refcount” value (integer). This value indicates how many symbols (pointers) refer to it (how many copies were made). Variables can also be deleted with unset($varName), which removes the symbol and decrements the refcount. When the refcount reaches zero (because the script has deleted every variable that pointed to the container), the zval container is destroyed and its memory is released. PHP's separate garbage collector is only needed for containers that keep each other alive through circular references.
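To make the recap concrete, here is a toy model of the bookkeeping written in plain PHP. It is only a sketch for building intuition: ToyZval and ToySymbolTable are hypothetical names invented for this example, the real engine is written in C, and details differ between PHP versions.

```php
<?php
// Toy model only: ToyZval and ToySymbolTable are hypothetical, not PHP internals.
final class ToyZval
{
    public $value;
    public $refcount = 0;
    public $isRef = false;

    public function __construct($value)
    {
        $this->value = $value;
    }
}

final class ToySymbolTable
{
    private $symbols = [];   // symbol name => ToyZval

    // Models: $name = <new value>
    public function write($name, $value)
    {
        $zval = $this->symbols[$name] ?? null;
        if ($zval !== null && ($zval->isRef || $zval->refcount === 1)) {
            $zval->value = $value;                  // sole owner or reference set: change in place
            return;
        }
        $this->bind($name, new ToyZval($value));    // new or shared symbol: gets its own bucket
    }

    // Models: $dst = $src
    public function copyByValue($dst, $src)
    {
        $source = $this->symbols[$src];
        $target = $this->symbols[$dst] ?? null;
        if ($source->isRef || ($target !== null && $target->isRef)) {
            $this->write($dst, $source->value);     // reference sets do not lend their bucket
            return;
        }
        $this->bind($dst, $source);                 // only the pointer is copied
    }

    // Models: $dst =& $src
    public function copyByReference($dst, $src)
    {
        $source = $this->symbols[$src];
        if (!$source->isRef && $source->refcount > 1) {
            $this->bind($src, new ToyZval($source->value));  // leave by-value borrowers behind
            $source = $this->symbols[$src];
        }
        $source->isRef = true;
        $this->bind($dst, $source);
    }

    // Models: unset($name)
    public function remove($name)
    {
        if (isset($this->symbols[$name])) {
            $this->symbols[$name]->refcount--;
            unset($this->symbols[$name]);
        }
    }

    public function describe($name)
    {
        $zval = $this->symbols[$name];
        return sprintf('%s: (refcount=%d, is_ref=%d) = %s',
            $name, $zval->refcount, (int) $zval->isRef, var_export($zval->value, true));
    }

    private function bind($name, ToyZval $zval)
    {
        $this->remove($name);
        $zval->refcount++;
        $this->symbols[$name] = $zval;
    }
}

$table = new ToySymbolTable();
$table->write('x', 'hello');        // x: refcount=1
$table->copyByValue('y', 'x');      // x and y share one bucket: refcount=2
$table->write('y', 33);             // y separates; x is back to refcount=1
echo $table->describe('x'), "\n";   // x: (refcount=1, is_ref=0) = 'hello'
echo $table->describe('y'), "\n";   // y: (refcount=1, is_ref=0) = 33
```

The model deliberately ignores types, scopes and freeing memory; its only job is to show when a pointer is shared and when a new bucket is created.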
Hooray, the boring theory is over; let’s look at a few practical examples. Xdebug will be used to mirror the internal workings of the PHP interpreter. Xdebug is a PHP extension that provides debugging and profiling capabilities. A future article will go into more detail on how to install and use this excellent PHP tool, but for now, just follow along.
Step 1: A new variable name (pointer) and its container (zval) are initialized in memory.
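For readers who want to inspect a container themselves, a minimal sketch follows. It assumes nothing about the original script behind the screenshot: the variable name and value are our own. xdebug_debug_zval() is only available when Xdebug is loaded, so the sketch falls back to the built-in debug_zval_dump(). The output format, and even whether a refcount is reported for a given value, depends on the PHP and Xdebug versions, so no expected output is shown.

```php
<?php
// Illustrative value, built at run time rather than written as a literal.
$x = str_repeat('PHP', 2);

if (function_exists('xdebug_debug_zval')) {
    xdebug_debug_zval('x');    // Xdebug takes the variable NAME as a string
} else {
    debug_zval_dump($x);       // built-in fallback, takes the variable itself
}
```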
Step 4: Notice what is happening in this step. A new value (integer 33) has been assigned to variable “z”. This time, a pointer is removed from the shared container (compare with the picture from the previous step) and a new container (zval) is created in memory (of course, with a new pointer). How does PHP know to do that? Simple: the “is_ref” indicator is FALSE while the container is shared, which tells PHP that a borrower may not change the original container, so it applies the copy-on-write strategy.
Step 4a: In this step we specify that the copy should be made with the “assignment by reference” method ($z =& $x). Again, all pointers refer to the same container (zval). Notice the “is_ref” indicator: this time it has a Boolean value of TRUE, which means that all pointers are permitted to change its content.
Step 4b: Continuing from the previous example, a value of type “string” is first assigned to the $x variable (and so to all other variables). Immediately afterwards, a new value (an integer) is assigned to $z. The “is_ref” indicator notifies PHP that all pointers have equal permissions and are allowed to change the content of the container. PHP happily changes the content of the original container (it does not duplicate the container, which saves a few bytes of memory).
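A sketch of this scenario is shown below. The setup lines are an assumption on our part (the original script is only visible in the screenshot), but the effect is the one described: a write through any name in the reference set changes the value, and even the type, seen through all of them.

```php
<?php
$x = 'a string';
$y =& $x;     // assumed setup: $y and $z both join $x's reference set
$z =& $x;

$z = 33;      // written through one name only

echo gettype($x), ' ', gettype($y), ' ', gettype($z), "\n";   // integer integer integer
```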
Step 4c: Something interesting is going on here. First, a string is assigned to $x. Then a “copy-by-value” is made for variables $y and $z; nothing new, as it is the same scenario as in “Step 4”. But this time, the same string is re-assigned to variable $x. The PHP interpreter does not recognize that nothing has actually changed since the previous assignment, and applies the copy-on-write strategy.
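The point that PHP does not compare the old and new content can be sketched with memory usage. This is not the script from the screenshot: the names and the 1 MB size are assumptions, and the strings are built at run time because literal strings may be handled differently by newer PHP versions.

```php
<?php
$size = 1024 * 1024;

$x = str_repeat('a', $size);
$y = $x;                          // $y shares $x's bucket

$before = memory_get_usage();
$x = str_repeat('a', $size);      // equal content, but a freshly built value
$after = memory_get_usage();

var_dump($x === $y);              // bool(true): the contents are equal
printf("extra memory: %d bytes\n", $after - $before);
```

Although $x and $y still compare as equal, the extra memory should be about the size of the string, because the two names no longer share a bucket.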
Step 4d: The scenario in this example is similar (not identical) to the previous one. This time the PHP interpreter recognizes that nothing has changed in variable $x since its previous state, and just ignores the event.
Step 4e: Although the same string is re-assigned to variable $z, the PHP interpreter applies the copy-on-write strategy and duplicates the zval container.
Step 5: This step is really interesting. At first the variables are assigned (like in the previous step), but they are immediately deleted (unset). Each unset() removes a pointer and decrements the refcount of the container. Once the two pointers are gone, the refcount is zero and we can no longer access the container (we don’t have any pointer anymore). What will happen? PHP destroys the container and hands its memory back to the script right away; it does not wait for a later cleanup pass. What, then, is PHP’s garbage collection functionality for? It deals with a special case: containers that refer to each other in a cycle never reach a refcount of zero, even after all their symbols are gone. Those are the real candidates for garbage collection, and PHP cleans them up when its internal buffer of possible cycles fills up or when gc_collect_cycles() is called.
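A rough way to watch what unset() does to memory is to compare memory_get_usage() around each call. This is a sketch rather than a reproduction of the screenshot: the variable names and the 1 MB size are assumptions, and byte counts vary by PHP version and build.

```php
<?php
$x = str_repeat('x', 1024 * 1024);   // 1 MB string, built at run time
$y = $x;                             // two pointers, one bucket
$shared = memory_get_usage();

unset($x);                           // one pointer left: the bucket must stay
$afterFirst = memory_get_usage();

unset($y);                           // no pointer left
$afterSecond = memory_get_usage();

printf("freed by first unset:  %d bytes\n", $shared - $afterFirst);
printf("freed by second unset: %d bytes\n", $afterFirst - $afterSecond);
```

On the PHP builds we are aware of, the memory comes back as soon as the last pointer is removed, without any call to gc_collect_cycles(); that function matters for containers caught in circular references.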
Final thoughts:
This article gave just a basic introduction; many details have been left out. The PHP Manual also has some information on references. Although it does not explain the internals very well, it is still a valuable resource for pushing our knowledge a bit further. A few links for further reading: