05 April 2017
Posted By : Sunny Srinidhi
Understanding PHP Variables

PHP

If you know PHP, you probably know that its interpreter is written in C. If you know C, you also know that C is statically typed. What does this mean? It means that you need to declare the type of a variable when you declare the variable. This is how you declare a variable in C:

int a = 0;

And this is how you do the same in PHP:

$a = 0;

So how does PHP know that $a is an integer and not a string, or any other type? How does PHP map this dynamic typing onto the statically typed C code underneath? To understand this, you need to understand how PHP handles variables in its code. And that’s what we are going to do now.


PHP is open source, so you can find the code on GitHub. If you are new to the source code, it could be a bit intimidating. We’ll leave out the directory structure of the code for now. You can explore the repository on your own and try to figure out how everything works together. What we are interested in for now is a structure called zval. Let’s see how it’s defined in the PHP 5 source (PHP 7 later reworked this structure):

struct _zval_struct {
    /* Variable information */
    zvalue_value value;  /* value */
    zend_uint refcount__gc;
    zend_uchar type; /* active type */
    zend_uchar is_ref__gc;
};

In C, a struct (structure) is something like a class that has only data members and no methods. You can define a struct and then use it as a data type for declaring variables. In the PHP code, every variable is a zval, which is nothing but a typedef for _zval_struct.

As you can see from that definition, there are four members in the struct. Let’s look at them a bit more closely.


Value

The member value is of type zvalue_value. If you are thinking that’s another user-defined type, you are right, although it is a union rather than a struct. That means there must be a zvalue_value definition somewhere in the codebase:

typedef union _zvalue_value {
    long lval;  /* long value */
    double dval;  /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;  /* hash table value */
    zend_object_value obj;
} zvalue_value;

Looks a bit messed up, doesn’t it? It doesn’t feel as straightforward as the previous struct declaration, with the union keyword and a struct nested inside it. Also, even though there are multiple members inside this union, it’s a single type. To understand what that means, let’s see how C handles types.

In C, data types are essentially labels for memory locations. What I mean is, four bytes holding a string are not all that different from four bytes holding an int. The only difference is the label. The data type tells the compiler what to expect at that memory location and which operations are valid on it.
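To make the "types are labels" idea concrete, here is a minimal sketch that reads the same four bytes once as text and once as an integer. The function names bytes_as_u32 and bytes_as_text are made up for this illustration and are not part of the PHP source.

```c
#include <stdint.h>
#include <string.h>

/* Read four raw bytes as a 32-bit unsigned integer.
   memcpy keeps the reinterpretation well-defined C; the numeric
   result depends on the byte order of the machine. */
uint32_t bytes_as_u32(const unsigned char bytes[4])
{
    uint32_t n;
    memcpy(&n, bytes, sizeof n);
    return n;
}

/* Read the same four raw bytes as a NUL-terminated string. */
void bytes_as_text(const unsigned char bytes[4], char out[5])
{
    memcpy(out, bytes, 4);
    out[4] = '\0';
}
```

Feed both functions the bytes 'P', 'H', 'P', '!' and one gives you the string "PHP!" while the other gives you a number. The bytes never change, only the label we read them through.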

Now what’s a union, you ask? A union is a single type (as I already mentioned) whose members all share the same memory, so the same bytes can be interpreted in various ways depending on which member you access. This means PHP’s developers need just one data type in their code to represent all the data types supported by the language. That makes life easier, right? Not entirely. A union is as large as its largest member, so every value takes up the same amount of memory regardless of its type. On a typical 64-bit build, the largest members carry 96 bits of data (the string struct, for example, is a 64-bit pointer plus a 32-bit int), which alignment padding usually rounds up to 128 bits.
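If you want to check the size claim on your own machine, something like the following sketch works. It uses a simplified stand-in for zvalue_value: the names demo_value and demo_str are invented here, and the array and object members are reduced to plain pointers, which is an assumption of the sketch and not how PHP declares them.

```c
#include <stddef.h>

/* Stand-in for the nested string struct. */
struct demo_str {
    char *val;
    int len;
};

/* Simplified stand-in for zvalue_value (hypothetical, not PHP's own). */
typedef union demo_value {
    long lval;
    double dval;
    struct demo_str str;
    void *ht;   /* assumption: a plain pointer instead of HashTable * */
    void *obj;  /* assumption: a plain pointer instead of zend_object_value */
} demo_value;

/* Size of the whole union, in bytes. */
size_t demo_value_size(void)
{
    return sizeof(demo_value);
}

/* Size of the largest individual member, in bytes. */
size_t demo_largest_member(void)
{
    size_t sizes[] = {
        sizeof(long), sizeof(double), sizeof(struct demo_str),
        sizeof(void *), sizeof(void *)
    };
    size_t i, max = 0;
    for (i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        if (sizes[i] > max) {
            max = sizes[i];
        }
    }
    return max;
}

/* Returns 1 if two members start at the same address, which is what
   "sharing the same memory" means for a union. */
int demo_members_share_address(void)
{
    demo_value v;
    return (void *)&v.lval == (void *)&v.dval;
}
```

The union is never smaller than its largest member, and every member starts at the same address. The exact byte counts depend on your compiler and platform, so print them rather than trusting a fixed number.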

One more thing to notice is that there are only five members in the definition. What about the other data types supported by PHP? Well, it turns out these five are enough to handle all the types supported by PHP. Wondering how? We’ll see later. Here’s what the members mean:

lval (long) = int

dval (double) = float

str = string

ht (HashTable) = array

obj (zend_object_value) = object


Type

The ‘zend_uchar type’ member is the one that does the magic of handling the type information. The value union holds the value of the variable defined in PHP, but it carries no type information of its own. The type member, which is just a single unsigned char, holds one of the Zend type constants, so it’s basically just a small integer tag. For example, a type of IS_LONG means that the variable is an int and the engine should read value.lval. This is also how five members cover every PHP type: in this version of the engine, booleans and resources are stored in lval and null needs no value at all, so only the type tag tells them apart.
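The pattern described here, a union plus a tag that says which member is live, is usually called a tagged union. Below is a minimal sketch of it. Everything prefixed with demo_ or DEMO_ is invented for this example; the constants are not PHP’s real IS_* values and the output format only imitates var_dump.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical type tags, standing in for the Zend IS_* constants. */
enum demo_type { DEMO_NULL, DEMO_LONG, DEMO_DOUBLE, DEMO_BOOL, DEMO_STRING };

typedef struct demo_zval {
    union {
        long lval;
        double dval;
        struct {
            const char *val;
            int len;
        } str;
    } value;
    unsigned char type; /* says which union member is currently valid */
} demo_zval;

/* Store a C string and tag the value as a string. */
void demo_set_string(demo_zval *z, const char *s)
{
    z->value.str.val = s;
    z->value.str.len = (int)strlen(s);
    z->type = DEMO_STRING;
}

/* Write a description of the value into buf, picking the union member
   based on the tag. Note that DEMO_BOOL reuses lval and DEMO_NULL reads
   no member at all. */
int demo_describe(const demo_zval *z, char *buf, size_t size)
{
    switch (z->type) {
    case DEMO_NULL:
        return snprintf(buf, size, "NULL");
    case DEMO_LONG:
        return snprintf(buf, size, "int(%ld)", z->value.lval);
    case DEMO_DOUBLE:
        return snprintf(buf, size, "float(%g)", z->value.dval);
    case DEMO_BOOL:
        return snprintf(buf, size, "bool(%s)",
                        z->value.lval ? "true" : "false");
    case DEMO_STRING:
        return snprintf(buf, size, "string(%d) \"%.*s\"",
                        z->value.str.len, z->value.str.len,
                        z->value.str.val);
    default:
        return snprintf(buf, size, "unknown");
    }
}
```

Change only the type field from DEMO_LONG to DEMO_BOOL and the very same lval of 1 is described as true instead of 1. That is the whole trick: the value has no meaning until the tag tells you how to read it.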


IS_REF

This one is pretty self-explanatory. It records whether the variable is a reference or not. A value of 1 means it’s a reference, and a value of 0 means it isn’t. The next member carries information that makes more sense when coupled with this one.


REFCOUNT

This member holds the number of references to a given zval. If the refcount is 1, it means that there’s exactly one reference to the zval instance. If the value is 2, there are two references to the zval instance. You get the point. This matters for a number of reasons, such as garbage collection and copy-on-write.
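To see how the reference count and the reference flag work together for copy-on-write, here is a heavily simplified sketch. The cow_ names are invented for this example and the value is reduced to a single long; the real engine handles many more cases, so treat this as an illustration of the idea and not as PHP’s implementation.

```c
#include <stdlib.h>

/* Hypothetical, stripped-down zval: one long plus the two counters. */
typedef struct cow_zval {
    long lval;
    unsigned int refcount; /* how many variables point at this zval */
    unsigned char is_ref;  /* 1 if those variables are PHP references */
} cow_zval;

/* Create a fresh value owned by exactly one variable. */
cow_zval *cow_new(long v)
{
    cow_zval *z = malloc(sizeof *z);
    if (z != NULL) {
        z->lval = v;
        z->refcount = 1;
        z->is_ref = 0;
    }
    return z;
}

/* Roughly "$b = $a;": share the zval and bump the count, no copy yet. */
cow_zval *cow_assign(cow_zval *src)
{
    src->refcount++;
    return src;
}

/* Call before writing through *slot. If the zval is shared and is not a
   reference, split off a private copy so the other variables keep the
   old value. Returns the zval that is safe to modify, or NULL if the
   copy could not be allocated. */
cow_zval *cow_separate(cow_zval **slot)
{
    cow_zval *z = *slot;
    if (z->refcount > 1 && !z->is_ref) {
        cow_zval *copy = cow_new(z->lval);
        if (copy == NULL) {
            return NULL;
        }
        z->refcount--;
        *slot = copy;
    }
    return *slot;
}

/* Drop one reference and free the zval once nobody points at it. */
void cow_release(cow_zval *z)
{
    if (--z->refcount == 0) {
        free(z);
    }
}
```

After cow_assign, both variables point at one zval with a refcount of 2. The copy happens only when one of them is about to be written, which is why assigning a large value to a second variable costs almost nothing until one of the two is modified.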


That’s pretty much it. Now you know how PHP handles dynamic typing when the underlying language is statically typed. You also know how PHP makes sense of different data types when the value union has only five members. You can check the PHP documentation site for more information on this, or get your hands dirty with the source code.

PS: It’s easy to browse through the source code if you clone the repo and open it in an IDE such as PhpStorm, which lets you go deep into the code by clicking through the definitions. PHP had an LXR site that made it easy to browse the code in a browser, but it looks like it’s down now.
