
URL: https://lwn.net/Articles/379909/


Resetting PHP 6


By Jonathan Corbet
March 24, 2010
Rightly or wrongly, many in our community see Perl 6 as the definitive example of vaporware. But what about PHP 6? This release was first discussed by the PHP core developers back in 2005. There have been books on the shelves purporting to cover PHP 6 since at least 2008. But, in March 2010, the PHP 6 release is not out - in fact, it is not even close to out. Recent events suggest that PHP 6 will not be released before 2011 - if, indeed, it is released at all.

PHP 6 was, as befits a major release, meant to bring some serious changes to the language. To begin with, the feature which is the whipping boy for PHP security - or the lack thereof - will be consigned to an unloved oblivion; the "register_globals" feature will be gone as well. The proposed traits feature would bring "horizontal reuse" to the language; think of traits as a PHPish answer to multiple inheritance or Java's interfaces. A new 64-bit integer type is planned. PHP was slated to gain a goto keyword (though the plan was to avoid the scary goto name and add target labels to break instead). Some basic static typing features are under consideration. There was even talk of adding namespaces to the language and making function and class names be case-sensitive.

The really big change in PHP 6, though, was the shift to Unicode throughout. Anybody who is running a web site which does not use Unicode is almost certainly wishing that things were otherwise - trust your editor on this one. It is possible to support Unicode to an extent even if the language in use is not aware of Unicode, but it is a painful and error-prone affair; proper Unicode support requires a language which understands Unicode strings. The PHP 6 plan was to support Unicode all the way:

PHP6 will have Unicode support everywhere; in the engine, in extensions, in the API. It's going to be native and complete; no hacks, no external libraries, no language bias. English is just another language, it's not the primary language.
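To make the pain of byte-oriented string handling concrete, here is a minimal sketch. It is written in Python rather than PHP, purely for illustration, and the sample string and cut-off point are arbitrary choices, not anything taken from the PHP discussion. The UTF-8 bytes of a string are handled the way a language unaware of Unicode would handle them, and both the length and a simple truncation come out wrong.

```python
# Illustration only: Python bytes stand in for the byte-oriented strings
# of a language that is not Unicode-aware.
text = "Schlüter"                 # 8 characters, one of them non-ASCII
raw = text.encode("utf-8")        # what a byte-oriented language sees

char_count = len(text)            # counts code points
byte_count = len(raw)             # counts bytes; "ü" takes two in UTF-8

# Cutting the byte string after 5 bytes splits "ü" in half, leaving
# invalid UTF-8; decoding with errors="replace" exposes the damage
# as a U+FFFD replacement character.
truncated = raw[:5].decode("utf-8", errors="replace")

print(char_count, byte_count, repr(truncated))
```

Every length check, substring, and truncation in an application has to get this right by hand when the language itself treats strings as bytes.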

Unicode, however, appears to be the rock upon which the PHP 6 ship ran aground. Despite claims back in 2006 that the development process was "going pretty well," it seems that few people are happy with the state of Unicode support in PHP. Memory usage is high, performance is poor, and broken scripts are common. The project has been struggling for some time to find a solution to this problem.

From your editor's reading of the discussion, the fatal mistake would appear to be the decision to use the two-byte UTF-16 encoding for all strings within PHP. According to PHP creator Rasmus Lerdorf, this decision was made to ease compatibility with the International Components for Unicode (ICU) library:

Well, the obvious original reason is that ICU uses UTF-16 internally and the logic was that we would be going in and out of ICU to do all the various Unicode operations many more times than we would be interfacing with external things like MySQL or files on disk. You generally only read or write a string once from an external source, but you may perform multiple Unicode operations on that same string so avoiding a conversion for each operation seems logical.

But a lot of strings simply pass through PHP programs; in the end, the conversion turned out to be more expensive and less convenient than had been hoped. Johannes Schlüter describes the problem this way:

By using UTF-16 as default encoding we'd have to convert the script code and all data passed from or to the script (request data, database results, output, ...) from another encoding, usually UTF-8, to UTF-16 or back. The need for conversion doesn't only require CPU time and more memory (a UTF-16 string takes double memory of a UTF-8 string in many cases) but makes the implementation rather complex as we always have to figure out which encoding was the right one for a given situation. From the userspace point of view the implementation brought some backwards compatibility breaks which would require manual review of the code.

These all are pains for a very small gain for many users where many would be happy about a tighter integration of some mbstring-like functionality. This all led to a situation for many contributors not willing to use "trunk" as their main development tree but either develop using the stable 5.2/5.3 trees or refuse to do development at all.
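The memory claim is easy to check with a rough sketch. The Python below is illustrative only: the sample strings are invented for this example, and it measures encoded sizes rather than anything about PHP's actual internal string representation.

```python
# Illustration only: compare the encoded size of the same text in
# UTF-8 and UTF-16. The sample strings are arbitrary.
samples = {
    "ascii": "SELECT name FROM users",
    "german": "Johannes Schlüter",
    "japanese": "日本語のテキスト",
}

def encoded_sizes(s):
    """Return (utf8_bytes, utf16_bytes) for s; utf-16-le omits the BOM."""
    return len(s.encode("utf-8")), len(s.encode("utf-16-le"))

for label, s in samples.items():
    u8, u16 = encoded_sizes(s)
    print(f"{label}: utf-8={u8} bytes, utf-16={u16} bytes")
```

Pure ASCII text, which covers most script source and markup, doubles in size under UTF-16. Text dominated by CJK characters is actually smaller in UTF-16 than in UTF-8, which is why the doubling holds "in many cases" rather than always.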

The end result of all this is that PHP 6 development eventually stalled. The Unicode problems made a release impossible while blocking other features from showing up in any PHP release at all. Eventually some work was backported to 5.3, but that is always a problematic solution; it brings back memories of the 2.5 kernel development series.

Developer frustration, it seems, grew for some time. Last November, Kalle Sommer Nielsen tried to kickstart the process, saying:

I've been thinking for a while what we should do about PHP6 and its future, because right now it seems like there isn't much future in it.

Things came to a head on March 11, when Jani Taskinen, fed up with being unable to push things forward, (1) committed some disruptive changes to the stable 5.3 branch, and (2) created a new PHP_5_4 branch which looked like it was meant to be a new development tree. That is when Rasmus stepped in:

The real decision is not whether to have a version 5.4 or not, it is all about solving the Unicode problem. The current effort has obviously stalled. We need to figure out how to get development back on track in a way that people can get on board. We knew the Unicode effort was hugely ambitious the way we approached it. There are other ways.

So I think Lukas and others are right, let's move the PHP 6 trunk to a branch since we are still going to need a bunch of code from it and move development to trunk and start exploring lighter and more approachable ways to attack Unicode.

And that is where it stands. The whole development series which was meant to be PHP 6 has been pushed aside to a branch, and development is starting anew based on the 5.3 release. Anything of value in the old PHP 6 branch can be cherry-picked from there as need be, but the process of what is going into the next release is beginning from scratch, and one assumes that proposals will be looked at closely. There are no timelines or plans for the next release at this point; as Rasmus explains, that's not what the project needs now:

We don't need timelines right now. What we need is some hacking time and to bring some fun back into PHP development. It hasn't been fun for quite a while. Once we have a body of new interesting stuff, we can start pondering releases...

So timing and features for the next PHP release are completely unknown at this point. Even the name is unknown; Jani's 5.4 branch has been renamed to THE_5_4_THAT_ISNT_5_4. There has been some concern about all of those PHP 6 books out there; it has been suggested that a release which doesn't conform to expectations for PHP 6 should be called something else - PHP 7, even. There's little sympathy for the authors and publishers of those books, but those who bought them may merit a little more care. But that will be a discussion for another day. Meanwhile, the PHP hackers are refocusing on getting things done and having some fun too.