PHP

17 Mar 2005

require_once, one optimization too many?

I noticed a small thread on pear-dev about require_once; the idea that using require_once to lazy-load files slows things down seems to crop up every so often.

The crux of the issue appears to be the impression that lazy loading is slowing things down somehow, and that doing something like this may improve performance.

class_exists('PEAR') or require_once 'PEAR.php';
Or, even worse, thinking about using __autoload magic.

In early versions of PHP 4.3 and before, each require_once call had to do quite a bit of work to determine whether a file had already been included.

It assumed that you might have changed the include path, and therefore that the file you were requesting might not actually have been loaded yet. So each call went through every path in your include_path, made sure each part of the directory existed, and then tried to open the file. This resulted in quite a few stat calls (via realpath), as well as a few opens.

How much this was slowing things down was never really examined in detail (although, from what I remember, Rasmus indicated that Y! had done a few patches to address this), but the existence of those patches, and the general assumption that stat and open were relatively expensive, made the situation sound kind of serious.

After considering the issue, a few of the core developers (Andi and Rasmus, I think) added a stat cache feature. So rather than stat'ing the whole path on each require, PHP looks it up in a cache. You can see the result by running this:

strace php4 -r 'require_once "PEAR.php"; require_once "PEAR.php";' 2>&1 \
  | grep -E '(stat|open|close|read)' | tail -30
As you can see from the output, the second call to require_once now calls open() once on each possible location of the file (normally something like ./PEAR.php and /usr/share/pear/PEAR.php).

This should be pretty efficient, as long as you don't modify the paths during your PHP script (like moving a directory or something).

However, as this week's discussion shows, this questionable performance issue still hasn't disappeared. So I got bored today and wondered what would be involved in making it even more efficient (basically, optimizing the second call to any [require|include]_once).

This is the result: not a working patch, more just a concept. http://docs.akbkhome.com/simple_cache.patch.txt

The idea is that, assuming most people don't change the include path that often (probably only once, when the app starts), caching the strings that get sent to [require|include][_once] and testing them before doing any file operations could basically kill this kind of talk. The concept and code are simple enough that it shouldn't have too many knock-on effects, and it shouldn't use up too many resources to save a few open()s.
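The caching idea itself is simple enough to sketch in userland PHP. This only illustrates the concept and is not the patch: `require_once_cached()` is a hypothetical helper, and it assumes the include_path does not change after start-up. It remembers the exact string it was given and, on a repeat call, returns before any stat() or open() happens.

```php
<?php
// Hypothetical helper illustrating the string cache: remember the exact
// string that was passed in, and skip all filesystem work when it comes back.
// Assumes include_path is not changed after the first call.
function require_once_cached(string $file): void
{
    static $seen = [];
    if (isset($seen[$file])) {
        return; // cache hit: no stat()/open() at all
    }
    $seen[$file] = true;
    require_once $file; // first time: normal include_path resolution
}
```

The trade-off is the same one described above: if the include_path (or the filesystem) changes mid-request, a cached string may no longer point at the file PHP would otherwise have found.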

The question, though, is whether this is really an issue or just the impression of one...
Posted by in PHP | Add / View Comments()

17 Mar 2005

removing that stupid is_a warning.

My new pet hate about PHP5 is currently the rather stupid warning:

"Strict Standards: is_a(): Deprecated. Please use the instanceof operator in"

Why on earth is that in there? instanceof requires that the class or interface you are testing against exists. That means loading code that may never actually be used if you are doing negative testing.

That shouts inefficiency, and doesn't really give you any readability, or any particularly major gains in terms of the code doing the testing for you.

It's about the only warning PHP4 code emits when running under E_STRICT, provided you disable E_STRICT while the code is being loaded.
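To make the negative-testing point concrete: is_a() takes the class name as a plain string, so asking whether an object belongs to a class that was never loaded simply returns false, and nothing gets included. A minimal sketch; `isHandledElsewhere()` and `Some_Unloaded_Class` are made-up names for illustration.

```php
<?php
// Negative test: is this object one of the classes handled elsewhere?
// The class names are plain strings, so none of them has to be loaded.
function isHandledElsewhere(object $obj, array $classNames): bool
{
    foreach ($classNames as $name) {
        if (is_a($obj, $name)) {
            return true;
        }
    }
    return false;
}

var_dump(isHandledElsewhere(new stdClass(), ['Some_Unloaded_Class'])); // bool(false)
```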

Here is the simple patch to get rid of this craziness:
 

--- zend_builtin_functions.c	1 Feb 2005 19:05:56 -0000	1.256
+++ zend_builtin_functions.c	17 Mar 2005 08:20:42 -0000
@@ -672,7 +672,6 @@
    Returns true if the object is of this class or has this class as one of its parents */
 ZEND_FUNCTION(is_a)
 {
-	zend_error(E_STRICT, "is_a(): Deprecated. Please use the instanceof operator");
 	is_a_impl(INTERNAL_FUNCTION_PARAM_PASSTHRU, 0);
 }
 /* }}} */



05 Mar 2005

PHP's XML DOM , almost there...

As part of the QA for the first release of DBDO, I'm migrating my current website from PHP4 to PHP5. While the original code ran on PHP5 with only a few minor changes (adding clone() in a few locations), obviously there was more that could or should be done to it.
  • migrate DB_DataObject code to DBDO, which consists of:
    • add the DBDO::config() lines
    • change DB_DataObject::factory calls to DBDO::factory, and add the database alias as the first arg.
    • change find() and find(true) to query() and query() + fetch()
    • comment out the bits I haven't finished yet (like escape(), which is pretty critical)
  • replace XML_Tree with php's DOM...
The navigation code on my site uses an HTML file that is just a simple <UL> list; it is chopped up, a few extra tags are added, and it is then rendered with CSS when you view a page. It was done as a proof of concept, as I really liked the single-file navigation concept (based on Paul's wiki thing), but using a wiki/plain text and those dumb wiki DancingCaps names everywhere just sucked big time.
I also rather liked the idea of modifying the thing in an HTML editor (until I could be bothered creating a real editor), so that's how the current idea came about.

To do the rendering and URL rewriting, the file is parsed and modified by XML_Tree and a few node iterators (good old function calls and foreach (array_keys($node->children) as $i) ...). I'm still not convinced that PHP5 Iterators offer anything other than being 'cool'; I suspect they will make code less readable, and more magic.

Having done quite a bit of DOM work with Javascript, I decided to replace the XML_Tree code with DOM. It turned out, however, that Javascript implements DOM++, and PHP only implements DOM...

The biggest difference between PHP's and Javascript's DOM is that Javascript has effectively decided to implement a lot of SimpleXML's features within the DOM model. These are the differences I find more than a little annoying in PHP:

Fetching Children:
Javascript: child = node.childNodes[12];
PHP: $child = $node->childNodes->item(12);

Fetching Attribute value
Javascript: href = node.attributes['href']
PHP: $href = $node->getAttribute('href');

Setting Attribute value
Javascript: node.attributes['href'] = 'somevalue';
PHP: $node->setAttribute('href','somevalue');
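For reference, here is the PHP half of that comparison as a runnable snippet. It is a small sketch against a made-up two-item list, and it uses setAttribute() for the write.

```php
<?php
// The PHP DOM calls from the comparison above, run against a tiny document.
$doc = new DOMDocument();
$doc->loadXML('<ul><li><a href="/old">Home</a></li><li>About</li></ul>');

$list = $doc->documentElement;                  // the <ul> element
$second = $list->childNodes->item(1);           // children: item(1), not [1]
$link = $list->getElementsByTagName('a')->item(0);

$href = $link->getAttribute('href');            // read an attribute
$link->setAttribute('href', '/home');           // write an attribute

echo $second->textContent, "\n";                // About
echo $href, ' -> ', $link->getAttribute('href'), "\n"; // /old -> /home
```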


There are a few other things that would be nice and that both of them miss out on.

Appending Elements

Javascript:
node = doc.createElement("span");
parent_node.appendChild(node);
PHP:
$node = $document->createElement("span");
$parent_node->appendChild($node);

PHP 'Natural Way':
$node = new DOMElement("span"); // AFAIK this may work.
$parent_node->childNodes[] = $node; // wishful syntax: not supported
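Since `childNodes` can't be appended to, here is a sketch of what appending takes in PHP's DOM as it stands. `appendElement()` is a hypothetical helper; the last lines also show that an element built with `new DOMElement()` can be attached directly with appendChild().

```php
<?php
// Hypothetical helper: there is no "$parent->childNodes[] = $el", so create
// the element through the owning document and call appendChild().
function appendElement(DOMNode $parent, string $tag, string $text = ''): DOMElement
{
    $doc = $parent instanceof DOMDocument ? $parent : $parent->ownerDocument;
    $el = $doc->createElement($tag);
    if ($text !== '') {
        // createTextNode() treats characters like & literally;
        // createElement()'s value argument does not.
        $el->appendChild($doc->createTextNode($text));
    }
    $parent->appendChild($el);
    return $el;
}

$doc = new DOMDocument();
$doc->loadXML('<p>Hello</p>');
appendElement($doc->documentElement, 'span', 'world');

// An element made with "new DOMElement()" has no document yet,
// but it can still be attached with appendChild().
$doc->documentElement->appendChild(new DOMElement('em'));

echo $doc->saveXML($doc->documentElement), "\n"; // <p>Hello<span>world</span><em/></p>
```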

From what I remember, you can convert a SimpleXML document to a DOM document, but it would be far better if DOM just implemented a few of SimpleXML's features.
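PHP does ship the conversion functions, simplexml_import_dom() and dom_import_simplexml(), and because both views wrap the same underlying tree, you can borrow SimpleXML's array-style attribute access without copying anything. A small sketch on a made-up list:

```php
<?php
// Work in DOM, but borrow SimpleXML's array-style access where it's handier.
// Both objects wrap the same libxml2 nodes, so edits show up in both views.
$doc = new DOMDocument();
$doc->loadXML('<ul><li><a href="/old">Home</a></li></ul>');

$sx = simplexml_import_dom($doc);          // SimpleXMLElement view of <ul>
echo $sx->li[0]->a['href'], "\n";          // /old (attribute read, array style)
$sx->li[0]->a['href'] = '/home';           // attribute write, array style

// ...and back again: the DOM sees the change.
$a = dom_import_simplexml($sx->li[0]->a);
echo $a->getAttribute('href'), "\n";       // /home
```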

PHP should really be about clarity, simplicity, and getting things done. DOM is pretty close, but could really do with being pushed the last mile.



22 Feb 2005

the new E_ANAL error_reporting setting

After quite a few chats on various php-irc developer channels, it looks like PHP is going to be getting a new error_reporting level, E_ANAL.

While E_STRICT has been quite successful in getting people to migrate from C# and Java, there is a sense that it is not quite complete (apparently quite a number of them are missing their fatal exceptions on mundane issues). Hence the planned E_ANAL notices, which will start appearing when:
  • you forget to declare a variable's type before using it (just giving it a default value doesn't count).
  • you try to do boolean tests on strings, integers, objects (or anything that isn't a boolean).
  • you do string comparison on non-strings (although I'm tempted to suggest this for E_STRICT).
  • you declare a function (that's not part of a class).
  • you forget to specify the return type of a method.
  • you forget to wrap a method call that can throw an exception in a try { } catch block.
  • you make any property public (only available with the E_ANAL_JAVA extension).
  • you use PHP's native arrays or array functions at all.
  • you make any method call that does not involve at least 3 objects.
  • you use any variable name that matches an object or method name anywhere in the imported methods or functions.
Of course, while E_ANAL is regarded as essential for developing any enterprise application, it is highly recommended that you turn it off if you actually want to get anything done.

Developers are still open to new ideas for it.

16 Feb 2005

fighting C# just to do simple things.

Another day at the C# school of torture.

Querying a database is nice and easy in PHP, especially with some of the nice code in PEAR. In the quick example below, I just fetch a user account from the database and update the activity log (using DBDO, although DB_DataObject works the same way):

$tbl = DBDO::factory('mydb','user');
if (!$tbl->get('username', $_POST['username']) ||
!($tbl->password == md5($_POST['password']))) {
return false; // access denied....
}

Logging the access:
$log = DBDO::factory('mydb','activity');
$log->user_id = $tbl->id;
$log->at_time = date('Y-m-d H:i:s');
$log->insert();

Now turning to C#, this nice clear piece of code turns into a noisy mess with potential to explode at any time.

SqlCommand dbcmd = new SqlCommand(
"SELECT * FROM user_details_basic WHERE " +
"username = @username"
,dbcon);
SqlParameter param = new SqlParameter("@username",
SqlDbType.VarChar );
param.Value = Request.Params["username"];
dbcmd.Parameters.Add(param);
ArrayList user = getResults(dbcmd); // This is a 40 line method!
if (user.Count < 1) {
closeDB();
return 0; // not found..
}
// as I said before (md5 is a 10 line method)
if (0 != String.Compare((String) qmd5(password) ,
(String) ((Hashtable) user[0])["password"])) {
closeDB();
return 0;
}
.....
dbcmd = new SqlCommand(
"INSERT INTO activity " +
" ( user_id, at_time )" +
" VALUES ( @user_id , GETDATE() )",dbcon);
param = new SqlParameter("@user_id", SqlDbType.Int);
param.Value = (int) ((Hashtable) user[0])["id"];
dbcmd.Parameters.Add(param);
// imagine this code with quite a few more parameters.
dbcmd.ExecuteNonQuery();

The more I code in C#, the more I wonder if it's ever going to grow on me... It seems that although it has a standard library, its quality is extremely poor: it's not designed to help you solve problems, but rather to make you waste time coding up simple things over and over again.


02 Jan 2005

DBDO status update - zend object internals

Just before Christmas, I ran into another roadblock with DBDO: one of those problems that you know isn't going to be fixed quickly, so you postpone it until you really have a bit of free time to solve it.

This particular issue was partly brought on by trying to implement sleep/serialization, but it had cropped up before and I had tried to ignore it; sleep/serialization basically forced it to be addressed.

A 'post-query' DBDO object represents a row of the database. If you modify its values, it is supposed to store them separately so that it knows you modified them; hence, when you update an object, it can work out what has changed and build a nice, efficient query.
This makes the read_property and get_properties handlers a little complex, as they have to decide whether you asked for a property returned from the database or one you assigned afterwards.
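In userland terms, the bookkeeping described above looks roughly like the sketch below. It is not DBDO's code or API: `TrackedRow` and `buildUpdate()` are hypothetical names, values are not escaped (illustration only), and `__sleep()` is included because serialize() reads the listed properties straight off the object rather than through `__get()`.

```php
<?php
// Hypothetical sketch: keep fetched values and assigned values apart,
// so an UPDATE can list only the columns that really changed.
class TrackedRow
{
    private array $fetched;      // values as they came from the database
    private array $changed = []; // values assigned after the query

    public function __construct(array $fetched)
    {
        $this->fetched = $fetched;
    }

    public function __get(string $name)
    {
        // A value assigned later wins over the fetched one.
        if (array_key_exists($name, $this->changed)) {
            return $this->changed[$name];
        }
        return $this->fetched[$name] ?? null;
    }

    public function __set(string $name, $value): void
    {
        // Only record a change if the value really differs from the fetched one.
        if (!array_key_exists($name, $this->fetched) || $this->fetched[$name] !== $value) {
            $this->changed[$name] = $value;
        } else {
            unset($this->changed[$name]);
        }
    }

    // UPDATE touching only modified columns (no escaping: illustration only).
    public function buildUpdate(string $table, string $key): ?string
    {
        if (!$this->changed) {
            return null; // nothing to update
        }
        $sets = [];
        foreach ($this->changed as $col => $val) {
            $sets[] = "$col = '$val'";
        }
        return "UPDATE $table SET " . implode(', ', $sets)
            . " WHERE $key = '" . $this->fetched[$key] . "'";
    }

    // serialize() reads these properties directly, bypassing __get(),
    // so both arrays must be listed or the change tracking is lost.
    public function __sleep(): array
    {
        return ['fetched', 'changed'];
    }
}

$row = new TrackedRow(['id' => 7, 'name' => 'alan', 'email' => 'a@example.com']);
$row->name = 'alan';             // same value: not recorded as a change
$row->email = 'b@example.com';
echo $row->buildUpdate('user', 'id'), "\n";
// UPDATE user SET email = 'b@example.com' WHERE id = '7'
```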

Unfortunately, read and write access to object properties in PHP's internals is a little haphazard. (Funnily enough, it appears to violate all the data encapsulation ideas that OO principles are supposed to encourage, and that DBDO's direct property access also breaks.)

A normal PHP object instance stores all its properties in a single hash table. Reading and writing this hash is not strictly enforced (e.g. by always going through the object's get/set property handlers), and many parts of PHP's internals read from and write to it directly.

This has caused all sorts of problems when trying to decide whether something has been changed, and hence should be updated.

Functions like print_r() call the object's get_properties handler; however, it also appears they might assign the object's properties to the returned value. Serialization, on the other hand, calls the __sleep method and then accesses the property hash directly, rather than using either the object's read_property or get_properties internal handlers.

Add to this the uncertainty over whether zvals returned from get_properties are actually freed or destroyed (as far as I could tell, I have to store the return value of get_properties and hash_destroy it at object destruction time), and the whole clever object storage scheme becomes a little more complex than initially envisioned.

But hopefully a first beta should be out by the end of January, when I'll see if I can slim down the 3 hash tables that a DBDO object stores internally.

27 Nov 2004

Of editors and C

DBDO continues to grow, and fortunately I'm slowly shrinking the number of memory leaks (as I get a better understanding of all the libraries involved).

Most core functionality is working: query/insert/update/delete. I'm slowly fixing some of the design mistakes I made with DB_DataObject (like being able to create update/delete statements that wiped out data; DBDO fixes this by requiring the DBDO::BUILD argument before it will touch anything more than the current row).

I also decided to use php_error(E_WARNING, ...) for the debugging code: DBDO::debugLevel(1) issues all the built SQL statements as warnings. I think this is quite a nice lightweight solution, as it also gives you the file/line etc., and can give you backtraces with xdebug.
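For code written in PHP rather than C, the same idea can be sketched with trigger_error(). This is a hypothetical helper, not DBDO's API: `SqlDebug::$level` stands in for the debug level, and because trigger_error() reports the line of its own call, the sketch appends the caller's file and line taken from debug_backtrace().

```php
<?php
// Hypothetical debug helper: when the debug level is on, emit each SQL
// statement as a PHP warning tagged with the caller's file and line.
final class SqlDebug
{
    public static int $level = 0;

    public static function log(string $sql): void
    {
        if (self::$level < 1) {
            return; // debugging off: no cost beyond this check
        }
        // Frame 0 describes this call, so its file/line is where log() was called.
        $caller = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 1)[0];
        trigger_error(
            sprintf('%s (called at %s:%d)', $sql, $caller['file'], $caller['line']),
            E_USER_WARNING
        );
    }
}
```

Because these are ordinary warnings, they go wherever the rest of the error output goes, and any custom error handler sees them too.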

My other little toy, flexyparser, has had most of the parsing syntax for the Flexy engine added (I've still got to fix the style parsing). I'm still pondering whether this little back-burner project should do the template building etc. in C or just be used as a parsing library... or it could be really clever and build Zend opcodes :)

But what caught my eye this week was Harry's post about IDEs. Having written one myself, mainly out of frustration with the existing alternatives on Linux, I agreed with a lot of the points he made, and some of those the commenters added.

If the ability to program in a language depends on the development tool (C# and Java come to mind), then reading code without the editor becomes horrific. I wonder if I'm in the group that regards source code as an expression of intent: if it can't be read, it's very difficult to read the intent.

Anyway, a couple of weeks ago I noticed the sysadmin at one of my clients using gphpedit, and I was initially pretty impressed (I guess like most people are with Zend Dev Studio, until they try it on a slower machine).

gphpedit has a very nice feature set; basically it's very similar to phpmole, but written in C with a lot of GNOME libraries. It had a very nice class browser and syntax highlighting etc. using the Scintilla widget, very much like phpmole. It was only when I had a browse around the source code in CVS that I started pondering some of the mistakes.
  • PHP parsing for classes was done by a hand-coded lexer, which looks for class/function etc. This is what I originally did with phpmole (it was a mistake then, and given that gphpedit is in C, it is just as bad now). I wonder if using the lex/bison code from Zend/* would be more efficient; either that, or using phpembed to actually use the library might be better.
  • Project (bookmark) based browsing was missing. I found this to be invaluable when I designed it into phpmole, although the interface could probably do with changing. The principle of navigating from the root directly to all the projects you are working on works really well. (Actually, on Unix I should really have used symlinks, but I didn't have VFS then...)
  • Class method hinting. This is something that is near impossible in PHP (but would be quite useful). Yes, Zend Dev Studio does it, but in reality a lot of the time it's guesswork (although it gets a lot easier with PHP5 argument hints). It also tends to slow down the application a lot (especially on my pathetic P3/1000), making the program pause just when you are really in the flow of things.
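On the first point, from PHP code at least, the Zend lexer is already exposed as token_get_all(), so a class/function outline doesn't need a hand-written scanner. A minimal sketch with a hypothetical `listDeclarations()` helper; it deliberately ignores edge cases such as by-reference functions (`function &foo()`).

```php
<?php
// Sketch: list class and function names using PHP's own lexer
// (token_get_all) instead of a hand-written scanner.
function listDeclarations(string $source): array
{
    $tokens = token_get_all($source);
    $found = ['class' => [], 'function' => []];
    $count = count($tokens);

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i])) {
            continue; // single-character tokens such as '{' or ';'
        }
        if ($tokens[$i][0] === T_CLASS) {
            $kind = 'class';
        } elseif ($tokens[$i][0] === T_FUNCTION) {
            $kind = 'function';
        } else {
            continue;
        }
        // Skip whitespace and comments to reach the next meaningful token.
        for ($j = $i + 1; $j < $count; $j++) {
            $t = $tokens[$j];
            if (is_array($t) && in_array($t[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)) {
                continue;
            }
            break;
        }
        // Named declarations are followed by an identifier (T_STRING);
        // closures ("function (") are not, so they are skipped.
        if ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
            $found[$kind][] = $tokens[$j][1];
        }
    }
    return $found;
}

print_r(listDeclarations('<?php class Foo { function bar() {} } function baz() {}'));
```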
One thing gphpedit did make me consider, though, was the idea of building a PHP editor as a PHP extension...... php -d extension=editor.so :)


12 Nov 2004

Frankfurt Summary

Well, after 20 hours of travelling, my one-hour talk seemed to go reasonably well, given I was a little jet-lagged, recovering from flu, and had to drink a lot of water during it.

The slides are available as an OpenOffice presentation file. They point out a few of the more annoying PHP5 gotchas that appear to crop up, and discuss some of the more interesting issues. One I fumbled around with at the talk was that I can't see any clean way to make current PHP4 packages E_STRICT safe without changing the package name. (It's just going to shock too many people, even if the installer 'can' fix it.)

The rest of the conference was taken up with chatting to a few of the other developers (interestingly, a lot more PEAR developers were around, and rather few PHP developers), an afternoon nap trying to adjust briefly to the European timezone, and a reasonable amount of drinking.

I did manage to go to a few of the talks. Mark Mathew's talk on MySQL performance tuning (part of the MySQL com con) was interesting, not only because I've been having huge problems with a few queries recently, but also because it had some focus on Java and EJBs, which drew Java developers into a crowd of PHP geeks. Rather than being an opportunity to convert anyone, or to argue the benefits of either language, it showed that developers in any language suffer the same or similar real issues.

The MySQL talk was reasonably useful (although the app server material was obviously not that relevant): it mentioned 'sar' for monitoring and logging server performance over time, and touched on indexes. As I mentioned briefly to Mark after the talk, he could perhaps have expanded the section on indexes and EXPLAIN.

He seemed to skip over what seems to be an in-house secret at MySQL: when a query searches on several columns of a table (or on several tables in a join), MySQL will only use one index per table.
e.g.

SELECT a,b,c FROM d WHERE e > 5 and c <6 order by g

I used to think the perfect index arrangement for table d would be:
CREATE INDEX d_e ON d(e);
CREATE INDEX d_c ON d(c);
CREATE INDEX d_g ON d(g);
This, it turns out, is almost useless: MySQL will only use one of those indexes.

Instead, you should create a multiple-column index:
CREATE INDEX d_q1 ON d(e,c,g);

The only other talk I caught before I had to fly out was Hartmut's PECL_Gen, which, while not following PEAR standards, looks reasonably interesting. However, until it supports something that looks more like C for the body of the generated code, the thought of writing code into an XML file just scares me.

Ah well, nice to get back to humidity and hot weather (28°C when I landed back in Hong Kong).

01 Nov 2004

Making simple things easy, and difficult things possible. yet another html parser.

When John released his bindings to HTML Tidy, I joked with him that it would have been far more interesting (as a project) to write a proper HTML lexer rather than bind to an existing library (mainly because, having written one in PHP, I didn't think it would be that difficult), and I have a strange idea of fun...

Well, over the weekend I was re-pondering this, partly because I had used the Flexy parser to try to parse HTML from a web site, and found that the tokenizer in Flexy was getting slower with age (5 seconds on average to parse a page). Normally this is not a huge issue, as the parsing is cached during the template engine's compile phase. It is a huge issue, though, if you are pulling pages down, parsing out the forms, and reposting them in a web test script.

So over the weekend, after a little Google search-and-discovery trip, I ran across a little W3C project, "A Lexical Analyzer for HTML and SGML". It looked interesting, but it wasn't until I pulled the code down, untarred and built it, that I realized it could be used to write a really fast, simple HTML tokenizer (not only that, it could easily form the basis of a C-based backend for Flexy).

Creating an extension that used the code (not as a library; I just pulled the C code into a PHP extension) and parsed a string of HTML took about 30 minutes. It took an extra 3 hours, on and off over a few days, to make it return an array of tokens (with attributes sorted into a sensible structure).

So now I have a cute extension with 1 function and 1 result: KISS at its best.

<?php
print_r(
flexyparser_tokenize(
file_get_contents("..some file...")
));

Outputs:

[0] => Array
(
[0] => 14 // token type (look up the source)
[1] => // data (tag name or string)
[2] => 1 // line number
[3] => 0 // character position
)

[1] => Array
(
[0] => 1
[1] =>

[2] => 2
[3] => 50
)

[2] => Array
(
[0] => 2
[1] => HTML
[2] => 2
[3] => 51
)

[3] => Array
(
[0] => 2
[1] => HEAD
[2] => 2
[3] => 57
)
.....
......
[15] => Array
(
[0] => 2
[1] => A
[2] => 6
[3] => 212
[4] => Array // array of attributes
(
[HREF] => "/pub/WWW/Consortium/"
)

)

[16] => Array
(
[0] => 2
[1] => IMG
[2] => 7
[3] => 243
[4] => Array
(
[align] => bottom

[src] => "/pub/WWW/Icons/WWW/w3c_48x48"
)

)
The code is on my SVN server under akpear/flexyparser, and works perfectly with PHP5 and PHP4 at the moment.
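Because the result is just nested arrays, pulling things out of a page (links here, or form fields for a web test script) is a plain loop. A sketch under assumptions read off the sample output above: token type 2 marks an opening tag, [1] is the tag name, and [4] is the attribute array, with keys in either case and quoted values keeping their quotes. `FLEXY_TOKEN_TAG` and `collectLinks()` are hypothetical names, and the tokens are hard-coded so the snippet runs without the extension.

```php
<?php
// Assumed from the sample print_r() output: type 2 = opening tag,
// [1] = tag name, [4] = attributes (keys may be upper or lower case,
// quoted values keep their quotes).
const FLEXY_TOKEN_TAG = 2; // hypothetical constant name

function collectLinks(array $tokens): array
{
    $links = [];
    foreach ($tokens as $tok) {
        if ($tok[0] !== FLEXY_TOKEN_TAG || strtoupper($tok[1]) !== 'A' || !isset($tok[4])) {
            continue; // not an <A> tag with attributes
        }
        $attrs = array_change_key_case($tok[4], CASE_LOWER);
        if (isset($attrs['href'])) {
            $links[] = trim($attrs['href'], "\"'"); // strip surrounding quotes
        }
    }
    return $links;
}

// A few tokens shaped like the sample output above.
$tokens = [
    [1, "\n", 2, 50],
    [2, 'HTML', 2, 51],
    [2, 'A', 6, 212, ['HREF' => '"/pub/WWW/Consortium/"']],
    [2, 'IMG', 7, 243, ['align' => 'bottom', 'src' => '"/pub/WWW/Icons/WWW/w3c_48x48"']],
];
$links = collectLinks($tokens); // ['/pub/WWW/Consortium/']
```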

I really want to do a tree version of this that loads data into a user-defined object, e.g.
<?php
$tree = flexyparser_toTree($data, new MyClass);

so it can be used 'how you want it...'



22 Oct 2004

PDO - Why it should not be part of core PHP!

Ever since PDO was proposed, there has been a warm fuzzy feeling about its imminent arrival. I hate to say it, but I fear it could be the biggest mistake PHP ever makes! (Well, I guess among the worst.)

That's not a lightweight statement, so I'll fire off the disclaimers before I go any further.
  • I'm working on an alternative at present.
  • I know the authors of PDO personally, and I probably owe them a drink for dissing the work.
  • From what I've seen of the code in PDO, it's pretty damn good.

So what you may ask is so wrong with it?

The major issues I have with PDO are:
  • Reuse, recycle...

    When there were at least 2 capable, cross-platform database libraries available, rather than examining whether they were a feasible option, it was decided to go off and start from scratch.

    I've spent quite some time working with libgda, probably the best library available to do this. It has a 5-year head start, is fully tested, and is considerably more feature-rich than PDO will ever be.

    PDO seems to be slowly gaining backends; however, it's a long way from having as many as libgda. And for each feature missing in PDO, it will take time to ensure it is enabled in all the backends, let alone fully tested.

  • Copying C is not always a good idea.

    The worst feature of PDO is variable binding (similar to pointers). This is what could be regarded as 'minefield programming'. In some of the worst wars in history, armies laid mines all over strategic borders so that the opposing army would be delayed in crossing. What happened after the war was a tragic travesty: the abandoned minefields started injuring innocent civilians.

    The way variable binding is done in PDO mimics this behaviour. While it achieves the goal of making updating rows relatively simple, it has the significant downside of making code unpredictable: it splatters variables around which, if changed, may update a database without making it clear and simple that this may occur. This whole issue should have been dealt with by using objects to represent database rows, and only making these usable as 'updateables' (see the introduction to DataObjects), which is exactly how DBDO was designed.

  • Cool features don't always make great features.

    Iterators were introduced with much fanfare in PHP5: using foreach on an object that implements the Iterator methods results in a fetch operation on every iteration. The simple example below illustrates the crux of the problem (I've already seen people pastebin iterator code, and you really are making wild guesses as to the intent of the code).

    Without sensible variable names, this rather confusing code easily evolves into complete gibberish:
    foreach ($pdoresult as $row) {
    ....?? is $row an object/array/or something else. ??
    }

    compared to the clarity below

    $do->query();
    while ($do->fetch()) {
    $data = $do->toArray();
    }


    While I believe you can still use PDO without using iterators, they do lie at the heart of the problem. I don't know if it was an off-the-cuff remark, but it appears that PDO started as a proof of concept for using iterators on databases, and as it evolved it became a limited database abstraction layer, which leads into the last issue.

  • Design by evolution, at the beginning or end.

    I don't think I've ever designed an interface correctly the first time; by now, DBDO has a long history:

    • Started off as Midgard's Data Model emulation in Midgard Lite
    • dboo was an improved version, introducing query building from object vars and primary key knowledge
    • DB_DataObject introduced a generator, simple data type knowledge, a PEAR DB backend, joins, and a lot more.
    • DBDO takes DataObjects as a base, removes a lot of the cruft that has built up, and at the same time introduces extensive datatype support and lazy fetching.
This now leaves DBDO at a stage where most of the documentation is already done (as part of DataObjects).

In comparison, since PDO does not appear to have a history, its first-generation API has not had a chance to be picked apart by thousands of users. It lacks documentation (although that apparently will be fixed soon).
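Both of the PDO complaints above can be shown without a database. The sketch below uses two toy classes, `ToyStatement` and `ToyResult`, which are hypothetical stand-ins and not PDO's implementation: the first keeps a reference to a bound variable and only reads it at execute() time, so an assignment made elsewhere silently changes what is sent; the second shows foreach quietly triggering a fetch on rewind() and on every next().

```php
<?php
// Toy stand-in for by-reference parameter binding (not PDO itself).
class ToyStatement
{
    private array $params = [];

    public function bindParam(string $name, &$var): void
    {
        $this->params[$name] = &$var; // keep a reference, not a copy
    }

    public function execute(): array
    {
        // The bound variables are only read now, at execute() time.
        $values = [];
        foreach ($this->params as $name => $value) {
            $values[$name] = $value;
        }
        return $values;
    }
}

// Toy result set: foreach drives rewind/valid/current/next behind the
// scenes, and each rewind()/next() here stands in for a database fetch.
class ToyResult implements Iterator
{
    public int $fetches = 0;   // how many "fetches" foreach triggered
    private int $pos = -1;
    private mixed $row = null;

    public function __construct(private array $rows) {}

    private function fetch(): void
    {
        $this->fetches++;
        $this->pos++;
        $this->row = $this->rows[$this->pos] ?? null;
    }

    public function rewind(): void { $this->pos = -1; $this->fetch(); }
    public function valid(): bool { return $this->row !== null; }
    public function current(): mixed { return $this->row; }
    public function key(): mixed { return $this->pos; }
    public function next(): void { $this->fetch(); }
}

$id = 1;
$stmt = new ToyStatement();
$stmt->bindParam(':id', $id);
$id = 42;                        // somewhere else entirely...
$sent = $stmt->execute();        // [':id' => 42], not 1

$result = new ToyResult([['name' => 'a'], ['name' => 'b']]);
$names = [];
foreach ($result as $row) {      // what is $row? You have to go and look.
    $names[] = $row['name'];
}
// $result->fetches is 3: one per row, plus the final empty fetch.
```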

I'm sure there are some valid reasons why DBDO could not stand up as the next-generation database library for PHP5, but I seriously wonder whether those are less significant than PDO's.
  • Win32 Support
    After considerable work over the last month, this is pretty much solved (although I only bothered building mysql/postgres for win32, continuing the process should be pretty simple). Given the work already done, the effort needed to make it 'enterprise ready' is pretty minimal now.
  • Too many libraries?
    Libgda has a mid-sized dependency list, although this is not that major compared to libxml2, which is now part of the standard distribution. Currently DBDO needs:
    • libgda (obviously)
    • libglib
    • libgmodule
    • libgobject
    • libgthread (maybe not necessary)
    • libxml2 (used by PHP's xml extension already)
    • libxslt (used by PHP's xsl extension already)
    • iconv (used by PHP already)

I'm not sure this post will change anything, but by not using libgda, I think PHP is missing a huge opportunity to stand on the shoulders of giants and get a world-class database backend with limited effort.