Work post

Feb. 27th, 2008 01:49 pm
[personal profile] pyesetz

Lorem ipsum dolor sit amet.  This is some pointless text to go with the animated elephant, which was drawn by Vincent Pontier, who is apparently friends with Olivier Plathey who wrote the FPDF module for PHP.

When I started working at Company 𝔾 in June '06, I didn't know anything about MySQL, so "Mr. Bear" suggested that I use XML instead.  Being a n00b, I stupidly chose the DOMXML module, which is specific to PHP4 and has prevented us from migrating to PHP5.  Now PHP4 is at "end of life" and we really need to get rid of it.  Also the nightly validation report has started failing because even 48 MB of RAM is no longer enough to load in all the databases and check them for validity (because DOMXML has serious memory-leak issues).

This month I've been busting my tail, sniffing the grindstone, trying to get my billable hours up high enough to staunch the financial bleeding.  My RSS feed of LiveJournal posts now has 266 (no, make that 289) unread messages.  Last weekend I had gotten it down to below 100 but the posts just keep coming!  Oddly enough, there is no post from [livejournal.com profile] xolo about his Christmas celebrations.  And no reply for my email to [livejournal.com profile] loganberrybunny re the use of state/province to refer to the "home countries" of the UK; has offence been taken?

For the last three days I've been removing XML and replacing it with PHP arrays.  Instead of
<entry>
<_key>deadbeef</_key>
<name>Joe Schmoe</name>
<country>USA</country>
<region>Colorado</region>
<city>Pike's Peak</city>
</entry>
it's now
$DB_Entries['deadbeef'] = array (
'name' => 'Joe Schmoe',
'country' => 'USA',
'region' => 'Colorado',
'city' => 'Pike\'s Peak',
);

Looks about the same, loads 10‒20 times faster!  Really, this is a serious demerit for XML.  The whole point of XML was that, since everyone would use it, it was worthwhile to optimize the Hell out of the parser for it, so then XML should be parsed faster than any other format.  But PHP's program-code parser is much faster than its XML parser.  In part I think this is because of the attributes.  XML tags have optional attributes, even though I don't use them, so the isomorphism between XML and PHP-arrays potentially could fail, although it doesn't in my case.
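As a rough illustration of the XML-to-array mapping, and of exactly where attributes break it, here is a minimal sketch using SimpleXML (PHP 5 and later) rather than DOMXML. It assumes the <entry> records sit under a hypothetical <entries> root element and that <_key> supplies the array key; entries_to_array() is an illustrative name, not part of any library.

```php
<?php
// Hypothetical converter from attribute-free <entry> XML records to the
// $DB_Entries array layout. The <entries> root element is an assumption.
function entries_to_array(string $xml): array
{
    $root = new SimpleXMLElement($xml);
    $db = [];
    foreach ($root->entry as $entry) {
        // Attributes and nested elements have no slot in a flat
        // field => value array, so the mapping fails on them.
        if (count($entry->attributes()) > 0) {
            throw new RuntimeException('attribute on <entry> cannot be mapped');
        }
        $key = null;
        $fields = [];
        foreach ($entry->children() as $name => $child) {
            if (count($child->attributes()) > 0 || $child->count() > 0) {
                throw new RuntimeException("<$name> cannot be mapped to a flat array");
            }
            if ($name === '_key') {
                $key = (string) $child;        // becomes the array key
            } else {
                $fields[$name] = (string) $child;
            }
        }
        if ($key === null) {
            throw new RuntimeException('entry has no <_key>');
        }
        $db[$key] = $fields;
    }
    return $db;
}
```

On this reading the isomorphism holds exactly when no element carries attributes or nested children, which is the case for the data described above.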

Making this change required touching just about every actively-used file at the website!  All the databases needed conversion.  Any program that reads databases needed to start reading them the new way.  Any program that generates databases needed to start writing them the new way.  I've probably introduced dozens of bugs.  We'll see if I get any bug reports.  Most pages at the website are now served in only ⅓ the time they used to take; will anyone notice?
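For the "writing them the new way" side, one possible sketch: PHP's var_export() emits valid PHP source for an array, so a generator can write the data file and readers can simply include it. save_db() and load_db() are hypothetical names, and writing to a temporary file before rename() is an assumed precaution so readers never see a half-written file. Because include executes the file, this is only appropriate for data files the site generates itself.

```php
<?php
// Hypothetical writer: dump $entries as a PHP file defining $DB_Entries.
function save_db(string $path, array $entries): void
{
    $code = "<?php\n\$DB_Entries = " . var_export($entries, true) . ";\n";
    $tmp = $path . '.tmp';
    file_put_contents($tmp, $code, LOCK_EX);
    rename($tmp, $path);    // swap the finished file into place
}

// Hypothetical reader: include runs in this function's scope,
// so the $DB_Entries it assigns is the local variable below.
function load_db(string $path): array
{
    $DB_Entries = [];
    include $path;
    return $DB_Entries;
}
```

var_export() escapes quotes itself, so a value like Pike's Peak comes out as 'Pike\'s Peak', just as in the hand-written example.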

And so, with this little side-problem taken care of, I can get back to the main project for this winter.  Unlike most Company 𝔾 projects, this one actually has a deadline because there's a conference in May that it has to be ready for.  I need to convert various files to PDF and then combine the PDFs on the fly, hence the need for FPDF, whose homepage has an elePHPant and a link to Vincent Pontier, hence this post.  Bye!

Date: 2008-02-28 08:58 am (UTC)
From: [identity profile] xolo.livejournal.com
He reminds me of Sidney the Elephant :)

Date: 2008-02-27 08:07 pm (UTC)
From: [identity profile] loganberrybunny.livejournal.com
> And no reply for my email to [livejournal.com profile] loganberrybunny re the use of state/province to refer to the "home countries" of the UK; has offence been taken?

No, of course not; I simply didn't see that email. It doesn't seem to be in any of my inboxes; which address did you send it to? And what did you have to say anyway? =:)

Date: 2008-02-27 08:44 pm (UTC)
From: [identity profile] loganberrybunny.livejournal.com
I've received it at the "rabbiteer" account now, so will reply to you by email later on. But it's an interesting subject, which I may end up writing about on LJ sometime anyway.

To answer the question you ask here: it might well be considered mildly offensive, yes, since England is considered a country by most of its inhabitants. The word country does not necessarily connote political independence; the nearest thing we have to an official term is "constituent country (http://en.wikipedia.org/wiki/Constituent_country)".

"State" wouldn't do since that usually implies something to do with government control - "state schools", for example - rather than a sub-UK division. "Province" is no good, since only Northern Ireland actually is a province; Wales is a principality while England and Scotland are kingdoms. Besides, "the provinces" is a slightly derogatory way to say "outside London".

You could probably get away with "nation", though, as in the Six Nations rugby championship.

Date: 2008-02-28 12:03 am (UTC)
From: [identity profile] loganberrybunny.livejournal.com
> I don't think this matters. Massachusetts isn't a "state"...

But, as you say, most people call it one. Most people do not refer to, say, Wales as a "province".

> Wales *used* to be a principality, until 1277.

The Treaty of Aberconwy was indeed 1277, but the lands of the Welsh Princes were not legally incorporated into England until the Treaty of Rhuddlan in 1284. It's true that there is no Welsh "Head of State" in the person of Charles, though most people do use "principality". You wouldn't see it on an official form, though, no.

> England and Scotland haven't really been kingdoms since 1649 or so.

Why 1649? The Union of the Crowns was 1603, and the Act of Union was 1707.

> every website now says "city" as the generic term

Only because most of the software used for this sort of thing is American. People in Britain would never refer to a small town like Bewdley as a "city", and it's not used as a generic term for "urban settlement" either.

Date: 2008-02-27 08:55 pm (UTC)
From: [identity profile] loganberrybunny.livejournal.com
Having said the above... "nation" will only do if you're keeping to the four constituent countries of the UK. Jersey, Guernsey and the Isle of Man are Crown dependencies, while the other remaining British possessions (eg Gibraltar and Anguilla) are British Overseas Territories. None of them are actually part of the UK at all.

Date: 2008-02-28 12:10 am (UTC)
From: [identity profile] loganberrybunny.livejournal.com
> My understanding is that Jersey is "part of the UK" in the sense that its residents have a birthright for UK passports.

True; there's no such thing as a "Jersey passport" since Jersey is a dependency. It's a separate possession of the Crown outside the UK (and the EU), I suppose rather as Canada is a separate Commonwealth Realm from the UK. However, as it isn't part of the UK, I would need a passport to visit Jersey.

Jersey's semi-detached status is probably slightly more known than it used to be, because of its consequent exclusion from the EU. Among other things, that means that if I send a parcel to Jersey, I need to complete a customs sticker, which I wouldn't were I sending it to France.

Date: 2008-02-27 10:42 pm (UTC)
From: [identity profile] giza.livejournal.com
My understanding of XML is that it was intended more as a data interchange format, and not something to store data in full time. With a syntax as complex as XML, that's a lot of parsing involved.

As it stands with your rewrite, there's still PHP code that is going to be parsed on every page load. Why not use serialize() (http://us.php.net/serialize) to store your data in a format which needs even less parsing? If there are concerns about human-readability of the data, you could always check the timestamp of the PHP source and only regenerate the serialized data if it is missing or older than the source.

I wrote a cache system a while back that stored large pieces of data (more than 100K in size) in files as serialized strings. PHP loaded and ran unserialize() on them amazingly fast, in less than a second. It was nice.

Alternatively, why not just put everything into a database? :-) Since you mention formerly not knowing anything about MySQL, I'm assuming that you have some knowledge of it now.

Edited Date: 2008-02-27 10:43 pm (UTC)

Date: 2008-02-28 01:20 am (UTC)
From: [identity profile] giza.livejournal.com
Maybe I didn't make it clear in my original comment, but the serialized files would not be intended to be edited by hand. You'd still edit the PHP files by hand, but they would only be read by your code on the first pass, at which point serialized copies of the data would be written and then read on future passes. The code would look something like this:

if (!cache_file()) {
   $data = read_php_file();
   write_cache_file($data);
} else {
   $data = read_cache_file();
}


In this example, cache files == the serialized data. That is essentially the same algorithm that I implemented on a project a while back so I could avoid some particularly expensive database queries.
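The pseudocode above could be made concrete along these lines. This is a minimal sketch, assuming each hand-edited source file assigns a $DB_Entries array as in the post; load_cached() and the file names are hypothetical. It folds in the timestamp check from the earlier comment, so the serialized copy is rebuilt whenever the source is newer, and it falls back to the source if the cache cannot be unserialized.

```php
<?php
// Hypothetical loader: prefer the serialized cache, rebuild it when stale.
function load_cached(string $src, string $cache): array
{
    clearstatcache();   // make sure filemtime() sees fresh values
    if (is_file($cache) && filemtime($cache) >= filemtime($src)) {
        // Fast path: no PHP parsing, just unserialize().
        $data = unserialize(file_get_contents($cache), ['allowed_classes' => false]);
        if (is_array($data)) {
            return $data;
        }
    }
    // Slow path: parse the hand-edited PHP source, then refresh the cache.
    $DB_Entries = [];
    include $src;
    file_put_contents($cache, serialize($DB_Entries), LOCK_EX);
    return $DB_Entries;
}
```

The allowed_classes option (PHP 7 and later) keeps unserialize() from instantiating objects, which is a reasonable default for plain data arrays.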

> It insists on putting all its databases in a secret place (typically
> /var/lib/mysql) instead of storing them in the directories where they'll
> be used—this makes it a bit more involved to archive an entire project.

That's perfectly normal for any database. You never want to archive the database files directly anyway since those files are subject to modification while the database server is still running--you'd have to shut down the database server to perform a backup. (Side note: Oracle actually let you put tablespaces in "archive mode", during which changes would be written to a separate log instead. But that was insane.)

What you want to do for archival is write a short script that calls the mysqldump program and dumps the entire database to a secure location.
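A sketch of such a script, written in PHP to match the rest of the site. mysqldump's --single-transaction and --result-file options are real; the function name, database name and backup directory are placeholders, and the sketch assumes credentials come from the [client] section of an option file such as ~/.my.cnf rather than from the command line. Note that --single-transaction only gives a consistent snapshot for transactional (InnoDB) tables.

```php
<?php
// Hypothetical helper: build a mysqldump command writing a dated .sql file.
// Credentials are assumed to come from an option file such as ~/.my.cnf.
function mysqldump_command(string $db, string $dir, string $date): string
{
    $file = rtrim($dir, '/') . "/{$db}-{$date}.sql";
    return 'mysqldump --single-transaction'
        . ' --result-file=' . escapeshellarg($file)
        . ' ' . escapeshellarg($db);
}
```

It might then be run nightly from cron with something like exec(mysqldump_command('mydb', '/var/backups/mysql', date('Y-m-d')), $output, $status), treating a nonzero $status as a failed backup.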

> The program is a behemoth. It's reasonably fast once it gets going, but
> the first query after an idle period takes several seconds (on a shared
> server) while all that code gets swapped in.

From what you describe, it sounds like the server is running out of RAM. That could be either physical RAM, or memory allocated to MySQL. A properly tuned database server will have plenty of RAM so most if not all of the database can be loaded into RAM, thus avoiding the problem you described. To clarify: this is not MySQL's fault, it is a database administration issue.

> Mr. Bear's preference is for a "punctuation optional" search algorithm
> that doesn't match standard SQL operators, so everything would need to be
> stored twice ("search form" and "display form"). It's easy enough to
> write a custom search-engine in C for readable data like XML or PHP
> arrays, not so easy to try to read MySQL data files directly!

I'm not sure I understand this last part. You should never ever be reading MySQL database files directly, that's what SQL is for.
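The "search form" versus "display form" split quoted above does not require reading MySQL's files: the search form can be computed in PHP, stored in its own column next to the display form, and matched with an ordinary SQL comparison. A minimal sketch, where the normalization rules (ASCII case folding, punctuation dropped, whitespace collapsed) are only a guess at what "punctuation optional" means and search_form() is a hypothetical helper:

```php
<?php
// Hypothetical normalizer for a "punctuation optional" search key.
// Assumes UTF-8 input; strtolower() folds ASCII letters only.
function search_form(string $s): string
{
    $s = strtolower($s);
    $s = preg_replace('/[^\p{L}\p{N}\s]+/u', '', $s);   // drop punctuation
    return trim(preg_replace('/\s+/u', ' ', $s));        // collapse whitespace
}
```

A query would then compare a hypothetical name_search column against search_form() of the user's input through a prepared statement, while the original column keeps the display form.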

I think the best advice I can offer here would be to get your hands on a book on SQL and start studying up on it. Understanding table joins and how relational databases work is an essential skill for web development, and it will help you build larger sites.

Date: 2008-02-28 08:51 am (UTC)
From: [identity profile] xolo.livejournal.com
> Oddly enough, there is no post from xolo about his Christmas celebrations.

It's coming - fret not. :)
