Three Years of Hell to Become the Devil: Can Crescat Have Comments?

Due to a momentary glitch on Jeremy Blachman's part, the eternal debate on whether Crescat Sententia, and by extension the rest of the legal blogosphere, should have comment features has once again erupted. Most of you know that I like my comment feature, and it's not going anywhere. But I'd like to add a slightly less-addressed question to the Crescat debate: can it have comments at all?

A lot of this is fairly technical blogging gibberish, but if you're a reader interested in how some of these things work on the back end of a blog, you may find it interesting.

First, a bit of full disclosure: one of the things I like about blogging is how it gives me the chance to play with tech. If you look around the law-school blogosphere, every now and then you'll run across some bit of my handiwork, something that I've been asked to fix, implement, or design. If you look real close at Crescat, despite Mr. Baude's occasional rivalry, you'll note that I get a small thank-you there for helping to keep their site running. This is, by the way, greater praise than actually deserved for the small amount of work I've done, but I do have some knowledge about the back end of Crescat that others may not.

The key thing about Crescat, knowledge available to anyone, is that it's big. As of today, the homepage is a relatively bloated 171K, and that's after they've made several efforts to trim down the site size. (By way of reference, my homepage is 58K as of this morning, although it often hits 80K. As mentioned below, site bloat is the least of my bandwidth problems.) Several times the site's gone down because they've run out of bandwidth, resulting in a need to purchase more. Of course, this isn't a bad thing: Crescat has an awful lot of interesting information, a wealth of readers, and thus deserves every bit of its size. But it points to an item that few bloggers--especially the successful ones--ever really think about: site efficiency. [1]

Site Efficiency: The TYoH Problem
Most bloggers who host their own installation of Movable Type (i.e., they don't use Blogger or Typepad) use close to the standard set of templates, modified slightly by stylesheets. The site architecture--homepage, archive pages, comments pages--remains fairly consistent, varying only with whether they've turned on comments and trackbacks. They also use the standard MT rebuild script, which tells the program what pages need to be rebuilt every time a change is made to the site. This causes a problem because whenever a comment or trackback ping is made to a site, large amounts of the site must be rebuilt. If this can't be done within a reasonable amount of time, a lot of things can go wrong.

Take Three Years of Hell. Many of my readers have noticed a lot of double-posted comments on the site. This happens because the comment script times out after a reader has submitted a comment, and they hit 'refresh'--which then reposts the comment. Why does the script time out? Because every time you leave a comment, the rebuild script has to build my homepage, and the rebuild script isn't that smart. Theoretically, it could just increment the number of comments for a given entry, leaving most of the page intact. In fact, it just rebuilds the whole page.
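To make the "just increment the counter" idea concrete, here is a minimal Python sketch. It is not how MT's rebuild script works, and the markup it looks for (a link with id `comments-<entry id>` and the text `Comments (N)`) and the function name `bump_comment_count` are assumptions made up for illustration.

```python
import re


def bump_comment_count(page_html, entry_id):
    """Increment the 'Comments (N)' counter for one entry in place.

    Assumes a hypothetical template that renders each counter as
    <a id="comments-ENTRYID" ...>Comments (N)</a>.
    """
    pattern = re.compile(
        rf'(<a id="comments-{int(entry_id)}"[^>]*>Comments \()(\d+)(\)</a>)'
    )

    def add_one(match):
        # Keep the surrounding markup, replace only the number.
        return match.group(1) + str(int(match.group(2)) + 1) + match.group(3)

    new_html, replaced = pattern.subn(add_one, page_html, count=1)
    if replaced == 0:
        raise ValueError(f"no comment counter found for entry {entry_id}")
    return new_html
```

The point of the sketch is only that the work is proportional to finding one counter, not to regenerating every entry, sidebar, and blogroll on the page.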

The trouble is that I use MT-RSS to populate my blogroll. This means that every time you make a comment, my site checks about a dozen other sites to see if they've made updates (at least if there's been a comment in the last hour). This takes a good amount of time, even if all the other sites are functioning properly, which they often aren't. End result? A crash, and then a double post. [2]

OK, But What Does That Have To Do With Crescat?
Crescat, of course, doesn't use MT-RSS, but it has similar structural problems, mostly due to its sheer massive size. For instance, take a look at its other law category. This page is around 870K (as of today)--almost a whole megabyte of legal monologue. Every time another entry is added to this category, that page has to be rebuilt.

Now imagine what happens if Mr. Baude writes a post to this category that attracts a lot of comments. Every time one of his readers adds their little bit of opinion, not only is 150K+ of homepage rebuilt, but nearly a megabyte of category archive is rebuilt as well, all in order to increment one entry's comment count by one. Ditto for the weekly archives.
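As a rough back-of-the-envelope check, the arithmetic looks like this in Python. The page sizes are the ones quoted in this post (171K homepage, 870K category page); the weekly archive is left out because I don't have a figure for it, and the 25-comment thread is purely a hypothetical example.

```python
HOMEPAGE_KB = 171   # homepage size quoted earlier in this post
CATEGORY_KB = 870   # "other law" category page size quoted above


def kb_rebuilt(comments):
    """Kilobytes of HTML regenerated for a given number of comments,
    assuming each comment rebuilds the homepage and one category page."""
    return comments * (HOMEPAGE_KB + CATEGORY_KB)


print(kb_rebuilt(1))    # one comment
print(kb_rebuilt(25))   # a hypothetical 25-comment thread
```

Under those assumptions a single comment regenerates about a megabyte of HTML, and a lively thread regenerates tens of megabytes, all to change a handful of digits.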

This already happens for Trackback entries: when I (and presumably other bloggers) attempt to ping Crescat for trackbacks, we frequently get timeout errors. Once you know what the error is, you know to ignore it. (It still causes problems because MT will attempt to ping Crescat again if you make any changes to the entry, which results in double-pings over at Crescat.) But because comments are much easier and more frequent, the problem would only be compounded. And depending upon how Crescat's host calculates bandwidth or server usage, they may hit their limits again.

So Can Crescat Never Have Comments?
The flip answer to that question, of course, is it depends on how long Mr. Baude keeps breathing. But even assuming a sudden Damascene insight on his part, a blog the size of Crescat would have to think very carefully about comment implementation.

One option would be to allow access to comments only from the front page, and maybe the weekly archives. Both of these pages--unlike category archives--will be limited to a relatively fixed size, because they only print entries from a specified date range. I'll be honest: I don't know if the rebuild script would have to be modified, because I don't know if it's intelligent enough to know that it needn't rebuild category archives if there are no comments in them, but there are a number of 'smarter' rebuild scripts out there that could solve this problem.
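For the curious, the decision a 'smarter' rebuild script would have to make can be sketched in a few lines of Python. This is an illustration of the idea only, not any existing MT plugin: the file names, the seven-day homepage window (`HOMEPAGE_DAYS`), and the function `pages_to_rebuild` are all assumptions of mine.

```python
from datetime import date, timedelta
from typing import List

HOMEPAGE_DAYS = 7  # assumption: the homepage shows the last week of entries


def pages_to_rebuild(entry_date: date, today: date) -> List[str]:
    """Pages to regenerate when a comment lands on an entry.

    Category archives are deliberately never listed, because in this
    scheme they do not display comment counts.
    """
    pages = []
    age = today - entry_date
    if timedelta(0) <= age < timedelta(days=HOMEPAGE_DAYS):
        pages.append("index.html")
    iso_year, iso_week, _ = entry_date.isocalendar()
    pages.append(f"archives/week_{iso_year}_{iso_week:02d}.html")
    return pages
```

A comment on this week's entry would touch the homepage and one weekly archive; a comment on an entry from months ago would touch only its weekly archive. Either way the near-megabyte category page is left alone.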

Crescat could also take advantage of the fact that it has no individual entry archives. First of all, it doesn't have to rebuild those whenever there's a comment. But more importantly, it means that every reading of a comment would be in a pop-up window. This dramatically cuts down on what has to be rebuilt.

Finally, Crescat could generate many of its fixed features through PHP include files. (See note 2 below for how I need to implement a similar feature on here.) The header, right navigation, and blogroll could be set to static files. This is an efficiency tradeoff, of course: every time a visitor views their homepage, Crescat's server would be assembling little bits of code. However, it cuts down remarkably on how long it takes MT to rebuild a page, because it reduces the size of the templates dramatically.

To illustrate, look at the Crescat blogroll of Chicago blogs: simply that section of the site is 8.2 kilobytes of data, which would have to be rebuilt with every comment. As a PHP include, the relevant template part merely becomes:
<?php require 'chicagoblogroll.php'?>

Doctor, Heal Thyself
As I mentioned, there's nothing I'm saying about Crescat that doesn't go twice for this site: there's a lot of bibs and bobs on my pages that need fixing, updating, or just honest-to-god recoding. But thankfully, I don't have to worry that much about bandwidth. First of all, I have nowhere near Crescat's readership. Even if I did, I have more bandwidth than I could ever hope to get through and more server power than I need, because of a particularly sweet hosting arrangement. (I host multiple sites, none of which get as much traffic as TYoH, but my bandwidth is allocated collectively.) But in any event, as those at Crescat continue their debate on whether to engage their collective multitude, it's worth considering that implementing comments is going to take quite a lot of technical work, and considerable thought, if the ship is not to founder against the shoals of technical limitation. [3]

UPDATE: One notices that at the end of Will's post he suggests that bloggers sign up for their own website at Blogger.com. Dear god, no. Look, if you're going to do this, at least go to Typepad or Blog City. They at least give you RSS feeds instead of Atom, and don't have the reliability problems. (Well, at least not as bad.)

FURTHER UPDATE: Yes, the first comment on this entry points out something I'd neglected to mention--that Crescat could just use a comment service like Haloscan and avoid having to put the comments on its server at all. I'm not that fond of Haloscan--several Blogspot blogs that use it seem to have comments disappear on a regular basis--but it's another possibility. Does anyone know about Haloscan's scalability, however? I've not seen it implemented on any site with the size and traffic of Crescat.

[1]: Of course, bloggers aren't the only ones who put too little emphasis on efficiency, load balancing, and making sure a website can run under strain. A case in point would be Columbia's website for the Early Interview Program, which collapsed under too great a workload when the whole class tried to use it at once. Much of this would have been avoided by a few architectural changes.

[2]: I know how to fix this, and of course will do so when I have time. (This means approximately never.) For those with similar problems, it's pretty easy to solve. The MT-RSS feed list should be generated automatically every hour on the server, creating a new html file, say feeds.html. That file can then be included in the homepage using a PHP include function. This would both ensure that the feed list is updated hourly whether there is a comment or not, and cut down dramatically on rebuild time. As I said, easy, I just haven't done it yet.
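A minimal Python sketch of that hourly job might look like the following. It is only an outline under my own assumptions: the function names, the shape of the feed data (title and URL pairs), and the example cron path are hypothetical, and the step that actually checks the other sites for updates is left out entirely.

```python
import html
import os
import tempfile


def render_feed_list(feeds):
    """Turn (title, url) pairs into an HTML list fragment."""
    rows = [
        '<li><a href="{}">{}</a></li>'.format(
            html.escape(url, quote=True), html.escape(title)
        )
        for title, url in feeds
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>\n"


def write_atomically(path, text):
    """Write to a temporary file, then swap it into place, so a visitor
    never sees a half-written feeds.html."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    os.chmod(tmp_path, 0o644)  # mkstemp creates private files; the web server must read this one
    os.replace(tmp_path, path)


# Hypothetical crontab entry to run the job at the top of every hour:
#   0 * * * * python3 /path/to/update_feeds.py
```

The homepage template then only needs a one-line PHP include of feeds.html, so a slow or dead site on the blogroll delays the hourly job rather than somebody's comment.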

[3]: As an aside, this entire technical limitation matter has gotten me thinking about what might become a considerable blog annoyance at some point: automated denial of service attacks. I'm nowhere near technically inclined enough to design something like this, but imagine an email virus that sent trackback pings instead of emails. The pings could come from a list (sent with the virus) of 100 popular blogs, and be set to send a ping from their most recent entry to the target blog. Because the rebuilding of a blog takes so much more server time than just an HTTP request, this might be a good way to disable a blog. Given that the pings would be coming from different IP addresses, and would indeed seem to come from different sources, blocking the attack might prove technically challenging.

Again, I'm not sure I've got the technology completely right, and of course I don't have time to go off and learn enough to program such a 'virus.' But it strikes me that the vulnerability is out there. (If I'm wrong and there's a reason this is technically unfeasible, tell me: I'd really like to know.)

Comments

Couldn't Crescat also just use something like Haloscan and take the comments off their server entirely?

Posted by: Daniel | August 12, 2004 3:25 PM

Daniel: Probably--I'll admit, though, that I'm not entirely certain how Haloscan works, or whether it's appropriate for a site that gets Crescat's traffic.

Posted by: A. Rickey | August 12, 2004 3:28 PM

Surely the TrackBack ping contains data about the actual IP of origin, not simply the IP of the blog allegedly sending the ping. Hence, although the DoS could slow the machine, it ought to be blockable (unless the attackers take the next step and make it a DDoS, but that does require additional sophistication). Also, I have to say I'm wondering what sort of ISP Crescat is with that they should have serious bandwidth problems with what is (isn't it?) almost entirely text content and should labor so sadly over a TrackBack ping. Bandwidth is a heck of a lot cheaper than it used to be.

Posted by: Sarah | August 12, 2004 6:43 PM

Sarah: Again, the idea would be to send the required code out as a virus, so that infected computers would send the ping. These would each have different IP addresses. I'm sure there's some way of dealing with this, but I couldn't tell you what it is. I'd suspect, though, that what really keeps this from happening is a lack of overlap between those who (a) know how to program such a thing, and (b) really want to annoy a blogger.

Posted by: A. Rickey | August 12, 2004 8:23 PM

But then you're just talking about a garden-variety DDoS. I guess it would be somewhat more efficient in wasting resources than just flooding the server with HTTP POST requests that don't trigger a rebuild would be, but at the scale you'd want to seriously slow down the server, I'm not sure that represents a massive gain in efficiency. More importantly, it would be a heck of a lot easier to filter/drop incoming TrackBack pings (which are just HTTP POST requests to a specific URL for each post) than it is to cope with a more general attack. So it's the same vulnerability that any HTTP server has, except that a target can choose to turn off TB with only minor loss of functionality, but it requires considerably more effort/loss of functionality to filter or limit incoming HTTP traffic to deal with blanket attacks. People being annoying, someone may well write some such virus some day, but although I'm not an expert on network architecture, I don't think it would represent a particular advance in troublemaking.

Posted by: Sarah | August 13, 2004 12:10 PM

Sarah: Given that rebuilds for a site like Crescat are quite a lot more pernicious than repeated HTTP POST requests--they take a lot more server time--I'd think it wouldn't take a great deal of distribution to take things down. Of course, you could just turn off trackbacks altogether, but that reduces a blog's functionality. As I said, a bit beyond my experience, and I'm sure it's not a big improvement on DDoS, but I wonder if it won't pop up some day.

Posted by: A. Rickey | August 13, 2004 1:36 PM

Yes, Haloscan would work just fine. All it does is fire up a pop-up window displaying comments hosted on the Haloscan server. It's entirely independent of the hosting blog. Which explains why you sometimes see people whose comments are unavailable due to server failure, even though the rest of the blog is fine.

Posted by: Martin | August 13, 2004 6:03 PM

Eh... well, that makes an entire long technical article on potential solutions useless. ;)

Posted by: A. Rickey | August 13, 2004 7:08 PM

The question of whether something like Haloscan is useful also depends a little on what role you think comments play. If your blog's comments are ephemeral things along the lines of "Me too!" or "See you in Con Law tomorrow?" then it does the job nicely. When I'm blogging, though, I regard some of the comments as almost as valuable as the entry itself. As such, issues like archiving become relevant and I don't really want the relevant information being stored in some random 3rd party location over which I have no control. The real problem here is with the sloppily-written nature of some of this blog software. As both you and Sarah imply, it ought to be possible to efficiently update a small set of text-only pages in response to a comment. (Also: blogs shouldn't need database backends. But I'll say no more on that topic or it'll turn into a largely irrelevant rant about over-engineered software!)

Posted by: Bateleur | August 14, 2004 2:40 AM

Another limitation of Haloscan, unless they've altered it recently, is the 1,000 character limit on comments. I have every belief this would frustrate a great many of Crescat's readers...

Posted by: A. Rickey | August 14, 2004 9:43 AM