What is AG Gonzalez Thinking?
Amazingly, Bush may just have lost my vote. Not because of threats to civil liberties, but merely through his administration's ungentlemanly thuggishness with aggregate data and his Justice Department's obsession with Project: No Child Sees A Behind.
When the Attorney General subpoenas Google asking for massive amounts of data, I think it's fair to ask what he wants to do with it. Having read the relevant court document, I've only come away more confused. Let's look at what the government has asked for:
- A list of 1 million random URLs available for search in Google. (This is down from a request for all URLs available through Google. The mind boggles at the size of that file.)
- All queries entered on Google's search engine over a one-week period (originally one month).
Those are some big files. While I agree with Chris that the privacy concerns aren't that significant (they're not asking for IP addresses), it still seems like a ridiculous fishing expedition.
The legal arguments for turning over the data are fairly straightforward. AG Gonzalez's memo becomes an exercise in obfuscation, however, when it comes to how all these URLs are going to help his case. The data will allow the Government to "draw conclusions as to the prevalence of harmful-to-minors material on the portion of the internet available through search engines" (Motion at 8) or to "understand the behavior of web users" (Id. at 4). Apparently the AG needs massive data files to conclusively prove that (a) there's a lot of porn on the internet, and (b) people search for that porn. I simply can't believe that the ACLU wouldn't stipulate to those facts. (See UPDATE.)
Of course, one suspects those aren't the primary issues. This elaborate exercise in datamining is actually supposed to "measure the effectiveness of filtering technologies in screening [obscene material]." (Id.) But in the immortal words of Ogden Nash, "You can't get there from here," although I can see some stunningly bad ways to massage this data. For instance, you could have someone trawl through one million URLs and figure out how many were obscene sites. You could then run one week's worth of searches and figure out how often those obscene sites appeared. (That's a pretty big task in itself.) Finally, you could measure whether nasty sites still turned up when you added filtering software, from which you'd then derive the "effectiveness" of the filters.
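To see how little the two files actually constrain the answer, the naive pipeline above can be mocked up in a few lines. This is strictly a toy sketch with invented data — random URLs, a fake search function, and a blocklist standing in for the filter are all my assumptions, not anything in the DoJ's motion:

```python
import random

random.seed(0)  # deterministic toy run

# Hypothetical stand-ins for the two subpoenaed files.
urls = [f"http://site{i}.example" for i in range(1_000)]
obscene = set(random.sample(urls, 100))              # pretend 10% are obscene
queries = range(500)                                 # a "week" of query logs
blocklist = set(random.sample(sorted(obscene), 80))  # what the filter knows about

def search(_query):
    """Stand-in for a search engine: ten random results per query."""
    return random.sample(urls, 10)

def tally(results, banned=frozenset()):
    """Count obscene hits among the results the user actually sees."""
    shown = [u for u in results if u not in banned]
    return sum(u in obscene for u in shown), len(shown)

unfiltered = [0, 0]
filtered = [0, 0]
for q in queries:
    results = search(q)
    h, t = tally(results)
    unfiltered[0] += h; unfiltered[1] += t
    h, t = tally(results, blocklist)
    filtered[0] += h; filtered[1] += t

baseline = unfiltered[0] / unfiltered[1]
with_filter = filtered[0] / filtered[1]
effectiveness = 1 - with_filter / baseline
print(f"baseline {baseline:.3f}, with filter {with_filter:.3f}, "
      f"'effectiveness' {effectiveness:.2f}")
```

Note that every number this produces is driven by the assumptions I baked in (the 10% base rate, the blocklist's coverage), not by anything in the data files themselves — which is rather the point.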
But this result is methodologically flawed. To be fair, one would have to account for which search strings were searching for porn in the first place, an inherently subjective matter. Searches for "breast," for instance, can have any meaning from the pornographic to the medical to the culinary. Further, one would have to assume that the filter is the only source of control that comes with filtering software. Most programs include simple add-ons that let parents see what has been browsed on the machine. The most effective "filter"? Simply telling the child, "I can track what you see, and if I find you've been visiting Playboy.com, I'll punish you once for breaking my rules on porn and a second time for not being able to find any better dirty material in all the great wide internet."
That, however, is the closest I can get to "proving" the effectiveness or otherwise of filters from the data the AG wants. The best I can see resulting from this subpoena are some spurious statistical arguments that will "show" that some mythical aggregate internet user will stumble upon pornography once every X number of days. Given that the government's civil liberties credentials aren't everything they could be right now, it would seem prudent for the AG to outline in detail exactly how he plans on using this data before throwing requests for data at one of the most-used (and possibly most-beloved) companies out on the Net.
Then again, I could be missing something. Comments on exactly how one measures the effectiveness of filtering software from these two massive data files (or privacy problems that I might have overlooked) are very welcome.
UPDATE: Above I say that I can't believe the ACLU isn't willing to stipulate to some very broad claims, a point which is flippant enough to obscure my argument. For clarity, I can see why the AG would want some relatively solid data on the prevalence of pornography online, but I don't see why one has to subpoena search engines to get that data. Assuming the DoJ has the number-crunching resources necessary to process Google's records if it gets them, it must also be able to send out spiders to index portions of the net, or to run simulated searches based upon the most common search terms used. Certainly this could be handled without the ugly mallet of a subpoena and the thuggish aura it exudes.
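For what it's worth, the do-it-yourself route is statistically unremarkable: spider a random sample of pages, have reviewers classify them, and report the sample proportion with a confidence interval. A minimal sketch, in which the sample, the classifier, and the 10% base rate are all invented for illustration:

```python
import math
import random

random.seed(1)  # deterministic toy run

# A spider-gathered sample of pages (hypothetical; no subpoena required).
sample = [f"http://site{i}.example/page" for i in range(2_000)]

def classify(url):
    """Stand-in for human review: flags roughly 10% of pages (assumed rate)."""
    return random.random() < 0.10

flagged = sum(classify(u) for u in sample)
p = flagged / len(sample)
# 95% normal-approximation confidence interval on the prevalence estimate.
half_width = 1.96 * math.sqrt(p * (1 - p) / len(sample))
print(f"estimated prevalence: {p:.3f} ± {half_width:.3f}")
```

At a sample of 2,000 pages the interval is already only a point or so wide, which suggests the government could get data this solid with a modest crawl rather than two enormous files from Google.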