Black Tuesday


Yesterday, Neil Patrick Harris retweeted David Blaine’s funny observation that "one of these things is not like the others."

My AOL data (see #836. How to be a sex goddess) was a little thin on "why is he" queries, but a broader "why is" search didn’t disappoint. Here are a few; the full list in alphabetical order (not safe for work, as you might guess) is after the jump.

  • why is the earth so important
  • why is renaissance art emotional
  • why is pi irrational
  • why is a frog difficult to hold
  • why is there dirt in my air
  • why is the tympanum located in the abdomen of the grasshopper
  • why is scary movie2 rated r

(more…)


Early this morning, Wikileaks began posting alphanumeric pager messages from four carriers (Arch, Metrocall, Skytel, and Weblink_B) that were intercepted during a 24-hour period beginning early on September 11, 2001. Alphanumeric pager messages are unencrypted, and, like communications over a public 802.11 wireless network, they’re skimmable with the right (and not exotic) software and hardware.

  • “Due to today’s tragic events, it makes sense to cut back wherever feasible on payroll. Expect a very light business day. Please call all stores and review payroll issues”
  • “RING ALL CHICAGO AIPORTS AND EVERY MAJOR BUILDING DOWNTOWN. BUSH IS DOING A SPEECH.  THIS IS SERIOUS POOH..”
  • “Holy crap, are you watching the news.”
  • “I hope you have gone home by now. The BoA tower and space needle here are closed. I suspect tall buildings across the country will be closed. Take care my love.-cb”

This might be the most interesting public data mine since the AOL breach. The total volume is far less, but unlike the AOL data, this data hasn’t been anonymized. There are full names, phone numbers, and other identifying information in the mix.


Almost every semester, I use the AOL Breach data as a point of departure for something in at least one of my classes. The data is fascinating. Most data is fascinating, but this data is particularly so: at once shocking, funny, creepy, poignant, sad, frightening, noble, ignoble, shrewd, and lewd. It’s also rich in the way data can be rich. For a sample of several thousand AOL accounts, it includes the complete account search history during March, April, and May of 2006, with timestamped search strings and the result rank and destination of each click-through. That completeness makes it ripe for discovering all sorts of patterns of human thought and behavior.

It’s AOL data week in one of my classes now. This morning, I proposed several nontrivial questions about the data that could be answered with SQL queries. We looked at the results and discussed what they might say about the unwitting study subjects. Then I asked my students to suggest some questions of their own. What are the typical time-of-day and day-of-week patterns of an individual AOL customer’s searches? Are there identifiable differences in the patterns (and by extension in the sleep, social, and perhaps employment or school behavior) of people whose searches included, say, “britney”? For what kinds of searches do users most often click through several pages of results? And so on.
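The time-of-day and day-of-week question can be sketched as a single GROUP BY query. What follows is a minimal illustration, not the query used in class: the table name `searches`, its columns (`user_id`, `query`, `query_time`), and the sample rows are all assumptions made up for the example, run here against an in-memory SQLite database.

```python
import sqlite3

# Hypothetical schema and made-up rows; the real table layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searches (user_id INTEGER, query TEXT, query_time TEXT)")
conn.executemany(
    "INSERT INTO searches VALUES (?, ?, ?)",
    [
        (1, "britney spears", "2006-03-01 23:15:00"),
        (1, "how to tie a tie", "2006-03-02 23:40:00"),
        (1, "weather", "2006-03-04 09:05:00"),
        (2, "why is pi irrational", "2006-03-01 14:00:00"),
    ],
)


def hourly_pattern(conn, user_id):
    """Return {(day_of_week, hour): search_count} for one user.

    day_of_week follows SQLite's strftime('%w'): 0 is Sunday, 6 is Saturday.
    """
    rows = conn.execute(
        """
        SELECT CAST(strftime('%w', query_time) AS INTEGER) AS dow,
               CAST(strftime('%H', query_time) AS INTEGER) AS hour,
               COUNT(*) AS n
        FROM searches
        WHERE user_id = ?
        GROUP BY dow, hour
        ORDER BY dow, hour
        """,
        (user_id,),
    ).fetchall()
    return {(dow, hour): n for dow, hour, n in rows}


pattern = hourly_pattern(conn, 1)
print(pattern)
```

Comparing these per-user tables between groups (for example, users with and without a “britney” search) would be one way to approach the follow-up question about differing patterns.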

One of my students suggested an excellent simple question. What are the most common searches of the form “how to …”? Out of millions of queries in the AOL data, there were many thousands of “how to … ?” searches. The most frequent was “how to tie a tie,” requested 92 times by a total of 47 distinct users. The rest of the top ten (in terms of most distinct users asking the question) were how to write a resume, gain weight, have sex, get pregnant, write a book, write a bibliography, start a business, lose weight, and make money, each sought by a dozen or more different people. AOL converted the queries to lower case and removed much of the punctuation, but they didn’t correct spelling. Consequently, how to masterbate and how to masturbate appear separately at ranks 49 and 51 respectively. The question would have nearly hit the top 10 without the misspellings.
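A ranking like this needs two different counts per query: how many times it was submitted and how many distinct users submitted it. Here is a minimal sketch of that idea, again using an in-memory SQLite database; the `searches` table, its column names, and the sample rows are hypothetical and stand in for however the real data is stored.

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searches (user_id INTEGER, query TEXT)")
conn.executemany(
    "INSERT INTO searches VALUES (?, ?)",
    [
        (1, "how to tie a tie"),
        (1, "how to tie a tie"),  # same user asking again
        (2, "how to tie a tie"),
        (3, "how to tie a tie"),
        (3, "how to write a resume"),
        (4, "how to write a resume"),
        (4, "weather"),           # not a "how to" query
    ],
)


def top_how_to(conn, limit=10):
    """Rank 'how to ...' queries by distinct users, then by total requests."""
    return conn.execute(
        """
        SELECT query,
               COUNT(*) AS times_asked,
               COUNT(DISTINCT user_id) AS users
        FROM searches
        WHERE query LIKE 'how to %'
        GROUP BY query
        ORDER BY users DESC, times_asked DESC, query
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


ranking = top_how_to(conn)
for query, times_asked, users in ranking:
    print(f"{query}: {times_asked} requests from {users} users")
```

Because the grouping is on the exact query string, spelling variants land in separate rows, which is why misspelled and correctly spelled versions of one question rank separately.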

Here’s a PDF file of the top 1000 “how to” queries submitted through AOL Explorer by a sample of AOL users in the spring of 2006. You can probably guess that it’s not safe for work. Although there are no pictures, plenty of sex, drugs, and gambling are spelled out, and there are more than a few questions likely to offend in one way or another. Have a look.

2 Responses to “#836. How to be a sex goddess”

  1. Greg Everitt Says:

    Wow professor, this list is… people are interesting, is all I’m saying.

  2. Steve Kass » Why, why, why? Says:

    [...] AOL data (see #836. How to be a sex goddess) was a little thin on "why is he" queries, but a broader "why is" search [...]


This is huge. To call it one for the history books is an understatement. (Links)
