November 2009


On Thursday, November 19, and Friday, November 20, the Dessoff Chamber Choir backed up Ray Davies live at Town Hall in New York in a set of songs from the Kinks Choral Collection.

You can see all the tenors in this photograph from a superb collection posted to SmugMug by Arnie Goodman of Bluestormmusic.com and Elmore Magazine:

Tenors, L to R: Jeff Lunden, Andrew Willett, Daniel O’Brien, Steve Brautigam, Erol Tamerman, Tansal Arnas, Douglas Riccardi, Steve Kass

And here’s a YouTube clip of us singing. Friday’s concert was recorded for radio broadcast in early December. Stay tuned for details.

It was a fantastic gig. Ray, the band, and the sound and stage crew were all as nice as could be, and David Temple, the choir director, was brilliant. Also brilliant was Dessoff soprano Christine Hoffman, without whom we couldn’t have done this. She prepared us superbly in the weeks before the concerts. Best of all was the crowd, who were wildly appreciative of the choir both during and after the show.

Finally, Dessoff would be happy to meet fans, friends, or supporters at our upcoming CD Release party:

schnippers

Yesterday, the Italian postal service misprocessed a bunch of ATM and credit card transactions. Specifically, the virgola was shifted two places, appending two zeros to the transaction amount. There’s no telling exactly how this happened, but it wouldn’t surprise me if it had something—if not everything—to do with localization in one way or another. In Italy, a comma (virgola), not a period, precedes a number’s decimal part, but software might see things otherwise.

Some software interprets number strings according to the operating system localization (unless overridden). Other software ignores the OS localization. SQL Server’s CAST operator, for example, only accepts a period as the decimal separator, and it disregards commas in strings intended to represent numbers.

At least it does as of SQL Server 2005; previous versions followed a complicated set of rules in an attempt to disallow numbers that weren’t valid in the U.S., India, or China. In India (ones, thousands, lakhs, crores, thousand crores, lakh crores, etc.), digit groups bounce between two and three digits, and 1,234,56,70,000.0 is a valid number. In China (yi1, wan4, yi4, wan4 yi4, etc.), it would be 123,4567,0000.0. Interpreting human-readable representations of numbers is no simple task. Explaining the issue isn’t much easier.
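The grouping schemes alone are enough to trip up a naive parser. Here’s a quick sketch (the pattern lists are my own shorthand for the three conventions, not any standard API) that reproduces all three groupings of the same number:

```python
def group_digits(digits, pattern):
    """Group a string of digits from the right, cycling through the
    group sizes in pattern (sizes are applied right to left)."""
    groups = []
    i = len(digits)
    pi = 0
    while i > 0:
        size = pattern[pi % len(pattern)]
        groups.append(digits[max(0, i - size):i])
        i -= size
        pi += 1
    return ",".join(reversed(groups))

n = "12345670000"
print(group_digits(n, [3]))        # Western: 12,345,670,000
print(group_digits(n, [4]))        # Chinese: 123,4567,0000
print(group_digits(n, [3, 2, 2]))  # Indian:  1,234,56,70,000
```

A parser that accepts all three has to carry per-culture grouping rules, not just a list of allowed separator characters.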

In all versions of SQL Server, this happens regardless of language or culture settings.

select cast('115,00' as money) as TooMuch;

TooMuch
---------------------
11500.00
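The same failure is easy to reproduce outside SQL Server. A toy parser in Python (my own sketch, not anyone’s production code) shows how U.S.-style separator rules turn an Italian amount into one a hundred times larger:

```python
def parse_decimal(s, decimal_sep=".", group_sep=","):
    """Toy parser: strip group separators, then convert the
    decimal separator to a period and parse as a float."""
    return float(s.replace(group_sep, "").replace(decimal_sep, "."))

print(parse_decimal("115,00"))                                  # U.S. rules: 11500.0
print(parse_decimal("115,00", decimal_sep=",", group_sep="."))  # Italian rules: 115.0
```

Same string, two readings, a factor of 100 apart, which is exactly the shifted-virgola symptom the Italian cardholders saw.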

[From Slashdot, noting ilsole24ore.com]

Early this morning, Wikileaks began posting alphanumeric pager messages from four carriers (Arch, Metrocall, Skytel, and Weblink_B) that were intercepted during a 24-hour period beginning early on September 11, 2001. Alphanumeric pager messages are unencrypted, and, like communications over a public 802.11 wireless network, they’re skimmable with the right (and not exotic) software and hardware.

  • “Due to today’s tragic events, it makes sense to cut back wherever feasible on payroll. Expect a very light business day. Please call all stores and review payroll issues”
  • “RING ALL CHICAGO AIPORTS AND EVERY MAJOR BUILDING DOWNTOWN. BUSH IS DOING A SPEECH.  THIS IS SERIOUS POOH..”
  • “Holy crap, are you watching the news.”
  • “I hope you have gone home by now. The BoA tower and space needle here are closed. I suspect tall buildings across the country will be closed. Take care my love.-cb”

This might be the most interesting public data mine since the AOL breach. The total volume is far less, but unlike the AOL data, this data hasn’t been anonymized. There are full names, phone numbers, and other identifying information in the mix.

If you read my last post, you know I’ve been looking at some very fishy survey data from Strategic Vision, LLC. The data seems to stink no matter how anyone looks at it, and mathematicians, statisticians, and programmers have been looking hard and every which way. Instead of throwing yet another heavy mathematical brick at the poor numbers, let’s see how they stand up to a feather.

You don’t have to read many poll or survey results to be familiar with the phrase “totals may not equal 100% because of rounding.”

So guess what? The numbers in Strategic Vision’s results all add to exactly 100. Oops. That’s a huge red flag. Huge, like big enough to wrap a planet in.

OK, I didn’t check all their polls. Just the most recent 73, which is how many I checked before stopping. For the record, I didn’t stop because poll #74 added to 99 or 101. That would have been statistical misconduct on my part. I didn’t look at poll #74 or any others, because I wanted to write this up. (If anyone checks further, let me know and I’ll post an update.)

If Strategic Vision were not lying (or being extremely sloppy in a systematic way, which is their only hope of explaining this—see below), the chance that the 73 most recent polls would all add to 100 is less than 1 in 2,000. (That assumes intentional rounding to minimize 99s and 101s. For any predetermined rounding rule, however, we’re talking 1 in 10,000,000 or worse. Maybe a real pollster can fill me in on what’s industry practice among those who aren’t lying.)
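The arithmetic behind the 1-in-2,000 figure is one line. Assume (my rough ceiling, not an industry number) that even with rounding chosen adversarially to avoid 99s and 101s, a single honest poll sums to exactly 100 with probability at most 0.9; then 73 independent polls compound it:

```python
# Assumed per-poll chance of summing to exactly 100, even with
# rounding chosen to dodge 99s and 101s (my estimate, not data).
p_100 = 0.9

print(p_100 ** 73)  # ~0.00046, i.e. worse than 1 in 2,000
```

Any smaller per-poll probability only makes the streak more damning.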

One-in-two-thousand-ish stuff happens all the time, but believe me, I didn’t find this on a data dredging excursion. I noticed that the first few poll results I saw added to exactly 100, I formulated a plausible hypothesis based on all the evidence at hand, I carried out an experiment, and I calculated the p-value.* Small enough to be incriminating in my home court, and I tend to be a benefit-of-the-doubt guy.

Just two more things. First, the possibility of systematic sloppiness:

The consistent adding-to-100 could be the result of a systematic error, as opposed to cheating. Of the possible excuses, this is the one I suggest SV choose if they decide not to come clean. Logically it can’t be distinguished from lying, and they can attribute it to a whipping boy like the web site designer. (This excuse doesn’t defend against the mountain of mathematical bricks I mentioned earlier, however.) They can say that they made a regrettable decision: to avoid the appearance of error, they calculated one percentage in each survey from the other percentages rather than from the survey results. I won’t believe it for a minute (though if they show me their programs or confirm that some commercial product makes this error, I’ll reconsider), but it might get them out of hot water.

And second and last, an example and a bit of the math behind my calculations:

Most survey results are short lists of whole number percentages that express fractions to the nearest whole percentage point. Suppose 600 likely voters were polled in a tight race between Tintin and Dora the Explorer. Tintin had the support of 287 people, Dora was close behind with 286, and the rest of the 600 people surveyed (that would be 27 of them) said they weren’t sure. To the nearest whole percentages, that’s 48% for Tintin, 48% for Dora, and 5% undecided. The sum of the rounded percentages is 101%, and that’s due to honest mathematics, not fraud.
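The example’s arithmetic, spelled out with half-up rounding (an assumption about the rounding rule; note that Python’s built-in round() rounds halves to even, so I avoid it here):

```python
import math

def pct(count, n):
    """Percentage rounded half-up to the nearest whole point."""
    return math.floor(100 * count / n + 0.5)

n = 600
tintin, dora, undecided = 287, 286, 27
parts = [pct(tintin, n), pct(dora, n), pct(undecided, n)]
print(parts, sum(parts))  # [48, 48, 5] 101
```

Honest counts, honest rounding, and the total still lands on 101.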

Let me skip some really fun mathematics and tell you that for survey questions that have three answers, the percentages add to 100 most, but by no means all, of the time. Exactly how often they don’t add to 100 depends on several factors, only two of which matter much in this case: the number of people surveyed, and how numbers ending in .5 are rounded to whole numbers. Strategic Vision, LLC’s usual survey sample size is 800, and even if ending-in-.5 numbers are rounded differently in each poll to avoid a sum of 99 or 101 when possible, three-choice result percentages should still add to 99 or 101 at least one time out of ten.

* Not to be taken as evidence of a non-Bayesian persuasion on my part. The frequentist approach seemed to me pretty straightforward and justifiable here, that’s all.

Today’s clicking (especially from fivethirtyeight.com) led me to two strikingly similar declamatory reports about high school students’ knowledge of civics, complete with chart-laden survey results.

“Arizona schools are failing at [a] core academic mission,” concludes this Goldwater Institute policy brief.

“Oklahoma schools are failing at a core academic mission,” announces this Oklahoma Council of Public Affairs article.

When asked to name the first president of the United States, only 26.5% of the Arizona high school students surveyed answered correctly. Only 49.6% could correctly name the two major political parties in the United States. An even smaller percentage of Oklahoma high school students gave correct answers to these and other questions from the U.S. citizenship test study guide. None of the thousands of students surveyed in either state answered all ten questions correctly.

The shocking thing is that these are garbage studies. Made-up numbers, probably. The acme of vulpigeration. Evil. Makes me sick. (Glad I coined the word, though.)

No way these are real studies. Danny Tarlow over at This Number Crunching Life has taken a mathematical hammer to the Oklahoma “study” quite effectively. (The blatant similarity of the Arizona “study” blows away any shred of possibility that the Oklahoma study is legit. I’d love to see Danny’s face when he sees the Arizona report.)

What’s frightening is that this kind of snake oil has far too good a chance of surviving as fact (which it isn’t) and influencing public policy.

The guilty parties? The Goldwater Institute, which as you might guess is a conservative “think” tank. The OCPA, which describes itself as “the flagship of the conservative movement in Oklahoma.” Matthew Ladner, the author of both reports, who is vice president of research for the Goldwater Institute. And last but not least, Strategic Vision, LLC, which Ladner says “conducted” the studies. In my opinion, the word is concocted. Read about them yourself.

[Updated with correct business name: Strategic Vision, LLC.]

Need to run a one-sample t-test or z-test? Here’s a little calculator written in Excel to help you out.

HypTest
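If you’d rather script than spreadsheet, the z-test half is a few lines of Python using only the standard library (a sketch; the numbers in the usage line are made up, and the t version would need a t distribution the stdlib doesn’t provide):

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n):
    """Two-tailed one-sample z-test; sigma is the known population SD."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

z, p = one_sample_z(xbar=103, mu0=100, sigma=15, n=100)
print(z, p)  # z = 2.0, p ~ 0.0455
```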

Need to calculate z-scores, percentiles, or scores based on a normal distribution? Here’s a little calculator for that, too.

Normal
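The scripted equivalent of this one fits in a few lines as well, via Python’s statistics.NormalDist (the IQ-style distribution below is just an assumed example):

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # assumed example distribution

z = (130 - iq.mean) / iq.stdev     # z-score for a raw score of 130
frac = iq.cdf(130)                 # fraction scoring at or below 130
score = iq.inv_cdf(0.90)           # raw score at the 90th percentile
print(z, frac, score)              # 2.0, ~0.977, ~119.2
```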