Steve Kass

Scoops

Archived Posts from this Category

25 Sep 2011 22:46

Heteroscedasticity in the Residuals?

The possible existence of heteroscedasticity is a major concern in the application of regression analysis, including the analysis of variance, because the presence of heteroscedasticity can invalidate statistical tests of significance that assume the effect and residual (error) variances are uncorrelated and normally distributed. —Wikipedia

Perhaps I’m overeager to use one of my favorite words, but the more I look at Figure 11 of The Neutrino Preprint, the more I think I see a hint of heteroscedasticity in the residuals. If present, it would support the possibility that the model used for the best fit analysis (a one-parameter family of time-shifted scaled copies of the summed proton waveform) was not appropriate. See my previous post for some background.

The figure above (which is the bottom half of Figure 11) shows the best fit of the complete summed proton waveform (red) vs. the observed neutrino counts (black), summarized using 150 nanosecond bins. For both extractions (left and right), the residuals of the fit (the distances from the red curve to each black dot) appear possibly heteroscedastic in two ways.

First, they seem to be slightly (negatively) correlated with the time scale — positive residuals are more likely towards the beginning of the pulse, negative residuals towards the end. Second, there may be a slight negative correlation of the variance of the residuals with the time scale as well. The residuals seem to become more consistent — vary less in either direction from zero — from left to right. [I didn’t pull out a ruler and calculate any real statistics.]
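For anyone who does want to pull out a ruler, here is a rough sketch of the two checks described above. It assumes the residuals are available as plain numbers, one per bin. The function names and the made-up residuals are mine, not anything from the preprint, and correlating the absolute residuals with time is only a crude stand-in for a formal heteroscedasticity test.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residual_checks(times, residuals):
    """Return (trend in the residuals, trend in their absolute size) vs. time."""
    trend = pearson(times, residuals)                            # drift in the mean
    spread_trend = pearson(times, [abs(r) for r in residuals])   # drift in the scatter
    return trend, spread_trend

# Made-up residuals for 70 bins of 150 ns whose scatter shrinks over the pulse.
# These are NOT the OPERA residuals, which are not tabulated in the preprint.
rng = random.Random(0)
times = [150.0 * i for i in range(70)]
residuals = [rng.gauss(0.0, 2.0 - t / times[-1]) for t in times]

trend, spread_trend = residual_checks(times, residuals)
print(f"residual vs. time correlation:   {trend:+.2f}")
print(f"|residual| vs. time correlation: {spread_trend:+.2f}")
```

A clearly negative second number would correspond to residuals that "become more consistent" from left to right; a clearly negative first number would correspond to positive residuals early and negative ones late.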

To be fair, there is little evidence of heteroscedastic residuals in Figure 12 (below), which shows a zoomed-in detail of the beginning and end of each extraction, summarized into 50 nanosecond bins. In all, only about a sixth of the waveform is shown at this resolution. (A data point appears to have been omitted from this figure; between the first two displayed bins in the second extraction, there should probably be a black point to indicate that zero neutrinos were observed in that 50 ns interval.)

The authors report some tests of robustness; for example, they analyzed daytime and nighttime data separately and found no discrepancy. They also calculated and report a reduced chi-square statistic that indicates a good model fit. They may also have measured the heteroscedasticity of the residuals, but they don’t mention it.

They do say a fair bit about how they obtained the summed proton waveform (the red line) used for the fit, but so far I don’t see any indication that they considered the possibility of a systematic process occurring over the length of each proton pulse that caused the ratio of protons to observed neutrinos to vary.

Then again, I don’t understand every sentence in the paper that might be relevant, such as this one: “The way the PDF [the probability density functions for the proton waveform] are built automatically accounts for the beam conditions corresponding to the neutrino interactions detected by OPERA.” And I’m not a physicist or a statistician.

One Response to “Heteroscedasticity in the Residuals?”

Eric Jones Says:
November 18th, 2011 at 12:01 pm
Here’s an update on the original results, http://news.sciencemag.org/scienceinsider/2011/11/faster-than-light-neutrinos-opera.html, which appears to rule out the statistical argument (which I really liked).

24 Sep 2011 16:10

My $0.02 on the FTL Neutrino Thing

Posted by Steve under Science, Scoops, Statistics
[7] Comments

[I’ve posted a follow-up here: Heteroscedasticity in the Residuals?]

When applying statistics to find a “best fit” between your observation and reality, always ask yourself “best among what?”

The CERN result about faster-than-light neutrinos is based on a best fit. If the authors were too restrictive in their meaning of “among what,” they might have missed figuring out what really happened. And what might have really happened was that the neutrinos they detected had not traveled faster than light.

The data for this experiment was, as usual, a bunch of numbers. These numbers were precisely-measured (by portable atomic clocks and other very cool techniques) arrival times of neutrinos at a detector. The neutrinos were created by shooting a beam of protons into a long tube of graphite. This produced neutrinos, some of which were subsequently observed by a detector hundreds of miles away.

Over the course of a few years, the folks at CERN shot a total of about 100,000,000,000,000,000,000 protons into the tube; they observed about 15,000 neutrinos. The protons were fired in pulses, each pulse lasting about 10 microseconds.

A careful statistical analysis of the data, the authors report, indicates that the neutrinos traveled about 0.0025% faster than the speed of light. Whooooooosh! Furthermore, because the experiment looked at a lot of neutrinos and the results were consistent, the experiment indicates that in all likelihood the true speed of the neutrinos was very close to 0.0025% faster than the speed of light, and almost without doubt at least somewhat faster than light.
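To put those numbers in perspective, here is a back-of-the-envelope calculation. The 730 km baseline is my assumption (the post only says "hundreds of miles"); the proton and neutrino counts and the 0.0025% figure are the ones quoted above.

```python
C_M_PER_S = 299_792_458.0      # speed of light in m/s (exact by definition)
BASELINE_M = 730_000.0         # ASSUMED source-to-detector distance, roughly 730 km
SPEED_EXCESS = 0.0025 / 100    # "0.0025% faster" as a fraction

# Travel time at light speed versus at the slightly higher reported speed.
light_time_s = BASELINE_M / C_M_PER_S
neutrino_time_s = BASELINE_M / (C_M_PER_S * (1.0 + SPEED_EXCESS))
early_ns = (light_time_s - neutrino_time_s) * 1e9

PROTONS = 1e20                 # about 100,000,000,000,000,000,000 protons
NEUTRINOS = 15_000             # about 15,000 observed neutrinos
protons_per_neutrino = PROTONS / NEUTRINOS

print(f"light travel time: {light_time_s * 1e3:.3f} ms")
print(f"early arrival:     {early_ns:.0f} ns")
print(f"protons per observed neutrino: {protons_per_neutrino:.1e}")
```

Under that assumed baseline the whole effect amounts to an early arrival of roughly 60 nanoseconds on a trip of about 2.4 milliseconds, inferred from about one detected neutrino per several million billion protons.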

If the experimental design and statistical analysis are correct (and the authors are aware they might not be, though they worked hard to make them correct), this is one of the great experiments of all time.

So far, I haven’t read much scrutiny of the statistical analysis pertaining to the question of “among what?” But Jon Butterworth of The Guardian raised one issue, and I have a similar one.

Look at the graph below, from the preprint.

The statistical analysis of the data was designed to measure how far to slide the red curve (the summed proton waveform) left or right so that the black data points (the neutrino observation data) fit it most closely.

The experiment didn’t detect individual neutrinos at the beginning of the trip. The neutrinos were produced by 10-microsecond proton bursts, and neutrinos were expected to appear in 10-microsecond bursts at the other end. The time between the bursts, then, should indicate how fast the individual neutrinos traveled.

To get the time between the bursts, slide the graphs back and forth until they align as closely as they can, and then compare the (atomic) clock times at the beginnings and ends of the bursts.

For this to give the right travel time, and more importantly, to be able to evaluate the statistical uncertainty, the researchers appear to have assumed that the shape of the proton burst upstream of the graphite rod exactly matched the shape of the neutrino burst at the detector (once adjusted for the fact that the detector sees about one neutrino for each 10 million billion or so protons in the initial burst).

Why should the shapes match exactly? If God jiggled the detector right when the neutrinos arrived, for example, the shapes might not match. More scientifically plausibly, though, at least to this somewhat-naïve-about-particle-physics mathematician, what if the protons at the beginning of the burst were more likely to create detectable neutrinos than those at the end of the burst? Maybe the graphite changes properties slightly during the burst. [Update: It does, but whether that might affect the result, I don’t know.] Or maybe the protons are less energetic at the end of the bursts because there’s more proton traffic.

The authors don’t tell us why they assume the shapes match exactly. There might be good theory and previous experimental results to support the assumption, but if so, it’s not mentioned in the paper. The authors do remark that a given “neutrino detected by OPERA” might have been produced by “any proton in the 10.5 microsecond extraction time.” But they don’t say “equally likely by any proton.”

If protons generated early in the burst were slightly more likely to yield detectable neutrinos, then the data points at the left of the figure should be scaled down and those at the right scaled up, if the observational data is expected to indicate the actual proton count across the burst.

If that’s the case, then the adjusted data might not have to be shifted quite so far to best match the red curve. And the calculated speed would be different.
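Here is a toy version of that argument, offered as a sketch rather than a reconstruction of the OPERA analysis. Everything in it is an assumption of mine: the smooth-edged pulse shape, the 200 ns edge softness, the linear "tilt" in neutrino yield across the pulse, and the use of noise-free expected counts in 150 ns bins. The true shift is zero by construction, so any nonzero best-fit shift is pure bias from fitting an untilted curve to tilted data.

```python
import math

WIDTH_NS = 10_500.0   # pulse length, about 10.5 microseconds
EDGE_NS = 200.0       # ASSUMED softness of the pulse edges (made up)
BIN_NS = 150.0        # bin width used in Figure 11
CENTERS = [BIN_NS * i - 2_000.0 for i in range(98)]   # bin centers in ns

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def waveform(t_ns):
    """Hypothetical smooth-edged pulse; NOT the real summed proton waveform."""
    return sigmoid(t_ns / EDGE_NS) * sigmoid((WIDTH_NS - t_ns) / EDGE_NS)

def efficiency(t_ns, tilt):
    """Relative neutrino yield: 1 + tilt/2 at pulse start, 1 - tilt/2 at the end."""
    u = min(1.0, max(0.0, t_ns / WIDTH_NS))
    return 1.0 + tilt * (0.5 - u)

def best_shift(counts):
    """Scan whole-ns shifts of the untilted waveform; return the most likely one."""
    total = sum(counts)

    def log_like(shift):
        shape = [waveform(t - shift) for t in CENTERS]
        scale = total / sum(shape)   # match total expected counts to observed
        # Poisson log-likelihood; the sum of expected counts is fixed, so dropped.
        return sum(d * math.log(scale * f) for d, f in zip(counts, shape))

    return max(range(-150, 151), key=log_like)

def shift_bias(tilt):
    """Best-fit shift (ns) when the data are tilted but the true shift is 0."""
    counts = [100.0 * waveform(t) * efficiency(t, tilt) for t in CENTERS]
    return best_shift(counts)

for tilt in (0.0, 0.1, 0.2):
    print(f"tilt {tilt:.1f}: best-fit shift {shift_bias(tilt):+d} ns")
```

In this toy setup the untilted fit lands at zero, and a yield that favors early protons pulls the best-fit shift earlier, which would read as a faster neutrino. The size of the pull depends heavily on the made-up edge softness, so it says nothing about whether the real effect could reach 60 ns.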

Whether this would make enough of a difference to bring the speed below light-speed, I don’t know and can’t guess from what’s in the preprint. And of course, there may be good reasons for same-shape bursts to be a sound assumption.

[Disclaimer: I’m a mathematician, not a statistician or a physicist.]

7 Responses to “My $0.02 on the FTL Neutrino Thing”

Joe Says:
September 27th, 2011 at 7:25 am
You kind of shoot yourself in the leg with your speculations.

You go on about how uncertainty about the neutrino creation process could have distorted the resulting measurements.

But if you look at the graph you posted it seems clear that there are multiple peaks within the graph that are shifted by exactly the same amount as the whole graph.

The red line is a computer prediction based on neutrinos traveling -at- the speed of light.
Notice that the shape of the red graph pretty much has exactly the same shape as the data points, just shifted.
This means that the simulation used for the prediction has a very precise understanding of the neutrino generation process and what the resulting measurement amplitude series will be.
The only discrepancy is the detection time.

If what you say were true then the arriving data points would have had distorted rise and fall but would otherwise have its peaks match the predicted graph to at least fall on the speed of light instead of faster than the speed of light.

So based on that graph I think you are thinking in the wrong direction to find the flaw (if there is one).
Steve Says:
September 27th, 2011 at 9:35 am
You’ve missed my point.

“Pretty much exactly the same shape” is not a statistical or mathematical statement. The data (black points) do not fit the red curve exactly when shifted. They come close, and among all possible horizontal shifts, 1048.5 ns gives the closest fit. But the six-sigma statistical claim assumes that the distribution from which the black data points were a random sample is a copy of the shifted red line and not any similar but different shape.

This assumption is not addressed in the paper. The shifted red line used for the statistics is the shape of the proton waveform hundreds of miles upstream of the detector at Gran Sasso. The data is not a random sample of protons from that waveform. The data is a sample (presumably random) of neutrinos hundreds of miles away, produced from the precisely-understood waveform of protons by several intermediate processes (including pion/kaon production when the proton beam strikes the graphite target and subsequent decay of the particles produced at the target into neutrinos later on). The arrival waveform clearly has a similar shape, but the authors give no theoretical or statistical evidence to suggest it must have an identical shape.

If the intermediate processes systematically change the shape of the proton waveform even slightly (as it becomes a pion/kaon waveform and then a neutrino waveform), the statistics reported are not valid.

In addition, the data in the paper is only a summary of the actual data into bins (150 ns wide for Figure 11, and 50 ns wide for Figure 12). The experimental result has the neutrinos arriving only about 60 ns earlier than light-speed travel would predict, so it’s impossible to “notice” the best fit to such high precision only from the paper’s graphs. In Figure 11, where the multiple peaks are visible, “exactly the same amount” can’t be determined to 60 ns accuracy. Even if the black data points, when shifted by 1048.5 ns, all lay exactly on the red line (and they do not at all), one cannot conclude that the actual data (not given in the paper, which summarizes it into bins) fits just as perfectly.
Philip Meadowcroft Says:
September 28th, 2011 at 4:23 am
Is the same true at the detection end? That is, does the first detection in any way compromise the likelihood of another detection in the same burst?

May be insignificant due to the low number detected per burst.
Gareth Williams Says:
September 28th, 2011 at 4:49 am
OK, so add an extra parameter. Scale the red line from 1 at the leading edge to a fraction k at the trailing edge (to crudely model the hypothesis that the later protons, for whatever unknown reason, are less efficient at producing detectable neutrinos), and find what combination of translation and k produces the best fit.

If there is no such effect we should get the same speed as before and k=1. But if we get speed = c and k = 0.998 (say) then we have an indication where the problem is.

It would be interesting in any case to just try a few different constant values of k and see how sensitive the result is to that.

(It also occurs to me that k could arise from a problem with the proton detector, if the sensitivity changes very slightly from the beginning to the end of the pulse you would get the same effect).
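A minimal sketch of that two-parameter fit, run on invented data since the real arrival times are not public. The pulse shape, the 200 ns edge softness, the grids, and the "true" value k = 0.8 are all assumptions for illustration; the data are noise-free expected counts in 150 ns bins with a true shift of zero.

```python
import math

WIDTH_NS = 10_500.0   # pulse length, about 10.5 microseconds
EDGE_NS = 200.0       # ASSUMED softness of the pulse edges (made up)
BIN_NS = 150.0        # bin width used in Figure 11
CENTERS = [BIN_NS * i - 2_000.0 for i in range(98)]   # bin centers in ns

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def model_shape(t_ns, k):
    """Hypothetical pulse whose height falls linearly from 1 to k across it."""
    u = min(1.0, max(0.0, t_ns / WIDTH_NS))
    pulse = sigmoid(t_ns / EDGE_NS) * sigmoid((WIDTH_NS - t_ns) / EDGE_NS)
    return pulse * (1.0 - (1.0 - k) * u)

def log_like(counts, shift, k):
    """Poisson log-likelihood (up to a constant) with the scale profiled out."""
    shape = [model_shape(t - shift, k) for t in CENTERS]
    scale = sum(counts) / sum(shape)   # match total expected counts to observed
    return sum(d * math.log(scale * f) for d, f in zip(counts, shape))

def fit(counts, k_values):
    """Grid search over shift (ns, in steps of 5) and k; return the best pair."""
    grid = [(s, k) for s in range(-100, 101, 5) for k in k_values]
    return max(grid, key=lambda p: log_like(counts, p[0], p[1]))

K_GRID = [kp / 100 for kp in range(60, 101, 5)]   # 0.60, 0.65, ..., 1.00
TRUE_K = 80 / 100                                 # made-up value for the demo
counts = [100.0 * model_shape(t, TRUE_K) for t in CENTERS]

shift_only = fit(counts, [1.0])   # the one-parameter fit, k pinned at 1
joint = fit(counts, K_GRID)       # translation and k fitted together
print("shift-only fit (shift ns, k):", shift_only)
print("joint fit      (shift ns, k):", joint)
```

With real data one would want a proper optimizer and uncertainty estimates rather than a coarse grid, but the comparison is the point: if the joint fit prefers k below 1 and a smaller shift than the shift-only fit, that would be the indication described above.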

This does not look too hard. I would do it myself but I am busy today [/bluff]
Steve Says:
September 28th, 2011 at 7:50 am
Philip: I think there was a similar question at the news conference given by OPERA, and it was answered to the satisfaction of the person who asked.

Gareth: Yes, absolutely. If the complete neutrino arrival data is posted, I might try this. But I would be happy to see you do it for me!
Gareth Williams Says:
October 27th, 2011 at 10:43 am
What you said, I think:

http://arxiv.org/PS_cache/arxiv/pdf/1110/1110.5275v1.pdf

26 Aug 2011 19:41

The Cereal Comma

Posted by Steve under Food, Friends, Scoops

I’m a serial comma guy, and so is my good friend Andy. Unfortunately for Andy, serial commas are verboten at his workplace, and this requires him “to violate a fundamental law of that which is right and good.” (I might have said “right, good, and just.”)

Hoping to assuage his hardship, I whipped up a batch of cereal commas for him as a birthday gift. He’ll have to decide whether or not he can risk sneaking some into work.

Shown: eight cereal commas in various sizes. Four were made with Rice Krispies and Fruity Pebbles, and four were made with Rice Krispies, Cocoa Krispies, and Alpha Bits. Also shown are two pieces of the Ateco Plain Comma Cutter Set with which they were cut [full set below].

Please note that the Ateco cutters are backwards. Instead of cutting comma shapes, they cut reversed comma shapes. Although their rolled edges prevented me from using them upside-down without injury, it was not difficult to turn the treats over after cutting. The treat at center left in the photo is unturned.

26 Jan 2011 19:53

Poodles! It’s Poodles!

Posted by Steve under News, Scoops

Janet Napolitano isn’t making the official announcement until tomorrow, but this is 2011, folks. There are no secrets any more.

DHS to end color-coded terror alert system.

It will be called the National Terror Advisory System. DHS Secretary Janet Napolitano will officially make the announcement tomorrow at a “State of America’s Homeland Security” speech at George Washington University.

Brilliant, Janet. Brilliant!
