The possible existence of heteroscedasticity is a major concern in the application of regression analysis, including the analysis of variance, because the presence of heteroscedasticity can invalidate statistical tests of significance that assume the effect and residual (error) variances are uncorrelated and normally distributed. —Wikipedia

Perhaps I’m overeager to use one of my favorite words, but the more I look at Figure 11 of The Neutrino Preprint, the more I think I see a hint of heteroscedasticity in the residuals. If present, it would support the possibility that the model used for the best fit analysis (a one-parameter family of time-shifted scaled copies of the summed proton waveform) was not appropriate. See my previous post for some background.

Screen shot 2011-09-25 at 22.35.44

The figure above (which is the bottom half of Figure 11) shows the best fit of the complete summed proton waveform (red) vs. the observed neutrino counts (black), summarized using 150 nanosecond bins. For both extractions (left and right), the residuals of the fit (the distances from the red curve to each black dot) appear possibly heteroscedastic in two ways.

First, they seem to be slightly (negatively) correlated with the time scale — positive residuals are more likely towards the beginning of the pulse, negative residuals towards the end. Second, there may be a slight negative correlation of the variance of the residuals with the time scale as well. The residuals seem to become more consistent — vary less in either direction from zero — from left to right. [I didn’t pull out a ruler and calculate any real statistics.]

To be fair, there is little evidence of heteroscedastic residuals in Figure 12 (below), which shows a zoomed-in detail of the beginning and end of each extraction, summarized into 50 nanosecond bins. In all, only about a sixth of the waveform is shown at this resolution. (A data point appears to have been omitted from this figure; between the first two displayed bins in the the second extraction, there should probably be a black point to indicate that zero neutrinos were observed in that 50 ns interval.)

Screen shot 2011-09-25 at 22.43.39

The authors report some tests of robustness; for example, they analyzed daytime and nighttime data separately and found no discrepancy. They also calculated and report a reduced chi-square statistic that indicates a good model fit. They may also have measured the heteroscedasticity of the residuals, but they don’t mention it.

They do say a fair bit about how they obtained the summed proton waveform (the red line) used for the fit, but so far I don’t see any indication that they considered the possibility of a systematic process occurring over the length of each proton pulse that caused the ratio of protons to observed neutrinos to vary.

Then again, I don’t understand every sentence in the paper that might be relevant, such as this one: “The way the PDF [the probability density functions for the proton waveform] are built automatically accounts for the beam conditions corresponding to the neutrino interactions detected by OPERA.” And I’m not a physicist or a statistician.