Bumps part IV – the deflation of significance
February 2, 2007
Posted by dorigo in computers, personal, physics, politics, science.
In the previous part of this post, I started discussing how to perform a statistical study of the mass spectrum obtained by the Frascati group, to determine how unlikely it was that a statistical fluctuation of background events would produce a gaussian bump somewhere in the spectrum, of size equal to or larger than that observed. Now I will describe the whole study in some detail, and draw the necessary conclusions.
So let me first correct some approximations I made in the former posts, where I was quoting from memory: I now have the original CDF note in front of me. The number of events in the histogram (shown again on the left) is 53,242 (not 52,000). And the signal whose significance is being questioned, the bump at 7.2 GeV, is returned by the unbinned likelihood fit to be 249.7 ± 60.9 events (not 200 ± 40 as I wrote previously).
Incidentally, we can compute the ratio between signal size and error, ps = 249.7/60.9 = 4.1 sigma, and give it a name: pseudo-significance. A signal with a real significance of 4.1 sigma is usually quite serious business, since it happens by chance (i.e., due to a statistical fluctuation of the background) only once every twenty-five thousand times. But we said pseudo-significance, not significance…
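For readers who want to check the arithmetic, here is a minimal sketch of the conversion from a number of sigma to a chance probability. It assumes the two-sided gaussian tail convention, which is my assumption: it is the one that lands near the "once every twenty-five thousand" figure, while a one-sided tail would give a probability half as large. The function name is made up for illustration.

```python
import math

def two_sided_p_value(z):
    """Probability that a unit gaussian lands more than |z| away from zero, on either side."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Fit result quoted in the CDF note: 249.7 +- 60.9 signal events
n_signal, n_error = 249.7, 60.9

ps = n_signal / n_error          # pseudo-significance, in units of sigma
p = two_sided_p_value(ps)        # chance probability IF ps were a true significance

print(f"pseudo-significance = {ps:.2f} sigma")
print(f"chance probability  = {p:.2e}, roughly once every {1.0 / p:,.0f} times")
```

The rest of the study is about why that last number cannot be taken at face value.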
The procedure by which I performed my fishing expedition, as I briefly outlined in the former post, is quite simple at first sight:
1. construct a 53,242 event distribution (a pseudo-experiment) by randomly picking mass values from the same functional form (let us call it B(m) in the following) which fits the background shape in the real data, as shown in the figure above. The random picking produces a histogram quite similar to the originating function. The histogram contains no signal by construction, and displays the typical bin-to-bin fluctuations of any similarly sized and distributed sample.
2. fit the distribution with the sum of B(m) and a gaussian signal shape G(m) with a width constrained to be close to 38 MeV, which is the expected resolution of a dimuon resonance.
3. store the fit parameters: goodness-of-fit, size of gaussian signal and error, mass of gaussian signal and error.
4. go back to step 1, and repeat a few thousand times.
5. prepare a martini, and leisurely go back to the computer once the magic mixture has had its effect, hoping that a result appears.
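The first step of the procedure can be sketched in a few lines. This is only an illustration, not the code used for the study: the post does not give the functional form of B(m) or the fitted mass range, so the falling exponential, its slope, the 6.3-9.0 GeV range and the 20 MeV binning below are all hypothetical stand-ins. Only the number of events comes from the CDF note.

```python
import math
import random

# Hypothetical stand-ins: neither B(m) nor the mass range is given in the post.
M_MIN, M_MAX = 6.3, 9.0   # GeV, assumed range
SLOPE = 0.5               # 1/GeV, assumed exponential fall-off of B(m)
N_BINS = 135              # assumed 20 MeV bins
N_EVENTS = 53242          # size of the real data sample (from the CDF note)

def sample_background(rng):
    """Draw one mass from B(m) ~ exp(-SLOPE*m) on [M_MIN, M_MAX] by inverting its CDF."""
    span = 1.0 - math.exp(-SLOPE * (M_MAX - M_MIN))
    return M_MIN - math.log(1.0 - rng.random() * span) / SLOPE

def make_pseudo_experiment(rng):
    """Fill a histogram with N_EVENTS background-only masses: no signal by construction."""
    counts = [0] * N_BINS
    width = (M_MAX - M_MIN) / N_BINS
    for _ in range(N_EVENTS):
        i = int((sample_background(rng) - M_MIN) / width)
        counts[min(i, N_BINS - 1)] += 1
    return counts

def expected_counts():
    """Integral of the normalised B(m) over each bin, times N_EVENTS."""
    span = 1.0 - math.exp(-SLOPE * (M_MAX - M_MIN))
    width = (M_MAX - M_MIN) / N_BINS
    out = []
    for i in range(N_BINS):
        lo, hi = i * width, (i + 1) * width
        out.append(N_EVENTS * (math.exp(-SLOPE * lo) - math.exp(-SLOPE * hi)) / span)
    return out

rng = random.Random(2007)            # fixed seed so the toy is reproducible
counts = make_pseudo_experiment(rng)
expected = expected_counts()

# Bin-by-bin deviation from B(m), in units of the expected Poisson spread
pulls = [(n - mu) / math.sqrt(mu) for n, mu in zip(counts, expected)]
print(f"largest upward bin fluctuation:   {max(pulls):+.2f} sigma")
print(f"largest downward bin fluctuation: {min(pulls):+.2f} sigma")
```

Steps 2 and 3 would then hand `counts` to a fitter with B(m)+G(m) as the model and store what it returns; that part is left out here because it depends on the fitting package used.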
Ok, that is the procedure… However, the devil is in the details, as we will see.
You might have gotten confused by the fact that a distribution known to be generated from B(m) is fit by assuming that it is modeled by a sum B(m)+G(m), which offers three additional degrees of freedom. But that is precisely the point. One wants to see what use the fitter makes of the three additional degrees of freedom (one of which is constrained, as we have already noted). If steps 1, 2, and 3 above are repeated a sufficient number of times, and if step 5 is not repeated too many times, the experimenter can determine how probable it is that a large signal is fit by sheer chance.
The figure above shows the result of a first Toy Monte Carlo run of 1000 pseudoexperiments. No constraint is imposed on the positivity of the number of signal events (plotted on the abscissa) found by the fitter. Does the graph take you aback? Were you expecting to be looking at a gaussian distribution, centered at zero?
The fact that there are practically no pseudoexperiments which return zero fitted signal events is due to the fitter’s ability to make the best use of the two free parameters of the gaussian, in order to obtain a better agreement between the fitting function and the pseudo-data distribution. The gaussian disturbance G(m) is used at the mass value where the largest fluctuation away from the background shape arises. A fluctuation is certain to happen somewhere (actually everywhere), and the fit just picks the largest.
However, fits are usually lazy, and they will not scan the full parameter space (in particular, the gaussian peak position) if they are not forced to do that. So a fit with a random input value for the gaussian peak position will move a little bit, until it finds some fluctuation that benefits from using the gaussian parameters.
The above means we have to modify our procedure a little. And the change comes in handy. In fact, it is easy to realize that we have not yet inserted a tiny but crucial detail in our machinery: the behavior of the experimentalist. If you are examining a spectrum in search of a signal, you will give the fitter as a starting value for the signal mass parameter the point where you seem to see the largest upward fluctuation. If you do that, the fitter will take the suggestion, and most likely will stick to that fluctuation, and forget about negative ones occurring away from that mass value.
The experimentalist’s hand can be simulated by a simple pre-scanning of the pseudo-data distribution, in search of the 80-MeV window which shows the largest upward fluctuation from the originating background function. Giving the fitter the center of the window as a starting value saves it the burden of scanning the full mass range, something it does not really want to do.
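A pre-scan of this kind might look like the sketch below. The function name and the toy numbers are made up for illustration, and ranking windows by excess divided by the square root of the expectation is my assumption: the text only says "largest upward fluctuation", which could also mean the largest raw excess.

```python
import math

def find_largest_excess(counts, expected, m_min, bin_width, window=0.080):
    """Slide a fixed-width mass window (in GeV) over the histogram and return
    (center, pull) of the window with the largest upward fluctuation, where
    pull = (observed - expected) / sqrt(expected)."""
    n_win = max(1, round(window / bin_width))   # bins per window
    best_center, best_pull = None, -math.inf
    for i in range(len(counts) - n_win + 1):
        obs = sum(counts[i:i + n_win])
        exp = sum(expected[i:i + n_win])
        pull = (obs - exp) / math.sqrt(exp)
        if pull > best_pull:
            best_pull = pull
            best_center = m_min + (i + n_win / 2.0) * bin_width
    return best_center, best_pull

# Tiny made-up histogram: flat expectation of 100 events per 20 MeV bin
toy_expected = [100.0] * 10
toy_counts = [100, 95, 103, 98, 101, 112, 115, 109, 111, 97]

center, pull = find_largest_excess(toy_counts, toy_expected, m_min=7.0, bin_width=0.020)
print(f"start the fit at m = {center:.2f} GeV (window excess of {pull:.2f} sigma)")
```

The returned window center is the point handed to the fitter as the starting value of the gaussian mass.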
Inserting the mass value corresponding to that point in the pseudo-experiment fitting procedure produces the results shown on the left.
This time, only positive fluctuations are fit. And they show, on average, about 126 events in the signal, with a tail that dies out for larger signals but still exists up to 250 events.
Even more informative is computing the pseudo-significance of the fitted signal, as shown in the second plot on the left. And we thus learn two things. One is that on average, we have to expect a signal with a pseudo-significance of 2.1 sigma (does that number sound familiar?), just from statistical fluctuations. The other is that a pseudo-significance of 4.1 sigma or larger did happen a few times in the 6032 trials we performed.
The figure above shows the correlation between fitted mass and fitted number of signal events (top left) and pseudo-significance (bottom left). Also shown is a fit of the average of these quantities, showing a mass dependence of the signal size (top right), but no mass dependence of signal pseudo-significance, as it should be (bottom right). And we can finally count six times when a result of at least 4.1 sigma was obtained in the 6032 trials pictured: once in a thousand.
We are now in a position to evaluate the real significance of a 4.1-sigma bump in the dimuon mass spectrum. Not one in 25,000, but one in 1,000, which, translated back into statistical jargon, is s=3.27 sigma.
3.27 sigma. Once in a thousand times. Still, it might really be a signal after all. So let me go back to CDF in 1999, when I found that result and presented it to my oversight committee for a discussion.
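The translation from a counted fraction of pseudo-experiments back to a number of sigma can be sketched as follows. The two-sided convention is again my assumption, chosen because it is the one consistent with the "one in 25,000" figure for 4.1 sigma; the helper name is made up. With the raw count of 6 in 6032 the output lands close to the quoted value, and small differences come from rounding and from the convention chosen.

```python
from statistics import NormalDist

def p_value_to_sigma(p, two_sided=True):
    """Number of gaussian standard deviations corresponding to chance probability p."""
    tail = p / 2.0 if two_sided else p
    return NormalDist().inv_cdf(1.0 - tail)

n_trials = 6032    # pseudo-experiments performed
n_extreme = 6      # those returning a pseudo-significance of at least 4.1 sigma

p_global = n_extreme / n_trials
sigma_global = p_value_to_sigma(p_global)

print(f"fraction of trials with >= 4.1 sigma: {p_global:.2e} (one in {1.0 / p_global:,.0f})")
print(f"equivalent significance: {sigma_global:.2f} sigma (two-sided)")
print(f"same fraction, one-sided: {p_value_to_sigma(p_global, two_sided=False):.2f} sigma")
```

With only six such trials, the fraction itself carries a sizeable statistical uncertainty, so the second decimal place of the result should not be over-read.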
[To be continued… Ok, last time I lied, but this time I am honest: the next one will really be the last part of this long post!]