## Updated Higgs search from D0: WW final state, 1.7/fb December 2, 2007

Posted by dorigo in news, physics, science.

Just a few days ago D0 made available on their web server a new conference note describing their analysis of 1.7 inverse femtobarns of proton-antiproton collisions, collected since 2002 at the Fermilab Tevatron collider.

The analysis is an improvement over the former result, which was based on 1.1/fb. The additional 0.6/fb of data blended in comes only from the dimuon final state, however. Taken at face value, this would suggest only a minor improvement in the results, since the $\mu^+ \mu^-$ channel accounts for only 1/81 of possible WW final states, as opposed to the 3/81 of the other topologies included in the 1.1/fb analysis (1/81 from $e^+e^-$ and 2/81 from $e^+ \mu^-$, $e^- \mu^+$): all in all, the increase would amount to 0.6/(1.1*4)=13%, with an expected improvement in the Higgs limit equal to $\sqrt{1.13}$, or about 6%.
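
The naive scaling above can be spelled out explicitly (a sketch of the same arithmetic, using the branching fractions quoted above):

```python
import math

# Leptonic branching fractions of the WW final state, in units of 1/81:
# ee = 1, e-mu = 2, mu-mu = 1, so the 1.1/fb sample covers 4/81 in total.
f_mumu = 1.0
f_total = 4.0

lumi_old = 1.1    # /fb, all channels
lumi_extra = 0.6  # /fb, dimuon channel only

# Effective fractional increase of the signal-weighted dataset:
increase = (lumi_extra * f_mumu) / (lumi_old * f_total)

# A limit scales roughly with the square root of the luminosity:
improvement = math.sqrt(1.0 + increase) - 1.0

print(f"dataset increase:  {increase:.1%}")     # 13.6%
print(f"limit improvement: {improvement:.1%}")  # 6.6%
```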

That back-of-the-envelope estimate is wrong for two different reasons. One, muons are detected with much larger acceptance by the D0 detector, which makes their weight larger in the sum computed above. Two, the estimate neglects the continuous improvement of tools made possible by larger datasets and more time to think. The latter has always been the Tevatron experiments’ secret weapon, as is shown by the incredible precision they have recently been obtaining on the top quark mass (now known with 1% accuracy – a precision that just six years ago was thought impossible to reach even with twice the data available today), a result entirely due to a continuous refining of complex techniques.

But there is actually a bonus reason to be interested in this new result: after obtaining a limit from the extended data, D0 includes the recently analyzed samples that were missing from the picture, obtaining their full 1.7/fb result for the WW final state!

So let us have a very brief look at the analysis. It consists of a selection of dilepton events with missing transverse energy – the signature of two leptonic W boson decays – which is optimized as a function of the Higgs mass. Great care is taken in defining the selection cuts, since their best values depend on $M_h$: for instance, at low Higgs boson mass (below 160 GeV, which is the threshold for resonant WW production) special care has to be taken to maintain high efficiency. Then a neural network classifier – a “multi-layer perceptron”, or MLP – is used to discriminate the signal from the sum of competing backgrounds, the most important ones being continuum WW production and Drell-Yan production of lepton pairs.
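
A multi-layer perceptron is, at its core, just a stack of weighted sums passed through sigmoid activations. The sketch below (pure Python, with invented weights – nothing to do with D0’s actual trained network or input variables) shows how a one-hidden-layer MLP maps a few kinematic inputs to a single discriminant value between 0 and 1:

```python
import math

def mlp_output(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer perceptron.

    x is a list of (normalized) kinematic inputs; the returned value
    in (0, 1) plays the role of the discriminant."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(wo * h for wo, h in zip(w_out, hidden)) + b_out)

# Two inputs, two hidden nodes, one output; all weights invented:
w_hidden = [[1.5, -0.8], [-1.0, 2.0]]
b_hidden = [0.1, -0.2]
w_out = [2.0, -1.5]
b_out = 0.0

print(mlp_output([0.7, 0.3], w_hidden, b_hidden, w_out, b_out))
```

In a real analysis the weights come from training on simulated signal and background samples; here they are arbitrary, so the output value itself is meaningless.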

The output of the MLP is studied in data before the application of all selection cuts, in order to verify the understanding of the mixture of processes. In the figures below you can see that D0 does an excellent job indeed. Left to right, top to bottom: dielectron final state; electron-muon final state; dimuon final state, Run IIa 1.1/fb; dimuon final state, Run IIb 0.6/fb (the latest data). In all four plots, the Higgs is the irrelevant empty histogram at the bottom.

At the end of the game, the distribution of the neural network output for the final candidates is interpreted with the CLs method, using the information in each bin rather than the integral of the distribution: this obviously improves the accuracy and the sensitivity. Below you can find the NN distributions and the numbers of events expected by D0 for different Higgs boson masses (remember, they correspond to different selections!).
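
A quick way to see why the per-bin treatment beats a simple counting experiment is the standard $s/\sqrt{b}$ approximation, in which the per-bin significances add in quadrature. The numbers below are invented for illustration (a crude stand-in for the full CLs machinery, not D0’s calculation):

```python
import math

# Invented NN-output histogram: expected signal and background per bin,
# with the signal concentrated at high values of the discriminant.
signal     = [0.02, 0.05, 0.10, 0.30, 0.60]
background = [8.0,  5.0,  3.0,  1.5,  0.5]

# Counting experiment: integrate the whole distribution into one number.
z_counting = sum(signal) / math.sqrt(sum(background))

# Shape analysis: combine the per-bin s/sqrt(b) in quadrature.
z_binned = math.sqrt(sum(s * s / b for s, b in zip(signal, background)))

print(f"counting experiment: Z = {z_counting:.2f}")
print(f"per-bin combination: Z = {z_binned:.2f}")
```

The binned combination always comes out at least as large as the counting one, because the high-purity bins are no longer diluted by the background-dominated ones.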

In the table, for three reference Higgs mass values, the numbers of events expected from SM Higgs production and from backgrounds are shown in the second and third lines. The data appear to undershoot the backgrounds a little, highlighting the absence of a Higgs contribution (but I warn you: if the Higgs is there, the probability that as many as two 160 GeV Higgs decays lie among those 10 events is still a fat 2%).

In the end, a 95% confidence level limit is obtained on the cross section as a function of the Higgs mass, and from it a “times SM ratio” limit plot is extracted. The one shown below combines results from dielectron and electron-muon final states, so it is a genuine 1.7/fb result. In it, the curve shows the upper limit on the ratio of the Higgs production rate over SM predictions. When the black curve finally crosses the line at unity, we will start to have a real exclusion of Higgs mass values. We are getting close!

After seeing this plot, which reaches a x2.4 SM value at 160 GeV, I am starting to be very curious to see the combination with CDF results (which stood at x1.9 SM at 160 GeV already last August). I think we will have to wait for winter conferences to get that plot, but I smell a x1.1 limit at 160 GeV: not yet any mass exclusion for winter 2008 (for the latest combination, yielding x1.4 SM, see here).

1. Andrea Giammanco - December 2, 2007

I don’t like those weird spikes in the NN output (bottom-right plot of the first figure).

2. Thomas Larsson - December 3, 2007

The observed limit is far larger than the expected one at 120 GeV. Does this mean anything?

3. dorigo - December 3, 2007

Hi Andrea,

yes, the NN they use seems to have some peculiar behavior, which I think is under control anyway. Its output is not the usual distribution from 0 to 1 with a spike at 0 for background and a spike at 1 for signal, but it seems to be doing a reasonable job anyway… The spikes are due, according to the paper, to “regions of phase space which are cut out for the training by the use of kinematical pre-cuts”. In fact, they have different NN configurations (different inputs and optimizations) as the cut level and test Higgs mass vary….

Hi Thomas,

yes, the discrepancy with expectations is due to the fact that at 120 GeV they observe 31 events while they expect only 20.8+-1.7 (with an expectation of 0.32 events from H->WW for Mh=120). If I do the simple math of extracting a cross section from those numbers, I get (10.2+-5.5)/0.32=(32+-17)xSM, which means that a 95% CL limit stands at about x60 SM (I’m doing these computations in my head, so bear with me for inaccuracy). Larger, but not too different from what they get by taking into account the actual distribution of the NN output, after folding in systematics (I read x47 SM in the plot). That means their binning in NN output does help, as it should (the NN output is still a discriminant after all cuts, as is evident in the plots).
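
Redoing that simple math explicitly (a sketch only: the Gaussian approximation and the 1.64 factor for a one-sided 95% CL are shortcuts, and the error treatment differs a little from the numbers quoted from memory above):

```python
import math

observed = 31
expected_bkg, bkg_err = 20.8, 1.7
sm_higgs = 0.32  # expected H->WW events for Mh = 120 GeV

excess = observed - expected_bkg           # 10.2 events
err = math.sqrt(observed + bkg_err ** 2)   # Poisson + background uncertainty

ratio = excess / sm_higgs                  # roughly 32 x SM
ratio_err = err / sm_higgs                 # roughly 18 x SM

# One-sided 95% CL upper limit in the Gaussian approximation:
limit_95 = ratio + 1.64 * ratio_err
print(f"cross section: ({ratio:.0f} +- {ratio_err:.0f}) x SM")
print(f"95% CL upper limit: ~{limit_95:.0f} x SM")
```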

Cheers,
T.

4. Paolo - December 3, 2007

These NNs… Anyway, an observation I already wanted to post some time ago: for practitioners of statistics, saying “NN” is frankly almost devoid of meaning, similar to saying “some multivariate non-linear method”. If we really want to talk about NNs (I would certainly like that!), I think that at a minimum we should report the size of the various data samples, the general topology and number of layers, the number of nodes in each layer, and some generalities about the training technique. Otherwise, really, we have no idea what kind of model we are talking about…

5. Paolo - December 3, 2007

Of course, thanks a lot for your post, Tommaso, that should be obvious 😉

6. dorigo - December 3, 2007

Sure… As for the NN, it goes without saying. These days, NNs are taken a bit for granted. Papers using them in HEP typically give only a reference to some other paper where the overall architecture of the device is described, but of course the level of detail you would like to have is only available in internal documents, which are never disclosed. This is a malpractice for which we all pay the price: indeed, it is much too easy to overtrain a network, or to underestimate some effect that may produce a biased result.

Cheers,
T.

7. Andrea Giammanco - December 3, 2007

I would like to add that this comes from the fact that in our field we tend to just proceed by trial and error in this kind of stuff.
I mean that probably most of the authors of HEP studies involving an NN (*) just took a very general class of networks (supervised feedforward networks in ~100% of the cases, MLPs in ~99% of the cases), trained several nets with different architectures, and then chose the one which by chance gave the best S/B separation within a reasonable training time.

(*) I’m not throwing the first stone, I always did exactly the same. But I know for sure that I’m not alone, and by far not the worst 😉

8. Kea - December 3, 2007

Hee, hee, hee. No fairy fields?

9. dorigo - December 3, 2007

Hi Andrea,

sure, I know you’re one of the best. I think the fact that we give little importance to the details of how NNs work has its roots in the fact that we all spent a large fraction of our lives checking that the results of the first NNs used in HEP were meaningful. I remember a time, in the mid nineties, when it was all about “how many events did you use for the training?”, “What is the estimated uncertainty?”, “How did you evaluate that?”, “How can you be sure?”, and the like. Students working with NNs used to be grilled on skewers.

Now, nobody asks those questions any more – it’s not fun: there are no mistakes to spot – usually!

Cheers,
T.

Kea, no, no fairy fields yet. I start to think we could well end up with nothing in our hands 🙂

Cheers,
T.

10. Markk - December 3, 2007

Funny, after reading the article I was going to make an almost identical post on the use of Neural Networks as Paolo. The way you are using them is kind of magic. Really you are generating feedback-weighted polynomials (if you are using perceptron-based stuff). The weights may or may not have much physical meaning inside, but I would think that the final equations ought to be publicized along with the data. Are they? The fact that you used a neural network feedback model to get the weights is really secondary when you use the final output.

We used to be able to get very interesting (i.e. goofy) results by messing around with the input space of NN models for doing data mining. I always felt there were numerically “stiff” areas all over the “classification spaces”. I am sure that what you describe as learning how to use them was really learning where they were actually applicable.

We used to take the output function weights and try to understand them by varying them, and looking at what happened to the classification, and then messing around with the inputs to give a clue as to how to get the variation we were applying. Is that what you were doing with your models? I would think that this is one of the perfect uses for NN’s as you can kind of nail down the variables and you don’t have the classic “all the pictures with the tank in it were when it was shady” junk.

11. dorigo - December 4, 2007

Hi Markk,
searches in particle physics are about as good as it gets as a case study for applications of neural networks. There is an unlimited amount of data one can use for training; the data can be biased in all conceivable ways to study the NN response in limiting cases; control samples can usually be defined in ways that allow meaningful cross checks; and a vast number of orthogonal or correlated variables can be chosen as inputs in the discrimination of different physical processes.
The equations being publicized… that is too much disclosure. Physicists have a certain attitude in presenting their results: “they know better”, so they just give the final answer; you can study in their papers how it was reached, but you are prevented from doing the same (allegedly because of the impossibility of handling the very complex data without years of training and study of the detector and the related physics).
Despite all that, sure, there can be mistakes… I do not trust too much NN results, but as long as we set limits, i.e. negative results, things do not worry me too much. I doubt we will ever claim a signal of a new particle based on NN analysis alone.

Cheers,
T.

12. Andreas - December 4, 2007

Hi,
Thanks for this post! I enjoyed it and its predecessors. However, I have one question: Is there a reference of some form of a paper for the combined Tevatron limit you mention (1.9 x SM at 160 GeV)?
And why is the combined limit in this paper http://arxiv.org/abs/0710.5127
not as good (see last Figure)?
Grazie,

Andreas

13. Andreas - December 4, 2007

Oops, I meant 1.4 x SM for the combined limit…

14. dorigo - December 4, 2007

Hi Andreas,
no, as far as I know the combined limit (which I talked about here) is only available on the web pages of CDF, see here. Of course, I think there are by now published proceedings mentioning it. I will see if I can dig some out for you.

Cheers,
T.

15. Andreas - December 4, 2007

Thanks, Tommaso! I knew the CDF note 8958 posted on the site you have linked above (which has 2.0 x SM in it) and the “1-1.9 fb-1 CDF Combination” site which gives 1.8 x SM. Besides, I saw the CDF website about 07 conferences ( http://www-cdf.fnal.gov/physics/S07CDFResults.html#HIGGS ) where the Fig. in the category “Updated CDF+D0 SM Higgs Combination” seems to have the 1.4 x SM you mention in it, but I can’t seem to find a proceeding or other note/paper on the details of this Fig. I guess the limit in that Fig. for m_H=160 GeV is dominated by H -> WW, right?
Thanks again,
Andreas

16. island - December 4, 2007

I start to think we could well end up with nothing in our hands

As the noose begins to tighten around the far-extended necks of the “powers that be”.

And there’ll be dancin’
dancin’ in the streets

🙂

17. chris - December 4, 2007

hi Tommaso,

just wanted to say thanks for this excellent post.

18. dorigo - December 5, 2007

You’re very welcome, Christian.

Cheers,
T.

19. Tony Smith - December 5, 2007

Tommaso, about NN details, you said
“… in the mid nineties, when it was all about “how many events did you use for the training?”, “What is the estimated uncertainty?”, “How did you evaluate that ?” “How can you be sure?”, and the like. Students working with NNs used to be grilled on skewers.
Now, nobody asks those questions any more – it’s not fun: there are no mistakes to spot – usually!

The equations being publicized… that is too much disclosure. Physicists have an attitude with the presentation of their results: “they know better”, so they just give the final answer, and you can study how it was reached in their papers, but are prevented from doing the same (allegedly because of the impossibility of handling the very complex data without years of training and studies of the detector and the relative physics). …”.

The current situation has at least two consequences:
1 – any “mistakes” (even if not “usually” existing) will not be found by independent analysis;
2 – some NN will be elevated to Holy Writ, to be followed absolutely no matter how wrong they may be.

The situation is not confined to physics.
See for example a 31 July 2007 web article by Katherine Burton and Richard Teitelbaum at
infoproc.blogspot.com/2007/07/algorithm-wars.html
that said in part:
“… Renaissance … founded by Simons in 1988, is a quantitative manager that uses mathematical and statistical models to buy and sell securities, options, futures, currencies and commodities. It oversees $36.8 billion for clients …[and it]… sued Belopolsky and Volfbeyn in December 2003, accusing them of misappropriating Renaissance’s trade secrets by taking them to another firm, New York-based Millennium Partners …
Volfbeyn said that he was instructed by his superiors to devise a way to “defraud investors trading through the Portfolio System for Institutional Trading, or POSIT,” an electronic order-matching system operated by Investment Technology Group Inc. …
Volfbeyn told superiors at Renaissance that he believed the POSIT strategy violated securities laws and refused to build the algorithm, according to the court document. The project was reassigned to another employee and eventually Renaissance implemented the POSIT strategy, according to the document. …”.

In short, it seems to me that the use of trade secrecy to hide the way hedge funds function is very much like the use of authoritarian arrogance by physicists to hide the way NN code works,
and that in both cases the only real reason for hiding the facts is to protect the powerful.

Tony Smith

20. New D0 Higgs limit combination for 1.7/fb « A Quantum Diaries Survivor - December 18, 2007

[…] by dorigo in internet, news, physics, science. trackback Just a few days ago D0 released their new results from searches of the $H \to WW$ decay process. That result has now been combined with all their other […]

21. The Higgs almost excluded at 160 GeV!! « A Quantum Diaries Survivor - February 3, 2008

[…] let me add to this note a word of self-praise. Two months ago I discussed the D0 limit on Higgs cross sections in the WW production mode, and I ventured to make the following prediction: After seeing this plot, […]
