More speculations on non-175 GeV single top from D0 data
December 13, 2006. Posted by dorigo in internet, physics, science.
Minutes after posting about D0’s recent evidence of single top production (to see it scroll down two posts, too lazy to link it here!) I got a message from Tony Smith (http://www.valdostamuseum.org/hamsmith/) containing many questions on the D0 analysis and their decision tree discriminant. I answered his questions, and then thought they might be of interest to two or three of the many of you who visit this blog erratically… So here is an amended version of the mail exchange.
As usual, you’re welcome, and as usual my answers have a fair chance of being only partial answers to your questions. However, I will try to do
> It may be that my questions are too naive to be useful
> because I don’t have much intuition about what DT means physically,
> so please feel free to tell me if that is the case,
> and in that case just ignore the questions asked below in this message.
> On the other hand, if you think that the questions might be useful,
> feel free to post this message including images on your blog entry.
I don’t know D0’s decision tree well enough myself, but I know the list of variables which are fed into the trees, from slide 24 of the talk [see link in previous post]. There, you can see that they put in the “best” top mass as a kinematical selection variable. Moreover, many other variables which are directly correlated with the top mass itself are fed into the DT. It is a perfectly legal thing to do, but once you do it, you have to be careful when interpreting the results. In particular, a single top production process with a mass different from the one with which you built your trees (your “signal”) will be treated as background if the mass difference is large enough for the branches to split regular top and low-mass top differently.
What would tell us whether that is really the case is the relative weight that the final trees give to each variable. If the top mass is one of the variables given the most weight in deciding how to classify an event, then any top signal with a mass significantly different from 175 GeV would be washed out.
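To make the “washed out” argument concrete, here is a toy sketch in Python, certainly not D0’s method: the “classifier” is a single mass window chosen to maximize S/√B for a 175 GeV signal (a one-variable, two-cut tree), which is then applied to a hypothetical 145 GeV top it never saw in training. The Gaussian signal with 20 GeV resolution and the flat background from 80 to 300 GeV are assumptions made only for illustration.

```python
import bisect
import math
import random
from itertools import accumulate

EDGES = [80.0 + 5.0 * k for k in range(45)]  # 80 to 300 GeV in 5 GeV steps (toy binning)

def histogram(values, edges):
    """Count values in [edges[i], edges[i+1]); values outside the range are dropped."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

def train_window(sig, bkg, edges):
    """Toy 'training': pick the mass window [lo, hi) that maximizes S/sqrt(B)."""
    cs = [0, *accumulate(histogram(sig, edges))]  # cumulative signal counts
    cb = [0, *accumulate(histogram(bkg, edges))]  # cumulative background counts
    best, window = -1.0, None
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            s, b = cs[j] - cs[i], cb[j] - cb[i]
            if b > 0 and s / math.sqrt(b) > best:
                best, window = s / math.sqrt(b), (edges[i], edges[j])
    return window

def efficiency(masses, window):
    lo, hi = window
    return sum(lo <= m < hi for m in masses) / len(masses)

rng = random.Random(42)
sig175 = [rng.gauss(175.0, 20.0) for _ in range(5000)]  # the signal the window is trained on
sig145 = [rng.gauss(145.0, 20.0) for _ in range(5000)]  # hypothetical lighter top, never seen in training
bkg = [rng.uniform(80.0, 300.0) for _ in range(20000)]  # flat toy background

window = train_window(sig175, bkg, EDGES)
eff175, eff145 = efficiency(sig175, window), efficiency(sig145, window)
print(f"window {window}: eff(175 GeV) = {eff175:.2f}, eff(145 GeV) = {eff145:.2f}")
```

Under these assumptions the 145 GeV sample passes the window noticeably less often than the 175 GeV one; a real DT with many mass-correlated inputs can do the same thing far less visibly.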
Be careful, though: “weight” is not a very well-defined quantity here. Some decision tree algorithms have a built-in way to determine a posteriori (i.e. once the trees are built) how much weight a variable had in separating signal from backgrounds. Others don’t. I would not be surprised if, upon asking D0 what weight the “best” top mass has in their DT, you got a perplexed look in return, or worse, a layman’s explanation that the DT is not a neural network. But they might also answer with a number straight away 🙂
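For the record, here is what one such built-in “weight” can look like: the mean decrease in impurity, which is what scikit-learn reports as `feature_importances_` for its tree models. Each split adds to its variable the Gini decrease weighted by the fraction of events reaching that node, and the sums are normalized to one. The sketch grows a single depth-3 tree on made-up inputs (a toy top mass, a weakly discriminating variable and pure noise); it says nothing about how D0’s own trees rank their variables.

```python
import random

def gini(n_sig, n):
    """Gini impurity 2p(1-p) of a node with n events, n_sig of them signal."""
    if n == 0:
        return 0.0
    p = n_sig / n
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels, idx):
    """Best single cut (gain, feature, threshold) on the events listed in idx."""
    n, n_sig = len(idx), sum(labels[i] for i in idx)
    parent = gini(n_sig, n)
    best = (0.0, None, None)
    for f in range(len(rows[0])):
        order = sorted(idx, key=lambda i: rows[i][f])
        left_sig = 0
        for k in range(1, n):
            left_sig += labels[order[k - 1]]  # the first k sorted events go to the left
            lo, hi = rows[order[k - 1]][f], rows[order[k]][f]
            if lo == hi:
                continue  # cannot cut between equal values
            child = (k * gini(left_sig, k) + (n - k) * gini(n_sig - left_sig, n - k)) / n
            if parent - child > best[0]:
                best = (parent - child, f, 0.5 * (lo + hi))
    return best

def grow(rows, labels, idx, depth, importance, n_total):
    """Grow a tree recursively; only the importances are kept, not the tree itself."""
    if depth == 0 or len(idx) < 20:
        return
    gain, f, thr = best_split(rows, labels, idx)
    if f is None:
        return
    importance[f] += gain * len(idx) / n_total  # impurity decrease weighted by node size
    grow(rows, labels, [i for i in idx if rows[i][f] < thr], depth - 1, importance, n_total)
    grow(rows, labels, [i for i in idx if rows[i][f] >= thr], depth - 1, importance, n_total)

rng = random.Random(1)
names = ["best top mass", "weak variable", "pure noise"]  # hypothetical inputs
rows, labels = [], []
for _ in range(2000):
    rows.append([rng.gauss(175.0, 20.0), rng.gauss(0.25, 1.0), rng.random()])
    labels.append(1)
    rows.append([rng.uniform(80.0, 300.0), rng.gauss(-0.25, 1.0), rng.random()])
    labels.append(0)

importance = [0.0] * len(names)
grow(rows, labels, list(range(len(rows))), 3, importance, len(rows))
shares = [v / sum(importance) for v in importance]
for name, share in zip(names, shares):
    print(f"{name:14s} {share:.2f}")
```

Other packages define their ranking differently (counting splits, or permuting a variable and measuring the loss), which is exactly why “weight” needs a definition before anyone can quote a number.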
In any event, I have the answer myself: if you look at the plots, they speak to you. [refer to plots below – only two of the three in the presentation are shown]. The three distributions for DT<0.3, intermediate DT, and DT>0.6 are VERY different in the “best” top mass. AND, in the high-DT data the distributions for all backgrounds and for single top coincide perfectly. That is to say, that variable has been totally “squeezed” of its discriminating power by the classifier. In other words, the only thing one can gather from that plot is the relative normalization of the expected contributions to the data points, since the shapes coincide.
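A toy version of the “squeezing”: a classifier that, by assumption, uses only the mass (the likelihood ratio of a 175 GeV Gaussian signal to a flat background) is cut at 0.7, and the mean and spread of the mass for signal and background are compared before and after the cut. All shapes and the cut value are illustrative assumptions.

```python
import math
import random
import statistics

M0, SIGMA, LO, HI = 175.0, 20.0, 80.0, 300.0  # toy assumptions, in GeV

def score(m):
    """Mass-only likelihood ratio p_s/(p_s+p_b): Gaussian signal vs flat background density."""
    p_s = math.exp(-0.5 * ((m - M0) / SIGMA) ** 2) / (SIGMA * math.sqrt(2.0 * math.pi))
    p_b = 1.0 / (HI - LO)
    return p_s / (p_s + p_b)

rng = random.Random(7)
sig = [rng.gauss(M0, SIGMA) for _ in range(20000)]
bkg = [rng.uniform(LO, HI) for _ in range(50000)]

CUT = 0.7  # hypothetical "high-DT" requirement
sig_hi = [m for m in sig if score(m) > CUT]
bkg_hi = [m for m in bkg if score(m) > CUT]

for name, before, after in (("signal", sig, sig_hi), ("background", bkg, bkg_hi)):
    print(f"{name:10s} before cut: mean {statistics.mean(before):6.1f}, std {statistics.pstdev(before):5.1f}"
          f" | after cut: mean {statistics.mean(after):6.1f}, std {statistics.pstdev(after):5.1f}")
```

Before the cut the background is much broader than the signal; after it the two spreads are comparable and the means nearly equal, so only the normalizations still tell them apart.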
A point of relevance: the relative normalization of the various colors does tell you that the high-DT data favor SM single top over backgrounds, as it should. But it does so based on the top mass itself, and therefore that variable is no longer a very good one to display the final result! In fact, one would prefer to keep the most discriminant variable aside and train the classifier with the others, being careful to avoid variables correlated with the most powerful one: that way, the best variable retains its discriminating power after a cut on the classifier’s output. That is the strategy adopted in CDF for low-mass Higgs searches, where the Higgs mass is left aside, being very discriminant by itself.
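The alternative strategy can be sketched with two toy inputs besides the mass: x, independent of the mass by construction, and y, a smeared copy of it. Cutting on x leaves the background mass spectrum flat, so a window around 175 GeV still separates signal from background; cutting on y sculpts the background into the window as well. The window, the cuts and the distributions are hypothetical and not taken from any CDF analysis.

```python
import random

rng = random.Random(3)
WINDOW = (155.0, 195.0)  # hypothetical signal window around 175 GeV

def make_events(n, is_signal):
    """Toy events (m, x, y): x is independent of m, y is a smeared copy of m."""
    events = []
    for _ in range(n):
        m = rng.gauss(175.0, 20.0) if is_signal else rng.uniform(80.0, 300.0)
        x = rng.gauss(1.0 if is_signal else -1.0, 1.0)
        y = m + rng.gauss(0.0, 15.0)
        events.append((m, x, y))
    return events

def cut_on_x(ev):
    """Classifier trained without the mass or its correlates."""
    return ev[1] > 0.5

def cut_on_y(ev):
    """Classifier leaning on a mass-correlated input."""
    return abs(ev[2] - 175.0) < 20.0

def frac_in_window(events):
    lo, hi = WINDOW
    return sum(lo <= m < hi for m, _, _ in events) / len(events)

sig, bkg = make_events(20000, True), make_events(50000, False)
results = {}
for name, cut in (("x (uncorrelated)", cut_on_x), ("y (mass-correlated)", cut_on_y)):
    s_sel = [ev for ev in sig if cut(ev)]
    b_sel = [ev for ev in bkg if cut(ev)]
    results[name] = (frac_in_window(s_sel), frac_in_window(b_sel))
    print(f"cut on {name}: signal in window {results[name][0]:.2f}, "
          f"background in window {results[name][1]:.2f}")
```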
So, to summarize:
– ask a D0 person about the weight the DT gives to the top mass, but be aware he could start telling you things you don’t want to hear about.
– maybe better, ask what the inputs to the matrix element method are. If they are the same as for the DT method (I suspect so, since there is no mention of them in the talk), your fancied low-mass top is still
– ask them if they would be willing to do the exercise of attempting to set a limit on SM-like single top production at 145 GeV.
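For the 145 GeV exercise, the simplest form a limit could take is a single-bin counting experiment. The sketch computes a 95% CL upper limit on the number of signal events with the CLs prescription, which is widely used in collider searches but is only an assumption about what D0 would do; a real limit would fold in the 145 GeV acceptance, systematic uncertainties and the full DT or matrix-element output. The observed count and background expectation below are placeholders, not D0 numbers.

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson variable with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total

def cls_upper_limit(n_obs, b, cl=0.95, s_max=100.0, tol=1e-6):
    """Signal yield s where CLs = P(N<=n_obs | s+b) / P(N<=n_obs | b) drops to 1 - cl."""
    clb = poisson_cdf(n_obs, b)
    lo, hi = 0.0, s_max
    while hi - lo > tol:  # bisection: CLs decreases monotonically with s
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + b) / clb > 1.0 - cl:
            lo = mid  # s = mid still allowed
        else:
            hi = mid  # s = mid excluded
    return 0.5 * (lo + hi)

print(f"{cls_upper_limit(0, 5.0):.2f}")  # 3.00: with no events observed the limit is -ln(0.05), whatever b is
n_obs, b = 12, 10.0                      # placeholder numbers, not from D0
print(f"95% CL limit on signal events: {cls_upper_limit(n_obs, b):.1f}")
```

Dividing the event limit by a hypothetical acceptance times luminosity for a 145 GeV top would turn it into a cross-section limit to compare with the SM-like expectation.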
> Looking at the image showing Tquark mass
> for DT less than 0.3 it seems to me that the high data points for singleT events
> are in the bins for 100-125 GeV and 150-175 GeV.
> However, I guess that low DT might mean that not many singleT events
> are expected, because the low DT histogram shows very little of the blue
> or cyan colors that correspond to expectation of singleT events,
> so maybe the low DT data is not very significant ?
Not necessarily. Low DT means low probability of a 175 GeV top, given that a lot of final-state four-momenta AND a three-object mass are fed into the tree. So a lower-mass top quark might get a low grade and end up there. By the way, have you noticed the tell-tale dip at 175 GeV in the W+jets background (the green stuff)? That is a sign that background events with that mass preferentially end up at high DT, unless more striking characteristics tell them apart from the single top hypothesis. For instance, ttbar does not show such a void at 175 GeV, because there are more useful variables to discriminate it from single top, and it clusters at 175 GeV anyway…
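The tell-tale dip can be reproduced in a toy: take a flat background mass spectrum, score each event with a likelihood-ratio discriminant built from the mass and one extra mass-independent variable x, and histogram only the low-score events in 25 GeV bins. Background events near 175 GeV are more often promoted above the cut, so the low-DT histogram shows a hole there. The discriminant, the 0.3 cut and all distributions are assumptions meant only to mimic the shape of the effect.

```python
import bisect
import math
import random

EDGES = list(range(75, 325, 25))  # 25 GeV bins: 75, 100, ..., 300

def gauss(v, mu, sigma):
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def dt_score(m, x):
    """Toy discriminant: likelihood ratio from the mass and one mass-independent variable x."""
    ratio = (gauss(m, 175.0, 20.0) / (1.0 / 220.0)) * (gauss(x, 1.0, 1.0) / gauss(x, -1.0, 1.0))
    return ratio / (1.0 + ratio)

rng = random.Random(11)
low_dt = []
for _ in range(50000):
    m = rng.uniform(80.0, 300.0)  # flat toy "W+jets" mass spectrum
    x = rng.gauss(-1.0, 1.0)      # background-like value of the extra variable
    if dt_score(m, x) < 0.3:      # keep the low-DT events only
        low_dt.append(m)

hist = [0] * (len(EDGES) - 1)
for m in low_dt:
    i = bisect.bisect_right(EDGES, m) - 1
    if 0 <= i < len(hist):
        hist[i] += 1

for lo, hi, n in zip(EDGES, EDGES[1:], hist):
    print(f"{lo:3d}-{hi:3d} GeV {n:6d} " + "#" * (n // 200))
```

Bins far from 175 GeV keep nearly all their events, while the 150-175 and 175-200 GeV bins lose a substantial fraction; events with a very background-like x stay in the low-DT plot even at 175 GeV, which is why the dip is not a complete void.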
> Is there some physical reason that low DT sees events in 150-175 GeV,
> while the higher DT sees a deficiency of events in 150-175 GeV ?
Not necessarily physical – probably statistical. If systematic, then maybe it is connected to their way of training trees with so many variables correlated to each other. Decision trees can easily get “overtrained” in such circumstances, and one way to avoid that is to randomly sample the variables used at each branching and grow a huge number of trees rather than a single one, then let the trees “vote” for a hypothesis. The random forests algorithm is such a delicious thing… I have a post about it (https://dorigo.wordpress.com/2006/03/03/random-forests/) which links to an informative site on that particular algorithm, if you want more information.
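Here is a bare-bones sketch of the random forest recipe (Breiman’s version also trains each tree on a bootstrap resample of the events): every tree is grown on a bootstrap sample, each node tries only a random subset of the variables, and the forest score is the fraction of trees voting “signal”. The depth, the number of trees, the 16 candidate cuts per variable and the toy inputs (three mass-correlated variables, one independent one, one noise) are all assumptions chosen to keep it short; real implementations differ in many details.

```python
import random

def gini(labels):
    """Gini impurity 2p(1-p) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def majority(labels):
    return int(2 * sum(labels) >= len(labels))

def build_tree(X, y, depth, n_feat, rng, min_size=10):
    """Grow one tree; at each node only n_feat randomly chosen variables are tried."""
    if depth == 0 or len(y) < min_size or len(set(y)) == 1:
        return majority(y)  # leaf
    best = None
    for f in rng.sample(range(len(X[0])), n_feat):
        values = sorted({row[f] for row in X})
        step = max(1, len(values) // 16)  # about 16 candidate cuts per variable keeps it fast
        for thr in values[step::step]:
            left = [yi for row, yi in zip(X, y) if row[f] < thr]
            right = [yi for row, yi in zip(X, y) if row[f] >= thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:  # no usable cut (e.g. constant variables)
        return majority(y)
    _, f, thr = best
    go_left = [row[f] < thr for row in X]
    children = []
    for side in (True, False):
        Xs = [row for row, g in zip(X, go_left) if g == side]
        ys = [yi for yi, g in zip(y, go_left) if g == side]
        children.append(build_tree(Xs, ys, depth - 1, n_feat, rng, min_size))
    return (f, thr, children[0], children[1])

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] < thr else right
    return node

def random_forest(X, y, n_trees=25, n_feat=2, depth=4, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap resample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], depth, n_feat, rng))
    return forest

def vote(forest, row):
    """Fraction of trees voting 'signal': a score between 0 and 1."""
    return sum(predict_tree(t, row) for t in forest) / len(forest)

def toy_event(rng, is_signal):
    """Hypothetical event: three mass-correlated variables, one independent one, one noise."""
    m = rng.gauss(175.0, 20.0) if is_signal else rng.uniform(80.0, 300.0)
    x = rng.gauss(1.0 if is_signal else -1.0, 1.0)
    return [m, m + rng.gauss(0.0, 10.0), m + rng.gauss(0.0, 25.0), x, rng.random()], int(is_signal)

data_rng = random.Random(5)
train = [toy_event(data_rng, i % 2 == 0) for i in range(600)]
test = [toy_event(data_rng, i % 2 == 0) for i in range(2000)]
forest = random_forest([e for e, _ in train], [lab for _, lab in train])
accuracy = sum((vote(forest, e) >= 0.5) == bool(lab) for e, lab in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

Because no single tree sees all the variables at every node, the correlated mass proxies cannot all be exploited by the same branch, and averaging the votes smooths out what each individual tree overfits.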
> Is it reasonable to expect that more data at both CDF and D0 will
> answer these questions ?
I think more data always helps, provided you are willing to let a good hypothesis go if the data disprove it. But it is good to be stubborn for a while longer, especially since nobody has really done a search focused on low-mass single tops…