Saving a good text from a few mistakes
November 21, 2008
Posted by dorigo in internet, personal, physics, science.
Tags: Lubos Motl
After spending some time with my family this evening, I found the courage to delve into Lubos Motl’s latest post about the “ghost sample” cross-section issue. I must admit that overall he wrote a good post (the first part, at least, is good), which however contains quite a few inaccuracies, besides of course insisting on his original mistake. I think it is a good idea to spend a few lines pointing out the mistakes in that text, which can then be read with profit.
Fine: so, 742/pb or 2100/pb?
For me, the interesting part of his post starts with the subtitle “Fine: so, 742/pb or 2100/pb?”, which comes after a few lines of text he could have avoided. The first mistake, unfortunately, comes right away, in the first paragraph:
Of course, the total integrated luminosity, 2,100/pb (two thousand and one hundred inverse picobarns) must be used as the denominator, as Matt Strassler explains in detail […]
What follows, though, is a rather clear account of what luminosity is, along with other interesting information. At the end of the section, however, he falters again:
Now, there’s no doubt that the total integrated luminosity (of proton-antiproton beams) used to suggest the “lepton jets” in the recent CDF paper is 2,100/pb: see e.g. the second sentence of the abstract. If you want to keep things simple, the right denominator has always been 2,100/pb and there is nothing to talk about. But still, you may ask: why the hell Tommaso Dorigo is talking about 742/pb? Isn’t he supposed to know at least some basic things here?
The problem is that the CDF paper is not very clear. Lubos is totally correct: the abstract does quote 2100/pb. This in fact lessens his guilt a bit, because the abstract misled him.
The study first uses 742/pb, and only after page 28 does an analysis of the larger dataset begin: 2100/pb, which includes the initial 742/pb. The reason is that the smaller dataset, collected until 2005, was not subject to a complicated online selection called “prescale”, which basically is enabled whenever the rate of proton-antiproton collisions is too high for data acquisition (which can save to disk no more than about 100 events per second).
Whenever the detector gets flooded by too high a rate, prescaling factors are applied to specific triggers, such as the dimuon trigger which collected the 1400/pb used only in the second part of the study. Until 2005 the dimuon trigger did not have a prescale, so it is much easier to use that dataset for cross sections and rates.
This is why the CDF paper uses 742 inverse picobarns of data until page 28, when kinematics is studied with more data (at that point, absolute rates are not important anymore, so CDF includes all the data in one single set).
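To see why an unprescaled dataset is easier to use for absolute rates, here is a toy sketch of a prescaled trigger. It assumes a fixed, counter-based prescale that keeps one out of every `prescale` events firing the trigger; the class and function names are made up for illustration, and the real CDF prescales, which depend on the collision rate, are more complicated than this.

```python
class PrescaledTrigger:
    """Toy trigger that records one out of every `prescale` events firing it.

    Hypothetical illustration only: a fixed counter-based prescale,
    not the rate-dependent scheme used online by the experiment.
    """

    def __init__(self, prescale=1):
        if prescale < 1:
            raise ValueError("prescale must be a positive integer")
        self.prescale = prescale
        self._fired = 0  # number of events that fired the trigger so far

    def accept(self):
        """Return True if the current event is written to disk."""
        self._fired += 1
        return self._fired % self.prescale == 0


def count_recorded(n_fired, prescale):
    """Count how many of `n_fired` triggering events end up recorded."""
    trigger = PrescaledTrigger(prescale)
    return sum(trigger.accept() for _ in range(n_fired))


if __name__ == "__main__":
    unprescaled = count_recorded(1000, prescale=1)
    prescaled = count_recorded(1000, prescale=4)
    print(unprescaled)    # every triggering event is kept
    print(prescaled)      # only one in four is kept
    print(prescaled * 4)  # the count must be scaled back to recover the rate
```

With no prescale the recorded count can be used directly; with a prescale every count has to be corrected by a factor that, in a real experiment, changes with running conditions.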
Silicon vertex tracking
Then a second subsection, titled “Silicon vertex tracking”, starts. Here, Lubos falters again. He discusses the SVT trigger, which is used to collect “ghost events” neither by the CDF study nor by the former study of the correlated bb cross section. It is only used for some control samples, but he ignores this fact. It would have been better if he had avoided discussing the SVT altogether, because it creates the conditions for another blunder:
“Now, only a subset of the events are picked by the strict SVT criteria: the jets in these events are said to be “b-tagged”. The precise percentage depends on how strict criteria the SVT adopts: it is partly a matter of conventions. In reality, about 24.4% of the events that excite the dimuon triggers also pass the strict SVT filter: this percentage is referred to as the “efficiency” of the (heavy flavor) QCD events. The silicon vertex tracker may also choose the events “loosely”; in that case, the efficiency jumps to 88% or so. However, if you assume that there is no new physics, pretty much all events in which the dimuon trigger “clicks” should be caused by heavy flavors – essentially by the bottom-antibottom initial states.”
Not even wrong! Lubos is confused. He confuses the SVT, which is an online trigger (not used by this analysis), with offline SVX requirements applied to the muon tracks used to select a sample where the composition is studied in detail. This is a minor mistake, although it shows just how much one can confuse matters by being careless.
Also wrong is the claim that the SVT may select events loosely: again, it is offline selections that can do that. The SVT has fixed thresholds, being an online algorithm implemented on hardware boards. But let’s not blame Lubos for not knowing the CDF detector.
More nagging is his other mistake above, the claim that pretty much all events in which the dimuon trigger “clicks” should be caused by heavy flavors: by no means does the simple dimuon trigger selection pick only bottom-antibottom events! Indeed, those only account for 30% of the data or so. But there is an even more nagging mistake in the paragraph: he calls bottom-antibottom “initial states”, while those are FINAL states of the hard process. You have a negligible chance to find (anti)bottom quarks in the (anti)proton, so you only get them as the final product of the collision! Lubos, please use correct terminology if you want to have a chance to be taken seriously!
Unfortunately, inaccuracies pile up. Here is the very next paragraph:
“In these most special 24.4% events, bottom-antibottom pairs “almost certainly” appear at the very beginning. So at the very beginning, it looks like you just collided bottom-antibottom pairs instead of proton-antiproton pairs. If you now interpret the Tevatron as a machine where you effectively collide bottom-antibottom pairs, it has a smaller luminosity because only a small portion of the proton-antiproton collisions included protons and antiprotons that were “ready to make heavy flavor collisions”. Even though the remaining 75.6% dimuon events probably also contained bottom quarks, you discard the collisions as inconclusive.”
Amazingly, Lubos really means it: he thinks bottom-antibottom quark collisions happen at the Tevatron in large numbers. Yes, he means it: “looks like you just collided bottom-antibottom pairs”. This is slightly embarrassing. However, I must give Lubos a few points here for making a serious attempt at explaining things at a layman level. Let’s move on.
“You may define the corresponding fraction of all the events and normalize it in the same way as you would do with bottom-antibottom collisions. Assuming that the bottom quarks are there whenever the SVT says “Yes”, the integrated luminosity of this subset is just 742/pb, not 2,100/pb. The collisions up to this day that have passed the intermediate, loose SVX filter, give you the integrated luminosity of 1,426/pb or so.”
Again, not SVT triggering, but offline SVX cuts. Anyway: alas, Lubos, it really is that difficult, isn’t it? This is very, very wrong, as a reader, Dan Riley, explains well in a thread here. HEP experimentalists do not do that: they do not assign integrated luminosity to subsets.
Integrated luminosity is a number which applies to a sample of data, and then, whatever cuts or further selections you make, that number remains. To make an example: you have 1000/pb of integrated luminosity, it corresponds to 10,000 events of some rare kind. The cross section of those events is of course 10,000/1000=10 pb. Now, imagine you select 5% of the data by requesting the presence of a high-Et jet. This sample has 500 events (5% of 10,000), but its integrated luminosity is still 1000/pb. Only, when you compute the cross section, you do not just do $\sigma = N/L$, but rather $\sigma = N/(\epsilon L)$, where $\epsilon$ stands for the efficiency of the cut. One may say it is a convention (since $\epsilon L$ still has units of integrated luminosity), but it in fact avoids the mistake Lubos gets into.
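The arithmetic of this example can be spelled out in a few lines of Python. This is only a sketch of the bookkeeping described above: the function name and its arguments are made up for illustration, and the numbers are the ones of the toy example (1000/pb, 10,000 events, a cut with 5% efficiency).

```python
def cross_section_pb(n_events, luminosity_inv_pb, efficiency=1.0):
    """Cross section in pb: number of events over (efficiency x luminosity).

    The luminosity is that of the whole data sample; a cut changes the
    efficiency factor, never the luminosity.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return n_events / (efficiency * luminosity_inv_pb)


LUMINOSITY = 1000.0  # integrated luminosity of the sample, in inverse pb

# Full sample: 10,000 events, no cut applied.
sigma_full = cross_section_pb(10_000, LUMINOSITY)

# After the high-Et jet cut: 500 events survive, efficiency 5%,
# and the luminosity in the denominator is unchanged.
sigma_cut = cross_section_pb(500, LUMINOSITY, efficiency=0.05)

# Forgetting the efficiency underestimates the cross section by a factor 20.
sigma_naive = cross_section_pb(500, LUMINOSITY)

if __name__ == "__main__":
    print(sigma_full, sigma_cut, sigma_naive)
```

Both the full sample and the cut sample give back the same 10 pb, as they must, while dropping the efficiency from the denominator gives 0.5 pb.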
The data used for the studies mentioned in the paper correspond to 742/pb. All of the data! That goes for the subset of data selected with tight SVX cuts (143k events), for the subset of data making up the ghost sample (153k events), and for the total sample (743k events), which includes both subsets.
As I already mentioned, the CDF publication is not clear about this, since its introduction mentions the larger integrated luminosity used for later checks of the kinematics, from page 28 on. Here Lubos is utterly confused: he splits the integrated luminosity among different subsets, deceived by the fact that the sizes of the two event samples are in rough proportion to the two portions of integrated luminosity, the one collected without prescale until 2005 and the one collected with prescale afterwards.
Then, another bad paragraph, unfortunately:
“So is it OK for someone to write 742/pb in the denominator when he calculates the cross section of the “lepton jets” ghost events? The answer is, of course, No. It’s because these “new” events are actually argued not to include bottom quarks as the initial states. For example, Giromino et al. claim that the Higgs is produced and subsequently decays to various h1, h2, and/or h3 pairs (and 16 tau’s at the very end). Nima and Neal use various supersymmetric particles instead. So you can’t normalize the initial states with the assumption that the bottom quarks are there in the initial states because they are not there.”
Again foncused. True, the “new” events do not include bottom quarks. But NOT as initial states, for god’s sake!!! Anyway, it is “Giromini”, Paolo Giromini. And of course, integrated luminosity is the same for all samples considered so far in the paper, and indeed, it is always in the denominator. Always 742/pb, never an ounce more. Sorry, Lubos. Not your lucky day.
The third subsection is called “Tables”. It is here that we get a glimpse of the faulty reasoning of Lubos, which got him stuck on accusing me of a mistake:
“Open the CDF paper on page 16. The set of all dimuon events – 743,006 – is divided to the 589,111 QCD events and our 153,895 ghost events. In the second column of this Table II, you see that only 143,743 events passed the tight SVX filter, neither of which was a ghost event.
Now, if you switch to page 12 and look at Table I, you may add the entries to get 143,000+ and to see that exactly these tight SVX-positive events correspond to the (smaller) integrated luminosity of 742/pb, as the caption of Table I says. For another “written proof” that the 742/pb luminosity corresponds to tightly SVX-filtered collisions, and not all (unfiltered) collisions as Tommaso seems to think, see page 11/52 of Giromini’s talk.”
The claim that “exactly these tight SVX-positive events correspond to the (smaller) integrated luminosity of 742/pb” is the source of Lubos’ confusion: indeed, the 143k events which were used in the past analysis by CDF (the measurement of the correlated cross section) belong to a dataset comprising 742/pb. But the rest of the data belong to it too!
Lubos’ mistake is not to reason like an experimentalist: he believes integrated luminosity follows subsets and divides accordingly, while it is a constant. The data (before any selection) amount to 742/pb. Then the tight SVX cuts select 143k events, or the loose cuts select more, but all samples derived from the original one have the same luminosity in the denominator: 742/pb. Only, each gets a different efficiency factor in the denominator (the efficiency of the cut, $\epsilon$).
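The same bookkeeping can be written down for the event counts quoted above from Table II of the CDF paper. This is a sketch with made-up names: it only checks that the QCD and ghost samples add up to the total, and shows that every sample carries the luminosity of the dataset it was cut from. The fractions it prints are plain ratios of event counts, not the physics efficiencies that enter a cross section.

```python
LUMINOSITY_INV_PB = 742.0  # integrated luminosity of the whole dataset

# Event counts as quoted in the text from Table II of the CDF paper.
SAMPLES = {
    "all dimuon events": 743_006,
    "QCD events": 589_111,
    "ghost events": 153_895,
    "tight SVX events": 143_743,
}


def luminosity_of(sample_name):
    """Every sample inherits the luminosity of the dataset it comes from."""
    if sample_name not in SAMPLES:
        raise KeyError(sample_name)
    return LUMINOSITY_INV_PB


def fraction_of_total(sample_name):
    """Share of the total dimuon sample (a count ratio, nothing more)."""
    return SAMPLES[sample_name] / SAMPLES["all dimuon events"]


# The QCD and ghost samples partition the total dimuon sample.
assert (SAMPLES["QCD events"] + SAMPLES["ghost events"]
        == SAMPLES["all dimuon events"])

if __name__ == "__main__":
    for name, count in SAMPLES.items():
        print(f"{name}: {count} events, "
              f"{fraction_of_total(name):.3f} of total, "
              f"{luminosity_of(name):.0f}/pb")
```

Whatever the selection, the last column never changes: what differs from sample to sample is the number of events and the efficiency, not the luminosity.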
Ok, I made this post longer than it needed to be. Sorry to have bored many of you, but I felt there were still quite a few readers around who had no clue yet whom they should believe.
A note to those of you who are still undecided: I built the CMX chambers installed in CDF, with which the data we have been discussing was collected, with my very own hands, between 1999 and 2000. I have worked for CDF since 1992. I have signed the paper on anomalous muons, and I have followed a six-month-long review process before the publication. I am a friend of the main author, Paolo Giromini, and I have discussed Strassler’s paper with him over the phone at length. Do you not think it is a bit arrogant for a retired theorist to believe he can win an argument on such an exquisitely experimental matter with me? I am not boasting: I am just stating a fact. Lubos is arrogant. This time, he got a lesson. Lubos, I still like you, but please, don’t mess with me on these matters.