
Single top: new results from CDF! October 11, 2007

Posted by dorigo in news, physics, science.

About one year ago D0, the competitor experiment of CDF at the Tevatron, announced they had obtained for the first time evidence for the long-sought Standard Model process whereby a single top quark is created in proton-antiproton collisions. Top quarks are produced alone less frequently than in pairs, and single-top events possess fewer characteristics useful to distinguish them from the large backgrounds. Indeed, the analysis methods used by D0 involved neural networks, multivariate approaches: the heaviest machinery.

CDF of course did the same, but was less lucky last year. Experimental searches nowadays have reached a comfortable level of belief in their Monte Carlo simulations, and along with the significance of the observed effect they usually present an estimate of the “expected” significance, obtained by applying the same analysis methodology to large pools of simulated experiments. Last year CDF was expected to obtain a better result than D0 in terms of signal significance but, although both experiments hoped to be on the verge of finding the coveted “three-sigma” effect, D0 fell on the right side of the net and CDF on the wrong one.


“Three-sigma” means that you observe an effect which, if attributed to known processes other than the one you were looking for, happens by random fluctuation only about three times in a thousand. To make a simple example, suppose you count events with certain characteristics in a given dataset, expecting to see 100 from known background sources. You see 130: that is a surplus of +30 events, which is unlikely to be due to a fluctuation of the background. Event counts usually follow Poisson statistics, which basically says that the standard deviation of an expected count of 100 events is nothing else than \sqrt{100}, i.e. 10. A Poisson distribution centered at 100 with a width of 10 is basically a gaussian function, which dies out quickly as you move away from 100 on either side. How quickly ? Well, you expect 68% of the distribution to be contained in the [90,110] interval – “1-sigma”; 95% to be within [80,120] – “2-sigma”; and 99.7% to be within [70,130] – a “three-sigma” interval.
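If you want to check those interval fractions yourself, a few lines of Python do it, using the gaussian approximation just discussed:

```python
import math

def fraction_within(nsigma):
    """Fraction of a gaussian distribution within +-nsigma of the mean."""
    return math.erf(nsigma / math.sqrt(2))

for n in (1, 2, 3):
    print(f"{n}-sigma: {100 * fraction_within(n):.1f}% contained")
```

For a true Poisson distribution with mean 100 the numbers differ only slightly, since at such a mean the distribution is already quite gaussian.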

Now, we could go on, but a “three-sigma” or larger effect is usually called “evidence” by particle physicists looking for a particle decay signal. It means the data really fights against the interpretation that it contains only the background processes you had already accounted for when you estimated your central value (100 in the case above). A separate word is reserved for “five-sigma” effects, which have a really tiny probability of being due to accidental background fluctuations: in that case the effect is called an “observation” of the sought particle.

Back to CDF, D0, and their single top searches: you now see what it means to be “lucky” in a counting-experiment search for a particle. If your data contains background plus signal, and you expect the size of the signal to be sufficient to reach a “three-sigma” excess with respect to background alone, you are not necessarily going to find exactly that: your data might have fluctuated high or low, and the “evidence” might turn into a more robust or a weaker signal.

CDF in fact expected to see a 2.6-sigma effect last year, but they got much less than that. Now, with 50% more data analyzed, they expected to find a round 3.0-sigma effect in one search for single top events using a very refined matrix-element technique. And they found a 3.1-sigma excess, finally. I will describe the analysis briefly below, but first I want to discuss the production of single top events.
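The spread between expected and observed significance is just counting statistics at work, and a toy simulation makes it concrete. The yields below are invented for illustration, not CDF's actual numbers:

```python
import random
import statistics

random.seed(1)
B, S = 100.0, 30.0            # illustrative background and signal yields

def significance(n, b=B):
    """Simple counting-experiment significance: excess over sqrt(background)."""
    return (n - b) / b ** 0.5

# pseudo-experiments: the observed count fluctuates around B+S
# (gaussian approximation to the Poisson, fine at these counts)
trials = [significance(random.gauss(B + S, (B + S) ** 0.5))
          for _ in range(10000)]

print(f"expected significance: {significance(B + S):.1f} sigma")
print(f"median observed:       {statistics.median(trials):.1f} sigma")
print(f"spread (rms):          {statistics.pstdev(trials):.1f} sigma")
```

With an expected 3.0-sigma signal, the rms of the observed significance is above one sigma: finding 2.0 or 4.0 instead is entirely unremarkable, which is all "luck" means here.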


1- pair production

At the Tevatron proton-antiproton collider, top quarks are produced in pairs by strong interactions, through the diagrams shown below. The most probable process (responsible for the creation of 85% of the top pairs) occurs when a quark in the proton hits an antiquark of the same flavor in the antiproton. The two annihilate, producing a highly off-shell, “time-like” gluon which materializes into a top-antitop pair. “Time-like” just means that it is drawn as a particle propagating in the direction of time.

The other pair-producing process is called gluon fusion, which accounts for the remaining 15% of the top pair production cross section. Two energetic gluons from the proton and antiproton fuse together. The rest is similar to the former diagram.

In the two diagrams time flows from left to right, and space is represented by the orthogonal direction. That is the reason for calling them “time-like”, but it is a detail on which I will not elaborate, although the way diagrams are drawn has a lot to do with the resulting computation of the probability of the processes. Instead, I note in passing that the relative importance of the two diagrams strictly depends on the fact that we are asking what is the initial state of the collision given that a top pair has been produced: in fact, most proton-antiproton collisions involve gluons rather than quarks. However, at the Tevatron the energy necessary to produce two top quarks is a sizable fraction of the total available: and if you fish out of the proton a constituent carrying a large momentum fraction, it is likely to be a quark. At the LHC, curiously, the relative importance of the two diagrams is reversed: 15% quark-antiquark annihilation, 85% gluon fusion. A coincidence, due to the much smaller energy fraction required to make a top quark pair there.

Now, how can we produce a single top quark ? As we saw, these things get produced in pairs in strong interactions. Strong interactions are flavor-blind: since they are mediated by gluon exchange, and gluons are only sensitive to the color charge of the bodies they interact with, they do not distinguish top quarks from other quarks of the same color. You cannot create a top-anticharm quark pair (say) from a gluon, because gluons are not able to change the flavor of quarks.

“Wait a moment”, you could now say. “Ok, you convinced me that with QCD you cannot create single top quarks. But is it not possible to find a top quark inside one of the two projectiles -say the proton, and propagate it to the final state through a space-like graph ? The thing would exchange color quantum numbers with a parton in the antiproton, retaining its flavor. In the final state we would have one top quark and another parton.” (see plot on the right).

That is a good idea, but unfortunately, while it is indeed possible to find inside a proton a quark (or any other particle) whose mass is normally larger than that of the proton altogether, for a top quark this probability is extremely small. In order for the energy of the system to be conserved, your virtual top quark must be far off its mass shell. And the farther off it is, the shorter the time it may exist inside the proton. Now, when you hit the proton with another hadronic particle, what you are effectively doing is “illuminating” it with a stream of partons. You are taking a snapshot of the instantaneous composition of quarks and gluons contained in the pictured body. A virtual top quark will almost never show up in your snapshot, because of the almost vanishing probability of its popping out of the vacuum – or equivalently, the short time it spends on duty, on average.

Since I felt inspired today, I cooked up a picture showing a proton and what you may find inside at a given instant. You have three “valence” quarks – the red, green, and blue points – providing the proton with its invariant characteristics: zero net color, unit electric charge (1=2/3+2/3-1/3), and half unit of spin (1/2=1/2+1/2-1/2, say, in the direction chosen as your measurement axis). You have gluons flying around, and being emitted and absorbed by themselves or by other quarks (the bicolored wiggly lines). And you have quark-antiquark pairs popping out of the vacuum for brief instants (the points close together).  

Then, in the second picture you see the momentary creation of a virtual top-antitop pair. One of them is drawn larger, for no particular reason other than drawing attention. Also, one might object that quark-antiquark pairs popping out of the vacuum ought to have vacuum quantum numbers, and thus no net color; however, they can be the result of a virtual splitting of gluons, so that is not a real error. In any case, now that I look at the picture in detail, it is not that interesting. Let us move on.

By now, you should have accepted that QCD allows you to get only an even number of top quarks (when I say top quarks, I count antiparticles too) out of a proton-antiproton collision. Four is actually possible, but very, very unlikely because of the large energy required. So let us instead find out what it takes to produce a single top.

2- single top production

The only way you can end up with a single top quark is through electroweak interactions. The W boson carries “charged-current” weak interactions, and indeed it has the ability to create pairs of quarks of different flavors: weak interactions do not conserve flavor quantum numbers as QCD does. You can then envision the processes pictured below, both yielding a single top quark (plus something else) in the final state. The first is a “space-like” interaction between a gluon from one projectile and a W boson from the other, gW \to t \bar b; the second is the production of a W boson which subsequently decays into a top-antibottom pair, u \bar d \to t \bar b. In both cases, you end up with a top quark line in the final state (in red). Also note the presence of two W-t-b vertices in both diagrams. We will come back to their significance later. Oh, and note that due to an unfortunate labeling (by Mandelstam) of s-channel and t-channel, they are the opposite of what you would think: the s-channel is time-like, and the t-channel is space-like… It sucks, I know!

I can hear somebody scream: wait, how can a W (whose mass is about 80.4 GeV) in the s-channel diagram (right) decay into a top quark (whose mass is more than double, 171 GeV), with a b-quark (mass 4.5 GeV) thrown in to boot ? Well, the W in the process is not on its mass shell: it is produced with a high virtuality – that is to say, with a mass quite different from the nominal one. That does not matter, as long as it is an internal leg in the diagram. It happens, we can compute it, and it does not violate any rule.

If you compute the cross section – i.e., the probability – for producing a single top quark by the diagrams shown above at the Tevatron, you come up with a number which is about half the one for pair production: roughly 3 picobarns. That means that in a dataset of 1.5 inverse femtobarns (a femtobarn is a thousandth of a picobarn, so an inverse femtobarn is a thousand inverse picobarns) you expect to have produced no less than 4500 single top events! How come, then, that we are only seeing a three-sigma evidence ? 4500 events are a lot of dough. The question is answered below, where I describe the troubles with the analysis.
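The event-count arithmetic is nothing more than cross section times integrated luminosity, with a unit conversion:

```python
# number of produced events = cross section x integrated luminosity
sigma_pb = 3.0                 # single-top cross section, ~3 picobarns
lumi_ifb = 1.5                 # integrated luminosity, inverse femtobarns
lumi_ipb = lumi_ifb * 1000     # 1 fb^-1 = 1000 pb^-1
n_events = sigma_pb * lumi_ipb
print(n_events)                # 4500.0
```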


3- the search

Well, not “the search”, but “this search”. Indeed, we have searched for single top production for many years now, and with many different techniques. The one I am reporting on today is the most successful so far, the one that allowed us to find the long-sought evidence.

If you look at the production diagrams, you see that a single top quark is usually accompanied by a b-quark. You thus get either four hadronic jets, in case the top decays hadronically – t \to W b \to q \bar q' b, with each of the three quarks producing a hadronic jet, plus a fourth jet from the other b-quark – or, as shown in the diagram on the right, two b-quark jets and a lepton-neutrino pair from the W decay: t \to W b \to l \nu_l b.

The 4-jet final state is impossible to detect, since it is mimicked by strong interaction processes yielding four partons in the final state, which collectively have a cross section four orders of magnitude larger. So you are left with the leptonic final state: a so-called “W+2 jets” signature. The W signal is not hard to extract from the data, since high-energy electrons and muons are seen with ease in CDF, and the energetic neutrino also leaves a striking imbalance in the transverse energy budget. However, discriminating p \bar p \to t \bar b \to W b \bar b from p \bar p \to W b \bar b – a process that can happen without the creation of a top – is hard. Nor is it much easier to discriminate the signal from top pair production, p \bar p \to t \bar t \to W b W \bar b, because the additional W boson can escape undetected.

So we have large backgrounds, and their kinematics is not very different from that of our signal. Usually, one would try to find the best kinematical variables and use their value to select a signal-enriched sample. But that is by and large the past! New technologies and more confidence in the Monte Carlo simulations of signal and background processes allow much more refined techniques.

In the new and very successful CDF analysis, authored by my ex-co-convener (or co-ex-convener ?) of the jet energy and resolution working group Florencia Canelli (now FNAL), together with Peter Dong, Rainer Wallny, and Bernd Stelzer (all from UCLA), the knowledge of the signal production mechanism is exploited to the utmost: for each event, the kinematics of all measured objects (jets, lepton, missing energy) is taken, and the probability is computed that it arises from any of the possible configurations and energies with which single top events are produced. That is to say, the matrix element of the sought process is used as a probabilistic weight for the event, once experimental transfer functions that modify the energies and angles of the detected final-state bodies have been taken into account.

The same is done for all the main backgrounds: Wb \bar b, which produces b-jets with a similar rate, and other processes yielding W bosons and jets.

A neural-network classifier is used to determine the likelihood that the jets have b-quark content. Its output is a number 0<b<1, which is used together with the matrix-element information in a global discriminant:

EPD = b P_{t} / ( b P_{t} + b P_{Wb \bar b} + (1-b)[P_{Wc \bar c}+P_{Wcj}] )
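In code, the discriminant is just a weighted ratio of per-event probabilities. Here is a minimal sketch of the formula above – the probability values are invented for illustration, not taken from the CDF matrix-element computation:

```python
def epd(b, p_top, p_wbb, p_wcc, p_wcj):
    """Event probability discriminant: combines the b-tag NN output b
    (0 < b < 1) with matrix-element probabilities for the signal (p_top)
    and the main W+jets backgrounds, as in the formula in the text."""
    num = b * p_top
    den = b * p_top + b * p_wbb + (1 - b) * (p_wcc + p_wcj)
    return num / den

# a signal-like event: strong b-tag, high signal probability -> EPD near 1
print(round(epd(0.9, 0.8, 0.1, 0.05, 0.05), 3))
# a background-like event: weak b-tag, charm-like probabilities -> EPD near 0
print(round(epd(0.2, 0.05, 0.1, 0.6, 0.3), 3))
```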

This event-probability discriminant is shown for the various processes in the plot below.


You can see that EPD is close to 1 for most of the single top signal, while backgrounds are more likely to peak at zero.

Once computed for the 1078 CDF data events passing a selection requiring a W+2 jets topology, the EPD distribution is fit as a sum of signal and the concurring backgrounds. Backgrounds are constrained in normalization to the Monte Carlo prediction for their rates, and systematic uncertainties are taken into account with nuisance parameters in the fit. The output is shown below: the red area corresponds to single top events, and it amounts to a cross section \sigma_t = 3.0^{+1.2}_{-1.1} pb.

The inset shows the region at high EPD, where in red you see the excess due to single top events. The experimental data is shown by the black points with error bars. 

From this measurement it is straightforward to derive a measurement of the Cabibbo-Kobayashi-Maskawa matrix element V_{tb}, a number that specifies how likely it is that a W boson couples to a t and a b quark line. The cross section for single top production is in fact proportional to the square of that element. CDF finds V_{tb} = 1.02 \pm 0.18 \pm 0.07, where the second uncertainty is theoretical, arising from the uncertainty in the dependence of the top cross section on the top quark mass, and from other modeling details (fragmentation and renormalization scales, the value of \alpha_s).
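Since the cross section scales as |V_{tb}|^2, the conversion is a square root plus first-order error propagation. A rough sketch – the SM cross section here is my own assumed input, approximately the Tevatron prediction, and the published uncertainty comes from the full fit, not from this back-of-the-envelope propagation:

```python
# |V_tb| from the measured cross section, assuming sigma proportional to |V_tb|^2
sigma_meas, sigma_err = 3.0, 1.2   # measured cross section and uncertainty, pb
sigma_sm = 2.9                     # assumed SM prediction, pb (illustrative)

vtb = (sigma_meas / sigma_sm) ** 0.5
# first-order propagation: dV/V = (1/2) dsigma/sigma
vtb_err = vtb * sigma_err / (2 * sigma_meas)
print(f"|V_tb| = {vtb:.2f} +- {vtb_err:.2f}")
```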

One more nice result in the bag for CDF, and a sigh of relief… This signal had to wait for a long time to emerge!

UPDATE: Tony Smith, in a comment below, asks for the distribution of the reconstructed top quark mass of candidate events with a high value of EPD, a plot which last year caused some discussion (echoed for the D0 analysis), given that it showed some excess at 140 GeV which could fit Tony’s hypothesis of a top quark at that mass value. Here is the updated plot:

For a ghost signal, I must say this 140 GeV top quark issue dies hard… One bin up, one bin down, and there still is something to talk about! And it all started about 15 years ago… For more information you can read some details in the conference note of the analysis.



1. Plato - October 11, 2007

Layman struggling. How do you keep it all together? 🙂

Probability and outcome? Hmmmm……given the knowledge of the energy input into any collision process, why is the outcome of each given collision not known?

Mendeleev knew to calculate, that for any new element in his table, it would be based on setting the stage for a “calculated space” in between each element.

The statistical sense of the Maxwell distribution can be demonstrated with the aid of a Galton board, which consists of a wooden board with many nails, as shown in animation. Above the board sits a funnel into which particles of sand or corn can be poured. If we drop one particle into this funnel, it will fall, colliding with many nails, and will deviate from the center of the board in a chaotic way. If we pour particles continuously, most of them will agglomerate at the center of the board and some amount will appear away from the center.

I know this is a far cry from the venue and was just thinking out loud. It just seems odd to me that such probability even at these levels, would not have some value if you continued past the regime of experimental science and moved forward to the theoretical framework?

Dyson, one of the most highly-regarded scientists of his time, poignantly informed the young man that his findings into the distribution of prime numbers corresponded with the spacing and distribution of energy levels of a higher-ordered quantum state

But then, providing a “probability diagram outcome,” you’re saying the energy valuation and “all particle inclusions” have not been seen experimentally. So you provide for a “framework of the possibility?”

You then supply two different situations. One worked theoretically?/experimentally? and the other didn’t? I hope I got that right?

Still something bothering me here.

2. Tony Smith - October 11, 2007

Tommaso said “… New technologies and more confidence in the Monte Carlo simulations of signal and background processes allow much more refined techniques. In the new and very successful CDF analysis, authored by … Florencia Canelli (now FNAL), together with Peter Dong, Rainer Wallny, and Bernd Stelzer (all from UCLA) … use is made of the matrix element of the sought process …”.

Isn’t this the same group that last year “… performed the first search for single top using a Matrix-Element based analysis …” with a result that Tommaso said in a 20 November 2006 blog post “… does measure a meaningful cross section – but a part of the excess of signal events clusters at low mass, indeed …” ?

If so, does their new result also have “a part of the excess of signal events [that] clusters at low mass”?

Are they using basically the same Matrix Element technique, and if not, what are the significant differences?

Do they have sensitivity charts (plotted as Events v. Mass) for increasing cuts on the Event Probability Discriminant, as they had last year?
If so, are they available on the web?

Tony Smith

PS – Wasn’t a competing Likelihood Function method also used last year by CDF, with a result that it found no events, thus disagreeing with the Standard Model?
Are there any newer results from the CDF Likelihood Function people?

3. dorigo - October 11, 2007

Hi Plato,

quantum processes are the quintessence of probability. What happens when a proton hits another proton at high energy – say head-on to reduce the degrees of freedom – is that the two behave as bags of garbage. Inside one bag there may be a few tin cans and a lot of old newspapers, inside the other there are similar hard things and soft things. Physicists are able to compute the average value of the area of one bag (as seen from the other) occupied by any given kind of particle, and so can compute the average probability that a collision between two tin cans occurs, as opposed to a softer one between a tin can and a softer thing.


4. dorigo - October 11, 2007

Hi Tony,

I partly answered you in an update of the post. As for the likelihood, I will report on that too in due time – but it was not “disagreeing with the SM”, it was a 2-sigma-ish downward fluke to me.


5. Tony Smith - October 11, 2007

Tommaso, thanks for the update and the link to the pdf file.

As to the Likelihood Method, sorry for using the language “disagreeing with the SM”,
when a direct quote from your November 2006 blog entry, with more context, is:
“… CDF can not measure the production of single top yet, and actually is in the awkward position of excluding its production according to the predictions of the Standard Model. …
Nobody really believes that single top production is not there: it must be. It probably is just a unlucky downward fluctuation of our data. But still, it starts to be embarassing! …”.

I am looking forward to your “in due time” report about any new Likelihood Method results at CDF.

Tony Smith

6. dorigo - October 11, 2007

Hi Tony, it’s tough to argue with people who quote you 🙂
Anyway, I will let you know soon about the likelihood, I think there is a blessed result out on that too.


7. Paolo - October 12, 2007

Hi Tommaso and thanks for one more exciting post… I’m still digesting it… Right now I would be interested in learning more about the use of neural networks in particle physics. To tell you the truth, when I started hearing about that (1 year ago?), I was a bit surprised: some 7-10 years ago, during my master’s thesis and afterwards, I studied all those techniques in some detail and eventually came to the conclusion that all the initial excitement was more like a passing fad: more robust and better understood nonlinear models were available, often *much* better performing in *practice* (I could tell you a lot about this). Therefore, I would be curious to know along which path neural nets eventually became so popular in this area. Are there any new software packages tailored to particle physics? Any further info or guidance in the literature would be appreciated!

8. Doug - October 12, 2007

Hi Tommaso,

MIT online notes:
“Electrical engineering, originally taught at MIT in the Physics Department, became an independent degree program in 1882.
The Department of Electrical Engineering was formed in 1902, …”

If this independence had been in 1832, for example, would James Clerk Maxwell (13 June 1831 – 5 November 1879) be considered an electrical engineer [EE]?

The 2007 Physics Nobel awards [Giant Magnetoresistance] appear to demonstrate how close EE and physics remain.

9. dorigo - October 12, 2007

Hi Paolo,

neural networks were looked at with suspicion 15 years ago, because of the trouble of controlling what they were actually doing, the trouble of determining systematics in a sound way, and the over-reliance on Monte Carlo models.

To give you an example of what I mean with “reliance on MC”: when deciding on a selection cut on a kinematic variable, one would “stay away from the tails” of the distribution, because while one would expect Monte Carlo simulations to correctly reproduce the “bulk” of a distribution (i.e., its mean and width), the tails were much less well understood.

Imagine you were looking for an unknown signal in a kinematical variable X, simulated with Monte Carlo. You expected backgrounds to have mean value MB=100 and root-mean-square WB=10, and the signal to have mean MS=110 and rms WS=20. In order to select a signal-rich sample you would be tempted to select events at arbitrarily large values of X (because the higher X is above 110, the larger the signal/background ratio becomes under those conditions). Say X>130: you would reduce your background to almost zero (three sigma: half of 0.3%, i.e. 0.15%), and still keep some signal (one sigma: half of 32%, i.e. 16%). This, however, was dubbed “cutting on the tails of a distribution”, and was considered a dangerous practice: the Monte Carlo could not be trusted to reproduce correctly the tails at high X of the signal distribution.
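The surviving fractions quoted above follow directly from gaussian tail integrals; a quick check in Python:

```python
import math

def surviving_fraction(cut, mean, rms):
    """Fraction of a gaussian-distributed sample above the cut value."""
    return 0.5 * math.erfc((cut - mean) / (rms * math.sqrt(2)))

cut = 130.0
bkg = surviving_fraction(cut, 100.0, 10.0)   # cut is 3 sigma above the mean
sig = surviving_fraction(cut, 110.0, 20.0)   # cut is 1 sigma above the mean
print(f"background kept: {bkg:.4f}")         # ~0.0013
print(f"signal kept:     {sig:.4f}")         # ~0.16
```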

Neural networks would not mind cutting on the tails, and actually did much more, selecting nonlinear boundaries in multi-dimensional spaces constructed on well- and less well-understood distributions. The difficulty in controlling what they were doing was the main source of suspicion.

Now things have improved a whole lot in the modeling of event kinematics. The single most important development has been an improved modeling of QCD radiation, and parton distribution functions are also better known. Moreover, the NN technology has become an off-the-shelf item, with routines available even within standard analysis packages such as ROOT.

Anyway I agree, NN are not the only way to work on multidimensional spaces. There are other methods…
See my own invention explained in my 2005 QD blog (if you dig, you’ll find more info about it there).

I would be happy to hear about your experience on NNs…


10. dorigo - October 12, 2007

Hello Doug,

yes, electrical engineering, like several other disciplines, is of course very much connected to physics. I think Maxwell would remain a physicist though, because he did much more than write down a few differential equations in his lifetime, regardless of how successful those were. And in any case, scribbling equations is one thing, developing a hard disk quite another – both are things physicists can do well, and neither is exclusive to physicists!


11. jeff - October 12, 2007

another story that I’ve told before:
The military wanted to use a neural net to recognize enemy tanks concealed behind trees. They took many photos with and without tanks, and trained the neural net on a large fraction of these photos. Then they tested the net on the remaining photos. WOW, the net had a very high success rate: a very high fraction of the test photos were correctly classified (tank hiding/no tank). They then made a realistic field test. The net performed terribly!

It turned out that the net had “focused” on a difference in the training set of photos: when the tanks were not present the sky was clear; when the tanks were finally wheeled into position it was overcast.

Tails are tails, and everyone knows that all analyses suffer from poorly represented tails; indeed, taking that ignorance into account is a big job that must be done well to convince someone you have found a true signal. But biases are worse! They lurk everywhere. And when you discover them it is usually late: you have wasted much time, money and enthusiasm.
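The tank anecdote above is a textbook confounded-training failure, and it can be reproduced in a few lines with a toy model. Everything below is invented for illustration: a nearest-centroid classifier stands in for the neural net, “brightness” for the weather confounder, and “texture” for the genuine but weak tank cue:

```python
import random

random.seed(0)

def make_photos(n, confounded):
    """Toy photos as (brightness, texture, has_tank) tuples.
    In the confounded set, tank photos are always overcast (dark)."""
    photos = []
    for _ in range(n):
        tank = random.random() < 0.5
        if confounded:
            brightness = random.gauss(0.3 if tank else 0.8, 0.05)
        else:
            brightness = random.gauss(0.55, 0.2)   # weather uncorrelated
        texture = random.gauss(0.6 if tank else 0.5, 0.3)  # weak real cue
        photos.append((brightness, texture, tank))
    return photos

def train_centroids(photos):
    """Nearest-centroid "classifier": the average feature vector per class."""
    def mean(rows):
        return (sum(r[0] for r in rows) / len(rows),
                sum(r[1] for r in rows) / len(rows))
    return (mean([r for r in photos if r[2]]),
            mean([r for r in photos if not r[2]]))

def accuracy(photos, centroids):
    c_tank, c_none = centroids
    ok = 0
    for b, t, tank in photos:
        d_tank = (b - c_tank[0]) ** 2 + (t - c_tank[1]) ** 2
        d_none = (b - c_none[0]) ** 2 + (t - c_none[1]) ** 2
        ok += (d_tank < d_none) == tank
    return ok / len(photos)

centroids = train_centroids(make_photos(2000, confounded=True))
print("held-out confounded photos:", accuracy(make_photos(1000, True), centroids))
print("field test (weather decorrelated):", accuracy(make_photos(1000, False), centroids))
```

On held-out photos from the same confounded sample the classifier looks nearly perfect; once the weather is decorrelated it drops to near coin-flip, because it had learned brightness, not tanks.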

12. Andrea Giammanco - October 12, 2007

Jeff, Tommaso, every single discriminating procedure that you can imagine suffers from exactly the same troubles.
The more powerful a multivariate technique is, the more sensitive it will be to the details of the simulation.
A simple analysis with sharp cuts is less sensitive to the details of the simulation for the very same reason why its discrimination performance is worse than a NN’s: it just divides the phase space into two regions, which is clearly not an optimal use of the available information, while the NN takes into account the full shape of the distribution in the multidimensional space of the inputs.
And even when you apply a simple cut-based analysis you need control samples. It is not difficult to do the same with a NN!

For example, the funny anecdote told by Jeff doesn’t tell us anything about NNs: it tells us only that the people who trained them had very little experience with self-training software (NNs or whatever).
If they were physicists, their mistake would be something like taking the “signal” from a Tevatron run at 1.96 TeV and the “background” from the old Run I at 1.8 TeV, and letting the NN “discover” that the signal has a higher energy on average than the background.
Please note that if instead of a NN they had used any of your favourite multivariate techniques, the outcome would have been exactly the same. Although with less performance, of course 😉

13. dorigo - October 12, 2007

Hi Jeff, nice story – I had never heard it before. Yes, biases are everywhere, but their systematic effect is usually simpler to estimate. For neural networks, the art of determining systematics is still a little bit of black magic…

Andrea, I agree of course. What I was noting is that, because the simulation has improved, several new techniques have become more widely used in HEP. However, there is a subtlety hidden in what you say. In a cut-based analysis, you use the most powerful discriminants _in_the_bulk_. You avoid tails, because you know you are not modeling them well. NNs may find a better discrimination in the tails, so they are intrinsically less safe.


14. Andrea Giammanco - October 12, 2007

That is correct in most use cases, but there are also cut-based analyses which have to cut away the bulk because the signal lies on a tiny tail… This is dangerous, but it is the only way, and of course it requires extra care.
I’m not an advocate of NNs (I have used them only once in my life), it’s just that I’ve heard too many myths about them.
And by the way, saying that the systematic error increases when using a NN is a bit misleading. It is larger than with standard methods under the assumption that the S(ignal)/B(ackground) ratio is more or less the same.
In conditions of large B, the main systematic errors come from the B modeling and are proportional to B/S. So, if your multivariate method is worth using, by decreasing B/S it will greatly decrease the overall systematic. The price of an increased error due to the modeling of S itself is usually over-compensated.

I agree with the advocates of “simple and clean methods” when the S/B separation is easy. (The S/B separation is easy when you can figure out some reasonable cuts by a simple eye inspection of the plots. The single top case at Tevatron is not in this category.)

15. Markk - October 12, 2007

Speaking as an EE guy, this post actually looks more like the revenge of Chemistry – it has taken over experimental particle physics! OK, the details of the techniques are different (as is to be expected at different energy levels), but the whole flavor of the thing – that using cross sections and a powerful theory of production rates you can tease apart a soup of interactions to get the ones you want – is chemistry to me. The ability to mess with the initial conditions to get a higher proportion of what you want seems very limited here, as opposed to the molecular level, of course.

Is that what something like the ILC would provide? An easier way to mess around and get better production rates of things that look interesting?

16. Tony Smith - October 12, 2007

Tommaso said “… In a cut-based analysis, you use the most powerful discriminants _in_the_bulk_. … the Monte Carlo could not be trusted to reproduce correctly the tails at high X of the signal distribution. … You avoid tails, because you know you are not modeling them well.
NNs may find a better discrimination in the tails, so they are less secure intrinsically.
… Neural networks would not mind cutting on the tails, and actually did much more, selecting nonlinear boundaries in multi-dimensional spaces constructed on well- and less well-understood distributions. The difficulty in controlling what they were doing was the main source of suspicion.
Now … the NN technology has become an off-the-shelf item, with routines available even within standard analysis packages such as root. …”.

Such problems are unfortunately not restricted to high-energy physics, but have been carried over (in large part by people who left high-energy physics to work as Quants, so this comment is not really very far off-topic) to plague our sophisticated economy.
Here are some excerpts from a web article at

“… At some point in the latter decades of the 20th century, someone sat down and thought: wouldn’t it be nice if all the money in the world was controlled by scientists rather than accountants and nice chaps from Eton? …[So]… the banks [decided to] pay good money … (£60-120k/$50-250k) for a very good physics PhD straight from university to become a quantitative analyst. …
Now, as we march headlong into the 21st century, full of sub-prime fallout, to a decent approximation, what we’re seeing is just … a better quality of screwup. …

a credit rating is not constant. Not even slightly. …
Of course, it’s simply not possible to work out all the combinations of price movements, so they throw a vast array of randomly generated values at it to simulate the next couple of weeks. Of course the same Monte Carlo approach is also used to value them in the first place … The nature of these randoms means that their handling requires care. First off, the standard VBA/C++ library functions are garbage, but still get used. Then there is the question of which distribution you should choose, which is controversial.
Many use the lognormal distribution … which resembles what we see in real life. Except that it gravely under-estimates the probability of big price movements, as Nassim Taleb has been telling people … for a long time. Ignoring this has led several banks to encounter issues that “should” happen only once in the life of the Earth, actually happening twice in the same month. …”.
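Taleb’s point about the lognormal gravely under-estimating big moves can be sketched in a few lines (the normal and Student-t models and the 5-sigma threshold below are illustrative choices of mine, not taken from the quoted article):

```python
# A normal model for (log-)returns assigns far less probability to a
# 5-sigma daily move than a fat-tailed alternative such as a Student-t.
from scipy.stats import norm, t

sigmas = 5.0
p_normal = 2 * norm.sf(sigmas)     # two-sided tail prob., normal model
p_fat = 2 * t.sf(sigmas, df=3)     # two-sided tail prob., Student-t (3 dof)

# Expected waiting time (at ~252 trading days per year) for one such move.
years_normal = 1.0 / (p_normal * 252)
years_fat = 1.0 / (p_fat * 252)
print(f"normal model: one 5-sigma day every {years_normal:,.0f} years")
print(f"fat-tailed model: one every {years_fat:.2f} years")
```

The normal model predicts such a move once in thousands of years; the fat-tailed one, several per year – which is why events that “should” happen once in the life of the Earth keep turning up in the data.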

Tony Smith

17. Fred - October 13, 2007

T. & Co.

In keeping with the spirit of the current Major League baseball playoffs here in the U.S., this post with its comments is a perfect metaphor for the time-honored phrase: ‘back-to-back home runs.’ Tony, thanks for the input and link. You’d make a damn good cleanup hitter.

And now… back to the game.

18. dorigo - October 14, 2007

Hi Andrea,

I totally agree – the comparison of systematic errors has to be made at the same point of S/B. Normally, NNs achieve a better S/B than simpler methods, and thus one should take that into account when comparing the two.


19. dorigo - October 14, 2007

Hi Markk,

well, yes. To some extent, it all boils down to better production rates of some rare processes, and better reach to some processes that would be impossible to create with lower energy.

higher L –> higher rate of rare processes
higher E –> higher rate of rare processes; reach to production of more massive states.

higher L,E –> best of both worlds.
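As a back-of-the-envelope check of the “higher L” line (the cross section and integrated luminosity below are ballpark figures for illustration, not official values):

```python
# Expected event yield: N = sigma * integrated luminosity.
# Illustrative numbers: single-top cross section of order 3 pb at the
# Tevatron, and 2 fb^-1 of integrated luminosity.
sigma_pb = 3.0                        # cross section in picobarns (ballpark)
lumi_inv_fb = 2.0                     # integrated luminosity in fb^-1
lumi_inv_pb = lumi_inv_fb * 1000.0    # 1 fb^-1 = 1000 pb^-1

n_expected = sigma_pb * lumi_inv_pb
print(f"~{n_expected:.0f} single-top events produced")
```

Doubling L doubles the produced events; raising E instead raises sigma itself, and opens channels to states too massive to produce at all at lower energy.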


20. dorigo - October 14, 2007

Hi Tony, thank you for your interesting comment.
Fred, I wish I understood baseball…

21. Paolo - October 14, 2007

Hi Tommaso and everyone, and sorry about the delay on my part… First, thanks a lot for the explanations and for the info that root now includes NNs too – I didn’t know about that. As for my personal experience, I have mostly been involved with unsupervised neural nets, like Kohonen nets, for example. I used those nets as a special vector quantization device for data compression. Afterwards, however, I wanted to understand the full spectrum of techniques in some detail, but the more I delved into it, the more I disliked all the hype and slang of the early NN literature, in favor of a sound statistical point of view. To explain what I mean, I would suggest the book:


Also, I became a big fan of the Bayesian point of view in general, and of its application to neural nets in particular:


Actually, I was under the impression that this is the only statistically sound way of using neural nets. I *hate* all those tricks and ad hoc recipes to control the model and avoid overfitting. In general, I was under the impression that many engineers (like me 😉 ) were using neural nets without really knowing the full spectrum of traditional and newer statistical models, and without knowing how to control the bias-variance trade-off in a systematic way. Now, I really hope that these days physicists are much better at those issues than engineers were a decade ago, otherwise I don’t know how much I can trust the next claims of discovery! 😉 Certainly, today we have much more computing power and can afford checks and techniques that were completely out of reach at the time – no excuses 😉 Now back to the details of your last post and also your 2005 blog… Ah, one last thing: I wonder what an Italian colleague of yours, Giulio D’Agostini, thinks about these topics – the last time I read something from him it sounded really sensible (IMHO).


Tommaso, do you know him in person?

Ciao to everyone!

22. dorigo - October 16, 2007

Hi Paolo,

sorry for my late reply here. No, I do not know Giulio in person, but I read a very good account of statistical issues he wrote. Yes, he is an excellent writer.


23. Alejandro Rivero - October 24, 2007

Hmm, now that I think about this issue… Will the next edition of the Particle Data Group review publish measurements, instead of bounds, for the lifetime of the top, based on this work and the previous one by D0? On one hand it measures |V_{tb}|; on the other, it involves not only the decay reaction but also the production, so it really measures something like |V_{tb}||V_{bt}|. The current edition of the PDG does not promote this kind of result to the summary tables, and remarks “By assuming three generation unitarity” when incorporating hep-ex/0012029.

24. dorigo - October 24, 2007

Hi, Alejandro,

I believe the measurements of the single top cross section do obtain V_tb^2 from the rate, but they rely on the standard model more than the measurement quoted in hep-ex/0012029 does. If the top could also decay into something totally different, the latter result would stay valid, while the one from single top cross-section measurements would be invalidated. The B(Wb)/B(Wq) measurement basically relies only on b-tagging, while the sigma(t) rate depends on the assumptions about the open decay channels.

I doubt the PDG will speculate… They will just quote the measurement above in their review of top physics if I am allowed to guess.


