2.11 Event Reconstruction

2.11.1 Offline Processing

The raw data tapes written by the Data Acquisition system go through a number of standard processing steps before being used for individual analyses, such as the one described in chapter 4. Their task is to infer details of the products of the $\mathrm{e^+ e^-}$ interaction from the electronic measurements made by the detector. Thus, for example, the raw data, consisting of digitized drift times and charge measurements on specified channels in the tracking detectors, are converted into trajectories through DELPHI from which the creation point, direction, and momentum of the charged particles can be inferred.

The majority of this generic analysis is performed by the DELANA program [54], which is based on the TANAGRA [55] data model and access routines. TANAGRA provides a common format for storing the results of each stage of the processing, as described below.

  1. Calibrations are applied to the digitizations from the raw data and saved in a semi-standard form (TD TANAGRA banks).

  2. Where possible, pattern recognition, local to each subdetector, is performed. The sophistication of the output track elements (TE banks) depends on the subdetector concerned (e.g. individual $R\phi$ or $Rz$ measurements from the VD, track segments from the ID, TPC, and OD, and energy clusters from the calorimeters).

  3. TPC track elements are extrapolated to the other detectors and an initial association is made with their track elements (not the VD at this stage). Additional searches are made for ID-OD (in the TPC cracks), FCA-FCB, and FCB-beamspot (for very small angles) track element associations in order to recover tracks that are not seen (or well measured) in the TPC. At this stage, ambiguous associations are maintained as separate track strings (TS banks).

  4. Each track string is then passed through the full track fit (see section 2.11.2). Ambiguities can now be removed using the fit $\chi^{2}$, leaving a self-consistent set of tracks (TK banks).

  5. These tracks are then extrapolated to each detector and used to guide a second stage of local pattern recognition and, where necessary, the track is refitted. It is at this stage that the VD hits are associated and included in the track fit (done here in order to take advantage of the optimum track determination in the VD association). A final search for missed tracks is performed on unassociated TEs.

  6. Calorimeter energy clusters, muon chamber hits (see section 2.11.4), and RICH information are associated with the fitted tracks. Unassociated calorimeter energy deposits are used to form neutral `tracks'.

  7. A vertex search produces track bundles (TB banks), and a vertex fit is performed to determine vertex positions (primary and possibly decay vertices; TV banks). At present this information is only used for diagnostic purposes, since the cuts used in vertex fitting are highly analysis-dependent.
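
The ambiguity resolution of step 4 can be illustrated with a minimal sketch. It assumes a greedy strategy (keep the candidate with the best reduced $\chi^{2}$, then discard any candidate sharing a track element with one already kept); the text does not specify how DELANA actually resolves ambiguities, and the `TrackString` class and the element labels are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackString:
    """A fitted candidate track (hypothetical stand-in for a TS bank)."""
    name: str
    elements: frozenset  # identifiers of the track elements (TEs) used
    chi2: float          # chi-squared of the full track fit
    ndf: int             # number of degrees of freedom of that fit


def resolve_ambiguities(candidates):
    """Greedily keep the best-fitting candidates that share no track element."""
    accepted = []
    used = set()
    # Lowest reduced chi-squared is tried first.
    for cand in sorted(candidates, key=lambda c: c.chi2 / c.ndf):
        if used.isdisjoint(cand.elements):
            accepted.append(cand)
            used.update(cand.elements)
    return accepted


# Candidates A and B compete for the same TPC and OD elements.
candidates = [
    TrackString("A", frozenset({"TPC1", "ID1", "OD1"}), chi2=12.0, ndf=10),
    TrackString("B", frozenset({"TPC1", "ID2", "OD1"}), chi2=45.0, ndf=10),
    TrackString("C", frozenset({"TPC2", "ID2"}), chi2=8.0, ndf=7),
]
tracks = resolve_ambiguities(candidates)
print([t.name for t in tracks])
```

In this toy input, candidate B is dropped because it shares elements with better-fitting candidates, leaving a set of tracks in which each track element is used at most once.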

Constants and parameters used by DELANA are read [56] from the CARGO [129] database based upon the time the event being analysed was captured by the DAS. Section 3.7 describes CARGO and the information written to it by the online system. CARGO is also used by DELANA for many of its other parameters, such as the detector geometry. The statuses of the various detector partitions are combined with the DELANA processing status to provide a set of flags, which are written out as run selection files for use in physics analyses.

Following the DELANA reconstruction, two types of tagging algorithms are applied. Both have very loose cuts in order not to reject events that might be accepted after post-DELANA corrections are applied. The `DELANA tags' [57] produce a broad categorization of each event as, for example, hadronic $\mathrm{Z^0}$, leptonic $\mathrm{Z^0}$, Bhabha seen in SAT/STIC, etc. The `Physics Teams tags' [58] select events of interest to specific classes of analysis. Events selected by either of these groups of tags are written to the main (`DST OR') output stream.

A full DELANA processing of a year's data can take many weeks. In order to allow corrections and refined calibrations (e.g. Vertex Detector alignment) to be updated at frequent intervals, a second stage of processing, DSTFIX [59], is performed (and can be reperformed) on the DST output of DELANA. DSTFIX can make changes at the TE level and refit tracks. It is also used to adjust the efficiency, cleanliness, and precision of simulated data to better match the quality of the real data. A number of particle identification algorithms, considerably more sophisticated than those used in DELANA, are run on the output of DSTFIX: charged hadrons (p, K$^{\pm}$, and $\pi^{\pm}$) are identified with the RICH [60] and TPC $\mathrm{d}E/\mathrm{d}x$ [61], electrons [62] with the electromagnetic calorimeters, and muons [75,76] with the muon chambers. A primary vertex fit and a b-tagging algorithm [63] are also run to identify $\mathrm{Z^0} \rightarrow \mathrm{b\bar{b}}$ events by the increased impact parameters of tracks from B hadron decays.

The analysis chain interprets and transforms the data between a number of forms, all of which are based on the ZEBRA [64] memory management system (part of the CERN library). ZEBRA provides dynamic data structuring within Fortran 77, which lacks the pointer or reference data types required for this data model. It also provides methods for file input/output for these data structures in a format that allows the data to be transported between computers with different numeric representations.

Raw data [65]
is output by the DAS (or, for simulated events, by DELSIM) and read in by DELANA.

TANAGRA [55]
provides a data model to ensure a clean and safe interface between DELANA routines, since these are written and maintained by many different people. TANAGRA is also based on ZEBRA, so its data can be output to a file, and in fact this was the primary output of DELANA in DELPHI's first two years. However, since then, the volume of data has made it impractical to save the detailed TANAGRA files for anything other than debugging purposes.

DST [66]
is now the primary output of DELANA. It includes reconstructed track parameters, as well as the TE track elements, allowing the tracks to be refit post-DELANA. The average full DST hadronic event size is 60 kilobytes.

LongDST [67]
is written by DSTFIX and has the results of the particle identification and b-tagging algorithms included.

ShortDST [68]
is also written by DSTFIX, but includes less reconstruction information (e.g. it does not include the TEs), reducing the average hadronic event size to 20 kilobytes.

MiniDST [69]
is written by the PHDST program. It contains a subset of information from the shortDST, stored in a compressed form, allowing it to be kept on disk for rapid analysis. The average miniDST hadronic event size is 6 kilobytes.

All the offline code is written in Fortran 77 [116]. Due to the large number of collaborating institutes within DELPHI as well as its long timescale, the simulation and analysis tools have been required to work on a diverse collection of computing architectures and operating systems: HP-UX and DEC Alpha Unix, VAX and Alpha VMS, and IBM VM/CMS systems being only the most common. The PATCHY system [70] was used both to allow for system-specific code and to provide a rudimentary form of revision control.

2.11.2 Track Reconstruction

Two stages of track fitting are performed in DELANA (once on the possibly ambiguous track strings, and then again following ambiguity resolution) as well as an additional track fit using the corrections and improved VD alignment available in DSTFIX. Essentially the same algorithm is used in each case, with TEs as input. In the case of the ID, TPC, and OD these are themselves the result of local track fits.

The effects of the particle's passage through the material between the points where measurements are made must be taken into account. Small-angle multiple scattering, which introduces additional uncertainties in the angle of extrapolated track elements, is described by a Gaussian distribution with RMS plane-projected scattering angle of

$\displaystyle \sigma_{\theta} = \frac{0.0136\ \mathrm{GeV}/c}{p} \sqrt{x/X_0} \left[ 1 + 0.038 \ln(x/X_0) \right]$ (2.7)

(a good approximation for $\gamma \gg 1$), where $p$ is the momentum and $x/X_0$ is the fraction of a radiation length traversed. Scattering at larger angles is not treated at this stage. The energy loss due to ionization is also taken into account in the fit, using the momentum dependence shown in figure 2.13.

Figure 2.13: Assumed variation of energy loss ($\mathrm{d}E/\mathrm{d}x$) with momentum (plotted as $p/m$), relative to the minimum ionization. $(\mathrm{d}E/\mathrm{d}x)_{\mathrm{min}}$ depends on the materials traversed; composite values between 0.02 and 0.2 GeV per radiation length are typical.
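
As a numerical illustration of equation 2.7, the following sketch evaluates the RMS scattering angle for a few momenta. The function name and the example thickness of 1% of a radiation length are assumptions chosen for illustration; they are not DELPHI material values.

```python
import math


def scattering_angle_rms(p_gev, x_over_x0):
    """RMS plane-projected multiple-scattering angle in radians (equation 2.7).

    p_gev      -- momentum in GeV/c
    x_over_x0  -- thickness traversed, in radiation lengths
    """
    log_term = 1.0 + 0.038 * math.log(x_over_x0)
    return (0.0136 / p_gev) * math.sqrt(x_over_x0) * log_term


# Illustrative thickness only: 1% of a radiation length.
for p in (1.0, 10.0, 45.6):
    sigma_mrad = 1e3 * scattering_angle_rms(p, 0.01)
    print(f"p = {p:5.1f} GeV/c  sigma_theta = {sigma_mrad:.3f} mrad")
```

The angle scales as $1/p$, so the contribution of multiple scattering to the track errors is most important at low momentum.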

The track fit is based on a Kalman filter [71], which allows `outliers' (TEs whose presence in the track fit significantly degrades the fit $\chi^{2}$) to be iteratively removed as part of the fit process.
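
A full Kalman filter is beyond the scope of a short example, but the idea of iterative outlier removal can be sketched with a much simpler stand-in: a one-parameter weighted least-squares fit (a weighted mean) that is repeated after dropping the measurement with the largest $\chi^{2}$ contribution. The function names, the $\chi^{2}$ threshold, and the minimum number of points are all assumptions made for illustration and are not taken from the DELPHI code.

```python
def weighted_mean_fit(values, sigmas):
    """Weighted least-squares estimate of one parameter and per-point chi2 terms."""
    weights = [1.0 / s ** 2 for s in sigmas]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    pulls2 = [((v - mean) / s) ** 2 for v, s in zip(values, sigmas)]
    return mean, pulls2


def fit_with_outlier_removal(values, sigmas, max_chi2_per_ndf=3.0, min_points=3):
    """Refit after dropping the worst measurement until the fit is acceptable.

    Returns the fitted value and the indices of the measurements kept.
    The threshold and minimum point count are illustrative assumptions.
    """
    kept = list(range(len(values)))
    while True:
        mean, pulls2 = weighted_mean_fit([values[i] for i in kept],
                                         [sigmas[i] for i in kept])
        # Stop if too few points remain or the fit quality is acceptable.
        if len(kept) <= min_points or sum(pulls2) / (len(kept) - 1) <= max_chi2_per_ndf:
            return mean, kept
        # Remove the measurement with the largest chi-squared contribution.
        worst = max(range(len(kept)), key=lambda k: pulls2[k])
        del kept[worst]


measurements = [1.02, 0.98, 1.01, 0.99, 1.60]  # last value is a deliberate outlier
errors = [0.02] * 5
value, kept = fit_with_outlier_removal(measurements, errors)
print(value, kept)
```

In the track fit the same principle applies to whole track elements rather than single numbers: an element is dropped when its presence degrades the overall $\chi^{2}$ significantly.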

The spatial precision of the overall tracking is largely determined by the vertex detector, so the results were summarized in section 2.3.1 (see in particular figure 2.8). The momentum precision at 45.6 $\mathrm{GeV}/c$ can be measured using $\mathrm{Z^0} \rightarrow \mu^+ \mu^-$ events (radiative $\mathrm{Z^0}$ decays are removed by requiring an acollinearity of less than 0.15$^\circ$). The resulting inverse-momentum distributions are shown in figure 2.14.

Figure 2.14: Inverse momentum distribution of 45.6 $\mathrm{GeV}/c$ ($1/p = 0.0219\ (\mathrm{GeV}/c)^{-1}$) muons seen in (a) the VD, ID, TPC, and OD; or (b) the VD and FCB.

RMS widths of $\sigma_p/p = 3\%$ in the barrel and $6\%$ in the endcaps are found. At lower momenta, simulated data are used, as shown in figure 2.15.

Figure 2.15: Track momentum precisions estimated by comparing simulated and reconstructed parameters. The errors in momentum (a,b), azimuthal angle $\phi$ (c,d), and polar angle $\theta$ (e,f) are shown. The variation of each parameter's precision with respect to $\theta$ is shown on the left; with respect to momentum on the right.
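
The relative widths quoted for the 45.6 $\mathrm{GeV}/c$ muons can be related to the width of an inverse-momentum distribution by first-order error propagation, $\sigma_{1/p} = \sigma_p / p^2 = (\sigma_p/p)/p$. The short sketch below evaluates this for the barrel and endcap values; the function name is an assumption introduced for the example.

```python
def inverse_momentum_resolution(p_gev, relative_resolution):
    """Width of the 1/p distribution, in (GeV/c)^-1, for a given sigma_p/p."""
    return relative_resolution / p_gev


P_BEAM = 45.6  # GeV/c, muon momentum in Z0 -> mu+ mu- events

print(f"1/p = {1.0 / P_BEAM:.4f} (GeV/c)^-1")
for region, rel in (("barrel", 0.03), ("endcap", 0.06)):
    sigma = inverse_momentum_resolution(P_BEAM, rel)
    print(f"{region}: sigma(1/p) = {sigma:.2e} (GeV/c)^-1")
```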

2.11.3 Beamspot

The primary vertex ($\mathrm{Z^0}$ decay position) is within the region encompassed by the crossing of the electron and positron beams. Its RMS size is typically 200 $\mu\mathrm{m}$ in $x$, negligible in $y$ ($<3.9$ $\mu\mathrm{m}$ in 1994), and $\sim 1$ cm in $z$. The position and size change slowly throughout the LEP fill as well as between fills.

The beamspot position and size [72] (at least in the $x$-$y$ plane) can be used directly as a measure of the primary vertex position and error, as a constraint on an event-by-event primary vertex fit, or to identify tracks coming from a secondary vertex.

A primary vertex fit is performed for each hadronic event passing cuts designed to reduce the contribution from poorly measured tracks and secondary vertices. The selected events are divided into samples corresponding to each tape written by the DAS ($\sim 200$ hadronic events in 1994). For each sample a fit is performed, assuming independent Gaussian beam profiles in all three coordinates, and the primary vertex position and size in $x$ and $z$ are determined. The beam size in $y$ is too small to measure for this sample size. Since the beamspot position determination is critically dependent on (and relative to) the detector alignment (particularly that of the VD), it is performed separately for each DST and shortDST production. The mean beamspot position is typically measured to an accuracy of $(\sigma_x, \sigma_y, \sigma_z) = (15, 5, 1500)\ \mu\mathrm{m}$ for each sample.
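
As a rough illustration of how the sample size limits these accuracies, the sketch below estimates the beamspot mean, its statistical error, and the beam width in one coordinate from a list of fitted vertex positions. It uses simple moments rather than the fit described above, and the function name and the constant per-vertex resolution are assumptions made for the example.

```python
import math


def beamspot_estimate(positions, vertex_error):
    """Estimate beamspot mean, error on the mean, and width in one coordinate.

    positions     -- fitted primary vertex coordinates for one sample
    vertex_error  -- typical single-vertex resolution (assumed constant here)
    """
    n = len(positions)
    mean = sum(positions) / n
    variance = sum((x - mean) ** 2 for x in positions) / (n - 1)
    mean_error = math.sqrt(variance / n)
    # Observed spread is the beam size and vertex resolution added in quadrature.
    beam_size = math.sqrt(max(variance - vertex_error ** 2, 0.0))
    return mean, mean_error, beam_size


# Statistical precision of the mean x position for a 200-event sample
# with a 200 micron wide beam, ignoring the vertex resolution.
print(200.0 / math.sqrt(200))
```

For a 200 $\mu\mathrm{m}$ wide beam and 200 events, the statistical error on the mean $x$ position is about 14 $\mu\mathrm{m}$, which is consistent with the 15 $\mu\mathrm{m}$ quoted above. When the vertex resolution exceeds the observed spread, the estimated beam size is clamped to zero, mirroring the statement that the size in $y$ cannot be measured with this sample size.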

2.11.4 Muon Identification

Similar algorithms are used for muon identification [73] in DELANA (the EMMASS package [74]) and afterwards (the MUCFIX [75] and MUFLAG [76] packages). The difference is that EMMASS employs extremely loose selection criteria (useful for producing dimuon ($\mathrm{Z^0} \rightarrow \mu^+ \mu^-$) samples for checking and alignment), whereas MUCFIX allows for variable cuts, specifically those defined in MUFLAG, which are optimized for identifying muons in hadronic jets. MUCFIX can also take advantage of the improved tracking available after DSTFIX and, as is the case with all post-DELANA reconstruction, allows corrections to be implemented more rapidly.

Hadron contamination in the set of tracks selected as muons in jets arises from several sources:

About 0.3% of hadrons pass through the calorimeters without interacting.

Hadrons that pass down the cracks between HAC sectors have a lower chance of interaction.

Punch-through:
The dominant source of background is secondary particles produced by hadronic interactions, particularly in the outer layers of iron, which themselves penetrate to the muon chambers.

Decays in flight:
Muons produced from pion or kaon decay are an additional source of background to muons produced closer to the production vertex.

Since the particles seen in the MUC from punch-through and decays in flight are not the same as the originating hadron, they will tend to have a slightly different trajectory. Their contribution can thus be reduced by requiring that the extrapolated trajectory of the supposed muon match the MUC hits within the accuracy expected from multiple scattering and measurement errors.

Contamination can be further reduced by requiring hits in the muon chambers outside the iron, since particles reaching these chambers must have passed through an additional interaction length of iron.

EMMASS/MUCFIX perform a fit for the particle's trajectory at the MUC, using the MUC hit coordinates (and errors) and the extrapolated track position and direction (and full error matrix, including multiple scattering). The $\chi^{2}$ values from this fit are used to select the best association, drop bad hits, and reduce the contamination from punch-throughs and $\pi$/K decays.
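
The role of the $\chi^{2}$ in selecting the best association can be illustrated with a simplified sketch. It matches only a two-dimensional extrapolated position to candidate hits, whereas the packages described above also use the track direction and perform a full fit; the function names and the two-coordinate layout are assumptions made for the example.

```python
def match_chi2(track_xy, track_cov, hit_xy, hit_cov):
    """Chi-squared of the 2D residual between an extrapolated track and a hit.

    Covariances are symmetric 2x2 matrices given as ((a, b), (b, d)). The track
    covariance is assumed to include the multiple-scattering contribution.
    """
    dx = hit_xy[0] - track_xy[0]
    dy = hit_xy[1] - track_xy[1]
    # The residual covariance is the sum of the track and hit covariances.
    a = track_cov[0][0] + hit_cov[0][0]
    b = track_cov[0][1] + hit_cov[0][1]
    d = track_cov[1][1] + hit_cov[1][1]
    det = a * d - b * b
    # r^T C^-1 r written out for a symmetric 2x2 matrix C.
    return (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


def best_hit(track_xy, track_cov, hits):
    """Return (index, chi2) of the hit that best matches the extrapolation."""
    chi2s = [match_chi2(track_xy, track_cov, xy, cov) for xy, cov in hits]
    index = min(range(len(hits)), key=lambda i: chi2s[i])
    return index, chi2s[index]


unit = ((1.0, 0.0), (0.0, 1.0))
hits = [((1.0, 2.0), unit), ((3.0, 0.0), unit)]
print(best_hit((0.0, 0.0), ((1.0, 0.0), (0.0, 4.0)), hits))
```

A cut on the returned $\chi^{2}$ then rejects candidates whose hits lie further from the extrapolation than multiple scattering and measurement errors would allow.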

MUFLAG provides four predefined selections, the results of which are written to the ShortDST. The Very Loose tag (intended for $\mathrm{Z^0} \rightarrow \mu^+ \mu^-$ studies) has no cut on the $\chi^{2}$, so the best association made (after the very loose preselection and bad hit rejection made by EMMASS) is used. The other three tags (intended for studies of muons in hadronic jets) use a tighter bad-hit cut and progressively tighter $\chi^{2}$ cuts. The Loose tag aims to maximize efficiency. As well as tighter $\chi^{2}$ cuts, the Standard and Tight tags require at least one hit in the chambers outside the iron.
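
The structure of these four selections can be sketched as a table of cuts. The numerical $\chi^{2}$ thresholds below are placeholders invented for illustration, since the actual MUFLAG values are not given in the text. The sketch also assumes that the selections are nested and ignores the difference in bad-hit rejection between the Very Loose tag and the other three.

```python
# Placeholder thresholds: the real MUFLAG cut values are not given in the text.
TAGS = (
    # name,        max chi2 per d.o.f., hit outside the iron required
    ("Tight",      2.0,                 True),
    ("Standard",   4.0,                 True),
    ("Loose",      8.0,                 False),
    ("Very Loose", None,                False),   # no chi2 cut at all
)


def tightest_tag(chi2_per_ndf, has_outer_hit):
    """Return the name of the tightest tag satisfied by a muon candidate."""
    for name, max_chi2, need_outer in TAGS:
        if need_outer and not has_outer_hit:
            continue
        if max_chi2 is not None and chi2_per_ndf > max_chi2:
            continue
        return name
    return None  # not reached while the last tag has no cuts


print(tightest_tag(3.0, True))
print(tightest_tag(1.0, False))
```

A candidate with a good $\chi^{2}$ but no hit outside the iron can reach at most the Loose tag in this scheme, reflecting the extra requirement placed on the Standard and Tight tags.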

The muon identification efficiency for the standard tag is shown in figure 2.16. The overall efficiencies and misidentification probabilities for all four tags are summarized in table 2.4.

Figure 2.16: Identification efficiency in 1994 of standard muon tag as a function of momentum (above) and polar angle (below, where the solid line shows the efficiencies from simulation).

Table 2.4: Muon identification efficiencies and misidentification probabilities for the four MUFLAG tags on 1994 data. The data efficiencies and misidentification probabilities are determined using $\tau^{-} \rightarrow \mu^{-} \bar{\nu}_{\mu} \nu_{\tau}$ and $\tau^{-} \rightarrow \pi^{-} \pi^{-} \pi^{+} \nu_{\tau}$ (and charge-conjugate) events respectively. These can be compared with the equivalent numbers from the simulation (MC).
[Table body truncated in the source. The surviving fragment of the last row reads: ...4, 1.8, 0.2, 0.7, 0.1, 0.36, 0.10.]

In both cases, particles only contribute if their momentum is greater than 3 $\mathrm{GeV}/c$ and their polar angle is within $20^\circ < \theta < 42^\circ$ (MUF) or $52^\circ < \theta < 88.5^\circ$ (MUB). These selections exclude regions of poor track reconstruction ($\theta < 20^\circ$ and $88.5^\circ < \theta < 91.5^\circ$) or limited coverage (also $42^\circ < \theta < 52^\circ$). The MUS (only installed during 1994) is not used here.
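
These acceptance cuts can be written as a small selection function. The sketch assumes that the backward hemisphere mirrors the forward one (so that $\theta$ is folded to $180^\circ - \theta$), which the exclusion of $88.5^\circ < \theta < 91.5^\circ$ suggests but the text does not state explicitly; the function name is also an assumption.

```python
def muon_acceptance(p_gev, theta_deg, min_p=3.0):
    """Return 'MUF', 'MUB', or None for a track of given momentum and polar angle."""
    if p_gev <= min_p:
        return None
    # Assumption: the backward hemisphere mirrors the forward one.
    theta = min(theta_deg, 180.0 - theta_deg)
    if 20.0 < theta < 42.0:
        return "MUF"   # forward muon chambers
    if 52.0 < theta < 88.5:
        return "MUB"   # barrel muon chambers
    return None        # poor tracking or limited coverage


for p, theta in ((10.0, 30.0), (10.0, 70.0), (10.0, 47.0), (2.0, 70.0), (10.0, 90.0)):
    print(p, theta, muon_acceptance(p, theta))
```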

Tim Adye 2002-11-06