# DAQ summary

Mikihiko Nakao (KEK, IPNS) mikihiko.nakao@kek.jp

March 19, 2009 SuperBelle meeting, KEK

#### Discussions

- Proposal from DAQ group on trigger rate limit
  - nominal L1 rate is 20 kHz, is 30 kHz maximum sufficient?
- Proposal from DAQ group on deadtime
  - 3.5% readout deadtime limit
- Hardware development and unification?
  - new proposal from G. Varner, adding DSP at FINESSE
- Still COPPER?
  - 30 kHz bottleneck
- PXD readout and event building?
  - 100 Gbps data after sparsification

# Trigger rate extrapolation



#### Simple extrapolation

|                                  | best    | worst   |
|----------------------------------|---------|---------|
| $\mathcal{L} = 1 \times 10^{35}$ | 2.5 kHz | 13 kHz  |
| $\mathcal{L} = 2 \times 10^{35}$ | 5.0 kHz | 26 kHz  |
| $\mathcal{L} = 8 \times 10^{35}$ | 20 kHz  | 100 kHz |
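The table is consistent with a purely linear scaling of the trigger rate with luminosity. A minimal sketch of that arithmetic, assuming the best and worst rates at $1 \times 10^{35}$ are the only inputs (the function name is hypothetical):

```python
def extrapolate_rate_khz(rate_at_1e35_khz, luminosity):
    """Scale an L1 rate linearly with luminosity (assumes a pure luminosity term)."""
    return rate_at_1e35_khz * luminosity / 1e35


for lumi in (1e35, 2e35, 8e35):
    best = extrapolate_rate_khz(2.5, lumi)
    worst = extrapolate_rate_khz(13.0, lumi)
    print(f"L = {lumi:.0e}: best {best:.1f} kHz, worst {worst:.0f} kHz")
```

The worst case at $8 \times 10^{35}$ comes out as 104 kHz, which the table rounds to 100 kHz.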

- Belle's trigger rate is dominated by luminosity term
  ⇒ 20 kHz will be the nominal trigger rate for SuperBelle
- Uncertainties
  - Luminosity term in Belle is not well understood
  - IR design is not fixed
- Will a maximum of 30 kHz be sufficient? Two major brick walls
  - APV25 deadtime and COPPER bandwidth
  - Can trigger logic be adjusted to keep 30 kHz?

#### Proposal to detector people:

- Be capable of handling at least a 30 kHz L1 trigger rate, with deadtime smaller than that of the SVD APV25
- Design the system to handle 60 kHz (twice) if no significant cost is added (hopefully it is the case except PXD and SVD)
  - FPGA based pipeline system should be capable
  - SVD APV25 to be reconsidered (e.g., 3 samples per trigger)
  - COPPER bottleneck has to be checked, subdivision may be needed (now assumes 1kB/ev max)
- An even higher rate, O(100 kHz), will be for the future
  (of course you can do it if it does not introduce additional cost)
  - This L1 rate implies an RFARM with O(30000) CPU cores!
  - No L1 trigger is needed anymore for DEPFET readout :-)
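The O(30000) core count appears to follow from multiplying the L1 rate by the per-event processing time. A rough sketch, assuming the O(0.3 s) per-event P4 CPU time quoted later in this talk is the relevant number (that link is an assumption, not stated on this slide):

```python
def cores_needed(l1_rate_hz, cpu_seconds_per_event):
    """CPU cores needed to keep up, if each core handles one event at a time."""
    return l1_rate_hz * cpu_seconds_per_event


print(f"{cores_needed(100e3, 0.3):.0f} cores at 100 kHz")
print(f"{cores_needed(30e3, 0.3):.0f} cores at 30 kHz")
```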

#### Deadtime issues

- Deadtime due to pipeline readout determined by:
  - Depth of the ring buffer ( $N_{\rm buf}$ )
  - Minimum time interval between two triggers ( $t_{\rm in}$ )
  - Time for data transfer to the next stage ( $t_{\rm out}$ )
- Example
  - SVD (APV25):  $N_{\rm buf}$  = 5,  $t_{\rm in}$  = 210 ns,  $t_{\rm out}$  = 26  $\mu$ s gives 3.4% deadtime
- To be capable of a 60 kHz L1 rate, $N_{\rm buf} = 6$, $t_{\rm out} = 13$ $\mu$s should be fine (but 16 $\mu$s is not OK); e.g., for 63 MHz clock logic with 60ch serialization, 14 clocks per channel are available
- In reality, there are more buffers at datalink, FINESSE and COPPER — to be studied
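How the three parameters interact can be explored with a toy Monte Carlo. The sketch below is a simplified model, not the simulation behind the numbers quoted above. It assumes Poisson-distributed triggers, a trigger lost if it arrives within $t_{\rm in}$ of the previous accepted one or if all $N_{\rm buf}$ buffers are occupied, and buffers transferred one at a time, each taking $t_{\rm out}$. All names are hypothetical.

```python
import random
from collections import deque


def simulate_deadtime(rate_hz, n_buf, t_in, t_out, n_triggers=200_000, seed=1):
    """Fraction of Poisson triggers lost in a toy ring-buffer readout model."""
    rng = random.Random(seed)
    now = 0.0
    last_accept = None
    busy_until = deque()  # completion times of buffers waiting for or in readout
    lost = 0
    for _ in range(n_triggers):
        now += rng.expovariate(rate_hz)  # time of the next random trigger
        while busy_until and busy_until[0] <= now:
            busy_until.popleft()  # buffer freed once its transfer is done
        too_close = last_accept is not None and now - last_accept < t_in
        if too_close or len(busy_until) >= n_buf:
            lost += 1
            continue
        # Buffers are transferred one at a time, each taking t_out.
        start = busy_until[-1] if busy_until else now
        busy_until.append(start + t_out)
        last_accept = now
    return lost / n_triggers


svd_like = simulate_deadtime(30e3, n_buf=5, t_in=210e-9, t_out=26e-6)
fast = simulate_deadtime(60e3, n_buf=6, t_in=210e-9, t_out=13e-6)
print(f"30 kHz, N_buf=5, t_out=26 us: {100 * svd_like:.1f}% lost")
print(f"60 kHz, N_buf=6, t_out=13 us: {100 * fast:.1f}% lost")
```

The result depends on the modelling assumptions, in particular on the additional buffers at the datalink, FINESSE and COPPER, which this sketch ignores.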

#### Proposal to trigger people:

Could you provide triggers with 200 ns (12 clocks at 63 MHz) spacing?

- Shorter than CDC drift time, shared hits by two L1 timings
- How about ECL?
- A longer spacing (~500 ns) will add 1% deadtime
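The "1% deadtime" figure can be checked with a quick estimate. The sketch assumes Poisson triggers at the 30 kHz maximum rate and counts a trigger as lost when it falls within the minimum spacing of the previous one (a simplification; the helper name is hypothetical):

```python
import math


def spacing_loss(rate_hz, t_in_s):
    """Probability that a Poisson trigger arrives within t_in of the previous one."""
    return 1.0 - math.exp(-rate_hz * t_in_s)


loss_200 = spacing_loss(30e3, 200e-9)
loss_500 = spacing_loss(30e3, 500e-9)
print(f"12 clocks at 63 MHz = {12 / 63e6 * 1e9:.0f} ns")
print(f"200 ns spacing: {100 * loss_200:.2f}% lost")
print(f"500 ns spacing: {100 * loss_500:.2f}% lost")
print(f"extra loss: {100 * (loss_500 - loss_200):.2f}%")
```

Under these assumptions the extra loss is a little under 1%, in line with the statement above.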

#### Proposal to detector people:

Could you take an event snapshot within 200 ns (12 clocks at 63 MHz) spacing for  $N_{\text{buf}} = 5$  (6) events and read out within  $t_{\text{out}} = 26 \mu \text{s}$  (13 $\mu \text{s}$ )?

- A longer spacing (~500 ns) will add 1% deadtime
- $t_{\rm out}$ speed is crucial
- A smaller $N_{\rm buf}$ will be fatal for deadtime!

## Slow pipeline readout

- Readout time is expected to be slow in ECL (and PXD)
  - Minimum buffer separation of 500 ns $\times$ 16 samplings (8 $\mu$s)
  - Two or more triggers have to share the same sample
  - For $t_{\rm in}$ < 500 ns, the same samples have to be read out; hits could be separated offline ( $\sigma(t) \sim 100$ ns for 5 MeV)



#### Unification of readout

- Unified COPPER platform
- Unified RocketIO data link ?
- Unified FINESSE board (and firmware) ??
- Unified waveform sampling and feature extraction ???



DAQ summary — M. Nakao — p.8



- BLAB3 is 8 channels, each 64k samples deep
- $\lesssim 1$ $\mu$s to read out a 32-sample hit per BLAB3

## DSP processing on Finesse

- First SuperBelle DAQ hardware with RocketIO
- Waveform sampling + feature extraction with DSP
- DSP on FINESSE: violation of unification?
- Larger processing time if processing is done after gathering; faster if it is done in the frontend (before merging)



# CDC prototype readout board



- 16ch/board
- BJT-ASB/Discriminator
- FADC: over 20MHz / 10bit
- FPGA: Virtex-5 LXT
  - TDC: 1 nsec counting
  - FADC reading
  - Control
- FPGA: Spartan3A
  - SiTCP for CDC study
- Connectors
  - RJ-45 for SiTCP
  - RJ-45 for DAQ timing signals
  - RJ-45 for DAQ data line
  - SFP for DAQ data line
  - Optical TX/RX for GDL
  - LEMO input x 3, output x1
- Shielded substrate

Board in fabrication

# CDC prototype analog daughter card



- Drift time: TDC based on 250 MHz running
- dE/dx: 8ch flash ADC
- Too much power consumption?

#### Datalink R&D

- IHEP group has fabricated a VME6U module with Virtex2pro
- Electrically tested, firmware to be developed soon



#### **COPPER** issue

- Limitation due to PCI bus
  - 1 kB $\times$ 30 kHz would be maximum (need to test again)
  - Data should be transmitted through the on-CPU GbE link (opposite to the current usage)



- Limitation due to CPU
  - Radisys EPC-6315 has a bottleneck, not usable at 30 kHz
  - New CPU to be developed, with Intel Atom

#### Datasize

- SVD (8kB) ⇒ even smaller due to better timing window? (2.5 kB?)
- CDC (5kB) ⇒ twice more channels (10kB?)
- ECL (10kB) ⇒ up to 30% occupancy? (12kB?)
- TOP (-) ⇒
- ARICH (-) ⇒
- KLM (4kB) ⇒
- PXD (-) ⇒ 400 kB, dominates the others, needs special care

- Up to average 100kB / event if no PXD  $\Rightarrow$  3 GB/s bandwidth
- Max data rate up to 1kB/ev/COPPER (30 MB/s/COPPER)
- 30 Gb Ethernet connection needed
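The bandwidth numbers follow from event size times trigger rate. A short check, assuming the 30 kHz maximum L1 rate and decimal units (1 kB = 1000 bytes); the helper name is hypothetical:

```python
def bandwidth_bytes_per_s(event_size_bytes, rate_hz):
    """Average data rate for a given event size and trigger rate."""
    return event_size_bytes * rate_hz


total = bandwidth_bytes_per_s(100e3, 30e3)      # 100 kB/event without PXD
per_copper = bandwidth_bytes_per_s(1e3, 30e3)   # 1 kB/event/COPPER limit

print(f"total: {total / 1e9:.0f} GB/s = {total * 8 / 1e9:.0f} Gbps")
print(f"per COPPER: {per_copper / 1e6:.0f} MB/s")
print(f"COPPERs needed at 1 kB/event each: at least {100e3 / 1e3:.0f}")
```

3 GB/s corresponds to 24 Gbps of payload, which is why on the order of 30 Gb Ethernet links are needed once some headroom is included.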

#### Event builder

- Technology: the EB unit is a PC with Gb Ethernet
  - Simplified design: single process, using buffers in the network driver (now: multi-process offline framework)
  - 2 Gbps out (2x GbE) consumes 8% Core2duo time
  - 1 Gbps in consumes 11% Core2duo / 20% Pen4 time
  - Max will be ~5 Gbps throughput per PC (need to test)
- Local event building
  - Combine multiple COPPERs
  - Multiple local EB units per subdetector
- Global event building
  - Parallel EB streams, each stream collects all local EB data
  - Subdetector level EB in addition
- Efficient reconfiguration: connections from downstream
  - Connection invokes the EB process through inetd
  - Multiple connections are attached to a single process
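A back-of-the-envelope reading of the Core2duo CPU numbers quoted for the EB unit. It assumes that CPU time scales linearly with throughput and that input and output costs simply add. Both are assumptions, and the slide itself marks the ~5 Gbps figure as needing a test.

```python
CPU_PER_GBPS_IN = 0.11       # 1 Gbps in consumes 11% of a Core2duo
CPU_PER_GBPS_OUT = 0.08 / 2  # 2 Gbps out consumes 8% of a Core2duo


def cpu_fraction(throughput_gbps):
    """CPU fraction used when the same throughput flows in and out (linear model)."""
    return throughput_gbps * (CPU_PER_GBPS_IN + CPU_PER_GBPS_OUT)


for gbps in (1, 2, 5):
    print(f"{gbps} Gbps through the PC: {100 * cpu_fraction(gbps):.0f}% CPU")
```

In this linear model 5 Gbps already takes about three quarters of the CPU, which is compatible with ~5 Gbps being the practical maximum per PC.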



 Large DEPFET datasize does not allow this kind of symmetric design



- Relative data bandwidth scale factor is even bigger: PXD vs everything else $\gtrsim 4 : 1$
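The 4 : 1 ratio and the 100 Gbps PXD figure can be reproduced from the event sizes quoted earlier. The sketch assumes 400 kB per event for PXD, up to 100 kB per event for all other detectors, and a 30 kHz L1 rate:

```python
L1_RATE_HZ = 30e3
PXD_EVENT_BYTES = 400e3
OTHERS_EVENT_BYTES = 100e3  # upper estimate for all other detectors combined


def gbps(event_bytes, rate_hz):
    """Data rate in Gbps for a given event size and trigger rate."""
    return event_bytes * rate_hz * 8 / 1e9


print(f"PXD:    {gbps(PXD_EVENT_BYTES, L1_RATE_HZ):.0f} Gbps")
print(f"others: {gbps(OTHERS_EVENT_BYTES, L1_RATE_HZ):.0f} Gbps")
print(f"ratio:  {PXD_EVENT_BYTES / OTHERS_EVENT_BYTES:.0f} : 1")
```

Since 100 kB is an upper estimate for the other detectors, the real ratio would be larger than 4 : 1.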

## Proposal from DEPFET group

Datasize reduction with reconstructed tracks in FPGA processor



14 boards are interconnected in ATCA shelf ( $\simeq$  crate)

# FPGA computing node solution?

DEPFET data alone is not enough to reduce synchrotron radiation hits, need track matching with O(mm) resolution

- Need CDC+SVD track finding
  - Receive CDC+SVD raw data and run tracking on Virtex4?
  - Data bandwidth is sufficient (100 Gbps)
  - Too big latency to wait for COPPER raw data?
  - Too slow even with Virtex4 FPGA?
    (It takes O(0.3s) per P4 CPU now, Virtex4 has O(20) boost factor)
- Other possibilities
  - Do everything in RFARM: needs a 5x bigger event builder
  - Do entire event building (or partial for PXD+SVD+CDC) in computing node (but still has to do tracking)
  - Prepare a huge buffer (100 Gbps $\times$ >10 s $\approx$ >125 GB) in/before computing node and wait for the RFARM tracking parameters (Itoh-san's proposal)
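The buffer size in the last option is the input bandwidth multiplied by the time spent waiting for the tracking result. A short check, using 10 s as the lower bound on the latency and, as an additional assumption, a 30 kHz L1 rate to count events in flight:

```python
def buffer_bytes(bandwidth_gbps, latency_s):
    """Bytes that accumulate while waiting latency_s at the given input bandwidth."""
    return bandwidth_gbps * 1e9 * latency_s / 8


size = buffer_bytes(100, 10)
events_in_flight = 30e3 * 10
print(f"buffer: {size / 1e9:.0f} GB")
print(f"events held at 30 kHz: {events_in_flight:.0f}")
```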

# Summary

- Proposal from DAQ group on trigger rate limit
  - 30 kHz at least, but do not limit your design
- Proposal from DAQ group on deadtime
  - 3.5% readout deadtime limit
- Hardware development and unification?
  - still in prototype stage, we will see if unification is possible
- Still COPPER?
  - 30 kHz bottleneck to be tested again, no other good alternatives
- PXD readout and event building?
  - on-going discussion issue

# End

# RocketIO datalink

- RocketIO GTP (3 Gbps), GTX (6.5 Gbps) in Xilinx Virtex5 FPGA
  - 6.5 Gbps: 32 bits at 160 MHz; need to format a larger data packet (2000 bits)
  - 8b10b encoding for safe transfer of an 8-bit payload in 10 bits; also allows formatting codes embedded in the data
  - Latency problem: >30 clocks (O(1 $\mu$s)) for en-/decoding
- Asynchronous to the RF clock
  - Has to be driven by a local oscillator
  - CDC/ECL triggers are much slower than the clock cycle
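The relation between the 32-bit word rate and the serial line rate follows from 8b10b, which sends 10 line bits for every 8 payload bits. A small check (the function names are hypothetical):

```python
def payload_gbps(word_bits, word_clock_mhz):
    """Payload rate for parallel words fed into the serializer."""
    return word_bits * word_clock_mhz / 1e3


def line_rate_gbps(word_bits, word_clock_mhz):
    """Serial line rate after 8b10b encoding (10 line bits per 8 payload bits)."""
    return payload_gbps(word_bits, word_clock_mhz) * 10 / 8


print(f"payload: {payload_gbps(32, 160):.2f} Gbps")
print(f"line:    {line_rate_gbps(32, 160):.2f} Gbps")
```

32 bits at 160 MHz gives 5.12 Gbps of payload and a 6.4 Gbps line rate, close to the 6.5 Gbps quoted for GTX.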




Trigger and clock to COPPER (mixed system clocks — already working)



#### Trigger and clock to front-end




## Deadtime-free trigger distribution

- Readout status from frontend through COPPER to TTD
  - RocketIO + serialbus latency $\sim$ 1.5 $\mu$s
  - Status can be embedded in the RocketIO datalink using the "K character" of 8b10b encoding
- Pipelined trigger handshake scheme
  - Data integrity (no data-driven FIFO full handling)
  - TTD can issue  $N_{\rm buf}$  (=5) triggers with at least  $t_{\rm in}$  (~ 200 ns) interval before seeing the response
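The handshake can be pictured as a credit scheme: the TTD counts triggers that have not yet been answered and stops at $N_{\rm buf}$, so the frontend never has to handle a full FIFO itself. A minimal sketch under that reading (the class and method names are hypothetical, and times are kept in integer nanoseconds):

```python
class TriggerThrottle:
    """Toy model of the TTD side of a pipelined trigger handshake."""

    def __init__(self, n_buf=5, t_in_ns=200):
        self.n_buf = n_buf
        self.t_in_ns = t_in_ns
        self.outstanding = 0        # triggers issued but not yet answered
        self.last_issue_ns = None

    def try_issue(self, now_ns):
        """Return True if a trigger may be issued at time now_ns."""
        if self.outstanding >= self.n_buf:
            return False            # all frontend buffers may be in use
        if self.last_issue_ns is not None and now_ns - self.last_issue_ns < self.t_in_ns:
            return False            # closer than the minimum spacing
        self.outstanding += 1
        self.last_issue_ns = now_ns
        return True

    def on_response(self):
        """Frontend reports that one buffer has been read out."""
        if self.outstanding > 0:
            self.outstanding -= 1


ttd = TriggerThrottle()
burst = [ttd.try_issue(k * 200) for k in range(6)]
print(burst)  # five accepted, the sixth held back until a response arrives
```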



## Continuous injection deadtime

- Injection noise
  - Short component all over the ring
  - Long component only in the injected bunch every  $10 \mu s$  (two components?)
- Injection veto for the L1 trigger
  - 150 $\mu$s veto for short component
  - 10% times 2.5 ms for long component
- Injection effects on PXD?
  - Takes 10 $\mu$s to read out
  - Always affected by the long component




#### Radiation hardness / magnetic field

- Need to put FPGAs in the radiation area
  - SEU (single event upset) can affect the FPGA configuration memory
  - Crucial logic (buffer pointer, event counter, etc) has to be 3-fold redundant (partially damaged data is not a problem)
  - Latest FPGA (Virtex5) has ECC of config memory
- Optical transmitter may also get damaged (experiences from BESIII experiment — Z.-A. Liu)
- Series of radiation tests are scheduled
  - '09 March with neutron source, '09 April in KEKB tunnel...
- Magnetic field could be also an issue (CDC readout)
  - RocketIO power supplies require ferrite filters which will stop working in the magnetic field
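The 3-fold redundancy for crucial logic is commonly resolved with a bitwise majority vote over three copies of the register. The sketch below illustrates that generic technique in software; it is not the firmware used in any of the boards discussed here.

```python
def majority3(a, b, c):
    """Bitwise majority of three copies: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


event_counter = 1234
upset_copy = event_counter ^ 0x10  # one copy with a single flipped bit (SEU)
print(majority3(event_counter, event_counter, upset_copy))
```

A single upset in any one copy is outvoted by the other two, so the counter value survives.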

# FAQ — why not doing this and that?

- What if trigger rate exceeds 30 kHz? Isn't it too optimistic?
  - It's not as optimistic as reaching  $8 \times 10^{35}$ :-)
  - APV25 may be able to operate in a mode with fewer samples.
    (Other readout system should preferably have a better margin)
  - COPPER has to be replaced with something else (e.g., making datalink compatible with Gb Ethernet and receiving with a huge PC cluster)
  - We also always have an option to tighten the trigger
- Why do you use COPPER from the beginning? Why not just a PC?
  - COPPER is more compact than a 1U rackmount PC (17 COPPER boards in 9U space)
  - FINESSE is easier to develop than a PCI card
  - We already have 200+ COPPER boards and software

# Physics and background rates

| physics process                        | cross-section (nb) | rate (Hz) at $10^{34}$ | rate (Hz) at $8 \times 10^{35}$ |
|----------------------------------------|---------|--------------|-----------------------|
| $\Upsilon(4S)$                         | 1.2     | 12           | 960                   |
| $q\overline{q}$                        | 2.8     | 28           | 2200                  |
| $\tau^+\tau^-$                         | 0.8     | 8            | 640                   |
| $\mu^+\mu^-$                           | 0.8     | 8            | 640                   |
| Bhabha (1/100 prescale)                | 44      | 4.4          | 350                   |
| $\gamma\gamma$ (1/100 prescale)        | 2.4     | 0.24         | 19                    |
| two-photon ( $p_T > 0.3 \text{ GeV}$ ) | 15      | 35           | 2800                  |
| total                                  | 67      | ~100         | ~8000                 |
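Most rows of the table follow from rate = cross-section $\times$ luminosity, divided by the prescale where one is applied. A short check, assuming cross-sections in nb (1 nb = $10^{-33}$ cm$^2$); the two-photon row does not follow this simple product in the table and is left out:

```python
NB_TO_CM2 = 1e-33

# (cross-section in nb, prescale factor)
PROCESSES = {
    "Upsilon(4S)": (1.2, 1),
    "q qbar": (2.8, 1),
    "tau+ tau-": (0.8, 1),
    "mu+ mu-": (0.8, 1),
    "Bhabha": (44, 100),
    "gamma gamma": (2.4, 100),
}


def rate_hz(xsec_nb, luminosity, prescale=1):
    """Event rate in Hz for a cross-section in nb and luminosity in cm^-2 s^-1."""
    return xsec_nb * NB_TO_CM2 * luminosity / prescale


for name, (xsec, prescale) in PROCESSES.items():
    low = rate_hz(xsec, 1e34, prescale)
    high = rate_hz(xsec, 8e35, prescale)
    print(f"{name:12s} {low:6.2f} Hz at 1e34, {high:7.1f} Hz at 8e35")
```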

Backgrounds (at Belle, $\mathcal{L} \sim 1$–$1.5 \times 10^{34}$ and $I_{\rm HER} + I_{\rm LER} \sim 3$ A, rate ~400 Hz)

- Luminosity term (~ 300 Hz) dominant, 2–2.5 times physics rate (Radiative Bhabha hitting endcap? BaBar's problem, not Belle's)
- Constant term is about 100 Hz in total

#### Frontend electronics

|             | sensor + analog       | digitization                |
|-------------|-----------------------|-----------------------------|
| PXD         | DEPFET pixel          | DCD+DHP ASICs(DEPFET group) |
| SVD         | strip (+ APV25)       | flash ADC + FPGA            |
| CDC         | sense wires           | flash ADC + FPGA TDC(?)     |
|             |                       | TARGET ASIC(U-Hawaii) (?)   |
| TOP         | MCP-PMT(?) + CFD      | HP TDC (?)                  |
|             |                       | BLAB2 ASIC(UHawaii) (?)     |
| ARICH       | HAPD(?)               | SA ASIC(JAXA) + FPGA        |
| ECL(barrel) | CsI(Tl) + photo-di.   | Flash ADC(2MHz) + FPGA      |
| ECL(endcap) | CsI(pure) + photo-di. | Flash ADC(42MHz) + FPGA     |
| KLM         | Sci. + Si-PM          | TARGET ASIC(U-Hawaii) (???) |

- Mostly driven by existing technologies (constraints to DAQ): ASIC developments, commercial flash ADC, heavy use of FPGA
- Unification to some extent is under discussion

# Four parameters ( $f_{RCLK}$ , $N_{buf}$ , $t_{in}$ , $t_{out}$ ) completely determine the deadtime characteristics

- SVD with APV25 readout at 30 kHz
  - 31.8 MHz readout clock (RF/16)  $\Rightarrow$  3.4% deadtime
  - 42.3 MHz readout clock (RF/12)  $\Rightarrow$  0.9% deadtime but the L1 trigger latency has to be 3  $\mu$ s (unacceptable!)
  - No other immediate alternative than using APV25
- FPGA-based outer detectors should work better

APV Trigger Simulation (2), plot legend:



- Min Lost: trigger restriction (1) = too little distance
- FIFO Lost: trigger restriction (2) = too many pending readouts

#### COPPER platform

- Modular structure: 4x FINESSE daughter cards for readout, 1x PrPMC CPU, 1x trigger receiver
- ~ 200 COPPER2 boards have been used in Belle in place of FASTBUS TDC with TDC-FINESSE (CDC, ACC, TRG, EFC (, KLM))



- PC or COPPER? PCIe would be too advanced, but there is no reason to start with PCI, and COPPER has a bigger channel density
- COPPER3 board: revised in 2008 for next 10-year lifetime

Heart of SuperBelle DAQ

(also constraint to the design)

## Trigger Timing Distribution (proposal)



- Limited number of fast signals through LVDS (or PECL/CML): 4-pair CAT6 cable with RCLK (readout clock), Trigger, Revolution or other clock, plus 1 reserve line
- Requires one RJ-45 connector on the front-end board (already implemented in the CDC prototype board)
- Plan to make a VME6U module (second version of TT-IO)
- More control through slower RocketIO link (tag, reset, etc)

#### Virtex5LXT

- Is Virtex5LXT a good choice?
  - Just RocketIO + FPGA logic, not many unnecessary things
  - 3rd generation RocketIO, with nice features like equalization
- Is Virtex5LXT too expensive?
  - XC5V30 costs about 350 USD per chip, but it costs much more if we need something larger
  - So far I'm not willing to adopt additional DSP even if it's much cheaper — it requires doubled learning cost.
- No other choice?
  - I don't think it's a good idea to go back to Virtex2pro
  - Xilinx has announced Spartan6, should be a cheaper alternative but no details are given yet