

### 2-D PT module concept for the sLHC CMS tracker

Special thanks to M. Pesaresi, M. Raymond, A. Rose, A. Marchioro, the CMS Track-trigger task force and the CMS Tracker upgrade simulation team

**Geoff Hall** 

#### Outline

- Basic idea of, and motive for, PT module
  - already presented by M Pesaresi
- Describe one possible way of building such a module
  - not fully worked out
  - ideas are still evolving and we do not know the best route
- The requirements are not fully defined
  - originally the long term upgrade aimed to operate at 10<sup>35</sup> cm<sup>-2</sup>.s<sup>-1</sup>
  - it's obvious that this is a long way in the future and we should expect the requirements might evolve with the LHC physics discoveries and the operation of the LHC machine, which still faces many challenges
  - nevertheless, the LHC physics programme and the longevity of the accelerator are of utmost importance
  - hence R&D on new module concepts will prove invaluable
    - crucial to advance the means to construct advanced module types
    - must understand better the technical drivers to assemble these modules since significant investment is required

### **Basic module requirements**

• Compare binary pattern of hit pixels on upper and lower sensors



High p<sub>T</sub> tracks can be identified if hits lie within a search window in R-φ (rows) in second layer



Sensor separation and search window determine the $p_T$ cut
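
As a rough illustration of how separation and window set the cut, the sketch below uses the standard circle geometry for a track from the beam line: radius of curvature $\rho = p_T / (0.3 B)$ and crossing angle $\sin\alpha = R / (2\rho)$, giving an R-φ displacement $d \tan\alpha$ across a radial gap $d$. The 3.8 T field, the default radius and spacing, and the neglect of Lorentz drift and module tilt are assumptions made for illustration; the function names are hypothetical.

```python
import math

def window_displacement_um(pt_gev, radius_m=0.25, separation_mm=2.0, b_tesla=3.8):
    """R-phi displacement (in µm) between the two sensors for a track of given pT.

    Assumes a track from the beam line, a uniform field and a flat sensor pair.
    """
    rho_m = pt_gev / (0.3 * b_tesla)        # radius of curvature in metres
    sin_alpha = radius_m / (2.0 * rho_m)    # angle to the radial direction at radius R
    if sin_alpha >= 1.0:
        raise ValueError("track does not reach this radius")
    return separation_mm * 1e3 * math.tan(math.asin(sin_alpha))

def pt_cut_gev(window_um, radius_m=0.25, separation_mm=2.0, b_tesla=3.8):
    """pT at which the displacement equals window_um (exact inverse of the above)."""
    tan_alpha = window_um / (separation_mm * 1e3)
    sin_alpha = tan_alpha / math.sqrt(1.0 + tan_alpha ** 2)
    return 0.3 * b_tesla * radius_m / (2.0 * sin_alpha)

if __name__ == "__main__":
    for window in (100.0, 200.0, 300.0):
        print(f"window {window:.0f} um -> pT cut ~ {pt_cut_gev(window):.2f} GeV")
```

Under these assumptions a larger sensor separation or a narrower window raises the threshold, which is the trade-off the slide refers to.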

### Schematic of PT module

• Transfer hits to both edges –with minimal power – for comparison logic



### What defines module size?

- Assumed radial location ~25-45cm
  - allows coverage of the full CMS $\eta$ range with only a barrel assembly
- Pixel size ~100µm x 2.5mm
  - $\sim$ 100 $\mu$ m matches required resolution and likely assembly precision for double layers
  - ~2.5mm defined by approximate projection of luminous region at R ~25cm and likely radial spacing ~2mm (next slide)
  - coarse enough for low cost bump bonding (or perhaps even wire for prototyping)
- 256 x 32 binary multiple of pixels with practical dimensions
- ~25.6mm x 80mm => 4 sensors in 150mm wafer ie one contribution to yield
- Chip height ~15mm fits reticle
- Expected occupancy at R = 25cm  $\sim 0.5\%$  @  $10^{35}$ 
  - so average <1 hit in column which may allow transfer to edge
- Of course, this is history and explains initial parameters chosen for simulations
  - module can probably be enlarged, eg 384 x 32
  - but concept has avoided chip to chip connections i.e. >2 ASICs in height

### PT layer pixel size

- R- $\phi$ : compare to likely assembly precision ~ 100 $\mu$ m
- Should reduce need to compare many nearby columns
  - $\Delta$ is independent of $\eta$, but the offset in z between layers increases with $\eta$



LHC luminous region L  $\approx$  28cm (±3 $\sigma$ ) – may be larger or smaller at SLHC
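
A small sketch of the geometry behind the ~2.5mm pixel length, assuming straight-line projection in the r-z plane: for a fixed hit on the lower sensor, moving the vertex along the luminous region of length L shifts the hit on the upper sensor by L·d/R, while the mean offset between layers is d·sinh(η). The function names and default values (taken from the numbers on the slides) are illustrative only.

```python
import math

def z_spread_mm(luminous_length_mm=280.0, radius_mm=250.0, spacing_mm=2.0):
    """Spread in z on the upper sensor caused by the length of the luminous region."""
    return luminous_length_mm * spacing_mm / radius_mm

def z_offset_mm(eta, spacing_mm=2.0):
    """Mean z offset between the two sensors for a track at pseudorapidity eta."""
    return spacing_mm * math.sinh(eta)  # cot(theta) = sinh(eta)

if __name__ == "__main__":
    print(f"spread from luminous region: {z_spread_mm():.2f} mm")
    for eta in (0.0, 1.0, 2.0, 2.5):
        print(f"eta = {eta}: offset {z_offset_mm(eta):.1f} mm")
```

The spread is independent of η in this approximation, whereas the offset grows to several pixel lengths at the edge of the acceptance.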

### Why edge-readout?

- Original motivations were:
  - it was (is) far from clear how to solve layer to layer interconnection problem
  - edge readout seemed to offer means to factorise this, eg by constructing a small, dedicated component
    - real connectors close to requirements do exist but would probably not be usable for mechanical reasons for ~8000 connections
  - profit from low occupancy and high density of lines on ASIC
    - no penalty in making comparisons at edge of module
  - constructing two module layers independently, then making final assembly is attractive conceptually
- This logic exposes some prejudices (which could be wrong)
  - module should evolve from known technologies
  - design should profit from known constraints (ie occupancy)
  - design should be adaptable to changes in requirements
  - a unique double layer assembly which is best commercially may not be possible
  - the logic to be included is not yet well defined and may also evolve

### **Comparison** logic

- Modules are flat, not arcs
- Compensate for Lorentz drift
- Orientation of module
   => position dependent logic

- z offset  $\eta$  dependent
- search window to allow for luminous region and quantization => 3 pixels (if not tiny)
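
A minimal sketch of the comparison step, assuming hits are given as (row, column) pixel indices on each sensor. The position-dependent row offset (flat module, Lorentz drift, orientation) and the η-dependent column offset are passed in as plain integers; how they would actually be derived per module is not specified in the slides, so `find_stubs` and its parameters are hypothetical.

```python
def find_stubs(lower_hits, upper_hits, row_offset=0, col_offset=0,
               row_half_window=1, col_half_window=1):
    """Pair lower-sensor hits with upper-sensor hits inside a search window.

    row_half_window sets the R-phi window (and hence the pT cut);
    col_half_window=1 gives the 3-pixel window in z.
    """
    upper = set(upper_hits)
    stubs = []
    for row, col in sorted(lower_hits):
        for d_row in range(-row_half_window, row_half_window + 1):
            for d_col in range(-col_half_window, col_half_window + 1):
                candidate = (row + row_offset + d_row, col + col_offset + d_col)
                if candidate in upper:
                    stubs.append(((row, col), candidate))
    return stubs

if __name__ == "__main__":
    lower = [(10, 3), (50, 7)]
    upper = [(11, 3), (60, 7)]   # second pair is too far apart in R-phi
    print(find_stubs(lower, upper))
```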



WIT 2010



### Possible PT layer readout

- shift register ruled out: 128/25ns = 5.12Gb/s
  - probably even if allow several BX and latency penalty
- worst case occupancy may mean >1 hit/BX
  - with adequate fluctuations to be permitted
  - NB from simulations, jets don't have much impact
- equip column with N x 8 address/data lines
  - N = maximum number of clusters allowed
  - ignore combinations consistent with wide clusters
  - seems easy to read out 3, or more clusters (also suspect occupancy is pessimistic)
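
The two numbers driving this choice can be checked quickly. The sketch below reproduces the shift-register rate and estimates how often a column would hold more than N hits, assuming (for illustration only) Poisson-distributed hits with mean 0.5% × 128; real fluctuations should come from simulation, and hits are not the same as clusters.

```python
import math

def shift_register_rate_gbps(channels=128, bx_ns=25.0):
    """Rate needed to shift out every channel each bunch crossing (bits/ns = Gb/s)."""
    return channels / bx_ns

def prob_more_than(n_max, mean_hits):
    """Poisson probability of more than n_max hits for the given mean."""
    p_at_most = sum(math.exp(-mean_hits) * mean_hits ** k / math.factorial(k)
                    for k in range(n_max + 1))
    return 1.0 - p_at_most

if __name__ == "__main__":
    mean = 0.005 * 128
    print(f"shift register: {shift_register_rate_gbps():.2f} Gb/s")
    for n in (1, 2, 3):
        print(f"P(more than {n} hits) = {prob_more_than(n, mean):.4f}")
```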

One possible design for transferring up to 3 cluster addresses to end-of-column logic; some examples follow.

[Figure labels: "if signal exceeds comp. thresh", "logic", "end of column logic"]

![](_page_14_Figure_1.jpeg)

up to 3 cluster addresses can be transmitted to end-of-column logic in this example

#### 2 hits on non-adjacent channels

![](_page_15_Figure_1.jpeg)

other examples in backup slides

### Track stub generation by matching layers

![](_page_16_Figure_1.jpeg)

![](_page_17_Figure_0.jpeg)

#### Layout

Make ROC + Assembler as a single ASIC, identical for both layers; may switch off elements

Practical to produce a chip to serve several columns which eases data transfer and line density

Looks easy to match 4 x 2.5mm columns eg 2mm pitch

Looks possible to cover 8 columns with pitch  $\approx$  2mm

Width: 16mm to read 8 x 2.5mm

Chip size ≈ 18mm x 16mm (128), ~21mm x 16mm (192)

![](_page_18_Figure_0.jpeg)

### Possible assembly sequence

- Sensor
- ROCs bump bonded to sensor
- Invert sensor-ROC object
  - Exposed ROC areas then face up
- Place assembly on hybrid
  - hybrid has pre-mounted ancillary chips
  - Wire bond ROCs to hybrid
- Prepare partner module
  - assemble and connect together

### Weak points in this approach

- It is worth emphasising that this concept was originally to kick start a design effort. Nevertheless, so far insufficient attention given (by us) to several important issues
  - cooling: seems feasible to include thermally conductive layers
    - but not necessarily more difficult, as no interior interlayer connections
  - location of links most likely to be on, or close to, the module
    - originally it seemed that links could be deployed on a remote bulkhead
    - this now seems implausible: high speed electrical links are undesirable for noise and material reasons, and there are no obvious close locations
    - optimal matching of bandwidth and power (if adjustable) not yet obvious
  - provision of power to module
    - we assume DC-DC conversion and local converters
  - overall mechanical design
    - involves constraints above

### Data volumes and link requirements

- Assume 24 bits/hit to transfer in each 25ns BX
  - includes time stamp and error coding

| for 40M channels in stacked layer, L = 10<sup>35</sup> cm<sup>-2</sup> s<sup>-1</sup> |       |
|---------------------------------------------------------------------------------------|-------|
| Channels/chip                                                                         | 128   |
| Occupancy                                                                             | 0.005 |
| PT data reduction                                                                     | 0.050 |
| Channels above PT cut/BX/layer                                                        | 5,000 |
| bits/channel                                                                          | 24    |
| No. of links @ 5 Gbps (3.2 Gbps data)                                                 | 1,500 |
| Power/link [W]                                                                        | 2.0   |
| Link power [kW]                                                                       | 3.0   |
| Power/chan [µW] with 50% BW usage                                                     | 150   |
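
The table entries follow from a short chain of arithmetic, sketched below. One interpretation is assumed: the 40M channels are shared equally between the two layers of the stack and only one layer sends trigger data, which reproduces the 5,000 channels above the PT cut per BX. The function and key names are hypothetical.

```python
def trigger_link_budget(n_channels=40e6, occupancy=0.005, pt_reduction=0.05,
                        layers_in_stack=2, bits_per_hit=24, bx_ns=25.0,
                        link_payload_gbps=3.2, link_power_w=2.0, bw_usage=0.5):
    """Reproduce the link-count and power estimate for trigger data from one layer."""
    hits_per_bx = n_channels / layers_in_stack * occupancy * pt_reduction
    bits_per_bx = hits_per_bx * bits_per_hit
    payload_bits_per_bx = link_payload_gbps * bx_ns      # Gb/s x ns = bits per BX
    links_at_full_bw = bits_per_bx / payload_bits_per_bx
    links_needed = links_at_full_bw / bw_usage           # allow for partial bandwidth use
    return {
        "hits_per_bx": hits_per_bx,
        "links_at_full_bw": links_at_full_bw,
        "link_power_kw": links_at_full_bw * link_power_w / 1e3,
        "power_per_channel_uw": links_needed * link_power_w / n_channels * 1e6,
    }

if __name__ == "__main__":
    for name, value in trigger_link_budget().items():
        print(f"{name}: {value:.1f}")
```

Changing `link_power_w` or `bw_usage` shows directly how sensitive the per-channel figure is to the link technology.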

#### send trigger data from **one** layer of stack

#### Comments

- 150μW is a <u>conservative</u> estimate for trigger data only
  - assumes 50% use of bandwidth and 2W 5Gbps GBT
- GBT power may improve
  - ideally should optimise power & speed
- but additional links required for full readout
  - with 6.4 $\mu s$ storage on each FE pixel

### Power <u>estimate</u> for PT module

|                   | P [μW] per pixel | Functions                                                    |
|-------------------|------------------|--------------------------------------------------------------|
| Front end         | 25               | amplifier, discriminator, local logic; cf. ATLAS 130nm pixel |
| Control, PLL      | 10               | 1 PLL/ROC @ 5mW, x 2                                         |
| Digital logic     | 2                | transfer to edge (M Raymond)                                 |
| Comparison        | 4                | logic (0.5mW/column)                                         |
| Data transfer     | 0.25             | few cm across module                                         |
| Data to local GBT | 0.25             | transmit 48bits/BX @ 1pJ/bit? ≈ 2mW/module                   |
| Concentrator      | 5                | buffer to and from GBT: 2 ASICs @ 20mW                       |
| Full readout      | 20               | following L1 trigger, extrapolate from CMS pixel             |
| Sub-total         | ~67              |                                                              |

#### **Total with DC-DC ~90μW** (assuming 75% efficiency for DC-DC conversion)

NB big uncertainties and significant guesswork

eg SEU-robustness, full control and timing, data volumes, ... all required

essential to improve on this with real design work
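
The sub-total and the DC-DC total can be reproduced from the table entries, and a few per-pixel figures can be cross-checked against the per-chip numbers in the "Functions" column. The sketch assumes 8 x 128 pixels per ROC, 128 pixels per column and 256 x 32 pixels per module, all taken from elsewhere in these slides; the variable names are illustrative.

```python
BUDGET_UW = {
    "front end": 25.0,
    "control, PLL": 10.0,
    "digital logic": 2.0,
    "comparison": 4.0,
    "data transfer": 0.25,
    "data to local GBT": 0.25,
    "concentrator": 5.0,
    "full readout": 20.0,
}

def per_pixel_uw(power_mw, n_pixels):
    """Spread a block's power (mW) over the pixels it serves, in µW per pixel."""
    return power_mw * 1e3 / n_pixels

subtotal_uw = sum(BUDGET_UW.values())
total_with_dcdc_uw = subtotal_uw / 0.75          # 75% conversion efficiency

if __name__ == "__main__":
    print(f"sub-total: {subtotal_uw:.1f} uW, with DC-DC: {total_with_dcdc_uw:.1f} uW")
    print(f"PLL:          {per_pixel_uw(2 * 5.0, 8 * 128):.1f} uW/pixel")
    print(f"comparison:   {per_pixel_uw(0.5, 128):.1f} uW/pixel")
    print(f"concentrator: {per_pixel_uw(2 * 20.0, 256 * 32):.1f} uW/pixel")
```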

### **Other considerations**

- "Simplicity" is desirable
  - designers and users may have different perspectives on issues such as
    - grounding and shielding
    - control software development
    - initialisation and data unpacking software
    - off-detector firmware and digital processing (trigger system will probably continue to be debugged with real triggers only in-situ)
- Up to now, there was modest overlap between real end-users and ASIC designers
  - this will probably be more crucial for these modules
  - substantial evaluation programme should be foreseen

### A parable of evolution

- To make soap powder, liquid is blown through a nozzle.
- As it streams out, the pressure drops and a cloud of particles forms... thirty years ago, the spray came through a simple pipe that narrowed from one end to the other... it had problems with irregularities in size of grains, liquid or blockages...
- the nozzle has become an intricate duct, longer than before, with many constrictions and chambers. The liquid follows a complex path before it sprays. Each type of powder has its own nozzle design which does the job with great efficiency.
- The problem was too hard to allow even the finest engineers to explore with mathematics and design... they tried another approach... **evolution**: preservation of favourable variations and rejection of those injurious.
- Take a nozzle that works quite well and make copies, each changed at random. Test them for how well they make powder. Then impose a struggle for existence by insisting that not all can survive.
- Many altered devices are no better (or worse) than the parental form. They are discarded, but the few able to do a superior job are allowed to reproduce and are copied – but again not perfectly. As generations pass, there emerges a new and efficient pipe of complex and unexpected shape.
- from Steve Jones. *Almost Like a Whale.*

### Conclusions

- The requirements for trigger modules are not yet clear
  - especially the physics case and luminosity scenario
  - a lot of factors involved in constructing the module
- Power is crucial for such layers in future trackers
  - the trigger data transfer still dominates but maybe can be optimised
- Evolution is about finding a solution which matches the environment
  - intelligence may not be what we think it should be
  - we need to try out alternatives and really evaluate them
- It may be necessary to think hard about using expensive technologies in a way which allows us to try variants before committing to a single solution

### Backups

### Approximate parameters of trigger layers

| For stacked layer (doublet) |                  |
|-----------------------------|------------------|
| Pixel size                  | 100μm x 2.5mm    |
| ROC                         | 8 x 128 channels |
| &lt;power&gt;/pixel         | 250μW (*)        |
| η<sub>MAX</sub>             | 2.5              |
| Bandwidth efficiency        | 50%              |

| R [cm] | L [m] | A [m²] | N<sub>face</sub> | N<sub>chan</sub> | N<sub>ROC</sub> | N<sub>module</sub> | N<sub>links</sub> | P [kW] |
|--------|-------|--------|------------------|------------------|-----------------|--------------------|-------------------|--------|
| 25     | 3.0   | 9.6    | 64               | 38.5M            | 38k             | 4700               | 2880              | 9.6    |
| 35     | 4.2   | 18.7   | 88               | 75M              | 73k             | 9200               | 5610              | 18.7   |

(\*) With overlaps in R- $\phi$  or  $\eta$  expect additional 10-15%

present tracker ~35kW
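
Most columns of the table can be recovered from the radius, the sensor area and the parameters listed above it. The sketch below takes the area A from the table as an input, computes the barrel length needed to reach η<sub>MAX</sub> as 2·R·sinh(η<sub>MAX</sub>), and reuses the trigger-link assumptions from the main slides (0.5% occupancy, 5% passing the PT cut, 24 bits/hit, 3.2 Gbps payload, data from one layer of the stack). The function name and dictionary keys are hypothetical.

```python
import math

def layer_parameters(radius_cm, area_m2, eta_max=2.5,
                     pixel_area_m2=100e-6 * 2.5e-3,
                     channels_per_roc=8 * 128,
                     pixels_per_sensor=256 * 32,
                     power_per_pixel_uw=250.0,
                     bw_efficiency=0.5):
    """Approximate stacked-layer parameters for a barrel at the given radius."""
    n_chan = area_m2 / pixel_area_m2
    hits_per_bx = n_chan / 2 * 0.005 * 0.05          # one layer, occupancy, PT cut
    links = hits_per_bx * 24 / (3.2 * 25.0) / bw_efficiency
    return {
        "length_m": 2.0 * radius_cm / 100.0 * math.sinh(eta_max),
        "n_chan": n_chan,
        "n_roc": n_chan / channels_per_roc,
        "n_module": n_chan / pixels_per_sensor,
        "n_links": links,
        "power_kw": n_chan * power_per_pixel_uw * 1e-9,   # µW -> kW
    }

if __name__ == "__main__":
    for radius, area in ((25, 9.6), (35, 18.7)):
        print(radius, {k: round(v, 1) for k, v in layer_parameters(radius, area).items()})
```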

### Making a trigger

- Stubs provide track trigger <u>primitives</u>
- Not yet proven how these contribute to trigger
  - and rate reduction achievable
  - many simulation studies under way
  - expect to match a series of stubs to a calorimeter or muon object
    - using off-detector processors
- Questions to answer include
  - how many layers are needed?
  - what is the optimal location, allowing sufficient  $\eta$  coverage?
  - what is the impact of material? in trigger layers and elsewhere
  - how important is z-measurement, and resolution?
  - what is the impact on tracking performance?
  - cost, power and material budget?
  - L0 trigger to guide?

### some digital power estimates (1) M Raymond

#### 8 bits data transmission through mux

hit occupancy per strixel column = $0.5\% \times 128 = 64\%$

93% of time only one cluster per 128 strixel column (simulation) => close to 100%

each mux line can change state 50% of time (either '1' or '0', so 50% of time will change from

so translating to an "average" toggling speed: 20 MHz (line can only change state every 25 ns) x 0.64 x 0.5 = 6.4 MHz

![](_page_29_Figure_4.jpeg)

average power consumed in the mux gates 2 (gates / mux line) x 8 (lines) x 6.4 (MHz) x 9 (nW / MHz / gate) = 1 uW (note: pessimistic since not all mux gates will be active – depends on location of hit)

#### = ~ 50 nW per strixel channel, negligible

transmission power associated with transmitting CMOS levels across chip (note this power also consumed in the gate driving the line, but consider separate here) (use 2 uW/MHz/cm) 2 x 6.4 (MHz) x 1.28 (cm) x 8 (mux lines) /128 strixels

#### = ~ 1 uW per pixel
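
The arithmetic on this slide can be written out as below, using only the coefficients quoted (9 nW/MHz/gate, 2 uW/MHz/cm, 8 mux lines, 1.28 cm column, 128 strixels). The function names are illustrative.

```python
def toggle_rate_mhz(max_rate_mhz=20.0, hits_per_column=0.64, change_probability=0.5):
    """Average toggling rate of one mux line."""
    return max_rate_mhz * hits_per_column * change_probability

def mux_gate_power_uw(toggle_mhz, gates_per_line=2, lines=8, nw_per_mhz_gate=9.0):
    """Power in the mux gates for the gate count quoted on the slide, in µW."""
    return gates_per_line * lines * toggle_mhz * nw_per_mhz_gate / 1e3

def line_power_per_strixel_uw(toggle_mhz, uw_per_mhz_cm=2.0, length_cm=1.28,
                              lines=8, strixels=128):
    """Power to drive CMOS levels along the column, shared over the strixels."""
    return uw_per_mhz_cm * toggle_mhz * length_cm * lines / strixels

if __name__ == "__main__":
    rate = toggle_rate_mhz()
    print(f"toggle rate: {rate:.1f} MHz")
    print(f"mux gates:   {mux_gate_power_uw(rate):.2f} uW")
    print(f"line drive:  {line_power_per_strixel_uw(rate):.2f} uW per strixel")
```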

### some digital power estimates (2)

#### 40 MHz clock distribution: 100 uW

(2 uW/MHz/cm) (128 channel chip, 1.28 cm high)

= ~ 1 uW / channel (assumes 1 V transitions, 2 pF / cm)

#### channel logic (including mux select):

~ 30 gates toggling at channel occupancy frequency (0.5% x 40 MHz)

= 48 nW / channel (9 nW / MHz / gate), negligible

#### => digital total associated with correlation data to edge of chip only ~ few uW / channel
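
A sketch of the clock and channel-logic arithmetic, using the stated assumptions (1 V transitions, 2 pF/cm, 9 nW/MHz/gate). The line coefficient follows from C·V²·f, which gives 2 uW/MHz/cm for these values. With exactly 30 gates the channel-logic figure comes out near 50 nW, the same order as the value quoted on the slide. Function names are illustrative.

```python
def line_coefficient_uw_per_mhz_cm(capacitance_pf_per_cm=2.0, swing_v=1.0):
    """C * V^2 * f: with C in pF and f in MHz the result is in µW."""
    return capacitance_pf_per_cm * swing_v ** 2

def clock_power_uw(freq_mhz=40.0, length_cm=1.28):
    """Power to distribute the clock along one column of the chip."""
    return line_coefficient_uw_per_mhz_cm() * freq_mhz * length_cm

def channel_logic_power_nw(gates=30, occupancy=0.005, freq_mhz=40.0,
                           nw_per_mhz_gate=9.0):
    """Power of the per-channel logic toggling at the channel occupancy frequency."""
    return gates * occupancy * freq_mhz * nw_per_mhz_gate

if __name__ == "__main__":
    clock = clock_power_uw()
    print(f"clock: {clock:.1f} uW, i.e. {clock / 128:.2f} uW per channel")
    print(f"channel logic: {channel_logic_power_nw():.0f} nW per channel")
```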

### some circuit area estimates

.... for some of the logic described here

assuming 10 um<sup>2</sup> per inverter, 20 um<sup>2</sup> per 2 I/P NAND/NOR (in 130nm)\*

CWD logic:  $\sim 16 \times 16 \text{ um}^2$ 

priority logic to select mux: ~ 16 x 16 um<sup>2</sup>

mux logic for 3 per pixel:  $\sim 50 \times 50 \text{ um}^2$ 

so no big area consumption here – c.f. pixel size ~ 100 x 1500  $\mu$ m<sup>2</sup>

but much other functionality not yet included

\* W. Erdmann:

http://indico.cern.ch/getFile.py/access?contribId=7&resId=1&materialId=0&confId=55224

### cluster width discrimination

![](_page_32_Figure_1.jpeg)

In the diagrams the cluster width discrimination logic above is represented by

![](_page_32_Figure_3.jpeg)

#### example - one hit on one channel

![](_page_33_Figure_1.jpeg)

25 nsec pulse propagates through logic, selecting multiplexer in first column. Address of hit pixel passes through multiplexer chain to end of column.

#### one hit shared between 2 channels

![](_page_34_Figure_1.jpeg)

simple logic stops the lower channel signal feeding through to the mux select input, so 7-bit address of only one channel ($A_{n+3}$) is selected. Double channel cluster indicated by including hit signal $D_{n+2}$ from lower channel (so 8 bits altogether).

#### one hit shared between 3 channels

![](_page_35_Figure_1.jpeg)

CWD logic rejects signals in 3 (or more) adjacent channels; no multiplexer is selected

#### 2 adjacent + 1 isolated

![](_page_36_Figure_1.jpeg)

**3 isolated hits** 

![](_page_37_Figure_1.jpeg)

all 3 multiplexers are now active. A cluster further up the chip would be lost in this design (design would have to expand if more than 3 clusters required – but linear expansion of
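
The behaviour shown in these examples can be summarised in a short behavioural model (not the gate-level design). It assumes channels are indexed from 0 upward, that "further up the chip" means higher index so that the first three accepted clusters win, and that the address sent for a two-channel cluster is that of its higher channel; these conventions and the function names are assumptions made for illustration.

```python
def column_readout(hits, n_channels=128, max_clusters=3, max_width=2):
    """Return up to max_clusters (address, is_double) pairs for one column.

    Clusters wider than max_width are rejected by the cluster width
    discrimination (CWD); clusters beyond max_clusters are lost.
    """
    hit = [False] * n_channels
    for channel in hits:
        hit[channel] = True
    words = []
    channel = 0
    while channel < n_channels:
        if not hit[channel]:
            channel += 1
            continue
        start = channel
        while channel < n_channels and hit[channel]:
            channel += 1                      # walk to the end of the cluster
        width = channel - start
        if width <= max_width and len(words) < max_clusters:
            words.append((channel - 1, width == 2))
    return words

def pack_word(address, is_double):
    """7-bit address plus 1 cluster-width bit = 8 bits per cluster."""
    return (address << 1) | int(is_double)

if __name__ == "__main__":
    for pattern in ([5], [5, 6], [5, 6, 7], [1, 2, 10], [1, 10, 20, 30]):
        print(pattern, "->", column_readout(pattern))
```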