A high efficiency readout architecture for a large matrix of pixels.

#### WIT 2010 – Workshop on Intelligent Trackers Feb. 4<sup>th</sup> 2010, Berkeley – CA

Alessandro Gabrielli, Filippo Maria Giorgi, Mauro Villa I.N.F.N. & University of Bologna

## Outline

- Target operating conditions
- Matrix overview
- Readout architecture
- Simulations & Efficiencies
- Demonstrator chip submission
- Applications

### **Target conditions**

- An architecture that can be integrated with hybrid pixel sensors or MAPS (Monolithic Active Pixel Sensor)
- 100 MHz/cm<sup>2</sup> hit rate (compatible with the rate foreseen in SuperB Layer0)
- **0.25–2.0 μs** BCO clock:
  - Time Counter clock; it sets the time granularity of the events.
- **60-100 MHz** Matrix Readout Clock
- 3 Gbit/s data bus bandwidth per chip

F.Giorgi – WIT2010 Berkeley, CA

### Matrix Overview



- Pixel groups called Macro Pixels
- MP shape 2x8 pxl
- Reduce logic in every pixel
- Reduce global routes towards readout logic.

- 80K pixel matrix
- Total area ~ 1.3 cm<sup>2</sup>
- 130 Mhit/s
- 40 microns pitch
- 4 indep. submatrices
- 4 indep. column scans
- 4 active columns (AC) read in 1 clk

1024 pixels analyzed in 1 CLK cycle: @ 50 MHz $\rightarrow$ ~50 Gpxl/s
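The figures on this slide follow from the matrix geometry. As a quick cross-check, here is a minimal Python sketch, assuming a 320x256 pixel matrix (the X and Y address ranges quoted for the output data format) and square pixels with 40 μm pitch; all variable names are illustrative.

```python
# Cross-check of the matrix figures quoted on this slide.
# Assumptions: 320 x 256 pixels, square pixels with 40 um pitch.
COLS, ROWS = 320, 256
PITCH_CM = 40e-4                 # 40 um expressed in cm
HIT_RATE_PER_CM2 = 100e6         # target hit rate, hits / s / cm^2
N_SUBMATRICES = 4                # one active column per sub-matrix
SCAN_CLK_HZ = 50e6               # scan clock used in the slide's estimate

n_pixels = COLS * ROWS                           # ~80K pixels
area_cm2 = n_pixels * PITCH_CM ** 2              # ~1.3 cm^2
chip_hit_rate = HIT_RATE_PER_CM2 * area_cm2      # ~130 Mhit/s
pixels_per_clk = N_SUBMATRICES * ROWS            # pixels seen per clock
scan_throughput = pixels_per_clk * SCAN_CLK_HZ   # pixels analyzed per second

print(f"pixels          : {n_pixels}")
print(f"area            : {area_cm2:.2f} cm^2")
print(f"chip hit rate   : {chip_hit_rate / 1e6:.0f} Mhit/s")
print(f"pixels per clock: {pixels_per_clk}")
print(f"scan throughput : {scan_throughput / 1e9:.1f} Gpxl/s")
```

With these inputs the scan throughput comes out at about 51 Gpxl/s, in line with the 50 Gpxl/s quoted in the conclusions.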

#### Matrix overview

- Binary **pixel matrix**
- Hit readout through a column-wide shared data-bus



#### **Submatrix Scan Policy**



#### **The Macro Pixels**

The matrix is divided into MPs: groups of 2x8 pixels.

- MP global lines:
  - Fast-OR line: (MP output) inclusive OR of all pixel latches.
  - Freeze line: (MP input) disable the reception of new hits.
- On each BCO clock edge, every MP with an active fast-OR:
  - Gets frozen
  - Is associated with the current value of the BCO counter (Time Stamp) in a LUT memory of the readout
  - Waits to be scanned and reset
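The freeze and time-stamp behaviour listed above can be sketched as a small behavioural model in Python. This is an illustration, not the authors' VHDL. The class and function names are hypothetical, and the sketch assumes that an MP already frozen at an earlier BCO edge keeps its original time stamp until it is scanned.

```python
from dataclasses import dataclass, field

MP_COLS, MP_ROWS = 2, 8      # macro-pixel shape from the slides
TS_BITS = 8                  # modulo-256 BCO counter


@dataclass
class MacroPixel:
    latches: set = field(default_factory=set)  # fired (col, row) pixels
    frozen: bool = False

    @property
    def fast_or(self) -> bool:
        # Inclusive OR of all pixel latches.
        return bool(self.latches)

    def hit(self, col: int, row: int) -> bool:
        """Latch a hit; return False if the hit is lost."""
        if self.frozen or (col, row) in self.latches:
            return False
        self.latches.add((col, row))
        return True

    def scan_and_reset(self) -> list:
        """Read out the latched pixels, then clear and unfreeze the MP."""
        hits = sorted(self.latches)
        self.latches.clear()
        self.frozen = False
        return hits


def bco_edge(mps: dict, bco_counter: int, ts_lut: dict) -> None:
    """Freeze every MP with an active fast-OR and record its time stamp.

    Assumption: an MP frozen at an earlier edge keeps its first time stamp.
    """
    ts = bco_counter % (1 << TS_BITS)
    for mp_id, mp in mps.items():
        if mp.fast_or and not mp.frozen:
            mp.frozen = True
            ts_lut[mp_id] = ts


# Usage: two MPs, one of them hit before the BCO edge.
mps = {0: MacroPixel(), 1: MacroPixel()}
ts_lut = {}
mps[0].hit(1, 3)                       # latched in MP 0
bco_edge(mps, bco_counter=257, ts_lut=ts_lut)
lost = not mps[0].hit(0, 0)            # MP 0 is frozen, so this hit is lost
kept = mps[1].hit(0, 0)                # MP 1 was idle, so this hit is accepted
read = mps[0].scan_and_reset()
print(ts_lut, lost, kept, read)
```

The rejected hit on the frozen MP is the kind of loss counted as frozen MP inefficiency in the simulations.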



#### Matrix readout architecture

- Each sub-matrix has its own readout & scan logic
- All readouts work in parallel
- Common final output stage



#### Sub-matrix readout architecture





#### The sparsifiers and barrels



#### Output stage data-bus solutions



1. Time stamp and full pixel address sent with every hit:
   - 8 bit TS (modulo 256 BCO counter)
   - 9 bit X address (320 pixels)
   - 8 bit Y address (256 pixels)
   - Total: 25 bits
   - $\rightarrow$ expected rate: 130 Mhit/s per chip x 25 bit = <u>3.2 Gbps</u>
2. Zone sparsification & time sorting of the hits (one TS heads all the hits that share it, 1 MHz BCO clock) lead to:
   - 2 bit Barrel L2 address ($\rightarrow$ 1/4 of submatrix: 80x64 pxl)
   - 2 bit Barrel L1 address (1 submatrix: 80x256 pxl)
   - 7 bit X address (80 pixels)
   - 3 bit zone Y address (8 vertical zones for each L2 barrel)
   - 8 bit zone pattern
   - Total: 22 bits
   - $\rightarrow$ expected rate: (130 + 1 TS) Mword/s x 22 bit = <u>2.8 Gbps</u>

BUT: assuming a x4 cluster factor with 2x2 clusters, only 2 hits are needed in 87.5% of cases and 4 hits in the remaining 12.5%:

- → [(22\*2)\*0.875 + (22\*4)\*0.125] bit \* 25 Mtrack s<sup>-1</sup> cm<sup>-2</sup> \* 1.3 cm<sup>2</sup>
  - Weighted average ~ <u>1.6 Gbps</u>
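The three bandwidth estimates above can be reproduced with a few lines of Python. This is a sketch under the slide's own assumptions: 130 Mhit/s per chip, a 1 MHz BCO clock, and 25 Mtrack/s/cm<sup>2</sup> over 1.3 cm<sup>2</sup>. The dictionary keys are illustrative labels, and the time-stamp header is assumed to occupy one 22-bit word, as the slide's "130 (+1 TS)" suggests.

```python
# Field widths in bits, as listed on the slide.
PLAIN_WORD = {"ts": 8, "x": 9, "y": 8}
ZONE_WORD = {"barrel_l2": 2, "barrel_l1": 2, "x": 7, "zone_y": 3, "pattern": 8}

CHIP_HIT_RATE = 130e6      # hits / s per chip
TS_WORD_RATE = 1e6         # one time-stamp header per BCO period (1 MHz)
TRACK_RATE = 25e6 * 1.3    # tracks / s per chip

plain_bits = sum(PLAIN_WORD.values())
zone_bits = sum(ZONE_WORD.values())

# Solution 1: every hit carries its own time stamp and full address.
plain_bps = CHIP_HIT_RATE * plain_bits

# Solution 2: zone words, plus one header word per time stamp.
zone_bps = (CHIP_HIT_RATE + TS_WORD_RATE) * zone_bits

# 2x2 clusters: 2 words in 87.5% of cases, 4 words in 12.5%.
bits_per_track = 2 * zone_bits * 0.875 + 4 * zone_bits * 0.125
cluster_bps = bits_per_track * TRACK_RATE

print(f"word sizes       : {plain_bits} and {zone_bits} bits")
print(f"plain format     : {plain_bps / 1e9:.2f} Gbps")
print(f"zone format      : {zone_bps / 1e9:.2f} Gbps")
print(f"zones + clusters : {cluster_bps / 1e9:.2f} Gbps")
```

The results agree with the slide's figures to within rounding; the clustered case is roughly half the plain-format bandwidth.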

### VHDL model simulations

- VHDL models verification
- First-order estimation of the optimal parameters of the architecture in several working conditions:
  - Barrel depth
  - Zone width
- Long batch runs to estimate the efficiency as a function of the operating conditions (RDclk, rate, etc.)

#### Efficiencies

- Two sources of inefficiency due to digital readout:
  - Frozen MP inefficiency: hits generated on a frozen MP or on an already activated pixel are lost.
  - Overflow inefficiency: when a buffer is full, any incoming hits are lost.
- NO sensor efficiency is taken into account in these simulations. We give numbers relative to the digital readout architecture only.

#### Frozen MP Efficiency (1 - ineff.) @ 100 MHz/cm<sup>2</sup>

(Random hits, no clusters $\rightarrow$ **no zone benefits** at this rate)



#### SuperPix0 - demonstrator with smaller matrix

- Submitted Sept. 2009
- Technology STM 130 nm
- Hybrid Pixels Matrix 128x32 pixels, 50 µm pitch (1/20 of the target matrix pixel count)
- Only 2 readout instances implemented
  - The readout instances are oversized with respect to the matrix height (32 vs 256 pixels), but the implemented connections allow all the components to be stimulated.



#### SuperPix0 layout - ST130nm



### Application on a DAQ chain

For the **SLIM5 collaboration**, a similar readout architecture was implemented within a MAPS sensor chip. It was tested with a silicon strip telescope and a powerful DAQ system during a test beam in 2008.




### Conclusions

- Development under challenging target conditions
- Optimization of the architecture in several directions:
  - High speed: 100 MHz/cm<sup>2</sup> hit rate, 50 Gpxl/s processed.
  - High efficiency: 98–99%.

- Low bandwidth cost per chip: cluster optimization and the time-ordered scan halve the required bandwidth.
- Demonstrator submitted Sept. 2009 ... awaiting silicon.

#### Thanks for your attention

Filippo Giorgi INFN Bologna (Italy) giorgi@bo.infn.it

#### Backups

## Study on Barrel optimal Depth:





# The Slow Control bus: I<sup>2</sup>C–like system



Communication type: register R/W access

- I<sup>2</sup>C: two bidirectional open-drain lines, pulled up with resistors:
  - Serial Data (SDA)
  - Serial Clock (SCL)

### **Slow Control**

- 1 set of Read/Write registers
  - Chip settings
  - MP masks
- 1 set of Read Only registers
  - Acquisition flags
  - Rate counters
  - Error flags

# Sub-matrix readout Efficiency table

#### Hit rate: 100 MHz/cm<sup>2</sup>

| Run | Sim. duration (μs) | RDclk (MHz) | BCO (μs) | Mean sweeping time (μs) | Global hit rate (MHz) | Rate on area (MHz/mm<sup>2</sup>) | B2 depth | B1 depth | Scan buffer overflow | Already-hit effi. (%) | Frozen MP effi. (%) | Overflow effi. B2 (%) | Overflow effi. B1 (%) |
|-----|--------------------|-------------|----------|-------------------------|-----------------------|-----------------------------------|----------|----------|----------------------|-----------------------|---------------------|-----------------------|-----------------------|
| 107 | 1                  | 60          | 0.5      | 0.45                    | 33.8                  | 1.03                              | 8        | 32       | 0                    | 99.96                 | 98.90               | 100                   | 100                   |
| 108 | 1                  | 80          | 0.5      | 0.34                    | 33.8                  | 1.03                              | 8        | 32       | 0                    | 99.95                 | <u>99.39</u>        | 100                   | 100                   |
| 109 | 1                  | 100         | 0.5      | 0.27                    | 33.8                  | 1.03                              | 8        | 32       | 0                    | 99.96                 | 99.53               | 100                   | 100                   |
| 110 | 1                  | 60          | 1        | 0.75                    | 33.8                  | 1.03                              | 8        | 32       | 0                    | 99.91                 | 98.83               | 100                   | 100                   |
| 111 | 1                  | 80          | 1        | 0.56                    | 33.8                  | 1.03                              | 8        | 32       | 0                    | 99.91                 | 99.10               | 100                   | 100                   |
| 112 | 1                  | 100         | 1        | 0.45                    | 33.8                  | 1.03                              | 8        | 32       | 0                    | 99.91                 | 99.25               | 100                   | 100                   |
| 113 | 1                  | 60          | 1.5      | 0.95                    | 33.8                  | 1.03                              | 8        | 32       | 0                    | 99.86                 | 98.78               | 100                   | 100                   |
| 114 | 1                  | 80          | 1.5      | 0.71                    | 33.8                  | 1.03                              | 8        | 32       | 0                    | 99.86                 | 99.05               | 100                   | 100                   |
| 115 | 1                  | 100         | 1.5      | 0.57                    | 33.8                  | 1.03                              | 8        | 32       | 0                    | 99.86                 | 99.23               | 100                   | 100                   |
| 116 | 1                  | 60          | 2        | 1.08                    | 33.8                  | 1.03                              | 8        | 32       | 0                    | 99.84                 | 98.42               | 100                   | 100                   |
| 117 | 1                  | 80          | 2        | 0.81                    | 33.8                  | 1.03                              | 8        | 32       | 0                    | 99.83                 | 98.81               | 100                   | 100                   |
| 118 | 1                  | 100         | 2        | 0.65                    | 33.8                  | 1.03                              | 8        | 32       | 0                    | 99.83                 | 99.04               | 100                   | 100                   |
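One way to read the table is to express the mean sweeping time in readout-clock cycles (MHz x μs = cycles). The Python sketch below does this for the twelve runs, with the values copied from the table. The grouping by BCO period is an illustrative choice, and any trend it suggests is a reading of the tabulated numbers only, not a statement from the slides.

```python
from collections import defaultdict

# run: (RDclk in MHz, BCO period in us, mean sweeping time in us)
RUNS = {
    107: (60, 0.5, 0.45), 108: (80, 0.5, 0.34), 109: (100, 0.5, 0.27),
    110: (60, 1.0, 0.75), 111: (80, 1.0, 0.56), 112: (100, 1.0, 0.45),
    113: (60, 1.5, 0.95), 114: (80, 1.5, 0.71), 115: (100, 1.5, 0.57),
    116: (60, 2.0, 1.08), 117: (80, 2.0, 0.81), 118: (100, 2.0, 0.65),
}

# Group the sweeping time, converted to clock cycles, by BCO period.
cycles_by_bco = defaultdict(list)
for run, (rdclk_mhz, bco_us, sweep_us) in RUNS.items():
    cycles_by_bco[bco_us].append(rdclk_mhz * sweep_us)

for bco_us, cycles in sorted(cycles_by_bco.items()):
    listed = ", ".join(f"{c:.1f}" for c in cycles)
    print(f"BCO {bco_us} us: {listed} cycles")
```

Within each BCO setting the cycle count is the same at 60, 80 and 100 MHz to within the table's rounding, so in these runs the sweeping time in microseconds scales inversely with the readout clock.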

#### SIMULATIONS: the infrastructure

- Realistic VHDL model of a Sub-matrix for behavioral simulation.
  - 2D array of MP entities, each one with uniform random hit generation. User-defined hit rate.
  - NO pixel dead time taken into account. (pixel immediately reset after read)

#### VHDL test bench

- Integrated data integrity check.
- Efficiency evaluation.

- File logs
  - Simulation run e-log (stores the whole parameter set for each simulation)
  - Frozen hit log (once an MP gets frozen, the fired pixels within it are stored in absolute x-y format)
  - B2 readout log (stores the hits read out from any of the B2s, decoded in absolute x-y format)
  - B1 readout log (stores the hits read out from B1, decoded in absolute x-y format)
  - Output log (stores the hits read out from the final output stage, decoded in absolute x-y format)
- Hit controller program: a C++ tool that checks the correspondence between the frozen hit log and the output log.
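The kind of check performed by the hit controller can be illustrated with a short Python sketch. The original tool is written in C++ and its log format is not given here, so the `ts x y` record layout and the function names below are hypothetical. The sketch only shows the idea of comparing the two logs as multisets of hits.

```python
from collections import Counter


def parse_log(lines):
    """Parse hypothetical 'ts x y' records into a multiset of hits."""
    hits = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        ts, x, y = (int(tok) for tok in line.split())
        hits[(ts, x, y)] += 1
    return hits


def compare_logs(frozen_lines, output_lines):
    """Return (missing, spurious) hits between the two logs."""
    frozen = parse_log(frozen_lines)
    output = parse_log(output_lines)
    missing = frozen - output    # frozen in the matrix but never read out
    spurious = output - frozen   # read out but never frozen
    return missing, spurious


# Usage with two tiny in-memory logs (readout order may differ).
frozen_log = ["1 10 20", "1 11 20", "2 79 255"]
output_log = ["1 11 20", "1 10 20", "3 5 5"]
missing, spurious = compare_logs(frozen_log, output_log)
print("missing :", dict(missing))
print("spurious:", dict(spurious))
```

Using a multiset rather than a plain set means a hit that is read out twice is also reported as a discrepancy.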

- Pixels grouped into Macro Pixels:
  - Minimum entities addressable by readout logic
  - Minimum entities for time tagging




#### **Components synthesis**

| Components          | Flip Flop<br>registers | Logic gates |
|---------------------|------------------------|-------------|
| B2                  | ~140                   | ~1400       |
| B1                  | ~1000                  | ~6700       |
| Concentrator        | ~230                   | ~1000       |
| Concentrator<br>out | ~120                   | ~370        |
| I2C interface       | ~130                   | ~600        |
| Mask register       | ~520                   | ~2300       |
| Scan buffer         | ~4700                  | ~9600       |
| Register file       | ~1000                  | ~2700       |
| Sparsifier          | ~160                   | ~1000       |
| Sweeper             | ~7200                  | ~16200      |