# A 1.6Gbps Digital Clock and Data Recovery Circuit

Pavan Kumar Hanumolu, Min Gyu Kim, Gu-Yeon Wei<sup>1</sup>, and Un-ku Moon

School of EECS, Oregon State University, Corvallis, OR 97331

<sup>1</sup>Harvard University, Cambridge, MA 02138

Abstract—A digital clock and data recovery circuit employs simple 3-level digital-to-analog converters to interface the digital loop filter to the voltage controlled oscillator and achieves low jitter performance. Test chip fabricated in a  $0.13 \mu m$  CMOS process achieves BER  $< 10^{-12}, \pm 1500 \rm ppm$  lock-in range,  $\pm 2500 \rm ppm$  tracking range, recovered clock jitter of 8.9ps rms and consumes 12mW power from a single-pin 1.2V supply, while operating at 1.6Gbps.

#### I. INTRODUCTION

The ever-increasing demand for large off-chip I/O bandwidth requires integration of many serial links on a large digital chip. These serial links need to be low-power and easily portable to different process technologies, and they should operate reliably in noisy environments. A clock and data recovery (CDR) circuit is an integral component of these serial links and is the focus of this paper. Modern CDRs are commonly implemented using analog phase-locked loops (PLLs) as shown in Fig. 1 [1]. A bang-bang phase detector (!!PD) determines the sign



Fig. 1. Conventional analog CDR.

of the phase error between incoming data and recovered clock  $(R_{CK})$ . The 3-level (early/late/no transition) phase error is converted to current by the charge-pump (CP) and filtered by the RC loop filter. The filtered control voltage  $(V_F)$  drives the voltage controlled oscillator (VCO) toward phase lock. The use of a !!PD limits the pull-in range of the CDR to a few thousand parts per million (ppm), thus requiring a frequency acquisition aid. There are several digital frequency locking loop (FLL) architectures in the literature [2] that can be used to drive the coarse control voltage  $(V_C)$  of the VCO to bring its frequency to within the pull-in range of the CDR. The FLL is not the focus of this paper.

Even though these analog CDRs offer good performance, they do not easily port to different technologies, require extra mask steps to implement the passive elements, are susceptible to leakage, and prohibit quick production-level testing. In this paper, we present techniques to implement a digital CDR that provides an attractive alternative to overcome some of the drawbacks associated with analog CDRs. Section II presents a simple digital CDR architecture and identifies the issues associated with it. A new architecture that overcomes the issues of the simple digital architecture is presented in Section III. A linearized analysis of the CDR loop and the circuit design details are discussed in Section IV and Section V, respectively. Finally, the experimental results that validate the proposed design techniques are shown in Section VI.

#### II. DIGITAL CDR

A digital counterpart of the analog CDR shown in Fig. 1 can be arrived at by using a simple continuous-time to discrete-time transformation. A relation between the discrete-time operator  $z = e^{j\omega T}$  and the continuous-time operator  $s = j\omega$ , where  $\omega$  is the angular frequency of interest and T is the sampling period, can be derived using a first-order Taylor series expansion of z as shown below:

$$z = e^{j\omega T} \approx 1 + j\omega T = 1 + sT \Rightarrow s = \frac{1 - z^{-1}}{T \cdot z^{-1}}. \tag{1}$$

The above equation is valid only under the assumption that  $\omega \ll 1/T$ . This assumption holds in practice since the bandwidth of the CDR (a few megahertz) is much smaller than the data rate (multiple gigabits per second). We can now use Eq. (1) to transform the analog loop filter to a digital loop filter (DLF) as follows:

$$i_{cp}R + \frac{i_{cp}}{Cs} \Rightarrow i_{cp}R + \frac{i_{cp}T}{C} \frac{z^{-1}}{1 - z^{-1}}. \tag{2}$$
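The validity condition $\omega \ll 1/T$ behind Eq. (1) can be checked numerically. The sketch below compares $e^{j\omega T}$ with its first-order expansion $1 + j\omega T$. The values are assumptions chosen for illustration: T is taken as the 625ps bit period at 1.6Gbps, 5MHz stands in for "a few megahertz", and the function name is hypothetical.

```python
import cmath
import math


def first_order_error(freq_hz: float, period_s: float) -> float:
    """Return |e^{jwT} - (1 + jwT)|, the error of the expansion used in Eq. (1)."""
    wt = 2.0 * math.pi * freq_hz * period_s
    exact = cmath.exp(1j * wt)
    approx = 1.0 + 1j * wt
    return abs(exact - approx)


# Assumed values (not from the paper): T = 625 ps, test tones at 5 MHz and 400 MHz.
T_BIT = 625e-12
err_in_band = first_order_error(5e6, T_BIT)     # well inside a few-MHz loop bandwidth
err_out_band = first_order_error(400e6, T_BIT)  # frequency comparable to 1/T

print(f"error at 5 MHz:   {err_in_band:.2e}")
print(f"error at 400 MHz: {err_out_band:.2e}")
```

The error grows roughly as $(\omega T)^2/2$, so it is negligible within the loop bandwidth and becomes of order one only when $\omega$ approaches $1/T$.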

Now, using Eq. (2) we arrive at the digital CDR architecture shown in Fig. 2. The proportional and integral gains are given



Fig. 2. A digital CDR obtained by s-to-z transformation.

by  $K_P$  and  $K_I$  and are equal to  $i_{cp}R$  and  $i_{cp}T/C$ , respectively. A digital-to-analog converter (DAC) interfaces the DLF to the VCO.
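As a behavioral illustration of Eq. (2), the loop filter can be written as a difference equation: a proportional term plus a delayed accumulator for the $z^{-1}/(1 - z^{-1})$ term. This is a minimal sketch, not the authors' implementation; the class name and the gain values are hypothetical.

```python
class DigitalLoopFilter:
    """Proportional-plus-integral filter: y[n] = K_P*x[n] + K_I*sum(x[0..n-1])."""

    def __init__(self, k_p: float, k_i: float) -> None:
        self.k_p = k_p   # proportional gain, i_cp*R in the analog prototype
        self.k_i = k_i   # integral gain, i_cp*T/C in the analog prototype
        self.acc = 0.0   # integrator state

    def step(self, pd_out: int) -> float:
        """Process one phase detector sample (+1, 0, or -1)."""
        # z^-1 / (1 - z^-1): the integral term uses the sum of *previous* inputs.
        y = self.k_p * pd_out + self.k_i * self.acc
        self.acc += pd_out
        return y


# Hypothetical gains chosen only to make the arithmetic easy to follow.
dlf = DigitalLoopFilter(k_p=2.0, k_i=0.5)
outputs = [dlf.step(x) for x in (1, 1, 1, -1, 0)]
print(outputs)  # [2.0, 2.5, 3.0, -0.5, 1.0]
```

The accumulator update and the final sum are the two additions that have to run at the full data rate in this direct form.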

There are two major drawbacks with this simple digital CDR architecture. First, the implementation of the DLF requires high-speed adders A1 and A2 that consume prohibitively large power. Second, this architecture requires a very high-speed, high-resolution DAC to convert the DLF output to



Fig. 3. Proposed all-digital CDR.

an analog control voltage. For example, a 1.6Gbps CDR requires two 14-bit adders and a DAC operating at 1.6GHz. Decimation is commonly employed to alleviate the high-speed requirement [4]. However, decimation increases loop latency, which causes excessive dither jitter. Furthermore, even with a reasonable decimation factor (for example, 4), the design of the DAC (400MHz, 14-bit) would still be very complex. The following section presents an improved CDR architecture that obviates the need for these high-speed and high-resolution requirements.

#### III. PROPOSED ARCHITECTURE

The block diagram of the proposed digital CDR architecture is shown in Fig. 3. This architecture implements several techniques that enable low-speed digital logic and low-resolution DACs without incurring a severe penalty in loop latency or quantization error. The first improvement is based on the observation that the proportional path takes on only three values ( $\pm K_P$ , 0) and, therefore, digital adder A2 can be replaced with a simple 3-level current-mode DAC and a current summer.

Implementing a high-resolution integral path at full rate still requires the high-speed adder A1 in Fig. 2. To alleviate this requirement, the two-bit !!PD output is first de-multiplexed to 8 bits at quarter rate and then re-quantized to 3 levels by a simple majority vote. The resulting 2 bits are integrated using a 14-bit accumulator operating at quarter rate. The three least significant bits of the accumulator output are discarded to suppress dither jitter caused by loop latency in the integral path. The remaining 11 bits are truncated to 3 levels using a second-order delta-sigma modulator (DSM), thus obviating the need for a high-resolution DAC. The DSM shapes the quantization error to high frequencies, and the loop dynamics of the CDR suppress this truncation noise, resulting in precise adjustment of the VCO frequency. The phase noise due to quantization error that leaks to the output depends on the architecture of the DSM and the sampling frequency. Simulations show that a second-order DSM operating at one quarter of the operating frequency contributes less than 2.5ps rms jitter to the recovered clock. The CDR loop is designed to have an over-damped response by ensuring that the ratio of the phase change from the proportional path to the phase change from the integral path is more than 1000 [3].
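The integral path described above (majority vote, 14-bit accumulation, dropping three LSBs, and second-order delta-sigma truncation to 3 levels) can be sketched behaviorally as follows. Several details are assumptions, since the text does not specify them: the vote is taken as the sign of the sum of four early/late/none decisions, the accumulator saturates, the DSM uses an error-feedback topology, and one DAC level is weighted as 1024 codes. All names are hypothetical.

```python
from typing import Iterable

ACC_BITS = 14       # accumulator width quoted in the text
DROPPED_LSBS = 3    # least significant bits discarded before the DSM
# Assumed weight of one DAC level, in units of the 11-bit DSM input code.
FULL_SCALE = 1 << (ACC_BITS - DROPPED_LSBS - 1)  # 1024


def majority_vote(decisions: Iterable[int]) -> int:
    """Re-quantize a group of +1/0/-1 decisions to a single 3-level value."""
    total = sum(decisions)
    return (total > 0) - (total < 0)  # sign of the sum (assumed voting rule)


class IntegralPath:
    """Accumulator followed by a second-order error-feedback DSM (assumed topology)."""

    def __init__(self) -> None:
        self.acc = 0  # 14-bit signed accumulator state
        self.e1 = 0   # DSM truncation error, one sample ago
        self.e2 = 0   # DSM truncation error, two samples ago

    def accumulate(self, vote: int) -> int:
        """Integrate one vote and return the 11-bit code left after dropping 3 LSBs."""
        lo, hi = -(1 << (ACC_BITS - 1)), (1 << (ACC_BITS - 1)) - 1
        self.acc = max(lo, min(hi, self.acc + vote))  # saturation is an assumption
        return self.acc >> DROPPED_LSBS               # arithmetic shift keeps the sign

    def modulate(self, code: int) -> int:
        """Truncate an 11-bit code to -1/0/+1 with second-order noise shaping."""
        v = code + 2 * self.e1 - self.e2
        if v >= FULL_SCALE // 2:
            level = 1
        elif v <= -(FULL_SCALE // 2):
            level = -1
        else:
            level = 0
        self.e2, self.e1 = self.e1, v - level * FULL_SCALE
        return level


# Constant input code of 100: the 3-level output should average to about 100/1024.
path = IntegralPath()
levels = [path.modulate(100) for _ in range(4096)]
mean_code = FULL_SCALE * sum(levels) / len(levels)
print(f"average output, in input codes: {mean_code:.2f}")
```

Because the truncation error is fed back with weights 2 and -1, the error seen at the output is shaped by $(1 - z^{-1})^2$, so its low-frequency content is small and the long-term average of the 3-level stream tracks the 11-bit input.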

When the frequency offset between the incoming data and the local oscillator is small, the proportional loop drives the PLL towards lock without cycle slipping. Phase lock is acquired by dithering the VCO between two frequencies  $(\pm \Delta F_P)$  [3]. In this design  $\Delta F_P$  is chosen to achieve approximately  $\pm 1500$ ppm of *lock-in range*. In the presence of a larger frequency error, the CDR cycle slips and the integral loop drives the VCO towards the data frequency in discrete steps. Unlike in an analog CDR, the discrete VCO control degrades the CDR's immunity to a long string of consecutive identical digits (CIDs). In this design, the frequency resolution is better than 7ppm, which results in the CDR's tolerance to more than 72,000 CIDs.
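The two figures quoted for CID tolerance are consistent under a simple drift model. As a rough check, assume (this is not stated in the text) that during a transition-free run the residual frequency error is at most one frequency step of 7ppm, so the sampling phase drifts by that fraction of a unit interval (UI) per bit:

$$\Delta\phi \approx N_{CID} \cdot \frac{\Delta f}{f} = 72{,}000 \times 7 \times 10^{-6}\ \mathrm{UI} \approx 0.5\ \mathrm{UI}.$$

Under this assumption, 72,000 CIDs correspond to an accumulated drift of about half a UI, which is the point at which the sampling instant would reach the data edge.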

#### IV. LINEAR ANALYSIS

In this section we present the linearized analysis of the proposed CDR. The grossly non-linear transfer characteristic of the !!PD mandates non-linear techniques to fully analyze the CDR behavior. However, it has been shown that the !!PD can be linearized in the presence of recovered clock jitter [4],[5]. The linearized gain  $K_{PD}$  of the !!PD in the presence of Gaussian clock jitter with standard deviation  $\sigma_j$  is equal to  $\frac{1}{\sqrt{2\pi}\,\sigma_j}$  [4]. The small-signal model of the proposed CDR using the linearized !!PD is shown in Fig. 4. The three sources of noise in a digital CDR, also depicted in



Fig. 4. Linearized CDR model.

Fig. 4, are self-noise of the !!PD  $(S_{Q_{BB}})$ , quantization error due to the finite resolution of the integral path  $(S_{Q_F})$ , and the phase noise of the VCO  $(S_{\Phi_{VCO}})$ . The loop gain  $LG(z^{-1})$  is equal to:

$$LG(z^{-1}) = \frac{K_{PD}K_{VCO}T}{1 - z^{-1}} \left(K_P + \frac{K_I z^{-1}}{1 - z^{-1}}\right) z^{-M} \tag{3}$$

The impact of each of the noise sources can be evaluated by simple transfer function analysis. For example, the contribution of the !!PD quantization error to the output phase noise is given by

$$S_{\Phi_{OUT}}|_{BB} = \left|\frac{1}{K_{PD}}\frac{LG(z^{-1})}{1 + LG(z^{-1})}\right|^2 S_{Q_{BB}} \tag{4}$$
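Eqs. (3) and (4) can be evaluated on the unit circle, $z = e^{j 2\pi f T}$, to see how the loop shapes the !!PD self-noise. The sketch below is illustrative only: T and M follow the values listed with Fig. 5, while the gain values are placeholders chosen to give a plausible loop, not the design values, and the function names are hypothetical.

```python
import cmath
import math


def loop_gain(freq_hz, T, k_pd, k_vco, k_p, k_i, M):
    """Evaluate the loop gain of Eq. (3) at z = exp(j*2*pi*f*T); freq_hz must be > 0."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz * T)
    integ = 1.0 / (1.0 - z_inv)  # 1 / (1 - z^-1)
    return k_pd * k_vco * T * integ * (k_p + k_i * z_inv * integ) * z_inv ** M


def bbpd_noise_gain(freq_hz, T, k_pd, k_vco, k_p, k_i, M):
    """Return |(1/K_PD) * LG / (1 + LG)|^2, the factor multiplying S_QBB in Eq. (4)."""
    lg = loop_gain(freq_hz, T, k_pd, k_vco, k_p, k_i, M)
    return abs(lg / (k_pd * (1.0 + lg))) ** 2


# T and M as listed with Fig. 5; all gains below are placeholder assumptions.
params = dict(T=625e-12, k_pd=5.0e10, k_vco=1.0, k_p=2.5e-3, k_i=2.5e-6, M=3)

# Closed-loop magnitude |LG/(1+LG)|^2, recovered by removing the 1/K_PD^2 scaling.
closed_low = bbpd_noise_gain(1e3, **params) * params["k_pd"] ** 2
closed_nyq = bbpd_noise_gain(0.5 / params["T"], **params) * params["k_pd"] ** 2

print(f"|LG/(1+LG)|^2 at 1 kHz:   {closed_low:.6f}")
print(f"|LG/(1+LG)|^2 at Nyquist: {closed_nyq:.6f}")
```

With the two integrations in the loop, the closed-loop response is close to unity at low frequency and falls off near the Nyquist frequency, so the !!PD self-noise passes to the output at low offsets and is attenuated at high offsets.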

The phase noise contribution of the other noise sources can be calculated in a similar fashion, and the results are illustrated in Fig. 5. The close-in phase noise is dominated by !!PD self-



Fig. 5. Output phase noise contribution from individual noise sources ( $\sigma_j = 7.5ps, \Delta F_P = 4MHz, \Delta F_I = 12MHz, M=3, T = 625ps$ ).

noise, while the shaped error in the integral path dominates at higher frequencies. In this design, the intrinsic phase noise of the VCO has little impact on the overall phase noise.

#### V. CIRCUIT DESIGN

The proposed CDR is a digital-intensive circuit. The digital building blocks such as adders can be built using simple digital logic or can be synthesized using standard cells. This section focuses on the design of the analog building blocks: the receiver frontend, the 3-level DAC, and the 4-stage ring oscillator. The receiver frontend circuitry, shown in Fig. 6, recovers data ( $R_{DATA}$ ) and performs bang-bang phase detection through early (E) and late (L) signals. Sense amplifiers are used as data and edge samplers.



Fig. 6. Data recovery and phase detection circuit.
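The early/late decision made from data and edge samples can be illustrated with the standard Alexander-style bang-bang logic. The text does not give the gate-level logic of Fig. 6, so the sketch below is an assumption: it takes two consecutive data samples and the edge sample taken between them, and the early/late naming follows the common convention that an edge sample equal to the older data bit means the clock sampled before the transition.

```python
from typing import Tuple


def early_late(prev_data: int, edge: int, curr_data: int) -> Tuple[int, int]:
    """Return (E, L) flags from two data samples and the edge sample between them."""
    if prev_data == curr_data:
        return (0, 0)  # no data transition, so no phase information
    if edge == prev_data:
        return (1, 0)  # edge sample still shows the old bit: clock is early
    return (0, 1)      # edge sample already shows the new bit: clock is late


print(early_late(0, 0, 1))  # (1, 0)
print(early_late(0, 1, 1))  # (0, 1)
print(early_late(1, 1, 1))  # (0, 0)
```

The three possible outcomes correspond to the 3-level (early/late/no transition) phase error described in the introduction.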

The VCO is implemented as a four-stage ring oscillator and employs the split-tuned differential cells shown in Fig. 7. Pseudo-differential inverters with rail-to-rail swing are used as delay elements. The external coarse control voltage ( $V_C$ ) is used to bring the VCO to within the pull-in range of the CDR. A duty cycle correcting buffer (not shown in Fig. 7) maintains an accurate duty cycle ( $50\% \pm 1.5\%$ ) under process, temperature, and voltage variations. The simulated phase noise of the VCO (including DACs) oscillating at 1.6GHz is -102dBc/Hz at 3MHz offset.

Fig. 7. 4-stage VCO employing split-tuned delay cell.

The fine control voltage  $V_F$  controls all four delay elements to preserve equal spacing between phases, thus making this architecture suitable for multi-phase clock recovery. The fine control voltage is generated by summing the proportional (PDAC) and integral (IDAC) paths in the current domain by 3-level DACs as shown in Fig. 8. The 3-level input -1, 0, +1 is converted to output



Fig. 8. DACs to generate fine control voltage  $V_F$ .

current  $(I_o)$  0, I and 2I respectively. Transistor  $M_3$  is used to minimize glitches due to clock feed-through and thereby reduce pattern jitter.
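The 3-level DAC mapping and the current-domain summation of the two paths can be expressed in a few lines. This is a behavioral sketch only; the unit currents are hypothetical values, since the text does not give the PDAC and IDAC current magnitudes.

```python
def three_level_dac(code: int, unit_current: float) -> float:
    """Map a 3-level code (-1, 0, +1) to an output current of 0, I, or 2I."""
    if code not in (-1, 0, 1):
        raise ValueError("code must be -1, 0, or +1")
    return (code + 1) * unit_current


def fine_control_current(p_code: int, i_code: int, i_prop: float, i_int: float) -> float:
    """Sum the proportional (PDAC) and integral (IDAC) currents."""
    return three_level_dac(p_code, i_prop) + three_level_dac(i_code, i_int)


# Hypothetical unit currents in amperes, chosen only for illustration.
I_PROP = 10e-6
I_INT = 2e-6
print(three_level_dac(-1, I_PROP), three_level_dac(0, I_PROP), three_level_dac(1, I_PROP))
print(fine_control_current(1, -1, I_PROP, I_INT))
```

Because the mid code maps to I rather than to zero, each DAC sits at its mid-level when there is no correction and moves by one unit current in either direction.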

#### VI. EXPERIMENTAL RESULTS

A test chip fabricated in a  $0.13\mu m$  CMOS process occupies  $0.1mm^2$  of active area and operates from a single-pin 1.2V power supply. The recovered data and the recovered clock operating at 1.6Gbps are shown in Fig. 9. Jitter of the recovered clock with  $2^7 - 1$  PRBS data is 8.9ps rms (Fig. 10), and this jitter increases to 9.9ps rms with  $2^{31} - 1$  PRBS data. The measured bit error rate (BER) is less than  $10^{-12}$  with about 50mV of received data amplitude and 600ppm frequency offset. Fig. 11 shows the recovered clock spectrum when the DSM is clocked at 200MHz and at 400MHz. This figure demonstrates, as expected, that clocking the DSM at the lower speed results in prohibitively large quantization noise leakage to the output. The measured jitter tolerance is greater than 2UI at 2MHz modulation frequency. The coarse tuning range of the VCO is 0.8-1.8GHz. The chip micrograph is shown in Fig. 12 and the CDR performance is summarized in Table I.



Fig. 9. Recovered data and clock.



Fig. 10. Recovered clock jitter.



Fig. 11. Recovered clock spectrum. (DSM clocked at 200MHz and 400MHz)



Fig. 12. Chip micrograph and performance summary table.

TABLE I. PERFORMANCE SUMMARY

| Parameter                            | Value                         |
|--------------------------------------|-------------------------------|
| Technology                           | $0.13 \mu m$ CMOS             |
| Supply Voltage                       | 1.2V                          |
| Operating Frequency                  | 0.8-1.8Gbps                   |
| Lock-in range                        | $\pm 1500 \mathrm{ppm}$       |
| Tracking range                       | $\pm 2500 \mathrm{ppm}$       |
| BER @ 1.6Gbps                        | $< 10^{-12}$                  |
| Input sensitivity                    | $< 50 \mathrm{mV_{pp}}$       |
| Jitter @ 1.6Gbps, $2^7 - 1$ PRBS     | 8.9ps rms                     |
| Jitter @ 1.6Gbps, $2^{31} - 1$ PRBS  | 9.9ps rms                     |
| Power consumption @ 1.6Gbps          | 12mW                          |
| Active Die Area                      | $0.1mm^2$                     |

#### VII. CONCLUSION

A digital CDR architecture that obviates the need for complex analog circuitry is presented. A digital loop filter with a fast feed-forward path and a delta-sigma controlled integral path is introduced. A prototype implemented in a  $0.13\mu m$  CMOS process operates at 1.6Gbps with a recovered clock jitter of 8.9ps rms and achieves a BER of less than  $10^{-12}$  while consuming 12mW from a 1.2V supply.

#### VIII. ACKNOWLEDGEMENTS

We thank Samsung Electronics for providing IC fabrication, V. Kratyuk for his help with the board design, and M. Brownlee for many useful discussions. This work was supported by Intel Corporation.

#### REFERENCES

- [1] J. Cao et al., "OC-192 transmitter and receiver in standard 0.18µm CMOS," *IEEE J. Solid-State Circuits*, vol. 37, pp. 1768-1780, Dec. 2002.
- [2] S. Anand, B. Razavi, "A 2.75 Gb/s CMOS clock recovery circuit with broad capture range," *ISSCC Dig. Tech. Papers*, pp. 214-215, Feb. 2001.
- [3] R. Walker et al., "A two-chip 1.5-GBd serial link interface," *IEEE J. Solid-State Circuits*, vol. 27, pp. 1805-1811, Dec. 1992.
- [4] J. Sonntag, J. Stonick, "A digital clock and data recovery architecture for multi-gigabit/s binary links," *Proc. of IEEE CICC*, pp. 532-539, Sep. 2005.
- [5] J. Lee, K. Kundert, B. Razavi, "Analysis and modeling of bang-bang clock and data recovery circuits," *IEEE J. Solid-State Circuits*, vol. 39, pp. 1571-1580, Sep. 2004.