# Timing Recovery in CMOS using Nonlinear Spectral-line Method

Un-Ku Moon, Angelo Mastrocola, Jeanann Alsayegh, and Scott Werner

AT&T Bell Laboratories, Allentown, PA 18103

ABSTRACT: In a Carrierless AM/PM (CAP) passband modulation scheme, the waveform does not contain a baud-rate spectral line because the data symbols have zero mean, so the nonlinear spectral-line method is applied to extract the symbol rate. A squaring function first draws out the higher moments of the signal that are periodic at the symbol rate; the signal is then processed through a bandpass filter and phase-locked loop (PLL) combination to recover signal timing. All necessary timing recovery functions are implemented in the analog (continuous-time) domain, and the recovered timing information is used by a receive equalizer. The timing recovery block, implemented in 0.9µm CMOS, includes a multiplier for the squaring function, a self-tuning bandpass filter, and a PLL using an external VCXO. The 51.84MHz recovered clock allows a BER of $10^{-10}$ [1].

## I. INTRODUCTION

This paper describes an IC implementation of timing recovery using the nonlinear spectral-line method [2]. This analog implementation of the timing recovery IC was fabricated in 0.9µm CMOS technology and fully verified to meet all the requirements for a system utilizing the Carrierless AM/PM (CAP) modulation scheme [1]. The following sections include a brief summary of the nonlinear spectral-line method (Section II), the overall structure and some sub-block details of the IC implementation (Section III), including the squarer, the self-tuned bandpass filter, and the PLL, and some meaningful measurement results (Section IV).

## II. NONLINEAR SPECTRAL-LINE METHOD

The key principle in nonlinear spectral-line timing recovery is that a memoryless nonlinearity, $f(\cdot)$, is applied to the incoming signal, $s(t)$, so that the expected value of the output, $E[r(t)]$, where $r(t)=f(s(t))$, is non-zero and periodic with the symbol period $T$ [2-5]. This is due to the inherent fact that if we treat the phase of the signal (signals in modulation schemes such as QAM, PAM, and CAP) as a random parameter, the signal represents a cyclostationary process. In the simpler case where $E[s(t)]$ is already non-zero and periodic with period $T$ (i.e. cyclostationary), the necessary spectral line at the symbol rate exists and a nonlinearity does not need to be applied. Such a condition qualifies as the linear spectral-line method [2].

Typical ways of applying a memoryless nonlinearity to the zero-mean input signal are a class of absolute-value circuits (i.e. rectifiers), a squarer, and even fourth-power circuits [6]. Regardless of the method used, it must elicit the cyclostationary behavior that is contained in the higher moments of the incoming signal. Perhaps the most common and clearly explained implementation is the squarer. The IC implementation uses this method, sometimes known as envelope-derived timing recovery [2-3].

In regard to understanding the function of the memoryless nonlinearity in the context of timing recovery, it is generally "safe" to dismiss any distinction between different in-phase and quadrature data-transmission systems such as QAM and CAP, which can fall under passband PAM [2], and further to dismiss any differences between baseband and passband PAM [2-3]. This is well justified given that some sort of band-limiting pre-filter normally exists (typically as part of the transmission path), and the output of the nonlinear function, $f(\cdot)$, is filtered by a bandpass filter.
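
As a rough numerical illustration of the squaring principle described above, the sketch below builds a zero-mean baseband PAM signal and compares the component at the symbol rate before and after squaring. It is not a model of the IC: the binary symbols, the Hann-shaped pulse spanning two symbol periods, the 8 samples per symbol, and the helper names `pam_waveform` and `tone_amplitude` are all assumptions chosen to keep the example short.

```python
import cmath
import math
import random


def tone_amplitude(x, cycles_per_sample):
    """Amplitude of the sinusoidal component of x at a normalized frequency."""
    acc = sum(v * cmath.exp(-2j * math.pi * cycles_per_sample * n)
              for n, v in enumerate(x))
    return 2.0 * abs(acc) / len(x)


def pam_waveform(symbols, sps):
    """s(t) = sum_n a_n g(t - nT), with an assumed Hann pulse g(t) of length 2T."""
    pulse = [0.5 * (1.0 - math.cos(math.pi * k / sps)) for k in range(2 * sps)]
    s = [0.0] * ((len(symbols) + 1) * sps)
    for n, a in enumerate(symbols):
        for k, g in enumerate(pulse):
            s[n * sps + k] += a * g
    return s


random.seed(1)   # fixed seed so the run is repeatable
SPS = 8          # samples per symbol period T (assumed)
N_SYM = 4000     # number of random symbols (assumed)

# Zero-mean, uncorrelated binary data: E[s(t)] = 0, so no line at 1/T.
symbols = [random.choice((-1.0, 1.0)) for _ in range(N_SYM)]
s = pam_waveform(symbols, SPS)

# Memoryless nonlinearity f(.) chosen as a squarer: r(t) = s(t)^2.
r = [v * v for v in s]

line_before = tone_amplitude(s, 1.0 / SPS)
line_after = tone_amplitude(r, 1.0 / SPS)
print(f"symbol-rate component before squaring: {line_before:.4f}")
print(f"symbol-rate component after squaring:  {line_after:.4f}")
```

For this assumed pulse and binary data, the expected value of the squared signal works out to $0.75 + 0.25\cos(2\pi t/T)$, so the line measured after squaring should sit near 0.25, while the unsquared signal shows essentially no component at the symbol rate.
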
The timing recovery IC operates on a received passband signal

$$s(t) = \operatorname{Re} \left\{ \sum_{n=-\infty}^{\infty} c_n g(t-nT) e^{j\omega_c t} \right\},$$

where $c_n = a_n + jb_n$ is the complex input data sequence and $g(t)$ is the filtered baseband pulse shape modulated by the carrier $\omega_c$. The squared signal, $r(t) = s(t)^2$, has a desired timing tone (symbol rate $\omega_s$) whose power is [3]

$$R(\omega)\big|_{\omega=\omega_s} = \sum_{k=0}^{N-1} |C(k\Omega)|^2 G(k\Omega) G(\omega_s - k\Omega).$$ (1)

In (1), $C(k\Omega) = \sum_{n=0}^{N-1} c_n e^{-jk\Omega nT}$, with $\Omega = \frac{\omega_s}{N}$, is the DFT of the complex input data sequence $c_n$ with period $N$, and $G(\omega)$ is the spectrum of $g(t)$. The DFT is invoked because of the periodic nature of $c_n$. However, in the limiting case of perfectly random, uncorrelated $c_n$, $N \to \infty$ and $|C(\omega)|^2 \to \text{constant}$, and (1) approaches the form

$$R(\omega)\big|_{\omega=\omega_s} = \overline{c^2} \int_{-\infty}^{\infty} G(\omega) G(\omega_s - \omega)\, d\omega.$$ (2)

From Fig. 1 and (2), it is clear that the amount of overlap between $G(\omega)$ and $G(\omega_s - \omega)$ directly affects the strength of the desired timing tone. This spectral overlap is a function of the excess bandwidth parameter $\alpha$, which can vary from 0 to 1 (0 to 100%). Note that in the extreme case of $\alpha=0$, $G(\omega)$ and $G(\omega_s - \omega)$ are disjoint, and no timing tone is generated. In our applications, signals are transmitted with 100% [1] and 50% excess bandwidth.

Fig. 1 Effect of excess bandwidth on symbol tone extraction

Fig. 2 Overall timing recovery structure

## III. TIMING RECOVERY IMPLEMENTATION

The overall structure of the IC implementation is shown in Fig. 2. This timing recovery is realized in the continuous-time domain (except for the inherent sampling nature of the phase detector).
To minimize sensitivity to the surrounding noisy environment, all circuits are fully differential wherever possible. As discussed in the preceding section, the squaring function is used to apply a memoryless nonlinearity to the incoming signal. The subsequent bandpass filter eliminates a large amount of undesired tones (noise) surrounding the symbol frequency (12.96MHz). The input to the PLL is a slicer which turns this "noisy tone" into a "jittery clock" (which can have missing or extra edges) at the symbol rate; the PLL locks onto this jittery input clock and filters out its jitter using an external VCXO. After the symbol timing is recovered, the jitter-free clock is fed into an application-specific receiver which processes the incoming CAP signal.

### A. Squarer (Multiplier)

The squaring function is implemented with a set of cross-coupled MOSFETs operating in triode [7], as shown in Fig. 3. The input and output buffers are source-follower stages, which remove the resistive load requirement from the previous stage and provide a large enough driving capability at the output. The op amp symbol represents a differential single-transistor gain stage with simple common-mode feedback.

Fig. 3 Squaring function by using a multiplier

This portion of the design was kept simple to achieve a large bandwidth, so that the squaring function would not be hindered by limited bandwidth. According to simulation results, the overall signal path has more than 100MHz bandwidth for either input (X or Y). The bandwidth of the squaring/multiplying function is limited only by the input buffers and the cross-coupled MOSFETs (current output). The remaining bandwidth limitation acts only as a spectrum-shaping function, which needs only a 12.96MHz bandwidth in this application. For the set of cross-coupled MOSFETs alone, even a gigahertz range of input bandwidth can be achieved by placing an enhancement capacitor across the virtual grounds [8].
Lastly, the feedback resistors are implemented with triode MOSFETs to match the input triode devices for superior gain stability over process and temperature variation.

### B. Self-tuned Bandpass Filter

The bandpass filter and the automatic tuning circuit structures are similar to Khoury's Gm-C filter design [9]. For this application, the two blocks used for the voltage-controlled oscillator (VCO) and the bandpass filter are identical. For the VCO, the signal input is disabled, and for the filter, the positive feedback is disabled. This should allow better matching between master (VCO) and slave (filter) in achieving accurate automatic tuning.

The VCO oscillates due to the positive feedback at the signal's zero crossing [9]. This is accomplished by the limited range in the transconductance of the positive feedback input. At larger signal swings of either polarity, the positive feedback is removed until the signal swings back near zero. At the zero crossing, a net positive feedback is realized. One can visualize the operation by picturing a pendulum swinging back and forth at its natural frequency ($\frac{G_m}{C}$) while a small horizontal force is applied in the direction of movement at the instant when the arm is lined up vertically.

Built into these VCO and filter biquad blocks is a small range of digital tunability, should it be needed for improved tuning accuracy. Figure 4 displays the measured frequency response of the bandpass filter for a few digital settings. For the few dozen devices characterized, the center frequency of the filter at its default setting was within $\pm 5\%$ of 12.96MHz.

Fig. 4 Measured bandpass filter frequency response

The quality factor (Q) and the frequency accuracy ($f_c$) of the bandpass filter directly affect the final jitter of the recovered clock. A smaller Q requires a better PLL implementation (more noise gets through). A larger Q filters out more noise, but requires a more accurate $f_c$ (or else we lose the timing tone).
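
The Q versus $f_c$ tradeoff above can be put into rough numbers with the standard second-order (biquad) bandpass magnitude response. The sketch below is only illustrative: the Q values and the 5% detuning are assumed operating points (the latter borrowed from the measured $\pm 5\%$ spread), and `bandpass_gain_db` is a hypothetical helper, not part of the design flow.

```python
import math


def bandpass_gain_db(f_hz, f_center_hz, q):
    """Gain in dB of a second-order bandpass, normalized to 0 dB at f_center_hz."""
    x = f_hz / f_center_hz - f_center_hz / f_hz
    return -10.0 * math.log10(1.0 + (q * x) ** 2)


F_SYMBOL = 12.96e6   # symbol-rate timing tone, Hz
DETUNE = 1.05        # assumed filter center 5% above the tone

for q in (5, 10, 50):  # assumed quality factors
    tone_loss_db = bandpass_gain_db(F_SYMBOL, DETUNE * F_SYMBOL, q)
    bandwidth_hz = F_SYMBOL / q  # -3 dB bandwidth at the nominal center
    print(f"Q={q:2d}: bandwidth {bandwidth_hz / 1e3:7.1f} kHz, "
          f"tone attenuation {tone_loss_db:6.2f} dB")
```

Under these assumptions, a 5% center-frequency error attenuates the timing tone by about 3dB at Q=10 but by about 14dB at Q=50, while the -3dB bandwidth shrinks from roughly 1.3MHz to roughly 0.26MHz.
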
Such tradeoffs must be considered in optimizing performance over process, temperature, and matching tolerances. Reported discrete (manually tuned) implementations use Q values of up to 50 [5], which is impractical for IC design. Some pre-filtering techniques (before the squarer) can further aid in reducing the jitter [2-3],[5].

### C. Phase-locked Loop

The PLL requires a very narrow-band jitter transfer function; it must filter out the majority of the wide-band jitter still contained in the signal processed through the squarer and the bandpass filter. Use of a VCXO with a very narrow tuning range aids in this matter. An AT&T 154-type quartz crystal oscillator (or equivalent) with a tuning range of ±100ppm for a 1-4V input control range (5V supply) is used to make up the complete PLL.

As shown in Fig. 2, the input to the PLL is first squared up by a high-speed slicer (zero-threshold comparator). The phase detector inside the PLL block is a simple XOR gate. It is important to use a non-edge-triggered phase detector because the input to the phase detector is very jittery, with multiple edges (or no edge) per symbol period. Simulation results as well as lab measurements have verified this phenomenon. The buffered XOR output drives a charge pump, whose output is filtered by external components. An all-digital implementation similar to [10] was also verified to function properly both in simulation and measurement, but the XOR and charge-pump combination was chosen for its simplicity and smaller chip area.

## IV. MEASUREMENT RESULTS

Presented in the following figures are the measurement results, verifying the intended and anticipated performance of the timing recovery IC. Extensive simulation results prior to fabrication have shown close agreement with the measurements. Simulations were performed using a combination of C code, MATLAB, and AT&T's version of SPICE (ADVICE).
The channel to the timing recovery block was modeled with a pseudo-random number generator for the data sequence, in-phase and quadrature FIR filters for generating the CAP-16 signal, an FIR-modeled transmission line, an all-passive LC smoothing filter, and additive white Gaussian noise (AWGN) yielding an SNR of 20dB.

The measured spectrum of the incoming signal for a 50% excess bandwidth system is shown in Fig. 5. The lab measurement displaying the regeneration of the symbol tone by squaring and filtering is shown in Fig. 6. With the help of a narrowly tunable VCXO, the measured jitter transfer function of the PLL exhibits a -3dB corner frequency of about 500Hz. Fig. 7 displays the PLL transfer function and its measured output jitter spectrum, which follow a similar shape over frequency. Finally, shown in Fig. 8 and 9, respectively, are the recovered clock used by a CAP-16 receive equalizer and the constellation of the demodulated, received data that was sent over a maximum-length (worst-case) test link. The recovered clock, measuring about 1.4ns peak-to-peak jitter, meets the BER requirement of $10^{-10}$ for an application such as [1].

Fig. 5 Measured spectrum of incoming CAP-16 signal

Fig. 6 Measured spectrum after squarer and filter

Fig. 7 Measured jitter transfer function and spectrum

Fig. 8 Measured recovered clock at sample rate (51.84MHz)

Fig. 9 Measured received CAP-16 constellation

## V. CONCLUSION

A timing recovery scheme using a nonlinear spectral-line method was presented. An all-analog implementation in $0.9\mu m$ CMOS technology successfully demonstrates the required performance for an application using Carrierless AM/PM [1]. Justification for the IC's applicability in all synchronous multi-level pulse amplitude modulation systems has been summarized. The three key functional blocks in this timing recovery structure (squarer, bandpass filter, and PLL) were described, and design tradeoffs were discussed.
Some ways of improving the timing recovery process, such as prefiltering before the squarer and a higher-Q post-filter with a more accurate $f_c$, have been mentioned.

## VI. ACKNOWLEDGMENT

The authors are thankful for much help provided by Nat Dwarakanath and his team, Art Grandle and his team, Derrick Johnson, Jit Kumar and his team, Dale Nelson, Gerry Pepenella and his team, Jacque Ruch, Joe Trackim, J.J. Werner and his team, and Jay Zeman.

## REFERENCES

- [1] G.H. Im, D.D. Harman, G. Huang, A.V. Mandzik, M.H. Nguyen, and J.J. Werner, "51.84 Mb/s 16-CAP ATM LAN standard," *IEEE J. Select. Areas Commun.*, vol. 13, no. 4, pp. 620-632, May 1995.
- [2] E.A. Lee and D.G. Messerschmitt, *Digital Communication*, Second Edition, Boston: Kluwer Academic Publishers, 1994.
- [3] R.D. Gitlin and J.F. Hayes, "Timing recovery and scramblers in data transmission," *Bell Syst. Tech. J.*, vol. 54, no. 3, pp. 569-593, March 1975.
- [4] W.R. Bennett, "Statistics of regenerative digital transmission," *Bell Syst. Tech. J.*, vol. 37, pp. 1501-1542, Nov. 1958.
- [5] L.E. Franks and J.P. Bubrouski, "Statistical properties of timing jitter in a PAM timing recovery scheme," *IEEE Trans. Commun.*, vol. COM-22, pp. 913-920, July 1974.
- [6] J.E. Mazo, "Jitter comparison of tones generated by squaring and by fourth-power circuits," *Bell Syst. Tech. J.*, vol. 57, pp. 1489-1498, May-June 1978.
- [7] B.-S. Song, "CMOS RF circuits for data communications applications," *IEEE J. Solid-State Circuits*, vol. SC-21, no. 2, pp. 310-317, April 1986.
- [8] J. Crols and M.S.J. Steyaert, "A 1.5 GHz highly linear CMOS down-conversion mixer," *IEEE J. Solid-State Circuits*, vol. 30, no. 7, pp. 736-742, July 1995.
- [9] J. Khoury, "Design of a 15-MHz CMOS continuous-time filter with on-chip tuning," *IEEE J. Solid-State Circuits*, vol. 26, no. 12, pp. 1988-1997, Dec. 1991.
- [10] J.F. Ewen, A.X. Widmer, M. Soyuer, K.R. Wrenner, B. Parker, and H.A. Ainspan, "Single-chip 1062Mbaud CMOS transceiver for serial data communication," *ISSCC Dig. Tech. Papers*, pp. 32-33, Feb. 1995.