# An All-Digital PLL Synthesized from a Digital Standard Cell Library in 65nm CMOS

Youngmin Park and David D. Wentzloff University of Michigan, Ann Arbor, MI, 48109, USA

Abstract—This paper presents an all-digital PLL (ADPLL) in which all functional blocks have been synthesized from standard digital cells and automatically placed and routed (P&R). A calibration scheme is proposed to account for the systematic mismatch resulting from P&R. The ADPLL is fabricated in 65nm CMOS and occupies  $0.042 \text{mm}^2$ . The period jitter is  $3.2 \text{ps}_{\text{rms}}$ (36ps<sub>pp</sub>) at 2.5GHz, and the power consumption is 9.1mW to 14.6mW over a 1.5 to 2.7GHz frequency range.

## I. INTRODUCTION

CMOS scaling into the nanometer regime has resulted in improved timing accuracy, power, and density of digital logic gates, while analog circuits suffer from reduced supply voltage and increased gate leakage [1-3]. As a result, some traditionally analog-only circuits have progressed to digitally-assisted designs, and more recently to all-digital designs, that exploit precise time control and sophisticated digital signal processing. Recently published all-digital phase-locked loops (ADPLLs) have shown several advantages over conventional analog PLLs in terms of area, scalability, testability, and programmability [4-12]. In many ADPLLs, conventional structures such as phase detectors and analog filters are replaced with time-to-digital converters (TDCs) and digital loop filters, thereby significantly reducing area and providing higher compatibility with other digital blocks. Also, the digital interface between functional blocks in the ADPLL improves testability and programmability of the circuits.

Another advantage of an all-digital architecture is that more of the circuit blocks can be absorbed into the *digital design flow*. The design procedure for digital logic circuits is now highly sophisticated, with the synthesis, layout and verification of the circuits being automated with design tools. Analog circuits, on the other hand, require comprehensive characterization of devices to achieve a target performance. Moreover, the performance is highly dependent on the layout, so a full-custom layout is required. Therefore, it is commercially advantageous to implement more analog functions digitally and to automate the design of the circuits with CAD tools, sometimes even at the expense of degraded performance.

In Section II, this paper presents an ADPLL suitable for clock synthesis for digital systems such as microprocessors. What differentiates this ADPLL from prior work is that all functional blocks are synthesized from standard digital cells and automatically placed and routed (P&R) using design tools. The design procedure of this completely *synthesizable* ADPLL significantly shortens the design time compared to full-custom circuits, and it enhances the portability of the ADPLL as a block for various applications. The proposed cell-based design methodology, which leverages standard cell engineering and design automation, becomes even more attractive in advanced technologies that have severely restrictive design rules [13].

Automatic P&R, however, introduces variation in the placement and routing of cells in the circuits. This is a particular challenge in the design of precision ADPLL blocks such as the digitally controlled oscillator (DCO) and the TDC. In Section III, this paper introduces a calibration scheme that accounts for systematic mismatch from automatic layout of the DCO, which is applicable to the ADPLL in closed-loop operation, and is also applied to the DCO-based TDC [14]. Through the proposed calibration scheme, the systematic mismatch in delays is measured only once and then applied to improve the DCO and TDC performance.

## II. ADPLL ARCHITECTURE

A block diagram of the proposed ADPLL is shown in Fig. 1. The TDC compares the edges of the reference clock ($F_{ref}$) and the divided output clock ($F_{div}$), and provides 16-bit phase difference measurements to the digital loop filter (DLF). The proposed TDC operates in two steps, a coarse step and a fine step, producing 8-bit MSBs and 8-bit LSBs, respectively. The prescaler combines this output by weighting the MSBs and the LSBs separately, making a 16-bit input to the DLF. This enables floating-point processing of the TDC output, and provides more programmability of the coarse/fine step resolutions. The DLF consists of a proportional path and an integral path, each with programmable coefficients. The 3-bit MSBs of the DLF output determine whether the DCO controller increases, decreases, or holds the DCO frequency. The 8-bit LSBs of the DLF output are used to dither the DCO frequency to reduce the effect of limit cycles. The dithering block is clocked at a higher frequency of $F_{out}/4$. Finally, $F_{out}$ is divided by an integer N (1-to-511), and compared with $F_{ref}$. In calibration mode, on-chip counters are used for one-time characterization of the TDC and the DCO, which is later used to control them in locking mode.
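As a rough behavioral illustration of the prescaler and DLF described above, the following Python sketch weights the coarse and fine TDC counts separately and feeds the result to a proportional-integral filter. The names (`prescale`, `PILoopFilter`, `kp`, `ki`) and all numeric values are assumptions made for illustration; bit widths, saturation, and the actual coefficient encoding of the fabricated design are not modeled.

```python
from dataclasses import dataclass


def prescale(coarse_count: int, fine_count: int,
             coarse_weight: int, fine_weight: int) -> int:
    """Combine coarse (MSB) and fine (LSB) TDC counts into one phase error.

    The weights are hypothetical stand-ins for the programmable weighting
    mentioned in the text.
    """
    return coarse_count * coarse_weight + fine_count * fine_weight


@dataclass
class PILoopFilter:
    """Behavioral proportional-integral filter (no saturation, no bit-width limits)."""
    kp: int              # proportional coefficient (assumed value)
    ki: int              # integral coefficient (assumed value)
    integrator: int = 0  # running sum of ki * phase_error

    def update(self, phase_error: int) -> int:
        # Integral path accumulates the error; proportional path reacts immediately.
        self.integrator += self.ki * phase_error
        return self.kp * phase_error + self.integrator


# Example run with made-up TDC counts and weights.
dlf = PILoopFilter(kp=4, ki=1)
for coarse, fine in [(2, 3), (1, 5), (0, 2)]:
    error = prescale(coarse, fine, coarse_weight=20, fine_weight=1)
    print(error, dlf.update(error))
```

Choosing the coarse weight close to the ratio of the coarse to fine step resolutions would make the combined value roughly proportional to the measured time difference, which is one plausible reason for weighting the two fields separately.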



Fig. 1. Block diagram of proposed ADPLL.

### A. Cell-based DCO

All blocks, including the DCO, are constructed only of standard cells and synthesized from a cell library. Fig. 2 shows the block diagram of the DCO and the DCO control blocks. The DCO consists of 5 stages, and each stage is implemented with 64 inverting tri-state buffers connected in parallel (63 buffers and one NAND gate in Stage1). The maximum frequency is obtained when all buffers are turned on, and the frequency is reduced by turning off buffers. Instead of relying on custom layout to achieve matched delays [6-8], the buffers in the DCO are automatically laid out by design tools. This introduces systematic mismatch between buffer delays, which was found through measurements to be dominated by wire routing variation. In calibration mode, the effective drive strength of each buffer is measured, capturing the relative differences between buffers. Then, buffers are sorted based on drive strength, and the reordered indexes are stored in a local memory. The DCO controller decodes one row of the memory each cycle, and turns the indexed buffers on or off according to the DLF output. This reordering procedure provides a coarse/fine frequency control, and improves the resolution of the DCO. The dithering block generates 2-bit control signals to pseudo-randomly turn on/off the buffers at a rate proportional to the 8 LSBs of the DLF output. Two buffers in Stage2 and Stage3 are dedicated to dithering the DCO frequency.
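The dithering behavior can be illustrated with a small Python sketch in which one dither buffer is enabled with probability equal to the 8-bit fractional code divided by 256. This is only a software stand-in: the pseudo-random source used on chip is not described in the text, and the function names, the seed, and the single-bit output (rather than the 2-bit control mentioned above) are assumptions.

```python
import random


def dither_stream(lsb_code: int, n_cycles: int, seed: int = 0) -> list[int]:
    """Return n_cycles on/off decisions for one dither buffer (1 = on).

    Each cycle the buffer is enabled with probability lsb_code / 256, so the
    average enable rate tracks the 8-bit fractional code.
    """
    if not 0 <= lsb_code <= 255:
        raise ValueError("lsb_code must fit in 8 bits")
    rng = random.Random(seed)  # assumed stand-in for an on-chip pseudo-random source
    return [1 if rng.randrange(256) < lsb_code else 0 for _ in range(n_cycles)]


def average_rate(stream: list[int]) -> float:
    """Fraction of cycles in which the dither buffer is enabled."""
    return sum(stream) / len(stream)


# A code of 64 should enable the buffer in roughly a quarter of the cycles.
print(average_rate(dither_stream(64, 20000)))
```

Averaged over many cycles, the buffer contributes a fraction of its incremental period, which is how dithering can interpolate between two adjacent DCO settings.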

### B. Cell-based TDC

We use a DCO-based Vernier TDC to measure the phase difference between $F_{ref}$ and $F_{div}$; the details of the TDC architecture and operation are described in [14]. As shown in Fig. 3, the TDC employs two DCOs with slightly different periods, and the phase difference is measured in two steps. First, the coarse time difference is measured by counting the number of slow DCO cycles between the two rising edges. Second, the residue of the time difference is measured by counting the number of fast DCO cycles during the time it takes the fast DCO to catch up to the slow DCO. The coarse step resolution is the slow DCO period, and the fine step resolution is the difference between the fast and slow DCO periods. The two-step measurement and ring structure allow for an extended detection range, and the Vernier structure using two DCOs provides a fine resolution. We adopt a DCO structure similar to that shown in Fig. 2, and apply a similar ordering of buffers based on effective drive strength. The measured coarse step resolution can be tuned between 160ps and 1ns, and the fast and slow DCOs are controlled together for the fine step resolution, which can be as low as 8ps.

Fig. 2. Block diagram of DCO and DCO control block.

Fig. 3. Block diagram of TDC and coarse/fine steps.
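The two-step measurement described above can be sketched with an idealized Python model. It assumes the slow DCO starts on the first rising edge and the fast DCO starts on the second, ignores jitter and start-up delay, and uses integer picoseconds. The function names and the example periods (160ps and 152ps, chosen to give an 8ps fine step) are illustrative assumptions; the actual circuit operation is described in [14].

```python
def vernier_measure(delta_t_ps: int, t_slow_ps: int, t_fast_ps: int) -> tuple[int, int]:
    """Return (coarse_count, fine_count) for a time difference in picoseconds."""
    if not 0 < t_fast_ps < t_slow_ps:
        raise ValueError("need 0 < t_fast_ps < t_slow_ps")
    if delta_t_ps < 0:
        raise ValueError("delta_t_ps must be non-negative")
    # Coarse step: whole slow-DCO periods that fit between the two edges.
    coarse_count, residue_ps = divmod(delta_t_ps, t_slow_ps)
    # Fine step: the fast DCO gains (t_slow - t_fast) per cycle on the slow DCO,
    # so it catches up after ceil(residue / step) cycles.
    step_ps = t_slow_ps - t_fast_ps
    fine_count = -(-residue_ps // step_ps)  # integer ceiling division
    return coarse_count, fine_count


def reconstruct_ps(coarse_count: int, fine_count: int,
                   t_slow_ps: int, t_fast_ps: int) -> int:
    """Time estimate: coarse counts weigh T_slow, fine counts weigh T_slow - T_fast."""
    return coarse_count * t_slow_ps + fine_count * (t_slow_ps - t_fast_ps)


coarse, fine = vernier_measure(1003, t_slow_ps=160, t_fast_ps=152)
print(coarse, fine, reconstruct_ps(coarse, fine, 160, 152))
```

In this model the quantization error is bounded by one fine step, while the measurable range is limited only by how many slow cycles the coarse counter can hold.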

## III. CALIBRATION SCHEME UTILIZING SYSTEMATIC MISMATCH

Fig. 4 illustrates the DCO control scheme to address systematic mismatch caused by automatic P&R. The systematic mismatch is measured in terms of incremental period, which is defined as the increase in period when only one buffer is turned off. Each buffer has a unique incremental period value, which indicates the effective drive strength of the buffer. The measured incremental periods for the 62 buffers in Stage1 of the DCO are shown in Fig. 4. The buffers in each stage are reordered based on the incremental periods, and the resulting order is stored in the memory while in calibration mode. When the ADPLL is in locking mode, the DCO controller turns off the buffers following the order stored in memory. During coarse control, 5 buffers at a time are turned off from the top of the memory, which holds the buffers with the highest drive strengths. During fine control, only one buffer at a time is turned off from the bottom of the memory, which holds the buffers with the lowest drive strengths. The coarse control range is programmable by the starting memory address, which enables fast locking at a target frequency range. The fine control achieves higher resolution by turning off buffers with the lowest drive strength. Fig. 4 shows the measured frequency response for 3 different coarse settings (branches off the steep slope), with fine-tuning applied afterwards (shallow slopes). The measured fine-frequency resolution is 3MHz/step at 2.5GHz.

Fig. 4. Calibration algorithm utilizing systematic mismatch.

This coarse/fine frequency control is made possible by the systematic mismatch introduced by automatic P&R. If the buffers and wiring were completely matched, the incremental periods of the buffers would all be the same, and a coarse/fine tuning would not be possible. Instead, the frequency would be determined only by the absolute number of enabled buffers, regardless of which ones were used. The proposed calibration scheme therefore leverages the systematic wiring mismatch by reordering buffers to achieve better resolution at desired coarse frequency bands.
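The reordering and coarse/fine selection can be sketched in Python as follows. The sketch assumes that a larger incremental period corresponds to a higher effective drive strength and that the period increase of several disabled buffers is, to first order, the sum of their individual incremental periods. The function names and the 12 incremental-period values (in femtoseconds) are made up for illustration and are not measured data.

```python
def calibrate_order(incremental_periods: list[int]) -> list[int]:
    """Sort buffer indexes by incremental period, strongest buffer first."""
    return sorted(range(len(incremental_periods)),
                  key=lambda i: incremental_periods[i], reverse=True)


def buffers_to_turn_off(order: list[int], coarse_steps: int, fine_steps: int,
                        coarse_group: int = 5) -> list[int]:
    """Pick buffers to disable: groups of 5 from the top, single buffers from the bottom."""
    n_coarse = coarse_steps * coarse_group
    if n_coarse + fine_steps > len(order):
        raise ValueError("not enough buffers for the requested setting")
    strongest = order[:n_coarse]
    weakest = order[len(order) - fine_steps:]
    return strongest + weakest


def period_increase(incremental_periods: list[int], off_buffers: list[int]) -> int:
    """First-order estimate: sum of incremental periods of the disabled buffers."""
    return sum(incremental_periods[i] for i in off_buffers)


# Hypothetical incremental periods in femtoseconds (not measured values).
inc_fs = [1200, 400, 2000, 900, 300, 1600, 700, 1100, 500, 1800, 600, 1400]
order = calibrate_order(inc_fs)
for coarse, fine in [(1, 0), (1, 1), (1, 2)]:
    off = buffers_to_turn_off(order, coarse, fine)
    print(coarse, fine, period_increase(inc_fs, off))
```

With these example values, one coarse step changes the period by far more than one fine step. If every entry of `inc_fs` were equal, each disabled buffer would contribute the same amount, and the distinction between coarse and fine steps would disappear, as the paragraph above argues.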

To assess the sensitivity of the synthesized ADPLL to environmental variations, the DCO frequency control is measured over PVT variation as shown in Fig. 5. The DCO frequency varies less than 10% and 15% over the voltage and temperature variation, respectively. The maximum DCO frequencies of 5 chips are also measured to assess process variation. The mean of the maximum frequency is 2.9GHz with a standard deviation of 37.4MHz. The above PVT variation is within a few coarse control steps in the proposed calibration scheme, and can be overcome by the broad DCO frequency range and its programmability when the ADPLL operates in closed-loop.

The proposed calibration scheme requires a consistent *order* of the buffers over supply voltage and temperature variations, since the buffer indexes are reordered and stored in the memory while in calibration mode, and remain fixed during the ADPLL locking mode. Since the systematic mismatch is dominant in the automatically P&R-ed DCO, the *order* of the buffers' drive strengths is relatively consistent over environmental variations, as shown in Fig. 6 by the same relative shapes of the incremental periods, resulting in the same sorted order. Thus, the reordering process can be done once per chip, and applied to the frequency control.

Fig. 5. Measured DCO frequency control over (a) voltage, (b) temperature, and (c) process variation.

Fig. 6. Measured incremental period of buffers in Stage1 over (a) voltage, (b) temperature, and (c) process variation.
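One way to check the order-consistency requirement is to compare the sorted buffer order obtained from measurements taken under different conditions. The Python sketch below does this for hypothetical data; the function names and all values are illustrative assumptions, and a larger incremental period is again assumed to mean a stronger buffer.

```python
def drive_order(incremental_periods: list[int]) -> list[int]:
    """Buffer indexes sorted from largest to smallest incremental period."""
    return sorted(range(len(incremental_periods)),
                  key=lambda i: incremental_periods[i], reverse=True)


def order_is_stable(measurements: list[list[int]]) -> bool:
    """True if every measurement set yields the same sorted buffer order."""
    reference = drive_order(measurements[0])
    return all(drive_order(m) == reference for m in measurements[1:])


# Hypothetical incremental periods for four buffers under three conditions.
nominal = [1200, 400, 2000, 900]
low_voltage = [1500, 500, 2500, 1125]  # all delays scaled up together
hot = [1300, 450, 2150, 980]
print(order_is_stable([nominal, low_voltage, hot]))
```

When the mismatch is dominated by a fixed systematic component, environmental changes mostly scale all delays together, so the sorted order is preserved and a single calibration remains valid.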

## IV. MEASURED ADPLL PERFORMANCE

The ADPLL was fabricated in 65nm CMOS, and the micrograph is shown in Fig. 7. The active area of the ADPLL is $185 \times 190\,\mu\text{m}^2$, and the memory occupies an additional area of $90 \times 85\,\mu\text{m}^2$. The ADPLL operates at 1.1V, and the output frequency locks from 1.5GHz to 2.7GHz by controlling $F_{ref}$ and the divider N, which ranges from 1 to 511. In this measurement, $F_{ref}$ is set to 10MHz, and N is programmed to be 250 to lock the frequency at 2.5GHz. Fig. 8 shows the measured output clock signal and the period jitter histogram at 2.5GHz. The period jitter is the main figure of merit for clock synthesis, and it is measured with an oscilloscope. Over 275K samples, the RMS and peak-to-peak jitter are $3.2\,\text{ps}_{\text{rms}}$ and $36\,\text{ps}_{\text{pp}}$, respectively.
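The lock condition and the jitter figures quoted above can be put in perspective with a short Python calculation. It assumes the usual integer-N relation $F_{out} = N \times F_{ref}$ and expresses the measured jitter as a fraction of the output period; the helper names are hypothetical.

```python
def divider_for(f_out_hz: int, f_ref_hz: int, n_max: int = 511) -> int:
    """Integer divider N such that f_out = N * f_ref, within the 1..n_max range."""
    n, remainder = divmod(f_out_hz, f_ref_hz)
    if remainder != 0 or not 1 <= n <= n_max:
        raise ValueError("target is not reachable with an integer divider in range")
    return n


def period_ps(f_hz: int) -> float:
    """Clock period in picoseconds."""
    return 1e12 / f_hz


n = divider_for(2_500_000_000, 10_000_000)  # 2.5GHz output from a 10MHz reference
t_out_ps = period_ps(2_500_000_000)
rms_fraction = 3.2 / t_out_ps               # measured RMS period jitter
pp_fraction = 36 / t_out_ps                 # measured peak-to-peak period jitter
print(n, t_out_ps, rms_fraction, pp_fraction)
```

At 2.5GHz the output period is 400ps, so the measured RMS jitter corresponds to under 1% of the period and the peak-to-peak jitter to 9%.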

Fig. 7. Die micrograph of ADPLL.

Fig. 8 also shows the power consumption of the ADPLL. It dissipates 4.6mW/GHz over the frequency range with an offset of 2.2mW. The performance of the ADPLL is summarized in Table I.
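The linear power relation stated above can be written as a one-line model. The Python sketch below uses the slope and offset from the text; the function and constant names are illustrative.

```python
SLOPE_MW_PER_GHZ = 4.6  # measured slope from the text
OFFSET_MW = 2.2         # measured offset from the text


def power_mw(f_out_ghz: float) -> float:
    """Linear power model: slope times output frequency plus a fixed offset."""
    return SLOPE_MW_PER_GHZ * f_out_ghz + OFFSET_MW


for f_ghz in (1.5, 2.5, 2.7):
    print(f_ghz, round(power_mw(f_ghz), 2))
```

The model gives 9.1mW at 1.5GHz, 13.7mW at 2.5GHz, and about 14.6mW at 2.7GHz, which agrees with the range quoted in the abstract and the 2.5GHz entry in Table I.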

## V. CONCLUSION

A synthesizable ADPLL for clock generation is designed and fabricated in a 65nm CMOS process. All functional blocks are implemented with digital standard cells and automatically P&R-ed. Systematic mismatch introduced by automatic P&R is measured in terms of incremental periods, and a calibration scheme exploits this mismatch to improve resolution at target frequencies. The proposed cell-based design of the ADPLL leverages standard cell engineering and design automation to enhance productivity in advanced process technologies, and the measured performance of the ADPLL is comparable to that of full-custom designs.

TABLE I. Performance Summary

|                  | This work                    | [4]                        | [5]                         | [6]                        |
|------------------|------------------------------|----------------------------|-----------------------------|----------------------------|
| Process          | 65nm CMOS                    | 65nm CMOS                  | 65nm CMOS                   | 65nm CMOS                  |
| Area             | 0.042mm<sup>2</sup>          | 0.045mm<sup>2</sup>        | 0.027mm<sup>2</sup>         | 0.048mm<sup>2</sup>        |
| Supply           | 1.1V                         | 1.3/1.1V                   | 1.1-1.3V                    | 1.2V                       |
| Power            | 13.7mW @2.5GHz               | 11.6mW @3GHz               | -                           | 19.7mW @2GHz               |
| Output frequency | 1.5 to 2.7GHz                | 0.19 to 4.27GHz            | 600 to 800MHz               | 1 to 2GHz                  |
| Jitter (rms)     | 3.2ps<sub>rms</sub> @2.5GHz  | 1.4ps<sub>rms</sub> @3GHz  | 21ps<sub>rms</sub> @800MHz  | 1.0ps<sub>rms</sub> @2GHz  |
| Jitter (pp)      | 36ps<sub>pp</sub> @2.5GHz    | 15ps<sub>pp</sub> @3GHz    | 193ps<sub>pp</sub> @800MHz  | 16.6ps<sub>pp</sub> @2GHz  |

## REFERENCES

- [1] A-J. Annema, B. Nauta, R. Langevelde, and H. Tuinhout, "Analog Circuits in Ultra-Deep-Submicron CMOS," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, Jan. 2005.
- [2] L. L. Lewyn, T. Ytterdal, C. Wulff, and K. Martin, "Analog Circuit Design in Nanoscale CMOS Technologies," *Proc. IEEE*, pp. 1687-1714, 2009.
- [3] B. Murmann, P. Nikaeen, D. J. Connelly, and R. W. Dutton, "Impact of Scaling on Analog Performance and Associated Modeling Needs," *IEEE Trans. Electron Devices*, vol. 53, no. 9, Sep. 2006.
- [4] W. Grollitsch, R. Nonis, and N.D. Dalt, "A 1.4ps<sub>rms</sub>-Period-Jitter TDC-Less Fractional-N Digital PLL with Digitally Controlled Ring Oscillator in 65nm CMOS," *ISSCC Dig. Tech. Papers*, pp. 478-479, Feb. 2010.
- [5] M.S-W. Chen, D. Su, and S. Mehta, "A Calibration-Free 800MHz Fractional-N Digital PLL with Embedded TDC," *ISSCC Dig. Tech. Papers*, pp. 472-473, Feb. 2010.
- [6] A.V. Rylyakov, J.A. Tierno, D.Z. Turker, J-O. Plouchart, H.A. Ainspan, and D. Friedman, "A Modular All-Digital PLL Architecture Enabling Both 1-to-2GHz and 24-to-32GHz Operation in 65nm CMOS," *ISSCC Dig. Tech. Papers*, pp. 516-517, Feb. 2008.
- [7] J.A. Tierno, A.V. Rylyakov, and D. Friedman, "A Wide Power Supply Range, Wide Tuning Range, All Static CMOS All Digital PLL in 65 nm SOI," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 42-51, Jan. 2008.
- [8] T. Olsson and P. Nilsson, "A Digitally Controlled PLL for SoC Application," *IEEE J. Solid-State Circuits*, vol. 39, pp. 751-760, May 2004.
- [9] C. Weltin-Wu, E. Temporiti, D. Baldi, M. Cusmai, and F. Svelto, "A 3.5GHz Wideband ADPLL with Fractional Spur Suppression Through TDC Dithering and Feedforward Compensation," *ISSCC Dig. Tech. Papers*, pp. 468-469, Feb. 2010.
- [10] T. Tokairin, M. Okada, M. Kitsunezuka, T. Maeda, and M. Fukaishi, "A 2.1-to-2.8GHz All-Digital Frequency Synthesizer with a Time-Windowed TDC," *ISSCC Dig. Tech. Papers*, pp. 470-471, Feb. 2010.
- [11] J. Borremans, K. Vengattaramane, V. Giannini, and J. Craninckx, "A 86MHz-to-12GHz Digitally-Intensive Phase-Modulated Fractional-N PLL Using a 15pJ/Shot 5ps TDC in 40nm digital CMOS," *ISSCC Dig. Tech. Papers*, pp. 480-481, Feb. 2010.
- [12] C. Hsu, M. Z. Straayer, and M. H. Perrott, "A Low-Noise Wide-BW 3.6-GHz Digital ΔΣ Fractional-N Frequency Synthesizer With a Noise-Shaping Time-to-Digital Converter and Quantization Noise Cancellation," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2776-2786, Dec. 2008.
- [13] V. Kheterpal et al., "Design methodology for IC manufacturability based on regular logic-bricks," in *Proc. ACM/IEEE Design Automation Conf.*, pp. 353-358, Jun. 2005.
- [14] Y. Park and D. D. Wentzloff, "A Cyclic Vernier Time-to-Digital Converter Synthesized from a 65nm CMOS standard library," *IEEE Int. Symposium on Circuits and Systems*, pp. 3561-3564, Jun. 20.