Chapter from the book *Advances in Solid State Circuit Technologies*
Downloaded from: http://www.intechopen.com/books/advances-in-solid-state-circuit-technologies

CMOS Nonlinear Signal Processing Circuits

Hung, Yu-Cherng
National Chin-Yi University of Technology
Taiwan, R.O.C.

1. Introduction

In VLSI circuit design, nonlinear signal-processing circuits such as minimum (MIN), maximum (MAX), median (MED), winner-take-all (WTA), loser-take-all (LTA), k-WTA, and arbitrary rank-order extraction circuits provide useful functions (Lippmann, 1987; Lazzaro et al., 1989). Median filters are generally used to filter impulse noise and thus suppress impulsive distortions. MAX and MIN circuits are important elements in fuzzy logic design, and WTA is a key function in pattern classification and artificial neural networks. Designing these nonlinear signal-processing circuits so that they integrate smoothly into SoC (system-on-a-chip) applications has therefore become an important research topic. Complementary metal-oxide-semiconductor (CMOS) technology is now widely used to fabricate chips, and all circuits in this chapter are realized in CMOS processes. However, as CMOS transistors are continuously scaled down with thinner gate oxides and smaller device sizes, the supply voltage must be reduced to ensure device reliability. This chapter therefore addresses a highly reliable WTA/LTA circuit, a simple MED circuit, and a low-voltage rank-order extractor. The chapter is organized as follows. Section 1 introduces the background of these nonlinear functions, including their definitions and applications. Section 2 describes conventional WTA/LTA architectures and presents a highly reliable WTA/LTA circuit. Section 3 presents an analog median circuit whose main advantage is its simplicity. Section 4 describes a CMOS circuit for arbitrary rank-order extraction and also addresses the restrictions and design techniques of low-voltage CMOS circuits. Section 5 briefly concludes the chapter.

Given a set of $n$ external input variables $a_1, \ldots, a_n$, a MAX (or MIN) circuit determines the maximum (or minimum) value. A median filter outputs the median of a window of input samples. A WTA network selects and identifies the largest variable in a specified set of variables. LTA, the counterpart of WTA, identifies the smallest input variable and inhibits the remaining ones. Instead of choosing only one winner, a $k$-WTA network selects the $k$ largest of $n$ competing variables ($k \leq n$), which allows more flexibility in applications. For arbitrary rank-order identification, a rank-order filter (extractor) is designed to select the $k$-th largest element among $n$ variables $a_1, \ldots, a_n$. Depending on the application, these input variables are either voltage or current signals.

To illustrate these definitions, consider a circuit that produces two output responses to a set of input currents \(I_{in1}, I_{in2}, \ldots, \text{and } I_{inN}\): one is an analog output current \(I_o\), and the other is a set of digital outputs \(V_{o1(\text{rank})}, V_{o2(\text{rank})}, \ldots, \text{and } V_{oN(\text{rank})}\). Assume that \(N = 5\) and that the five external input currents are 9, 7, 10, 5, and 3 \(\mu\)A. Depending on the required function, the output current \(I_o\) and the corresponding digital output responses are as follows.

1. **MAX:** \(I_o = \text{Maximum}(I_{in1}, I_{in2}, \ldots, I_{inN}) = I_{in3} = 10 \mu\)A
2. **MIN:** \(I_o = \text{Minimum}(I_{in1}, I_{in2}, \ldots, I_{inN}) = I_{in5} = 3 \mu\)A
3. **MED:** \(I_o = \text{Median}(I_{in1}, I_{in2}, \ldots, I_{inN}) = I_{in2} = 7 \mu\)A
4. **WTA:** The output voltages \(V_{o1(\text{rank})}, V_{o2(\text{rank})}, \ldots, \text{and } V_{o5(\text{rank})}\) indicate which input is the maximum among \(I_{in1}, I_{in2}, \ldots, \text{and } I_{inN}\) by setting the corresponding output to logic high. In this case, \((V_{o1(\text{rank})}, V_{o2(\text{rank})}, \ldots, V_{o5(\text{rank})}) = (0, 0, 1, 0, 0),\) where “0” and “1” are logic low and logic high, respectively.
5. **LTA:** The reverse of the WTA function; the output set is (0, 0, 0, 0, 1) in this case.
6. **k-WTA:** The \(k\) largest inputs are selected, so this function is more flexible in application than WTA. For example, the outputs of 2-WTA are \((V_{o1(\text{rank})}, V_{o2(\text{rank})}, \ldots, V_{o5(\text{rank})}) = (1, 0, 1, 0, 0)\) in this case.
7. **Rank order:** The \(r\)th rank-order extraction identifies the input with the \(r\)th largest magnitude among \(I_{in1}, I_{in2}, \ldots, \text{and } I_{inN}.\) For example, the outputs of the 2nd and 3rd rank-order extraction are \((1, 0, 0, 0, 0)\) and \((0, 1, 0, 0, 0)\), respectively, in this case.
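The definitions above can be checked with a short behavioral sketch in Python that reproduces the five-input example. It models only the input-output behavior, not any circuit, and assumes distinct input values; the helper names `one_hot`, `rank_order`, and `k_wta` are hypothetical.

```python
# Behavioral (not circuit-level) model of the nonlinear functions defined above.
def one_hot(n, indices):
    """Return an n-element list of 0/1 flags with 1 at the given indices."""
    return [1 if i in indices else 0 for i in range(n)]

def rank_order(values, r):
    """Flag the r-th largest input (r = 1 is the maximum); assumes distinct values."""
    target = sorted(values, reverse=True)[r - 1]
    return one_hot(len(values), {values.index(target)})

def k_wta(values, k):
    """Flag the k largest inputs (assumes distinct values)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return one_hot(len(values), set(order[:k]))

i_in = [9, 7, 10, 5, 3]                  # input currents in uA, as in the example

i_max = max(i_in)                        # MAX
i_min = min(i_in)                        # MIN
i_med = sorted(i_in)[len(i_in) // 2]     # MED (odd number of inputs)
wta = rank_order(i_in, 1)                # WTA is the 1st rank
lta = rank_order(i_in, len(i_in))        # LTA is the N-th rank
two_wta = k_wta(i_in, 2)                 # 2-WTA

print(i_max, i_min, i_med)                        # 10 3 7
print(wta, lta)                                   # [0, 0, 1, 0, 0] [0, 0, 0, 0, 1]
print(two_wta)                                    # [1, 0, 1, 0, 0]
print(rank_order(i_in, 2), rank_order(i_in, 3))   # [1, 0, 0, 0, 0] [0, 1, 0, 0, 0]
```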

Fig. 1 (fuzzy rules). Rule 1: IF \(x\) is PL and \(y\) is ZR, THEN \(z\) is NS. Rule 2: IF \(x\) is ZR and \(y\) is NL, THEN \(z\) is ZR.

These nonlinear functions have various applications. The MAX and MIN circuits are important elements in fuzzy logic design (Yamakawa, 1993). Fig. 1 shows the MAX and MIN operations in fuzzy inference. Variables “x” and “y” are inputs, and variable “z” is the corresponding output. In a given state, either rule 1 or rule 2 is satisfied. The MIN function realizes the “and” operation in fuzzy rules, and the MAX function realizes the “or” operation. In image signal processing, the MED function is generally used to filter impulse noise and thus suppress impulsive distortions. Figure 2 shows a one-dimensional noise-cancellation application: Fig. 2(a) shows a 1.2-\(V_{pp}\) sinusoidal signal corrupted by noise, and Fig. 2(b) shows the processed signal after MED filtering with a window of size five. Figure 3 shows a two-dimensional application, also for noise cancellation in an image. WTA is a key function in pattern classification, vector quantization, data compression, and self-organizing neural networks; Figure 4 shows a WTA application for pattern identification. Analog rank-order filters are also widely used in signal sorting and classification.
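Both operations described in this paragraph can be sketched in a few lines of Python. The first part applies a running median with a window of five to a 1.2-\(V_{pp}\) sinusoid corrupted by isolated impulses, in the spirit of Fig. 2; the sampling density, impulse positions, and impulse height are hypothetical. The second part evaluates two rules of the form shown in Fig. 1, using MIN for “and” inside each rule and MAX to combine the clipped rule outputs, in the style of Mamdani inference; the membership values and triangular membership functions are assumptions, not data from the chapter.

```python
import math

def median_filter(x, window=5):
    """Running median with an odd window; edges are padded by repeating the end samples."""
    half = window // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [sorted(padded[k:k + window])[half] for k in range(len(x))]

# 1.2-V peak-to-peak sinusoid (amplitude 0.6 V), 50 samples per period (assumed).
clean = [0.6 * math.sin(2 * math.pi * k / 50) for k in range(100)]
noisy = list(clean)
for k in (10, 37, 64):          # hypothetical positions of isolated impulses
    noisy[k] += 1.0
filtered = median_filter(noisy, window=5)

# Fuzzy inference: MIN realizes "and" inside a rule, MAX realizes "or" across rules.
def tri(z, center, half_width):
    """Triangular membership function (hypothetical shape)."""
    return max(0.0, 1.0 - abs(z - center) / half_width)

w1 = min(0.7, 0.4)   # Rule 1: mu_PL(x) AND mu_ZR(y), hypothetical membership values
w2 = min(0.2, 0.9)   # Rule 2: mu_ZR(x) AND mu_NL(y)
z_grid = [-1.0 + 0.05 * k for k in range(41)]
# Each rule clips its consequent (NS for rule 1, ZR for rule 2); MAX aggregates them.
mu_z = [max(min(w1, tri(z, -0.5, 0.5)), min(w2, tri(z, 0.0, 0.5))) for z in z_grid]
```

Because each impulse is isolated, it never becomes the middle value of a five-sample window, so the filtered trace stays close to the clean sinusoid.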

In general, these nonlinear functions are implemented with either digital or analog circuits. Because most signals obtained from the real world are continuous, a digital implementation must first convert the continuous inputs to digital form with one or more analog-to-digital converters (A/D). These extra data converters increase the circuit complexity, chip area, and power consumption of a digital realization. An analog implementation, on the other hand, is somewhat less accurate than a digital one and less tolerant of fabrication process variation. However, because no data conversion is needed, analog operation offers many advantages, such as savings in time, bandwidth, and computation at the system level. Considering practicality and flexibility, the design issues of a CMOS analog signal-processing circuit therefore include 1) precision; 2) speed; 3) high tolerance to fabrication process variation; 4) a wide supply-voltage range; 5) a wide input range; 6) low circuit complexity; 7) low power consumption; 8) scalability; 9) programmability; and so forth, so that these functions can be easily integrated into various system-embedded chips. Additionally, as CMOS transistors shrink and their gate oxides become thinner, the supply voltage must be scaled down to ensure device reliability. A forecast of high-performance CMOS circuits operating at low voltage has been reported (Semiconductor Industry Association, 2008). Figure 5 shows the trend of CMOS supply voltage and physical gate length. Moreover, portable equipment such as biomedical electronics, computers, and portable telecommunication devices has become common. Battery operation and low power consumption are therefore also important design requirements for these circuits.

Fig. 5. Trend for supply voltage and physical gate length by ITRS 2008 update.

2. Winner-Take-All and Loser-Take-All circuit

2.1 Architectures of WTA/LTA circuits

Based on their circuit structures, conventional WTA/LTA circuits can be roughly categorized into four types: 1) the global-inhibition structure, in which connectivity increases linearly with the number of inputs (Lazzaro et al., 1989; Starzyk & Fang, 1993); 2) the cell-based tree topology (Smedley et al., 1995; Demosthenous et al., 1998); 3) the excitatory/inhibitory connection (He & Sanchez-Sinencio, 1993); and 4) the serial cascade structure (Aksin, 2002). Figure 6(a-d) shows conceptual diagrams of these topologies. In Fig. 6(a), each cell receives the same global inhibition, and a common current $I_{\text{conn}}$ or voltage $V_{\text{conn}}$ is shared by all the competing cells. The cells, represented by square blocks, are nonlinear signal-processing elements. Because every cell competes through the shared node, the result depends on how well all the cells match, and the precision of the circuit degrades as the number of inputs increases. Since the operation of this circuit relies on cell matching, a stable fabrication process is required to manufacture a high-precision system. The connectivity complexity of the circuit is $O(N)$, where $N$ is the number of inputs. Figure 6(b) shows a cell-based tree topology in which $N-1$ cells are arranged in a tree for $N$ inputs. Each cell compares two input variables and outputs the larger (or smaller) of the two. The decision digits of the bottom cell are then successively fed back to the first-layer cells to identify the maximum (or minimum) input. The precision of this circuit is also sensitive to cell matching, and with this design the device sizes must be rescaled when the supply voltage is changed.


Fig. 6. Conventional architectures. (a) Global-inhibition structure. (b) Cell-based tree topology. (c) Excitatory/inhibitory connection. (d) Serial cascade.

Figure 6(c) shows an excitatory/inhibitory connection with $O(N^2)$ connectivity complexity. Each cell receives inhibitory signals from the other cells and an excitatory signal from itself. With this design, the chip area increases with the square of the number of inputs. Figure 6(d) shows a comparator-based structure in which $N-1$ analog comparison blocks and $N-1$ digital blocks are cascaded in series. Within a comparison time $T_{\text{comp}}$, each analog block passes the larger of its inputs to the next stage for comparison with another input. The result of each comparison is sent to the corresponding digital block, and a decision digit is fed back from right to left to identify the maximum input. As a result, the response time of the circuit is approximately $(N-1) \cdot T_{\text{comp}} + T_{\text{dig}}$, where $T_{\text{dig}}$ is the total propagation time of the digital part. The offset voltage of each comparator dominates the precision of this architecture, so its implementation is also sensitive to process variation. For high-precision applications, the internal circuit blocks shown in Figs. 6(a-d) must be identical. The primary limitations on the accuracy of the conventional architectures are therefore fabrication process variations and the matching requirements of the internal cells. Variations in a CMOS fabrication process include transistor threshold voltage, actual device size, gate-oxide thickness, and a variety of other factors. In a typical process, the threshold voltage generally varies from -10% to +10% of its nominal value. Non-uniform etching and diffusion also cause the actual device sizes to vary. In a real CMOS process, these variations are hard to eliminate completely. How, then, can the accuracy of an analog circuit be improved in a conventional process?
2.2 A highly reliable WTA/LTA circuit

This section describes a highly reliable CMOS signal-processing circuit that can be programmed for either the WTA or the LTA function (Hung & Liu, 2004). A symbol \( \text{COMP}^i_t(V_{\text{in}j}, V_{\text{in}k}) \) (with \(1 \leq j, k \leq N\), where \(N\) is the number of inputs) is defined such that the \(i\)th comparator cell receives two input variables, \(V_{\text{in}j}\) and \(V_{\text{in}k}\), and compares their magnitudes at time \(t\); the output \(Z^i_t\) of the cell is either the larger variable or a binary value. For a \( \text{COMP}^i_t(V_{\text{in}j}, V_{\text{in}k}) \) operation, \(Z^i_t\) is defined as

\[
Z^i_t = \begin{cases}
1 \text{ or } V_{\text{in}j}, & \text{when } V_{\text{in}j} > V_{\text{in}k} \\
0 \text{ or } V_{\text{in}k}, & \text{otherwise.}
\end{cases}
\]

Returning to the conventional tree topology of Fig. 6(b), its WTA operation can therefore be represented as:

\[
t_1: \text{COMP}^1_{t_1}(V_{\text{in}1}, V_{\text{in}2}),\ \text{COMP}^2_{t_1}(V_{\text{in}3}, V_{\text{in}4}),\ \ldots,\ \text{COMP}^{N/2}_{t_1}(V_{\text{in}(N-1)}, V_{\text{in}N})
\]

\[
t_2: \text{COMP}^{(N/2)+1}_{t_2}(Z^1_{t_1}, Z^2_{t_1}),\ \text{COMP}^{(N/2)+2}_{t_2}(Z^3_{t_1}, Z^4_{t_1}),\ \ldots
\]

\[
\vdots
\]

\[
t_{\log_2 N}: \text{COMP}^{N-1}_{t_{\log_2 N}}\left(Z^{N-3}_{t_{(\log_2 N)-1}}, Z^{N-2}_{t_{(\log_2 N)-1}}\right).
\]

After \(\log_2 N\) comparison steps, that is, in \(O(\log_2 N)\) time, the maximum (or minimum) input variable is obtained. A total of \(N-1\) identical comparators is required for this operation.
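The tree procedure can be modeled behaviorally in Python to make the comparator count and the number of time steps explicit. This is a sketch assuming ideal comparators and a power-of-two number of inputs; `tree_compare` is a hypothetical name, and a tie is resolved in favor of the second input, following the "otherwise" branch of the definition of \(Z^i_t\).

```python
def tree_compare(values, find_max=True):
    """Behavioral model of the cell-based tree topology of Fig. 6(b).

    Each COMP cell passes on the larger (or smaller) of its two inputs.
    Returns (winner_index, comparators_used, levels).
    """
    n = len(values)
    assert n >= 2 and n & (n - 1) == 0, "a power-of-two input count is assumed"
    survivors = list(range(n))        # indices still competing
    comparators = 0
    levels = 0
    while len(survivors) > 1:
        nxt = []
        for a, b in zip(survivors[0::2], survivors[1::2]):
            comparators += 1          # one COMP cell per pair at this level
            a_wins = values[a] > values[b] if find_max else values[a] < values[b]
            nxt.append(a if a_wins else b)
        survivors = nxt
        levels += 1                   # one time step t_1, t_2, ... per level
    return survivors[0], comparators, levels

print(tree_compare([3, 8, 1, 7, 5, 2, 6, 4]))   # (1, 7, 3)
```

For eight inputs the model uses \(N-1 = 7\) comparators and \(\log_2 8 = 3\) time steps, which is exactly why the tree needs many matched analog cells.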

Fig. 7. A highly reliable WTA/LTA architecture.

To reduce the matching requirement on the internal cells, Figure 7 shows the conceptual diagram of a highly reliable circuit. In this scheme, \(N\) identical digital control cells and a single comparator serve \(N\) input variables. The single comparator block is multiplexed in time to perform all the input comparisons. The operating procedure is as follows:

\[
t_1: \text{COMP}^1_{t_1}(V_{\text{in}1}, V_{\text{in}2})
\]

\[
t_2: \text{COMP}^1_{t_2}(Z^1_{t_1}, V_{\text{in}3})
\]

\[
\vdots
\]

\[
t_{N-1}: \text{COMP}^1_{t_{N-1}}\left(Z^1_{t_{N-2}}, V_{\text{in}N}\right).
\]

The strategy for finding the maximum (or minimum) of a set of variables is to compare two variables first and then compare the result with the next input variable using the same comparator. The procedure continues until all input variables have been compared. Conceptually, the operation is similar to a serial comparison. Unlike the conventional architectures, which require \(N-1\) analog comparators, this architecture uses only a single comparator and therefore eliminates sensitivity to comparator matching. Using the same algorithm, the LTA function is obtained simply by inverting the output state \(Z^1_t\) in the same architecture.
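The serial procedure can be sketched behaviorally in Python. This is a minimal model under stated assumptions, not the circuit: the comparator is ideal, and a tie is assumed to leave the stored winner in place. The function name `serial_wta_trace` is hypothetical. It records which cell holds the winner status after every time slice, so it can be compared with the status-bit sequences listed in Section 2.3.

```python
def serial_wta_trace(values, find_max=True):
    """Behavioral model of the single-comparator serial WTA/LTA procedure.

    Returns (trace, final, comparisons): trace[t][n] is the status bit of cell n
    after time slice t, final is the last row, and comparisons is N - 1.
    """
    n_inputs = len(values)
    winner = 0                      # input 1 is forced to win the initial interval
    trace = [[1 if c == winner else 0 for c in range(n_inputs)]]
    comparisons = 0
    for k in range(1, n_inputs):
        comparisons += 1            # the same comparator is reused in every slice
        new = values[k]
        # Strict comparison: on a tie the stored winner is kept (an assumption).
        wins = new > values[winner] if find_max else new < values[winner]
        if wins:
            winner = k
        trace.append([1 if c == winner else 0 for c in range(n_inputs)])
    return trace, trace[-1], comparisons

wta_in = [0.003, 0.006, 1.000, 0.997, 2.000, 2.003, 2.000, 3.297, 3.300, 3.297]
trace, final, comparisons = serial_wta_trace(wta_in)
print(final, comparisons)   # [0, 0, 0, 0, 0, 0, 0, 0, 1, 0] 9
```

Calling the same function with `find_max=False` models the LTA mode, which corresponds to inverting the comparator output in the hardware.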

The key block in this architecture is the comparator cell. Comparator performance is a crucial factor in high-speed data-conversion systems and telecommunication interfaces. The precision of a comparator is usually defined as the minimum differential voltage (or current) between its inputs that it can identify, that is, its resolution. The comparator design of Hosotani et al. (1990) is used here; its schematic diagram is shown in Fig. 8. Transistors $M_{sw1}$, $M_{sw2}$, and $M_{sw3}$ are used as switches. The circuit operates in two phases: an auto-zero phase and a comparison phase. Let the voltage at node B be $V_x$. By charge conservation, after the comparison phase $V_x$ becomes

\[ V_x = V_b + (V_{in2} - V_{in1}) \cdot \frac{C_s}{C_s + C_p + C_{in}}. \qquad (1) \]
The term \( C_s / (C_s + C_p + C_{in}) \) in (1) represents a degrading factor. To reduce the decision time, the succeeding inverters amplify the voltage difference \( (V_{in2} - V_{in1}) \) and pull node D up to high (logic 1) or down to 0 V (logic 0). The N-latch samples the voltage at node D when \( latch\_clk \) turns high and holds the comparison result when \( latch\_clk \) turns low. Finally, the output polarity of the N-latch is set according to the \( max/min\_selector \) signal. Because \( max/min\_selector \) modifies the polarity of the comparison result, the circuit can be configured for either WTA or LTA operation without structural modification. The comparison block shown in Fig. 8 is reused during all comparison steps. The architecture of the \(N\)-input circuit is shown in Fig. 9, in which the cells \( Control\_Cell_n \) (\( 1 \leq n \leq N \)) are identical; \(N\) cells are required for \(N\) input variables. Each cell contains a status block, a control_switch block, and two latch blocks.
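To see how the degrading factor in (1) scales the comparator input, the sketch below evaluates (1) numerically. Only $C_s = 3$ pF comes from the experimental chip described in Section 2.3; the parasitic capacitance $C_p$, the inverter input capacitance $C_{in}$, and the bias $V_b$ are hypothetical values chosen for illustration.

```python
def node_b_voltage(v_b, v_in1, v_in2, c_s, c_p, c_in):
    """Eq. (1): voltage at node B after the comparison phase (ideal charge conservation)."""
    return v_b + (v_in2 - v_in1) * c_s / (c_s + c_p + c_in)

C_S = 3e-12       # 3 pF sampling capacitor, as in the experimental chip
C_P = 0.3e-12     # hypothetical parasitic capacitance at node B
C_IN = 0.1e-12    # hypothetical input capacitance of the first inverter
V_B = 1.65        # hypothetical bias voltage at node B after auto-zero (V)

factor = C_S / (C_S + C_P + C_IN)                       # degrading factor, below 1
dv = node_b_voltage(V_B, 1.000, 1.003, C_S, C_P, C_IN) - V_B
print(f"factor = {factor:.3f}, swing at node B = {dv * 1e3:.2f} mV")
# factor = 0.882, swing at node B = 2.65 mV
```

With these assumed values, a 3-mV input difference appears as about 2.65 mV at node B; making $C_s$ larger relative to $C_p + C_{in}$ moves the factor closer to 1.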

Fig. 9. Block diagram of the highly reliable WTA/LTA circuit.

Figure 10 shows the clock waveforms for the whole circuit. The \( reset \) signal and the \( reg\_clk \) clock must be generated externally; the other clocks are derived from \( reg\_clk \) through logic gates.

To describe the operation of the entire circuit, refer to the architecture in Fig. 9 and the clock waveforms in Fig. 10. First, at \( t1 \), the \( reset \) signal initializes the status blocks, control_switch blocks, and latch blocks: the N-latch in each status block and \( R_{o1}, R_{o2}, \ldots, R_{oN} \) are reset to zero. Based on the \( max/min\_selector \) signal, MOS transistors Ms1, Ms2, Ms3, and Ms4 preset the initial sampling voltage (0 V or \( V_{DD} \)) at node cap_comn. For a serial comparison, input 1 must be the winner during the initial interval regardless of its magnitude. The initial sampling voltage at node cap_comn is therefore set to 0 V when \( max/min\_selector \) is logic 1 (WTA operation), and to \( V_{DD} \) otherwise (LTA operation).
Then, at $t_2$, the $V_{s1}$ clock turns high (auto-zero phase) to sample the initial voltage (0 V or $V_{DD}$) at node cap_comn. Next, at $t_3$, $R_{o1}$ turns high to sample voltage $V_{in1}$. At this time, $V_{s1}$ turns low (comparison phase) to compare $V_{in1}$ with the initial sampling voltage, and the result is stored in the N-latch of the first status block; the state of the N-latch is logic 1 if the variable is the winner. At $t_4$, the present winner $V_{in1}$ is sampled again. At $t_5$, the previous winner $V_{in1}$ is compared with $V_{in2}$. At $t_6$, the winner of this comparison is sampled again, and it is then compared with $V_{in3}$. The procedure continues until all input voltages have been compared. Finally, only one state $V_{osn}$ ($n=1,\ldots,N$) among the cells is logic 1, indicating the winner (or loser); all others are logic 0. This completes the WTA or LTA operation.

Figure 11 shows the status block, and Figure 12 shows the control_switch block. The control_switch block receives an input variable and controls the transmission gate that samples the input level. A true single-phase latch composed of an N-latch and a P-latch is used to reduce clock-skew problems (Yuan & Stensson, 1989).

Fig. 10. Clock waveforms.

Fig. 11. Status block.
Fig. 12. Control_switch block.

2.3 Simulation results and reliability test
For the highly reliable WTA/LTA circuit, an experimental chip with six inputs was also fabricated in a 0.5-μm CMOS technology. The sampling capacitance $C_s$, implemented with two polysilicon layers, is 3 pF. The reg_clk clock has a period of 100 ns and a 50% duty cycle. The WTA/LTA functions, the supply-voltage range, and the effect of transistor variation (by Monte Carlo analysis) were evaluated in simulation.

1) WTA/LTA functions
To test the function of the circuit, each example applies ten input voltages. For supply voltage $V_{DD}=3.3$ V, the input variables $V_{in1}$, $V_{in2}$, ..., and $V_{in10}$ are 0.003, 0.006, 1.000, 0.997, 2.000, 2.003, 2.000, 3.297, 3.300, and 3.297 V, respectively, for testing the WTA function, and 3.297, 3.294, 2.000, 1.997, 2.000, 1.000, 0.997, 0.006, 0.009, and 0.003 V for testing the LTA function. During the WTA operation, the logic state $V_{osn}$ of each cell in each time slice becomes:

$V_{os1}=1,0,0,0,0,0,0,0,0,0$  
$V_{os2}=0,1,0,0,0,0,0,0,0,0$  
$V_{os3}=0,0,1,1,0,0,0,0,0,0$  
$V_{os4}=0,0,0,0,0,0,0,0,0,0$  
$V_{os5}=0,0,0,0,1,0,0,0,0,0$  
$V_{os6}=0,0,0,0,0,1,1,0,0,0$  
$V_{os7}=0,0,0,0,0,0,0,0,0,0$  
$V_{os8}=0,0,0,0,0,0,0,1,0,0$  
$V_{os9}=0,0,0,0,0,0,0,0,1,1$  
$V_{os10}=0,0,0,0,0,0,0,0,0,0$.

When all comparisons are finished, the outputs $V_{os1}$, $V_{os2}$, $V_{os3}$, ..., and $V_{os10}$ are logic 0, 0, 0, 0, 0, 0, 0, 0, 1, and 0, respectively. Therefore, input variable $V_{in9}$ is the maximum of the ten inputs. Figure 13 shows the HSPICE simulation results for the WTA operation; the period of the latch clock (top trace) is 100 ns. Fig. 14 shows the results of the same procedure for the LTA operation. The final outputs $V_{os1}$, $V_{os2}$, $V_{os3}$, ..., and $V_{os10}$ are logic 0, 0, 0, 0, 0, 0, 0, 0, 0, and 1, respectively, so input variable $V_{in10}$ is the minimum. The test voltages were chosen for two reasons: 1) input voltages of neighboring cells should be as close as possible, to test the discrimination capability; and 2) the input voltages should be distributed from 0 V to 3.3 V, to test the dynamic range.

2) Supply voltage range
All circuit parameters, such as transistor dimensions, clock periods, and the sampling capacitance $C_s$, are held constant. The supply voltage $V_{DD}$ is swept from 2 V to 5 V in 0.1-V steps, and the logic-high level of the clocks is changed with the supply voltage. The simulation results show that the circuit operates correctly with 3-mV discrimination when the supply voltage ranges from 2.7 V to 5 V. Thus, without any rescaling of device sizes, the circuit works under various commonly used supply voltages.
3) Monte Carlo analysis
Manufacturing parameters in CMOS fabrication are statistically distributed. Wafer-to-wafer, run-to-run, and transistor-to-transistor process variations determine the electrical yield and critical second-order effects. In the analysis, the threshold voltage, channel width, and channel length of every MOS transistor were set to their nominal values with ±5 % variation at the 3-sigma level, each following an independent Gaussian distribution. After 30 Monte Carlo iterations, HSPICE results indicate that circuit precision and speed are not degraded over this range. In addition, to verify that the circuit supports multiple technologies, it was also simulated with the parameters of various CMOS fabrication processes. The results show that the circuit remains functional under these processes without tuning any device dimension. Two factors contribute to the robustness of this circuit: 1) it contains only a single analog cell (the comparator), while the other active components are digital; and 2) the comparator itself has an auto-zero property, which makes its operation more tolerant of manufacturing process variation.
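The second factor can be illustrated with an idealized behavioral Monte Carlo sketch. It assumes, hypothetically, that each fabricated comparator has a Gaussian input-referred offset with a 3-sigma spread of 5 mV, and that auto-zeroing stores and cancels this offset exactly (the charge-injection and feed-through errors discussed under circuit precision are ignored). The offset spread, seed, and trial count are illustrative choices, not values from the chip.

```python
import random

def compare(v_plus, v_minus, offset, auto_zero):
    """Ideal comparator with an input-referred offset voltage.

    With auto_zero=True the offset is assumed to be sampled during the auto-zero
    phase and cancelled exactly during the comparison phase.
    """
    residual = 0.0 if auto_zero else offset
    return (v_plus - v_minus + residual) > 0

rng = random.Random(2004)          # fixed seed for a repeatable run
SIGMA_OFFSET = 5e-3 / 3            # hypothetical: 5-mV offset spread at 3 sigma
DELTA = 3e-3                       # 3-mV input difference to be resolved
TRIALS = 1000

errors = {True: 0, False: 0}       # wrong decisions with / without auto-zero
for _ in range(TRIALS):
    offset = rng.gauss(0.0, SIGMA_OFFSET)    # one sample per simulated comparator
    for auto_zero in (True, False):
        if not compare(1.0 + DELTA, 1.0, offset, auto_zero):
            errors[auto_zero] += 1
print(errors)
```

Without auto-zero, a few percent of the sampled comparators misjudge a 3-mV difference under these assumptions; with ideal auto-zero, none do.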

4) Circuit precision
The accuracy of the comparator cell dominates the identification precision. The comparator accuracy depends on two factors: one is the clock feed-through and charge-injection error of transistor \( M_{sw3} \) in Fig. 8; the other is the degrading factor in Eq. (1). Charge-injection error is a complicated function of substrate doping concentration, load capacitance, input level, clock voltage, clock falling rate, MOS channel dimensions, and threshold voltage, so it is difficult to eliminate completely. In general, complementary clocks, transmission gates, and dummy transistors are used in the switch realization to reduce this error.

3. CMOS analogue median cell
The median (MED) filter is a useful function in image processing for eliminating pulse noise. Given a set of \( n \) external input variables \( a_1, \ldots, a_n \), a MED circuit determines their median value. Median extraction is a nonlinear function. MED circuit realizations can be classified as analog or digital filters, depending on the type of input signal. Digital filtering architectures can draw on a variety of sophisticated algorithms, which gives them greater flexibility and reliability. In terms of power consumption and chip area, however, they are more costly than analog architectures. In 1994, an analogue median extractor with a simple structure and a sharp DC transfer characteristic, requiring no operational amplifier, was presented (Opris & Kovacs, 1994); the circuit aims to reduce errors in the transition region. In 1997, the same authors proposed an improved high-speed version whose transient recovery is less than 200 ns in a 2-μm CMOS process (Opris & Kovacs, 1997). In 1999, a current-input analog median filter composed of absolute-value and minimum circuits was proposed (Vlassis & Siskos, 1999); it also requires neither an operational amplifier nor a transconductor. In 2004, a fully continuous-time analog median filter based on transconductance comparators and analog delay elements was presented (Diaz-Sanchez et al., 2004); with these median filter cells, an image of 91×80 pixels can be processed in less than 8 ms to remove salt-and-pepper noise. This section describes an intuitive and simple CMOS analog median cell (Hung et al., 2007), built from current mirrors, current comparison, and some basic digital logic. Simulations in TSMC 0.35 \( \mu m \) CMOS technology show that the median filter provides 0.4-\( \mu A \) discriminability and tracks the median of the input currents well.

Figure 15 shows a basic one-input current cell composed of a current mirror and control logic. The cell has one signal input \( (i_s) \), a current-source output \( (i_{src}) \), a current-sink output \( (i_{sink}) \), a control signal \( V_{ctrl} \), and an output current \( (i_{out}) \). Transistors \( M_{1}-M_{12} \) form cascode current mirrors. \( M_{swp} \) and \( M_{swn} \) constitute a transmission gate that acts as an analog switch. \( M_{dummy} \) compensates for the loading of \( M_{swn} \) and \( M_{swp} \) to improve the accuracy of the output current. \( M_{iso} \) isolates the clock noise from the transmission gate. \( M_{dist-2} \) and \( M_{res} \) speed up the transmission operation and control the discharge timing. Fig. 15(b) is the symbol for the circuit of Fig. 15(a); the cell is named the current signal control unit, abbreviated CSCU.

Fig. 15. Current signal control unit (CSCU): (a) circuit and (b) symbol representation.

Given three input signals \( i_{s1}, i_{s2}, \) and \( i_{s3} \), how can the circuit extract the median value? For \( i_{s2} \) to be the median current, the following criterion must be satisfied:

\[
i_{s2} = \text{MED}(i_{s1}, i_{s2}, i_{s3}) \iff
\big[(i_{s2} > i_{s3}) \text{ and } (i_{s2} < i_{s1})\big]
\text{ or }
\big[(i_{s2} < i_{s3}) \text{ and } (i_{s2} > i_{s1})\big]
\qquad (2)
\]

Current-level comparison and logic decisions are therefore required to realize the function. Figure 16 shows a three-input median circuit composed of three CSCU cells and three decision logic blocks. Each decision logic block is simply an AND-OR gate circuit that performs

\[
\overline{V_{ctr}} = 1 \cdot 2 + 3 \cdot 4 \qquad (3)
\]

where 1, 2, 3, and 4 denote the corresponding logic inputs, which come from the comparison-result signals \( \overline{A-F} \). Depending on these inputs, each decision logic block uses Eq. (3) to set its \( V_{ctr} \) to a low or a high level. A low \( V_{ctr} \) turns on the transmission gate of the corresponding CSCU cell and passes its input current to the output; otherwise, the input current is blocked. In this way, a three-input MED filter cell is obtained. A capacitor \( C_{filter} \) suppresses the switching noise caused by transition pulses.
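The decision logic can be checked with a behavioral sketch that evaluates criterion (2) for each input in turn, as the three decision logic blocks do, and passes only the current whose \( V_{ctr} \) is low. This is an idealized model assuming exact current comparison; the name `median_select` is hypothetical. In this model an exact tie selects no input, whereas the real circuit's behavior near ties is governed by its 0.4-μA discriminability.

```python
def median_select(i_s):
    """Behavioral model of the three-input MED cell of Fig. 16.

    For input x with the other two inputs y and z, the decision logic evaluates
    NOT V_ctr = (x > y and x < z) or (x < y and x > z), as in Eqs. (2) and (3).
    Only a CSCU whose V_ctr is low (switch on) contributes to i_out.
    """
    v_ctr = []
    for k in range(3):
        x = i_s[k]
        y, z = i_s[(k + 1) % 3], i_s[(k + 2) % 3]   # for i_s2: y = i_s3, z = i_s1
        is_median = (x > y and x < z) or (x < y and x > z)
        v_ctr.append(0 if is_median else 1)          # active-low switch control
    i_out = sum(i for i, ctrl in zip(i_s, v_ctr) if ctrl == 0)
    return i_out, v_ctr

print(median_select([3e-6, 7e-6, 5e-6]))   # (5e-06, [1, 1, 0])
```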

In the circuit, NMOS transistors with \((W/L)_N=5\mu/1\mu\) and PMOS transistors with \((W/L)_P=10\mu/1\mu\) are used for \( M_1-M_{12} \). The inverters are sized \((W/L)_N=5\mu/0.35\mu\) and \((W/L)_P=20\mu/0.35\mu\). The switch transistors \( M_{swn} \) and \( M_{swp} \) both have \((W/L)=20\mu/0.35\mu\). All transistors in the decision logic blocks are sized \((W/L)_N=5\mu/0.35\mu\) and \((W/L)_P=10\mu/0.35\mu\). The filter capacitance \( C_{filter} \) is 10 pF, and the supply voltage \( V_{DD} \) is 3.3 V. The input current signals $i_{s1}$, $i_{s2}$, and $i_{s3}$ are triangular waves with 10-μA peaks at 5 μs, 10 μs, and 15 μs, respectively. Figure 17 shows the three triangular waves and the corresponding median output (red line). The output tracks the median of the three input currents well. Fig. 17 also shows that when two input values are close to each other, their difference must exceed 0.4 μA for correct selection; this is the discriminability of the MED filter. However, small spikes occur at the transition points.

Fig. 16. Three-input median cell.

Fig. 17. The output response of the median filter for triangle waveforms.

As Fig. 16 shows, the three-input median cell has three input pins ($i_{s1}$, $i_{s2}$, $i_{s3}$) and a common output pin ($i_{out}$). By modifying the switch transistors and decision logic, the MED cell can easily be extended to three inputs and three outputs, providing the maximum $i_{\text{max}}$, the median $i_{\text{median}}$, and the minimum $i_{\text{min}}$ simultaneously. Multiple modified MED cells can then be combined to perform a sorting function. No critical components such as operational amplifiers or precise voltage references are required in the MED cell, which makes it easy to embed in a larger system.
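One possible way to combine such cells is sketched below, assuming each modified MED cell behaves as an ideal three-input sorter that returns (min, median, max). It uses a known arrangement for the 3×3 windows of two-dimensional median filtering (Fig. 3): sort each row, sort each column, then take the median of the anti-diagonal, for seven cells in total. The arrangement is offered only as an illustration, and the names `sort3` and `median9` are hypothetical.

```python
def sort3(a, b, c):
    """Idealized modified MED cell: returns (min, median, max) of three inputs."""
    lo, mid, hi = sorted((a, b, c))
    return lo, mid, hi

def median9(window):
    """Median of a 3x3 window built only from three-input sorting cells."""
    # Stage 1: three cells sort each row.
    rows = [sort3(*window[r]) for r in range(3)]
    # Stage 2: three cells sort each column of the row-sorted array.
    cols = [sort3(rows[0][c], rows[1][c], rows[2][c]) for c in range(3)]
    # cols[c][r] is now the element at row r, column c; rows and columns are both sorted,
    # so the overall median is the median of the anti-diagonal (one more cell).
    _, med, _ = sort3(cols[0][2], cols[1][1], cols[2][0])
    return med

print(median9([[1, 9, 2], [8, 3, 7], [4, 6, 5]]))   # 5
```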

4. Low-voltage arbitrary rank order extraction

4.1 Principle of rank-order extraction

WTA, LTA, and MED, however, are each single-order operations. In 2002, a compact low-voltage rank-order filter was designed (Cilingiroglu & Dake, 2002); it is based on a pair of multiple-winners-take-all circuits and a set of logic gates. This section describes an architecture that provides both arbitrary rank-order extraction and k-WTA functionality (Hung & Liu, 2002). An $r$th rank-order extraction identifies the input variable with the $r$th largest magnitude. In this design, the circuit locates an arbitrary rank order among a set of input voltages according to a set of binary control signals. Let $V_{o_1}, V_{o_2}, \ldots$, and $V_{o_M}$ be the output voltages of a rank-order extractor for the input variables $V_1, V_2, \ldots$, and $V_M$. The output status $D_{ij}$ of a two-input comparator is defined as

$$D_{ij} = \begin{cases} 1 & \text{if } V_i > V_j \\ 0 & \text{otherwise} \end{cases} \quad 1 \leq i, j \leq M , \ j \neq i \quad (4)$$

where $M$ is the number of input variables. For convenience, an index $S_i$ denotes the total number of comparisons that the $i$th input variable wins against the other inputs. Thus, $S_i$ is represented as

$$S_i = \sum_{j=1, j \neq i}^{M} D_{ij} \quad 1 \leq i \leq M . \quad (5)$$

Based on the definition of (5), $S_i$ is expanded as follows

$$S_1 = D_{12} + D_{13} + \ldots + D_{1M} \quad (6a)$$

$$S_2 = D_{21} + D_{23} + \ldots + D_{2M} = \overline{D_{12}} + D_{23} + \ldots + D_{2M} \quad (6b)$$

$$S_3 = D_{31} + D_{32} + D_{34} + \ldots + D_{3M} = \overline{D_{13}} + \overline{D_{23}} + D_{34} + \ldots + D_{3M} \quad (6c)$$

$$\ldots$$

$$S_M = D_{M1} + D_{M2} + \ldots + D_{M(M-1)} = \overline{D_{1M}} + \overline{D_{2M}} + \ldots + \overline{D_{(M-1)M}} . \quad (6d)$$

Thus, the left-hand side of (6) implies that $M(M-1)$ comparators are required to identify the rank order of $M$ input variables. Since $D_{ji}$ is the complement of $D_{ij}$ ($D_{ji} = \overline{D_{ij}}$), each term $D_{ij}$ with $j < i$ is replaced by $\overline{D_{ji}}$ on the right-hand side of (6). In physical terms, if each comparator provides both its output and the complement, the number of comparators can be reduced from $M(M-1)$ to $M(M-1)/2$.
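As a quick check of this reduction, the following Python sketch (a behavioural model, not the author's circuit) compares each unordered pair once, derives $D_{ji}$ as the complement of $D_{ij}$, and accumulates $S_i$ of (5). It assumes distinct inputs, because for equal inputs (4) gives $D_{ij} = D_{ji} = 0$ and the complement shortcut no longer holds. Indices are 0-based in the code.

```python
from itertools import combinations


def rank_indices(v):
    """Return (S, number_of_comparators) for the inputs v, per Eq. (5).

    Each unordered pair (i, j), i < j, is compared once; D_ji is taken as
    the complement of D_ij, as on the right-hand side of Eq. (6).
    Assumes all inputs are distinct.
    """
    m = len(v)
    s = [0] * m
    comparators = 0
    for i, j in combinations(range(m), 2):
        d_ij = 1 if v[i] > v[j] else 0   # comparator output D_ij
        d_ji = 1 - d_ij                  # complement: no second comparator needed
        s[i] += d_ij
        s[j] += d_ji
        comparators += 1
    return s, comparators


s, n_cmp = rank_indices([0.5, 0.6, 0.9, 0.2, 0.4])
print(s, n_cmp)  # [2, 3, 4, 0, 1] 10
```

For five inputs the sketch uses 10 comparisons instead of 20, matching $M(M-1)/2$.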

In this section, the comparator generates a unit current $I_{unit}$ when input variable $V_i$ is larger than $V_j$. Thus, the index $S_i$ in (5) is rewritten as

$$S_i^* = \sum_{j=1, j\neq i}^{M} D_{ij} I_{unit} = n I_{unit}, \quad 1 \leq i \leq M, \quad 0 \leq n \leq M - 1 \quad (7)$$

where $n$ is the number of comparisons won by $V_i$. If the inputs are arranged in ascending order of magnitude, $V_1 < V_2 < \ldots < V_M$, then $S_1^* = 0, S_2^* = I_{unit}, \ldots, S_M^* = (M - 1) I_{unit}$. Hence the minimum, the next minimum, ..., and the maximum input variable can be identified by checking the index $S_i^*$. The $k$-WTA function is defined so that the $i$th output is logic high when

$$S_i^* \geq (M - k) I_{unit}. \quad (8)$$

For example, if the input variables are (0.5, 0.6, 0.9, 0.2, 0.4), the first variable 0.5 is larger than the variables 0.2 and 0.4. Thus, the index $S_1^*$ is $2I_{unit}$; that is, the first variable wins two of its comparisons. Likewise, $S_2^* = 3I_{unit}$, $S_3^* = 4I_{unit}$, $S_4^* = 0$, and $S_5^* = I_{unit}$. Therefore, the rank order of the input variables is found by checking the index $S_i^*$. In this example, the output voltages ($V_{o,1}, V_{o,2}, \ldots, V_{o,5}$) of the extractor are (0, 0, 1, 0, 0), (0, 1, 0, 0, 0), (1, 0, 0, 0, 0), and (0, 0, 0, 1, 0) for the maximum, next-maximum, median, and minimum operations, respectively, where “0” and “1” denote logic low and high. Similarly, if the extractor is configured for the $k$-WTA function, the output voltages ($V_{o,1}, V_{o,2}, \ldots, V_{o,5}$) are (1, 1, 1, 1, 1), (1, 1, 1, 0, 1), (1, 1, 1, 0, 0), ..., and (0, 0, 1, 0, 0) for the 5-WTA, 4-WTA, 3-WTA, ..., and 1-WTA operations, respectively.
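The worked example can be reproduced with a few lines of Python. This hedged sketch expresses $S_i^*$ in units of $I_{unit}$, selects the $r$th largest input as the one with $S_i^* = (M - r)I_{unit}$ (which follows from (7) for distinct inputs), and applies criterion (8) for $k$-WTA; the function names are illustrative only.

```python
def winner_counts(v):
    """S_i* / I_unit of Eq. (7): how many other inputs V_i exceeds."""
    return [sum(1 for j, vj in enumerate(v) if j != i and vi > vj)
            for i, vi in enumerate(v)]


def rank_order_outputs(v, r):
    """Output vector selecting the r-th largest input (r = 1 is the maximum)."""
    target = len(v) - r              # S_i* = (M - r) I_unit for the r-th largest
    return tuple(int(s == target) for s in winner_counts(v))


def k_wta_outputs(v, k):
    """Output vector for k-WTA, using criterion (8): S_i* >= (M - k) I_unit."""
    threshold = len(v) - k
    return tuple(int(s >= threshold) for s in winner_counts(v))


x = [0.5, 0.6, 0.9, 0.2, 0.4]
# maximum, next maximum, median, minimum
print([rank_order_outputs(x, r) for r in (1, 2, 3, 5)])
# [(0, 0, 1, 0, 0), (0, 1, 0, 0, 0), (1, 0, 0, 0, 0), (0, 0, 0, 1, 0)]
print([k_wta_outputs(x, k) for k in (5, 4, 3, 1)])
```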

### 4.2 Architecture of rank-order extraction

The structure of the extractor for five input variables is shown in Fig. 18 (Hung & Liu, 2002). For $M$ input variables there are $M(M - 1)/2$ comparators and $M$ evaluation cells. Each comparator cell accepts two input signals, and the result of each comparison is fed into the appropriate evaluation cell. In the first row of Fig. 18, the input $V_1$ is compared with each of the other input variables, and each comparison generates a unit current $I_{unit}$. This current is summed in the Eval-1 cell if $V_1$ is larger than the other input; otherwise, it is fed into the evaluation cell of the winning input. The same connection strategy applies to the other input variables. In this way, the architecture realizes equation (7).

The signal $V_{\text{choice}}$ in Fig. 18 selects the function of the circuit: when $V_{\text{choice}}$ is preset to logic high, the rank-order operation is performed; otherwise, the $k$-WTA function is enabled. The binary signals $sel_1$, $sel_2$, and $sel_3$ determine which rank order (or which $k$-WTA) is located. Based on the $sel_{1-3}$ settings, the logic states of the evaluation cells indicate which input variable belongs to the selected rank order. For example, in a seven-input rank-order operation, ($sel_1, sel_2, sel_3$) is set to (0, 0, 0) to find the minimum variable, while the settings (0, 1, 1) and (1, 1, 0) give the median and maximum functions, respectively. Similarly, in the $k$-WTA operation, setting ($sel_1, sel_2, sel_3$) to (0, 0, 1) and (1, 1, 0) yields the 6-WTA and 1-WTA functions, respectively.
Fig. 18. The architecture of arbitrary rank-order extractor for five input variables.
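The select settings quoted above are all consistent with $(sel_1, sel_2, sel_3)$ being the binary code of a target count $n$, with $n = M - r$ for the $r$th largest input and $n = M - k$ for $k$-WTA. The sketch below assumes that encoding (inferred from the examples, not stated explicitly in the text) and reproduces the seven-input settings; `select_bits` is a hypothetical helper.

```python
def select_bits(n):
    """Encode the target count n as (sel1, sel2, sel3), with sel1 as the MSB.

    Assumed encoding, inferred from the seven-input examples in the text.
    """
    if not 0 <= n <= 7:
        raise ValueError("three select bits cover n = 0..7 only")
    return ((n >> 2) & 1, (n >> 1) & 1, n & 1)


M = 7
rank = {r: select_bits(M - r) for r in (7, 4, 1)}   # minimum, median, maximum
kwta = {k: select_bits(M - k) for k in (6, 1)}      # 6-WTA, 1-WTA
print(rank)  # {7: (0, 0, 0), 4: (0, 1, 1), 1: (1, 1, 0)}
print(kwta)  # {6: (0, 0, 1), 1: (1, 1, 0)}
```

Under this encoding the rank-order and $k$-WTA modes share the same select word; only $V_{\text{choice}}$ changes how the evaluation cell interprets it.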

4.3 Circuit design

4.3.1 1.2-V comparator

The comparator is a key element of Fig. 18. The auto-zero comparator shown in Fig. 19 is designed to operate from a low supply voltage. To improve its speed, the succeeding gain stage operates in dynamic mode. First, in the auto-zero phase, the input $V_1$ is sampled on the top plate of the capacitor $C_s$, and the MOS transistor M11 is biased at $V_{bias}$. In the following comparison phase, the voltage at node E becomes $V_{bias} + (V_2 - V_1)\,C_s/(C_s + C_p)$, where $C_p$ is the parasitic capacitance. This deviation voltage is then amplified by transistors M11 and M12. To reduce power dissipation, the adjustable bias voltage $V_{bias}$ is chosen just large enough to overcome the threshold voltage of a MOS transistor; it is also readjusted when the comparator operates from different supply voltages. The succeeding transistors M13 and M14 provide the current that generates the proper voltage at node F. Depending on which input voltage is larger, either node H or node G goes to logic high. The output node G and its complementary node H drive the next stage, which generates the unit currents $I_{\text{large}_1}$, $I_{\text{large}_2}$, $I_{\text{small}_1}$, and $I_{\text{small}_2}$. During the evaluation phase, the unit currents $I_{\text{large}_1}$ and $I_{\text{large}_2}$ are present when $V_1$ is larger than $V_2$; otherwise, $I_{\text{small}_1}$ and $I_{\text{small}_2}$ are generated. The symbol of the comparator cell is shown at the bottom right of Fig. 19.

The function of the comparator shown in Fig. 19 is summarized as

$$V_1 > V_2: \quad \begin{cases} I_{\text{large}_1} = I_{\text{large}_2} = I_{\text{unit}}, \\ I_{\text{small}_1} = I_{\text{small}_2} = 0 \end{cases}$$

$$V_1 < V_2: \quad \begin{cases} I_{\text{large}_1} = I_{\text{large}_2} = 0, \\ I_{\text{small}_1} = I_{\text{small}_2} = I_{\text{unit}} \end{cases}$$

where $I_{\text{unit}}$ is the unit current of the PMOS transistor $M_{\text{base}}$.

Fig. 19. 1.2-V auto-zero comparator, clock, and symbol representation.
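The comparator's output behaviour summarized above can be captured in a small Python model. It is an idealized sketch: it ignores offset, charge injection, and timing, and it maps the equal-input case, which the summary does not cover, to the $V_1 < V_2$ branch as an arbitrary assumption.

```python
def comparator_currents(v1, v2, i_unit=1.0):
    """Ideal outputs (I_large_1, I_large_2, I_small_1, I_small_2) of the Fig. 19 cell.

    Assumes an ideal comparator; equal inputs fall into the V1 < V2 branch
    (an arbitrary choice, since the summary does not define that case).
    """
    if v1 > v2:
        return (i_unit, i_unit, 0.0, 0.0)   # V1 wins: the "large" pair conducts
    return (0.0, 0.0, i_unit, i_unit)       # V2 wins: the "small" pair conducts


print(comparator_currents(0.7, 0.5, i_unit=1e-6))
print(comparator_currents(0.3, 0.5, i_unit=1e-6))
```

In every case exactly two unit currents flow, which is what lets the evaluation cells count wins simply by summing currents.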

4.3.2 Evaluation cell

Fig. 20. Evaluation cell.
The circuit of the evaluation cell is shown in Fig. 20. The MOS transistors $M_{\text{gen}}$ and $M_{\text{unit}}$ reproduce the same unit current, which equals $I_{\text{large}_1}$, $I_{\text{large}_2}$, $I_{\text{small}_1}$, and $I_{\text{small}_2}$ in Fig. 19. To find the various rank orders among the input signals, the cell must determine the unit-current summation $S_i^*$ of (7) arriving at the $Out_{\text{com}1}$ and $Out_{\text{com}2}$ terminals. Since it is difficult to determine an exact current value in a VLSI circuit, the cell instead checks whether $S_i^*$ lies inside a valid range, using the criterion

$$nI_{\text{unit}} - \delta_1 < S_i^* < nI_{\text{unit}} + \delta_2. \quad (9)$$

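Choosing $\delta_1 = \delta_2 = I_{\text{unit}}/2$ is reasonable and safe, because it places each decision threshold midway between adjacent current levels $nI_{\text{unit}}$ and $(n \pm 1)I_{\text{unit}}$. Accordingly, the dimensions of the MOS transistors are designed as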

$$\left(\frac{W}{L}\right)_{M_1} = \left(\frac{W}{L}\right)_{M_5} = 4\left(\frac{W}{L}\right)_{M_{\text{unit}}}, \quad \left(\frac{W}{L}\right)_{M_2} = \left(\frac{W}{L}\right)_{M_6} = 2\left(\frac{W}{L}\right)_{M_{\text{unit}}}$$

$$\left(\frac{W}{L}\right)_{M_3} = \left(\frac{W}{L}\right)_{M_7} = \left(\frac{W}{L}\right)_{M_{\text{unit}}}, \quad \left(\frac{W}{L}\right)_{M_4} = \left(\frac{W}{L}\right)_{M_8} = \frac{1}{2}\left(\frac{W}{L}\right)_{M_{\text{unit}}}$$

where $W$ is the channel width and $L$ is the channel length. In the upper row, MOS transistors $M_{\text{add}1}$ and $M_4$ together realize the $-\delta_1$ offset, and in the lower row $M_8$ realizes the $+\delta_2$ offset. Depending on the $sel_{1-3}$ settings, the transistors $M_{\text{cnt}_{1-6}}$ enable the corresponding binary-weighted currents. The inverters $inv_{4-7}$ provide sufficient gain to amplify the difference between the current arriving from the $Out_{\text{com}1-2}$ terminals and the binary-weighted current, so the mechanism resembles a current comparator. In the upper row of Fig. 20, the extra PMOS transistor $M_{\text{add}1}$ adds one unit current; therefore, $V_{\text{out-h}}$ is always larger than or equal to $V_{\text{out-l}}$. If $V_{\text{choice}}$ is preset to 0, the dashed block in Fig. 20 resets $V_{\text{out-l}}$ to 0, disabling the lower row. The cell then checks only the lower bound of (9),

$$S_i^* > nI_{\text{unit}} - \delta_1, \quad (10)$$

which, with $\delta_1 = I_{\text{unit}}/2$ and $n = M - k$, is equivalent to the $k$-WTA criterion (8).

As an example of the evaluation cell's operation, suppose there are seven input variables and $sel_{1-3}$ is set to $(0, 0, 1)$ to find the next-minimum input variable. Since the next-minimum variable is larger only than the minimum one, a single unit current arrives from the $Out_{\text{com}1-2}$ terminals of its evaluation cell. In the upper row of Fig. 20, the sum of this unit current and the extra unit current from $M_{\text{add}1}$ is larger than the binary-weighted current $1.5I_{\text{unit}}$; therefore, $V_{\text{out-h}}$ is logic 1. In the lower row, by contrast, the single unit current $I_{\text{unit}}$ is smaller than the binary-weighted current $1.5I_{\text{unit}}$; therefore, $V_{\text{out-l}}$ is logic 0. The transistors $M_{\text{id}1}$ and $M_{\text{id}2}$ pull the corresponding output $V_{o,n}$ ($n = 1, \ldots, 7$) up to logic 1 only when $(V_{\text{out-h}}, V_{\text{out-l}}) = (1, 0)$; in all other cases, $V_{o,n}$ is logic 0 or left open. Therefore, inspecting the logic states of $V_{o,n}$ reveals which input variable belongs to the desired rank order.
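The two-row decision of the evaluation cell can be sketched behaviourally in Python, with all currents expressed in units of $I_{unit}$. The model assumes that $sel_{1-3}$ select binary weights 4, 2, and 1, that the half-unit device is always on in both rows (matching the $1.5I_{\text{unit}}$ reference in the example), and that the open output state counts as logic 0; `evaluation_cell` is a hypothetical name, not a block from the chip.

```python
def evaluation_cell(s_units, sel, rank_mode=True, delta=0.5):
    """Behavioural sketch of the Fig. 20 evaluation cell (currents in units of I_unit).

    s_units   : summed comparator current S_i*/I_unit arriving at Out_com1-2
    sel       : (sel1, sel2, sel3), assumed to select weights 4, 2, 1
    rank_mode : True models V_choice = 1 (rank order), False models k-WTA
    """
    n = 4 * sel[0] + 2 * sel[1] + sel[2]   # binary-weighted reference selected by sel1-3
    ref = n + delta                        # plus the always-on half-unit device (assumed)
    out_h = int(s_units + 1 > ref)         # upper row: M_add1 adds one extra unit current
    # Lower row; forced to 0 by the dashed block when V_choice = 0 (k-WTA mode).
    out_l = int(s_units > ref) if rank_mode else 0
    return int(out_h == 1 and out_l == 0)  # output pulled high only for (1, 0)


# Seven inputs, sel = (0, 0, 1): S_i*/I_unit ranges over 0..6.
print([evaluation_cell(s, (0, 0, 1)) for s in range(7)])                   # [0, 1, 0, 0, 0, 0, 0]
print([evaluation_cell(s, (0, 0, 1), rank_mode=False) for s in range(7)])  # [0, 1, 1, 1, 1, 1, 1]
```

In rank-order mode the cell fires only when $S_i^* = nI_{unit}$; in $k$-WTA mode it fires for every $S_i^* \geq nI_{unit}$, which for $n = 1$ selects six of seven inputs (6-WTA).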
4.4 Measured results and design consideration
A seven-input experimental chip was fabricated in a 0.5 μm CMOS technology. The bias voltage $V_{bias}$ is set to 0.9 V in this design. The sampling capacitor $C_s$ is 0.8 pF, and the analog switches are implemented as CMOS transmission gates. The micrograph of the chip is shown in Fig. 21; the active area is $610 \times 780 \mu m^2$. An individual comparator cell was included on the chip to measure its accuracy. The supply voltages of the core circuit and of the input/output pads were all set to 1.2 V. The accuracy of the individual comparator was measured as roughly 40 mV; that is, the comparator resolution was close to five bits at a 1.2 V supply. Figure 22(a) shows the rank-order function, and Fig. 22(b) shows the $k$-WTA function.

![Fig. 21. Micrograph of the 1.2-V rank-order chip.](image)

![Fig. 22. Measurement results of (a) rank-order and (b) $k$-WTA operations.](image)

On average, the accuracy of the whole circuit was approximately 150 mV. The performance of the chip was degraded by several factors, such as mismatch among the comparator cells, the differing capacitances at the input terminals of the evaluation cells, and clock feed-through error. Because of these non-ideal effects, each rank-order operation required 20 μs. Raising the supply voltage to 1.5 V and readjusting the bias voltage $V_{\text{bias}}$ improves the performance of the circuit. Including the input/output pads, the static power consumption of the chip was 1.4 mW.
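The "near five bits" figure follows from dividing the 1.2 V supply into 40 mV steps, assuming the usable input range spans the full supply (an assumption; the text does not state the input range). The same rough estimate applied to the 150 mV whole-circuit accuracy gives about three bits.

```python
import math


def comparator_bits(v_range, v_resolution):
    """Rough effective bits: log2 of how many resolution steps fit in v_range.

    Assumes the usable input range equals v_range (here, the supply voltage).
    """
    return math.log2(v_range / v_resolution)


print(round(comparator_bits(1.2, 0.040), 2))  # individual comparator: 4.91
print(round(comparator_bits(1.2, 0.150), 2))  # whole extractor: 3.0
```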

Many factors such as precision, speed, process variation, and chip area must be considered for design of a low-power low-voltage rank order extractor.

1. Limitations of low voltage and low power

The average power consumption of the circuit is expressed by

$$\begin{aligned} P &= P_{\text{dynamic}} + P_{\text{static}} + P_{\text{short current}} \\ &= f C V_{DD}^2 + (I_o + I_{\text{leakage}}) V_{DD} + Q_{sc} f V_{DD} \end{aligned}$$

where $f$ is the operating frequency, $C$ is the capacitance in the circuit, $V_{DD}$ is the supply voltage, $I_o$ is the standby current, $I_{\text{leakage}}$ is the leakage current, and $Q_{sc}$ is the short-circuit charge drawn during clock transitions. To reduce power consumption, the supply voltage $V_{DD}$ must be lowered, and the standby currents of the comparator and evaluation cells must be kept as small as possible. In the mask layout, each clock and its complement are generated locally to reduce delay and mismatch, which minimizes the likelihood of short-circuit current in the circuit.
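To see how each term of the power expression scales, the Python sketch below evaluates it directly. All parameter values in the usage lines are hypothetical placeholders, not measurements from the chip; the point is only that the dynamic term falls with $V_{DD}^2$, while the static and short-circuit terms fall linearly with $V_{DD}$.

```python
def average_power(f, c, vdd, i_standby, i_leak, q_sc):
    """P = f*C*VDD^2 + (I_o + I_leakage)*VDD + Q_sc*f*VDD, all in SI units."""
    p_dynamic = f * c * vdd ** 2            # charging and discharging C each cycle
    p_static = (i_standby + i_leak) * vdd   # standby plus leakage current
    p_short = q_sc * f * vdd                # short-circuit charge per clock transition
    return p_dynamic + p_static + p_short


# Hypothetical operating point, for illustration only.
params = dict(f=50e3, c=5e-12, i_standby=100e-6, i_leak=1e-9, q_sc=1e-13)
for vdd in (1.5, 1.2):
    print(f"VDD = {vdd} V: P = {average_power(vdd=vdd, **params) * 1e3:.4f} mW")
```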

2. Speed and precision

The accuracy of the comparators determines the resolution of the circuit. For the comparator, the smallest differential voltage that can be distinguished correctly is influenced by two factors: the charge-injection error of the analog switches and the effect of the parasitic capacitor $C_p$. Both effects are reduced by enlarging the sampling capacitor $C_s$ and making the switches as small as possible. In this design, the response time $\tau$ of the extractor is the sum of the auto-zero time $\tau_{az}$, the comparison time $\tau_{cmp}$, and the evaluation time $\tau_{eval}$:

$$\tau = \tau_{az} + \tau_{cmp} + \tau_{eval}$$

Reducing $\tau_{az}$, $\tau_{cmp}$, or $\tau_{eval}$ improves the response time $\tau$. The minimum auto-zero time $\tau_{az}$ is the time required to sample the input voltage correctly on the sampling capacitor $C_s$ and to bias the inverter properly in its high-gain region. Larger switches in Fig. 19 reduce $\tau_{az}$, but they also increase the clock feed-through and charge-injection errors during clock transitions. Similarly, a smaller sampling capacitor $C_s$ reduces $\tau_{az}$; unfortunately, it also reduces the effective magnitude of the difference voltage, which degrades the comparator accuracy. The comparison time $\tau_{cmp}$ dominates the response time $\tau$, especially when the input levels are close to each other: because the gain of a CMOS inverter biased in its transition region is limited at a low supply voltage, the comparator needs a long time to identify which input variable is larger. The evaluation time $\tau_{\text{eval}}$ is defined as the interval between the moment the comparator cells generate their currents and the moment the extractor finishes finding the desired rank order. $\tau_{\text{eval}}$ is a function of the current $I_{\text{unit}}$, and the maximum number $M$ of input variables is also influenced by $I_{\text{unit}}$. Although reducing $I_{\text{unit}}$ lowers power consumption, the relationship among $\tau_{\text{eval}}$, $I_{\text{unit}}$, and $M$ in this architecture is complicated.
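The trade-off between $C_s$ and accuracy follows from the charge-sharing expression for node E, in which an input difference is attenuated by $C_s/(C_s + C_p)$. The sketch assumes a hypothetical $C_p$ of 0.1 pF and a hypothetical 10 mV swing required at node E; only the 0.8 pF value of $C_s$ comes from the test chip.

```python
def effective_difference(dv_in, c_s, c_p):
    """Difference voltage reaching node E: dv_in * C_s / (C_s + C_p)."""
    return dv_in * c_s / (c_s + c_p)


def input_referred_threshold(dv_node_min, c_s, c_p):
    """Smallest input difference that still produces dv_node_min at node E."""
    return dv_node_min * (c_s + c_p) / c_s


C_P = 0.1e-12         # hypothetical parasitic capacitance
DV_NODE_MIN = 10e-3   # hypothetical minimum swing the inverter needs at node E

for c_s in (0.2e-12, 0.4e-12, 0.8e-12):  # 0.8 pF is the value used on the test chip
    print(f"C_s = {c_s * 1e12:.1f} pF: "
          f"40 mV input -> {effective_difference(40e-3, c_s, C_P) * 1e3:.1f} mV at node E, "
          f"min input ~ {input_referred_threshold(DV_NODE_MIN, c_s, C_P) * 1e3:.1f} mV")
```

Shrinking $C_s$ shortens $\tau_{az}$ but raises the input-referred threshold, which is the accuracy penalty described above.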

3. Process variation analysis

With contemporary technology, process variation during fabrication cannot be completely eliminated; as a result, mismatch error must be considered in VLSI circuit design. Matching of the binary-weighted MOS transistors in the evaluation cell (M1 to M8 in Fig. 20) is important for correct circuit operation. If mismatch induces an error current $I_{\text{err}}$ whose magnitude exceeds half of the unit current $I_{\text{unit}}$, the decision of the evaluation cell fails. Thus, a rough constraint on $I_{\text{err}}$ is

$$|I_{\text{err}}| < \frac{I_{\text{unit}}}{2}. \quad (13)$$
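A rough mismatch budget for (13) can be sketched by summing the current errors of the enabled binary-weighted devices. This is a worst-case, same-sign model with hypothetical relative errors; it assumes the half-unit device is always enabled and ignores mismatch in the comparator unit currents, so it only illustrates how the constraint tightens as larger weights are selected.

```python
def reference_error(sel, rel_err, i_unit=1.0):
    """Error in the binary-weighted reference current due to device mismatch.

    sel     : (sel1, sel2, sel3)
    rel_err : hypothetical relative current errors of the devices weighted 4, 2, 1, 1/2
    """
    weights = (4.0, 2.0, 1.0, 0.5)
    enabled = (sel[0], sel[1], sel[2], 1)   # half-unit device assumed always on
    return i_unit * sum(w * e * on for w, e, on in zip(weights, rel_err, enabled))


def decision_safe(i_err, i_unit=1.0):
    """Rough constraint (13): |I_err| < I_unit / 2."""
    return abs(i_err) < i_unit / 2


for mismatch in (0.05, 0.10):   # hypothetical same-sign relative errors
    err = reference_error((1, 1, 0), (mismatch,) * 4)
    print(f"{mismatch:.0%} mismatch, sel = (1, 1, 0): I_err = {err:.3f} I_unit, "
          f"safe = {decision_safe(err)}")
```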

5. Conclusion

This chapter describes several nonlinear signal-processing CMOS circuits, including a highly reliable WTA/LTA circuit, a simple MED cell, and a low-voltage arbitrary rank-order extractor. We focus the discussion on CMOS analog circuit design with reliability, programmability, and low-voltage operation. Matching multiple identical cells realized within a single chip in a conventional process is a practical problem; thus, highly reliable circuit design is indeed needed. Low-voltage operation is also an important design issue as CMOS processes scale down further. Section 1 introduces various CMOS nonlinear functions and their applications. Section 2 describes the design of a highly reliable WTA/LTA circuit using a single analog comparator, which has an auto-zero characteristic that improves overall reliability. Section 3 describes a simple analog MED cell. Section 4 presents a low-voltage rank-order extractor with a $k$-WTA function; its flexible, programmable functions are useful when the nonlinear circuit is integrated with other systems. Depending on the application requirements, different design strategies are needed for these nonlinear signal-processing circuits to achieve optimum performance. In state-of-the-art processes, small chip area, low-voltage operation, low power consumption, high reliability, and programmability remain important factors in realizing these circuits.

6. References

