DCT Presentation1

7/22/2019 DCT Presentation1

1/39

A Presentation on

Design & Verification of DCT Algorithm

Guided by: Mr. Preet Jain (Asst. Prof., EC Department)

Submitted by: Atush Jain (0802EC09ME03)


2/39

Abstract

DCT stands for Discrete Cosine Transform.

It can be regarded as a discrete-time version of the Fourier cosine series.

It is a technique for converting a signal into elementary frequency components.

It is a very common, well-known algorithm used for signal and image compression.


3/39

Abstract Contd

The DCT core uses a direct implementation of the algorithm (i.e., it follows the standard DCT equation).

The DCT core is implemented in Verilog HDL.

The output of the core is then verified against the output of MATLAB.


4/39

Discrete Cosine Transform

A discrete cosine transform (DCT) expresses a sequence of finitely many data points in terms of a sum of cosine functions oscillating at different frequencies.

The Discrete Cosine Transform (DCT) of a one-dimensional sequence x[n] of length N is defined as

$$Y[k] = c[k] \sum_{n=0}^{N-1} x[n] \cos\left[\frac{\pi(2n+1)k}{2N}\right]$$

where k = 0, 1, 2, ..., N-1.

The original signal vector x[n] can be reconstructed from the DCT coefficients Y[k] by using the Inverse DCT (IDCT) operation, which is defined as


5/39

Discrete Cosine Transform

$$x[n] = \sum_{k=0}^{N-1} c[k]\, Y[k] \cos\left[\frac{\pi(2n+1)k}{2N}\right]$$

where n = 0, 1, 2, ..., N-1. In both the above equations, c[k] is defined as

$$c[k] = \begin{cases} \sqrt{1/N}, & k = 0 \\ \sqrt{2/N}, & k = 1, 2, \ldots, N-1 \end{cases}$$
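The forward and inverse definitions can be checked numerically. The sketch below is a direct O(N^2) transcription of the two equations in plain Python; the function names `dct`, `idct` and `c` are our own, and this is a floating-point reference model only, not the Verilog core described in this presentation. With this normalization c[k], the forward transform should follow the same scaling as MATLAB's `dct` command.

```python
import math


def c(k, N):
    """Normalization factor c[k]: sqrt(1/N) for k = 0, sqrt(2/N) otherwise."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct(x):
    """Direct DCT: Y[k] = c[k] * sum_n x[n] cos(pi(2n+1)k / 2N)."""
    N = len(x)
    return [c(k, N) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                          for n in range(N))
            for k in range(N)]


def idct(Y):
    """Direct IDCT: x[n] = sum_k c[k] Y[k] cos(pi(2n+1)k / 2N)."""
    N = len(Y)
    return [sum(c(k, N) * Y[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]


x = [1, 2, 3, 4, 5, 6, 7, 8]
Y = dct(x)
print([round(v, 4) for v in Y])
print([round(v, 6) for v in idct(Y)])  # recovers x up to floating-point rounding
```

Because this normalization makes the transform orthonormal, the energy of Y equals the energy of x, and the IDCT is simply the transpose of the DCT.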


6/39

DCT vs DFT

If we wish to find the frequency spectrum of a function that we have sampled, the continuous Fourier Transform is not very useful. For that, we need a discrete version such as the DFT.

When the input data are real and even-symmetric, the sine components of the DFT cancel and the DFT reduces to a cosine transform. The DCT of a real sequence can be obtained in this way from the DFT of its symmetrically extended (mirrored) copy.

The Discrete Fourier Transform (DFT) and Discrete Cosine Transform (DCT) perform similar functions, i.e., they both decompose a finite-length discrete-time vector into a sum of scaled-and-shifted basis functions.
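The link between the two transforms can be made concrete. If a real sequence x[n] of length N is mirrored to length 2N, removing a half-sample phase shift from its DFT leaves 2·Σ x[n] cos(π(2n+1)k/2N), and scaling by c[k]/2 gives the DCT defined earlier. The sketch below checks this with a naive O(N^2) DFT in plain Python; `dft`, `dct_direct` and `dct_via_dft` are illustrative names, not a library API.

```python
import cmath
import math


def dct_direct(x):
    """Direct DCT with c[k] = sqrt(1/N) for k = 0 and sqrt(2/N) otherwise."""
    N = len(x)
    return [(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
            * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
            for k in range(N)]


def dft(v):
    """Naive DFT: V[k] = sum_n v[n] * exp(-j 2 pi k n / M)."""
    M = len(v)
    return [sum(v[n] * cmath.exp(-2j * math.pi * k * n / M) for n in range(M))
            for k in range(M)]


def dct_via_dft(x):
    """DCT from the DFT of the mirrored sequence [x[0..N-1], x[N-1..0]]."""
    N = len(x)
    Z = dft(list(x) + list(reversed(x)))
    out = []
    for k in range(N):
        ck = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        # Undo the half-sample shift; what remains is 2 * sum_n x[n] cos(pi(2n+1)k / 2N)
        rotated = cmath.exp(-1j * math.pi * k / (2 * N)) * Z[k]
        out.append(ck * rotated.real / 2)
    return out


x = [4.0, -1.0, 2.5, 0.0, 3.0, 7.0, -2.0, 1.0]
print(max(abs(a - b) for a, b in zip(dct_direct(x), dct_via_dft(x))))
```

The imaginary part of `rotated` is zero up to rounding, which is the numerical form of "the sine components cancel".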


7/39

DCT vs DFT

The difference between the two transforms is the type of basis function used: the DFT uses a set of complex exponential functions, while the DCT uses only (real-valued) cosine functions.

The DCT and DFT are used because some tasks are much easier to handle in the frequency domain than in the time domain. For example, in a graphic equalizer, to boost the bass we:

1. Transform to the frequency domain.
2. Increase the magnitude of the low-frequency components.
3. Transform back to the time domain.
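The three equalizer steps map directly onto a forward DCT, a per-coefficient gain and an inverse DCT. The sketch below assumes the orthonormal DCT defined earlier; `boost_bass`, `cutoff` and `gain` are hypothetical names chosen for illustration, and a practical equalizer would process overlapping blocks of audio rather than one short vector.

```python
import math


def _c(k, N):
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(x):
    N = len(x)
    return [_c(k, N) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                           for n in range(N)) for k in range(N)]


def idct(Y):
    N = len(Y)
    return [sum(_c(k, N) * Y[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N)) for n in range(N)]


def boost_bass(x, cutoff=2, gain=2.0):
    """Hypothetical helper: scale the DCT coefficients with k < cutoff by gain."""
    Y = dct(x)                                   # 1. to the frequency domain
    Y = [gain * v if k < cutoff else v           # 2. boost the low frequencies
         for k, v in enumerate(Y)]
    return idct(Y)                               # 3. back to the time domain


print([round(v, 3) for v in boost_bass([1, 3, 2, 5, 4, 6, 5, 8])])
```

A constant input contains only the k = 0 component, so with `gain=2.0` it comes back doubled, while a purely high-frequency input passes through unchanged.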


8/39

Application of DCT

For audio:

The human ear has a different dynamic range for different frequencies.

The DCT transforms the signal from the time domain to the frequency domain, so that different frequencies can be quantized differently.

For images and video:

The human eye is less sensitive to fine detail.

The DCT transforms the image from the spatial domain to the frequency domain, so that high frequencies can be quantized more coarsely (or discarded altogether).

This has the effect of slightly blurring the image, which may not be perceptible if done right.
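To illustrate coarser quantization of high frequencies, the sketch below quantizes the DCT coefficients of one 8-sample block with a step size that grows with frequency. The step table `1 + k` is an arbitrary assumption for illustration only; it is not taken from this design or from any standard (JPEG, for instance, defines its own 8×8 tables). Most high-frequency coefficients round to zero, and that is where the compression comes from.

```python
import math


def _c(k, N):
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(x):
    N = len(x)
    return [_c(k, N) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                           for n in range(N)) for k in range(N)]


def idct(Y):
    N = len(Y)
    return [sum(_c(k, N) * Y[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N)) for n in range(N)]


def quantize(Y, steps):
    """Uniform quantizer per coefficient: level = round(Y[k] / step[k])."""
    return [int(round(v / s)) for v, s in zip(Y, steps)]


def dequantize(q, steps):
    return [level * s for level, s in zip(q, steps)]


x = [1, 2, 3, 4, 5, 6, 7, 8]
steps = [1 + k for k in range(8)]   # assumed step table: coarser at higher k
Y = dct(x)
q = quantize(Y, steps)
x_hat = idct(dequantize(q, steps))
print(q)                              # only two non-zero levels remain
print([round(v, 2) for v in x_hat])   # a slightly smoothed version of x
```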


9/39

Literature Review

Distributed Arithmetic [6, 9, 10]:

The DCT has been implemented using distributed arithmetic (DA).

The most frequently encountered form of computation in the DCT is a sum of products. The inner product of a pair of vectors can be realized using DA when one of the vectors is constant:

$$Y = \sum_{k=1}^{L} A_k X_k \quad (1)$$


10/39

Distributed Arithmetic Contd..

where A_k are constants and X_k are the input data.

If A_1, ..., A_L are all N-bit signed two's-complement binary numbers, (1) can also be represented as:

(2)

(3)


11/39


In eq. (2), matrix A is an adder matrix (4), but it consists of only two elements: 0 and 1. It is easy to see that Y_0, Y_1, ..., Y_(N-1) are sums of some of the data X_1, X_2, ..., X_L, so the computation of Y involves only two operations: addition and shift.
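The adder-matrix form can be sketched in a few lines. The code below assumes integer constants A_k stored as `nbits`-bit two's-complement numbers; the function names are hypothetical. Row j of the 0/1 matrix selects which inputs X_k are summed into Y_j, and Y is rebuilt from the Y_j with shifts only, the most significant (sign) bit carrying weight -2^(N-1). No multiplier is used.

```python
def bits_of(value, nbits):
    """Two's-complement bits of an integer, least significant bit first."""
    u = value & ((1 << nbits) - 1)
    return [(u >> j) & 1 for j in range(nbits)]


def da_adder_matrix(A, X, nbits):
    """Compute sum_k A[k] * X[k] using only additions and shifts."""
    bits = [bits_of(a, nbits) for a in A]            # bits[k][j] = bit j of A_k
    # Y_j = sum of the X_k whose constant A_k has bit j set (row j of the 0/1 matrix)
    Yj = [sum(x for x, b in zip(X, bits) if b[j]) for j in range(nbits)]
    y = 0
    for j in range(nbits - 1):
        y += Yj[j] << j                              # weight +2^j, done by a shift
    y -= Yj[nbits - 1] << (nbits - 1)                # sign bit carries weight -2^(N-1)
    return y


A = [3, -5, 7, -2]      # constants (assumed to fit in 8-bit two's complement)
X = [10, -4, 6, 1]      # input data
print(da_adder_matrix(A, X, nbits=8), sum(a * x for a, x in zip(A, X)))
```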


12/39


DA uses a look-up table and accumulators instead of multipliers.

Each bit of each input value contributes only once to the sum. Because each bit can take only the values {0, 1}, the partial sums are restricted to 2^L possible values (for L products), so they can be pre-calculated and stored in a look-up table to be retrieved later.

The look-up table used by the Distributed Arithmetic method takes a large amount of memory.

The shift operation is implemented by wiring, which costs little delay and few hardware resources.
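The look-up-table form of DA can be sketched as follows, under the usual textbook assumptions: the constants A_k are fixed, the inputs X_k are `nbits`-bit two's-complement integers processed one bit-plane at a time, and the table holds every possible sum of constants, so it has 2^L entries for L products. That exponential growth is the memory cost noted above. The names are illustrative only.

```python
def build_lut(A):
    """Entry addr holds the sum of A[k] for every k whose bit is set in addr."""
    L = len(A)
    return [sum(A[k] for k in range(L) if (addr >> k) & 1) for addr in range(1 << L)]


def da_lut(A, X, nbits):
    """Bit-serial DA: one LUT read and one shift-accumulate per input bit-plane."""
    lut = build_lut(A)
    mask = (1 << nbits) - 1
    acc = 0
    for j in range(nbits):
        # Address = bit j of every input, packed into an L-bit word
        addr = 0
        for k, x in enumerate(X):
            addr |= (((x & mask) >> j) & 1) << k
        partial = lut[addr] << j                     # shift instead of multiply
        # The sign bit-plane of two's-complement inputs is subtracted
        acc = acc - partial if j == nbits - 1 else acc + partial
    return acc


A = [3, -5, 7, -2]
X = [10, -4, 6, 1]
print(len(build_lut(A)), da_lut(A, X, nbits=8))
```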


13/39

Fast DCT Algorithms

To overcome the extensive computation of the DCT, Chen et al. [5, 15] proposed the fast DCT (FDCT).

Chen used the Fast Fourier Transform (FFT) method to propose a more efficient algorithm involving only real operations, for computing what he called the Fast Discrete Cosine Transform (FDCT) algorithm.

The 8-point DCT can be written as a matrix transform:

Y = AX
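The matrix form Y = AX can be generated from the DCT definition: row k of A holds c[k] cos(π(2n+1)k/16) for n = 0, ..., 7. The sketch below (plain Python; `dct_matrix` and `matvec` are hypothetical names) builds A, checks that it is orthogonal (A·A^T = I, so the inverse transform is simply A^T) and applies it to a vector. Evaluated naively, Y = AX costs 64 multiplications, which is the cost the fast algorithms reduce.

```python
import math


def dct_matrix(N=8):
    """A[k][n] = c[k] * cos(pi * (2n + 1) * k / (2N)): the DCT written as a matrix."""
    return [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
             * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
             for n in range(N)] for k in range(N)]


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


A = dct_matrix(8)
# Orthogonality check: A * A^T should be the 8 x 8 identity matrix
AAT = [[sum(A[i][m] * A[j][m] for m in range(8)) for j in range(8)] for i in range(8)]
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = matvec(A, X)
print([round(v, 4) for v in Y])
```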


14/39

Fast DCT Algorithms Contd

Where,

The Multiplier coefficients are given by


15/39


Where,


16/39


Due to the symmetry of the (8 × 8) multiplication matrix, it can be replaced by two (4 × 4) matrix-vector products, which can be computed in parallel, as can the sums and differences forming the vectors below.
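The symmetry argument can be checked numerically. Row k of the 8-point DCT matrix is even-symmetric in n for even k and odd-symmetric for odd k, so the even outputs depend only on the sums x[n] + x[7-n] and the odd outputs only on the differences x[n] - x[7-n]. The sketch below (plain Python, illustrative names) forms those two 4-element vectors and multiplies each by a 4 × 4 matrix, halving the multiplications from 64 to 32.

```python
import math

N = 8


def coef(k, n):
    """Entry of the 8-point DCT matrix: c[k] * cos(pi * (2n + 1) * k / 16)."""
    ck = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return ck * math.cos(math.pi * (2 * n + 1) * k / (2 * N))


def dct_direct(x):
    """Full 8 x 8 matrix product: 64 multiplications."""
    return [sum(coef(k, n) * x[n] for n in range(N)) for k in range(N)]


def dct_split(x):
    """Butterfly stage, then two independent 4 x 4 matrix-vector products."""
    s = [x[n] + x[N - 1 - n] for n in range(4)]    # sums        -> even outputs
    d = [x[n] - x[N - 1 - n] for n in range(4)]    # differences -> odd outputs
    even = [[coef(k, n) for n in range(4)] for k in (0, 2, 4, 6)]
    odd = [[coef(k, n) for n in range(4)] for k in (1, 3, 5, 7)]
    Ye = [sum(a * v for a, v in zip(row, s)) for row in even]   # 16 multiplications
    Yo = [sum(a * v for a, v in zip(row, d)) for row in odd]    # 16 multiplications
    Y = [0.0] * N
    Y[0::2], Y[1::2] = Ye, Yo
    return Y


x = [3, -1, 4, 1, -5, 9, 2, -6]
print(max(abs(a - b) for a, b in zip(dct_direct(x), dct_split(x))))
```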


17/39


The matrix operations of the design were implemented in the form of a signal-flow graph.


18/39


The Chen fast DCT signal flow requires a total of 18 multiplications.

Lee Algorithm [8, 15]:

Lee's algorithm [8] is based on the matrix representation.

In fact, the first step is nothing more than a butterfly decomposition yielding an even part and an odd part.


19/39


The even part is just a 1-D DCT of order N/2, while the odd part is computed through a matrix multiplication.

For a 1-D DCT of order N = 8, this algorithm needs 32 multiplications and 32 additions.
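The even/odd split can be applied recursively. The sketch below illustrates only that idea in plain Python; it is not a transcription of Lee's algorithm from [8]. The even coefficients of an N-point DCT come from an N/2-point DCT of the sums (scaled by 1/√2 under the orthonormal c[k] used earlier), while the odd coefficients are computed by a direct matrix product on the differences. N is assumed to be a power of two, and the function names are hypothetical.

```python
import math


def dct_direct(x):
    N = len(x)
    return [(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
            * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
            for k in range(N)]


def dct_recursive(x):
    """Even part via an N/2-point DCT, odd part via a direct matrix product."""
    N = len(x)
    if N == 1:
        return [float(x[0])]                     # c[0] = 1 when N = 1
    h = N // 2
    s = [x[n] + x[N - 1 - n] for n in range(h)]  # butterfly: sums
    d = [x[n] - x[N - 1 - n] for n in range(h)]  # butterfly: differences
    Y = [0.0] * N
    # Even outputs: Y[2m] = DCT_{N/2}(s)[m] / sqrt(2) with orthonormal scaling
    for m, v in enumerate(dct_recursive(s)):
        Y[2 * m] = v / math.sqrt(2)
    # Odd outputs: (N/2) x (N/2) matrix times the difference vector
    for m in range(h):
        k = 2 * m + 1
        Y[k] = math.sqrt(2 / N) * sum(
            d[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(h))
    return Y


x = [2, 7, 1, 8, 2, 8, 1, 8]
print([round(v, 4) for v in dct_recursive(x)])
```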


20/39

Project Design Flow

HDL flow:

1. Take an input matrix of size 8 × 1.
2. Apply the DCT algorithm designed in HDL to it.
3. Check the simulation results for the DCT output (A).

MATLAB flow:

1. Take an input matrix of size 8 × 1.
2. Compute the DCT using the dct command in MATLAB.
3. Store the result of the calculated DCT (B).


21/39

Project Design Flow Contd

Comparison between HDL and MATLAB results:

Compare results A and B.
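The comparison step can be scripted once both result sets are exported. The sketch below is hypothetical: it assumes the HDL outputs (A) have already been converted from fixed point back to real values, that the MATLAB outputs (B) are floating-point numbers, and that two outputs "match" when they differ by no more than an absolute tolerance `tol` chosen by the designer. The presentation does not state its matching criterion. The reference list is the DCT of [1, 2, ..., 8] rounded to four decimals; the "HDL" list is made up purely to exercise the function.

```python
def match_rate(hdl, ref, tol):
    """Fraction of outputs with |HDL - reference| <= tol (hypothetical criterion)."""
    if len(hdl) != len(ref):
        raise ValueError("result sets must have the same length")
    matches = sum(1 for a, b in zip(hdl, ref) if abs(a - b) <= tol)
    return matches / len(ref)


# Illustrative values only; not results from this project
hdl_out = [12.72, -6.45, 0.00, -0.67, 0.01, -0.20, 0.00, -0.02]
matlab_out = [12.7279, -6.4423, 0.0, -0.6735, 0.0, -0.2009, 0.0, -0.0507]
print(f"{100 * match_rate(hdl_out, matlab_out, tol=0.02):.2f}% of outputs match")
```

If accuracy were measured this way over eight 8 × 1 vectors (64 outputs), a figure of 93.75% would correspond to 60 matching values.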


22/39

Design of DCT Controller

The equation stated below is the standard equation for the 1-D DCT:

$$Y(u) = \frac{c(u)}{2} \sum_{i=0}^{7} X(i) \cos\left[\frac{(2i+1)u\pi}{16}\right] \quad (1)$$

where

Y(u) = coefficient value in the transform domain

X(i) = coefficient value in the pixel domain

u = co-ordinates in the transform domain

i = spatial co-ordinates in the pixel domain


23/39

Design of DCT Controller Contd...

$$c(u) = \begin{cases} 1/\sqrt{2}, & u = 0 \\ 1, & u \neq 0 \end{cases} \quad (2)$$

Considering eq. (1), the following eight equations are inferred:

Y(0) = [X(0) + X(1) + X(2) + X(3) + X(4) + X(5) + X(6) + X(7)] P (3.1)

Y(1) = [X(0) - X(7)] A + [X(1) - X(6)] B + [X(2) - X(5)] C + [X(3) - X(4)] D (3.2)

Y(2) = [X(0) - X(3) - X(4) + X(7)] M + [X(1) - X(2) - X(5) + X(6)] N (3.3)

Y(3) = [X(0) - X(7)] B + [X(1) - X(6)] (-D) + [X(2) - X(5)] (-A) + [X(3) - X(4)] (-C) (3.4)


24/39


Y(4) = [X(0) - X(1) - X(2) + X(3) + X(4) - X(5) - X(6) + X(7)] P (3.5)

Y(5) = [X(0) - X(7)] C + [X(1) - X(6)] (-A) + [X(2) - X(5)] D + [X(3) - X(4)] B (3.6)

Y(6) = [X(0) - X(3) - X(4) + X(7)] N + [X(1) - X(2) - X(5) + X(6)] (-M) (3.7)

Y(7) = [X(0) - X(7)] D + [X(1) - X(6)] (-C) + [X(2) - X(5)] B + [X(3) - X(4)] (-A) (3.8)


25/39


Where,

M = 0.5 * Cos(pi/8) = 0.5 * Cos (2*pi/16)

N = 0.5 * Cos(3*pi/8) = 0.5 * Cos (6*pi/16)

P = 0.5 * Cos(pi/4) = 0.5 * Cos (4*pi/16)

A = 0.5 * Cos(pi/16)

B = 0.5 * Cos(3*pi/16)

C = 0.5 * Cos(5*pi/16)

D = 0.5 * Cos(7*pi/16)
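Equations (3.1) to (3.8), with the constants above, can be checked against a direct evaluation of eq. (1). The sketch below transcribes them into plain Python (the function names are illustrative). It is a floating-point model only and ignores the fixed-point word lengths of the Verilog core.

```python
import math

M = 0.5 * math.cos(math.pi / 8)
N = 0.5 * math.cos(3 * math.pi / 8)
P = 0.5 * math.cos(math.pi / 4)
A = 0.5 * math.cos(math.pi / 16)
B = 0.5 * math.cos(3 * math.pi / 16)
C = 0.5 * math.cos(5 * math.pi / 16)
D = 0.5 * math.cos(7 * math.pi / 16)


def dct_eq1(X):
    """Direct eq. (1): Y(u) = c(u)/2 * sum_i X(i) cos((2i+1) u pi / 16)."""
    return [((1 / math.sqrt(2)) if u == 0 else 1.0) / 2
            * sum(X[i] * math.cos((2 * i + 1) * u * math.pi / 16) for i in range(8))
            for u in range(8)]


def dct_controller(X):
    """Equations (3.1) to (3.8) as listed above."""
    d0, d1, d2, d3 = X[0] - X[7], X[1] - X[6], X[2] - X[5], X[3] - X[4]
    return [
        (X[0] + X[1] + X[2] + X[3] + X[4] + X[5] + X[6] + X[7]) * P,        # (3.1)
        d0 * A + d1 * B + d2 * C + d3 * D,                                    # (3.2)
        (X[0] - X[3] - X[4] + X[7]) * M + (X[1] - X[2] - X[5] + X[6]) * N,   # (3.3)
        d0 * B - d1 * D - d2 * A - d3 * C,                                    # (3.4)
        (X[0] - X[1] - X[2] + X[3] + X[4] - X[5] - X[6] + X[7]) * P,         # (3.5)
        d0 * C - d1 * A + d2 * D + d3 * B,                                    # (3.6)
        (X[0] - X[3] - X[4] + X[7]) * N - (X[1] - X[2] - X[5] + X[6]) * M,   # (3.7)
        d0 * D - d1 * C + d2 * B - d3 * A,                                    # (3.8)
    ]


X = [52, 55, 61, 66, 70, 61, 64, 73]
print(max(abs(a - b) for a, b in zip(dct_controller(X), dct_eq1(X))))
```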


26/39

Generalized Equation Implementer

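Block input (16-bit): Xin0, Xin7, Xin1, Xin6, Xin2, Xin5, Xin3, Xin4

Four Add/Sub blocks, one for each input pair (Xin0, Xin7), (Xin1, Xin6), (Xin2, Xin5), (Xin3, Xin4)

Four multipliers, one after each Add/Sub block

One adder that combines the four products into the output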
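Because every equation from (3.1) to (3.8) has the same shape, one block of this kind can produce any output Y(u) by choosing the Add/Sub mode and four cosine coefficients. The sketch below models that idea. The per-u table of mode and coefficients is derived here from equations (3.1) to (3.8) (pairs are added for even u and subtracted for odd u); it is an illustration, not the actual control logic of the Verilog design.

```python
import math

P = 0.5 * math.cos(math.pi / 4)
M = 0.5 * math.cos(math.pi / 8)
N = 0.5 * math.cos(3 * math.pi / 8)
A = 0.5 * math.cos(math.pi / 16)
B = 0.5 * math.cos(3 * math.pi / 16)
C = 0.5 * math.cos(5 * math.pi / 16)
D = 0.5 * math.cos(7 * math.pi / 16)

# u -> (Add/Sub select, coefficients for the pairs (0,7), (1,6), (2,5), (3,4))
CONFIG = {
    0: ("add", (P, P, P, P)),
    1: ("sub", (A, B, C, D)),
    2: ("add", (M, N, -N, -M)),
    3: ("sub", (B, -D, -A, -C)),
    4: ("add", (P, -P, -P, P)),
    5: ("sub", (C, -A, D, B)),
    6: ("add", (N, -M, M, -N)),
    7: ("sub", (D, -C, B, -A)),
}
PAIRS = [(0, 7), (1, 6), (2, 5), (3, 4)]


def implementer(X, sel, coeffs):
    """Four Add/Sub blocks, four multipliers and one final adder."""
    if sel == "add":
        pair_out = [X[i] + X[j] for i, j in PAIRS]
    else:
        pair_out = [X[i] - X[j] for i, j in PAIRS]
    products = [p * c for p, c in zip(pair_out, coeffs)]
    return sum(products)


def dct_eq1(X):
    return [((1 / math.sqrt(2)) if u == 0 else 1.0) / 2
            * sum(X[i] * math.cos((2 * i + 1) * u * math.pi / 16) for i in range(8))
            for u in range(8)]


X = [52, 55, 61, 66, 70, 61, 64, 73]
Y = [implementer(X, *CONFIG[u]) for u in range(8)]
print([round(v, 3) for v in Y])
```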


27/39

Add/Sub Block

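The Add/Sub block feeds Input 1 and Input 2 to an adder and a subtractor; a multiplexer (Mux), controlled by Sel, selects which of the two results appears at the output.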


28/39

Multiplier Block

The multiplier block multiplies the output of the Add/Sub block by a cosine coefficient to produce its output.
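In hardware the cosine coefficient is a fixed-point constant rather than a real number. The synthesis report lists 16 × 16-bit multipliers, so the sketch below assumes, purely for illustration, that each coefficient is stored as a 16-bit signed value with 15 fractional bits (Q15) and that the product is rounded and shifted back by 15 bits; the presentation does not state the actual number format, and `to_q15` and `fixed_multiply` are hypothetical names. Rounding of this kind is one possible source of small differences between HDL and MATLAB outputs.

```python
import math

FRAC_BITS = 15  # assumed Q15 coefficient format (hypothetical)


def to_q15(coeff):
    """Quantize a real coefficient in [-1, 1) to a 16-bit signed Q15 integer."""
    q = int(round(coeff * (1 << FRAC_BITS)))
    return max(-(1 << 15), min((1 << 15) - 1, q))


def fixed_multiply(sample, coeff_q):
    """16-bit sample times a Q15 coefficient, rounded back to the sample's scale."""
    product = sample * coeff_q                       # fits in 32 bits
    return (product + (1 << (FRAC_BITS - 1))) >> FRAC_BITS


A = 0.5 * math.cos(math.pi / 16)
a_q = to_q15(A)
for s in (1000, -1000, 32767, -32768):
    print(s, fixed_multiply(s, a_q), round(s * A, 3))
```

With a rounded coefficient and a rounded shift, each product stays within one least-significant bit of the exact real-valued result.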


29/39

DCT Controller Interface


30/39

RTL View


31/39

Results

The DCT core is implemented in HDL. It is synthesized and simulated using Xilinx ISE 9.2i on a Spartan-3 (xc3s4000-5fg900).

Synthesis Report:

S.No.  Logic Utilization            Used   Available  Utilization
1      Number of Slices             685    27648      2%
2      Number of Slice Flip-Flops   788    55296      1%
3      Number of 4-input LUTs       1238   55296      2%
4      Number of bonded IOBs        259    633        40%


32/39

Advanced HDL Synthesis Report

S.No. Component Used

1 16x16-bit Multiplier 22

2 16-bit Adder 13

3 16-bit Subtractor 15

4 16-bit Register 50


33/39

Matlab Results


34/39

Verification

1. Open the DCT core code in Xilinx ISE 9.2i.
2. Go to the Process window and double-click the synthesis process.
3. After successful synthesis, create a test bench.
4. Apply stimulus to the test bench.
5. Select Behavioral Simulation in the Sources window.
6. From the Process window, run the Xilinx ISE Simulator.
7. The simulation will start and generate the output.


35/39

Simulation Results


36/39

Comparison between HDL Simulation and MATLAB Results


37/39

Conclusion

The 1-D DCT algorithm code was written in Verilog HDL. It was then synthesized and simulated successfully using Xilinx ISE 9.2i.

Eight 8 × 1 input vectors were taken, and the DCT was calculated through the DCT core designed in Verilog HDL; the same inputs were used to calculate the DCT in MATLAB.

The latency of the implemented core is five clock cycles and its throughput is one result per clock cycle.

Comparison between the MATLAB output and the HDL simulation results shows that the accuracy of the implemented core is 93.75%.


38/39

References

1. R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed., Pearson Education, 2008.

2. David Salomon, Data Compression: The Complete Reference, 2nd ed., Springer-Verlag, 1998.

3. Y. C. Lim, J. B. Evans, and B. Liu, "Decomposition of binary integers into signed power-of-two terms," IEEE Trans. Circuits Syst., vol. 38, no. 6, pp. 667-672, 1991.

4. R. J. Clarke, "Relation between the Karhunen-Loève and cosine transform," IEE Proc., vol. 128, pt. F, no. 6, pp. 359-360, Nov. 1981.

5. W. Chen, C. H. Smith, and S. C. Fralick, "A fast computational algorithm for the discrete cosine transform," IEEE Trans. Commun., vol. COM-25, pp. 1004-1011, Sep. 1977.

6. Peng Chungan, Cao Xixin, Yu Dunshan, and Zhang Xing, "A 250MHz optimized distributed architecture of 2D 8x8 DCT," 7th International Conference on ASIC, pp. 189-192, Oct. 2007.

7. Bian Li Jian, Zeng Xuan, Tong Jia Rong, and Liu Yue, "An Efficient VLSI Architecture for 2D-DCT Using Direct Method."

8. B. G. Lee, "A new algorithm to compute the discrete cosine transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 1243-1245, Dec. 1984.


39/39

References Contd

9. Vijay Kumar Sharma, K. K. Mahapatra, and Umesh C. Pati, "An Efficient Distributed Arithmetic based VLSI Architecture for DCT," National Institute of Technology, Rourkela, India-769008.

10. Sungwook Yu, "DCT implementation with Distributed Arithmetic," IEEE Transactions on Computers, vol. 50, no. 9, pp. 985-991, Sep. 2001, ISSN 0018-9340.

11. Trevor W. Fox (2002), "Rapid Prototyping of Field Programmable Gate Array-Based Discrete Cosine Transform Approximations," EURASIP JASP 2003, pp. 543-554.

12. Anthony Edward Nelson, "Implementation of image processing algorithms on FPGA hardware," Master of Science thesis in Electrical Engineering, Graduate School of Vanderbilt University, May 2000.

13. Latha Pillai, "Video compression using DCT," Xilinx Application Note: Virtex-II series, XAPP610, v1.4, April 10, 2008.

14. K. R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages and Applications, Academic Press, Inc., 1990.

15. Hassan EL-Banna, Alaa A. EL-Fattah, and Waleed Fakhr, "An Efficient Implementation of the 1D DCT using FPGA Technology," ICM 2003, Dec. 9-11, Cairo, Egypt.
