Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

of 37
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Information Report



Views: 10 | Pages: 37

Extension: PDF | Download: 0

Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis. Tiberiu Chelcea , Girish Venkataramani, Seth C. Goldstein Department of Computer Science Carnegie Mellon University. QDI: Orphans problem. Early propagation : “A” arrives early => Z transitions
Area Optimizations for Dual-Rail Circuits Using Relative-Timing AnalysisTiberiu Chelcea, Girish Venkataramani, Seth C. GoldsteinDepartment of Computer ScienceCarnegie Mellon UniversityQDI: Orphans problem
  • Early propagation:
  • “A” arrives early => Z transitions
  • Stale values on the other signals
  • Incorrect behavior: inputs acknowledged before being received
  • A1X1B1A0X0B0Z1C1Y1Z0D1C0Y0D0DoneADoneCNCL-X solutionA1X1B1A0X0B0Z1N1C1Y1Z0D1C0N3Y0D0N2Add completion detectionQDI Gate DelaysQDI implementations always assume the worst:equal probability for any gate delayMotivation
  • Quasi-Delay Insensitive (QDI) circuits:
  • One timing constraint
  • Naturally tolerate parametric variation, but…
  • Have large area overheads
  • Added completion detection for correctness
  • Goal: pay only what is necessaryParametric Variation and Gate DelaysITRS’05: 35% parametricvariation by 2020Use timing information to reduce size of completion detectionUse mixed gates to further reduce areaw/ early propagationw/o early propagationregular gatesstrict gatesGoal: Optimizing Sync→Async FlowContributionsThree new relative-timing area optimizations:
  • Direct method:
  • Timing analysis + simple CD elimination
  • Greedy method: fast but not optimal
  • Uses strict gates, but may increase area
  • Exact method: optimal, but slow
  • Solves an mILP problem
  • Outline
  • Timing analysis & Direct Optimization
  • Greedy optimization method
  • Exact optimization method
  • Results
  • Conclusions
  • Basics
  • QDI circuits:
  • Unbounded but finite delays on gates and wires
  • One timing assumption: isochronic fork
  • Timed circuits:
  • Delays on gates and wires: bounded time intervals
  • Given input arrival times: compute propagation intervals for each gate and wire
  • GlobalPI(1.5,1.9)(1.0,1.2)(0,0)(0,0)(1.1,1.2)(0.5,0.7)(0,0)(2.0,5.6)(2.0,5.6)(3.0,4.0)(0,0)(0.5,0.7)(0,0)(3.6,4.9)(3.5,4.1)(0.6,0.8)(3.6,4.9)(0,0)(0,0)Timing ComputationX
  • Conservative assumption: any input change can trigger an output change
  • A(1.5,1.9)BN1ZCN3DN2YDoneUnder any input change, gate quiescent when output produced1.9 < 2.0(1.0,1.2)(1.1,1.2)CDirect Optimization Method(1.5,1.9)X
  • Gate completion detection iff gate may not be stable when outputs are produced
  • A(1.5,1.9)BN1Z(2.0,5.6)(2.0,5.6)(3.0,4.0)C(3.6,4.9)N3(3.5,4.1)D(3.6,4.9)N2YAll inputs must arrive before producing an outputEliminate early propagation effectExtremely expensiveDecrease length of propagation intervalCCCStrict GatesAB(1.5,1.9)(1.0,1.2)(1.5,1.9)(1.1,1.2)(5.0,6.8)(5.0,6.8)(3.0,4.0)Done(3.6,4.9)(3.5,4.1)(3.6,4.9)Timing Computation with Strict GatesXA
  • Entire completion detection: single OR gate
  • BN1ZC(1.4,1.9)N3DN2Y
  • This circuit: area not reduced
  • Goal: smart insertion of strict gates
  • Outline
  • Timing analysis & Direct Optimization
  • Greedy optimization method
  • Exact optimization method
  • Results
  • Conclusions
  • Greedy Optimization (1)
  • Strict gates: area implications
  • GlobalPI may be narrower and delayed
  • Fewer gates non-quiescent
  • Smaller completion detection
  • Greedy optimization framework:
  • Flip gates in the circuit from normal to strict
  • Select most promising candidate
  • Continue until no improvements possible
  • Greedy Optimization (2)Algorithm:
  • For each gate Gi in the circuit
  • Flip each gate Gi in turn from regular to strict
  • Perform timing analysis, compute GlobalPIi
  • Flip back Gi to regular
  • Select Gk with the narrowest GlobalPIk
  • If GlobalPIk narrower than previous best:
  • Flip Gk to strict permanently
  • Continue (goto 1)
  • Else: finishGreedy Optimization (3)
  • Algorithm does not optimize for area directly
  • Instead: may reduce the completion detection by narrowing the output interval
  • Results promising, but individual benchmarks may result in larger area
  • Outline
  • Timing analysis & Direct Method
  • Greedy optimization method
  • Exact optimization method
  • Results
  • Conclusions
  • Exact Optimization Method
  • mixed Integer Linear Programming (mILP)
  • Transform circuit graph into an optimization problem:
  • Introduce variables for each gate, wire and primary input/output
  • Matrix coefficients: from library (gate areas) and back-annotation (gate/wire delays) files
  • Decision variables (GS) should gate be strict?
  • mILP formulation
  • Minimize: TotalArea = GateArea+CDArea
  • GateArea = i (GSi·SAreai + (1-GSi)·NAreai)
  • CDArea = SCD·Or2Area + (SCD-1)·CArea
  • SCD: # gates that need completion detection
  • NeedsCD: does a gate need CD?
  • NeedsCD = 0 if PIM < GlobalPIm or successor is strict; otherwise 1
  • Rest of the model implements timing computation
  • Improving the mILP Model
  • Basic mILP model: too slow even for small circuits (hours for dozen gates)
  • Leverage problem knowledge into model improvements:
  • Branching order: gates closer to the output are more likely to become strict => inspected first
  • Single input gates: never strict
  • Provide initial solution (result of greedy opt)
  • Can solve problems with hundreds of gates in minutes
  • Related Work: Optimizations
  • Cortadella et al:
  • logical function decompositions
  • can achieve substantial area savings
  • can be the starting point for our methods
  • Zhou et al: consider strict gates in optimization, but no timing information
  • Sokolov et al: two timing optimizations
  • Alternate levels: unrealistic assumptions for gate delays
  • Longest path: applicable only for small circuits
  • Experimental Setup
  • Tool flow:
  • Synthesis & tech-mapping with Synopsys Design Compiler
  • Perl scripts for dual-rail implementations
  • Optimization tool reads structural Verilog and timing back-annotations
  • End result: optimized circuits (Verilog)
  • Experiments:
  • Arithmetic and ISCAS’89 benchmarks
  • Pre-layout runs in 0.18m technology
  • Greedy: 2.83x NCL-X areafor le32Direct: 0.83xGreedy: 0.55xmILP: 0.43xmILP does not finish inless than 1 hourPartial resultsArea: Ratio vs. NCL-X method8/168 strict4.7% before → 40% afterOver twice as small than NCL-XArea breakdownParametric Variation: BK adderConclusions
  • Paper introduced:
  • a method to translate synchronous circuits into optimized asynchronous circuits
  • Three new relative timing optimizations for improving area
  • Direct: extremely simple
  • Greedy: fast, good results
  • Exact: optimal, may be extremely slow
  • Analyzed the impact of parametric variation on these circuits
  • Backup slidesOutline
  • Background
  • Timing analysis & Direct Optimization
  • Greedy optimization method
  • Exact optimization method
  • Results
  • Conclusions
  • Introduction
  • Future deep sub-micron technologies:
  • large parametric variations (ITRS’05 predicts 35% by 2020).
  • Asynchronous design a natural fit
  • Asynchronous handshaking: widespread
  • Acceptance for asynchronous circuits is predicated on quality CAD tools:
  • “Pure” async: from scratch
  • Sync to async translation
  • A1X1B1A0X0B0Z1N1C1Y1Z0D1C0N3Y0D0N2Dual-rail circuitSynchronous to Asynchronous TranslationZ = (A·B)·(C+D)AXN1BZN3CN2DYSynchronous circuitTemplate-based replacement of each sync gateRelated Work
  • Numerous approaches for translating synchronous circuits into asynchronous
  • Dealing with the orphans problem:
  • Kondratiev et al: NCL-X (discussed below)
  • Brej: anti-tokens
  • Allows for early propagation
  • Completion detection in background
  • Even larger area overheads
  • CrtSol: current bestInteger solutionBest Estimation: best guess ofhow far the optimum isWhen 0, optimum foundILP optimization for 32-bit BK adderOutline
  • Timing analysis & Direc Optimization
  • Greedy optimization method
  • Exact optimization method
  • Results
  • Conclusions
  • 8/168 strict4.7% before → 40% afterOver twice as small than NCL-XArea breakdownmILP Run Time
    View more...
    We Need Your Support
    Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

    Thanks to everyone for your continued support.

    No, Thanks

    We need your sign to support Project to invent "SMART AND CONTROLLABLE REFLECTIVE BALLOONS" to cover the Sun and Save Our Earth.

    More details...

    Sign Now!

    We are very appreciated for your Prompt Action!