Monday 27 January 2014

Encounter Quick Tip: Finding Available Cell Masters with dbGet


When you first start using dbGet, many of your queries branch off the "top" keyword and then traverse to "insts" or "nets". These searches return a list of all the instances or nets in the design. But sometimes it's necessary to query the available cell masters -- some of which may or may not be instantiated.
Common reasons for needing this are for finding things like well taps, end caps, antenna diodes and filler cells. You have a hunch what these cells are called for the library you're working on and you'd like to search through all of the cell masters currently loaded in the design.
Say, for example, you want to tell the tool which cells should be used as fillers. (Fillers are physical-only instances added after placement to fill gaps between standard cells and provide standard cell rail and well continuity.) You may have a hunch they're called FILL-"something". Here's how to use dbGet to find the names of all the available cell masters that match FILL*:

encounter 1> dbGet head.allCells.name FILL*
FILL1 FILL16 FILL2 FILL32 FILL4 FILL64 FILL8
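
The FILL* pattern in the query above behaves like a shell-style wildcard. As a rough illustration outside the tool, the same kind of filtering can be sketched in Python. The cell list below is made up, and the sketch assumes plain glob semantics; it does not reproduce dbGet itself.

```python
from fnmatch import fnmatchcase


def match_cell_masters(names, pattern):
    """Return the names matching a shell-style glob pattern, sorted."""
    return sorted(n for n in names if fnmatchcase(n, pattern))


# Hypothetical cell master names, as if loaded from a library.
cell_masters = [
    "AND2X1", "FILL8", "FILL1", "INVX1", "FILL16",
    "FILL2", "TAPCELL", "FILL32", "FILL4", "FILL64",
]

fillers = match_cell_masters(cell_masters, "FILL*")
print(" ".join(fillers))
```

The match is case-sensitive here, so a library that names its fillers in lower case would need a different pattern.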


You can pass the output directly to setFillerMode, then call addFiller to add the instances to the design:

encounter 2> setFillerMode -core [dbGet head.allCells.name FILL*]
 

encounter 3> addFiller
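
To get a feel for why a library ships fillers in several widths, here is a minimal Python sketch of filling one gap with the largest fillers first. It assumes that the number in each FILL name is the cell width in placement sites, which is a common naming convention but not something the tool guarantees, and it is not how addFiller is actually implemented.

```python
import re


def filler_width(name):
    """Assumed convention: FILL<N> is N placement sites wide."""
    m = re.fullmatch(r"FILL(\d+)", name)
    if m is None:
        raise ValueError(f"unexpected filler name: {name}")
    return int(m.group(1))


def fill_gap(gap_sites, filler_names):
    """Greedily fill a gap, widest filler first.

    Returns the list of fillers used and the number of sites left empty.
    """
    placed = []
    remaining = gap_sites
    for name in sorted(filler_names, key=filler_width, reverse=True):
        width = filler_width(name)
        while remaining >= width:
            placed.append(name)
            remaining -= width
    return placed, remaining


fillers = ["FILL1", "FILL16", "FILL2", "FILL32", "FILL4", "FILL64", "FILL8"]
used, leftover = fill_gap(45, fillers)
print(used, leftover)
```

With a one-site filler in the list, any gap can be closed completely; without it, odd-sized gaps would leave sites empty.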

Although "top" is by far the most common dbGet starting point, the "head" pointer provides a link to technology information like layers, vias and more. Give it a look next time you're seeking technology information rather than design-specific data.
For more information on dbGet check out this post on Getting Started with dbGet.
Hope this helps.

Additional I/O Pins HOW to GET ...

How do you add an extra pin alongside an existing one, so that the same net has I/O pins in different places?
There is no menu command or text command to do that. You can only do it by modifying a DEF or floorplan file in the following way (DEF shown):
 
PINS 2 ;
- netout + NET netout + DIRECTION INPUT + USE SIGNAL
 + LAYER Metal2 ( -280 0 ) ( 280 560 )
 + FIXED ( 23100 60000 ) S ;
- netout.extra1 + NET netout + DIRECTION INPUT + USE SIGNAL
 + LAYER Metal2 ( -280 0 ) ( 280 560 )
 + FIXED ( 24100 60000 ) S ;
END PINS

If bus pins have to be duplicated, the syntax is the following (the extra1 extension has to be added before the bus index):
 
PINS 2 ;
- netout<0> + NET netout + DIRECTION INPUT + USE SIGNAL
 + LAYER Metal2 ( -280 0 ) ( 280 560 )
 + FIXED ( 23100 60000 ) S ;
- netout.extra1<0> + NET netout + DIRECTION INPUT + USE SIGNAL
 + LAYER Metal2 ( -280 0 ) ( 280 560 )
 + FIXED ( 24100 60000 ) S ;
END PINS

Read in the modified DEF and you will see the additional pin.
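
If many pins need duplicating, the edit can be scripted. The following is a minimal Python sketch that builds the extra pin name (placing the extraN suffix before any bus index) and prints a PINS entry in the same shape as the examples above. The function names are hypothetical, and the direction, layer and geometry are simply copied from the example rather than derived from a real design.

```python
import re


def extra_pin_name(pin, index=1):
    """Insert '.extra<index>' after the base name and before any <bit> suffix."""
    m = re.fullmatch(r"(?P<base>[^<>]+)(?P<bit><\d+>)?", pin)
    if m is None:
        raise ValueError(f"unsupported pin name: {pin}")
    return f"{m.group('base')}.extra{index}{m.group('bit') or ''}"


def def_pin_entry(pin, net, layer, rect, location, orient="S"):
    """Format one DEF PINS entry in the layout used in the examples above."""
    (x1, y1), (x2, y2) = rect
    x, y = location
    return (
        f"- {pin} + NET {net} + DIRECTION INPUT + USE SIGNAL\n"
        f" + LAYER {layer} ( {x1} {y1} ) ( {x2} {y2} )\n"
        f" + FIXED ( {x} {y} ) {orient} ;"
    )


entry = def_pin_entry(
    extra_pin_name("netout<0>"), "netout", "Metal2",
    ((-280, 0), (280, 560)), (24100, 60000),
)
print(entry)
```

Remember that the count on the PINS line has to be updated to match the number of entries after adding pins.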

Monday 7 October 2013

5 Tips to Help You Finish Your Low Power Design Tapeout On Time

So you're about to start your first low power design. Or second, third, or fourth. As with many tapeouts, you know that with today's tight market windows, most likely the project will go off with a sprinting start (architectural planning), followed by an endurance test (designing and implementing), then a final mad dash towards the finish line (signoff closure and tapeout).
First, the bad news - given the complexities of today's design requirements and the swiftness in which the technology market moves, the project crunch noted above is still going to happen. The good news? If you're implementing a low power design, there are a few things you could do to reduce last-minute problems.
These tips apply mostly to power-domain based designs that use techniques such as power shutoff (PSO), multiple supply voltages (MSV), and dynamic voltage-frequency scaling (DVFS). However, some apply to non-power domain designs too.
1. Check your low power library availability! If you are well into the physical optimization stage of your design flow and counting on always-on net optimization, only to realize "uh oh, I have no always-on buffers," that translates to at least 1-2 weeks of schedule delay. Obviously you'd want to check for library requirements early in the project, but sometimes for low power designs the requirements aren't that obvious. So here's a short list of the priorities:
  • If you are doing an MSV or DVFS design, check for the availability of multiple supply voltage-characterized libraries. Sure, you could use k-factors to extrapolate delay characteristics based on different voltages, but that's a very risky practice due to inaccuracies.
  • Check for level shifters, isolation cells, power switches (headers or footers, depending on which one you are planning to use), and of course, always-on buffers and state retention cells for PSO designs if you plan on using them.
2. Plan to use at least RTL simulation vectors for your power analysis. Vector-less power analysis is okay for estimation purposes, but at some point you'll have to switch to using vectors. Now, getting gate-level activity vectors for your design might be a bit hard since that only comes after doing gate-level simulation. But, RTL simulation vectors are typically available much earlier.
The old saying "garbage in, garbage out" applies here. The quality of your power analysis is completely dependent on the quality of the activity vectors you are feeding it. If that doesn't scare you enough, think about where this information is used: besides determining whether your design will meet power consumption specs and also fit within the packaging selected for your design, this information is also used as a basis for measuring dynamic and static IR drop, electromigration and other electrical problems that might come back and bite you if not taken care of early.

3. Try to test out your clock trees before finalizing your floorplan. This is helpful especially for power domain based designs. As we know, power domain definitions place restrictions on your floorplan in terms of placement, optimization and other factors. If, for example, your clock tree root starts in a power domain that's physically far away from your PLL, you can be sure that there will be a lot of buffers added in between, which means a much higher latency.
Also, clocks that exit one power domain and enter another might be affected by the power domain layout in terms of skew and transition time. So, by doing at least a trial clock tree synthesis run before you finalize your floorplan, you should be able to catch problems like these early on and fix them while the floorplan can still change.

4. Don't over-constrain (too much) on IR drop requirements. Let's face it: the reality is we always over-constrain our designs. We over-constrain on timing to leave us some margin towards the signoff stage, and we over-constrain on IR drop so that we'll be able to meet the IR drop requirements of the library even if we take into account some variation between implementation and signoff. The main reason for IR drop requirements is that library cell performance degrades as IR drop increases, so too much IR drop may lead to the design not meeting timing even though STA thinks it does.
Library providers usually build in a little margin when specifying IR drop requirements, and it's perfectly normal for designers to add another layer of margin to that when implementing. The problem comes when expectations are unrealistic for a given design. For power shutoff designs, power switches usually cause some additional IR drop to that power domain. One way to decrease IR drop is to increase the number of power switch cells, but that's a double-edged sword because additional power switches lead to more area and more leakage power, which will ultimately negate the effect of having power switches in the first place. So, you can see how we could potentially shoot ourselves in the foot if we specify an unrealistic IR drop constraint.
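
The trade-off can be made concrete with a first-order model. The Python sketch below treats the power switches as identical resistors in parallel, so the drop across them is I * R_on / N, and assumes leakage grows linearly with the switch count. All numbers are invented for illustration; a real flow would rely on rail analysis rather than this lumped estimate.

```python
import math


def min_switch_count(domain_current_ma, r_on_ohm, ir_budget_mv):
    """Smallest N such that I * (R_on / N) stays within the IR budget."""
    return math.ceil(domain_current_ma * r_on_ohm / ir_budget_mv)


def total_switch_leakage_na(n_switches, leak_per_switch_na):
    """Leakage of the switch network, assuming it scales linearly with N."""
    return n_switches * leak_per_switch_na


# Hypothetical domain: 500 mA peak current, 20 ohm on-resistance per switch,
# 15 nA leakage per switch.
for budget_mv in (100, 50, 25):
    n = min_switch_count(500, 20, budget_mv)
    print(budget_mv, "mV budget ->", n, "switches,",
          total_switch_leakage_na(n, 15), "nA leakage")
```

In this model, every halving of the IR drop budget doubles the switch count, and with it the switch area and leakage.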

5. Plan out your high fanout always-on nets. Planning out high fanout nets in general is a good practice for any design, but this applies even more to power shutoff designs if they have always-on high-fanout nets (hint - they usually do). Power switch sleep enable nets, SRPG sleep nets, and others would fall into this category. If you are planning to tap from nearby always-on power supplies to power the secondary power pins of the buffers for those nets, it's best that there actually is a nearby always-on power net available.
With that said, I hope this has been useful to all the folks out there designing for low power. I'm aware that this is not an all-inclusive list. Would anyone else like to share any pointers on low power implementation? Voice your comment below!

Sunday 21 July 2013

Difference Between Launch and Capture Distances in an AOCV Analysis

In a path-based analysis, the distance of a path is the diagonal of the bounding box that encompasses all of the arcs in the path. In a graph-based analysis, an arc can be both launching and capturing. As a result, there are launch and capture distances. Maintaining separate launch and capture distances for arcs in a graph-based analysis vastly improves the accuracy of the results and allows closer correlation between the graph-based and path-based analyses.
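
As a small illustration of the distance metric, the Python sketch below computes the diagonal of the bounding box around a set of points. The coordinates are hypothetical, and which cells contribute to the launch or capture bounding box of an arc is decided by the timing tool, not by this code.

```python
import math


def bbox_diagonal(points):
    """Diagonal length of the axis-aligned bounding box around the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys))


# Hypothetical cell locations (in microns) contributing to one arc.
launch_cells = [(0, 0), (30, 10), (10, 40)]
capture_cells = [(0, 0), (6, 8)]

launch_distance = bbox_diagonal(launch_cells)
capture_distance = bbox_diagonal(capture_cells)
print(launch_distance, capture_distance)
```

The same arc ends up with two different distances because different sets of cells define each bounding box.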

The distinction between launch and capture distances can be best described using an example. In the schematic shown below, the BUF cell arc is treated as a capture arc. The cells that contribute to the bounding box for the BUF cell arc are highlighted in green. The launch and capture paths are shown with arrows. Note that the capture path passes through the BUF cell arc.
Figure 1: BUF Cell Arc Treated as a Capture Arc

In the schematic shown below, the BUF cell arc is treated as a launch arc. The cells that contribute to the bounding box for the BUF cell arc are highlighted in red. The launch and capture paths are shown with arrows. Note that the launch path passes through the BUF cell arc.
Figure 2: BUF Cell Arc Treated as a Launch Arc

You can examine the launch and capture AOCV distances and depths using the report_aocvm command. For example,

pt_shell> report_aocvm [get_timing_arcs -of U3]

Friday 5 July 2013

8 Ways to Optimize Power Using Encounter Digital Implementation (EDI) System Quick Reference

Everyone knows that the increasing speed and complexity of today's designs implies a significant increase in power consumption, which demands better optimization of your design for power. I am sure a lot of us are scratching our heads over how to achieve this, knowing that manual power optimization would be hopelessly slow and all too likely to contain errors.

Here are 8 Top Things you need to know to optimize your design for power using the Encounter Digital Implementation (EDI) System.

Given the importance of power usage of ICs at lower and lower technology nodes, it is necessary to optimize power at various stages in the flow. This blog post will focus on methods that can be used to reach an optimal solution using the EDI System in an automated and clearly defined fashion. It will give clear and concise details on what features are available within optimization, and how to use them to best reach the power goals of the design.

Please read through all of the information below before making a decision on the right approach or strategy to take. It is highly dependent on the priority of low power and what timing, runtime, area and signoff criteria were decided upon in your design. With the aid of some or all of the techniques described in this blog it is possible to, depending on the design, vastly reduce both the leakage and dynamic power consumed by the design.
 
This is a one stop quick reference and not a substitute for reading the full document.

1) VT partition uses various heuristics to gather the cells into a particular partition. Depending on how the cells get placed in a particular bucket, the design leakage can vary a lot. The first thing is to ensure that the leakage power view is correctly specified using the "set_power_analysis_mode -view" command. The "reportVtInstCount -leakage" command is a useful check to see how the cells and libraries are partitioned. Always ensure correct partitioning of cells.

2) In several designs, manually controlling certain leakage libraries in the flow might give much better results than the automated partitioning of cells. If the VT partitioning is not satisfactory, or the optimization flow is found to use more LVT cells than targeted, selectively turn off cells of certain libraries, particularly in the initial part of the flow, i.e. the preRoute flow. The user should selectively set the LVT libraries to "don't use" and run preCts/postCts optimization. Depending on the final timing QOR, another incremental optimization with LVT cells enabled may be needed.

3) Depending on the importance of leakage/dynamic power in the flow, the leakage/dynamic power flow effort can be set to high or low.
setOptMode -leakagePowerEffort {low|high}
setOptMode -dynamicPowerEffort {low|high}

If timing is the first concern, but having somewhat better leakage/dynamic power is desired, then select low. If leakage/dynamic power is of utmost importance, use high.

4) PostRoute Optimization typically works with all LVT cells enabled. If there is a large discrepancy between preRoute and postRoute timing, or if SI timing is much worse than base timing, postRoute optimization may overuse LVT cells. So it may be worthwhile experimenting with a two-pass optimization, once with LVT cells disabled, and then with LVT cells enabled.

5) In order to do quick PostRoute timing optimization to clean up final violations without doing physical updates, use the following:
setOptMode -allowOnlyCellSwapping true
optDesign -postRoute 

This will only do cell swapping to improve timing, without doing physical updates. This is specifically for timing optimization and will worsen leakage.

6) Leakage flows typically have a larger area footprint than non-leakage flows. This is because EDI trades area for power, as it uses more HVT cells to fix timing in order to reduce leakage. This sometimes necessitates reclaiming any extra area during postRoute Opt to get better convergence in timing. EDI has an option to turn on postRoute area reclaim that is also hold-aware and will not degrade hold timing.
setOptMode -postRouteAreaReclaim holdAndSetupAware

7) Running standalone Leakage Optimization to do extra leakage reclamation:
optLeakagePower
This may be needed if some of the settings have changed or if leakage flows are not being used.

8) PreRoute Optimization works with an extra DRC Margin of 0.2 in the flow. On some designs it is known to result in extra optimization causing more runtime and worse leakage. The option below is used to reset this extra margin in DRV fixing:
setOptMode -drcMargin -0.2

Remember to reset this margin to 0 for postRoute optimization, as postRoute doesn't work with this extra margin of 0.2. Note that the extra drcMargin is sometimes useful in reducing SI effects, so by removing the extra margin, more effort may be needed to fix SI later in the flow.
I hope these tips help you achieve the power goals of your designs!

Backend (Physical Design) Interview Questions and Answers

Do you know about input vector controlled method of leakage reduction?
  • The leakage current of a gate also depends on its inputs. Hence, find the set of inputs that gives the least leakage. By applying this minimum leakage vector to a circuit, it is possible to decrease the leakage current of the circuit when it is in standby mode. This method is known as the input vector controlled method of leakage reduction.

How can you reduce dynamic power?
  • -Reduce switching activity by designing good RTL
  • -Clock gating
  • -Architectural improvements
  • -Reduce supply voltage
  • -Use multiple voltage domains-Multi vdd
What are the vectors of dynamic power?
  • Voltage and Current
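
To see why lowering the supply voltage and the switching activity are so effective, the Python sketch below uses the standard first-order switching power model, P = activity * C * Vdd^2 * f. The capacitance, frequency and activity values are made up for illustration, and the model ignores short-circuit and leakage power.

```python
import math


def dynamic_power_w(activity, cap_f, vdd_v, freq_hz):
    """First-order switching power: activity * C * Vdd^2 * f."""
    return activity * cap_f * vdd_v ** 2 * freq_hz


# Hypothetical block: 1 nF switched capacitance, 500 MHz clock.
base = dynamic_power_w(0.2, 1e-9, 1.0, 500e6)
lower_vdd = dynamic_power_w(0.2, 1e-9, 0.9, 500e6)
gated = dynamic_power_w(0.1, 1e-9, 1.0, 500e6)

print(f"baseline:        {base:.3f} W")
print(f"Vdd 1.0 -> 0.9:  {lower_vdd / base:.2f}x of baseline")
print(f"activity halved: {gated / base:.2f}x of baseline")
```

Because the voltage term is squared, a 10% supply reduction saves about 19% of the switching power in this model, while clock gating acts linearly through the activity factor.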

If you have both IR drop and congestion how will you fix it?
  • -Spread macros
  • -Spread standard cells
  • -Increase strap width
  • -Increase number of straps
  • -Use proper blockage
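
The effect of strap width and strap count on IR drop can be estimated with a simple lumped model. The Python sketch below assumes each strap has resistance R = Rs * L / W (sheet resistance times the number of squares) and that identical straps share the current equally in parallel. The values are hypothetical, and a real power grid is a distributed network that needs proper rail analysis.

```python
import math


def strap_ir_drop_mv(current_ma, sheet_res_ohm_sq, length_um, width_um, n_straps):
    """Lumped IR drop: I * (Rs * L / W) / N, with mA * ohm giving mV."""
    r_one_strap = sheet_res_ohm_sq * length_um / width_um
    return current_ma * r_one_strap / n_straps


# Hypothetical grid: 100 mA, 0.02 ohm/sq, 500 um long straps.
base = strap_ir_drop_mv(100, 0.02, 500, 5, 10)
wider = strap_ir_drop_mv(100, 0.02, 500, 10, 10)
more = strap_ir_drop_mv(100, 0.02, 500, 5, 20)
print(base, wider, more)
```

In this model, doubling the width and doubling the number of straps both halve the drop, but each costs routing resources, which is why they interact with congestion.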

Are increasing the power line width and adding more straps the only solutions to IR drop?
  • -Spread macros
  • -Spread standard cells
  • -Use proper blockage

In a reg-to-reg path, if you have a setup problem, where will you insert a buffer: near the launching flop or the capture flop? Why?
  • (Buffers are inserted to fix fanout violations, and hence they reduce setup violations; otherwise we try to fix setup violations by sizing cells. For this question, just assume that you must insert a buffer!)
  • Near the capture flop.
  • Because there may be other paths passing through or originating from the logic nearer to the launch flop, buffer insertion there may affect those paths too. It may improve them or degrade them. If all those paths have violations, then you may insert the buffer nearer to the launch flop, provided it improves slack.
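
For reference, setup slack on a reg-to-reg path can be written as required time minus arrival time. The Python sketch below uses that textbook relation with made-up delays in picoseconds; it ignores clock uncertainty, derating and other terms a real STA tool applies.

```python
def setup_slack_ps(period, launch_latency, capture_latency,
                   clk_to_q, data_delay, setup_time):
    """Setup slack = required time - arrival time (all values in ps)."""
    arrival = launch_latency + clk_to_q + data_delay
    required = period + capture_latency - setup_time
    return required - arrival


# Hypothetical path before the fix: violates setup by 10 ps.
before = setup_slack_ps(period=1000, launch_latency=100, capture_latency=120,
                        clk_to_q=80, data_delay=900, setup_time=50)

# Same path after a buffer or sizing change removes 30 ps of data delay.
after = setup_slack_ps(period=1000, launch_latency=100, capture_latency=120,
                       clk_to_q=80, data_delay=870, setup_time=50)
print(before, after)
```

Any change made on a segment shared with other paths alters the data delay of those paths as well, which is the reason for preferring a fix close to the capture flop.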

What is the most challenging task you handled?
What is the most challenging job in P&R flow?
  • -It may be power planning- because you found more IR drop
  • -It may be low power target-because you had more dynamic and leakage power
  • -It may be macro placement-because it had more connection with standard cells or macros
  • -It may be CTS-because you needed to handle multiple clocks and clock domain crossings
  • -It may be timing-because sizing cells in ECO flow is not meeting timing
  • -It may be library preparation-because you found some inconsistency in libraries.
  • -It may be DRC-because you faced thousands of violations

How will you synthesize clock tree?
  • -Single clock-normal synthesis and optimization
  • -Multiple clocks-Synthesize each clock separately
  • -Multiple clocks with domain crossing-Synthesize each clock separately and balance the skew

How many clocks were there in this project?
  • -It is specific to your project
  • -The more clocks, the more challenging!

How did you handle all those clocks?
  • -Multiple clocks-->synthesize separately-->balance the skew-->optimize the clock tree

Do they come from separate external sources or from a PLL?
  • -If they come from separate clock sources (i.e. asynchronous; from different pads or pins) then balancing skew between these clock sources becomes challenging.
  • -If it is from PLL (i.e.synchronous) then skew balancing is comparatively easy.

Why are buffers used in a clock tree?
  • To balance skew (i.e. the difference in clock arrival time between flops)

What is cross talk?
  • Switching of the signal on one net can interfere with a neighbouring net through cross coupling capacitance. This effect is known as cross talk. Cross talk may lead to setup or hold violations.

How can you avoid cross talk?
  • -Double spacing=>more spacing=>less capacitance=>less cross talk
  • -Multiple vias=>less resistance=>less RC delay
  • -Shielding=> constant cross coupling capacitance =>known value of crosstalk
  • -Buffer insertion=>boost the victim strength

How does shielding avoid the crosstalk problem? What exactly happens there?
  • -High frequency noise (or glitch) is coupled to VSS (or VDD), since the shielding wires are connected to either VDD or VSS.
  • Coupling capacitance remains constant with VDD or VSS.

How does spacing help in reducing crosstalk noise?
  • Wider spacing between two conductors=>cross coupling capacitance is less=>less cross talk

Why are double spacing and multiple vias used for clock nets?
  • Why clock? Because it is the one signal that changes its state regularly, and more often than any other signal. If any other signal switches fast, we can use double spacing for it too.
  • Double spacing=>spacing is wider=>capacitance is less=>less cross talk
  • Multiple vias=>resistance in parallel=>less resistance=>less RC delay
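
Both arrows can be checked with first-order formulas. The Python sketch below assumes identical vias in parallel (resistance R / N) and a parallel-plate approximation in which coupling capacitance is inversely proportional to spacing. Real extraction includes fringe effects, so treat the numbers as trends only.

```python
def parallel_via_resistance(r_via_ohm, n_vias):
    """Resistance of n identical vias connected in parallel."""
    return r_via_ohm / n_vias


def relative_coupling_cap(spacing, reference_spacing):
    """Coupling capacitance relative to the reference spacing.

    Parallel-plate approximation: C is proportional to 1 / spacing.
    """
    return reference_spacing / spacing


single_via = parallel_via_resistance(8.0, 1)
double_via = parallel_via_resistance(8.0, 2)
double_space = relative_coupling_cap(spacing=2.0, reference_spacing=1.0)
print(single_via, double_via, double_space)
```

Under these assumptions, a double via halves the via resistance and double spacing halves the coupling capacitance to the neighbouring wire.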


How can a buffer be used on the victim to avoid crosstalk?
  • Buffers increase the victim's signal strength; buffers also break up the net length=>victims are more tolerant to coupled signal from the aggressor.

 more questions coming soon... :-)

Challenges of 20nm IC Design

Saleem Haider, Synopsys interview....

Designing at the 20nm node is harder than at 28nm, mostly because of the lithography and process variability challenges that in turn require changes to EDA tools and mask making. The attraction of 20nm design is realizing SoCs with 20 billion transistors. Synopsys has re-tooled their EDA software to enable 20nm design.




20nm Geometries with 193nm Wavelength

Using immersion lithography, the clever process development engineers have figured out how to resolve 20nm geometries using 193nm wavelength light. However, making these geometries yield now requires two separate masks, a technique called Double Patterning Technology (DPT).

                                             Figure 1: Immersion Lithography

With DPT you have to split a single layer like Poly or Metal 1 onto two separate masks, then the exposures from the two masks are overlaid to produce that layer with 20nm geometries.

                           
                                               Figure 2: Double Patterning Technology (DPT)

Looking ahead to 14nm and smaller nodes this trend will continue with three or more patterns per layer.

When a mask layer is turned into two parts the process is called coloring, and the trick is to make sure that two adjacent geometries are on different colors.

                
                                                     Figure 3: DPT Coloring
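
Coloring can be viewed as two-coloring a conflict graph: each shape is a node, and an edge joins two shapes that are too close to sit on the same mask. The Python sketch below is a minimal breadth-first two-coloring under that assumption. The shape indices and conflict pairs are hypothetical, and production decomposition tools also handle stitching and many foundry-specific rules that are not modeled here.

```python
from collections import deque


def two_color(num_shapes, conflicts):
    """Assign mask 0 or 1 to each shape so that conflicting shapes differ.

    Returns a dict shape -> mask, or None if an odd cycle makes it impossible.
    """
    adjacent = {i: [] for i in range(num_shapes)}
    for a, b in conflicts:
        adjacent[a].append(b)
        adjacent[b].append(a)

    mask = {}
    for start in range(num_shapes):
        if start in mask:
            continue
        mask[start] = 0
        queue = deque([start])
        while queue:
            shape = queue.popleft()
            for neighbor in adjacent[shape]:
                if neighbor not in mask:
                    mask[neighbor] = 1 - mask[shape]
                    queue.append(neighbor)
                elif mask[neighbor] == mask[shape]:
                    return None  # odd cycle: not decomposable as drawn
    return mask


# Four wires in a row, each too close to its neighbor.
row = two_color(4, [(0, 1), (1, 2), (2, 3)])
# Three shapes that all conflict with each other.
triangle = two_color(3, [(0, 1), (1, 2), (0, 2)])
print(row, triangle)
```

The row of wires alternates cleanly between the two masks, while the three mutually conflicting shapes cannot be split onto two masks without changing the layout.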

With DPT you have to make sure that your cell library and Place & Route tool are both DPT-compliant.

Often in your IC layout the DPT process will have to use stitching to accommodate via arrays:


This stitching will cause issues with line-end effects that in turn can degrade yield:

                

The earlier that you identify these issues, the sooner that you can make engineering trade-offs.

Foundries create layout rules at 20nm to specify how to produce high yield, and there are some 5,000 rules at this node.

Using DPT techniques will also cause a variation in capacitance values between adjacent nets caused by subtle shifts in the double masks.

                                  

DPT-Ready EDA Tools
Synopsys has updated their EDA tools to enable 20nm design, specifically:





Q&A

Q: Where can I read more about 20nm design with Synopsys tools?
A: Achronix did a paper at the Synopsys User Group, and they fabricated at Intel's custom foundry using FinFET technology.

Q: How popular is your DRC and LVS tool, IC Validator?
A: There have been 100 tapeouts in the past year with the IC Validator tool.

Q: How many 20nm designs are there?
A: Test chips were done first last year, and now production designs are taping out with commercial foundries.

Q: How many mask layers require DPT in a 20nm design?
A: It depends on the foundry. First layer metal, maybe second layer of metal. As you relax the metal pitch, then you don't need DPT. Poly needs DPT.

Q: What about mask costs at 20nm with DPT?
A: It adds to the costs. It's always a trade-off: the foundry can relax the pitches and avoid DPT usage.

Q: Which foundries have qualified 20nm with Synopsys tools?
A: TSMC, Samsung, GLOBALFOUNDRIES have qualified and endorse the Synopsys flow for 20nm.

Q: What can you tell me about your Custom IC design tools?
A: Our custom tools are also DPT aware, (SpringSoft, CiraNova, Custom Designer) - coming together.

Q: Why should I visit Synopsys at DAC?
A: We'll have live product demos, talk about advanced nodes, show emerging nodes, 14nm, 16nm, discuss new product features, and have special events. There is an IC Compiler luncheon where customers speak, and that's on Monday.

   more information at  http://www.synopsys.com/Solutions/EndSolutions/20nmdesign/Documents/20nm-and-beyond-white-paper.pdf