Showing papers presented at "Asia and South Pacific Design Automation Conference in 2006"


Proceedings ArticleDOI
24 Jan 2006
TL;DR: The design and implementation of a low-latency on-chip network is detailed; in the best case its routers can route flits in a single clock cycle, helping to minimise on-chip communication latencies and maximise the effectiveness of buffering resources.
Abstract: Many of the issues that will be faced by the designers of multi-billion transistor chips may be alleviated by the presence of a flexible global communication infrastructure. In the short term, such a network will provide scalable chip-wide communication and ease the complexity of handling multi-cycle communications. In the long term, the network will become a primary tool for optimising power and data transfers and for scheduling computations. This paper details the design and implementation of a low-latency on-chip network. The network's speculative routers are in the best case able to route flits in a single clock cycle, helping to minimise on-chip communication latencies and maximise the effectiveness of buffering resources. Results from our 180nm test chip demonstrate an inter-router data transfer rate in excess of 16Gbit/s for each link. In the best case each router hop adds just 1 clock cycle to the final communication latency.

120 citations


Proceedings ArticleDOI
24 Jan 2006
TL;DR: A temperature-aware 3D global routing algorithm is proposed that inserts "thermal vias" and "thermal wires" to lower the effective thermal resistance of the material, thereby reducing chip temperature.
Abstract: 3D integrated circuits (3D ICs) provide an attractive solution for improving circuit performance. Such solutions must be embedded in an electrothermally-conscious design methodology, since 3D ICs generate a significant amount of heat per unit volume. In this paper, we propose a temperature-aware 3D global routing algorithm with insertion of "thermal vias" and "thermal wires" to lower the effective thermal resistance of the material, thereby reducing chip temperature. Since thermal vias and thermal wires take up lateral routing space, our algorithm utilizes sensitivity analysis to judiciously allocate their usage, and iteratively resolve contention between routing and thermal vias and thermal wires. Experimental results show that our routing algorithm can effectively reduce the peak temperature and alleviate routing congestion.

103 citations


Proceedings ArticleDOI
24 Jan 2006
TL;DR: This work presents a method to efficiently map the applications on to the NoC architecture, satisfying the design constraints of each individual use-case, and explores the possibility of integrating dynamic voltage and frequency scaling (DVS/DFS) techniques with the use-case centric NoC design methodology.
Abstract: To provide a scalable communication infrastructure for systems on chips (SoCs), networks on chips (NoCs), a communication centric design paradigm is needed. To be cost effective, SoCs are often programmable and integrate several different applications or use-cases on to the same chip. For the SoC platform to support the different use-cases, the NoC architecture should satisfy the performance constraints of each individual use-case. In this work we motivate the need to consider multiple use-cases during the NoC design process. We present a method to efficiently map the applications on to the NoC architecture, satisfying the design constraints of each individual use-case. We also present novel ways to dynamically reconfigure the network across the different use-cases and explore the possibility of integrating dynamic voltage and frequency scaling (DVS/DFS) techniques with the use-case centric NoC design methodology. We validate the performance of the design methodology on several SoC applications. The dynamic reconfiguration of the NoC integrated with DVS/DFS schemes results in large power savings for the resulting NoC systems.

101 citations


Proceedings ArticleDOI
24 Jan 2006
TL;DR: Simulation results show that, when combined with either deterministic or adaptive output selection, CAIS achieves significantly better performance than the traditional first-come-first-served (FCFS) input selection, with low hardware overhead.
Abstract: The performance of network-on-chip (NoC) largely depends on the underlying routing techniques, which have two constituencies: output selection and input selection. Previous research on routing techniques for NoC has focused on the improvement of output selection. This paper investigates the impact of input selection, and presents a novel contention-aware input selection (CAIS) technique for NoC that improves the routing efficiency. When multiple input channels contend for the same output channel, CAIS decides which input channel obtains the access depending on the contention level of the upstream switches, which in turn removes possible network congestion. Simulation results with different synthetic and real-life traffic patterns show that, when combined with either deterministic or adaptive output selection, CAIS achieves significantly better performance than the traditional first-come-first-served (FCFS) input selection, with low hardware overhead (<3%).

93 citations


Proceedings ArticleDOI
24 Jan 2006
TL;DR: This paper presents an efficient graph construction algorithm to model the problem of post-routing redundant via insertion and an effective MIS heuristic to solve the problem, and shows that this heuristic inserts more redundant vias and distributes them more uniformly among via layers than a commercial tool and an existing method.
Abstract: Reducing the yield loss due to via failure is one of the important problems in design for manufacturability. A well known and highly recommended method to improve via yield/reliability is to add redundant vias. In this paper, we study the problem of post-routing redundant via insertion and formulate it as a maximum independent set (MIS) problem. We present an efficient graph construction algorithm to model the problem, and an effective MIS heuristic to solve the problem. The experimental results show that our MIS heuristic inserts more redundant vias and distributes them more uniformly among via layers than a commercial tool and an existing method. The number of inserted redundant vias can be increased by up to 21.24%. Besides, since redundant vias can be classified into on-track and off-track ones, and on-track ones have better electrical properties, we also present two methods (one is modified from the MIS heuristic, and the other is applied as a post processor) to increase the amount of on-track redundant vias. The experimental results indicate that both methods perform very well.

88 citations
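
The MIS formulation in the abstract above can be illustrated with a minimal sketch. It assumes each vertex is a candidate redundant via and each edge marks two candidates that cannot both be inserted. The minimum-degree greedy rule, the function name `greedy_mis`, and the toy conflict graph are illustrative assumptions, not the heuristic or data from the paper.

```python
from collections import defaultdict


def greedy_mis(vertices, edges):
    """Return an independent set picked greedily by minimum remaining degree."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    remaining = set(vertices)
    chosen = []
    while remaining:
        # Pick the candidate with the fewest still-active conflicts;
        # ties are broken by name so the result is deterministic.
        best = min(remaining, key=lambda x: (len(adj[x] & remaining), x))
        chosen.append(best)
        # Drop the chosen candidate and everything it conflicts with.
        remaining -= adj[best] | {best}
    return chosen


# Hypothetical conflict graph: names encode "via id + direction of the redundant via".
candidates = ["v1_east", "v1_north", "v2_west", "v2_south", "v3_east"]
conflicts = [
    ("v1_east", "v1_north"),  # same via: at most one redundant via is needed
    ("v2_west", "v2_south"),  # same via
    ("v1_east", "v2_west"),   # assumed spacing conflict between neighbours
    ("v2_south", "v3_east"),  # assumed spacing conflict between neighbours
]

selected = greedy_mis(candidates, conflicts)
print(selected)
```

In this toy instance every via receives one redundant via. A greedy rule like this gives no optimality guarantee in general, which is why the quality of the heuristic matters in practice.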


Proceedings ArticleDOI
24 Jan 2006
TL;DR: By combining many bandwidth reduction techniques and data reuse schemes, a very efficient architecture and implementation for a platform-based system is demonstrated by the prototype chips.
Abstract: H.264/AVC is the latest video coding standard. It significantly outperforms the previous video coding standards, but its extraordinarily high computational complexity and memory access requirements make a hardwired codec solution a tough job. This paper describes the design methodology for an H.264/AVC video codec. The system architecture and scheduling will be addressed. The design considerations and optimizations for its significant modules, including a bandwidth-optimized motion compensation engine, a reconfigurable intra predictor generator, and low-bandwidth parallel integer motion estimation, will be mentioned. Due to the complex, sequential, and highly data-dependent characteristics of all essential algorithms in H.264/AVC, not only the pipeline structure but also an efficient memory hierarchy is required. The design case with a hybrid task pipelining scheme, a balanced schedule with block-level, MB-level, and frame-level pipelining, will be presented. By combining many bandwidth reduction techniques and data reuse schemes, a very efficient architecture and implementation for a platform-based system is demonstrated by the prototype chips.

80 citations


Proceedings ArticleDOI
24 Jan 2006
TL;DR: This framework provides designers with the system-level power profile in a cycle-accurate manner and is targeted to run fast and accurately, which is enabled by adopting different modeling techniques depending on the power characteristics of various IP blocks.
Abstract: In this work, we propose a SoC power estimation framework built on our system-level simulation environment. Our framework provides designers with the system-level power profile in a cycle-accurate manner. We target the framework to run fast and accurately, which is enabled by adopting different modeling techniques depending on the power characteristics of various IP blocks. The framework can be applied to any target SoC design.

80 citations


Proceedings ArticleDOI
24 Jan 2006
TL;DR: An analytic method is presented to determine the provably smallest possible slot length that must be allocated in a TDMA resource, to serve an event-triggered hard real-time load with arbitrary deterministic timing behavior.
Abstract: We present an analytic method to determine the provably smallest possible slot length that must be allocated in a TDMA resource, to serve an event-triggered hard real-time load with arbitrary deterministic timing behavior. Based on this method, we then present constructive methods to find all feasible as well as the optimal cycle length in a TDMA resource, and we show how to determine the minimum required bandwidth of a TDMA resource. We demonstrate the applicability and computational efficiency of the presented methods in a case study of a large distributed embedded system with a TDMA bus, where we will find the optimal parameter set for the TDMA bus.

75 citations


Proceedings ArticleDOI
24 Jan 2006
TL;DR: A three-phase solution framework, which integrates power management scheduling and task voltage assignment, is proposed, which outperforms existing methods by an average of 18% in terms of the system-wide energy savings.
Abstract: This paper addresses the problem of minimizing energy consumption of a computer system performing periodic hard real-time tasks with precedence constraints. In the proposed approach, dynamic power management and voltage scaling techniques are combined to reduce the energy consumption of the CPU and devices. The optimization problem is first formulated as an integer programming problem. Next, a three-phase solution framework, which integrates power management scheduling and task voltage assignment, is proposed. Experimental results show that the proposed approach outperforms existing methods by an average of 18% in terms of the system-wide energy savings.

69 citations


Proceedings ArticleDOI
24 Jan 2006
TL;DR: A method is given to rapidly find the L1 cache miss rate of an application and an energy model and an execution time model are developed to find the best cache configuration for the given embedded application.
Abstract: Modern embedded systems execute a single application or a class of applications repeatedly. A new emerging methodology of designing embedded systems utilizes configurable processors where the cache size, associativity, and line size can be chosen by the designer. In this paper, a method is given to rapidly find the L1 cache miss rate of an application. An energy model and an execution time model are developed to find the best cache configuration for the given embedded application. Using benchmarks from Mediabench, we find that our method is on average 45 times faster to explore the design space, compared to Dinero IV while still having 100% accuracy.

67 citations
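
To make the design space in the abstract above concrete, here is a minimal sketch of the brute-force baseline: a trace-driven simulation of one set-associative LRU cache, run once per configuration. This is not the paper's fast method. The function name `miss_rate`, the synthetic address trace, and the three configurations are assumptions chosen for illustration.

```python
from collections import OrderedDict


def miss_rate(trace, cache_bytes, line_bytes, ways):
    """Simulate a set-associative LRU cache and return misses / accesses."""
    num_sets = cache_bytes // (line_bytes * ways)
    # One OrderedDict per set; insertion order doubles as LRU order.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        lines = sets[block % num_sets]
        tag = block // num_sets
        if tag in lines:
            lines.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return misses / len(trace)


# Synthetic trace: four passes over a 256-byte array read in 4-byte words.
trace = [addr for _ in range(4) for addr in range(0, 256, 4)]

# Explore a few (cache size in bytes, associativity) points with 16-byte lines.
results = {
    (size, ways): miss_rate(trace, size, 16, ways)
    for size, ways in [(128, 1), (128, 2), (256, 1)]
}
for config, rate in results.items():
    print(config, rate)
```

Each configuration needs a full pass over the trace here, which is the cost a faster exploration method aims to avoid.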


Proceedings ArticleDOI
24 Jan 2006
TL;DR: A near optimal hardware architecture for deblocking filter in H.264/MPEG-4 AVC with novel filtering order and a data reuse strategy that result in significant saving in filtering time, local memory usage, and memory traffic is proposed.
Abstract: We propose a near optimal hardware architecture for deblocking filter in H.264/MPEG-4 AVC. We propose a novel filtering order and a data reuse strategy that result in significant saving in filtering time, local memory usage, and memory traffic. Every 16×16 macroblock requires 192 filtering operations. After a few initialization cycles, our 5-stage pipelined architecture is able to perform one filtering operation per cycle. Compared with some state-of-the-art designs, our architecture delivers the fastest level of performance while using much smaller gate count and memory. We have implemented and integrated the proposed deblocking filter into an H.264 main profile video decoder and verified it with an FPGA prototype.
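
The figure of 192 filtering operations per macroblock can be reproduced with a short calculation. The sketch below assumes 4:2:0 sampling, edges on a 4-sample grid in both luma and chroma, and that one "filtering operation" filters one line of samples across one edge. These are interpretive assumptions that happen to match the stated count. The 1280×720 frame size is a hypothetical example, and the cycle estimate ignores the initialization cycles mentioned in the abstract.

```python
def filter_ops_per_macroblock():
    """Count per-line edge filtering operations for one 4:2:0 macroblock."""
    # Luma: 16x16 samples, edges every 4 samples -> 4 edges per direction,
    # each edge crossed by 16 lines of samples, in 2 directions.
    luma = 2 * (16 // 4) * 16
    # Chroma: two 8x8 blocks (Cb, Cr), edges every 4 samples -> 2 edges per
    # direction, each edge crossed by 8 lines of samples, in 2 directions.
    chroma = 2 * (2 * (8 // 4) * 8)
    return luma + chroma


def filter_cycles_per_frame(width, height, ops_per_cycle=1):
    """Estimate steady-state filtering cycles for one frame."""
    macroblocks = (width // 16) * (height // 16)
    return macroblocks * filter_ops_per_macroblock() // ops_per_cycle


ops = filter_ops_per_macroblock()
cycles = filter_cycles_per_frame(1280, 720)
print(ops, cycles)
```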

Proceedings ArticleDOI
24 Jan 2006
TL;DR: This paper proposes an automated approach for synthesizing a bus matrix communication architecture which satisfies all performance constraints in the design and minimizes wire congestion in the matrix, and shows that this approach results in up to 9× component savings when compared to a full bus matrix and up to 3.2× savings when compared to a maximally connected reduced bus matrix.
Abstract: Modern multi-processor system-on-chip (MPSoC) designs have high bandwidth constraints which must be satisfied by the underlying communication architecture. Bus matrix based communication architectures consist of several parallel buses which provide a suitable backbone to support high bandwidth systems, but suffer from high cost overhead due to extensive bus wiring inside the matrix. Manual traversal of the vast exploration space to synthesize a minimal cost bus matrix that also satisfies performance constraints is practically infeasible. In this paper, we address this problem by proposing an automated approach for synthesizing a bus matrix communication architecture which satisfies all performance constraints in the design and minimizes wire congestion in the matrix. To validate our approach, we consider several industrial strength applications from the networking domain and show that our approach results in up to 9× component savings when compared to a full bus matrix and up to 3.2× savings when compared to a maximally connected reduced bus matrix.

Proceedings ArticleDOI
24 Jan 2006
TL;DR: This paper deals with optimization of the instruction memory scratchpad based on a methodology that uses a metric which is used to find basic blocks which are executed frequently and in close proximity in time, called the concomitance.
Abstract: Scratchpad memory has been introduced as a replacement for cache memory as it improves the performance of certain embedded systems. Additionally, it has also been demonstrated that scratchpad memory can significantly reduce the energy consumption of the memory hierarchy of embedded systems. This is significant, as the memory hierarchy consumes a substantial proportion of the total energy of an embedded system. This paper deals with optimization of the instruction memory scratchpad based on a methodology that uses a metric which we call the concomitance. This metric is used to find basic blocks which are executed frequently and in close proximity in time. Once such blocks are found, they are copied into the scratchpad memory at appropriate times; this is achieved using a special instruction inserted into the code at appropriate places. For a set of benchmarks taken from Mediabench, our scratchpad system consumed just 59% (avg) of the energy of the cache system, and 73% (avg) of the energy of the state of the art scratchpad system, while improving the overall performance. Compared to the state of the art method, the number of instructions copied into the scratchpad memory from the main memory is reduced by 88%.

Proceedings ArticleDOI
24 Jan 2006
TL;DR: MEVA-3D, an automated physical design and architecture performance estimation flow for 3D architectural evaluation which includes 3D floorplanning, routing, interconnect pipelining and automated thermal via insertion, and associated die size, performance, and thermal modeling capabilities, is developed.
Abstract: Although the emerging three-dimensional integration technology can significantly reduce interconnect delay, chip area, and power dissipation in nanometer technologies, its impact on overall system performance is still poorly understood due to the lack of tools and systematic flows to evaluate 3D microarchitectural designs. The contribution of this paper is the development of MEVA-3D, an automated physical design and architecture performance estimation flow for 3D architectural evaluation which includes 3D floorplanning, routing, interconnect pipelining and automated thermal via insertion, and associated die size, performance, and thermal modeling capabilities. We apply this flow to a simple, out-of-order superscalar microprocessor to evaluate the performance and thermal behavior in 2D and 3D designs, and demonstrate the value of MEVA-3D in providing quantitative evaluation results to guide 3D architecture designs. In particular, we show that it is feasible to manage thermal challenges with a combination of thermal vias and double-sided heat sinks, and report modest system performance gains in 3D designs for these simple test examples.

Proceedings ArticleDOI
24 Jan 2006
TL;DR: An efficient and accurate thermal-aware floorplanning high-level synthesis system that makes use of integrated high-level and physical-level thermal optimization techniques to reduce the design's power consumption and peak temperature is proposed.
Abstract: Thermal effects are becoming increasingly important during integrated circuit design. Thermal characteristics influence reliability, power consumption, cooling costs, and performance. It is necessary to consider thermal effects during all levels of the design process, from the architectural level to the physical level. However, design-time temperature prediction requires access to block placement, wire models, power profile, and a chip-package thermal model. Thermal-aware design and synthesis necessarily couple architectural-level design decisions (e.g., scheduling) with physical design (e.g., floorplanning) and modeling (e.g., wire and thermal modeling). This article proposes an efficient and accurate thermal-aware floorplanning high-level synthesis system that makes use of integrated high-level and physical-level thermal optimization techniques. Voltage islands are automatically generated via novel slack distribution and voltage partitioning algorithms in order to reduce the design's power consumption and peak temperature. A new thermal-aware floorplanning technique is proposed to balance chip thermal profile, thereby further reducing peak temperature. The proposed system was used to synthesize a number of benchmarks, yielding numerous designs that trade off peak temperature, integrated circuit area, and power consumption. The proposed techniques reduce peak temperature by 12.5°C on average. When used to minimize peak temperature with a fixed area, peak temperature reductions are common. Under a constraint on peak temperature, integrated circuit area is reduced by 9.9% on average.

Proceedings ArticleDOI
24 Jan 2006
TL;DR: A profit-aware yield model is proposed, based on which a statistical design methodology is presented to improve profit of a design considering frequency binning and product price profile, and a low-complexity sensitivity-based gate sizing algorithm is developed to improve the profitability of design over an initial yield-optimized design.
Abstract: Designing high-performance systems with high yield under parameter variations has raised serious design challenges in nanometer technologies. In this paper, we propose a profit-aware yield model, based on which we present a statistical design methodology to improve profit of a design considering frequency binning and product price profile. A low-complexity sensitivity-based gate sizing algorithm is developed to improve the profitability of design over an initial yield-optimized design. We also propose an algorithm to determine optimal bin boundaries for maximizing profit with frequency binning. Finally, we present an integrated design methodology for simultaneous sizing and bin placement to enhance profit under an area constraint. Experiments on a set of ISCAS85 benchmarks show up to 26% (36%) improvement in profit for fixed bin (for simultaneous sizing and bin placement) with three frequency bins considering both leakage and delay bounds compared to a design optimized for 90% yield at iso-area.
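
The idea of tying profit to frequency bins can be sketched as an expected-value calculation. The example assumes a chip's maximum frequency follows a normal distribution, that each bin has a fixed per-chip price, and that chips below the lowest boundary are discarded. The function name `expected_profit`, the distribution parameters, the bin boundaries, and the prices are all hypothetical, and the leakage bound considered in the paper is ignored.

```python
from statistics import NormalDist


def expected_profit(freq_dist, boundaries, prices):
    """Expected per-chip price when chips are sold by frequency bin.

    boundaries are ascending lower bin edges; prices[i] applies to chips with
    boundaries[i] <= f < boundaries[i + 1]; the last bin is open-ended.
    """
    if len(boundaries) != len(prices):
        raise ValueError("need exactly one price per bin boundary")
    total = 0.0
    for i, price in enumerate(prices):
        p_low = freq_dist.cdf(boundaries[i])
        p_high = freq_dist.cdf(boundaries[i + 1]) if i + 1 < len(boundaries) else 1.0
        total += price * (p_high - p_low)  # price times probability of the bin
    return total


# Hypothetical design: max frequency ~ N(2.0 GHz, 0.1 GHz), three bins.
freq = NormalDist(mu=2.0, sigma=0.1)
baseline = expected_profit(freq, [1.8, 2.0, 2.2], [10.0, 20.0, 40.0])

# A design shifted towards higher frequency fills the expensive bin more often.
faster = expected_profit(NormalDist(mu=2.1, sigma=0.1), [1.8, 2.0, 2.2], [10.0, 20.0, 40.0])
print(round(baseline, 4), round(faster, 4))
```

Under these assumptions, a sizing change or a shift in bin boundaries can be compared by its effect on this expected value rather than on yield alone.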

Proceedings ArticleDOI
24 Jan 2006
TL;DR: This paper conducts extensive fault injection experiments on five benchmark examples using a cycle-accurate processor simulator and shows the existence of optimal or Pareto-optimal cache size selection to optimize the three design objectives.
Abstract: Improving performance, reducing energy consumption and enhancing reliability are three important objectives for embedded computing systems design. In this paper, we study the joint impact of cache size selection on these three objectives. For this purpose, we conduct extensive fault injection experiments on five benchmark examples using a cycle-accurate processor simulator. Performance and reliability are analyzed using the performability metric. Overall, our experiments demonstrate the importance of a careful cache size selection when designing energy-efficient and reliable systems. Furthermore, the experimental results show the existence of optimal or Pareto-optimal cache size selection to optimize the three design objectives.

Proceedings ArticleDOI
24 Jan 2006
TL;DR: A three-step approach, named XDP, is proposed for mixed-size detailed placement; on global placements generated by mPL6, XDP is the only detailed placer that successfully produces legal placements for all the examples, while APlace and Fengshui fail for 4/9 and 1/3 of the examples.
Abstract: The rapid increase in IC design complexity and wide-spread use of intellectual-property (IP) blocks have made the so-called mixed-size placement a very important topic in recent years. Although several algorithms have been proposed for mixed-sized placements, most of them primarily focus on the global placement aspect. In this paper we propose a three-step approach, named XDP, for mixed-size detailed placement. First, a combination of constraint graph and linear programming is used to legalize macros. Then, an enhanced greedy method is used to legalize the standard cells. Finally, a sliding-window-based cell swapping is applied to further reduce wirelength. The impact of individual techniques is analyzed and quantified. Experiments show that when applied to the set of global placement results generated by APlace (Kahng and Wang, 2004), XDP can produce wirelength comparable to the native detailed placement of APlace, and 3% shorter wire-length compared to Fengshui 5.0 (Agnitori et al., 2005). When applied to the set of global placements generated by mPL6 (Chan et al., 2005), XDP is the only detailed placement that successfully produces legal placement for all the examples, while APlace and Fengshui fail for 4/9 and 1/3 of the examples. For cases where legal placements can be compared, the wirelength produced by XDP is shorter by 3% on average compared to APlace and Fengshui. Furthermore, XDP displays a higher robustness than the other tools by covering a broader spectrum of examples by different global placement tools.

Proceedings ArticleDOI
24 Jan 2006
TL;DR: It is demonstrated that the proposed flip-flops are also suitable for enhanced scan based delay fault testing, which allows arbitrary two-pattern test application for the best combinational path testability.
Abstract: With technology scaling, soft error resilience is becoming a major concern in circuit design. This paper presents a class of low-overhead flip-flops suitable for soft error detection and correction. The proposed design reuses logic elements typically available in a standard-cell implementation of a flip-flop to reduce hardware overhead. We demonstrate that the proposed flip-flops are also suitable for enhanced scan based delay fault testing, which allows arbitrary two-pattern test application for the best combinational path testability. The proposed flip-flops show an average power reduction of 16% and area improvement of 17% compared to the best alternative techniques with no additional delay overhead.

Proceedings ArticleDOI
24 Jan 2006
TL;DR: An effective and efficient yield-preferred via insertion method based on a novel geotopological layout platform, GEOTOP, that chooses the most yield-favored via candidate and inserts it into the layout without causing any design rule violations.
Abstract: Yield-preferred via insertion is an effective method to reduce the yield loss caused by via failures. The existing methods to apply the redundant-cut vias in metal layers are neither efficient nor adequate. In this paper, we present an effective and efficient yield-preferred via insertion method based on a novel geotopological layout platform, GEOTOP. Our method chooses the most yield-favored via candidate and inserts it into the layout without causing any design rule violations. Experiments with real industry designs show that our method can achieve a very high rate of yield-preferred vias without increasing the design die size, within acceptable running time.

Proceedings ArticleDOI
24 Jan 2006
TL;DR: The method begins with timing and power driven coarse placement, followed by a few iterations between voltage assignment and placement refinement to generate voltage islands, aiming to reduce total power under timing constraints and to implement voltage islands with minimal overheads.
Abstract: In this paper we propose a method for standard cell placement with support for dual supply voltages, aiming to reduce total power under timing constraints and to implement voltage islands with minimal overheads. The method begins with timing and power driven coarse placement, followed by a few iterations between voltage assignment and placement refinement to generate voltage islands. Several techniques, including timing and power driven net weighting, seed growth based voltage assignment, and soft clustering strategy for placement refinements are employed in our implementation. Experimental results on a set of MCNC benchmarks show that our approach is able to produce feasible placement for dual-Vdd designs and significantly reduce total power with a wirelength increase within 14% compared to a power and timing driven placer without voltage islands.

Proceedings ArticleDOI
24 Jan 2006
TL;DR: Three efficient DVS techniques for an MPEG decoder are presented and a workload prediction model is developed based on the block level statistics of each MPEG frame that exhibits a remarkable improvement in accuracy of the prediction.
Abstract: In this paper we present three efficient DVS techniques for an MPEG decoder. Their energy reduction is comparable to that of the optimal solution. A workload prediction model is also developed based on the block level statistics of each MPEG frame. Compared with previous works, the new model exhibits a remarkable improvement in accuracy of the prediction. The experimental results show that, with the new prediction model, the presented DVS techniques achieve more energy reduction than previous works while delivering the same Quality of Service (QoS).

Proceedings ArticleDOI
24 Jan 2006
TL;DR: A SystemC library for POSIX modeling and simulation is presented that provides an early and fast estimation of the performance of the system as a consequence of the architectural mapping decisions and supports high-level design-space exploration.
Abstract: Early estimation of the execution time of real-time embedded SW is an essential task in complex, HW/SW embedded system design. Application SW execution time estimation requires taking into account the impact of the underlying RTOS. As a consequence, RTOS modeling is becoming an active research area. SystemC provides a framework for multiprocessing, HW/SW co-simulation at several abstraction levels. In this paper, a SystemC library for POSIX modeling and simulation is presented. By using the library, the SystemC specification using POSIX functions is converted automatically into a timed simulation estimating the execution time of the application SW running on the POSIX platform. The library works directly on the source code. Therefore, it provides an early and fast estimation of the performance of the system as a consequence of the architectural mapping decisions. Although accuracy is lower than when using lower-level techniques, it supports high-level design-space exploration as simulation time is significantly less than RT (ISS) simulation.

Proceedings ArticleDOI
24 Jan 2006
TL;DR: The data suggest that localization, generalization, and MUS extraction from both the abstract and concrete models are essential for effective verification, and that refinement tends to converge faster when multiple MUSes are extracted in each iteration.
Abstract: In this paper, we explore the application of counter-example-guided abstraction refinement (CEGAR) in the context of microprocessor correspondence checking. The approach utilizes automatic datapath abstraction augmented with automatic refinement based on 1) localization, 2) generalization, and 3) minimal unsatisfiable subset (MUS) extraction. We introduce several refinement strategies and empirically evaluate their effectiveness on a set of microprocessor benchmarks. The data suggest that localization, generalization, and MUS extraction from both the abstract and concrete models are essential for effective verification. Additionally, refinement tends to converge faster when multiple MUSes are extracted in each iteration.

Proceedings ArticleDOI
24 Jan 2006
TL;DR: A technique that locates the task next to the borders of the free area for as many cycles as possible, trying to minimize the area fragmentation is proposed and combined with a look-ahead heuristic that allows delaying the scheduling of a task to the next event, increasing the solution search space.
Abstract: To get efficient HW management in 2D reconfigurable systems, heuristics are needed to select the best place to locate each arriving task. We propose a technique that locates the task next to the borders of the free area for as many cycles as possible, trying to minimize the area fragmentation. Moreover, we combine it with a look-ahead heuristic that allows delaying the scheduling of a task to the next event, increasing the solution search space.

Proceedings ArticleDOI
24 Jan 2006
TL;DR: This paper presents an analytical floorplanning algorithm that can be used to efficiently pack soft modules into a fixed die and the efficiency is much higher than that of the simulated annealing based algorithms for benchmarks containing a large number of modules.
Abstract: Fixed-die floorplanning is an important problem in the modern physical design process. An effective floorplanning algorithm is crucial to improving both the quality and the time-to-market of the design. In this paper, we present an analytical floorplanning algorithm that can be used to efficiently pack soft modules into a fixed die. The locations and sizing of the modules are simultaneously optimized so that a minimum total wire length is achieved. Experiments on the MCNC and GSRC benchmarks show that our algorithm can achieve above a 90% success rate with a 10% white space constraint in the fixed die, and the efficiency is much higher than that of the simulated annealing based algorithms for benchmarks containing a large number of modules.

Proceedings ArticleDOI
24 Jan 2006
TL;DR: This paper formulates a constraint-driven I/O planning and placement problem, solves it by a multi-step algorithm based upon integer linear programming, and shows that the proposed algorithm can effectively find a large scale I/O placement solution and satisfy all given design constraints in less than 10 minutes.
Abstract: System-on-chip and system-in-package result in increased number of I/O cells and complicated constraints for both chip designs and package designs. This renders the traditional manually tuned and chip-centered I/O designs suboptimal in terms of both turn around time and design quality. In this paper, we formally introduce a set of design constraints suitable for chip-package co-design. We formulate a constraint-driven I/O planning and placement problem, and solve it by a multi-step algorithm based upon integer linear programming. Experiment results using real industry designs show that the proposed algorithm can effectively find a large scale I/O placement solution and satisfy all given design constraints in less than 10 minutes. In contrast, the state-of-the-art without considering those design constraints simply cannot meet all design constraints by relying solely upon the conventional iterative approach.

Proceedings ArticleDOI
24 Jan 2006
TL;DR: A capacitor-less low dropout regulator (LDR) with direct current feedback is proposed, and a symmetrically-matched voltage mirror in sensing the load current is employed, and gives excellent line and load regulations.
Abstract: A capacitor-less low dropout regulator (LDR) with direct current feedback is proposed. A symmetrically-matched voltage mirror in sensing the load current is employed, and gives excellent line and load regulations. The dynamic biasing results in an LDR with pole-tracking that extends the bandwidth of the loop gain at high load currents. The LDR was fabricated in a 0.35 µm CMOS process with an active area of 0.11 mm², and measurement results corroborated well with both analysis and simulation.

Proceedings ArticleDOI
24 Jan 2006
TL;DR: How OA improves interoperability among applications in an EDA flow is described and how OA benefits developers of both EDA tools and flows is detailed.
Abstract: The OpenAccess database provides a comprehensive open standard data model and robust implementation for IC design flows. This paper describes how it improves interoperability among applications in an EDA flow. It details how OA benefits developers of both EDA tools and flows. Finally, it outlines how OA is being used in the industry, at semiconductor design companies, EDA tool vendors, and universities.

Proceedings ArticleDOI
24 Jan 2006
TL;DR: Experimental results showed that the proposed memory-grouping method reduced the area of the memory BIST wrapper by up to 40.55% and the ability to select from two types of connection methods produced a greater reduction in area than using a single connection method.
Abstract: With the increasing demand for SoCs to include rich functionality, SoCs are being designed with hundreds of small memories with different sizes and frequencies. If memory BIST logics were individually added to these various memories, the area overhead would be very high. To reduce the overhead, memory BIST logic must therefore be shared. This paper proposes a memory-grouping method for memory BIST logic sharing. A memory-grouping problem is formulated and an algorithm to solve the problem is proposed. Experimental results showed that the proposed method reduced the area of the memory BIST wrapper by up to 40.55%. The results also showed that the ability to select from two types of connection methods produced a greater reduction in area than using a single connection method.