Why is your AI chip design always one step behind?

Humanity’s exploration of artificial intelligence (AI) has never stopped.

Since the 1980s, the emergence of multi-layer neural networks and the back-propagation algorithm has ignited a new spark in the artificial intelligence industry. In 2016, AlphaGo defeated a nine-dan Korean professional Go player, marking the arrival of another wave of artificial intelligence. Since then, the field has blossomed across a wide range of applications.

The history of AI chip development

The rise of artificial intelligence has brought new opportunities to the semiconductor industry, and the semiconductor market has changed dramatically. However, bringing artificial intelligence to endpoints such as smartphones, connected vehicles, and IoT devices places higher demands on the computing power and energy consumption of the hardware. Mobile hardware, for example, must deliver high speed and low power consumption at the same time to complete these operations.

In response to these needs, the core computing chips for AI have gone through four major stages.

Before 2007, artificial intelligence research and applications went through several ups and downs and never developed into a mature industry. Limited by the algorithms and data available at the time, AI placed no particularly strong demands on chips, and general-purpose CPUs could provide sufficient computing power.

As high-definition video, gaming, and other industries developed, GPU products made rapid breakthroughs. At the same time, it became clear that the parallel computing characteristics of GPUs matched the large-scale parallel computation demanded by artificial intelligence algorithms: in deep learning workloads, GPUs can be 9 to 72 times more efficient than traditional CPUs. Researchers therefore began experimenting with GPUs for AI computation.

Since 2010, cloud computing has been widely adopted, allowing artificial intelligence researchers to run hybrid workloads across large numbers of CPUs and GPUs; indeed, the cloud remains the main computing platform for AI today. However, the AI industry's demand for computing power has kept rising rapidly, so from 2015 onward the industry began developing chips dedicated to artificial intelligence. Better hardware and chip architectures have further improved computing efficiency, energy efficiency, and other performance metrics.

The cornerstone of AI SoCs

Because traditional architectures have proven inefficient for AI SoCs, system specifications demand more and more architectural exploration to optimize designs and increase the throughput of neural network processing. The arrival of the FinFET era has prompted product architects and system-on-chip (SoC) engineers to look closely at the efficiency of the computation performed in each clock cycle.

More and more companies are offering sophisticated neural network architectures, but running these complex functions raises silicon temperatures and tightens power budgets. In addition, rapidly evolving architectures mean frequently changing RTL code, which squeezes delivery schedules even further. Faced with the dual challenges of power consumption and time-to-market, producing a full-chip layout that fits in the same die area and sustains the desired throughput in mission mode is no easy task.

Designers need to address the power, performance, and area (PPA) goals of high-performance artificial intelligence (AI) SoCs at the component level, using the building blocks that make up the computing circuits. These blocks of Boolean logic and memory storage elements are called foundational IP.

The most popular deep learning technique today is the deep neural network (DNN), which underlies many modern AI applications. Since DNNs demonstrated breakthrough results in speech recognition and image recognition tasks, the number of applications using them has exploded. DNNs are widely used in driverless cars, cancer detection, game AI, and more, and they now surpass human accuracy in many domains.
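At its core, a DNN is little more than a chain of matrix multiplications separated by cheap elementwise activations, which is why AI accelerators are built around the multiply-accumulate operation. The sketch below shows a minimal fully connected forward pass in NumPy; the layer sizes and random weights are arbitrary placeholders, not taken from any particular design.

```python
import numpy as np

def relu(x):
    # Cheap elementwise non-linearity applied after each matrix multiply
    return np.maximum(x, 0.0)

def dnn_forward(x, weights, biases):
    """Minimal fully connected DNN forward pass: each layer is one
    matrix multiply (the dominant workload) plus an activation."""
    a = x
    for w, b in zip(weights, biases):
        a = relu(a @ w + b)
    return a

# Arbitrary example: three layers, 784 -> 256 -> 128 -> 10
rng = np.random.default_rng(0)
dims = [784, 256, 128, 10]
weights = [rng.standard_normal((m, n)) * 0.01 for m, n in zip(dims, dims[1:])]
biases = [np.zeros(n) for n in dims[1:]]

batch = rng.standard_normal((32, dims[0]))  # batch of 32 input vectors
print(dnn_forward(batch, weights, biases).shape)  # (32, 10)
```

Almost all of the runtime in such a pass is spent in the matrix multiplies, and that is the workload an AI SoC's compute fabric must accelerate.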

But when implementing a DNN, choosing the wrong approach can seriously disrupt the project schedule. Designing with foundational IP that provides the flexibility to make corrections during the design cycle is therefore essential for a successful product launch.

Synopsys’ foundational IP portfolio includes HPC Design Kits: collections of logic library cells and memories co-optimized with EDA tools on advanced nodes, designed to push the PPA limits of any design and tuned for AI-enabled designs.

It is important to know that the biggest advantage of using foundational IP from an EDA vendor is interoperability: designers can use the scripts delivered with the IP to run a clean implementation flow on the most cutting-edge process nodes, without wasting time getting the tools and the IP to work together.

In addition to supplying a broad portfolio of silicon-proven products aimed at ideal PPA goals, Synopsys also offers customization services for individual design needs, making its offering more flexible than any other.

How to deal with AI SoC design challenges?

As AI SoCs continue to grow in complexity, optimizing, testing, and benchmarking SoC performance requires tools, services, and/or expertise for optimizing AI systems, beyond simply making the underlying building blocks easy to implement. How well the design is nurtured through customization and optimization during the design process ultimately determines the SoC's success in the market.

Designers cannot obtain ideal, high-performance, market-leading AI solutions by relying on traditional design flows alone; they must consider a wider range of semiconductor solutions.

In terms of specialized processing power, SoCs incorporating neural network capabilities must accommodate both heterogeneous processing and massively parallel matrix multiplication. The heterogeneous components require scalar, vector DSP, and neural network processing capabilities.

In terms of memory performance, AI models require a great deal of storage, which increases silicon cost. Training a neural network may require several gigabytes up to 10 GB of data, which calls for the latest DDR technology to meet capacity requirements.
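A quick back-of-the-envelope calculation shows why model storage drives external memory capacity. The sketch below estimates the footprint of a network's weights alone; the 1.5-billion-parameter count and the data widths are hypothetical examples, not figures from the article.

```python
def weight_footprint_gib(num_params: int, bytes_per_weight: int) -> float:
    """Storage needed just for the weights, in GiB."""
    return num_params * bytes_per_weight / 2**30

params = 1_500_000_000  # hypothetical 1.5-billion-parameter network
for fmt, width in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
    print(f"{fmt}: {weight_footprint_gib(params, width):.2f} GiB")
# FP32: 5.59 GiB, FP16: 2.79 GiB, INT8: 1.40 GiB
```

Training adds activations, gradients, and optimizer state on top of the weights, which is why DDR or HBM capacity and bandwidth become first-order design constraints.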

In terms of real-time data connectivity, once the AI model has been trained and possibly compressed, real-time data can be brought in and processed through many different interface IP solutions.

At the same time, although replicating the human brain is a long way off, the human brain has been used as an effective model for building artificial intelligence systems and continues to be modeled by leading research institutions around the world.

The SoC development process is constantly changing, but it essentially includes the following standard elements: system specification and architectural design; logic and functional circuit design; physical design, verification, and analysis; manufacturing, packaging, and testing; and post-silicon validation. Adding AI capabilities can increase the complexity of every one of these stages. The integrated IP defines theoretical upper bounds on capability, but optimizing the design brings the implemented result closer to that theoretical maximum.

The memory access and processing capabilities of traditional SoC architectures are not sufficient. Adding an efficient matrix multiply accelerator or a high-bandwidth memory interface alone does help, but it is not enough to become a market leader in AI, which reinforces the need for AI-specific optimization throughout system design.
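A simple roofline-style estimate makes the point concrete: whether a matrix multiplication is limited by the multiply accelerator or by the memory interface depends on its shape, so neither block can be sized in isolation. The numbers below (20 TOPS of compute, 50 GB/s of memory bandwidth, INT8 data) are illustrative assumptions, not figures from any real SoC.

```python
def matmul_bound(m, n, k, peak_tops=20.0, mem_bw_gbs=50.0, bytes_per_elem=1):
    """Estimate whether an (m x k) @ (k x n) matmul is compute- or
    memory-bound on a hypothetical accelerator."""
    ops = 2 * m * n * k                                    # multiply-accumulates
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem
    compute_time = ops / (peak_tops * 1e12)
    memory_time = bytes_moved / (mem_bw_gbs * 1e9)
    return "memory-bound" if memory_time > compute_time else "compute-bound"

# Large square matmul: plenty of data reuse, the accelerator is the limit
print(matmul_bound(4096, 4096, 4096))  # compute-bound
# Vector-matrix case: little reuse, the memory interface is the limit
print(matmul_bound(1, 4096, 4096))     # memory-bound
```

Balancing compute, on-chip memory, and external bandwidth against the target workloads is exactly the kind of AI-specific architectural exploration described below.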

Because traditional architectures have proven inefficient for AI SoCs, system specifications require more and more architectural exploration to optimize the design, and this in turn increases the demand for architecture services.

In addition, each generation of AI SoCs is being reworked, with experienced design teams handling optimization and customization. Deep learning algorithms involve many stored weights, ideally kept in on-chip SRAM to save power and processing effort, and there is a clear trend toward customizing SRAM compilers to optimize power and density.
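As a simple illustration of this trade-off, the sketch below checks which layers of a hypothetical network's weights fit into a fixed on-chip SRAM budget; the layer sizes, the 8 MiB budget, and the INT8 weight format are illustrative assumptions, not Synopsys figures.

```python
# Hypothetical per-layer weight counts for a small network
layers = {
    "conv1": 64 * 3 * 3 * 3,
    "conv2": 128 * 64 * 3 * 3,
    "conv3": 256 * 128 * 3 * 3,
    "fc":    256 * 7 * 7 * 1000,
}

SRAM_BUDGET_BYTES = 8 * 2**20   # assumed 8 MiB of on-chip SRAM
BYTES_PER_WEIGHT = 1            # assumed INT8-quantized weights

used = 0
for name, count in layers.items():
    size = count * BYTES_PER_WEIGHT
    if used + size <= SRAM_BUDGET_BYTES:
        used += size
        placement = "on-chip SRAM"
    else:
        placement = "external DRAM"  # spilling here costs power and latency
    print(f"{name}: {size / 2**20:.3f} MiB -> {placement}")
print(f"SRAM used: {used / 2**20:.3f} MiB of 8.000 MiB")
```

Layers that spill to external DRAM pay a power and latency penalty on every access, which is why squeezing more density out of a customized SRAM compiler translates directly into AI performance.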

Custom processors are among the most popular IP developments for new AI SoC solutions. Tools for designing custom processors are invaluable, both to ensure that gate-level optimizations are fully exploited and reused, and to keep pace with the ecosystem needed to support custom processors.

Developing an AI SoC requires some of the most innovative IP on the market, including rapid adoption of new standards such as HBM2e, PCIe 5.0, CCIX, and the latest MIPI specifications. To adopt these standards, designers need advanced simulation and prototyping solutions that support early software development and performance validation. Because AI designs are still immature and complex, these tools are used particularly often to implement them.

Pre-built AI SoC verification environments are of most use to teams that already have AI SoC development experience. As a result, design-services firms and companies designing their second- and subsequent-generation chipsets have an inherent time-to-market advantage over first movers. Designers can rely on design services as an effective way to tap AI SoC expertise, reducing time-to-market and freeing in-house design teams to focus on differentiating features.

Hardening services for interface IP are another optimization tool, enabling lower-power and smaller-area designs. Hardened IP frees up area on the SoC, leaving more room for the valuable on-chip SRAM and processor components that deliver higher AI performance.

As AI functions enter new markets, the IP selected for integration provides the key components of AI SoCs. Synopsys offers many specialized solutions, including memory interface IP, multi-port on-chip SRAM compilers, and a complete portfolio of interface options for real-time data; all three are key building blocks for next-generation AI designs.

Summary

As architectures rapidly evolve and specialize toward more specific application scenarios, this competitive environment creates opportunities for differentiation and system optimization. System and IP configuration alternatives must be selected through architectural modeling to optimize AI system-on-chip (SoC) designs and rapidly arrive at competitive solutions. Synopsys’ IP portfolio saves chip designers time, allowing them to focus on designing differentiated functions.

  
