In new system-on-chip (SoC) designs, especially for portable devices, optimizing overall system power consumption is becoming as important as optimizing performance and area. Some EDA tools offer features such as clock gating, voltage scaling, frequency throttling, and leakage-current reduction, and some chip manufacturers offer low-power libraries and processes. All of these techniques are time-consuming and, at best, yield about a twofold improvement, because they are applied at the back end of the design cycle.
The best time for power optimization is at the very beginning of the design cycle, at the system level where the architecture is determined. System-level architectural decisions, such as the number and size of local memories and caches, have a significant impact on power consumption. Optimizing at this stage can reduce power consumption by more than a factor of ten.
Tensilica’s Xenergy is the industry’s first software tool for estimating the power and energy consumption of an entire processor subsystem (processor, caches, and local memories) from the actual execution of application code on that subsystem. Estimating power this way at the beginning of the design cycle takes only a few minutes, whereas RTL-based power analysis takes hours or even days. SoC architects can use the results to optimize both their software and the configuration of Tensilica’s Xtensa processors for power. For users of Tensilica’s Diamond Standard processors, the tool helps with software optimization, but the Diamond Standard processor itself cannot be changed.
Processor and memory power optimization
Xenergy takes two inputs: the software program compiled into a binary for the target processor, and information describing the manufacturing process and operating conditions. It executes the binary on Tensilica’s instruction set simulator (ISS) and produces power and energy reports for the processor core and memories, covering dynamic power, leakage power, and total power, broken down between the core and its tightly coupled local memories. Designers can then modify the program software or the Xtensa hardware configuration to optimize the processor’s power profile and re-run Xenergy. The whole process is shown in Figure 1.
Designers can use Xenergy for two basic tasks. The first is to reduce the number of memory accesses by modifying the application software, thereby reducing processor and memory power. The second is to tune the Xtensa configurable processor and its associated memories by selecting different configuration options, adding instruction extensions, register files, and new execution units, and changing the number and size of local memories and caches.
What matters is the energy consumed by the entire system to complete a given task, and looking at power alone can be misleading. The energy needed to execute a workload is the product of the power factor (mW/MHz, equivalent to nJ per clock cycle) and the number of clock cycles the workload requires. Adding a new instruction to an Xtensa configurable processor may raise the power factor while reducing the total cycle count. For example, if a 20% increase in the power factor makes the program run 3 times faster, the energy consumed falls to 1.2 × 1/3 = 0.4 of the original, a 60% reduction.
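The arithmetic in the paragraph above can be checked with a few lines of code. This is a minimal sketch: the function name `task_energy_nj` and the baseline figures (0.10 mW/MHz, 3,000,000 cycles) are hypothetical placeholders, and only the 20% and 3x ratios come from the text. It relies on the identity 1 mW/MHz = 1 nJ per cycle.

```python
def task_energy_nj(power_factor_mw_per_mhz: float, cycles: int) -> float:
    """Energy for one task, in nanojoules.

    1 mW/MHz equals 1 nJ per clock cycle, so energy is simply
    the power factor multiplied by the cycle count.
    """
    return power_factor_mw_per_mhz * cycles


# Hypothetical baseline: 0.10 mW/MHz and 3,000,000 cycles for the workload.
baseline = task_energy_nj(0.10, 3_000_000)
# With a new instruction: power factor up 20%, cycle count cut to one third.
extended = task_energy_nj(0.10 * 1.2, 3_000_000 // 3)

print(f"baseline: {baseline:.0f} nJ, extended: {extended:.0f} nJ")
print(f"energy reduction: {1 - extended / baseline:.0%}")
```

The absolute numbers are arbitrary. The 60% result depends only on the ratios, which is why a higher power factor can still mean less energy.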
Figure 1: Xenergy energy estimation software enables power consumption estimation for applications running on Tensilica Xtensa Configurable processors or Diamond Standard processors.
How it works
For a range of processor configurations and manufacturing processes, Tensilica builds statistical power models for memory accesses (reads and writes) and for each instruction, using detailed synthesis, RTL design, and gate-level simulation. Xenergy applies these models, which also cover designer-defined extension instructions written in the Tensilica Instruction Extension (TIE) language.
Xenergy uses Tensilica’s ISS to run the application with cycle accuracy. After simulation, it has complete statistics on every instruction executed and every memory access. From these statistics it estimates the dynamic, leakage, and total power of the processor and the memories used.
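The following is a minimal sketch of the general idea of event-based power estimation. It multiplies simulator event counts by per-event energy figures and adds leakage over the run time. Tensilica’s actual Xenergy models are not published here, so this is not their implementation. The event categories, the `ENERGY_PER_EVENT_NJ` values, the clock frequency, the leakage figure, and the counts are all hypothetical placeholders.

```python
from collections import Counter

# Hypothetical per-event dynamic energy (nJ). Real models are characterized
# per configuration and process; these values are placeholders only.
ENERGY_PER_EVENT_NJ = {
    "alu": 0.05,
    "mul": 0.12,
    "load": 0.20,
    "store": 0.22,
}


def estimate_energy(events: Counter, cycles: int, clock_mhz: float,
                    leakage_mw: float) -> dict:
    """Combine simulator event counts with per-event energy models."""
    dynamic_nj = sum(ENERGY_PER_EVENT_NJ[kind] * n for kind, n in events.items())
    runtime_us = cycles / clock_mhz          # cycles / MHz = microseconds
    leakage_nj = leakage_mw * runtime_us     # mW * us = nJ
    total_nj = dynamic_nj + leakage_nj
    return {
        "dynamic_nj": dynamic_nj,
        "leakage_nj": leakage_nj,
        "total_nj": total_nj,
        "avg_power_mw": total_nj / runtime_us,  # nJ / us = mW
    }


# Hypothetical counts of the kind a cycle-accurate simulation might report.
counts = Counter(alu=600_000, mul=150_000, load=250_000, store=100_000)
report = estimate_energy(counts, cycles=1_200_000, clock_mhz=200.0, leakage_mw=1.5)
print(report)
```

Dynamic energy depends on what the program does, while leakage depends only on how long it runs. This split is why both the instruction mix and the cycle count appear in the reports.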
Memory and application code effects
Some TIE instructions improve application performance but greatly increase the number of memory accesses, which in turn raises system power consumption. Changes to the cache configuration (capacity and associativity) also affect power. Xenergy helps designers see how the memory choices made during processor configuration affect power across the whole subsystem.
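The trade-off described above, where an instruction is faster but causes more memory traffic, can be illustrated with a two-term energy model. This is a hypothetical sketch: the constants `CORE_NJ_PER_CYCLE` and `MEM_NJ_PER_ACCESS` and both scenarios are invented for illustration and are not Tensilica data.

```python
# Hypothetical energy costs; placeholders, not Tensilica data.
CORE_NJ_PER_CYCLE = 0.08
MEM_NJ_PER_ACCESS = 0.25


def subsystem_energy_nj(cycles: int, mem_accesses: int) -> float:
    """Core energy scales with cycles; memory energy scales with accesses."""
    return CORE_NJ_PER_CYCLE * cycles + MEM_NJ_PER_ACCESS * mem_accesses


# Baseline vs. a hypothetical TIE instruction that halves the cycle count
# but quadruples memory traffic.
baseline = subsystem_energy_nj(cycles=1_000_000, mem_accesses=200_000)
with_tie = subsystem_energy_nj(cycles=500_000, mem_accesses=800_000)

print(f"baseline {baseline:.0f} nJ, with TIE {with_tie:.0f} nJ")
```

Under these assumed costs, the extended processor finishes in half the cycles yet uses more energy. A cycle count from the ISS alone would not reveal this.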
Similarly, Xenergy helps developers modify application code to reduce processor and memory power. For example, restructuring an application’s data structures can reduce the number of memory accesses. Tensilica’s standard software tools help developers find improvements to their applications, and Xenergy additionally shows which code changes reduce system power consumption.
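As a hypothetical illustration of how a data-structure change reduces memory accesses, the sketch below compares two layouts. It counts loads for pixels stored as three separate R, G, and B arrays and for pixels packed into one 32-bit word each. `CountingMemory` is an invented stand-in for a memory that counts reads, not a Tensilica API, and the packed `0x00RRGGBB` layout is an assumption.

```python
class CountingMemory:
    """Hypothetical word-addressed memory that counts every load."""

    def __init__(self, words):
        self.words = list(words)
        self.loads = 0

    def load(self, addr):
        self.loads += 1
        return self.words[addr]


def luma_sum_planar(r, g, b):
    """Separate R, G and B arrays: three loads per pixel."""
    total = 0
    for i in range(len(r.words)):
        total += 66 * r.load(i) + 129 * g.load(i) + 25 * b.load(i)
    return total


def luma_sum_packed(rgb):
    """One 32-bit word per pixel (0x00RRGGBB): one load per pixel."""
    total = 0
    for i in range(len(rgb.words)):
        word = rgb.load(i)
        red, green, blue = (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF
        total += 66 * red + 129 * green + 25 * blue
    return total


pixels = [(10, 20, 30), (200, 100, 50), (255, 255, 255)]
r = CountingMemory(p[0] for p in pixels)
g = CountingMemory(p[1] for p in pixels)
b = CountingMemory(p[2] for p in pixels)
packed = CountingMemory((pr << 16) | (pg << 8) | pb for pr, pg, pb in pixels)

assert luma_sum_planar(r, g, b) == luma_sum_packed(packed)
print("planar loads:", r.loads + g.loads + b.loads)  # 9
print("packed loads:", packed.loads)                  # 3
```

Both versions compute the same result, but the packed layout needs one third of the loads. The unpacking work moves into registers, where it is typically cheaper than a memory access.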
An example
We use the RGB-to-YUV color conversion benchmark from EEMBC (the Embedded Microprocessor Benchmark Consortium, see www.eembc.org) to illustrate how Xenergy is used in practice.
We also use Tensilica’s XPRES (Xtensa Processor Extension Synthesis) compiler, which takes application software written in C or C++ as input and produces processor extensions expressed in the TIE language. We used XPRES to generate three sets of instruction extensions for the Xtensa processor:
1. First, we asked XPRES to generate TIE instructions that perform operation fusion, that is, that fuse multiple operators into a single complex operation.
2. Next, we asked XPRES to also generate SIMD (single instruction, multiple data) functional units and the corresponding instructions to perform vector operations, that is, to apply the same operator to multiple data elements at once.
3. Finally, we asked XPRES to extend the Xtensa processor into a VLIW (very long instruction word) architecture using Tensilica’s FLIX (Flexible Length Instruction Xtensions) technology. XPRES uses these VLIW instructions to build multi-issue datapaths that execute several operations in parallel.
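To show what these extensions have to work with, here is a sketch of an RGB-to-YUV kernel. It is not the EEMBC benchmark source: it uses the common 8-bit integer approximation of ITU-R BT.601, and the benchmark may use different coefficients or rounding. The names `rgb_to_yuv` and `convert_image` are hypothetical.

```python
def rgb_to_yuv(r: int, g: int, b: int) -> tuple[int, int, int]:
    """8-bit BT.601 integer approximation (studio range)."""
    # Each output is a chain of multiplies, adds, and a shift:
    # the kind of operator sequence that operation fusion can collapse.
    y = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16
    u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128
    v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128
    return y, u, v


def convert_image(pixels):
    """Every pixel is independent, so SIMD lanes could process several at once."""
    return [rgb_to_yuv(r, g, b) for r, g, b in pixels]


print(convert_image([(0, 0, 0), (255, 255, 255)]))
# [(16, 128, 128), (235, 128, 128)]
```

The three output channels of one pixel are also independent of one another. This independence is the kind of parallelism a multi-issue VLIW datapath could exploit.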
Figure 2: Performance, power, and area of the Xtensa processor configurations produced by the different XPRES extensions.
Figure 2 shows the results for the three Xtensa configurations. Cycle counts (and hence performance) were determined by running the color conversion application on the ISS. Gate counts were estimated by Tensilica’s TIE compiler, and all other data were generated by Xenergy.
Figure 2 shows that the fused-operation and SIMD extensions generated by XPRES improve performance by about 3.8 times, at about 5 times the gate count, and that processor and memory power track the performance gain fairly well. Generating the VLIW (FLIX) architecture with XPRES yields a further performance improvement of about 20%, but the gate count doubles and the processor power results are poor.
Therefore, the performance gained from SIMD operations lowers energy consumption, whereas the additional chip area (gate count) of the VLIW configuration increases system power and energy. In this example, the SIMD configuration is the best optimization.
This example illustrates that Xenergy is an indispensable tool for SoC designers making trade-offs among performance, area, and power consumption.
Summary
Tensilica’s Xenergy software gives SoC designers an early estimate of the total energy consumed by the processor and memory subsystem when running a given application. Designers can immediately see the effect on overall system power after changing the Xtensa configuration or TIE instructions. This is especially important for designers who use Xtensa processors instead of RTL to implement SoC datapaths. Because system power can be assessed early, even when custom TIE instructions are used, designers can correctly weigh power, area, and performance.