Efficiency

Moore’s law has improved the performance and cost of processors to incredible levels, and this process continues. Program execution times and hardware cost hardly bother us anymore.

For the new smart devices we are now more bothered by their batteries – their cost, size, and weight, and the time they last before they must be replaced or recharged. We now want processors that can do the required work while consuming less energy. Energy efficiency has also become the most important property of the big server processors; there the problem is to get them to do as much work as possible without generating more heat than can be cooled away.

What can make a processor more efficient than others?

Using a more advanced CMOS generation or a low-power CMOS technology variant is not a good answer, since any processor can be realized in any technology, and since we are now approaching the limit for energy density. A quote from the Wikipedia article on “CMOS” describes this:

  “Earlier, the power consumption of CMOS devices was not the major concern while designing chips. Factors like speed and area dominated the design parameters. As the CMOS technology moved below sub-micron levels the power consumption per unit area of the chip has risen tremendously”.

More interesting is therefore what architectural differences can do.

The word “architecture” is used for the definition of the processor as seen by a programmer, i.e. its repertoire of instruction types, with the functional specification of each one. This also defines the set of registers and other resources where data is temporarily stored within the machine, and the operations that can be performed. “Architecture” can also refer to a deeper description, including features not directly visible to the programmer, such as execution pipeline and cache memory.

Active energy efficiency

A processor core consists entirely of transistors, and each of these is, ideally, either fully “on” (conducting, with almost no voltage across it) or fully “off” (with no current through it); in both states it dissipates essentially no power. Energy is consumed during the short transitions, when transistors change between these states, which they can do once per clock cycle. The consumption is therefore proportional to the clock frequency and to the average number of transistors that change state during a clock cycle.
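This proportionality is the standard CMOS dynamic-power relation, P = α·C·V²·f. A minimal sketch (the numbers are illustrative placeholders, not Imsys measurements):

```python
def dynamic_power(alpha, capacitance_f, voltage_v, freq_hz):
    """Average CMOS switching power in watts: P = alpha * C * V^2 * f,
    where alpha is the average fraction of nodes switching per cycle."""
    return alpha * capacitance_f * voltage_v ** 2 * freq_hz

# Halving the switching activity (alpha) halves the dynamic power:
p_busy = dynamic_power(alpha=0.2, capacitance_f=1e-9, voltage_v=1.2, freq_hz=100e6)
p_calm = dynamic_power(alpha=0.1, capacitance_f=1e-9, voltage_v=1.2, freq_hz=100e6)
assert abs(p_busy - 2 * p_calm) < 1e-12
```

An architecture that needs fewer state changes per unit of useful work lowers α directly, which is the lever discussed in the rest of this section.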

The architecture of a RISC processor (such as ARM or MIPS) is based on a set of rules aimed at maximizing clock frequency through simple operations. Such processors have instructions that each occupy a wide word in program memory and that each produce only one simple (single-cycle) operation. Differences in instruction set architecture don’t matter much for these processors; the net amount of work they perform per instruction bit is about the same, due to the limitations imposed by the rules.

A non-RISC processor, on the other hand, can have an architecture that achieves dramatically better efficiency, because it can have instructions that require fewer bits to be read from memory and that each perform more work, perhaps a sequence involving several operations and memory accesses for data. Neither instruction width nor execution time needs to be the same for its instruction types.

The Imsys processor has more than five times higher code density (for compiled C language code) than ARM “Thumb”, according to independent benchmarks. Thus, to accomplish a given task, the processor typically reads less than 1/5 as many bits from memory. Not only are the instructions more efficiently coded (fewer wasted bits per instruction); the number of instructions is also lower, because the instruction set is richer, closer to what the compiler needs. The number of operations is reduced because the stack-oriented architecture eliminates the need for many register moves, and the operations typically involve fewer bits, due to a narrower datapath. The net result is reduced energy consumption in processor and memory for the execution of a given program.
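As a hypothetical illustration of how a stack-oriented encoding removes register-move instructions and operand fields (this toy interpreter is not the Imsys instruction set):

```python
# A load/store RISC sequence for  x = a + b  might be:
#   LDR r0, [a]; LDR r1, [b]; ADD r0, r0, r1; STR r0, [x]   (4 x 32 bits)
# A stack machine expresses the same as:
#   push a; push b; add; pop x
# where "add" names no registers at all: its operands are implicitly
# the two values on top of the stack, so the opcode can be very short.

def run(program, memory):
    """Minimal stack-machine interpreter (illustrative only)."""
    stack = []
    for op, *arg in program:
        if op == "push":                      # push a variable's value
            stack.append(memory[arg[0]])
        elif op == "add":                     # operands are implicit
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "pop":                     # store top of stack
            memory[arg[0]] = stack.pop()
    return memory

mem = {"a": 2, "b": 3, "x": 0}
run([("push", "a"), ("push", "b"), ("add",), ("pop", "x")], mem)
assert mem["x"] == 5
```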

Passive energy efficiency

Standby consumption is often important in battery-operated devices. This consumption is due to leakage current, which is independent of activity. Instead it is proportional to the silicon area of the processor core.

The Imsys core is considerably smaller than an ARM9 core, a RISC processor with performance, e.g. for Java, at about the same level. The ratio is roughly 1/5 (depending on the ARM9xx variant).

Thus, the Imsys processor has an architectural advantage also for passive energy efficiency.
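A back-of-the-envelope sketch of what the area ratio means for standby time, assuming leakage scales with core area (the battery and current figures are hypothetical placeholders, not measured Imsys or ARM data):

```python
def standby_days(battery_mah, leakage_ua):
    """Days a battery lasts against a constant leakage current:
    mAh * 1000 gives uAh; divide by uA for hours, by 24 for days."""
    return battery_mah * 1000.0 / leakage_ua / 24.0

# A core with ~1/5 the area, hence ~1/5 the leakage current,
# lasts ~5x longer on the same battery:
assert abs(standby_days(220, 10) - 5 * standby_days(220, 50)) < 1e-9
```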

Heavily used special functions

Special functions often account for a large share of a processor’s execution time. If they can be optimized by removing the overhead activity caused by inefficiency of the architecture, then the processor can work faster – and consume less energy for a given task.

Traditionally, optimization has aimed only at increasing speed, and such optimizations (cache memories, increased pipeline depth, speculative execution, …) have typically increased energy consumption. Adding hardware blocks for special functions can reduce active consumption but instead increases passive consumption.

The Imsys processor can be optimized for any kind of special function, through more flexible control of its ordinary datapath resources, without the use of inefficient redundant resources or activities. This is because its internal operation is controlled by a microprogram inside the processor core, most of which resides in very dense and energy-efficient read-only memory.

Such energy-efficient, and cost-efficient, optimization is used for Java bytecode interpretation, but also for floating-point and big-number arithmetic, graphics, data compression/decompression, audio and video processing, and high-speed I/O interfaces such as Ethernet Media Access Control.
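To make the interpretation case concrete: a software bytecode interpreter pays fetch/decode/dispatch overhead on every bytecode, as in the toy loop below; folding that loop into the core’s microprogram removes those overhead cycles. The opcode values are the real JVM encodings for iconst_2, iconst_3 and iadd, but the interpreter itself is only an illustration.

```python
ICONST_2, ICONST_3, IADD = 0x05, 0x06, 0x60  # JVM opcode encodings

def interpret(code):
    """Toy bytecode interpreter showing per-instruction overhead."""
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]              # fetch   -- overhead on every bytecode
        pc += 1
        if op == ICONST_2:         # decode + dispatch -- more overhead
            stack.append(2)
        elif op == ICONST_3:
            stack.append(3)
        elif op == IADD:
            stack.append(stack.pop() + stack.pop())
    return stack[-1]

assert interpret([ICONST_2, ICONST_3, IADD]) == 5   # 2 + 3
```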