Enhanced RISC Processor "SILVERBIRD"

Parametrizable Hybrid Stack-Register Processor as VHDL Soft-Intellectual Property Module
Student project at the Integrated Systems Laboratory of the Swiss Federal Institute of Technology, Zurich, Switzerland
Students: Peter Luethi, Daniel Forrer, Stefan Moscibroda
Assistants: Thomas Roewer, Manfred Stadler

Design, Verification & Integration: October 1999 - February 2000
Hardware Testing & Measurements: May - July 2000

Table of Contents

Overview
Architecture
Highlights
Parameters
Test integration
Technical data
Conclusions
Acknowledgements
Publications

Overview

The design of integrated circuits is currently undergoing extensive change. Until now, project-specific code has been written for every new design. This results in highly optimized code for the target application, but also leads to increased development time, especially for large designs. Since time to market is becoming more and more important, the traditional way of designing integrated circuits has to change in order to maintain design efficiency. As the complexity of circuits increases, there is an urgent need for new design methodologies that allow fast development of demanding applications, up to complete system-on-a-chip integrations.
One way to cope with this efficiency problem is the use of so-called Intellectual Property (IP) Modules or Virtual Components (VC). This method is based on the idea of assembling pre-defined functional blocks into a complete system. Quick and easy adaptation of the reusable blocks speeds up system design and leaves more time for thorough testing, an important issue in cost-intensive chip design.

The goal of this student project was to develop a parametrizable RISC processor in VHDL from scratch. The processor had to be parametrizable over a wide range and had to handle medium to high interrupt loads without difficulty. A convenient testbench environment was also required to allow quick implementation of the processor IP in a system-on-a-chip (SOC) application.

Architecture

First we had to decide which pipeline depth our processor should have. The answer to this question is always crucial for both overall processor performance and implementation complexity. To keep the complexity within reasonable bounds and allow for fast interrupt launch with minimum latency, we finally chose a classic four-stage pipeline consisting of the Instruction Fetch, Instruction Decode, Execute and WriteBack stages.

To meet the demanding requirements of handling high interrupt loads and being parametrizable, we decided to combine the advantages of a stack architecture with those of a register-based approach. The general purpose registers of our processor are therefore implemented as top-of-stack registers (red area). In case of an interrupt, precious processing time for the context switch is saved by simply pushing the current register contents onto the stack. The maximum interrupt latency achieved by our architecture is two clock cycles, but in most situations we are able to launch the interrupt service routine within one clock cycle. To obtain maximum processor performance, the pipeline is never flushed and no no-operation cycles are inserted. On branches, we use delayed-branch execution with one delay-slot instruction so that no cycles are wasted.
A striking argument against a pure stack processor was the need for compiler compatibility: a compiler for a stack architecture is difficult to implement, because it always needs to keep track of the exact position of each register. As a consequence, the entire stack has to be controlled by software ("push" and "pop" instructions).
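
As a rough illustration of the delayed-branch scheme mentioned above, the following Python sketch models a program counter with a single delay slot. The instruction tuples and the function name are hypothetical and are not taken from the SILVERBIRD instruction set; the sketch only shows why the instruction following a branch is still executed.

```python
def run_delayed_branch(program, max_steps=100):
    """Execute a toy program with a single branch delay slot.

    Instructions are hypothetical tuples (not the SILVERBIRD ISA):
      ("add", n)       adds n to the accumulator
      ("jmp", target)  jumps to target AFTER the next instruction has run
      ("halt",)        stops execution
    Returns the final accumulator and the list of executed addresses.
    """
    acc = 0
    pc = 0
    pending_target = None  # target of a branch issued one instruction ago
    trace = []
    for _ in range(max_steps):
        op = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        # A branch issued by the previous instruction takes effect only now,
        # after the delay-slot instruction (the current one) has executed.
        if pending_target is not None:
            next_pc = pending_target
            pending_target = None
        if op[0] == "add":
            acc += op[1]
        elif op[0] == "jmp":
            pending_target = op[1]
        elif op[0] == "halt":
            break
        pc = next_pc
    return acc, trace


program = [
    ("add", 1),    # address 0
    ("jmp", 4),    # address 1: branch to address 4
    ("add", 10),   # address 2: delay slot, still executed
    ("add", 100),  # address 3: skipped
    ("halt",),     # address 4
]
acc, trace = run_delayed_branch(program)
```

In this toy model the delay-slot instruction at address 2 does useful work instead of being replaced by a bubble, which is the point of the technique.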

Our solution provides a fixed number of general purpose registers for every interrupt level (red area). The whole stack control is done by the processor itself and requires no software-based "push" and "pop" operations. This organization is easy to support with a high-level compiler, since the compiler does not have to control the stack at all.
The processor IP Module can also be parametrized to include or omit an additional address ALU, which allows simplified and faster block access to data located in the memory (blue area I and II, respectively).
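
The top-of-stack register organization described above can be sketched as a small behavioral model. This is only a sketch under assumptions: the class and method names are hypothetical, the default sizes are borrowed from the test-chip configuration (12 data registers, stack depth 4), and clearing the registers on interrupt entry is an assumption of the model, not a documented property of the hardware.

```python
class TopOfStackRegisters:
    """Toy model of a register bank backed by a hardware-managed stack."""

    def __init__(self, num_registers=12, stack_depth=4):
        self.regs = [0] * num_registers  # randomly accessible working registers
        self.stack_depth = stack_depth   # assumed max. number of saved frames
        self._frames = []                # saved register frames, newest last

    def enter_interrupt(self):
        """Save the whole register frame at once (hardware 'push')."""
        if len(self._frames) >= self.stack_depth:
            raise OverflowError("interrupt nesting exceeds stack depth")
        self._frames.append(list(self.regs))
        # Assumption of this model: the handler starts with cleared registers.
        self.regs = [0] * len(self.regs)

    def return_from_interrupt(self):
        """Restore the interrupted context (hardware 'pop')."""
        if not self._frames:
            raise RuntimeError("no saved context to return to")
        self.regs = self._frames.pop()


bank = TopOfStackRegisters()
bank.regs[0] = 42             # main program state
bank.enter_interrupt()        # no software push needed
bank.regs[0] = 7              # handler freely uses the same register index
bank.return_from_interrupt()  # main program state is back
```

Software only ever addresses registers by index; saving and restoring happens entirely inside `enter_interrupt` and `return_from_interrupt`, which mirrors the claim that the compiler never has to manage the stack.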

One slight disadvantage of our architecture is the large chip area taken by the stacks. This can be mitigated by implementing an interface from the top-of-stack registers to an on-chip RAM and moving the main part of the stack contents into the RAM. This results in more control logic and possibly lower performance, unless the user builds complex control logic to cope with the slow RAM. This way of saving chip area is only worthwhile for large parameterizations.
On the other hand, decreasing costs for chip area and the significantly increasing complexity of systems offset this minor disadvantage. Because time to market plays an ever more important role, engineers need to focus on straightforward engineering and clean designs.

Highlights

Highly parametrizable hybrid stack-register processor IP module
Parametrizable RISC instruction set (40 instructions)
4 stage pipeline: Instruction Fetch, Instruction Decode, Execute, WriteBack
No pipeline flush at any time (neither jump instructions nor interrupt launch)
Read-after-Write sequences are allowed on any registers or memory locations
Parametrizable Perl assembler
Easy implementation of a high-level language compiler thanks to register-bank-like random-access registers
Verification flow that automatically picks up the current parameter settings. Software can already be verified during implementation.
Register bank implemented as Top-of-Stack:
- quick context saving during interrupt launch with "push" / "pop" (managed by the processor hardware)
- very fast interrupt launch: max. 2 Tclk (without memory R/W stall) = 16.46 ns @ 121.5 MHz
The processor is designed to manage medium to high interrupt loads easily

Parameters

Data width & data memory address range
Instruction memory address range
Number of data & address registers
Data/address stack & return address stack depth
Deactivation of instructions not needed
Parametrizable data memory interface based on a FIFO buffer that keeps performance-killing memory operations away from the processor. It allows processor burst writes to the data memory without processor stalls until the FIFO buffer is full.
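
The behavior of the FIFO-based data memory interface can be illustrated with a minimal model. The class, its method names and the depth value are assumptions made for illustration; the real interface is a VHDL block whose signals are not described here.

```python
from collections import deque


class WriteFifo:
    """Toy model of a write buffer between processor and data memory."""

    def __init__(self, depth=4):
        self.depth = depth    # hypothetical parameter: number of buffered writes
        self.queue = deque()  # pending (address, data) pairs, oldest first

    def processor_write(self, addr, data):
        """Return True if the write is accepted, False if the processor must stall."""
        if len(self.queue) >= self.depth:
            return False
        self.queue.append((addr, data))
        return True

    def memory_cycle(self, memory):
        """Let the slow memory side complete one buffered write, if any."""
        if self.queue:
            addr, data = self.queue.popleft()
            memory[addr] = data


memory = [0] * 1024
fifo = WriteFifo(depth=2)
burst = [fifo.processor_write(a, a * 3) for a in range(3)]  # third write finds FIFO full
fifo.memory_cycle(memory)                                   # memory drains one entry
retry = fifo.processor_write(2, 6)                          # now accepted
```

A burst shorter than the FIFO depth never stalls the processor in this model; only a full buffer forces it to wait for the memory side.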

Functional regions of processor visualized in Silicon Ensemble

	Instruction Fetch Stage & Return Address Stack
	Instruction Decode Stage
	Execute Stage & Condition Code Stack
	WriteBack Stage
	Data & Address Stack
	Data Memory Interface

Visualization of Functional Regions
Screenshot of our RISC Processor and its various Blocks in Silicon Ensemble.
Picture courtesy of Peter Luethi

Ready for Tape Out: Screenshot in Cadence DFII

Screenshot taken from Cadence DFII
The core power supply pads are arranged in the middle of each side;
the other power supply pads around the die are for peripheral power (output pads).
The split power supply provides a spike-free core power supply
and makes it possible to measure the core power consumption on its own.
Picture courtesy of Peter Luethi

Test Integration

We decided to implement the data memory as on-chip static RAM (SRAM) and to leave the instruction memory off-chip in order to be able to verify the maximum processor speed on the tester. The data width has been set to 16 bit, and the on-chip data memory has a size of 1024 x 16 bit.
Full physical testability has been achieved with 11 scan paths through the processor core (full scan) and the complete isolation and external accessibility of the embedded SRAM. For the SRAM, we have written an external alternating checkerboard pattern test program to check the correct physical integration.
The whole back-end design down to the final seal ring was also carried out by ourselves. As a consequence, we gained a thorough understanding of the back-end design flow and back-annotation with Silicon Ensemble 5.3, Pearl and Cadence DesignFramework II.
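
An alternating checkerboard test of the kind mentioned above can be sketched as follows. This is not the original test program: the function names are hypothetical, and the mapping of logical addresses to physical cell positions is assumed to be linear, which may not match the real SRAM layout.

```python
WORDS = 1024             # on-chip data memory: 1024 x 16 bit
WIDTH = 16
MASK = (1 << WIDTH) - 1  # 0xFFFF


def checkerboard_word(addr, inverted=False):
    """Alternate 0x5555 / 0xAAAA by address; 'inverted' swaps the two patterns."""
    pattern = 0x5555 if addr % 2 == 0 else 0xAAAA
    return pattern ^ MASK if inverted else pattern


def checkerboard_test(write, read, words=WORDS):
    """Write and read back both checkerboard phases; return failing addresses."""
    failures = []
    for inverted in (False, True):
        for addr in range(words):
            write(addr, checkerboard_word(addr, inverted))
        for addr in range(words):
            if read(addr) != checkerboard_word(addr, inverted):
                failures.append(addr)
    return failures


# Fault-free memory model
good = [0] * WORDS
good_failures = checkerboard_test(good.__setitem__, good.__getitem__)

# Memory model with bit 0 of word 5 stuck at 1 (hypothetical defect)
bad = [0] * WORDS


def faulty_read(addr):
    value = bad[addr]
    return value | 1 if addr == 5 else value


bad_failures = checkerboard_test(bad.__setitem__, faulty_read)
```

Running both the pattern and its inverse ensures that every bit is tested at 0 and at 1 with opposite values in its logical neighbors; the stuck-at-1 bit above is caught only in the phase that expects a 0 at that position.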

Chip Photograph of our "SILVERBIRD" RISC Processor

You can see the bonding wires attached to each pad. Bonding wires connect the
pads of the die to the package leads.
Photo courtesy of Peter Luethi


Package of our "SILVERBIRD" Processor: Ceramic Pin Grid Array (CPGA) with 120 pins

Technical Data

Last updated with the latest measurements: 14th July 2000

Process

0.6 um 3 LM CMOS Process

5 Volts, Austria Micro Systems AMS HK 3.20

Configuration
Instruction Memory:	off-chip
Data Memory:	on-chip
Instruction Memory Address Width:	11 bit
Data Width:	16 bit
Data Memory Address Width:	10 bit
Number of Data Registers:	12
Number of Address Registers:	4
Stack Depth:	4
Return Address Stack Depth:	20
On-chip DMem (SRAM):	1024 x 16 bit
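
To make the relation between these parameters explicit, the test-chip configuration can be written down as a small Python record. The class and field names are hypothetical and do not correspond to the VHDL generics of the IP module; only the numeric values come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorConfig:
    """Test-chip parameter set (field names are illustrative only)."""
    data_width: int = 16          # bit
    dmem_addr_width: int = 10     # bit
    imem_addr_width: int = 11     # bit
    num_data_regs: int = 12
    num_addr_regs: int = 4
    stack_depth: int = 4
    return_stack_depth: int = 20

    @property
    def dmem_words(self):
        # Addressable data memory words: 2 ** address width
        return 1 << self.dmem_addr_width

    @property
    def imem_words(self):
        # Addressable instruction memory locations
        return 1 << self.imem_addr_width

    @property
    def dmem_bits(self):
        return self.dmem_words * self.data_width


cfg = ProcessorConfig()
```

The 10-bit data address width yields exactly the 1024 words of the on-chip SRAM, and the 11-bit instruction address width allows for up to 2048 instruction memory locations.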

Performance (5 V, 25°C ambient temperature)
Max. Operating Frequency:	121.5 MHz
Max. Throughput:	121.5 MIPS
Interrupt Latency:	max. 2 Tclk = 16.46 ns @ 121.5 MHz
Core Power Consumption:	288 mA @ 121.5 MHz, 5 V
	19.7 mA @ 21.9 MHz, 1.9 V
Power / MIPS:	11.85 mW / MIPS @ 121.5 MHz, 5 V
	1.7 mW / MIPS @ 21.9 MHz, 1.9 V
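
The derived figures in this table can be reproduced from the measured values. The short Python check below assumes one instruction per clock cycle, as implied by the throughput of 121.5 MIPS at 121.5 MHz, and assumes the same holds at the low-voltage operating point; the helper names are hypothetical.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def power_per_mips(current_ma, voltage_v, freq_mhz):
    """Core power in mW divided by throughput, assuming 1 instruction per cycle."""
    power_mw = current_ma * voltage_v  # mA * V = mW
    return power_mw / freq_mhz


latency_ns = 2 * clock_period_ns(121.5)         # max. interrupt latency: 2 Tclk
fast_point = power_per_mips(288.0, 5.0, 121.5)  # 5 V operating point
slow_point = power_per_mips(19.7, 1.9, 21.9)    # 1.9 V operating point

print(f"{latency_ns:.2f} ns, {fast_point:.2f} mW/MIPS, {slow_point:.1f} mW/MIPS")
```

Note that the table lists the core consumption as a current; multiplying by the supply voltage gives 1440 mW at 5 V and about 37.4 mW at 1.9 V.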

Dimensions
Chip Size:	4.6 x 4.6 mm
Chip Area:	21.16 mm²
Core Size:	3.7 x 3.7 mm
Core Area:	13.69 mm²
Number of Pins:	120 including power supply
Package:	120 pin CPGA

Statistics
Number of Standard Cells:	8712
Number of Transistors (without SRAM):	104'657
Estimated Number of Transistors with SRAM:	~ 230'000
Standard Cell Usage:	792 Cells / mm²
Transistor Usage (without SRAM):	10'208 Transistors / mm²

Voltage - Operating Frequency - Shmoo Plot
The x-axis indicates the applied supply voltage, the y-axis the applied operating frequency (clock period [ns] ):
Minimum supply voltage: 1.87 Volt, corresponding operating frequency: 22 MHz
Maximum operating frequency at 5 Volt: 121.5 MHz
Picture courtesy of Peter Luethi

Conclusions

We have developed an embedded processor IP Module which is highly adaptable in both functionality and configuration. This was achieved by separating the processor core from the system interfaces. The hybrid stack-register processor is very well suited for applications with high interrupt loads. A convenient verification flow automatically picks up the current configuration of the processor.

With this large student project, our team was successfully introduced to the entire design flow of an ASIC (system engineering, front-end design, back-end design and back-annotation). We had to cope with the difficulties of system engineering, project scheduling and the complexity of the design tools. The challenge of designing a perfect solution and the opportunity to realize such a project during our studies encouraged us to keep up our hard work even during the Christmas holidays. As a consequence, we could finally present an outstanding result and obtained the highest grade for it.

"It's not a disgrace, if you can't achieve a perfect solution. But it is, if you don't even try."

Acknowledgements

Special thanks to:

	My team members: Daniel Forrer and Stefan Moscibroda. It was a great project, despite the huge amount of work. Thanks, guys!
	Our assistants: Thomas Roewer and Manfred Stadler of the Integrated Systems Laboratory, ETH Zurich
	Integrated Systems Laboratory of the Swiss Federal Institute of Technology, Zurich, Switzerland and KTI (Swiss Commission for Technology and Innovations) for funding this project.

Publications

Parametrizable Hybrid Stack-Register Processor as Soft Intellectual Property Module

Paper about our "SILVERBIRD" Processor Soft IP-Module
written for the 13th Annual IEEE International ASIC/SOC Conference 2000,
Washington D.C., Virginia USA, 13th - 16th September 2000

Last updated: 2006/01/15
