How to build a very simple RISC-V Processor

Disclaimers

I’m not an expert, and this was done a year ago, so I might get a lot of things wrong.

Introduction

The Task

Last year (December 2021), the final course project for “HARDWARE SYNTHESIS LAB” was to implement a calculator on an FPGA with the following spec:

  • 5 decimal places of precision
  • Supports addition, subtraction, multiplication, division, and square root
  • Uses a UART serial terminal as the UI
  • Done individually

Solutions: Motivation for a High-Level Approach

One of the most common solutions last year was to handle UI interaction with a finite state machine (FSM), a separate state machine for UART communication, and a dedicated circuit that performs calculations using a fixed-point decimal representation. This method is efficient and straightforward to implement.
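The arithmetic behind a fixed-point decimal representation fits in a few lines of C. The sketch below makes several assumptions. Each number is stored as a 64-bit integer scaled by 10^5, matching the 5-decimal-place spec. Results are truncated rather than rounded. The names `fix_t`, `FIX_SCALE`, and the `fix_*` functions are hypothetical and are not taken from any submission. A hardware solution would implement equivalent operations as dedicated circuits.

```c
#include <stdint.h>

/* Hypothetical fixed-point type: real value = raw / FIX_SCALE (5 decimal places). */
typedef int64_t fix_t;
#define FIX_SCALE 100000LL

static fix_t fix_from_int(int64_t n) { return n * FIX_SCALE; }

/* Addition and subtraction work directly on the scaled integers. */
static fix_t fix_add(fix_t a, fix_t b) { return a + b; }
static fix_t fix_sub(fix_t a, fix_t b) { return a - b; }

/* (a/S) * (b/S) = ((a*b)/S) / S: divide the raw product by S once.
   Truncates toward zero; assumes a*b fits in 64 bits. */
static fix_t fix_mul(fix_t a, fix_t b) { return (a * b) / FIX_SCALE; }

/* (a/S) / (b/S) = ((a*S)/b) / S: pre-scale the dividend. Caller ensures b != 0. */
static fix_t fix_div(fix_t a, fix_t b) { return (a * FIX_SCALE) / b; }

/* sqrt(a/S) = sqrt(a*S) / S: integer Newton iteration on n = a*S.
   Returns floor(sqrt(n)); assumes a >= 0. */
static fix_t fix_sqrt(fix_t a) {
    uint64_t n = (uint64_t)a * (uint64_t)FIX_SCALE;
    if (n == 0) return 0;
    uint64_t x = n, y = (x + 1) / 2;
    while (y < x) {            /* estimates decrease until they reach floor(sqrt(n)) */
        x = y;
        y = (x + n / x) / 2;
    }
    return (fix_t)x;
}
```

For example, `fix_sqrt(fix_from_int(2))` yields 141421, which represents 1.41421.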

However, the limitation of this approach is that translating a calculator design into an FSM is laborious, because an FSM is very low-level. This makes complex functionality difficult to build. Given that this year’s final project requires a graphical display, I’m compelled to describe another approach: implementing the calculator in a high-level language, C. This gives the following benefits:

  • Easier to design.
  • Easier to test and debug.
  • Fewer worries about hardware details.


This article has two parts. The first part gives a brief overview of this approach. The second part describes the implementation I did last year, which involved writing my own processor.

From C to Bitstream: An Overview

Our goal is to write a C program that runs on the FPGA. The general flow is as follows. We first write a C program, then use a C compiler and toolchain to compile the code for a specific ISA. We store the resulting machine code in a ROM and run it on a soft processor that implements that ISA. Finally, we synthesize all of that into a bitstream and load it onto the FPGA. The following diagram shows the general flow:


Design Choices

To accomplish this goal, we need to make a few design decisions. There may appear to be many options, but choosing one usually constrains the others, which makes the decisions easier.

ISA

The Instruction Set Architecture (ISA) specifies the rules for interpreting machine code. I selected the ISA based on the following properties:

Soft-Core Availability

If we do not want to implement our own processor, we must choose an ISA that has a free soft-core available:

  • RISC-V, for example, has many open-source soft-cores available.
  • ARM and Xilinx’s MicroBlaze also have royalty-free cores available.

Toolchain

  • Xilinx’s MicroBlaze is integrated with Vivado’s Block Design, with Vitis as the IDE for writing C.
  • Other ISAs, such as RISC-V, also have a GCC toolchain for compiling C.

Simplicity

  • Some kinds of ISA, such as CISC, are so complicated that an implementation may not fit into a small FPGA.
  • Additionally, if we want to write our own processor, we must use an ISA that is easy to understand and implement.

Processor

When selecting a processor, we have the following choices:

  • Build your own or use an existing one
    • Building your own processor is fun and rewarding, but it also risks delaying the project. (Mine took 7.5 days to build and was submitted 9 hours late.)
    • Using an existing processor is cost-efficient, and we can be fairly sure that it is well tested, but we have to take the time to learn to use it properly.
  • Processor features
    • Some downloadable processors are customizable, and if we build our own, it is fully customizable.
    • However, we have to weigh each feature against its implementation time and whether it is actually needed.
    • Some customizable aspects include:
      • Memory size
        • Depends on the size of your program.
      • Pipelining
        • Without pipelining, the combinational paths between registers are longer, so we must lower the FPGA clock speed to avoid timing constraint violations.
      • Other tricks, such as caching, branch prediction, or additional buses
        • For a calculator, performance might not matter that much.

Peripheral

We must also interact with the outside world, for example via PS/2, UART, or VGA. Some considerations:

  • Build your own or use an existing one
    • Xilinx provides IP (Intellectual Property) cores; some are free, and some must be purchased.
    • Alternatively, we can implement our own IP.
  • Integration with the processor
    • Some processors simply drive peripherals through plain address, data, and write-enable lines.
    • Some processors use a standardized protocol, such as AXI, to communicate with peripherals.
    • Some offer a choice between interrupts and polling.
  • Usage
    • Some come with C libraries for interfacing; others require us to write our own driver code.
    • If we design our own peripheral, we must specify the protocol for controlling it.
      • We might already be familiar with how a microcontroller HAL works: it uses control/status flag registers and data registers, and, if we feel fancy, interrupts or perhaps even DMA.
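The HAL style mentioned above can be illustrated with a minimal C sketch of a polled driver. It assumes a peripheral that exposes one status/flag register and one data register. The register layout, the bit positions, and the names `periph_regs_t`, `STATUS_TX_READY`, and `STATUS_RX_VALID` are all hypothetical. On real hardware, `regs` would point at the peripheral's memory-mapped base address. Here, any struct in RAM works, which also makes the driver easy to test on a PC.

```c
#include <stdint.h>

/* Hypothetical register block. volatile makes the compiler perform every
   read and write instead of caching register values. */
typedef struct {
    volatile uint32_t status; /* flag register, updated by the hardware */
    volatile uint32_t data;   /* data register */
} periph_regs_t;

#define STATUS_TX_READY (1u << 0) /* hypothetical: transmitter can accept a byte */
#define STATUS_RX_VALID (1u << 1) /* hypothetical: a received byte is waiting */

/* Busy-wait (poll) until the transmitter is ready, then write one byte. */
static void periph_putc(periph_regs_t *regs, uint8_t c) {
    while (!(regs->status & STATUS_TX_READY)) {
        /* spin: polling instead of waiting for an interrupt */
    }
    regs->data = c;
}

/* Non-blocking read: returns 1 and stores the byte if one is available.
   (Real hardware would typically clear RX_VALID when data is read.) */
static int periph_try_getc(periph_regs_t *regs, uint8_t *out) {
    if (!(regs->status & STATUS_RX_VALID)) return 0;
    *out = (uint8_t)regs->data;
    return 1;
}
```

On the target, `regs` would be something like `(periph_regs_t *)0x80000000u` under a hypothetical memory map. Polling keeps the hardware simple at the cost of a busy CPU. An interrupt-driven design would instead let the peripheral notify the processor when a flag changes.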

General Implementation Guide

Once we have decided on the design, we can start implementing it. The details are usually design-specific, but I found the following techniques useful:

  • Short testing loop
    • The further along we are in the implementation process, the more expensive testing becomes, so we should test as early as possible.
    • Some techniques include:
      • Using testbenches
        • My code contains more testbench lines than actual module code, and the testbenches have almost full coverage of the code.
      • Using stubs and emulation
        • Avoid testing on the FPGA as much as possible; use an emulator and code stubs to allow local testing.
      • Modularization
        • Keep modules small so that they can be tested easily.
  • Iterative prototyping
    • Complex circuits can be hard to debug. Using a dummy program or prototype to verify that the system, subsystems, and modules work properly while they are being built helps localize bugs and validate assumptions.
    • Do not build the entire system and only test it once it is done; that leads to debugging hell.
  • Leave plenty of time for debugging
    • The majority of the time I spent on my project went into debugging rather than design and implementation.
  • Read the documentation
    • Many hours of debugging can be saved by carefully reading documentation.
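Here is a minimal C sketch of the stub idea. The calculator logic writes its output only through a small I/O interface, so the same code can run against the real UART on the FPGA or against an in-memory stub on a PC. The names `io_t`, `print_str`, `capture_t`, and `capture_put` are hypothetical.

```c
#include <stddef.h>

/* Output interface: application code only ever calls put(). */
typedef struct {
    void (*put)(void *ctx, char c);
    void *ctx;
} io_t;

/* Application code, unaware of whether it runs on the FPGA or a PC. */
static void print_str(const io_t *io, const char *s) {
    while (*s) io->put(io->ctx, *s++);
}

/* Host-side stub: captures output into a buffer so tests can inspect it. */
typedef struct {
    char buf[128];
    size_t len;
} capture_t;

static void capture_put(void *ctx, char c) {
    capture_t *cap = ctx;
    if (cap->len + 1 < sizeof cap->buf) {   /* keep room for the terminator */
        cap->buf[cap->len++] = c;
        cap->buf[cap->len] = '\0';
    }
}
```

For the FPGA build, `capture_put` would be replaced by a function that calls the real UART driver. The calculator logic itself stays unchanged, so it can be exercised with ordinary unit tests on a PC first.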

Building a simple RISC-V processor




The source code of my final project is available here: https://github.com/saengowp/yah-riscv/. Excluding the testbench code, the Verilog code totals fewer than 1K lines.

Design

I decided on the following:

  • ISA: RISC-V
  • Processor
    • I decided to build my own RISC-V processor because it is “FUN”.
    • The processor design is by the book, with no additional features except pipelining, which helps avoid timing violations.
  • Peripheral
    • I simply memory-mapped the UART (discussed later).

Architecture

The processor is a 5-stage pipelined processor consisting of Fetch, Decode, Register Read, ALU, and Register Write/Memory Unit stages. UART I/O is memory-mapped into the high address space.

The picture below shows a block diagram of the system.
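Memory-mapped I/O can be thought of as an address decoder sitting between the core and the bus. The small C model below assumes that every address at or above 0x80000000 belongs to the UART and that low addresses go to a 4 KiB RAM. These constants are hypothetical placeholders, not the values used in the repository. The model also assumes word-aligned accesses and omits byte enables for brevity.

```c
#include <stdint.h>

/* Placeholder memory map (hypothetical values). */
#define RAM_WORDS 1024u
#define IO_BASE   0x80000000u   /* high address space reserved for memory-mapped I/O */

typedef enum { TARGET_RAM, TARGET_UART, TARGET_NONE } bus_target_t;

/* Combinational address decode: which device responds to this address? */
static bus_target_t decode(uint32_t addr) {
    if (addr >= IO_BASE) return TARGET_UART;
    if (addr / 4 < RAM_WORDS) return TARGET_RAM;
    return TARGET_NONE;                     /* unmapped */
}

typedef struct {
    uint32_t ram[RAM_WORDS];
    uint32_t uart_tx;                       /* stands in for the UART's transmit register */
} system_t;

/* One bus write cycle driven by address, data and write-enable,
   the simple interface many small cores use. Word-aligned only. */
static void bus_write(system_t *s, uint32_t addr, uint32_t data, int we) {
    if (!we) return;
    switch (decode(addr)) {
    case TARGET_RAM:  s->ram[addr / 4] = data; break;
    case TARGET_UART: s->uart_tx = data;       break;
    default:          break;                   /* ignore writes to unmapped space */
    }
}
```

Because the processor only sees addresses, the same load/store instructions reach both RAM and the UART. The decoder alone decides which device responds.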

Development Process

Because Vivado is heavy and slow, I use Icarus Verilog to compile the Verilog files and run my testbenches, and GTKWave to view the resulting waveforms. Each module is developed along with its testbench, which provides full coverage.

Processor Components

Instruction Decoder

I personally find this the most important component of all: it must issue commands to every other component in the pipeline, so it requires careful thought. Thankfully, RV32I has only around 40 instructions, which fall into about ten opcode groups that I needed to implement.
    This was the first component I wrote, and it dictated the design of all the other components. Writing this module is practically half of the design work, since we have to think through all the control signals and the data path.
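Much of the decoder's job is slicing fixed bit fields out of the 32-bit instruction word. The field and immediate layouts below follow the RV32I base instruction formats in the RISC-V unprivileged specification. The C function names are hypothetical. The sign extension relies on GCC/Clang treating signed right shifts as arithmetic.

```c
#include <stdint.h>

/* Register and function fields (same positions in every format that has them). */
static uint32_t opcode(uint32_t ins) { return ins & 0x7f; }
static uint32_t rd(uint32_t ins)     { return (ins >> 7) & 0x1f; }
static uint32_t funct3(uint32_t ins) { return (ins >> 12) & 0x7; }
static uint32_t rs1(uint32_t ins)    { return (ins >> 15) & 0x1f; }
static uint32_t rs2(uint32_t ins)    { return (ins >> 20) & 0x1f; }
static uint32_t funct7(uint32_t ins) { return ins >> 25; }

/* I-type: imm[11:0] = ins[31:20], sign-extended. */
static int32_t imm_i(uint32_t ins) { return (int32_t)ins >> 20; }

/* S-type: imm[11:5] = ins[31:25], imm[4:0] = ins[11:7]. */
static int32_t imm_s(uint32_t ins) {
    return ((int32_t)(ins & 0xfe000000u) >> 20) | (int32_t)((ins >> 7) & 0x1f);
}

/* B-type: imm[12|10:5] = ins[31:25], imm[4:1|11] = ins[11:7]; bit 0 is always 0. */
static int32_t imm_b(uint32_t ins) {
    return ((int32_t)(ins & 0x80000000u) >> 19)   /* imm[12] and sign */
         | (int32_t)((ins & 0x80u) << 4)          /* imm[11]   = ins[7]     */
         | (int32_t)((ins >> 20) & 0x7e0)         /* imm[10:5] = ins[30:25] */
         | (int32_t)((ins >> 7) & 0x1e);          /* imm[4:1]  = ins[11:8]  */
}

/* U-type: upper 20 bits, low 12 bits zero. */
static int32_t imm_u(uint32_t ins) { return (int32_t)(ins & 0xfffff000u); }

/* J-type: imm[20|10:1|11|19:12] = ins[31:12]; bit 0 is always 0. */
static int32_t imm_j(uint32_t ins) {
    return ((int32_t)(ins & 0x80000000u) >> 11)   /* imm[20] and sign */
         | (int32_t)(ins & 0xff000u)              /* imm[19:12] = ins[19:12] */
         | (int32_t)((ins >> 9) & 0x800)          /* imm[11]    = ins[20]    */
         | (int32_t)((ins >> 20) & 0x7fe);        /* imm[10:1]  = ins[30:21] */
}

/* The RV32I opcode groups a decoder dispatches on. */
enum {
    OP_LUI = 0x37, OP_AUIPC = 0x17, OP_JAL = 0x6f, OP_JALR = 0x67,
    OP_BRANCH = 0x63, OP_LOAD = 0x03, OP_STORE = 0x23,
    OP_IMM = 0x13, OP_REG = 0x33, OP_FENCE = 0x0f, OP_SYSTEM = 0x73
};
```

In Verilog, this slicing is just wire assignments. The real design effort lies in mapping each opcode group to the control signals for the later stages.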

ALU

This component must support both data arithmetic and address arithmetic (for example, base + offset for loads and stores). The implementation is straightforward: we implement each operator per the spec.
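Here is a behavioural C model of the RV32I integer ALU. The operation is selected by `funct3`, and instruction bit 30 (`funct7` bit 5) chooses SUB over ADD and SRA over SRL. The function name `alu` and the `alt` parameter are hypothetical, and this is only a model of what a Verilog ALU computes. The same adder also serves address arithmetic: a load address is `alu(0, rs1_value, imm, 0)`.

```c
#include <stdint.h>

/* RV32I ALU model. f3 = funct3; alt = instruction bit 30 (SUB vs ADD,
   SRA vs SRL). For ADDI the caller must pass alt = 0, because bit 30
   is part of the immediate there. */
static uint32_t alu(uint32_t f3, uint32_t a, uint32_t b, int alt) {
    uint32_t shamt = b & 0x1f;                 /* shifts use only the low 5 bits */
    switch (f3) {
    case 0: return alt ? a - b : a + b;        /* ADD / SUB */
    case 1: return a << shamt;                 /* SLL */
    case 2: return (int32_t)a < (int32_t)b;    /* SLT  (signed compare) */
    case 3: return a < b;                      /* SLTU (unsigned compare) */
    case 4: return a ^ b;                      /* XOR */
    case 5: return alt ? (uint32_t)((int32_t)a >> shamt)  /* SRA: arithmetic */
                       : a >> shamt;                       /* SRL: logical */
    case 6: return a | b;                      /* OR */
    default: return a & b;                     /* 7: AND */
    }
}
```

The SLT/SLTU pair shows why the signedness of each operator has to be taken straight from the spec. The same bit patterns compare differently depending on the instruction.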

Memory Unit, Register File, and Memory

These modules are simple, so there is not much to say about them.

UART

This component is memory-mapped as circular buffers for Rx and Tx, along with a writable control register used to read and set the Rx/Tx buffer read/write heads.
    The processor must poll the control register to detect new data and transmission completion.
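Here is a rough C model of the Rx side of such a circular buffer. The hardware advances a write head whenever a byte arrives. The CPU polls, compares the write head against its read head, consumes bytes, and writes the new read head back. The buffer size, the field names, and this exact split between hardware and software are assumptions for illustration, not the layout used in the repository.

```c
#include <stdint.h>

#define RX_SIZE 16u   /* hypothetical; a power of two makes wrap-around a cheap mask */

typedef struct {
    uint8_t buf[RX_SIZE];
    volatile uint32_t write_head; /* advanced by the UART receiver (hardware) */
    volatile uint32_t read_head;  /* written back by the CPU via the control register */
} rx_ring_t;

/* Hardware side: store a received byte. Returns 0 (byte dropped) if full. */
static int rx_hw_push(rx_ring_t *r, uint8_t byte) {
    uint32_t next = (r->write_head + 1) & (RX_SIZE - 1);
    if (next == r->read_head) return 0;      /* full: one slot is always left empty */
    r->buf[r->write_head] = byte;
    r->write_head = next;
    return 1;
}

/* CPU side: poll for new data; returns -1 when nothing has arrived. */
static int rx_poll(rx_ring_t *r) {
    if (r->read_head == r->write_head) return -1;        /* empty */
    uint8_t byte = r->buf[r->read_head];
    r->read_head = (r->read_head + 1) & (RX_SIZE - 1);   /* frees the slot for hardware */
    return byte;
}
```

Keeping one slot empty lets "full" and "empty" be distinguished by comparing the two heads alone, so no separate counter register is needed.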

Pipeline Control

This component is a few blocks of combinational logic that insert bubbles into the pipeline. It is a little tricky because we have to handle read-after-write (RAW) data hazards and control hazards.
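The core of this logic can be modelled as a couple of boolean conditions. The C sketch below assumes a stall-only scheme with no forwarding. An instruction waits while any older in-flight instruction will still write one of its source registers, and a taken branch or jump flushes the younger instructions behind it. The struct and function names are hypothetical. How many older stages must be checked depends on where the actual pipeline reads and writes registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the hazard unit needs to know about an instruction in a later stage. */
typedef struct {
    bool valid;       /* false for a bubble */
    bool writes_rd;   /* instruction writes a destination register */
    uint8_t rd;       /* destination register number */
} stage_info_t;

/* RAW hazard: a source register is the destination of an older instruction
   that has not written back yet. Writes to x0 never cause a hazard. */
static bool raw_hazard(uint8_t rs1, bool uses_rs1, uint8_t rs2, bool uses_rs2,
                       const stage_info_t *older, int n_older) {
    for (int i = 0; i < n_older; i++) {
        const stage_info_t *s = &older[i];
        if (!s->valid || !s->writes_rd || s->rd == 0) continue;
        if ((uses_rs1 && s->rd == rs1) || (uses_rs2 && s->rd == rs2)) return true;
    }
    return false;
}

/* Pipeline control for one clock cycle. */
typedef struct {
    bool stall;   /* hold the earlier stages and insert a bubble */
    bool flush;   /* squash younger instructions fetched on the wrong path */
} pipe_ctrl_t;

/* Assumes branches resolve in a later stage than the stalling one, so a
   stalled instruction is younger than the branch and is flushed anyway. */
static pipe_ctrl_t hazard_unit(bool raw, bool branch_taken) {
    pipe_ctrl_t c;
    c.flush = branch_taken;
    c.stall = raw && !branch_taken;
    return c;
}
```

Forwarding (bypassing) would remove most of these stalls, but a stall-only unit like this is smaller and easier to verify.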

Testing

The modules are individually tested using their testbenches. They are then combined into a single-cycle CPU and tested with a sample program. The assembly tests from https://github.com/riscv-software-src/riscv-tests are modified and then run inside an emulator. The system is then turned into a multi-cycle CPU and re-tested. The UART is also stubbed and tested.
    Finally, the system, along with the test program, is synthesized into a bitstream and loaded onto an FPGA board, where Hello World programs are tested.
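Here is a very small RV32I instruction-set simulator in C that illustrates emulator-based testing. It supports only ADDI, ADD, and BNE, runs a hand-assembled loop, and checks the final register values. This is the same pass/fail idea as a self-checking test. It is a hypothetical illustration, not the emulator used for this project. Running riscv-tests needs the full RV32I set, loads/stores, and the suite's own pass/fail convention.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t x[32];   /* x[0] is forced back to zero after every instruction */
    uint32_t pc;      /* byte address */
} cpu_t;

/* Run until the PC leaves the program. Returns 0 when it does,
   -1 on an unsupported instruction, -2 if max_steps is exceeded. */
static int run(cpu_t *c, const uint32_t *prog, size_t n_words, int max_steps) {
    for (int step = 0; step < max_steps; step++) {
        if (c->pc / 4 >= n_words) return 0;              /* fell off the end: done */
        uint32_t ins = prog[c->pc / 4];
        uint32_t op = ins & 0x7f, rd = (ins >> 7) & 0x1f, f3 = (ins >> 12) & 7;
        uint32_t rs1 = (ins >> 15) & 0x1f, rs2 = (ins >> 20) & 0x1f;
        uint32_t next = c->pc + 4;
        if (op == 0x13 && f3 == 0) {                          /* ADDI */
            c->x[rd] = c->x[rs1] + (uint32_t)((int32_t)ins >> 20);
        } else if (op == 0x33 && f3 == 0 && (ins >> 25) == 0) { /* ADD */
            c->x[rd] = c->x[rs1] + c->x[rs2];
        } else if (op == 0x63 && f3 == 1) {                   /* BNE */
            int32_t off = ((int32_t)(ins & 0x80000000u) >> 19)
                        | (int32_t)((ins & 0x80u) << 4)
                        | (int32_t)((ins >> 20) & 0x7e0)
                        | (int32_t)((ins >> 7) & 0x1e);
            if (c->x[rs1] != c->x[rs2]) next = c->pc + (uint32_t)off;
        } else {
            return -1;                                        /* not supported here */
        }
        c->x[0] = 0;
        c->pc = next;
    }
    return -2;
}

/* Hand-assembled test program: x2 = 5 + 4 + 3 + 2 + 1. */
static const uint32_t sum_prog[] = {
    0x00500093u, /* addi x1, x0, 5  */
    0x00000113u, /* addi x2, x0, 0  */
    0x00110133u, /* add  x2, x2, x1 */
    0xfff08093u, /* addi x1, x1, -1 */
    0xfe009ce3u, /* bne  x1, x0, -8 */
};
```

After `run` returns 0, the test only needs to compare `x[2]` against the expected 15. Checking register or memory values like this is what makes a test program self-checking.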

Programming

RISC-V has its own GCC toolchain. To compile a C program into a Verilog ROM file, we just need:

  • A _start assembly routine
    • This short assembly program initializes the stack pointer and then jumps to main.
  • A linker script
    • This file describes where each object's code and data are placed in memory.

Once that is done, we run a few GCC commands to cross-compile and link, and then use ELF2HEX to turn the resulting ELF file into a Verilog ROM file. The exact commands are available in my repo. We then synthesize the ROM together with the core and use the resulting bitstream to program the FPGA.
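The final conversion step only changes the file format. Verilog's `$readmemh` reads whitespace-separated hexadecimal values, so a ROM image can be as simple as one 32-bit word per line. The C sketch below performs that conversion for a raw binary image, for example one produced by the toolchain's `objcopy -O binary`. It assumes a 32-bit-wide ROM and RISC-V's little-endian byte order. It is not how ELF2HEX itself works and does not reproduce its options; `pack_le32` and `write_readmemh` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Pack a little-endian byte image into 32-bit words (byte 0 becomes the
   least significant byte). A trailing partial word is zero-padded.
   Returns the number of words written to `words`. */
static size_t pack_le32(const uint8_t *bin, size_t len, uint32_t *words) {
    size_t n = 0;
    for (size_t i = 0; i < len; i += 4, n++) {
        uint32_t w = 0;
        for (size_t b = 0; b < 4 && i + b < len; b++)
            w |= (uint32_t)bin[i + b] << (8 * b);
        words[n] = w;
    }
    return n;
}

/* Emit one word per line as 8 hex digits, a format $readmemh accepts. */
static void write_readmemh(FILE *out, const uint32_t *words, size_t n) {
    for (size_t i = 0; i < n; i++)
        fprintf(out, "%08lx\n", (unsigned long)words[i]);
}
```

With this layout, the first line of the file is the instruction at address 0, so a word-addressed ROM can index it with `pc >> 2`.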

Some extra notes on the calculator

The way I represent numbers in my calculator is quite inefficient: I store them in binary and convert them to decimal when displaying them. Please see boatinw's submission E in this scoreboard to see how a primitive decimal representation and division could be implemented.
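Converting a binary value to decimal for display comes down to repeated division by 10. That is costly on a plain RV32I core: without the M extension there is no divide instruction, so GCC turns each `/` and `%` into a call to a libgcc helper routine. The minimal C sketch below assumes the value is a signed integer scaled by 10^5 (five decimal places). The name `format_fixed5` is hypothetical, and this is an illustration rather than the calculator's actual code.

```c
#include <stdint.h>

/* Format v / 100000 as a decimal string, e.g. 314159 -> "3.14159".
   buf must hold at least 24 bytes. Returns buf. */
static char *format_fixed5(int64_t v, char *buf) {
    char tmp[24];
    int n = 0;
    /* Use the magnitude as unsigned so INT64_MIN does not overflow. */
    uint64_t mag = v < 0 ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;

    /* Digits come out least significant first: 5 fractional digits... */
    for (int i = 0; i < 5; i++) {
        tmp[n++] = (char)('0' + mag % 10);
        mag /= 10;
    }
    tmp[n++] = '.';
    /* ...then the integer part, always at least one digit. */
    do {
        tmp[n++] = (char)('0' + mag % 10);
        mag /= 10;
    } while (mag != 0);

    char *p = buf;
    if (v < 0) *p++ = '-';
    while (n > 0) *p++ = tmp[--n];   /* reverse into most-significant-first order */
    *p = '\0';
    return buf;
}
```

Each output digit costs one division and one remainder. On a core without hardware division, that is why a decimal-native representation, like the one in the submission mentioned above, can make display much cheaper.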

Conclusion

This article described the benefits of using a high-level approach to program the FPGA, outlined some possible designs, covered general implementation techniques, and showed an example of implementing a simple RISC-V processor.
    Building a soft-core-based system can be easy if you use and integrate existing IPs, but rolling your own IPs can also be fun (and, surprisingly, does not take too much time). There are many creative ways to complete this project; this guide only covers the options I took and explains some important tricks I found.
    I hope this article helps inspire your solution! Good luck with your final project!


