### SoC-Platforms & Design Methods for SoC Project and Labs

#### COE838: Systems on Chip Design http://www.ecb.torontomu.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ecb.torontomu.ca/~gnkhan Elect., Computer & Biomedical Engineering Toronto Metropolitan University

#### **Overview**

- SoC Platforms --Xilinx Zynq-7000 SoC platform.
- Intel (Altera) Cyclone-V SoC platforms, DE1-SoC.
- DE1-SoC Bootup Process and Embedded OS.
- Bootup Process and Device Tree.
- Root File System and Yocto Pre-built Packages.

SoC Platforms, DE1-SoC Manuals and Material from Chapter 3 of the Text by M.J. Flynn

## SoC Platforms

- SoC Platforms can be of two categories:
  - Soc FPGA
  - Commercial and Specific SoC platforms employed by Apple, Samsung, Siemens, etc.
- SoC FPGA Platforms:
  - Zynq-7000 by Xilinx
  - Cyclone-V and Arria-V platforms by Intel (Altera)

## Main HPS/FPGA Systems





#### Intel (Altera) DE1-SoC Cyclone V Arria V, etc

#### Xilinx Zedboard Zynq 7000 Zynq 7XXX etc



Zynq SoC

## What we'll be covering

- HPS/FPGA Systems
  - Intel Altera DE1-SoC



- ARM Cortex-A9 and Cyclone V FPGA
- Interfacing, emulating, simulating real HW/SW systems
- Integrating IPs in your system as accelerators
- Creating software to execute on your CPU and offloading to accelerators

## Intel (Altera) SoC Family FPGA

#### **SoC Family**

| Family           | KLE    | Blo<br>Mem<br>Bits ( | nory                                        | Var. Pree<br>Multiplie<br>Blocks |                 | δA     | HPS<br>Dedicate<br>I/Os | ed                                                    | Max.<br>Transceiver:<br>(GP)   | s Max. D                    | ansceiver<br>Data Rate<br>Dps) | SoC Hard<br>Memory<br>Controller | FPGA Hard<br>Memory<br>Controllers  |
|------------------|--------|----------------------|---------------------------------------------|----------------------------------|-----------------|--------|-------------------------|-------------------------------------------------------|--------------------------------|-----------------------------|--------------------------------|----------------------------------|-------------------------------------|
| Cyclone V<br>SoC | 25     | 1.4                  |                                             | 36                               | 14              | 45 181 |                         |                                                       | 6                              |                             | 3                              | 1                                | 1                                   |
|                  | 40     | 2.7                  |                                             | 58                               | 14              | 145    |                         |                                                       | 6                              | 3                           |                                | 1                                | 1                                   |
|                  | 85     | 4.0                  |                                             | 87                               | 28              | 288    |                         |                                                       | 9                              | 5                           |                                | 1                                | 1                                   |
|                  | 110    | 5.6                  |                                             | 112                              | 12 288          |        | 181                     |                                                       | 9                              | 5                           |                                | 1                                | 1                                   |
| Arria V          | 350    | 17.                  | .3                                          | 809                              | 52              | 8      | 208                     |                                                       | 30 / 16                        | 6                           | / 10                           | 1                                | 3                                   |
| SoC              | 460    | 22                   | .8                                          | 1,068                            | 52              | 8      | 208                     |                                                       | 30 / 16                        | 6                           | / 10                           | 1                                | 3                                   |
|                  |        |                      | Non-Transceiver Devices<br>(FPGA User I/Os) |                                  |                 |        | 25                      | Transceiver Devices<br>(FPGA User I/Os, Transceivers) |                                |                             |                                |                                  |                                     |
| Family           | KL     | E                    | U484-<br>19x1                               |                                  | 572-WB<br>23x23 |        | 96-WB<br>1x31           |                                                       | U672-WB<br>23x23<br>/O, 3G/5G) | F896-V<br>31x3<br>(I/O, 3G/ | 1                              | F896-FC<br>31x31<br>0, 6G, 10G)  | F1152-FC<br>35x35<br>(I/O, 6G, 10G) |
|                  | 25     |                      | 66                                          |                                  | 145             |        | -                       |                                                       | 145, 6                         | _                           |                                | -                                | -                                   |
| Cyclone V<br>SoC | 40     |                      | 66 1                                        |                                  | 145             | 45     |                         |                                                       | 145, 6                         | -                           |                                | -                                | -                                   |
|                  | 85     |                      | 66                                          |                                  | 145             | 45 2   |                         |                                                       | 145, 6                         | 288, 9                      | 9                              | -                                | -                                   |
|                  | 110 66 |                      |                                             | 145                              |                 | 288    |                         | 145, 6                                                | 288, 9                         |                             | -                              | -                                |                                     |
| Arria V<br>SoC   | 350    |                      | -                                           |                                  | -               | -      |                         |                                                       | -                              | -                           | 1                              | 170, 12, 4                       | 350, 18, 8                          |
|                  | 460 -  |                      | _                                           |                                  | -               |        | -                       |                                                       | -                              | _                           | 1                              | 170, 12, 4                       | 350, 18, 8                          |
| HPS I/O          |        |                      | 161                                         |                                  | 181             |        | 181                     |                                                       | 181                            | 181                         |                                | 208                              | 208                                 |

### Cyclone-V SoC FPGA

#### Cyclone-V FPGA family is based on 1.1-V, 28nm technology High performance designs and SoC prototyping

|                                             | 5CSXC2 | 5CSXC4 | 5CSXC5  | 5CSXC6  |  |  |  |
|---------------------------------------------|--------|--------|---------|---------|--|--|--|
| ALMs                                        | 9,434  | 15,094 | 32,075  | 41,509  |  |  |  |
| LEs (K)                                     | 25     | 40     | 85      | 110     |  |  |  |
| Registers                                   | 37,736 | 60,376 | 128,300 | 166,036 |  |  |  |
| M10K memory blocks                          | 140    | 224    | 397     | 514     |  |  |  |
| M10K memory (Kb)                            | 1,400  | 2,240  | 3,972   | 5,140   |  |  |  |
| MLAB memory (Kb)                            | 138    | 220    | 480     | 621     |  |  |  |
| Variable-precision DSP blocks               | 36     | 58     | 87      | 112     |  |  |  |
| 18 x 18 multipliers                         | 72     | 116    | 174     | 224     |  |  |  |
| Processor cores (ARM Cortex-A9)             | Dual   | Dual   | Dual    | Dual    |  |  |  |
| Global clock networks                       | 16     |        |         |         |  |  |  |
| PLLs <sup>2</sup> (FPGA)                    | 4      | 5      | 6       | 6       |  |  |  |
| PLLs <sup>2</sup> (HPS)                     | 3      | 3      | 3       | 3       |  |  |  |
| Transceiver count (3.125 Gbps)              | 6      | 6      | 9       | 9       |  |  |  |
| PCIe hard IP blocks (Gen1 x4)               | 2      | 2      | 2       | 2       |  |  |  |
| GPIOs (FPGA)                                | 145    | 145    | 288     | 288     |  |  |  |
| GPIOs (HPS)                                 | 188    | 188    | 188     | 188     |  |  |  |
| Hard memory controllers <sup>4</sup> (FPGA) | 1      | 1      | 1       | 1       |  |  |  |
| Hard memory controllers <sup>4</sup> (HPS)  | 1      | 1      | 1       | 1       |  |  |  |

## Arria-V SX SoC FPGA

#### Arria-V FPGA family is based on 1.1V, 28nm It can prototype/implement ARM Cortex A9 CPU based SoCs

| 5ASXB3  | 5ASXB5                                                                              |  |  |
|---------|-------------------------------------------------------------------------------------|--|--|
| 132,075 | 174,340                                                                             |  |  |
| 350     | 462                                                                                 |  |  |
| 528,300 | 697,360                                                                             |  |  |
| 1,729   | 2,282                                                                               |  |  |
| 17,288  | 22,820                                                                              |  |  |
| 2,014   | 2,658                                                                               |  |  |
| 809     | 1,068                                                                               |  |  |
| 1,618   | 2,186                                                                               |  |  |
| Dual    | Dual                                                                                |  |  |
| 16      | 5                                                                                   |  |  |
| 10      | 14                                                                                  |  |  |
| 3       | 3                                                                                   |  |  |
|         | 132,075<br>350<br>528,300<br>1,729<br>17,288<br>2,014<br>809<br>1,618<br>Dual<br>10 |  |  |

### **Cyclone-V SoC Platform**

Cyclone-V SoCs reduce system power, cost, and board size while increasing system performance by integrating discrete processor, FPGA, and DSP functions into a single, user customizable ARM-based (SoC)



\*Optional Configuration

https://www.terasic.com.tw/cgi-

bin/page/archive.pl?Language=English&CategoryNo=165&No=836#contents

# DE1-SoC

- Cyclone-V SoC 5CSEMA5F31 Device
- Dual-core ARM Cortex-A9 (HPS)
- 85K programmable logic elements
- 4,450Kbits embedded memory
- 6 fractional PLLs
- 2 hard memory controllers
- Quad serial configuration device EPCQ256
- On-board USB-Blaster II
- 64MB (32Mx16) SDRAM on FPGA
- 1GB (2x256Mx16) DDR3 SDRAM on HPS
- Micro SD card socket on HPS



### **Cyclone-V HPS**

#### 800MHz/1.05 GHz, dual-core ARM Cortex-A9 processor including

- L1 Inst. & data caches (32 KB each), FPU and NEON media engine.
- 512 KB of shared L2 cache with **error correction code** (ECC) support
- 64 KB of scratch RAM with ECC support
- Multiport SDRAM controller with support for DDR2, DDR3, etc.
- QSPI and NAND flash controller with DMA
- SD/SDIO/MMC controller with DMA
- 2x 10/100/1000 Ethernet media access control (MAC) with DMA
- 2x USB On-The-Go (OTG) controller with DMA
- 4x I<sup>2</sup>C controller and 2x UART
- 2x (SPI) master peripherals, 2x SPI slave peripherals
- Up to 134 general-purpose I/O (GPIO)
- 7x general-purpose timers and 4 watchdog timers
- Warm and Cold reset

#### Inside the SoC





### **HPS-to-FPGA Interconnect**

#### HPS and the FPGA can operate independently. Tightly coupled via AMBA bridges:

- IP bus masters in FPGA fabric have access to HPS bus slaves.
- HPS bus masters have access to bus slaves in FPGA fabric.
- Both bridges are AMBA AXI-3 compliant and support simultaneous read and write transactions.
- Additional 32-bit light-weight HPS-to-FPGA bridge with low latency i/f between HPS and peripherals in FPGA fabric.
- Six FPGA masters can share the HPS SDRAM controller with ARM Cortex A9 processor.
- A9 processor can be used to configure FPGA fabric under program control via a dedicated 32-bit port.

### **HPS-to-FPGA Interconnect**

- HPS-to-FPGA: configurable 32-, 64-, or 128-bit AMBA AXI interface optimized for high bandwidth
- FPGA-to-HPS: configurable 32-, 64-, or 128-bit AMBA AXI interface optimized for high bandwidth
- Lightweight HPS-to-FPGA: 32-bit AMBA AXI interface optimized for low latency
- FPGA-to-HPS SDRAM controller: Configurable multi-port interfaces with 6 command ports, 4x 64-bit read data ports and 4x 64-bit write data ports
- 32-bit FPGA configuration manager

### Cyclone V HPS Memory Map



## AXI / HPS-FPGA Interconnects



HPS-to-FPGA: Configurable 32, 64 or 128bit AMBA AXI interface optimized for high bandwidth.

FPGA-to-HPS: --- " ----Opposite direction (Mainly used for ACP: Accelerator Coherency Port) which is an AXI 64-bit slave port.

Light weight HPS-to-FPGA: 32-bit AMBA AXI interface optimized for low latency.

# HPS – FPGA Bridges



- To modify data and clock signals to support transportation (protocols, clocking, etc.) between components



| Region Name             | Description                                    | Base Address | Size  |
|-------------------------|------------------------------------------------|--------------|-------|
| FPGA Slaves             | For accessing FPGA slaves connected to the h2f | 0xC0000000   | 960MB |
|                         | bridge                                         |              |       |
| HPS Peripherals         | Accessing slaves directly connected to the HPS | 0xFC000000   | 64MB  |
| Lightweight FPGA Slaves | Accessing slaves connected to the lwh2f bridge | 0xFF200000   | 2MB   |

### Lab 3: DE1-SoC Tutorial



## Lab3 – DE1-SoC Tutorial

Creating a HPS/FPGA which controls 7-seg and LEDs on FPGA through software on HPS



https://www.youtube.com/watch?v=BHg5CjTeftY

## Intel (Altera) DE1-SoC & Embedded OS



- Intel (Altera) has their own tools that generate files required for the OS and its boot process
- Caters to the underlying hardware which you generate with Qsys.

## SD Card Partitions and Contents



• The SD card contains data pertaining to the operating system and underlying architecture

Type 83 is Linux data partition.

- If viewing from Windows machine. It can not view many of these files
  - May be FAT32 files only

- 5 necessary elements for SoC based Linux (embedded OS)
- 1. Pre-loader
- 2. Device Tree
- 3. Boot-loader
- 4. Kernel
- 5. Root File System





 Assuming SoC off, and microSD contains OS related environment that was flashed to the card.



microSD Card



- When powered on, **Boot ROM** (BR) moves pre-loader from SD card to onchip memory
- BR sets up minimal configuration
- BR is not user accessible
- Specifies where to access pre-loader





- On-chip mem on Cyclone V = 64Kb
- Once **boot ROM** has finished, turns execution over to **Pre-loader**









microSD Card





#### microSD Card

- Look at SD Card layout
  - FPGA Configuration
- It is possible to get the HPS to configure your FPGA at boot up using the FPGA configuration file
  - i.e., the bit-stream on the SD
- Use u-boot script to point to necessary locations



4 replicas of the pre-loader Done since SD cards are known for having bad sectors. We want to guarantee a successful bootup!

Indicates how and where the OS is located on SD, so it can be booted by the system and loaded to its storage

# Toolchain

- Set of software development tools "chained" together by specific stages
  - i.e. GCC toolchain
  - Tools include compiler (gcc, g++), libraries (glibc), debugger (dgb) etc
- If we compile from our host computer for HPS:
  Armv7 or aarch32 toolchain



Connects application to system HW



#### microSD Card

- 1. Preloader
- 2. Device Tree
- 3. Bootloader
- 4. Kernel
- 5. Root File System

Bitbake or use pre-built targets, toolchain selection

## Device Tree

- Data structure describing all the hardware devices that the Linux kernel has access to in the system
- At runtime, allows kernel to enable necessary drivers to access the devices



## Device Tree

- Allows us to have **one** kernel for a system, don't need to regenerate it every time your HW changes
  - Qsys project always adding HW, etc.
  - Just recompile new device tree that kernel may read any time a change is made

## Device Tree

 Qsys generates .sopcinfo file – that has the info of all devices attached to HPS, used to develop a Device Tree Blob (dtb)



#### Root File System

- File system where root directory is located
- File system where all other files systems are mounted
- Built with bitbake with custom parameters (recipes and layers) or use a pre-built RFS

# Bitbake Terms (using Yocto)

- **Recipe** bitbake ingredients and cooking time
  - What version and source codes to use for building (RFS, kernel)
  - Dependencies, targets ("apps") to be built by bitbake (make, config, ls, etc)
- Layer Set of recipes created by a group for a particular purpose
  - Specifies what core (ARM8) and kernel
  - Altera created meta-altera type for FPGA

# Yocto Pre-Built Packages

#### Source code package

- SoC Linux kernel source code
- SoC U-Boot source code
- Root file system source code
- Linaro GCC tool chain
- The Yocto Project
- Installation Script

#### Binary (pre-built) package

- kernel
- Device tree blob
- Root file system
- Complete microSD image

#### What is not provided?

 All of the packages known by Yocto (1500+)



Intel Altera SoC EDS: Embedded Development Suite

#### SoC EDS =

#### DS-5 or Higher + Compiler Toolchain + SOC hw libs + HW/SW interface + embedded command shell

## Custom Logic Bindings

- 1. Custom Logic can be a master or slave
- 2. NOIS CPU (Soft IP) is usually the Avalon-MM Master.
- 3. Any custom I/O can be the Avalon-MM Slave.



#### **Avalon-MM Slave Interface**

| Name          | Width                 | Direction | Comments                        |
|---------------|-----------------------|-----------|---------------------------------|
| avs_address   | 64 bits               | Input     | Address of slave being accessed |
| avs_read      | 1 bit                 | Input     | Read operation requested        |
| avs_write     | 1 bit                 | Input     | Write operation requested       |
| avs_readdata  | 8, 16, 32, or 64 bits | Output    | Data read from slave            |
| avs_writedata | 8, 16, 32, or 64 bits | Input     | Data to be written to slave     |

#### Write Waveforms:



### **Avalon Bus Module**

The Avalon bus module (an Avalon bus) is a unit of active logic, which takes the place of passive, metal bus lines on a physical PCB. Suitable as an onchip BUS.



# **Avalon Bus**

- Avalon bus is on-chip bus architecture that accommodate the SOC environment.
- The interface to peripherals is synchronous with the Avalon clock.
- Multiplexers (not tri-state buffers) in the bus determine which signals drive which peripheral. Peripheral is not required to tri-state its output.
- Address, data and control signals use separate, dedicated ports. It simplifies the design of peripherals as they don't need to decode address and data bus cycles as well as disable its outputs when it is not selected.

# **Slave Arbitrator**

#### Avalon bus module contains one slave arbitrator for each shared slave port. Slave arbitrator performs the following.

- Defines control, address, and data paths from multiple master ports to the slave port and specifies the arbitration mechanism to use when multiple masters contend for a slave at the same time.
- At any given time, selects which master port has access to the slave port and forces all other contending masters (if any) to wait, based on the arbitration assignments.
- Controls the slave port based on address, data & control signals presented by the currently selected master port.

#### **Avalon-MM Slave Interface – Hardware Side**



#### **Avalon-MM Slave Interface – Software Side**



#### **Putting it all Together**



### MD5 SoC Project



# **Avalon Bus Module**

The Avalon bus module (an Avalon bus) is a unit of active logic, which takes the place of passive, metal bus lines on a physical PCB. Suitable as an onchip BUS.



## **Avalon Bus**

- Avalon bus is on-chip bus architecture that accommodate the SOC environment.
- The interface to peripherals is synchronous with the Avalon clock.
- Multiplexers (not tri-state buffers) inside the bus determine which signals drive which peripheral. Peripherals are never required to tri-state their outputs.
- Address, data and control signals use separate, dedicated ports. It simplifies the design of peripherals as they don't need to decode address and data bus cycles as well as disable its outputs when it is not selected.

# **Slave Arbitrator**

Avalon bus module contains one slave arbitrator for each shared slave port. Slave arbitrator performs the following.

- Defines control, address, and data paths from multiple master ports to the slave port and specifies the arbitration mechanism to use when multiple masters contend for a slave at the same time.
- At any given time, selects which master port has access to the slave port and forces all other contending masters (if any) to wait, based on the arbitration assignments.
- Controls the slave port based on address, data & control signals presented by the currently selected master port.