Overview

This page describes the system architecture of the OPRECOMP kW platform.

The goal is to develop one computation node of an HPC system, based on a high-performance processing node (IBM POWER 8/9) and an array of interconnected controllable precision processors. The OPRECOMP kW demonstrator platform provides an evaluation framework to investigate scalability and functionality of the software stack. This platform is one node of an HPC system with transprecision capabilities.

The kwPilot demonstrator is developed in two stages. In the first phase, the system is based on existing POWER8 - CAPI (Coherent Accelerator Processor Interface) attached FPGA prototyping system developed by IBM. The POWER8 processor is used for precise calculations, and an interconnected array of PULP based processing units with controllable precision is implemented on the FPGA.

In the second phase, we have upgraded the node to a POWER9, with CAPI 2.0 connection to an advanced version of PULP, on a large-scale FPGA. This system is used to validate the approach.

The actual power and performance numbers will be extracted through measurements on a dedicated ASIC that will be manufactured containing an array of the PULP based transprecision computation cores. Finally, the results on the kW system will be evaluated and they will be further projected up to a potential multi-node MW architecture.

The first version of our kw demonstrator, namely, the kW-Pilot 1.0 platform, has been developed within the first phase specifications, i.e. POWER8 - CAPI1.0 attached FPGA prototyping platform.

The final version of our kw demonstrator, namely, the kW-Pilot 2.0 platform, has been developed within the final phase specifications, i.e. POWER9 - CAPI2.0 attached FPGA prototyping platform.

The development of the OPRECOMP kw-pilot is systematically developed in three pillars, namely the kw-pilot Hardware Development Kit (kwHDK), the kw-pilot Software Development Kit (kwSDK) and the kw-pilot Cloud Development Kit (kwCDK)

  • The kw-pilot Hardware Development Kit (HDK) considers transprecision devices attached to a POWER8/9 node via CAPI1.0/CAPI2.0 or OpenCAPI3.0. We have experimented with inserting transprecision computing capabilities in different granularity levels. Specifically the transprecision devices can be either PULP cores or standalone acceleratos. In the former case the transprecision computing capabilities have been introduced at the instruction level, whereas at the latter case, we have designed customized hardware accelerators suppporting different precision across the computation progress.

  • The kw-pilot Software Development Kit (SDK) describes the programming environment that enables the mapping of applications to this kw-pilot platform.

  • The kw-pilot Cloud Development Kit (CDK) integrates the kW-Pilot platform to a cloud infrastructure of IBM Zurich Lab.

The kwHDK, the kwSDK and the kwCDK are supported by different design tools, libraries and APIs as depicted in the following figure.

../_images/kw_tools_table.png

Fine-Grain Transprecision Computing

In the fine-grain transprecision aspect of our kW-Pilot platform, we define a high-performance processing node (IBM POWER 8/9) and an array of interconnected controllable precision PULP processors (https://pulp-platform.org/). The development of the OPRECOMP kw-pilot is systematically developed in three pillars, namely the kw-pilot Hardware Development Kit (kwHDK), the kw-pilot Software Development Kit (kwSDK) and the kw-pilot Cloud Development Kit (kwCDK), as depicted in the following figure.

../_images/fig1.png
  • The kw-pilot Hardware Development Kit (HDK.

  • The kw-pilot Software Development Kit (SDK) describes the programming environment that enables the mapping of applications to this kw-pilot platform.

  • The kw-pilot Cloud Development Kit (CDK) integrates the kW-Pilot platform to a cloud infrastructure of IBM Zurich Lab.

kwHDK

The repository with the approrpiate files for kwHDK is PowerPULP.

git clone git@iis-git.ee.ethz.ch:oprecomp/powerpulp.git

This repository contains PowerPULP, a PULP system attached to a IBM POWER8/9 machine via the CAPI interface. We currently support the following combinations:

  • board ad8k5, design xcku115, part xcku115-flva1517-2-e

  • board adku3, design xcku060, part xcku060-ffva1156-2-e

Getting Started

This project uses bender [1] for dependency management (kind of), so make sure that is installed if you want to change IPs. To get started, do the following:

Set the following environment variables to something reasonable for your setup:

  • export VSIM="vsim-10.7b -64"

  • export VLOG="vlog-10.7b -64"

  • export VCOM="vcom-10.7b -64"

  • export VOPT="vopt-10.7b -64"

  • export VIVADO="vivado-2019.1.1 vivado"

Flow

To synthesize and implement the design for an FPGA, proceed as follows:

  1. generate the IPs needed for capi_adapter

  2. synthesize ``capi_adapter`</del>`

  3. synthesize ``pulp_cluster`</del>`

  4. synthesize ``pulp_soc`</del>`

  5. synthesize ``pulp_top`</del>`

  6. synthesize and implement the entire board

Refer to scripts/end_to_end.sh for a full end-to-end example. The outbut *.bit and *.bin file can then be used to flash the corresponding FPGA card.

Simulation

The module hierarchy is structured as follows:

  • card top-level (see capi repository)

  • pulp_top

  • hero_powerpulp

  • pulp_cluster (multiple)

  • riscv_core (multiple)

The following testbenches are available:

  • tb_pslse to simulate pulp_top together with the CAPI interface

  • tb_soc to simulate hero_powerpulp, where the host memory is emulated as a regular AXI memory

To compile Xilinx sim libraries (needed only onece), run the script compile_simlibs.sh. To perform a simulation of the PULP system with emulated host, in VSIM source the script scripts/vsim/compile.tcl first, and then scripts/vsim/run_soc.tcl after changing the binary to load in that script as desired.

In order to run a full simulation with a running pslse on your machine, make sure to have run the scripts/vsim/compile.tcl script in your working directory first. Then, you’ll need three terminals to do the following:

  1. First, start the RTL simulation with scripts/vsim/run_pslse.sh (or scripts/vsim/run_pslse.tcl if you’re already in vsim), and run it (run -all). This will open a server on a local port that a pslse instance can connect to. Simulation time will not advance here unless started through this port.

  2. In a second terminal, launch pslse which will connect to the simulator and in turn offer the pslse server on another port. See instructions on building pslse in the capi repository.

  3. Lastly, run your host application in a third

FPGA Bitstreams

This is a summary table of FPGA implmentations for the kw-Pilot, during the course of the OPRECOMP H2020 project.

Host

Transprecision Device

Comm/tion

FPGA

Description

Publication

S822LC-P8

PULP Overlay

CAPI 1.0

AD8K5

1cluster 8cores

COOL CHIPS 2020

AC922-P9

Accelerator Overlay

CAPI 2.0

AD9V3

BLSTM

FPT 2018, COOL CHIPS 2018

IC922-P9

Accelerator Overlay

CAPI 2.0

AD9V3

Hello world

N/A

IC922-P9

Accelerator Overlay

CAPI 2.0

AD9V3

Scalar Mult/tion

ASAP 2019

IC922-P9

Accelerator Overlay

CAPI 2.0

AD9V3

High Order Stencils

SAMOS 2019

IC922-P9

Accelerator Overlay

OpenCAPI 3.0

AD9V3

PHRYCTORIA

RAW-IPDPS 2020

IC922-P9

DDR4 Controller

AXI 4

HTG-830

DDR4-Controller

N/A

Note: Bitstreams can be generated by following the implementation flow provided in the corresponding repository.

kwSDK

The source code for the kW demonstrator software and extensions to the virtual platform is maintained inside OPRECOMP’s repository.

It consists of the following parts:

  • boot: The bootloader that is embedded in the accelerator on a boot ROM. Loads a binary from host memory and executes it.

  • libcxl: Simulation implementation of libcxl to interface with the virtual platform.

  • liboprecomp: Library to offload PULP binaries onto the accelerator in conjunction with the bootloader.

Note that these are automatically included with the PULP SDK. So if you’re using the SDK, the above is already available.

Samples

The samples directory contains examples on how to write software on the POWER8 and PULP side and how to establish communication via CAPI. See the samples’ Makefiles for instructions on how to link against libcxl and liboprecomp. The following samples are available:

  • samples/nop: Sends a WED to the PULP side which in turn prints it to the console.

  • samples/square: Generates a list of random integers and sends them to PULP to calculate their squares, then checks the result. This example includes DMA transfers and the setup of a proper Work Element Descriptor.

Use make run in any of the sample directories to build the host and PULP binaries, and run them in the virtual platform. See the Makefiles for details.

Setup on POWER8

On a fresh POWER8 system, do the following to get a minimal setup that allows you to offload code to a PULP system on an FPGA attached to the machine.

# install libcxl
git clone https://github.com/ibm-capi/libcxl.git
pushd libcxl
git checkout v1.6
make
sudo make install
popd

# install liboprecomp
pushd liboprecomp
make
sudo make install
popd

Coarse-Grain Transprecision Computing

In the coarse-grain transprecision aspect of our kW-Pilot platform, we define a high-performance processing node (IBM POWER 8/9) and an array of interconnected controllable precision accelerators. We have extended the open-source OC-Accel framework of OpenPOWER consortium with transprecision capabilities. The OpenCAPI Transprecision Acceleration Framework, abbreviated as OC-Accel-TP, is a framework that helps you implement your FPGA acceleration solutions with OpenCAPI technology. The framework is a fork of OC-Accel Framework, that also includes some examples with transprecision capabilities.

https://github.com/oprecomp/oc-accel-tp

In this case, we define:

  • The kw-pilot Hardware Development Kit (HDK

  • The kw-pilot Software Development Kit (SDK) describes the programming environment that enables the mapping of applications to this kw-pilot platform.

  • The kw-pilot Cloud Development Kit (CDK) integrates the kW-Pilot platform to a cloud infrastructure of IBM Zurich Lab.