Development workflow

The development workflow of SMI applications is relatively complex, because several components work together to compile your application with SMI. If you do not need detailed control over this process, use the provided CMake function, which automates most of it. If you need more fine-grained control, follow the manual workflow described below.

CMake workflow (recommended)

The CMake function makes some simplifying assumptions to keep it easy to use:

you have just a single .cpp file with host code
each program in an MPMD scenario uses just one device source file

If these assumptions hold, we recommend the CMake workflow, using the CMake function smi_target:

smi_target(<target-name> <topology> <host-source> <device-sources> <num-ranks>)
smi_target(myprogram topology.json myprogram.cpp myprogram.cl 8)

Parameters:

target-name - name of the target created by the function
topology - JSON file containing mapping between FPGAs and their programs and connections between FPGAs (described below)
host-source - source file of the host program which configures the computation
device-sources - list of device source files, each file represents a single program (1 file = SPMD, 2+ files = MPMD)
num-ranks - number of ranks (this is used only for building emulation programs)

This function will run the whole SMI compilation workflow and output the compiled bitstream, host program and routing tables into the CMake build directory. After that you should be able to run the host program and start the computation on an FPGA.

Note that for this to work, you have to use the generated host header file in your host code. The header file, along with notes on how to use it with the CMake workflow, is described below.

By default, the maximum number of supported ranks is 8. To change this, specify the corresponding optional parameter in the smi_target call. The full signature, including the supported optional parameters, is the following:

smi_target(<target-name> <topology> <host-source> <device-sources> <num-ranks> <consecutive_reads> <max_num_ranks> <rendezvous>)
smi_target(myprogram topology.json myprogram.cpp myprogram.cl 8 8 16 OFF)

consecutive_reads - the number of consecutive reads from the same FIFO buffer performed by a Communication Kernel (denoted as R in the paper). Defaults to 8.
max_num_ranks - the maximum number of ranks. This is used to generate resource-efficient hardware. Defaults to 8.
rendezvous - boolean parameter selecting whether rendezvous (ON) or eager (OFF) communication mode is used for point-to-point communications.
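The optional parameters appear to be positional, so setting max_num_ranks also means writing out consecutive_reads, as in the example above. The Python sketch below renders an smi_target(...) line under that assumption. The helper smi_target_call is hypothetical, and passing several device sources as one quoted, semicolon-separated CMake list is also an assumption about how the function receives MPMD sources.

```python
def smi_target_call(target, topology, host_source, device_sources, num_ranks,
                    consecutive_reads=None, max_num_ranks=None, rendezvous=None):
    """Render an smi_target(...) line (hypothetical helper, not part of SMI)."""
    # Assumption: several device sources are passed as one quoted CMake list.
    if len(device_sources) == 1:
        sources = device_sources[0]
    else:
        sources = '"' + ";".join(device_sources) + '"'
    args = [target, topology, host_source, sources, str(num_ranks)]

    # Optional parameters are positional: to set a later one, every earlier
    # one must be written out too, so missing ones get the documented defaults.
    optional = [consecutive_reads, max_num_ranks, rendezvous]
    defaults = [8, 8, None]  # rendezvous has no documented default
    last = max((i for i, v in enumerate(optional) if v is not None), default=-1)
    for i in range(last + 1):
        value = optional[i] if optional[i] is not None else defaults[i]
        if isinstance(value, bool):
            value = "ON" if value else "OFF"  # CMake boolean spelling
        args.append(str(value))
    return "smi_target(" + " ".join(args) + ")"


# Reproduces the example call with max_num_ranks raised to 16.
print(smi_target_call("myprogram", "topology.json", "myprogram.cpp", ["myprogram.cl"], 8,
                      max_num_ranks=16, rendezvous=False))
# smi_target(myprogram topology.json myprogram.cpp myprogram.cl 8 8 16 OFF)
```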

Topology description

Whether you use the CMake workflow or execute the SMI scripts manually, you always have to provide a topology description file, which specifies which program runs on each FPGA and how the FPGAs are connected to each other.

The topology description is a JSON file. Suppose that the application is composed of 4 different programs that will be executed on 4 different FPGAs interconnected with each other. The corresponding topology file might look like this:

{
  "fpgas": {
    "fpga-0001:acl0": "program_1",
    "fpga-0002:acl0": "program_2",
    "fpga-0003:acl0": "program_3",
    "fpga-0004:acl0": "program_4"
  },
  "connections": {
    "fpga-0001:acl0:ch2": "fpga-0002:acl0:ch3",
    "fpga-0002:acl0:ch1": "fpga-0003:acl0:ch0",
    "fpga-0003:acl0:ch1": "fpga-0004:acl0:ch0"
  }
}

fpgas is a dictionary that maps FPGA names (in the form <nodename>:<devicename>) to the program that will be executed on them. The program name is the file name of the device source file that you pass to codegen or to the CMake workflow, without its extension. For example, if you use program_1.cl, the corresponding entry in the JSON file should be program_1.
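To make the naming rule concrete, here is a small Python sketch that derives program names from device source file names and checks that every program listed under fpgas has a matching source file. The helper names program_name and check_fpga_programs are hypothetical, and the check is illustrative, not something the SMI codegen is documented to perform. Note that Python's json module, like any strict JSON parser, rejects trailing commas.

```python
import json
from pathlib import Path


def program_name(source_path):
    """Program name = device source file name without extension (program_1.cl -> program_1)."""
    return Path(source_path).stem


def check_fpga_programs(topology_text, device_sources):
    """Return the FPGA -> program mapping after checking that each program has a source file.

    Illustrative check based on the naming rule above (hypothetical helper).
    """
    fpgas = json.loads(topology_text)["fpgas"]  # strict JSON: trailing commas raise ValueError
    known = {program_name(src) for src in device_sources}
    missing = sorted(set(fpgas.values()) - known)
    if missing:
        raise ValueError(f"no device source file for program(s): {missing}")
    return fpgas


topology = """{
  "fpgas": {"fpga-0001:acl0": "program_1", "fpga-0002:acl0": "program_2"},
  "connections": {"fpga-0001:acl0:ch2": "fpga-0002:acl0:ch3"}
}"""
print(check_fpga_programs(topology, ["kernels/program_1.cl", "kernels/program_2.cl"]))
```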

connections is a dictionary that describes the connections between FPGA channels. Each channel is written as <nodename>:<devicename>:<channelname>, where nodename is the hostname of the machine in which the FPGA is installed, devicename is the FPGA name as visible to the Intel OpenCL environment, and channelname is the name of the I/O channel. For example:

"fpga-0001:acl0:ch2": "fpga-0002:acl0:ch3"

indicates that I/O channel ch2 of the FPGA acl0 installed in the host fpga-0001 is connected to I/O channel ch3 of the FPGA acl0 installed in the host fpga-0002.
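The channel format can be parsed mechanically. The Python sketch below splits each descriptor into its three fields and builds a map from every channel to its peer. It assumes that each connection is a bidirectional link and that a channel takes part in at most one connection; both are assumptions, and the names Channel, parse_channel and link_map are hypothetical.

```python
from typing import NamedTuple


class Channel(NamedTuple):
    node: str     # hostname of the machine in which the FPGA is installed
    device: str   # FPGA name as seen by the Intel OpenCL environment, e.g. acl0
    channel: str  # I/O channel name, e.g. ch2


def parse_channel(descriptor):
    """Split '<nodename>:<devicename>:<channelname>' into its three fields."""
    node, device, channel = descriptor.split(":")  # ValueError if not exactly 3 fields
    return Channel(node, device, channel)


def link_map(connections, fpgas=None):
    """Map each channel to the channel at the other end of its link.

    Assumes links are bidirectional and each channel is used at most once.
    If fpgas is given, every endpoint must belong to an FPGA listed there.
    """
    peers = {}
    for a, b in connections.items():
        ca, cb = parse_channel(a), parse_channel(b)
        for end in (ca, cb):
            if fpgas is not None and f"{end.node}:{end.device}" not in fpgas:
                raise ValueError(f"{end} refers to an FPGA missing from 'fpgas'")
        for x, y in ((ca, cb), (cb, ca)):
            if x in peers:
                raise ValueError(f"channel {x} is connected more than once")
            peers[x] = y
    return peers


peers = link_map({"fpga-0001:acl0:ch2": "fpga-0002:acl0:ch3"})
print(peers[parse_channel("fpga-0002:acl0:ch3")])
# Channel(node='fpga-0001', device='acl0', channel='ch2')
```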

Manual workflow

High-level overview of the SMI workflow:

1) Device codegen
   transform your device code
   generate the SMI implementation
   generate metadata used by later steps
2) Compile FPGA code
   your transformed device code is compiled together with the generated SMI implementation
3) Host codegen
   create a header file with initialization functions for each used FPGA program
   uses the metadata created in 1)
4) Compile host code
   your host code is compiled together with the generated host header file
5) Generate routing tables
   routing tables are generated from a topology description file and the metadata created in 1)
6) Run the host program
   the host program is provided with the routing tables and the compiled FPGA bitstream and starts the FPGA computation

A detailed description of each step follows.

Compilation stage (detailed usage)

SMI uses code generation to create all the communication logic used by your program. The code generator requires a description of the SMI operations used in the program (ports, data types, etc.). To automate extracting this metadata from your source code, we provide a source rewriter tool that parses your device kernel code using Clang, extracts the necessary metadata, and generates a new version of your code, which should be compiled together with the generated SMI logic.

All SMI workflow commands are provided by the codegen script, which is located in codegen/main.py.

Step 1: device code generation

First, you have to run the codegen script and provide it with information about the topology and your device source code.

$ python codegen/main.py codegen-device <topology-file> <rewriter-binary> <kernel-src-dir> <kernel-bin-dir> \
    <smi-output> <kernel-metadata> <user-sources>

Parameters:

topology-file - path to the JSON file with the FPGA topology
rewriter-binary - path to the compiled source rewriter (its source code is in the directory source-rewriter)
kernel-src-dir - directory containing your device code
kernel-bin-dir - output directory where the rewritten user source files will be created
smi-output - output file path where the SMI device code will be created
kernel-metadata - output file path where the codegen metadata will be stored
user-sources - list of file paths with your device code (file paths have to be relative to kernel-src-dir)

This script does several things: it copies your user-sources from kernel-src-dir to kernel-bin-dir, runs the source rewriter on them, uses the metadata from the rewriter to generate the SMI implementation at smi-output, and stores the metadata in kernel-metadata for use by later steps.

If you have multiple programs (MPMD), you have to execute this script once for each program.
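For an MPMD build, the per-program invocations can be scripted. The following Python sketch builds one codegen-device command per program, using the argument order listed above, and prints them (or runs them with subprocess when dry_run is False). Every path in it, including the rewriter binary and the metadata file names, is a placeholder, and the helper names are hypothetical.

```python
import subprocess


def codegen_device_cmd(topology, rewriter, src_dir, bin_dir, smi_output, metadata, user_sources):
    """Argv for one `codegen-device` run, in the documented argument order."""
    return ["python", "codegen/main.py", "codegen-device", topology, rewriter,
            src_dir, bin_dir, smi_output, metadata, *user_sources]


def run(commands, dry_run=True):
    """Print the commands, or execute them in order, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Hypothetical MPMD project: two programs, one device source file each.
# All paths below are placeholders chosen for this example.
programs = {"program_1": ["program_1.cl"], "program_2": ["program_2.cl"]}
commands = [
    codegen_device_cmd("topology.json", "build/rewriter", "kernels",
                       f"build/{name}", f"build/{name}/smi.cl",
                       f"build/{name}.metadata", sources)
    for name, sources in programs.items()
]
run(commands)
```

Keeping the outputs of each program in its own directory avoids one invocation overwriting the rewritten sources or SMI code of another.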

After running the device codegen, you should use an FPGA compiler of your choice to compile smi-output together with your transformed source files located in kernel-bin-dir to build the bitstream which will be used when running the program.

Step 2: host code generation

After generating code for the device, you have to generate code for the host:

$ python codegen/main.py codegen-host <host-output> <kernel-metadata>

Parameters:

host-output - output file path where the host header file will be created
kernel-metadata - list of file paths with metadata generated by device codegen (one per program)

Running this script creates a header file at host-output, which your host code should include. Then compile your host code, together with the generated header, using a host compiler of your choice.
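Host codegen takes the metadata of all programs in a single invocation. The Python sketch below builds that command and lists the initialization function names the generated header is described as providing on this page (SmiInit_<program-name>, one per program). The paths and the helper names codegen_host_cmd and expected_init_functions are hypothetical placeholders.

```python
def codegen_host_cmd(host_output, metadata_files):
    """Argv for `codegen-host`: one run that receives every program's metadata file."""
    if not metadata_files:
        raise ValueError("pass one metadata file per program")
    return ["python", "codegen/main.py", "codegen-host", host_output, *metadata_files]


def expected_init_functions(program_names):
    """Names of the per-program initialization functions the header should contain."""
    return [f"SmiInit_{name}" for name in program_names]


cmd = codegen_host_cmd("build/smi_generated_host.c",
                       ["build/program_1.metadata", "build/program_2.metadata"])
print(" ".join(cmd))
print(expected_init_functions(["program_1", "program_2"]))
```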

Using the generated host header

The header file provides one initialization function for each FPGA program that you have defined (according to the number of metadata files passed to host codegen, or the number of device source files used in the CMake workflow). The function is called SmiInit_<program-name>. Invoke it on the host side to configure all SMI functionality before running the FPGA application logic. The function has the following signature:

SMI_Comm SmiInit_<program-name>(
        int rank,
        int ranks_count,
        const char* program_path,
        const char* routing_dir,
        cl::Platform &platform,
        cl::Device &device,
        cl::Context &context,
        cl::Program &program,
        int fpga,
        std::vector<cl::Buffer> &buffers)

routing_dir should be the path to a directory containing the routing tables (the routing-dir produced by routing table generation in Step 3).

CMake workflow notes

In the CMake workflow, routing_dir should be set to "smi-routes". The generated host file is named smi_generated_host.c, so put the following line into your host code: #include <smi_generated_host.c>

Step 3: Routing table generation

To generate routing tables, you have to provide the topology file and the metadata produced by device codegen:

$ python codegen/main.py route <topology-file> <routing-dir> <kernel-metadata>

Parameters:

topology-file - path to the JSON file with the FPGA topology
routing-dir - output directory where the routing tables will be created
kernel-metadata - list of file paths with metadata generated by device codegen (one per program)

After running the script, routing-dir should contain a routing table for each channel of each FPGA in the topology file.

Note that changing the topology file and regenerating the routing tables requires recompiling neither the device code nor the host code.
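Because a topology change only requires new routing tables, a build script can skip most of the pipeline in that case. The Python sketch below models this with a hypothetical, conservative table of which steps depend on which inputs (the entries for device and host sources are inferred from the workflow overview, not documented rules) and builds the route command in the documented argument order. The helper names, step labels and paths are placeholders.

```python
# Workflow steps in the order they run.
PIPELINE = ["codegen-device", "compile-device", "codegen-host", "compile-host", "route"]

# Hypothetical, conservative model of which steps each edited input invalidates.
# Device code changes the metadata, which feeds host codegen and routing,
# so everything is rerun.
AFFECTED = {
    "device-source": set(PIPELINE),
    "host-source": {"compile-host"},
    "topology": {"route"},  # per the note above: no recompilation needed
}


def steps_to_rerun(changed_inputs):
    """Return the steps to repeat, in pipeline order, for a set of edited inputs."""
    needed = set()
    for item in changed_inputs:
        needed |= AFFECTED[item]
    return [step for step in PIPELINE if step in needed]


def route_cmd(topology, routing_dir, metadata_files):
    """Argv for the `route` subcommand, in the documented argument order."""
    return ["python", "codegen/main.py", "route", topology, routing_dir, *metadata_files]


print(steps_to_rerun({"topology"}))  # ['route']
print(" ".join(route_cmd("topology.json", "smi-routes",
                         ["build/program_1.metadata", "build/program_2.metadata"])))
```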

Step 4: Executing the program

Once you have the compiled FPGA bitstream(s), the routing tables and the compiled host program, you have everything necessary to run the FPGA computation. Call the SmiInit_<program-name> function that corresponds to the program you want to execute, then start your own device kernels.
