
The programming interface

Tiziano De Matteis edited this page Jul 23, 2019 · 4 revisions

SMI exposes primitives for both point-to-point and collective communications. Communication in SMI codes is based on transient channels: when established, a streaming interface is exposed at the specified port at either end, allowing data to be streamed across the network using FIFO semantics, with an optional finite amount of buffer space at each endpoint. A streaming message consists of one or more elements with a specified data type. The communication endpoints are uniquely identified by their rank and port parameters. Ranks uniquely identify FPGA devices, and ports distinguish distinct communication endpoints within a rank. Once established, channels exist in code in the form of channel descriptors that the user can employ for performing communications. A single rank can exist per FPGA. Ranks involved in communication and the total number of ranks can then be dynamically altered without recompiling the program, by simply updating the routing configuration at each rank (see Development Workflow).

Point-to-point communications

The user can declare a send or receive channel by specifying the number of elements to send or receive, the data type of the elements, the source or destination rank, the port, and the communicator. Channels are implicitly closed when the specified number of elements has been sent or received.

SMI_Channel SMI_Open_send_channel(int count, SMI_Datatype type, int destination, int port, SMI_Comm comm);
SMI_Channel SMI_Open_recv_channel(int count, SMI_Datatype type, int source, int port, SMI_Comm comm);

Analogously to MPI, communicators allow communication to be further organized into logical groups.

Please Note: in the current implementation, only a global communicator is supported.

To send and receive data elements from within the pipelined HLS code, SMI provides the SMI_Push and SMI_Pop primitives:

void SMI_Push(SMI_Channel* chan, void* data);
void SMI_Pop(SMI_Channel* chan, void* data);

Both functions take the channel descriptor of a previously opened channel and a pointer either to the data to be sent or to the location in which to store the received data. These primitives are blocking: SMI_Push does not return until the data element has been safely sent to the network and the sender is free to modify it, and SMI_Pop returns only after the output buffer contains the newly received data element.

Additionally, the data type of the elements passed to SMI_Push and SMI_Pop must match the one declared when the channel was opened. With these primitives, communication is programmed in the same way that data is normally streamed between intra-FPGA modules.

In SMI, communication channels are characterized by an asynchronicity degree k > 0, meaning that the sender can run ahead of the receiver by up to k data elements. If the sender tries to push the (k+1)-th element before an element is popped by the receiver, the sender will stall. Because of this asynchronicity, an SMI send is non-local: it can be started whether or not the receiver is ready to receive, but its completion may depend on the receiver if the message size is larger than k.

The user can define the asynchronicity degree of a channel while opening it using the functions:

SMI_Channel SMI_Open_send_channel_ad(int count, SMI_Datatype type, int destination, int port, SMI_Comm comm, int asynch_degree);
SMI_Channel SMI_Open_recv_channel_ad(int count, SMI_Datatype type, int source, int port, SMI_Comm comm, int asynch_degree);
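
The stalling rule for an asynchronicity degree k can be illustrated with a small host-side model. This is only a sketch in plain C and not SMI code: the toy_channel type and the toy_* functions are hypothetical names introduced here, and a failed toy_try_push stands in for the point at which a real SMI_Push would block.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side model of one channel with asynchronicity degree k.
   Not part of SMI: it only mimics the stalling rule described above. */
#define TOY_MAX_K 64

typedef struct {
    int slots[TOY_MAX_K]; /* elements pushed but not yet popped */
    size_t k;             /* asynchronicity degree (capacity) */
    size_t head;          /* index of the oldest in-flight element */
    size_t in_flight;     /* number of in-flight elements */
} toy_channel;

/* Returns false if k is outside the range supported by this toy model. */
bool toy_open(toy_channel *ch, size_t k) {
    if (k == 0 || k > TOY_MAX_K) return false;
    ch->k = k;
    ch->head = 0;
    ch->in_flight = 0;
    return true;
}

/* Returns true if the element was accepted; false models a sender stall. */
bool toy_try_push(toy_channel *ch, int value) {
    if (ch->in_flight == ch->k) return false;
    ch->slots[(ch->head + ch->in_flight) % ch->k] = value;
    ch->in_flight++;
    return true;
}

/* Stores the oldest element in *out and returns true;
   false models a receiver waiting for data. */
bool toy_try_pop(toy_channel *ch, int *out) {
    if (ch->in_flight == 0) return false;
    *out = ch->slots[ch->head];
    ch->head = (ch->head + 1) % ch->k;
    ch->in_flight--;
    return true;
}
```

With k = 2, two pushes are accepted, a third is refused until the receiver pops one element, and elements come out in FIFO order.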

Collectives

Collective communication is key to developing distributed applications that can scale to a large number of nodes. In collective operations, all ranks in a given communicator must be involved in communicating data. SMI defines the Broadcast, Reduce, Scatter, and Gather collective operation primitives, analogous to their MPI counterparts. Each collective operation defined by SMI implies a distinct channel type, open channel operation, and communication primitive.

SMI allows multiple collective communications to execute in parallel, provided that they use separate ports.

Please Note: to prevent the compiler from allocating the channel descriptor in BRAM rather than in logic (which results in lower performance), the channel descriptor must be declared as register-resident data. For example: SMI_ScatterChannel __attribute__((register)) chan = SMI_Open_scatter_channel(...)

The communicator object is built on the host side of the application. On the device, the user can access its content by using the following functions:

int SMI_Comm_size(SMI_Comm comm);
int SMI_Comm_rank(SMI_Comm comm);

The former can be used to obtain the size of the communicator. The latter returns the rank of the caller.

For all the following collectives, the asynchronicity degree can be specified in a similar way to the case of point-to-point communications.

Broadcast

To perform a Broadcast, each rank opens a broadcast-specific channel (SMI_BChannel), indicating the count and the data type of the message elements, the rank of the root, the port, and the communicator:

SMI_BChannel SMI_Open_broadcast_channel(
    int count, SMI_Datatype type, int port, int root, SMI_Comm comm);

To participate in the broadcast operation, each rank will use the associated primitive:

void SMI_Broadcast(SMI_BChannel* chan, void* data);

If the caller is the root, it will push the data towards the other ranks. Otherwise, the caller will pop data elements from the network.
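
The role-dependent behavior of the single broadcast call can be sketched with a host-side model. This is an illustration only, not SMI code: toy_broadcast and the wire variable, which stands in for the network, are hypothetical. In this sequential model the root must be called first; in SMI the non-root ranks would simply wait for the data.

```c
/* Hypothetical model, not SMI code: *wire stands in for the network. */
void toy_broadcast(int my_rank, int root, int *data, int *wire) {
    if (my_rank == root) {
        *wire = *data; /* root: push the element towards the other ranks */
    } else {
        *data = *wire; /* non-root: pop the element into the local variable */
    }
}
```

Every rank issues the same call with a pointer to its local variable, and after the call all ranks hold the value provided by the root.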

Reduce

The reduce channel is opened with the respective primitive:

SMI_RChannel SMI_Open_reduce_channel(int count, SMI_Datatype data_type, SMI_Op op, int port, int root, SMI_Comm comm);

where count is the length of the message, data_type is the data type of its elements, op is the reduction operation to apply, port is the port number, root is the root rank, and comm is the communicator. Currently, SMI supports SMI_Add, SMI_Max, and SMI_Min as reduce operations.

Each rank then uses the associated communication primitive:

void SMI_Reduce(SMI_RChannel *chan, volatile void* data_snd, volatile void* data_rcv);

in which data_snd is the data that must be reduced and data_rcv is the result of the reduction (valid only on the root rank).
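
As a reference for the expected result, the sketch below computes on the host what the root would receive, assuming the reduction is applied element by element across ranks as in MPI. The toy_op enum and the toy_* functions are hypothetical stand-ins introduced here. They are not the SMI_Op values or any SMI function.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the add, max and min reduction operators. */
typedef enum { TOY_ADD, TOY_MAX, TOY_MIN } toy_op;

/* Combines two values with the given operator. */
static int toy_combine(toy_op op, int a, int b) {
    switch (op) {
        case TOY_ADD: return a + b;
        case TOY_MAX: return a > b ? a : b;
        case TOY_MIN: return a < b ? a : b;
    }
    return a; /* not reached for valid operators */
}

/* contrib[r * count + i] is element i contributed by rank r (num_ranks >= 1).
   result[i] receives the reduction of element i over all ranks,
   i.e. the value assumed to be valid on the root only. */
void toy_reduce(toy_op op, const int *contrib, size_t num_ranks,
                size_t count, int *result) {
    for (size_t i = 0; i < count; i++) {
        int acc = contrib[i]; /* start from rank 0's element */
        for (size_t r = 1; r < num_ranks; r++) {
            acc = toy_combine(op, acc, contrib[r * count + i]);
        }
        result[i] = acc;
    }
}
```

For three ranks contributing {1, 5}, {4, 2} and {3, 9}, an add reduction yields {8, 16} under this assumption.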

Scatter

For opening a Scatter channel, the user can invoke:

SMI_ScatterChannel SMI_Open_scatter_channel(int send_count, int recv_count, SMI_Datatype data_type, int port, int root, SMI_Comm comm);

in which send_count indicates the number of elements sent by the root to each rank, and recv_count represents the number of data elements received by each rank.

Communication is performed by means of the primitive:

void SMI_Scatter(SMI_ScatterChannel *chan, void* data_snd, void* data_rcv);

where data_snd is the pointer to the data elements that must be sent (root side) and data_rcv is the pointer to the memory area in which the received data element is stored.

The user must take the asymmetric nature of scatter into account when writing a program: root and non-root ranks send and receive a different number of data elements. For example:

    // ...
    SMI_ScatterChannel __attribute__((register)) chan = SMI_Open_scatter_channel(N, N, SMI_INT, 0, root, comm);
    const int my_rank = SMI_Comm_rank(comm);
    const int num_ranks = SMI_Comm_size(comm);
    // the root handles N elements per rank, every other rank only its own N elements
    const int loop_bound = (my_rank == root) ? N * num_ranks : N;
    for (int i = 0; i < loop_bound; i++) // perform pipelined communication
    {
        // <root prepares data to send>
        SMI_Scatter(&chan, &to_send, &to_rcv);
        // ...
    }
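
The loop bounds used above, and the data each rank ends up with, can be checked with a small host-side model. This sketch assumes MPI-like placement, in which rank r receives the r-th block of N elements from the root's buffer. The toy_* functions are hypothetical and not part of SMI.

```c
#include <stddef.h>

/* Number of scatter calls a rank issues: the root handles every element
   it sends (n per rank), a non-root rank only the n elements it receives. */
size_t toy_scatter_loop_bound(int my_rank, int root, size_t n, size_t num_ranks) {
    return (my_rank == root) ? n * num_ranks : n;
}

/* Assumed MPI-like placement: rank my_rank receives
   send_buf[my_rank * n] .. send_buf[my_rank * n + n - 1]. */
void toy_scatter_recv(const int *send_buf, size_t n, size_t my_rank, int *recv_buf) {
    for (size_t i = 0; i < n; i++) {
        recv_buf[i] = send_buf[my_rank * n + i];
    }
}
```

With N = 2 and three ranks, the root iterates 6 times and every other rank 2 times. Under the placement assumption, rank 2 receives the last two elements of the root's buffer.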

Gather

A Gather channel is opened by using:

SMI_GatherChannel SMI_Open_gather_channel(int send_count, int recv_count, SMI_Datatype data_type, int port, int root, SMI_Comm comm);

while the communication is performed with the primitive:

void SMI_Gather(SMI_GatherChannel *chan, void* send_data, void* rcv_data);

As with Scatter, the user program must use different loop bounds for communication in root and non-root ranks.
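
A host-side sketch of this asymmetry follows. It assumes that the bounds mirror the Scatter example (the root iterates over N elements per rank, every other rank over its own N elements) and that the root stores contributions in rank order as in MPI. Neither assumption is stated on this page, and the toy_* functions are hypothetical, not part of SMI.

```c
#include <stddef.h>

/* Assumed loop bound, mirroring the Scatter example: the root receives
   n elements from each rank, a non-root rank only sends its own n. */
size_t toy_gather_loop_bound(int my_rank, int root, size_t n, size_t num_ranks) {
    return (my_rank == root) ? n * num_ranks : n;
}

/* Root-side view under the assumed MPI-like ordering: the n elements
   contributed by src_rank land in block src_rank of the root's buffer. */
void toy_gather_store(int *root_buf, size_t n, size_t src_rank, const int *contribution) {
    for (size_t i = 0; i < n; i++) {
        root_buf[src_rank * n + i] = contribution[i];
    }
}
```

In this model the content of the root's buffer does not depend on the order in which the contributions arrive, because each block is addressed by the rank of its sender.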
