A2.3 log

Status: in progress

Changes related to reduce functions

Since OpenCL 1.1 has no equivalent of warp shuffle functions, the OpenCL kernels implement reductions through shared memory (local memory, in OpenCL terms).

The reductions work on values of type float3.

One of the most frequently called reduction functions is reduce_force_j_generic. Here a block of threads must perform a reduction for each of the 8 lines of an 8x8 matrix with elements of type float3. The initial implementation assigned all the reduction work for a given line j to the single thread for which get_local_id(0) returns 0.

This has been changed so that the reduction work for a line is split three ways, between the threads for which get_local_id(0) returns 0, 1 and 2. Each of these threads reduces one of the 3 components of the float3 buffer.
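The difference between the two schemes can be sketched in plain C that emulates the work-items of one 8x8 block on the host. This is only an illustration under stated assumptions, not the actual kernel code: the names CL_SIZE, f_buf, reduce_line_single, reduce_line_split, fill_buffer, run_block and schemes_agree are hypothetical, each float3 is modelled as a float[3], and the work-items run sequentially, so the barrier(CLK_LOCAL_MEM_FENCE) that a real kernel needs between writing and reading the local buffer is not shown.

```c
#define CL_SIZE 8 /* assumed block edge: 8x8 work-items */

/* Stand-in for the local buffer: one float3 (here float[3]) per work-item. */
static float f_buf[CL_SIZE * CL_SIZE][3];

/* Initial scheme: the work-item with local id 0 reduces all 3 components
 * of line tidxj; the other 7 work-items of that line do nothing. */
static void reduce_line_single(int tidxi, int tidxj, float *fout)
{
    if (tidxi == 0) {
        for (int c = 0; c < 3; c++) {
            float sum = 0.0f;
            for (int i = 0; i < CL_SIZE; i++)
                sum += f_buf[tidxj * CL_SIZE + i][c];
            fout[3 * tidxj + c] += sum;
        }
    }
}

/* Revised scheme: work-items 0, 1 and 2 each reduce one component
 * (x, y or z) of line tidxj. */
static void reduce_line_split(int tidxi, int tidxj, float *fout)
{
    if (tidxi < 3) {
        float sum = 0.0f;
        for (int i = 0; i < CL_SIZE; i++)
            sum += f_buf[tidxj * CL_SIZE + i][tidxi];
        fout[3 * tidxj + tidxi] += sum;
    }
}

/* Fill the buffer with small integers so that float sums are exact. */
static void fill_buffer(void)
{
    for (int j = 0; j < CL_SIZE; j++)
        for (int i = 0; i < CL_SIZE; i++)
            for (int c = 0; c < 3; c++)
                f_buf[j * CL_SIZE + i][c] = (float)((c + 1) * (j + 1) + i);
}

typedef void (*reduce_fn)(int tidxi, int tidxj, float *fout);

/* Emulate one block: call the reduction once per (tidxi, tidxj) pair. */
static void run_block(reduce_fn reduce, float *fout)
{
    for (int k = 0; k < 3 * CL_SIZE; k++)
        fout[k] = 0.0f;
    for (int j = 0; j < CL_SIZE; j++)
        for (int i = 0; i < CL_SIZE; i++)
            reduce(i, j, fout);
}

/* Returns 1 if both schemes produce identical per-line sums. */
static int schemes_agree(void)
{
    float a[3 * CL_SIZE], b[3 * CL_SIZE];
    fill_buffer();
    run_block(reduce_line_single, a);
    run_block(reduce_line_split, b);
    for (int k = 0; k < 3 * CL_SIZE; k++)
        if (a[k] != b[k])
            return 0;
    return 1;
}
```

In this sketch both schemes add the values in the same order, so the results match exactly. The split version lets three work-items per line run their 8-element loops side by side, where the initial version had one work-item perform all 24 additions while the others waited.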

Similar changes have been applied to all the other reduction functions.
