AncaSC edited this page Nov 18, 2014 · 1 revision

Status: in progress

Changes related to reduce functions
Since OpenCL 1.1 has no equivalent of the CUDA warp shuffle functions, the OpenCL kernels implement reductions through local memory (shared memory in CUDA terminology).

The reductions work on values of type float3.

One of the most frequently called reduction functions is reduce_force_j_generic. Here a block of threads must perform a reduction for each of the 8 lines of an 8x8 matrix whose elements are of type float3. The initial implementation assigned all the reduction work for a given line j to the single thread for which get_local_id(0) returns 0.

This has been changed so that the reduction work for a line is split three ways, among the threads for which get_local_id(0) returns 0, 1 and 2. Each of these threads performs the reduction for one of the 3 components of the float3 buffer.
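
The two schemes can be compared with a small host-side emulation in plain C. This is only a sketch under stated assumptions, not the actual kernel code: the buffer layout (three planes of 8x8 floats, one per component), the output layout, and the names `reduce_line_single`, `reduce_line_split` and `run_block` are hypothetical. The arguments `tidxi` and `tidxj` stand in for `get_local_id(0)` and `get_local_id(1)`, and the loop in `run_block` stands in for the work-items of one block running after a local-memory barrier.

```c
#include <assert.h>

#define TILE 8 /* 8x8 block of work-items, as described above */

/* Assumed layout: buf[c * TILE * TILE + j * TILE + i] holds component c
 * (0 = x, 1 = y, 2 = z) of the float3 written by work-item (i, j).
 * Assumed output: out[3 * j + c] receives the sum of component c over line j. */

/* Initial scheme: the work-item with tidxi == 0 sums all three components. */
void reduce_line_single(const float *buf, int tidxi, int tidxj, float *out)
{
    assert(tidxi >= 0 && tidxi < TILE && tidxj >= 0 && tidxj < TILE);
    if (tidxi != 0) {
        return; /* every other work-item in the line stays idle */
    }
    for (int c = 0; c < 3; c++) {
        float sum = 0.0f;
        for (int i = 0; i < TILE; i++) {
            sum += buf[c * TILE * TILE + tidxj * TILE + i];
        }
        out[3 * tidxj + c] = sum;
    }
}

/* Revised scheme: work-items 0, 1 and 2 each sum one component. */
void reduce_line_split(const float *buf, int tidxi, int tidxj, float *out)
{
    assert(tidxi >= 0 && tidxi < TILE && tidxj >= 0 && tidxj < TILE);
    if (tidxi >= 3) {
        return; /* only three work-items per line take part */
    }
    float sum = 0.0f;
    for (int i = 0; i < TILE; i++) {
        sum += buf[tidxi * TILE * TILE + tidxj * TILE + i];
    }
    out[3 * tidxj + tidxi] = sum;
}

/* Emulates one block: calls the chosen scheme once per work-item (i, j). */
void run_block(const float *buf, float *out, int use_split)
{
    for (int j = 0; j < TILE; j++) {
        for (int i = 0; i < TILE; i++) {
            if (use_split) {
                reduce_line_split(buf, i, j, out);
            } else {
                reduce_line_single(buf, i, j, out);
            }
        }
    }
}
```

Both schemes perform the same additions in the same order for each component, so they produce identical sums. The difference is that in the split version the longest serial chain per work-item is 8 additions rather than 24.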

Similar changes have been applied to all the other reduction functions.
