Vortex Microarchitecture

Vortex GPGPU Execution Model

Vortex uses the SIMT (Single Instruction, Multiple Threads) execution model with a single warp issued per cycle.

Threads
- Smallest unit of computation
- Each thread has its own register file (32 int + 32 fp registers)
- Threads execute in parallel
Warps
- A logical clster of threads
- Each thread in a warp execute the same instruction
  - The PC is shared; maintain thread mask for Writeback
- Warp's execution is time-multiplexed at log steps
  - Ex. warp 0 executes at cycle 0, warp 1 executes at cycle 1

Vortex RISC-V ISA Extension

Thread Mask Control
- Control the number of warps to activate during execution
- TMC count: activate count threads
Warp Scheduling
- Control the number of warps to activate during execution
- WSPAWN count, addr: activate count warps and jump to addr location
Control-Flow Divergence
- Control threads to activate when a branch diverges
  - SPLIT predicate: apply 'taken' predicate thread mask adn save 'not-taken' into IPDOM stack
  - JOIN: restore 'not-taken' thread mask
Warp Synchronization
- BAR id, count: stall warps entering barrier id until count is reached

Vortex Pipeline/Datapath

Vortex has a 5-stage pipeline: FI | ID | Issue | EX | WB.

Fetch
- Warp Scheduler
  - Track stalled & active warps, resolve branches and barriers, maintain split/join IPDOM stack
- Instruction Cache
  - Retrieve instruction from cache, issue I-cache requests/responses
Decode
- Decode fetched instructions, notify warp scheduler when the following instructions are decoded:
  - Branch, tmc, split/join, wspawn
- Precompute used_regs mask (needed for Issue stage)
Issue
- Scheduling
  - In-order issue (operands/execute unit ready), out-of-order commit
- IBuffer
  - Store fetched instructions, separate queues per-warp, selects next warp through round-robin scheduling
- Scoreboard
  - Track in-use registers
- GPRs (General-Purpose Registers) stage
  - Fetch issued instruction operands and send operands to execute unit
Execute
- ALU Unit
  - Single-cycle operations (+,-,>>,<<,&,|,^), Branch instructions (Share ALU resources)
- MULDIV Unit
  - Multiplier - done in 2 cycles
  - Divider - division and remainder, done in 32 cycles
    - Implements serial alogrithm (Stalls the pipeline)
- FPU Unit
  - Multi-cycle operations, uses FPnew Library on ASIC, uses hard DSPs on FPGA
- CSR Unit
  - Store constant status registers - device caps, FPU status flags, performance counters
  - Handle external CSR requests (requests from host CPU)
- LSU Unit
  - Handle load/store operations, issue D-cache requests, handle D-cache responses
  - Commit load responses - saves storage, Scoreboard tracks completion
- GPGPU Unit
  - Handle GPGPU instructions
    - TMC, WSPAWN, SPLIT, BAR
  - JOIN is handled by Warp Scheduler (upon SPLIT response)
Commit
- Commit
  - Update CSR flags, update performance counters
- Writeback
  - Write result back to GPRs, notify Scoreboard (release in-use register), select candidate instruction (ALU unit has highest priority)
Clustering
- Group mulitple cores into clusters (optionally share L2 cache)
- Group multiple clusters (optionally share L3 cache)
- Configurable at build time
- Default configuration:
  - #Clusters = 1
  - #Cores = 4
  - #Warps = 4
  - #Threads = 4
FPGA AFU Interface
- Manage CPU-GPU comunication
  - Query devices caps, load kernel instructions and resource buffers, start kernel execution, read destination buffers
- Local Memory - GPU access to local DRAM
- Reserved I/O addresses - redirect to host CPU, console output

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

microarchitecture.md

microarchitecture.md

Vortex Microarchitecture

Vortex GPGPU Execution Model

Vortex RISC-V ISA Extension

Vortex Pipeline/Datapath

Files

microarchitecture.md

Latest commit

History

microarchitecture.md

File metadata and controls

Vortex Microarchitecture

Vortex GPGPU Execution Model

Vortex RISC-V ISA Extension

Vortex Pipeline/Datapath