GEM5 O3 CPU Backend

This is my note on reading GEM5’s O3 CPU backend. I could not find a good document online, and the code is somewhat entangled and tricky to understand, so here I extract the key function chains to show how an instruction is handled by the backend.

Hopefully this helps more people. I assume you are already familiar with GEM5.

Compute Instructions

Compute instructions are simpler, as they do not access memory and do not interact with the LSQ. The flow is fairly straightforward. I first show the calling chain (only the important functions), and then describe what each step does.

Rename::tick()->Rename::renameInsts()
IEW::tick()->IEW::dispatchInsts()
IEW::tick()->InstructionQueue::scheduleReadyInsts()
IEW::tick()->IEW::executeInsts()
IEW::tick()->IEW::writebackInsts()
Commit::tick()->Commit::commitInsts()->Commit::commitHead()
  • Rename (Rename::renameInsts()) As suggested by the name, registers are renamed and the instruction is pushed to the IEW stage. It also checks that the IQ/LSQ can hold the new instruction.
  • Dispatch (IEW::dispatchInsts()) This function inserts the renamed instruction into the IQ and LSQ.
  • Schedule (InstructionQueue::scheduleReadyInsts()) The IQ keeps ready instructions (all operands ready) in a ready list and schedules them to available FUs. The FU latency is charged here, and an instruction is sent to execution once its FU is done.
  • Execute (IEW::executeInsts()) Here we invoke the execute() function of the compute instruction and send it to commit. Notice that execute() writes the result to the destination register.
  • Writeback (IEW::writebackInsts()) Here we invoke InstructionQueue::wakeDependents(), and dependent instructions will be added to the ready list for scheduling.
  • Commit (Commit::commitInsts()) Once the instruction reaches the head of the ROB, it is committed and released from the ROB.
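
To make the schedule/writeback interplay above concrete, here is a minimal toy sketch in C++. It is not gem5 code: ToyInst, ToyIQ, and the register-indexed waiter map are hypothetical names and simplifications I made up for illustration, and FU latency is ignored entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Toy model only: ToyInst and ToyIQ are hypothetical names, not gem5 classes.
struct ToyInst {
    uint64_t seqNum;           // program order, smaller is older
    std::vector<int> srcRegs;  // renamed source registers
    int destReg;               // renamed destination register
};

class ToyIQ {
  public:
    // Dispatch: the instruction is ready only if every source register is.
    void insert(const ToyInst &inst, const std::vector<bool> &regReady) {
        int pending = 0;
        for (int r : inst.srcRegs) {
            if (!regReady[r]) {
                ++pending;
                waiters[r].push_back(inst.seqNum);
            }
        }
        if (pending == 0)
            readyList.insert(inst.seqNum);
        else
            pendingSrcs[inst.seqNum] = pending;
    }

    // Schedule: issue up to `width` ready instructions, oldest first.
    std::vector<uint64_t> scheduleReadyInsts(std::size_t width) {
        std::vector<uint64_t> issued;
        while (!readyList.empty() && issued.size() < width) {
            issued.push_back(*readyList.begin());
            readyList.erase(readyList.begin());
        }
        return issued;
    }

    // Writeback: `reg` now holds its value, so wake the waiting consumers.
    void wakeDependents(int reg) {
        auto it = waiters.find(reg);
        if (it == waiters.end())
            return;
        for (uint64_t sn : it->second) {
            auto p = pendingSrcs.find(sn);
            assert(p != pendingSrcs.end() && p->second > 0);
            if (--p->second == 0) {
                pendingSrcs.erase(p);
                readyList.insert(sn);
            }
        }
        waiters.erase(it);
    }

  private:
    std::map<int, std::vector<uint64_t>> waiters;  // reg -> waiting seqNums
    std::map<uint64_t, int> pendingSrcs;           // seqNum -> unready sources
    std::set<uint64_t> readyList;                  // ordered by age
};
```

In this sketch the caller would invoke wakeDependents(inst.destReg) when a producer writes back, which moves its consumers onto the ready list so that the next scheduleReadyInsts() call can issue them.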

Load Instruction

Load instructions share the same path as compute instructions until execution.

IEW::tick()->IEW::executeInsts()
  ->LSQUnit::executeLoad()
    ->StaticInst::initiateAcc()
      ->LSQ::pushRequest()
        ->LSQUnit::read()
          ->LSQRequest::buildPackets()
          ->LSQRequest::sendPacketToCache()
    ->LSQUnit::checkViolation()
DcachePort::recvTimingResp()->LSQRequest::recvTimingResp()
  ->LSQUnit::completeDataAccess()
    ->LSQUnit::writeback()
      ->StaticInst::completeAcc()
      ->IEW::instToCommit()
IEW::tick()->IEW::writebackInsts()
  • LSQUnit::executeLoad() will initiate the access by invoking the instruction’s initiateAcc(). Through the execution context interface, initiateAcc() will call initiateMemRead() and eventually be directed to LSQ::pushRequest().
  • LSQ::pushRequest() allocates an LSQRequest to track the state of the access and starts address translation. Once the translation finishes, it records the virtual address and invokes LSQUnit::read().
  • LSQUnit::read() checks whether the load aliases with any previous store.
    • If the store can forward its data, it schedules a WritebackEvent for the next cycle.
    • If the load aliases but the store cannot forward, it calls InstructionQueue::rescheduleMemInst() and LSQRequest::discard().
    • Otherwise, it sends packets to the cache.
  • LSQUnit::writeback() invokes StaticInst::completeAcc(), which eventually writes the loaded value to the destination register. The instruction is then pushed to the commit queue, so that IEW::writebackInsts() will mark it done and wake up its dependents. From here on, it shares the same path as compute instructions.
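
The three-way decision in LSQUnit::read() can be sketched as a small standalone function. This is a toy under stated assumptions, not gem5's logic: ToyStore, LoadAction, and classifyLoad are hypothetical names, and I assume that a store can forward exactly when it fully covers the bytes of the load, ignoring the other conditions the real code checks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration; not part of gem5.
enum class LoadAction { Forward, Reschedule, SendToCache };

struct ToyStore {
    uint64_t seqNum;  // program order, smaller is older
    uint64_t addr;    // start address of the stored bytes
    unsigned size;    // number of bytes stored
};

// `olderStores` holds the stores older than the load, oldest first.
// We scan from youngest to oldest so the most recent aliasing store wins.
LoadAction classifyLoad(const std::vector<ToyStore> &olderStores,
                        uint64_t addr, unsigned size) {
    assert(size > 0);
    const uint64_t loadBegin = addr;
    const uint64_t loadEnd = addr + size;
    for (auto it = olderStores.rbegin(); it != olderStores.rend(); ++it) {
        const uint64_t storeBegin = it->addr;
        const uint64_t storeEnd = it->addr + it->size;
        const bool overlap = loadBegin < storeEnd && storeBegin < loadEnd;
        if (!overlap)
            continue;
        // Assumption: full coverage means the store can forward its data.
        const bool covered = storeBegin <= loadBegin && loadEnd <= storeEnd;
        return covered ? LoadAction::Forward : LoadAction::Reschedule;
    }
    return LoadAction::SendToCache;  // no aliasing store found
}
```

For example, with an 8-byte store at 0x100 in the queue, a 4-byte load at 0x100 would be classified as Forward, while a load that only partially overlaps a store would be classified as Reschedule.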

Store Instruction

Store instructions are similar to load instructions, but they only write back to the cache after they are committed.

IEW::tick()->IEW::executeInsts()
  ->LSQUnit::executeStore()
    ->StaticInst::initiateAcc()
      ->LSQ::pushRequest()
        ->LSQUnit::write()
    ->LSQUnit::checkViolation()
Commit::tick()->Commit::commitInsts()->Commit::commitHead()
IEW::tick()->LSQUnit::commitStores()
IEW::tick()->LSQUnit::writebackStores()
  ->LSQRequest::buildPackets()
  ->LSQRequest::sendPacketToCache()
  ->LSQUnit::storePostSend()
DcachePort::recvTimingResp()->LSQRequest::recvTimingResp()
  ->LSQUnit::completeDataAccess()
    ->LSQUnit::completeStore()
  • Unlike LSQUnit::read(), LSQUnit::write() only copies the store data and does not send a packet to the cache, as the store is not yet committed.
  • After the store is committed, LSQUnit::commitStores() will mark the SQ entry as canWB, so that later LSQUnit::writebackStores() will send the store request to cache.
  • Finally, when the response comes back, LSQUnit::completeStore() will release the SQ entries.
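
The store life cycle described above (buffer at execute, mark canWB after commit, send to cache, release on response) can be sketched with a toy store queue. ToySQEntry and ToyStoreQueue are hypothetical names, the `ports` limit is my own simplification, and the sketch assumes stores are written back strictly in program order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical toy types; they only mimic the flow described in the text.
struct ToySQEntry {
    uint64_t seqNum;
    uint64_t addr;
    uint64_t data;
    bool canWB;      // set once the store has committed
    bool sent;       // request has been sent to the cache
    bool completed;  // cache response received
};

class ToyStoreQueue {
  public:
    // Execute: buffer address and data only; nothing goes to the cache yet.
    void write(uint64_t seqNum, uint64_t addr, uint64_t data) {
        assert(entries.empty() || entries.back().seqNum < seqNum);
        entries.push_back(ToySQEntry{seqNum, addr, data, false, false, false});
    }

    // After commit: every store up to `committedSeqNum` may write back.
    void commitStores(uint64_t committedSeqNum) {
        for (ToySQEntry &e : entries)
            if (e.seqNum <= committedSeqNum)
                e.canWB = true;
    }

    // Send committed stores in program order, at most `ports` per call.
    std::vector<uint64_t> writebackStores(std::size_t ports) {
        std::vector<uint64_t> sentNow;
        for (ToySQEntry &e : entries) {
            if (sentNow.size() == ports || !e.canWB)
                break;
            if (e.sent)
                continue;
            e.sent = true;
            sentNow.push_back(e.seqNum);
        }
        return sentNow;
    }

    // Cache response: mark the store done and free finished head entries.
    void completeStore(uint64_t seqNum) {
        auto it = std::find_if(entries.begin(), entries.end(),
            [seqNum](const ToySQEntry &e) { return e.seqNum == seqNum; });
        assert(it != entries.end() && it->sent);
        it->completed = true;
        while (!entries.empty() && entries.front().completed)
            entries.pop_front();
    }

    std::size_t size() const { return entries.size(); }

  private:
    std::deque<ToySQEntry> entries;  // program order, oldest at the front
};
```

The point of the sketch is that writebackStores() returns nothing until commitStores() has marked an entry, which mirrors why an uncommitted store never reaches the cache.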

Atomic Instruction

Atomic instructions are similar to store instructions, but they are executed non-speculatively.

Rename::tick()->Rename::renameInsts()
IEW::tick()->IEW::dispatchInsts()
  ->LSQUnit::insertStore()
  ->InstructionQueue::insertNonSpec()
    ->MemDepUnit::insertNonSpec()
Commit::tick()->Commit::commitInsts()->Commit::commitHead()
IEW::tick()->InstructionQueue::scheduleNonSpec()
  ->MemDepUnit::nonSpecInstReady()
    ->MemDepUnit::moveToReady()
      ->InstructionQueue::addReadyMemInst()
IEW::tick()->InstructionQueue::scheduleReadyInsts()
IEW::tick()->IEW::executeInsts()
  ->LSQUnit::executeStore()
    ->StaticInst::initiateAcc()
      ->LSQ::pushRequest()
        ->LSQUnit::write()
    ->LSQUnit::SQEntry::canWB() = true
    ->LSQUnit::checkViolation()
IEW::tick()->LSQUnit::writebackStores()
  ->LSQRequest::buildPackets()
  ->LSQRequest::sendPacketToCache()
  ->LSQUnit::storePostSend()
DcachePort::recvTimingResp()->LSQRequest::recvTimingResp()
  ->LSQUnit::completeDataAccess()
    ->LSQUnit::writeback()
      ->IEW::instToCommit()
    ->LSQUnit::completeStore()
Commit::tick()->Commit::commitInsts()->Commit::commitHead()
  • When dispatching, the atomic instruction is inserted into the IQ and marked non-speculative. Compared to normal InstructionQueue::insert(), InstructionQueue::insertNonSpec() won’t call addIfReady(), thus not scheduling the instruction.
  • When the atomic instruction reaches the ROB head, the commit stage checks if the instruction has been executed. If not, it sets the nonSpecSeqNum and clears its canCommit flag. Now the IEW stage knows that it can schedule the instruction.
  • When executing the atomic instruction, the SQ entry is immediately marked canWB(), as it is already at the head of the ROB.
  • When the response comes back, the final value is written back to the register, and now the instruction can finally commit.
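
The handshake between commit and the IQ for non-speculative instructions can be sketched as follows. This is a toy with hypothetical names (ToyRobHead, ToyNonSpecIQ, tryCommitHead); it collapses the nonSpecSeqNum signal between stages into a direct function call and assumes operands are always ready.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical toy types; the real signalling goes through stage buffers.
struct ToyRobHead {
    uint64_t seqNum;
    bool isNonSpec;  // e.g. an atomic instruction
    bool executed;
};

class ToyNonSpecIQ {
  public:
    // Normal insert: assume operands are ready, so it can be scheduled.
    void insert(uint64_t seqNum) { readyList.insert(seqNum); }

    // Non-speculative insert: parked, deliberately NOT added to the ready list.
    void insertNonSpec(uint64_t seqNum) {
        assert(readyList.count(seqNum) == 0);
        nonSpecInsts.insert(seqNum);
    }

    // Called once commit reports that `seqNum` is at the ROB head.
    void scheduleNonSpec(uint64_t seqNum) {
        if (nonSpecInsts.erase(seqNum) > 0)
            readyList.insert(seqNum);
    }

    bool isReady(uint64_t seqNum) const { return readyList.count(seqNum) > 0; }

  private:
    std::set<uint64_t> nonSpecInsts;  // waiting for the ROB-head signal
    std::set<uint64_t> readyList;     // eligible for scheduling
};

// Commit side: returns true if the head can commit now. An unexecuted
// non-speculative head instead tells the IQ that it may be scheduled.
bool tryCommitHead(const ToyRobHead &head, ToyNonSpecIQ &iq) {
    if (head.executed)
        return true;
    if (head.isNonSpec)
        iq.scheduleNonSpec(head.seqNum);
    return false;
}
```

The first tryCommitHead() call on an unexecuted atomic fails but unblocks it in the IQ; only after it has executed does a later call succeed, which matches the two Commit::commitHead() visits in the chain above.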

Branch Misspeculation

Branch misspeculation is handled in IEW::executeInsts(). It notifies the commit stage to squash all instructions in the ROB that are younger than the misspeculated branch.

IEW::tick()->IEW::executeInsts()->IEW::squashDueToBranch()
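
As a rough illustration of what squashing means for the ROB, here is a toy sketch. It assumes the ROB is just a queue of sequence numbers in program order and that everything younger than the branch is removed at once; squashYoungerThan is a hypothetical helper, not a gem5 function, and the real squash is spread over several cycles and stages.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy ROB: sequence numbers in program order, oldest at the front.
// Removes every entry younger than `squashSeqNum` (the branch itself stays)
// and returns how many entries were squashed.
std::size_t squashYoungerThan(std::deque<uint64_t> &rob,
                              uint64_t squashSeqNum) {
    assert(std::is_sorted(rob.begin(), rob.end()));
    std::size_t squashed = 0;
    while (!rob.empty() && rob.back() > squashSeqNum) {
        rob.pop_back();
        ++squashed;
    }
    return squashed;
}
```

The branch itself is kept because it is on the correct path; only the wrongly fetched instructions behind it are discarded.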

Memory Order Misspeculation

The InstructionQueue has a MemDepUnit to track memory order dependence. The IQ will not schedule a memory instruction until the MemDepUnit says it has no remaining dependence.

In LSQUnit::read(), the LSQ searches for a possibly aliasing store and forwards from it if possible. If the load aliases with a store that cannot forward, the load is blocked and rescheduled when the blocking store completes, by notifying the MemDepUnit.

Both LSQUnit::executeLoad() and LSQUnit::executeStore() call LSQUnit::checkViolation() to search the LQ for possible misspeculation. If a violation is found, it sets LSQUnit::memDepViolator, and later IEW::executeInsts() starts the squash.

IEW::tick()->IEW::executeInsts()
  ->LSQUnit::executeLoad()
    ->StaticInst::initiateAcc()
    ->LSQUnit::checkViolation()
  ->IEW::squashDueToMemOrder()
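
The store-side violation check can be sketched like this. It is a simplified toy: ToyLoad and findViolatingLoad are hypothetical names, addresses are compared byte-exactly rather than at whatever granularity the real check uses, the load-side case is not modelled, and 0 is used as a "no violation" marker under the assumption that sequence numbers start at 1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy type; not a gem5 class.
struct ToyLoad {
    uint64_t seqNum;  // program order, smaller is older
    uint64_t addr;
    unsigned size;
    bool executed;    // the load has already obtained its data
};

// `lq` is the load queue in program order, oldest first. When a store
// executes, any YOUNGER load that already executed and overlaps the store's
// bytes read stale data. Returns the oldest such load's seqNum, or 0 if none
// (assumption: real sequence numbers start at 1).
uint64_t findViolatingLoad(const std::vector<ToyLoad> &lq,
                           uint64_t storeSeqNum, uint64_t storeAddr,
                           unsigned storeSize) {
    assert(storeSize > 0);
    const uint64_t storeEnd = storeAddr + storeSize;
    for (const ToyLoad &ld : lq) {
        if (ld.seqNum < storeSeqNum || !ld.executed)
            continue;  // older loads and unexecuted loads are safe
        const uint64_t loadEnd = ld.addr + ld.size;
        if (ld.addr < storeEnd && storeAddr < loadEnd)
            return ld.seqNum;  // this load would become the violator
    }
    return 0;
}
```

In this sketch, the returned load plays the role of memDepViolator: it and everything younger would be squashed and re-executed.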