GEM5 O3 CPU Backend
This is my note on reading GEM5’s O3 CPU backend. I could not find a good document online, and the code is somewhat entangled and tricky to understand, so here I extract the key function chains to show how an instruction is handled by the backend.
Hopefully this could help more people. I assume you are already familiar with GEM5.
Compute Instructions
Compute instructions are simpler, as they do not access memory and do not interact with the LSQ. The flow is fairly straightforward. I first show the call chain (important functions only) and then describe what each step does.
Rename::tick()->Rename::renameInsts()
IEW::tick()->IEW::dispatchInsts()
IEW::tick()->InstructionQueue::scheduleReadyInsts()
IEW::tick()->IEW::executeInsts()
IEW::tick()->IEW::writebackInsts()
Commit::tick()->Commit::commitInsts()->Commit::commitHead()
- Rename (Rename::renameInsts()): As suggested by the name, registers are renamed and the instruction is pushed to the IEW stage. It also checks that the IQ/LSQ can hold the new instruction.
- Dispatch (IEW::dispatchInsts()): This function inserts the renamed instruction into the IQ and LSQ.
- Schedule (InstructionQueue::scheduleReadyInsts()): The IQ keeps ready instructions (all operands ready) in a ready list and schedules them to available FUs. The FU latency is charged here, and instructions are sent to execution when the FU is done.
- Execute (IEW::executeInsts()): Here we invoke the execute() function of the compute instruction and send it to commit. Notice that execute() will write the result to the destination register.
- Writeback (IEW::writebackInsts()): Here we invoke InstructionQueue::wakeDependents(), and dependent instructions are added to the ready list for scheduling.
- Commit (Commit::commitInsts()): Once the instruction reaches the head of the ROB, it is committed and released from the ROB.
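To make the schedule/writeback interaction more concrete, here is a minimal sketch of a ready list with a wakeup step. This is a toy model, not gem5 code: ToyInst, ToyIQ, and all of their members are names I made up for illustration, FU latency is ignored, and I assume each instruction has a single destination register.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Toy instruction; the fields are hypothetical and far simpler than gem5's DynInst.
struct ToyInst {
    int seqNum;             // program-order sequence number
    std::vector<int> srcs;  // source physical registers
    int dest;               // destination physical register
};

// Toy instruction queue with a ready list and a wakeup step.
class ToyIQ {
  public:
    // Mark a register whose value is already available.
    void setRegReady(int reg) { regReady.insert(reg); }

    // Dispatch: record the instruction and put it on the ready list
    // only if every source register is already available.
    void insert(const ToyInst& inst) {
        std::size_t missing = 0;
        for (int src : inst.srcs) {
            if (regReady.count(src) == 0) {
                waiters[src].push_back(inst.seqNum);
                ++missing;
            }
        }
        pending[inst.seqNum] = missing;
        destOf[inst.seqNum] = inst.dest;
        if (missing == 0) readyList.insert(inst.seqNum);
    }

    // Schedule: issue up to `width` ready instructions, oldest first.
    std::vector<int> scheduleReadyInsts(std::size_t width) {
        std::vector<int> issued;
        while (issued.size() < width && !readyList.empty()) {
            issued.push_back(*readyList.begin());
            readyList.erase(readyList.begin());
        }
        return issued;
    }

    // Writeback: the producer's destination is now valid, so wake every
    // instruction that was waiting on it.
    void wakeDependents(int producerSeqNum) {
        auto destIt = destOf.find(producerSeqNum);
        assert(destIt != destOf.end());
        int dest = destIt->second;
        regReady.insert(dest);
        for (int seq : waiters[dest]) {
            if (--pending[seq] == 0) readyList.insert(seq);
        }
        waiters.erase(dest);
    }

  private:
    std::set<int> regReady;                   // registers with valid values
    std::map<int, std::vector<int>> waiters;  // register -> waiting seqNums
    std::map<int, std::size_t> pending;       // seqNum -> missing operands
    std::map<int, int> destOf;                // seqNum -> destination register
    std::set<int> readyList;                  // ordered, so oldest comes first
};
```

In this sketch a consumer stays off the ready list until wakeDependents() is called for its producer, which mirrors why writeback has to run before a dependent instruction can be scheduled.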
Load Instruction
Load instructions share the same path as compute instructions until execution.
IEW::tick()->IEW::executeInsts()
->LSQUnit::executeLoad()
->StaticInst::initiateAcc()
->LSQ::pushRequest()
->LSQUnit::read()
->LSQRequest::buildPackets()
->LSQRequest::sendPacketToCache()
->LSQUnit::checkViolation()
DcachePort::recvTimingResp()->LSQRequest::recvTimingResp()
->LSQUnit::completeDataAccess()
->LSQUnit::writeback()
->StaticInst::completeAcc()
->IEW::instToCommit()
IEW::tick()->IEW::writebackInsts()
- LSQUnit::executeLoad() will initiate the access by invoking the instruction’s initiateAcc(). Through the execution context interface, initiateAcc() will call initiateMemRead() and eventually be directed to LSQ::pushRequest().
- LSQ::pushRequest() will allocate an LSQRequest to track all states. It will also start translation. If the translation has finished, it will record the virtual address and invoke LSQUnit::read().
- LSQUnit::read() will check if the load is aliased with any previous store.
  - If the store can forward, it schedules a WritebackEvent for the next cycle.
  - If the load is aliased but cannot be forwarded, it calls InstructionQueue::rescheduleMemInst() and LSQRequest::discard().
  - Otherwise, it sends packets to the cache.
- LSQUnit::writeback() will invoke StaticInst::completeAcc(), which eventually writes the loaded value to the destination register. Then the instruction is pushed to the commit queue, so that IEW::writebackInsts() will mark it done and wake up its dependents. From here on it shares the same path as compute instructions.
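The three-way decision in LSQUnit::read() can be sketched roughly as below. This is a simplified model under my own assumptions: ToyStore, LoadAction, and classifyLoad() are hypothetical names, addresses are compared as plain byte ranges, and "can forward" is taken to mean that the youngest overlapping older store fully covers the load.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Possible outcomes for a load, following the three cases in the text.
enum class LoadAction { Forward, Reschedule, SendToCache };

// Toy store-queue entry (hypothetical, not gem5's SQEntry).
struct ToyStore {
    std::uint64_t addr;
    unsigned size;
};

// `olderStores` holds the stores older than the load, oldest first.
// We walk them youngest-first so the most recent aliasing store wins.
LoadAction classifyLoad(const std::vector<ToyStore>& olderStores,
                        std::uint64_t loadAddr, unsigned loadSize) {
    assert(loadSize > 0);
    const std::uint64_t loadEnd = loadAddr + loadSize;
    for (auto it = olderStores.rbegin(); it != olderStores.rend(); ++it) {
        const std::uint64_t storeEnd = it->addr + it->size;
        const bool overlaps = loadAddr < storeEnd && it->addr < loadEnd;
        if (!overlaps) continue;
        // Assumption: forwarding needs the store to cover every loaded byte.
        const bool covers = it->addr <= loadAddr && loadEnd <= storeEnd;
        return covers ? LoadAction::Forward : LoadAction::Reschedule;
    }
    return LoadAction::SendToCache;  // no aliasing store found
}
```

The real code handles more cases than this (for example special request types), so treat the sketch only as a picture of the control flow.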
Store Instruction
Store instructions are similar to load instructions, but they only write back to the cache after they are committed.
IEW::tick()->IEW::executeInsts()
->LSQUnit::executeStore()
->StaticInst::initiateAcc()
->LSQ::pushRequest()
->LSQUnit::write()
->LSQUnit::checkViolation()
Commit::tick()->Commit::commitInsts()->Commit::commitHead()
IEW::tick()->LSQUnit::commitStores()
IEW::tick()->LSQUnit::writebackStores()
->LSQRequest::buildPackets()
->LSQRequest::sendPacketToCache()
->LSQUnit::storePostSend()
DcachePort::recvTimingResp()->LSQRequest::recvTimingResp()
->LSQUnit::completeDataAccess()
->LSQUnit::completeStore()
- Unlike LSQUnit::read(), LSQUnit::write() will only copy the store data and will not send a packet to the cache, as the store is not yet committed.
- After the store is committed, LSQUnit::commitStores() will mark the SQ entry as canWB, so that LSQUnit::writebackStores() will later send the store request to the cache.
- Finally, when the response comes back, LSQUnit::completeStore() will release the SQ entry.
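The store lifecycle above can be sketched as a small state machine on each SQ entry. Again this is a toy under my own assumptions: ToySQEntry and ToyStoreQueue are made-up names, the method names only echo the gem5 functions they stand in for, and the "cache" is just a list of sequence numbers that were sent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Toy SQ entry (hypothetical fields).
struct ToySQEntry {
    int seqNum;
    std::uint64_t addr;
    std::uint64_t data;
    bool canWB;  // set once the store has committed
    bool sent;   // request already sent to the cache
};

class ToyStoreQueue {
  public:
    // Dispatch: allocate an entry in program order.
    void insertStore(int seqNum) {
        assert(entries.empty() || entries.back().seqNum < seqNum);
        entries.push_back({seqNum, 0, 0, false, false});
    }

    // Execute: only copy address and data, nothing goes to the cache yet.
    void write(int seqNum, std::uint64_t addr, std::uint64_t data) {
        for (ToySQEntry& e : entries) {
            if (e.seqNum == seqNum) { e.addr = addr; e.data = data; }
        }
    }

    // After commit: mark every store up to `youngestCommitted` as canWB.
    void commitStores(int youngestCommitted) {
        for (ToySQEntry& e : entries) {
            if (e.seqNum <= youngestCommitted) e.canWB = true;
        }
    }

    // Send committed stores in order; returns the seqNums sent this call.
    std::vector<int> writebackStores() {
        std::vector<int> sentNow;
        for (ToySQEntry& e : entries) {
            if (!e.canWB) break;  // younger stores are not committed either
            if (!e.sent) { e.sent = true; sentNow.push_back(e.seqNum); }
        }
        return sentNow;
    }

    // Cache response: release the entry.
    void completeStore(int seqNum) {
        for (auto it = entries.begin(); it != entries.end(); ++it) {
            if (it->seqNum == seqNum && it->sent) { entries.erase(it); return; }
        }
    }

    std::size_t size() const { return entries.size(); }

  private:
    std::deque<ToySQEntry> entries;  // oldest store at the front
};
```

The point of the sketch is the ordering: write() never sends anything, and writebackStores() stops at the first entry that is not yet canWB.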
Atomic Instruction
Atomic instructions are similar to store instructions, but they are executed non-speculatively.
Rename::tick()->Rename::renameInsts()
IEW::tick()->IEW::dispatchInsts()
->LSQUnit::insertStore()
->InstructionQueue::insertNonSpec()
->MemDepUnit::insertNonSpec()
Commit::tick()->Commit::commitInsts()->Commit::commitHead()
IEW::tick()->InstructionQueue::scheduleNonSpec()
->MemDepUnit::nonSpecInstReady()
->MemDepUnit::moveToReady()
->InstructionQueue::addReadyMemInst()
IEW::tick()->InstructionQueue::scheduleReadyInsts()
IEW::tick()->IEW::executeInsts()
->LSQUnit::executeStore()
->StaticInst::initiateAcc()
->LSQ::pushRequest()
->LSQUnit::write()
->LSQUnit::SQEntry::canWB() = true
->LSQUnit::checkViolation()
IEW::tick()->LSQUnit::writebackStores()
->LSQRequest::buildPackets()
->LSQRequest::sendPacketToCache()
->LSQUnit::storePostSend()
DcachePort::recvTimingResp()->LSQRequest::recvTimingResp()
->LSQUnit::completeDataAccess()
->LSQUnit::writeback()
->IEW::instToCommit()
->LSQUnit::completeStore()
Commit::tick()->Commit::commitInsts()->Commit::commitHead()
- When dispatching, the atomic instruction is inserted into the IQ and marked non-speculative. Compared to the normal InstructionQueue::insert(), InstructionQueue::insertNonSpec() won’t call addIfReady(), and thus does not schedule the instruction.
- When the atomic instruction reaches the ROB head, the commit stage checks if the instruction has been executed. If not, it sets the nonSpecSeqNum and clears its canCommit flag. Now the IEW stage knows that it can schedule the instruction.
- When executing the atomic instruction, the SQ entry is immediately marked canWB(), as the instruction is already at the head of the ROB.
- When the response comes back, the final value is written back to the register, and now the instruction can finally commit.
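The handshake between commit and the IQ for non-speculative instructions can be sketched as follows. Everything here is hypothetical (ToyNonSpecIQ, ToyRobHead, commitHeadSignal()), operands are assumed to be ready already, and the returned value only plays the role that nonSpecSeqNum plays in the text.

```cpp
#include <cassert>
#include <set>

// Toy IQ that keeps non-speculative instructions off the ready list.
class ToyNonSpecIQ {
  public:
    // Normal insert: assumed ready, so it can be scheduled right away.
    void insert(int seqNum) { ready.insert(seqNum); }

    // Non-speculative insert: parked until commit says it is at the ROB head.
    void insertNonSpec(int seqNum) { nonSpec.insert(seqNum); }

    // Called once commit has signalled the sequence number.
    void scheduleNonSpec(int seqNum) {
        auto it = nonSpec.find(seqNum);
        assert(it != nonSpec.end());
        nonSpec.erase(it);
        ready.insert(seqNum);
    }

    bool isReady(int seqNum) const { return ready.count(seqNum) != 0; }

  private:
    std::set<int> ready;
    std::set<int> nonSpec;
};

// Toy view of the ROB head (hypothetical fields).
struct ToyRobHead {
    int seqNum;
    bool nonSpec;
    bool executed;
};

// Commit-side check: returns the sequence number to signal to IEW,
// or -1 when the head needs no signal.
int commitHeadSignal(const ToyRobHead& head) {
    return (head.nonSpec && !head.executed) ? head.seqNum : -1;
}
```

In this sketch the instruction becomes schedulable only after the commit-side check has produced its sequence number, which is the ordering the bullets above describe.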
Branch Misspeculation
Branch misspeculation is handled in IEW::executeInsts(). It notifies the commit stage to start squashing all instructions in the ROB that are younger than the misspeculated branch.
IEW::tick()->IEW::executeInsts()->IEW::squashDueToBranch()
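As a rough picture of the squash, assume the ROB is just a queue of sequence numbers in program order (a big simplification; squashYoungerThan() is a hypothetical helper, not a gem5 function). Squashing then removes entries from the tail until the branch itself is the youngest one left.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Remove every ROB entry younger than the branch, youngest first.
// Returns the number of squashed entries.
std::size_t squashYoungerThan(std::deque<int>& rob, int branchSeqNum) {
    // The toy ROB must hold sequence numbers in program order.
    assert(std::is_sorted(rob.begin(), rob.end()));
    std::size_t squashed = 0;
    while (!rob.empty() && rob.back() > branchSeqNum) {
        rob.pop_back();
        ++squashed;
    }
    return squashed;
}
```

The branch itself stays in the ROB; only the instructions fetched after it on the wrong path are dropped.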
Memory Order Misspeculation
The InstructionQueue has a MemDepUnit to track memory order dependence. The IQ will not schedule an instruction until the MemDepUnit says there is no more dependence.
In LSQUnit::read(), the LSQ will search for a possibly aliasing store and forward from it if possible. If the load is aliased but cannot be forwarded, it is blocked and rescheduled when the blocking store completes, by notifying the MemDepUnit.
Both LSQUnit::executeLoad() and LSQUnit::executeStore() will call LSQUnit::checkViolation() to search the LQ for possible misspeculation. If one is found, it will set LSQUnit::memDepViolator, and later IEW::executeInsts() will start to squash.
IEW::tick()->IEW::executeInsts()
->LSQUnit::executeLoad()
->StaticInst::initiateAcc()
->LSQUnit::checkViolation()
->IEW::squashDueToMemOrder()
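The violation check for an executing store can be sketched like this: look through the LQ for a younger load that has already executed and touches the same bytes. This is only my simplified model (ToyLoad and findViolatingLoad() are hypothetical, and addresses are compared as exact byte ranges); the real check may compare addresses at a coarser granularity.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy LQ entry (hypothetical fields).
struct ToyLoad {
    int seqNum;
    std::uint64_t addr;
    unsigned size;
    bool executed;  // the load has already obtained its value
};

// `loadQueue` is ordered oldest first. Returns the seqNum of the oldest
// younger load that already executed and overlaps the store, or -1.
int findViolatingLoad(const std::vector<ToyLoad>& loadQueue, int storeSeqNum,
                      std::uint64_t storeAddr, unsigned storeSize) {
    assert(storeSize > 0);
    const std::uint64_t storeEnd = storeAddr + storeSize;
    for (const ToyLoad& ld : loadQueue) {
        // Older loads and loads that have not executed cannot be violators.
        if (ld.seqNum < storeSeqNum || !ld.executed) continue;
        const bool overlaps = ld.addr < storeEnd && storeAddr < ld.addr + ld.size;
        if (overlaps) return ld.seqNum;
    }
    return -1;
}
```

A non-negative result corresponds to the memDepViolator case in the text, after which the squash starts from that load.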