Hi,
I'm trying to figure out how the "clflush" instruction works in gem5.
Specifically, how does it signal the cache controller to evict the
block from the cache hierarchy throughout the system, and how does it receive
confirmation to clean the store buffer so that the next fence lets the following
instructions proceed? Does anyone know how this works or where I
should look for a better understanding?
I have tried to trace clflush execution and found some confusing facts. It
would be great if anyone could clarify this.
Executing the "clflush" instruction eventually calls
Clflushopt::initiateAcc() (build/X86/arch/x86/generated/exec-ns.cc.inc), as
the macroop definition of CLFLUSH uses clflushopt. So, is there no dedicated
clflush operation in gem5, and are all flush operations treated as
clflushopt?
When Clflushopt::initiateAcc() executes in timing simulation
(CPUType::TIMING), it eventually calls the TimingSimpleCPU::writeMem() function
in src/cpu/simple/timing.cc. Here you have:
    if (data == NULL) {
        assert(flags & Request::STORE_NO_DATA);
        // This must be a cache block cleaning request
        memset(newData, 0, size);
    } else {
        memcpy(newData, data, size);
    }
So, I was assuming it would have data == NULL and execute memset(), but it
actually executes memcpy(). This seems weird. Am I missing something?
Best
shaikhul
On 6/16/2023 11:39 AM, Khan Shaikhul Hadi via gem5-users wrote:
> Hi,
> I'm trying to figure out how the "clflush" instruction works in gem5. Specifically, how does it signal
> the cache controller to evict the block from the cache hierarchy throughout the system, and how does it
> receive confirmation to clean the store buffer so that the next fence lets the following instructions
> proceed? Does anyone know how this works or where I should look for a better understanding?
> I have tried to trace clflush execution and found some confusing facts. It would be great if anyone
> could clarify this.
clflush is a clflushopt followed by a microop that waits for the store queues
to be empty. This is what causes the stronger ordering of clflush vs clflushopt.
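To make the difference concrete, here is a toy C++ sketch of that decomposition. It is not gem5's microcode; the names ToyUop and expandFlush are made up for illustration, and the only thing it encodes is the statement above that clflush adds a "wait for the store queue to drain" step after the flush itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical micro-op kinds; illustrative names, not gem5's.
enum class ToyUop { FlushLine, WaitStoreQueueEmpty };

// Expand a flush mnemonic into toy micro-ops.
// Assumption taken from the description above: clflush is a clflushopt
// followed by a micro-op that waits for the store queue to be empty.
std::vector<ToyUop> expandFlush(const std::string &mnemonic)
{
    std::vector<ToyUop> uops{ToyUop::FlushLine};
    if (mnemonic == "clflush")
        uops.push_back(ToyUop::WaitStoreQueueEmpty);
    return uops;
}
```

In this model expandFlush("clflushopt") yields only the flush step, while expandFlush("clflush") yields the flush followed by the wait, which is where the stronger ordering comes from.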
> When Clflushopt::initiateAcc() executes in timing simulation (CPUType::TIMING), it eventually
> calls the TimingSimpleCPU::writeMem() function in src/cpu/simple/timing.cc. Here you have:
>     if (data == NULL) {
>         assert(flags & Request::STORE_NO_DATA);
>         // This must be a cache block cleaning request
>         memset(newData, 0, size);
>     } else {
>         memcpy(newData, data, size);
>     }
> So, I was assuming it would have data == NULL and execute memset(), but it actually executes memcpy().
> This seems weird. Am I missing something?
Some processors have an operation to zero a cache block (line). That's what the memset is for.
Otherwise the flushed data have been sent to the memory and need to be stored (memcpy).
I'd have to go dig into the code, but maybe what you're seeing is that the instruction must
first do a virtual address translation, and only after the result of that is available (some
number of cycles later) can it send the actual request (which is put into a store queue and
acted on in due course).
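If it helps when tracing, the quoted branch can be pulled out into a small standalone C++ function and exercised by hand. This is only a sketch: TOY_STORE_NO_DATA and makePayload are hypothetical stand-ins, not gem5 names, and the real writeMem() does much more than build the buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical flag bit standing in for gem5's Request::STORE_NO_DATA.
constexpr unsigned TOY_STORE_NO_DATA = 1u << 0;

// Build the payload buffer the same way the quoted branch does:
// zero-fill when the caller passes no data, otherwise copy the data.
// Callers are expected to pass size > 0.
std::vector<uint8_t> makePayload(const uint8_t *data, unsigned size,
                                 unsigned flags)
{
    std::vector<uint8_t> newData(size);
    if (data == nullptr) {
        // A request without data must be flagged as such.
        assert(flags & TOY_STORE_NO_DATA);
        std::memset(newData.data(), 0, size);
    } else {
        std::memcpy(newData.data(), data, size);
    }
    return newData;
}
```

Which branch runs depends only on whether the caller passed a non-null data pointer, so printing that pointer at the call site is probably the quickest way to see why memcpy() is taken in your trace.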
Note further that an operation like clflush may travel all the way out to the coherent xbar
closest to the memory, and then snoops will be sent down to all the caches, since the line
in question may be in, for example, some other processor's L1 cache. Whichever cache has
the data will respond. If none respond, then the cache line is not resident anyway (or was
not dirty and is now dropped by all the caches), so there is no further work to do.
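As a rough picture of that flow, here is a toy C++ model of a flush being broadcast as a snoop. ToyCache and broadcastFlush are invented for this sketch and have nothing to do with gem5's actual Cache or CoherentXBar classes; the model only captures "every cache drops the line, and a cache holding it dirty responds with a writeback".

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy cache: maps a line address to a dirty flag. Not gem5's Cache class.
struct ToyCache {
    std::unordered_map<uint64_t, bool> lines;

    // Handle a flush snoop: drop the line if present and report
    // whether it was dirty (i.e. whether a writeback is needed).
    bool snoopFlush(uint64_t addr)
    {
        auto it = lines.find(addr);
        if (it == lines.end())
            return false;          // line not resident, nothing to do
        bool dirty = it->second;
        lines.erase(it);           // clean or dirty, the line is dropped
        return dirty;
    }
};

// Toy crossbar: forward the flush to every cache and count writebacks.
int broadcastFlush(std::vector<ToyCache> &caches, uint64_t addr)
{
    int writebacks = 0;
    for (auto &cache : caches)
        writebacks += cache.snoopFlush(addr) ? 1 : 0;
    return writebacks;
}
```

A return value of zero corresponds to the "no further work to do" case: either no cache held the line, or the copies were clean and have simply been dropped.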
There are some aspects of this where gem5 does not follow what x86 processors do ... in
particular, gem5 handles all x86 memory store operations (clflush is in this category)
in order (Intel TSO - total store order), even though Intel ordering of clflushopt and
clwb is weaker. I coded up something more like actual Intel behavior, but have not
submitted it back to gem5 :-( ... It made the store queue processing rather more subtle,
since the existing code counted on things proceeding in order.
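To illustrate why weaker ordering complicates the store queue, here is a small C++ toy. To be clear, this is not the unsubmitted code mentioned above and not gem5's LSQ; ToyKind, ToyEntry and issuable are made up, fences are not modeled, and the relaxed rule is simplified to "a clflushopt may pass older entries unless one of them touches the same cache line".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical store-queue entry kinds; illustrative only.
enum class ToyKind { Store, FlushOpt };

struct ToyEntry {
    ToyKind kind;
    uint64_t line;  // cache line address the entry targets
};

// Return the indices of queue entries that may issue right now.
// strictInOrder == true models the in-order handling described above:
// only the oldest entry (index 0) may go.
// strictInOrder == false models a simplified weaker ordering: a FlushOpt
// may also go if no older entry targets the same line; stores stay in order.
std::vector<std::size_t> issuable(const std::vector<ToyEntry> &queue,
                                  bool strictInOrder)
{
    std::vector<std::size_t> ready;
    for (std::size_t i = 0; i < queue.size(); ++i) {
        if (i == 0) {
            ready.push_back(i);    // the head can always issue
            continue;
        }
        if (strictInOrder)
            break;                 // nothing behind the head may issue
        if (queue[i].kind != ToyKind::FlushOpt)
            continue;              // ordinary stores do not reorder
        bool olderSameLine = false;
        for (std::size_t j = 0; j < i; ++j)
            if (queue[j].line == queue[i].line)
                olderSameLine = true;
        if (!olderSameLine)
            ready.push_back(i);
    }
    return ready;
}
```

With a queue of store(line 1), clflushopt(line 2), clflushopt(line 1), store(line 3), the strict mode yields only index 0, while the relaxed mode yields indices 0 and 1. Once entries can complete out of order like this, any code that assumes the head is always the next to finish has to be revisited.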
HTH