 
I'm wondering whether the following code is a reasonable benchmark for testing L1D cache performance in Gem5. Changing the size of the L1D cache in Gem5 (from 32KiB to 1KiB) doesn't seem to make any significant difference in the benchmark's performance (~1%). I would have expected a 1KiB L1D cache to cause a large drop in performance. I'm using "RiscvMinorCPU()" with "system.mem_mode = 'timing'". The benchmark code should reuse memory within a 24KiB block, which fits in a 32KiB cache but not in a 1KiB cache, so the performance difference between the two should be significant.
The call to the benchmark from the C code looks something like "ubench_cache(100, buffer, (1<<10)*24);".
```c
#include <stddef.h>
#include <stdint.h>

static uint32_t ubench_cache(const size_t iters,
                             const void *in,
                             const size_t sz) {
    const uint8_t *buf = (const uint8_t *)in;               // Byte view of the input buffer
    const size_t def_blk_sz = ((size_t)(1 << 10)) * 24;     // 24KiB working set
    const size_t blk_sz = (sz < def_blk_sz) ? (sz) : (def_blk_sz);
    const size_t cl_sz = 64;                                // Cache line size in bytes
    const size_t in_iters = 100;
    uint8_t h = 0;
    for (size_t j = 0; j < iters; j++) {                          // Outer iterations
        for (size_t b = 0; b < sz; b += blk_sz) {                 // There will be only 1 block
            for (size_t i = 0; i < in_iters; i++) {               // Inner iterations
                for (size_t o = 0; o < cl_sz; o++) {              // Offset within a cache line
                    for (size_t c = 0; c < blk_sz; c += cl_sz) {  // Cache line within a block
                        const size_t ndx = b + c + o;
                        h += (ndx < sz) ? (buf[ndx]) : (i);
                    }
                }
            }
        }
    }
    return (uint32_t)h;
}
```
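To check whether this access pattern should separate the two cache sizes at all, here is a small C sketch that replays the same address order as ubench_cache (offset within the line in the outer loop, cache line in the inner loop) through a set-associative LRU cache model. The helper count_misses, the 8-way associativity and true LRU replacement are assumptions for illustration only, not Gem5's actual L1D configuration or replacement policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: count misses in a set-associative LRU cache model
 * for the access order used by ubench_cache (offset-major, then line).
 * cache_bytes, ways and line are assumed parameters, not Gem5's config. */
static uint64_t count_misses(size_t cache_bytes, size_t ways, size_t line,
                             size_t blk, size_t passes) {
    assert(line > 0 && ways > 0 && cache_bytes >= line * ways);
    size_t sets = cache_bytes / (line * ways);
    size_t n = sets * ways;
    uint64_t *tag = calloc(n, sizeof *tag);     /* line number held in each way */
    uint64_t *stamp = calloc(n, sizeof *stamp); /* last-use time; 0 = empty way */
    uint64_t now = 0, misses = 0;
    if (tag == NULL || stamp == NULL) {
        free(tag);
        free(stamp);
        return UINT64_MAX;
    }
    for (size_t p = 0; p < passes; p++)              /* inner iterations */
        for (size_t o = 0; o < line; o++)            /* offset within a line */
            for (size_t c = 0; c < blk; c += line) { /* line within the block */
                uint64_t ln = (c + o) / line;
                size_t set = (size_t)(ln % sets);
                uint64_t *t = tag + set * ways, *s = stamp + set * ways;
                size_t victim = 0;
                int hit = 0;
                now++;
                for (size_t w = 0; w < ways; w++) {
                    if (s[w] != 0 && t[w] == ln) { s[w] = now; hit = 1; break; }
                    if (s[w] < s[victim]) victim = w; /* least recently used */
                }
                if (!hit) {
                    misses++;
                    t[victim] = ln;
                    s[victim] = now;
                }
            }
    free(tag);
    free(stamp);
    return misses;
}
```

Under these assumptions, count_misses(1024, 8, 64, 24 * 1024, 2) gives 49152 misses, i.e. every access, because each set cycles through 192 lines with only 8 ways. count_misses(32 * 1024, 8, 64, 24 * 1024, 2) gives only the 384 compulsory misses, because the 384 lines spread 6 per set. If the model shows this gap but the simulated runtime doesn't, comparing the L1D miss counts Gem5 reports for each run is one way to check whether the size change actually took effect.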
To have something else to test with, can anyone recommend a small, simple cache benchmark written in C? I'm trying to determine whether my Gem5 configuration is the problem or whether this test just isn't a good one.
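One commonly used kind of simple cache benchmark is a pointer-chasing loop, the same idea behind latency tools such as lmbench's lat_mem_rd. The buffer is linked into a randomly ordered cycle of cache-line-sized nodes, so every load depends on the previous one, and neither the compiler nor a prefetcher can skip or predict the accesses. This is a different technique from ubench_cache, shown as a sketch. The names node_t, xs_next, build_chain and chase, and the 64-byte LINE_BYTES, are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LINE_BYTES 64 /* assumed L1D line size */

/* One node per cache line; only the leading pointer is actually used. */
typedef struct node {
    struct node *next;
    char pad[LINE_BYTES - sizeof(struct node *)];
} node_t;

/* xorshift64 PRNG so the chain layout is reproducible; seed must be nonzero. */
static uint64_t xs_next(uint64_t *s) {
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Link nodes[0..n-1] into a single cycle visited in a random order. */
static int build_chain(node_t *nodes, size_t n, uint64_t seed) {
    if (n == 0) return -1;
    assert(nodes != NULL);
    size_t *order = malloc(n * sizeof *order);
    if (order == NULL) return -1;
    for (size_t i = 0; i < n; i++) order[i] = i;
    for (size_t i = n - 1; i > 0; i--) { /* Fisher-Yates shuffle */
        size_t j = (size_t)(xs_next(&seed) % (i + 1));
        size_t t = order[i];
        order[i] = order[j];
        order[j] = t;
    }
    for (size_t k = 0; k < n; k++) /* last node links back to the first */
        nodes[order[k]].next = &nodes[order[(k + 1) % n]];
    free(order);
    return 0;
}

/* Walk the chain; each load's address comes from the previous load. */
static node_t *chase(node_t *p, size_t hops) {
    while (hops--)
        p = p->next;
    return p;
}
```

For example, 384 nodes (24KiB with 64-byte lines) built with build_chain(nodes, 384, 12345) and walked with chase(&nodes[0], 100 * 384) should miss on nearly every hop with a 1KiB L1D but mostly hit with a 32KiB one once warmed up. Storing the returned pointer in a volatile variable or printing it keeps the compiler from discarding the loop.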
Thanks much,
~Aaron Vose
