gem5-users@gem5.org

The gem5 Users mailing list

Confused When Running Cache Benchmark in Gem5

Aaron Vose
Sat, Jul 1, 2023 10:00 PM

Details are in the attached tarball.

I'm running a very simple RISC-V system configuration with only L1 data and instruction caches.
The test executable is a very simple cache benchmark that repeatedly reuses memory inside a 24KiB block.
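
To make the expectation concrete, here is a toy sketch of what a 24KiB reuse loop should do to a small versus a large cache. It is not gem5 and not the real ubench_cache.c: the direct-mapped organisation, the 64-byte line size, the sequential access pattern, and all names are assumptions chosen only for illustration.

```python
# Toy direct-mapped cache model. NOT gem5 and NOT the real benchmark;
# the line size, mapping, and access pattern are illustrative assumptions.
LINE_BYTES = 64  # assumed cache line size
KIB = 1024


def hit_rate(cache_bytes, working_set_bytes, passes=4):
    """Sweep the working set `passes` times, touching each line once per pass."""
    num_lines = cache_bytes // LINE_BYTES
    tags = [None] * num_lines  # one stored block number per cache line
    hits = 0
    accesses = 0
    for _ in range(passes):
        for addr in range(0, working_set_bytes, LINE_BYTES):
            block = addr // LINE_BYTES
            index = block % num_lines  # direct-mapped placement
            if tags[index] == block:
                hits += 1
            else:
                tags[index] = block  # miss: fill the line
            accesses += 1
    return hits / accesses


big = hit_rate(32 * KIB, 24 * KIB)
small = hit_rate(1 * KIB, 24 * KIB)
print(f"32KiB cache hit rate: {big:.2f}")
print(f"1KiB cache hit rate:  {small:.2f}")
```

In this toy model the 32KiB cache misses only on the first pass, while the 1KiB cache misses on every access, which is why I expected a large difference in ticks.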

When I run with 32KiB caches ("riscv-gem5 ./cache_bench.py"), it takes 6971237877 ticks.
When I run with 1KiB caches ("riscv-gem5 ./cache_bench.py --no-cache"), it takes 7042666907 ticks.
That is only a 1.01x speedup from 32x the cache size, even though the 24KiB working set should fit in the 32KiB caches but not in the 1KiB ones.
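
For reference, the speedup figure comes straight from the two tick counts above. A minimal sketch of the arithmetic (the variable names are just for illustration):

```python
# Tick counts reported by the two runs quoted above.
ticks_32k = 6971237877  # 32KiB caches
ticks_1k = 7042666907   # 1KiB caches (--no-cache)

speedup = ticks_1k / ticks_32k                   # how much faster the 32KiB run is
saved_pct = 100 * (ticks_1k - ticks_32k) / ticks_1k  # ticks saved, as a percentage

print(f"speedup: {speedup:.4f}x")
print(f"ticks saved: {saved_pct:.2f}%")
```

So the larger caches save roughly one percent of the simulated time.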

I must be using Gem5 incorrectly, since there is almost no improvement from the larger caches.
Does anyone have any ideas about what I might be missing here?

The test program, "ubench_cache.c", has been used successfully in a different context.
I was able to use a Linux kernel module on an x86 machine to get access to uncacheable memory:
https://github.com/lemonsqueeze/uncached-ram-lkm
On x86 hardware, the benchmark runs more than 100x faster with cacheable memory than with uncacheable memory.

I feel like I must be missing something about configuring Gem5 correctly.
Any ideas or tips would be greatly appreciated.

Thanks,
~Aaron Vose

Aaron Vose
Sat, Jul 1, 2023 10:04 PM

Looks like the email system stripped out my tarball. Attaching again with the config file named "cache_bench_py_.txt" instead of "cache_bench.py".

Thanks,
~Aaron Vose

