Hi Nick,
Regarding gem5-vega-se, the ls output you sent before shows it does not
exist in /usr/local/bin, so it's not surprising that didn't work.
Regarding docker more broadly, setting up volumes is tricky.
The mental model I always use is: with the volumes I specify, is
everything I need accessible inside the container or not? When you "only"
mount /usr/local/bin, nothing on the host outside /usr/local/bin is part of
what the container can see when it runs -- which is why Python (for example)
cannot be found in that case. So, you'd need to set up a volume (or volumes)
covering all of the files you need in order to avoid the Python error.
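One way to test this mental model is to ask the dynamic linker which shared libraries a binary needs and whether they resolve inside the container. The sketch below is a minimal helper, not something from the bootcamp materials: `missing_libs` is a hypothetical name, it only filters `ldd`-style output for entries marked "not found", and the commented-out docker invocation assumes that `ldd` is available in the ghcr.io/gem5/gcn-gpu:v24-0 image.

```shell
# Hypothetical helper: list the shared libraries that ldd reports as missing.
# Reads ldd-style output on stdin and prints one library name per line.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example (assumes ldd exists inside the image; not run here):
#   docker run -v /usr/local/bin:/usr/local/bin ghcr.io/gem5/gcn-gpu:v24-0 \
#       ldd /usr/local/bin/gem5-vega | missing_libs
```

An entry such as libpython3.12.so.1.0 in the output would be consistent with the loader error reported in this thread: the binary is visible through the volume, but a library it links against is not.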
This is why I had asked if you ran the same commands as specified here:
https://github.com/gem5bootcamp/gem5-bootcamp-env/blob/51590ae00b0e451c9b6a8854addbb94128ab4cac/materials/developing-gem5-models/11-gpu/README.md#to-run-square-in-gem5-static-register-allocator,
because I believe they were set up so all of this is handled for you in the
bootcamp.
Matt
On Sun, Mar 2, 2025 at 3:37 PM Beser, Nicholas D. <Nick.Beser@jhuapl.edu>
wrote:
Matt,
I really appreciate your help with this section.
I had tried explicitly specifying /usr/local/bin based on a README
document that was listed when the codespace started:
docker run -v $PWD:$PWD -v /usr/local/bin:/usr/local/bin -w $PWD \
    ghcr.io/gem5/gcn-gpu:v24-0 gem5-vega-se gem5/configs/example/apu_se.py \
    -n 3 -c square
docker run -v $PWD:$PWD -v /usr/local/bin:/usr/local/bin -w $PWD \
    ghcr.io/gem5/gcn-gpu:v24-0 gem5-vega gem5/configs/example/apu_se.py \
    -n 3 -c square
The gem5-vega-se binary did not exist, and the second run produced the
following error message:
docker run -v $PWD:$PWD -v /usr/local/bin:/usr/local/bin -w $PWD \
    ghcr.io/gem5/gcn-gpu:v24-0 gem5-vega gem5/configs/example/apu_se.py \
    -n 3 -c square
gem5-vega: error while loading shared libraries: libpython3.12.so.1.0: cannot open shared object file: No such file or directory
Nick
From: Matt Sinclair <mattdsinclair.wisc@gmail.com>
Sent: Sunday, March 2, 2025 4:30 PM
To: Beser, Nicholas D. <Nick.Beser@jhuapl.edu>
Cc: The gem5 Users mailing list <gem5-users@gem5.org>; Jason Lowe-Power
<jason@lowepower.com>
Subject: Re: [EXT] Re: [gem5-users] Success with GPU
APL external email warning: Verify sender mattdsinclair.wisc@gmail.com
before clicking links or attachments
Hi Nick,
Did you try setting up the docker volume to explicitly include
/usr/local/bin then? The likely reason it worked after you compiled
gem5.opt explicitly is that the folder you built it in was explicitly
included in the docker volume. Unless you are saying you built gem5.opt
for VEGA_X86 in /usr/local/bin and it only worked after you did this?
It sounds like you are using (or gem5 is using) m5 dumpreset stats
somewhere in the run (e.g., here:
https://github.com/gem5/gem5/blob/develop/configs/example/apu_se.py#L1078)
and this is causing the separate stats outputs. But even if that is the
case, a) you should see the print on line 1077 of apu_se.py in your simout
to verify this happened and b) what is the issue with the stats? Just that
there are two pieces?
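To confirm that a dump-and-reset happened, one can count the statistics sections in stats.txt and look for the message in the captured simulator output. This is a minimal sketch, assuming gem5's usual "Begin Simulation Statistics" section marker and the "GPU Kernel Completed dump and reset" message that appears in the output quoted elsewhere in this thread. `count_stat_dumps` and `saw_dump_and_reset` are hypothetical helper names, and the file paths are whatever your run produced.

```shell
# Count how many statistics sections a gem5 stats.txt contains.
count_stat_dumps() {
    grep -c 'Begin Simulation Statistics' "$1"
}

# Succeed if the captured simulator output shows the dump-and-reset message.
saw_dump_and_reset() {
    grep -q 'GPU Kernel Completed dump and reset' "$1"
}

# Example (paths are assumptions):
#   count_stat_dumps m5out/stats.txt
#   saw_dump_and_reset simout && echo "stats were dumped and reset mid-run"
```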
Thanks,
Matt
On Sun, Mar 2, 2025 at 3:23 PM Beser, Nicholas D. <Nick.Beser@jhuapl.edu>
wrote:
Matt,
Here is the output of ls -l /usr/local/bin in the codespace:
root@codespaces-e992b2:/workspaces/en525-712-81-sp25-amd-gpu-using-gem5-gem5bootcamp2024# ls -l /usr/local/bin
total 3274048
lrwxrwxrwx 1 root root 36 Feb 26 00:48 actionlint -> /usr/local/lib/actionlint/actionlint
-rwxrwxr-x 1 ubuntu ubuntu 21869136 Feb 12 23:27 code
-rwxr-xr-x 1 root root 2953216 Feb 26 00:48 compose-switch
lrwxrwxrwx 1 root root 32 Feb 26 00:48 docker-compose -> /etc/alternatives/docker-compose
-rwxr-xr-x 1 root root 73691062 Feb 26 00:48 docker-compose-v1
lrwxrwxrwx 1 root root 23 Jul 25 2024 gem5 -> /usr/local/bin/gem5-chi
-rwxr-xr-x 1 root root 1185554096 Jul 25 2024 gem5-chi
-rwxr-xr-x 1 root root 1184136040 Jul 25 2024 gem5-mesi
-rwxr-xr-x 1 root root 884400248 Jul 25 2024 gem5-vega
root@codespaces-e992b2:/workspaces/en525-712-81-sp25-amd-gpu-using-gem5-gem5bootcamp2024#
Since the listing showed the binary, I assumed it would run. It did run
after we compiled it explicitly.
As it turns out, the stats.txt file has two sets of simulation statistics
documented. One completes after the CPU finishes, and the second completes
after the CPU checks the result.
src/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
GPU Kernel Completed dump and reset
src/sim/simulate.cc:199: info: Entering event queue @ 105637104000. Starting simulation...
info: check result
PASSED!
breaking loop due to: exiting with last active thread context.
Ticks: 143303043500
Exiting because exiting with last active thread context
Nick
From: Matt Sinclair <mattdsinclair.wisc@gmail.com>
Sent: Sunday, March 2, 2025 4:17 PM
To: The gem5 Users mailing list <gem5-users@gem5.org>
Cc: Beser, Nicholas D. <Nick.Beser@jhuapl.edu>; Jason Lowe-Power
<jason@lowepower.com>
Subject: [EXT] Re: [gem5-users] Success with GPU
Hi Nick,
I'm not sure why you believe gem5-vega should be in /usr/local/bin? It's
been a few months since I last looked at this codespace, but looking at the
instructions here:
https://github.com/gem5bootcamp/gem5-bootcamp-env/blob/51590ae00b0e451c9b6a8854addbb94128ab4cac/materials/developing-gem5-models/11-gpu/README.md#to-run-square-in-gem5-static-register-allocator,
they do not seem to assume /usr/local/bin. Instead, they are setting up
the volume for docker for other folders. Have you tried this command? Of
course, it's possible I'm wrong though about /usr/local/bin -- but Bobby or
Jason would have to answer that.
Setting that aside, it looks like the instructions you followed basically
bypass the prebuilt gem5-vega and have you build it yourself -- this
is ultimately fine, and what my students do in my research group, but of
course takes a bit longer.
What is the issue with the stats file exactly? I guess you wrote your own
CPU version of square and that version is not behaving as expected? I am
not an expert at the CPU part of gem5, but I'd need more information about
how you disabled the CPU part to understand or try to look into this.
Likewise, what stats are you looking at for the CPU?
Thanks,
Matt
On Sun, Mar 2, 2025 at 2:54 PM Beser, Nicholas D. via gem5-users <
gem5-users@gem5.org> wrote:
Based on the discussion, it seems that docker can’t find the gem5-vega
that is in /usr/local/bin. I noticed that the instructions also had us
building the VEGA_X86/gem5.opt binary with the following command:
docker run --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:v24-0 \
    scons build/VEGA_X86/gem5.opt -j#
(# is the number of cores on your X86 system)
I built VEGA_X86 in the codespace. The following command was then able
to run the GPU square binary:
docker run --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:v24-0 \
    gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n 3 -c \
    gem5-resources/src/gpu/square/bin/square
The program executed correctly and created a stats.txt file. I have sent
these instructions to my class so they can proceed with the experiments
using the GPU.
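The build and run commands share the same docker boilerplate, so a small wrapper may be convenient when handing them to a class. This is a sketch under assumptions: `gem5_docker` is a hypothetical function name, the `DOCKER` variable exists only so the command can be previewed without running it, and `nproc` stands in for the "number of cores" placeholder.

```shell
# Hypothetical wrapper: run a command in the gcn-gpu image with the current
# directory mounted at the same path inside the container.
gem5_docker() {
    # DOCKER defaults to the real docker binary; set DOCKER=echo for a dry run.
    "${DOCKER:-docker}" run --volume "$(pwd):$(pwd)" -w "$(pwd)" \
        ghcr.io/gem5/gcn-gpu:v24-0 "$@"
}

# Build, using every core that nproc reports:
#   gem5_docker scons build/VEGA_X86/gem5.opt -j"$(nproc)"
# Run square:
#   gem5_docker gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py \
#       -n 3 -c gem5-resources/src/gpu/square/bin/square
```

Running with `DOCKER=echo` in front of the call prints the docker arguments instead of executing them, which makes it easy to check which directories are mounted.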
I do have a question about the results in the stats.txt file. We noticed
that the program computed the square operation on the GPU and then
checked the result using CPU-only code. When one of my students disabled
the CPU-only code, he did not see a drop in the CPU instruction count that
would have corresponded to that loop. I have them looking at the stats.txt
file for indications of what resources the GPU used to perform the
operation.
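Because stats.txt can contain more than one statistics section, it may help to print a counter once per section and compare runs with and without the CPU check section by section. A minimal sketch: `stat_per_dump` is a hypothetical helper, and `simInsts` is used only as an example stat name, so substitute whichever CPU counter your stats.txt actually reports.

```shell
# Hypothetical helper: print "<section number> <value>" for one stat in
# every statistics section of a gem5 stats.txt.
# Usage: stat_per_dump STAT_NAME STATS_FILE
stat_per_dump() {
    awk -v stat="$1" '
        /Begin Simulation Statistics/ { section++ }   # a new section starts
        $1 == stat { print section, $2 }              # name in column 1, value in column 2
    ' "$2"
}

# Example (file names are assumptions):
#   stat_per_dump simInsts with_cpu_check/stats.txt
#   stat_per_dump simInsts without_cpu_check/stats.txt
```

If the CPU check runs after a dump-and-reset, its instructions would presumably be counted in the later section, so comparing only the first section could hide the difference.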
Nick
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org
Matt,
No, I did not run those commands. I had not seen them before. They are not set up for the repository I am working from (they look like they are set up for bootcamp 2022).
Nick
From: Matt Sinclair <mattdsinclair.wisc@gmail.com>
Sent: Sunday, March 2, 2025 4:49 PM
To: Beser, Nicholas D. <Nick.Beser@jhuapl.edu>
Cc: The gem5 Users mailing list <gem5-users@gem5.org>; Jason Lowe-Power <jason@lowepower.com>
Subject: Re: [EXT] Re: [gem5-users] Success with GPU
Hi Nick,
Regarding gem5-vega-se, the ls output you sent before shows it does not exist in /usr/local/bin, so it's not surprising that didn't work.
Regarding docker more broadly, setting up volumes is tricky. The mental model I always use is: with the volumes I specify, is everything I need accessible inside the container or not? When you "only" mount /usr/local/bin, nothing on the host outside /usr/local/bin is part of what the container can see when it runs -- which is why Python (for example) cannot be found in that case. So, you'd need to set up a volume (or volumes) covering all of the files you need in order to avoid the Python error.
This is why I had asked if you ran the same commands as specified here: https://github.com/gem5bootcamp/gem5-bootcamp-env/blob/51590ae00b0e451c9b6a8854addbb94128ab4cac/materials/developing-gem5-models/11-gpu/README.md#to-run-square-in-gem5-static-register-allocator, because I believe they were set up so all of this is handled for you in the bootcamp.
Matt
Whoops, thanks. I meant this one:
https://github.com/gem5bootcamp/2024/tree/main/materials/04-GPU-model.
Note that these commands specify the full path to gem5-vega
(we also didn't need docker in this case because of how Jason set
things up). I do not know whether that same setup carried over to yours,
though. If not, then the comments from my previous email about the
volume would be my suggestion on how to proceed there.
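Before passing a binary name to docker or to a script, a quick lookup shows whether the name resolves at all and to which full path. A minimal sketch; `resolve_bin` is a hypothetical helper built on the POSIX `command -v` builtin.

```shell
# Hypothetical helper: print the full path a command name resolves to,
# or report on stderr that it cannot be found.
resolve_bin() {
    # command -v prints the resolved path and fails if the name is unknown.
    if path=$(command -v "$1"); then
        printf '%s\n' "$path"
    else
        printf '%s: not found on PATH\n' "$1" >&2
        return 1
    fi
}

# Example:
#   resolve_bin gem5-vega      # prints the full path if the binary is on PATH
#   resolve_bin gem5-vega-se   # reports "not found" if no such binary exists
```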
Matt
On Sun, Mar 2, 2025 at 4:20 PM Beser, Nicholas D. <Nick.Beser@jhuapl.edu>
wrote:
Matt,
No, I did not run those commands. I had not seen them before. They are not
set up for the repository I am working from (they look like they are set up
for bootcamp 2022).
Nick
Thank you, I will take a look at these.
Nick
From: Matt Sinclair <mattdsinclair.wisc@gmail.com>
Sent: Sunday, March 2, 2025 5:31 PM
To: Beser, Nicholas D. <Nick.Beser@jhuapl.edu>
Cc: The gem5 Users mailing list <gem5-users@gem5.org>; Jason Lowe-Power <jason@lowepower.com>
Subject: Re: [EXT] Re: [gem5-users] Success with GPU
Whoops, thanks. I meant this one: https://github.com/gem5bootcamp/2024/tree/main/materials/04-GPU-model. Note that these commands specify the full path to gem5-vega (we also didn't need docker in this case because of how Jason set things up). I do not know whether that same setup carried over to yours, though. If not, then the comments from my previous email about the volume would be my suggestion on how to proceed there.
Matt