Hi all,
I'm trying to change dgpu_mem_size from the default 16GB to 8GB. The
system boots properly, but execution aborts as soon as I run the square
program.
The error log is below.
gem5.opt: src/mem/ruby/system/RubyPort.cc:273: virtual bool
gem5::ruby::RubyPort::MemResponsePort::recvTimingReq(gem5::PacketPtr):
Assertion `owner.memRequestPort.isConnected()' failed.
Program aborted at tick 10851562163265
--- BEGIN LIBC BACKTRACE ---
gem5/build/VEGA_X86/gem5.opt(_ZN4gem515print_backtraceEv+0x30)[0x5557a2269ea0]
gem5/build/VEGA_X86/gem5.opt(_ZN4gem512abortHandlerEi+0x4c)[0x5557a228fd3c]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f6cbcc42520]
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f6cbcc969fc]
/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f6cbcc42476]
/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f6cbcc287f3]
/lib/x86_64-linux-gnu/libc.so.6(+0x2871b)[0x7f6cbcc2871b]
/lib/x86_64-linux-gnu/libc.so.6(+0x39e96)[0x7f6cbcc39e96]
gem5/build/VEGA_X86/gem5.opt(_ZN4gem54ruby8RubyPort15MemResponsePort13recvTimingReqEPNS_6PacketE+0x952)[0x5557a272d982]
gem5/build/VEGA_X86/gem5.opt(_ZN4gem519AMDGPUMemoryManager12writeRequestEmPhiNS_5FlagsImEEPNS_5EventE+0x587)[0x5557a2ba6d77]
gem5/build/VEGA_X86/gem5.opt(_ZNSt17_Function_handlerIFvvEZN4gem511DmaCallback13getChunkEventEvEUlvE_E9_M_invokeERKSt9_Any_data+0x18)[0x5557a2998318]
gem5/build/VEGA_X86/gem5.opt(_ZN4gem510EventQueue10serviceOneEv+0xc2)[0x5557a2280042]
gem5/build/VEGA_X86/gem5.opt(_ZN4gem59doSimLoopEPNS_10EventQueueE+0x68)[0x5557a22a9458]
gem5/build/VEGA_X86/gem5.opt(_ZN4gem58simulateEm+0x283)[0x5557a22a9a53]
gem5/build/VEGA_X86/gem5.opt(+0x1c95220)[0x5557a206e220]
gem5/build/VEGA_X86/gem5.opt(+0xd7aab4)[0x5557a1153ab4]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(+0x128023)[0x7f6cbdb28023]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(_PyObject_Call+0x5c)[0x7f6cbdae1fec]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x4b16)[0x7f6cbda76776]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(+0x1c23af)[0x7f6cbdbc23af]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x9d68)[0x7f6cbda7b9c8]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(+0x1c23af)[0x7f6cbdbc23af]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x9d68)[0x7f6cbda7b9c8]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(+0x1c23af)[0x7f6cbdbc23af]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x9d68)[0x7f6cbda7b9c8]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(+0x1c23af)[0x7f6cbdbc23af]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(PyEval_EvalCode+0xbe)[0x7f6cbdbbd3de]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(+0x1bd96d)[0x7f6cbdbbd96d]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(+0x1287b3)[0x7f6cbdb287b3]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x69de)[0x7f6cbda7863e]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(+0x1c23af)[0x7f6cbdbc23af]
gem5/build/VEGA_X86/gem5.opt(+0x1c7b507)[0x5557a2054507]
--- END LIBC BACKTRACE ---
For more info on how to address this issue, please visit
https://www.gem5.org/documentation/general_docs/common-errors/
Thank you!
Greetings,
Pau
Hi Pau,
The dgpu_mem_size parameter only changes the memory size on the gem5 side; the GPU driver determines the memory size from an MMIO register value. The issue you are seeing is that the driver still thinks there is 16GB of memory, so it attempts to write to the GPU page table, which resides at the top of memory and grows down. To really reduce the size, you currently need to modify the C++ code. I think making this configurable would not be too time-consuming to implement if it's a feature you want.
If you want to do this immediately, modify the file src/dev/amdgpu/amdgpu_device.cc. Around line 158, divide the register value by two: 'setRegVal(MI200_MEM_SIZE_REG, 0x3ff0 / 2);'. I am not sure whether there is an equivalent register for Vega10, so you may have to use the MI200 config, which is currently only on the develop branch and will be part of the v24-0 release. The 0x3ff0 value, by the way, is the memory size shifted right by 20 bits.
-Matt
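The register arithmetic in the reply above can be checked with a short sketch. This is only an illustration, assuming the register value is the memory size in bytes shifted right by 20 bits (that is, the size in MiB), as the reply states. The helper names bytes_to_reg_val and reg_val_to_bytes are hypothetical and are not part of gem5.

```python
MIB_SHIFT = 20  # per the reply: register value = memory size >> 20


def bytes_to_reg_val(size_bytes: int) -> int:
    """Hypothetical helper: memory size in bytes -> register value."""
    return size_bytes >> MIB_SHIFT


def reg_val_to_bytes(reg_val: int) -> int:
    """Hypothetical helper: register value -> memory size in bytes."""
    return reg_val << MIB_SHIFT


if __name__ == "__main__":
    default_reg = 0x3FF0           # value quoted from amdgpu_device.cc
    halved_reg = default_reg // 2  # the suggested "0x3ff0 / 2"

    for name, reg in (("default", default_reg), ("halved", halved_reg)):
        mib = reg_val_to_bytes(reg) >> MIB_SHIFT
        print(f"{name}: reg={reg:#x} -> {mib} MiB")

    # For comparison: an exact 8 GiB expressed in the same encoding.
    print(f"8 GiB exact: reg={bytes_to_reg_val(8 * 1024**3):#x}")
```

Under this assumption, 0x3ff0 decodes to 16368 MiB, slightly below a full 16 GiB, and the halved value 0x1ff8 decodes to 8184 MiB. The thread does not say why the default sits 16 MiB below 16 GiB, so halving the existing value, as suggested, is the safer choice over writing an exact 8 GiB encoding.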
From: Pau Galindo Figuerola via gem5-users gem5-users@gem5.org
Sent: Sunday, January 28, 2024 3:50 PM
To: The gem5 Users mailing list gem5-users@gem5.org
Cc: Pau Galindo Figuerola pau.galindo.figuerola@estudiantat.upc.edu
Subject: [gem5-users] Issues modifying parameters in GPU FS