gem5-users@gem5.org

The gem5 Users mailing list

Issues modifying parameters in GPU FS

Pau Galindo Figuerola
Sun, Jan 28, 2024 11:50 PM

Hi all,

I'm trying to change dgpu_mem_size from the default 16GB to 8GB. The
system boots properly, but the simulation aborts as soon as I run the
square program.

The error log is below.

gem5.opt: src/mem/ruby/system/RubyPort.cc:273: virtual bool
gem5::ruby::RubyPort::MemResponsePort::recvTimingReq(gem5::PacketPtr):
Assertion `owner.memRequestPort.isConnected()' failed.
Program aborted at tick 10851562163265
--- BEGIN LIBC BACKTRACE ---
gem5/build/VEGA_X86/gem5.opt(_ZN4gem515print_backtraceEv+0x30)[0x5557a2269ea0]
gem5/build/VEGA_X86/gem5.opt(_ZN4gem512abortHandlerEi+0x4c)[0x5557a228fd3c]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f6cbcc42520]
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f6cbcc969fc]
/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f6cbcc42476]
/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f6cbcc287f3]
/lib/x86_64-linux-gnu/libc.so.6(+0x2871b)[0x7f6cbcc2871b]
/lib/x86_64-linux-gnu/libc.so.6(+0x39e96)[0x7f6cbcc39e96]
gem5/build/VEGA_X86/gem5.opt(_ZN4gem54ruby8RubyPort15MemResponsePort13recvTimingReqEPNS_6PacketE+0x952)[0x5557a272d982]
gem5/build/VEGA_X86/gem5.opt(_ZN4gem519AMDGPUMemoryManager12writeRequestEmPhiNS_5FlagsImEEPNS_5EventE+0x587)[0x5557a2ba6d77]
gem5/build/VEGA_X86/gem5.opt(_ZNSt17_Function_handlerIFvvEZN4gem511DmaCallback13getChunkEventEvEUlvE_E9_M_invokeERKSt9_Any_data+0x18)[0x5557a2998318]
gem5/build/VEGA_X86/gem5.opt(_ZN4gem510EventQueue10serviceOneEv+0xc2)[0x5557a2280042]
gem5/build/VEGA_X86/gem5.opt(_ZN4gem59doSimLoopEPNS_10EventQueueE+0x68)[0x5557a22a9458]
gem5/build/VEGA_X86/gem5.opt(_ZN4gem58simulateEm+0x283)[0x5557a22a9a53]
gem5/build/VEGA_X86/gem5.opt(+0x1c95220)[0x5557a206e220]
gem5/build/VEGA_X86/gem5.opt(+0xd7aab4)[0x5557a1153ab4]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(+0x128023)[0x7f6cbdb28023]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(_PyObject_Call+0x5c)[0x7f6cbdae1fec]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x4b16)[0x7f6cbda76776]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(+0x1c23af)[0x7f6cbdbc23af]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x9d68)[0x7f6cbda7b9c8]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(+0x1c23af)[0x7f6cbdbc23af]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x9d68)[0x7f6cbda7b9c8]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(+0x1c23af)[0x7f6cbdbc23af]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x9d68)[0x7f6cbda7b9c8]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(+0x1c23af)[0x7f6cbdbc23af]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(PyEval_EvalCode+0xbe)[0x7f6cbdbbd3de]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(+0x1bd96d)[0x7f6cbdbbd96d]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(+0x1287b3)[0x7f6cbdb287b3]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x69de)[0x7f6cbda7863e]
/lib/x86_64-linux-gnu/libpython3.10.so.1.0(+0x1c23af)[0x7f6cbdbc23af]
gem5/build/VEGA_X86/gem5.opt(+0x1c7b507)[0x5557a2054507]
--- END LIBC BACKTRACE ---
For more info on how to address this issue, please visit
https://www.gem5.org/documentation/general_docs/common-errors/

Thank you!

Greetings,
Pau

Poremba, Matthew
Mon, Jan 29, 2024 6:00 PM

[AMD Official Use Only - General]

Hi Pau,

The dgpu_mem_size parameter only changes the memory size that gem5 models; the GPU driver determines the memory size from an MMIO register value. The problem you are seeing is that the driver still thinks there is 16GB of memory and attempts to write to the GPU page table, which resides at the top of memory and grows down. To actually reduce the size, you currently need to modify the C++ code. I think making this configurable would not be too time-consuming to implement if it's a feature you want.
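To illustrate the mismatch described above, here is a minimal sketch, not gem5 code. The helper fitsInMemory and the 4 KiB offset below the top of memory are hypothetical, chosen only to show why an address that is valid in the driver's 16GB view falls outside an 8GB simulated memory.

```cpp
#include <cstdint>

constexpr std::uint64_t GiB = 1ULL << 30;

// Hypothetical helper: true if an access of `len` bytes at `addr` lies
// entirely inside a memory of `mem_size` bytes that starts at offset 0.
constexpr bool fitsInMemory(std::uint64_t addr, std::uint64_t len,
                            std::uint64_t mem_size)
{
    return len <= mem_size && addr <= mem_size - len;
}

// What the driver believes (from the MMIO register) vs. what gem5 models
// after dgpu_mem_size is lowered.
constexpr std::uint64_t driverViewSize = 16 * GiB;
constexpr std::uint64_t simulatedSize = 8 * GiB;

// Illustrative assumption: a page-table write lands 4 KiB below the top
// of the memory the driver thinks it has.
constexpr std::uint64_t pageTableAddr = driverViewSize - 4096;

static_assert(fitsInMemory(pageTableAddr, 8, driverViewSize),
              "address is valid in the driver's 16GB view");
static_assert(!fitsInMemory(pageTableAddr, 8, simulatedSize),
              "same address is outside the 8GB simulated memory");
```

The checks are evaluated at compile time, so building the file with any C++11 compiler is enough to confirm them.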

If you want to do this immediately, you will need to modify the file src/dev/amdgpu/amdgpu_device.cc. Around line 158, divide this value by two: 'setRegVal(MI200_MEM_SIZE_REG, 0x3ff0 / 2);'. I am not sure whether there is an equivalent register for Vega10, so you may have to use the MI200 config, which is currently only on the develop branch and will be part of the v24-0 release. The 0x3ff0 value, by the way, is the memory size shifted right by 20 bits.
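As a worked check of that arithmetic, the following sketch converts between a byte count and the register value, assuming (as stated above) that the register holds the size shifted right by 20 bits, i.e. the size in MiB. The helper names memSizeRegValue and regValueToBytes are made up for illustration and are not part of gem5.

```cpp
#include <cstdint>

constexpr std::uint64_t MiB = 1ULL << 20;
constexpr std::uint64_t GiB = 1ULL << 30;

// Hypothetical helper: register value for a memory size in bytes,
// assuming the register stores (size >> 20).
constexpr std::uint32_t memSizeRegValue(std::uint64_t sizeBytes)
{
    return static_cast<std::uint32_t>(sizeBytes >> 20);
}

// Hypothetical helper: memory size in bytes implied by a register value.
constexpr std::uint64_t regValueToBytes(std::uint32_t regValue)
{
    return static_cast<std::uint64_t>(regValue) << 20;
}

// The default quoted in the thread: 0x3ff0 is 16368 MiB, just under 16 GiB.
static_assert(regValueToBytes(0x3ff0) == 16368 * MiB, "default size");

// Halving the register value gives 0x1ff8, i.e. 8184 MiB.
static_assert(0x3ff0 / 2 == 0x1ff8, "halved register value");
static_assert(regValueToBytes(0x1ff8) == 8184 * MiB, "halved size");

// A full 8 GiB would instead correspond to 0x2000.
static_assert(memSizeRegValue(8 * GiB) == 0x2000, "full 8 GiB");
```

Note that 0x3ff0 / 2 describes slightly less than 8 GiB, so with dgpu_mem_size set to 8GB the driver's view should still fit inside the simulated memory.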

-Matt

