Empathy List Archives

Nazmus Sakib

Mon, Feb 5, 2024 3:41 PM

Hello.
I am trying to find out how virtual (logical) addresses are calculated and passed on to the CPU.
In the load/store queue, after a request object is created, the corresponding instruction is assigned an effective address from this request object, something like inst->effaddr = req->getVirt(). I found setVirt(), the function that sets the virtual address, but I cannot find who calls setVirt() or where.

For example: ldr x0, [x1,#1024] // an ARM instruction
Here, the address would be x1+1024, that is, the content of register x1 plus the immediate 1024.
How and where does this address calculation take place? Where can I see the content of x1 being added to 1024? And who calls setVirt()?
As I understand it, address calculation is ISA-specific, and the dynamic/static instruction classes work with the ISA files to get this done. I would like to know how this works, i.e., the interface that connects ISA features to the CPU pipeline.

Eliot Moss

Mon, Feb 5, 2024 4:47 PM

src/arch/arm/isa/insts/macromem.isa has definitions of micro-ops used for
memory instructions. In there you can find some of the effective
address calculation code being generated (look for eaCode and EA).
See also the instruction templates in src/arch/arm/isa/templates/mem.isa.

These isa files are processed by a custom macro processor to generate
the actual decoding, execution, etc., functions, which you can find
in the build hierarchy.
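
To make the division of labour concrete, here is a minimal Python sketch (not gem5 code) of the idea: the ISA-specific instruction code computes the effective address from the register and the immediate, then hands it to the CPU model, which is the side that wraps it in a request carrying the virtual address. All class and function names below (Request, ExecContext, initiate_mem_read, ldr_imm_initiate) are hypothetical stand-ins chosen for illustration, and the register value is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Toy stand-in for a memory request carrying a virtual address."""
    vaddr: int
    size: int


@dataclass
class ExecContext:
    """Toy stand-in for the CPU side: register values plus issued requests."""
    regs: dict
    requests: list = field(default_factory=list)

    def initiate_mem_read(self, addr: int, size: int) -> None:
        # The CPU model, not the ISA code, wraps the address in a request.
        self.requests.append(Request(vaddr=addr, size=size))


def ldr_imm_initiate(xc: ExecContext, base_reg: str, imm: int, size: int = 8) -> int:
    """ISA-side part of 'ldr xt, [xn, #imm]': compute the EA, pass it to the CPU."""
    ea = (xc.regs[base_reg] + imm) & 0xFFFFFFFFFFFFFFFF  # 64-bit wraparound
    xc.initiate_mem_read(ea, size)
    return ea


# Hypothetical register content, chosen only for the example.
xc = ExecContext(regs={"x1": 0x498000})
ea = ldr_imm_initiate(xc, "x1", 1024)
print(hex(ea), xc.requests[0])
```

In this toy model the only place the virtual address is stored is the Request constructor, which is why searching for a separate "set address" call from the instruction code would turn up nothing.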

The whole construction is somewhat complex, but I hope I've answered
your question. Was it just a point of curiosity, or is there something
specific you're trying to do?

Eliot Moss

Nazmus Sakib

Mon, Feb 5, 2024 6:39 PM

I am trying to see how small I can set the cache line size (gem5 ARM; the test binary is aarch64).
When I set it to 4 bytes, I get a page fault for address 0x400c00. After a lot of debugging (using my own prints and debug flags), I think the problem is that, when trying to generate address 0x400c00, it only generates 0x400. Since that is not in the VMA list, the fixupFault() function cannot assign a new page, nor can the page table lookup() access the already assigned page.
I am guessing that somehow the MSBs of 0x400c00 are lost in address generation, so I was trying to find where this happens and why.
I know a 4-byte cache line is unrealistic, and I am also running a 64-bit binary, but I want to find the exact reason for this page fault (I might also be missing some basic understanding of computer systems theory).
Note: a cache line size of 8 bytes works fine!

Eliot Moss

Mon, Feb 5, 2024 6:46 PM

A guess would be that the code is not set up to expect that an aligned
8-byte quantity might break across cache lines. To make that work, the
8-byte access would have to be broken into two 4-byte accesses, since
each can miss separately. It would likely take deeper changes to make
that work, though I would think it is possible with concomitant effort.
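
As a rough illustration of what "breaking the access" means, here is a small Python sketch that splits an access into pieces that each stay within one cache line. The function name split_access is hypothetical and this is not how gem5 implements it; the sample address is the store address A=0x498028 reported elsewhere in this thread.

```python
def split_access(addr: int, size: int, line_size: int) -> list[tuple[int, int]]:
    """Split [addr, addr + size) into (address, size) pieces, one per cache line."""
    pieces = []
    end = addr + size
    while addr < end:
        # First byte address past the cache line that contains addr.
        line_end = (addr // line_size + 1) * line_size
        chunk_end = min(end, line_end)
        pieces.append((addr, chunk_end - addr))
        addr = chunk_end
    return pieces


# An aligned 8-byte access fits in one 8-byte line but needs two 4-byte lines.
one_piece = split_access(0x498028, 8, line_size=8)
two_pieces = split_access(0x498028, 8, line_size=4)
print([(hex(a), s) for a, s in one_piece])
print([(hex(a), s) for a, s in two_pieces])
```

With 8-byte lines the access stays whole; with 4-byte lines even a naturally aligned 8-byte access always becomes two pieces, which is the situation a simulator may not be written to expect.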

When you say it is generating 0x400, is that as the whole address? It
looks suspiciously like the page number (i.e., shift right by 12 bits).
But anyway, as I mentioned, a cache line size of 4 bytes has other
problems with it and that may somehow be leading to the behavior you see.
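
The page-number observation can be checked with a couple of lines of Python. This assumes 4 KiB pages (a 12-bit page offset); PAGE_SHIFT is just a name used for this example.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, so the low 12 bits are the page offset

addr = 0x400c00
vpn = addr >> PAGE_SHIFT                 # virtual page number
offset = addr & ((1 << PAGE_SHIFT) - 1)  # offset within the page

print(hex(vpn), hex(offset))  # prints: 0x400 0xc00
```

So 0x400 is exactly what remains of 0x400c00 after dropping the 12 offset bits, which is consistent with it being a page number rather than a truncated byte address.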

EM

Nazmus Sakib

Tue, Feb 6, 2024 4:13 PM

I think gem5 has this SplitDataRequest() method that breaks up a request if it would need more than one cache line.
In fact, the page fault occurs before the access reaches the cache. The panic message says the address is 0x400. Looking at the disassembly and the output log of --debug-flags=ExecAll, I think the address is an instruction address, as I have found addresses starting with 0x400, for example:
0x400bf8 @__libc_start_main+808 : movz x1, #0,

However, the last instruction I see in the ExecAll output is a store instruction:
0x41c0ec @_dl_debug_initialize+124 : stlr x0, [x3] : MemWrite : D=0x0000000000492000 A=0x498028
Right after this, the panic occurs.
In fact, with --debug-flags=LSQUnit, I can see a "Fault on store pc" message that points to this store instruction.

Eliot Moss

Tue, Feb 6, 2024 5:38 PM

Yes, it can split requests; I'm just not sure it's prepared
to do so in this case.

Since you have mentioned instructions, I'm now wondering if there
could be an issue with really small instruction cache lines.

The store might be faulting because of address translation, but it
might also be faulting because of ordering constraints (it's a
store release instruction). It is 64-bit, which means it would
cross cache lines. Maybe that's disallowed for such ordering
ops?

EM

Nazmus Sakib

Tue, Feb 6, 2024 6:10 PM

So you are saying that it is not the address to which the store instruction is supposed to store the value, but rather the address of the instruction itself? The left-most boldface below:

0x41c0ec @_dl_debug_initialize+124 : stlr x0, [x3] : MemWrite : D=0x0000000000492000 *A=0x498028
That is, when fetching the instruction at program counter 0x41c0ec, the address could not be found within one cache line?
But it was able to execute some previous instructions, and if the cache line size of the instruction cache were the issue, the program should have terminated well before it reached this instruction.
Here is some debug output from the LSQUnit debug flag:
122942000: system.cpu.iew.lsq.thread0: Inserting store PC (0x41c0ec=>0x41c0f0).(0=>1), idx:933 [sn:6577]
122942500: system.cpu.iew.lsq.thread0: Executing store PC (0x41c0ec=>0x41c0f0).(0=>1) [sn:6577]
122942500: system.cpu.iew.lsq.thread0: Fault on Store PC (0x41c0ec=>0x41c0f0).(0=>1), [sn:6577], Size = 0
Then I see the squashing messages and right after that the page fault messages appear.

I looked into the page table code: it receives the address 0x400 and cannot find a page for it (and consequently no physical address). Then the fixfault() method in mem_state.cc is called, which cannot assign a new page, I am guessing because the address is not in the VMA list.
Thus I was thinking that maybe it is an address generation issue. I skimmed through arch/arm/insts/macromem.cc and mem.cc; they do have separate conditions for 32-bit and 64-bit (although the EA computation is not there, it is in the ISA files). My best guess is that 0x400 is a partial address, and the rest of it is lost because of a 64/32-bit issue.
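
One way to probe the "partial address" guess is to take the addresses that appear in the trace and see whether any simple truncation or shift of them yields 0x400. The sketch below is plain Python, not gem5 code; the set of views tried and the 4 KiB page size are assumptions, and a match would only be a hint, not evidence of where the faulting address comes from.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def candidate_views(addr: int) -> dict:
    """Return a few ways a 64-bit address could be truncated or shifted."""
    return {
        "low 32 bits": addr & 0xFFFFFFFF,
        "high 32 bits": addr >> 32,
        "low 16 bits": addr & 0xFFFF,
        "page number": addr >> PAGE_SHIFT,
        "page offset": addr & ((1 << PAGE_SHIFT) - 1),
    }


def matches(addrs, target):
    """List (address, view name) pairs whose view equals the target value."""
    hits = []
    for addr in addrs:
        for name, value in candidate_views(addr).items():
            if value == target:
                hits.append((addr, name))
    return hits


# Addresses taken from the ExecAll lines quoted in this thread.
trace_addrs = [0x41C0EC, 0x498028, 0x492000, 0x400BF8]

if __name__ == "__main__":
    for addr, name in matches(trace_addrs, 0x400):
        print(f"{addr:#x}: {name} == 0x400")
```

With these inputs the only match is the page number of 0x400bf8, which just restates the earlier observation that some instruction addresses start with 0x400. Neither the store's PC (0x41c0ec) nor its effective address (0x498028) reduces to 0x400 under any of these views.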

From: Eliot Moss moss@cs.umass.edu
Sent: 06 February 2024 10:38
To: The gem5 Users mailing list gem5-users@gem5.org
Cc: Nazmus Sakib nsakib6@nmsu.edu
Subject: Re: [gem5-users] Re: Effective address and ISA


On 2/6/2024 11:13 AM, Nazmus Sakib via gem5-users wrote:

I think gem5 has this SplitDataRequest() method that breaks the request if it would need more than one cacheline.
In fact, the page fault is occurring before it goes to the cache. The panic message says the address is 0x400. By
looking into the disassembly and the output log of -debug-flag=ExecAll, I think the address is an instruction address,
as I have found addresses starting with 0x400, for example :
0x400bf8 @__libc_start_main+808 : movz x1, #0,

Although, the last instruction I see in the output from ExecAll flag is a store instruction:
0x41c0ec @_dl_debug_initialize+124 : stlr x0, [x3] : MemWrite : D=0x0000000000492000 A=0x498028
Right after this, the panic message occurs.
In fact, using debug-flag=LSQUnit, I can see the message that "Fault on store pc" which points to this store instruction.

Yes, it can split requests; I'm just not sure it's prepared
to do in this case.

Since you have mentioned instructions, I'm now wondering if there
could be an issue with really small instruction cache lines.

The store might be faulting because of address translation, but it
might also be faulting because of ordering constraints (it's a
store release instruction). It is 64-bit, which means it would
cross cache lines. Maybe that's disallowed for such ordering
ops?

EM
