gem5-users@gem5.org

The gem5 Users mailing list


Squashing Instructions after Page Table Fault

RG
reverent.green@web.de
Fri, Sep 29, 2023 10:04 AM

Hello,

I am currently trying to locate the code that squashes instructions when a Page Table Fault is triggered in the O3 CPU.

After using the PageTableWalker debug flags, my current guess is gem5/src/arch/x86/pagetable_walker.cc, line 199.

I also inspected the files in the src/cpu/o3 directory, but couldn't find anything specific to squashing instructions after a fault.

Is my assumption correct that the O3 CPU implementation does not handle this on its own, and that the architecture-specific part of the implementation does it instead? If I am missing something, feel free to point it out.

Thank you in advance for your help.

Kind regards

Robin

RG
reverent.green@web.de
Fri, Sep 29, 2023 10:28 AM

A short addition: I also couldn't find a specific check for the user/supervisor page table attribute anywhere.

Are there parts of the code where specific bits are checked, or does gem5 use some other kind of implementation here?
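For reference, on x86-64 the U/S attribute is bit 2 of every paging-structure entry (bit 0 is P, bit 1 is R/W), and a user-mode access is only permitted if U/S is set at every level of the walk. The standalone C++ sketch below is not gem5 code: the constant and function names are made up for illustration, and it assumes 4-level paging while ignoring large pages, SMAP and protection keys.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Architectural flag bits shared by all x86-64 paging-structure entries.
// Illustrative constants only (hypothetical names), not gem5's PTE type.
constexpr std::uint64_t kPresent  = 1ULL << 0;  // P
constexpr std::uint64_t kWritable = 1ULL << 1;  // R/W
constexpr std::uint64_t kUser     = 1ULL << 2;  // U/S: 1 = user-accessible

// Checks a user-mode (CPL 3) access against the entries of one walk,
// ordered from PML4E down to PTE. Returns false if the access must fault.
bool userAccessAllowed(const std::uint64_t* entries, std::size_t levels, bool isWrite) {
    assert(entries != nullptr || levels == 0);
    for (std::size_t i = 0; i < levels; ++i) {
        const std::uint64_t e = entries[i];
        if ((e & kPresent) == 0) return false;              // not present
        if ((e & kUser) == 0) return false;                 // supervisor-only at this level
        if (isWrite && (e & kWritable) == 0) return false;  // read-only at this level
    }
    return true;
}
```

A "kernel" page in this model is simply one whose leaf entry (or any entry above it) has U/S cleared, e.g. a leaf value of 0x3 (P and R/W set, U/S clear).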

_______________________________________________ gem5-users mailing list -- gem5-users@gem5.org To unsubscribe send an email to gem5-users-leave@gem5.org

YY
Yuan Yao
Fri, Sep 29, 2023 11:28 PM

Hi Robin,

If I understand it correctly, an instruction that raises a page fault is not squashed; it is simply *not executed*. The faulting instruction is marked as ready to commit, and the fault it generated is then handled during the commit stage.

To explain this in more detail, let me walk through how gem5 handles a page fault on a load:

1. DefaultIEW<Impl>::executeInsts() => ldstQueue.executeLoad(inst) => inst->initiateAcc() (dynamic inst) => staticInst->initiateAcc() (static inst) => initiateMemRead() (dynamic inst again) => cpu->pushRequest() => LSQ->pushRequest(). Following this call chain, gem5 eventually starts the translation via the MMU.

2. Later, when the translation is done, the page fault and the faulting instruction are recorded by *translation->finish(...)* in pagetable_walker.cc (reached via Walker::recvTimingResp, assuming that a page walk was needed). The finish() function is defined in the O3 pipeline components; in this case it is LSQ<Impl>::SingleDataRequest::finish.

3. Because the faulting instruction has not yet been committed, DefaultIEW<Impl>::executeInsts() examines it again, but this time the instruction is marked as 'TranslationCompleted'. Since fault != NoFault, the instruction is marked as executed and forwarded to the commit stage (iewStage->instToCommit(inst)).

4. When the instruction reaches the head of the ROB, the commit unit's commitInsts() function calls commitHead(), which in turn calls cpu->trap() and then fault->invoke() to handle the fault. Different faults have different invoke() functions. For your question, take a look at PageFault::invoke() in arch/x86/faults.cc. The CPU then sets up the CR2 register, etc., and reads the microcode ROM to start the procedure that transfers control to the OS fault handler. (The microcode ROM is defined in romutil.py.)

5. After the page-fault handler finishes, the faulting instruction (still at the head of the ROB) is re-executed.
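To make the "not squashed at fault time, handled at commit" idea concrete, here is a heavily simplified C++ model of precise exceptions in an out-of-order core. All types and functions (ThreadState, FaultBase, PageFault, DynInst, completeTranslation, commit) are hypothetical stand-ins that only borrow names from the steps above; they are not gem5's classes. In this sketch the trap flushes the whole ROB, faulting instruction included, and leaves re-execution to a later re-fetch; gem5's actual sequence (including the microcode ROM detour) may differ in detail.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified model; none of these types are gem5's.
struct ThreadState {
    std::uint64_t cr2 = 0;         // faulting linear address (x86 CR2)
    std::uint64_t fetchPc = 0;     // where fetch resumes after a trap
    std::vector<std::string> log;  // what happened, for inspection
};

struct FaultBase {
    virtual ~FaultBase() = default;
    virtual void invoke(ThreadState& ts) = 0;  // each fault type handles itself
};
using Fault = std::shared_ptr<FaultBase>;      // nullptr plays the role of NoFault

struct PageFault : FaultBase {
    std::uint64_t addr;
    std::uint64_t handlerPc;
    PageFault(std::uint64_t a, std::uint64_t h) : addr(a), handlerPc(h) {}
    void invoke(ThreadState& ts) override {
        ts.cr2 = addr;           // record the faulting address
        ts.fetchPc = handlerPc;  // redirect fetch to the handler entry point
        ts.log.push_back("page fault");
    }
};

struct DynInst {
    std::uint64_t pc;
    bool executed = false;
    Fault fault;  // set at translation time, acted on only at commit
};

// Execute side: a faulting translation does not trap immediately; the
// instruction is just marked executed with its fault attached.
void completeTranslation(DynInst& inst, Fault f) {
    inst.fault = std::move(f);
    inst.executed = true;
}

// Commit side: only the ROB head may retire or trap, which keeps faults precise.
// Returns the number of instructions retired in this call.
int commit(std::deque<DynInst>& rob, ThreadState& ts) {
    int retired = 0;
    while (!rob.empty() && rob.front().executed) {
        DynInst& head = rob.front();
        if (head.fault) {
            head.fault->invoke(ts);
            rob.clear();  // squash everything in flight, faulting inst included
            ts.log.push_back("squash");
            break;
        }
        ts.log.push_back("commit");
        rob.pop_front();
        ++retired;
    }
    return retired;
}
```

In this model, commit() stops at the first faulting instruction, so nothing younger ever retires; the squash is a side effect of the trap at the ROB head rather than of the translation itself.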

The above is based on gem5 21.0.0.0, but I don't think the relevant code has changed much since then.

Hope this helps.

PS. Page access permissions (e.g., write access) are checked in the translate function in tlb.cc.

Br,

Yuan


E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here: http://www.uu.se/en/about-uu/data-protection-policy

RG
reverent.green@web.de
Wed, Oct 4, 2023 2:03 PM

Hi Yuan,

thank you very much for your detailed response. My understanding of the fault handling in gem5 is getting better and better. Using debug flags, I can trace the control flow during the execution of my code.

I am currently inspecting tlb.cc in more detail, but I am still looking for the exact check relevant to my problem.

To make my question more specific:

When code tries to access kernel memory, the “user/supervisor” (U/S) page-table attribute is used to check whether the page belongs to kernel memory. If I try to access that memory from user mode, a page fault should be raised. I am looking for this specific check. My goal is to experiment with gem5 and customize it. Currently, the instruction is not executed when a page fault is raised. As a first step, I want to change the check so that the instruction is executed even though it accesses kernel memory. So I am specifically looking for this check within the call chain of the page-fault handling.

Thank you very much in advance.

Best regards

Robin


EM
Eliot Moss
Wed, Oct 4, 2023 3:00 PM

Assuming we're talking about the x86 architecture, line 471 in tlb.cc is where
the check in question happens:

https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/x86/tlb.cc#L471

Note that the raw bits of the PTE have been abstracted out in the gem5 TLB
entry data structure, hence properties such as entry->user.
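As an illustration of checking abstracted entry fields rather than raw PTE bits, here is a minimal C++ sketch. TlbEntry, AccessInfo, checkPermissions and the bypassUserCheck knob are hypothetical names (loosely echoing inUser, entry->user and badWrite); this is not the code at the linked line. It returns the x86 #PF error code an access would produce, or nothing if the access is allowed, assuming the page is present and CR0.WP = 1.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a TLB entry whose PTE bits were already decoded.
struct TlbEntry {
    bool user;      // U/S: page may be accessed from CPL 3
    bool writable;  // R/W
};

struct AccessInfo {
    bool inUser;    // access made at CPL 3
    bool isWrite;
};

// Returns the #PF error code if the access must fault, std::nullopt otherwise.
// Error-code bits used here: 0 = P (page present), 1 = W/R, 2 = U/S.
// bypassUserCheck is a hypothetical experiment knob, not a gem5 option.
std::optional<std::uint32_t> checkPermissions(const TlbEntry& e, const AccessInfo& a,
                                              bool bypassUserCheck = false) {
    const bool badUser  = a.inUser && !e.user && !bypassUserCheck;
    const bool badWrite = a.isWrite && !e.writable;  // assumes CR0.WP = 1
    if (!badUser && !badWrite) return std::nullopt;
    std::uint32_t code = 1u;            // P = 1: the page was present
    if (a.isWrite) code |= 1u << 1;     // W/R = 1: write access
    if (a.inUser)  code |= 1u << 2;     // U/S = 1: user-mode access
    return code;
}
```

Skipping the user check this way only changes whether a fault is reported for a present page; it does not model any transient execution after the fault.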

HTH

Eliot Moss

RG
reverent.green@web.de
Mon, Oct 9, 2023 11:37 AM

Hey Eliot,

thank you for your help. I experimented with the checks and was a bit surprised that the page fault does not seem to be raised after an unsuccessful user/supervisor check. After enabling the necessary debug flags and adding more debug statements to the code, I observed that the page fault is raised before the if-statement is entered, not after it. Here is a short snippet of my output:

14442496349500: system.repeat_switch_cpus5.mmu.dtb: inUser = 1 | entry_user = 1 | badWrite = 0 (Line 470)
14442496349500: system.repeat_switch_cpus5.mmu.dtb: Checks done! (Line 485)
14442496350000: system.repeat_switch_cpus5.mmu.dtb: inUser = 1 | entry_user = 1 | badWrite = 0
14442496350000: system.repeat_switch_cpus5.mmu.dtb: Checks done!
14442496361000: Page-Fault: RIP 0x402da9: vector 14: #PF(0x4) at 0xffff880019688110
14442496387000: system.repeat_switch_cpus5.mmu.itb: inUser = 1 | entry_user = 0 | badWrite = 1
14442496387000: system.repeat_switch_cpus5.mmu.itb: ***************************** If [Line 471]. *****************************************
14442496424000: system.repeat_switch_cpus5.mmu.dtb: inUser = 0 | entry_user = 0 | badWrite = 1
14442496424000: system.repeat_switch_cpus5.mmu.dtb: Checks done!
14442496464000: system.repeat_switch_cpus5.mmu.dtb: inUser = 0 | entry_user = 0 | badWrite = 1
14442496464000: system.repeat_switch_cpus5.mmu.dtb: Checks done!

I expected the page fault to be raised at line 476, but that does not seem to be the case.
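One clue is the error code in the trace. Assuming gem5 prints the architectural x86 page-fault error code inside #PF(...), the standard bit layout (bit 0 P, bit 1 W/R, bit 2 U/S, bit 3 RSVD, bit 4 I/D) can be decoded with a small helper like the one below (describePfErrorCode is a hypothetical name). Under that assumption, 0x4 decodes as a user-mode read of a page that was not present, whereas a U/S violation on a present page would show up as 0x5.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Decodes the low bits of an x86 page-fault (#PF, vector 14) error code.
std::string describePfErrorCode(std::uint32_t code) {
    std::string s;
    s += (code & 0x1) ? "protection violation on a present page" : "page not present";
    s += (code & 0x2) ? ", write" : ", read";
    s += (code & 0x4) ? ", user mode" : ", supervisor mode";
    if (code & 0x8)  s += ", reserved bit set";
    if (code & 0x10) s += ", instruction fetch";
    return s;
}
```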

For further context, my goal is to get this code (https://github.com/IAIK/meltdown/blob/master/reliability.c) working in gem5. Currently, "libkdump_read" (https://github.com/IAIK/meltdown/blob/master/libkdump/libkdump.c#L528) only returns 0 in gem5.

My guess is that I need to change much more than I initially thought. Based on Yuan's answer, I suspect I also need to change parts of the fault-handling call chain. Can anyone confirm this?

Best regards,

Robin


YY
Yuan Yao
Mon, Oct 9, 2023 12:29 PM

Hi Robin,

The "Page-Fault" message is printed in the fault's constructor, so setting a gdb breakpoint on that line and moving up the stack frames should show where the fault is created.

By the way, a page fault can also be generated during the page walk (see https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/x86/pagetable_walker.cc#L491C22-L491C22). In that case, the faulting PTE is not inserted into the TLB. The PageTableWalker debug flag traces all of these events.
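In a model like the one below, a fault raised during the walk never produces a TLB entry, so a later check on TLB entry fields never sees that access. This C++ sketch covers 4-level x86-64 paging only; PhysMem, TlbEntry, WalkResult and walk are hypothetical names, and large pages, accessed/dirty updates and the actual fault object are omitted. It also shows one way raw U/S and R/W bits can be folded into per-entry flags before the TLB fill.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical model of a 4-level x86-64 walk; not gem5's Walker.
using PhysMem = std::unordered_map<std::uint64_t, std::uint64_t>;  // paddr -> 8-byte entry

struct TlbEntry { std::uint64_t pfn; bool user; bool writable; };

struct WalkResult {
    std::optional<TlbEntry> entry;  // filled only if the walk succeeded
    bool pageFault = false;         // a not-present entry was found
};

constexpr std::uint64_t kAddrMask = 0x000FFFFFFFFFF000ULL;  // address bits 51:12

WalkResult walk(const PhysMem& mem, std::uint64_t cr3, std::uint64_t vaddr,
                std::unordered_map<std::uint64_t, TlbEntry>& tlb) {
    std::uint64_t table = cr3 & kAddrMask;
    bool user = true, writable = true;
    for (int level = 3; level >= 0; --level) {  // PML4, PDPT, PD, PT
        const std::uint64_t index = (vaddr >> (12 + 9 * level)) & 0x1FF;
        const auto it = mem.find(table + index * 8);
        const std::uint64_t pte = (it == mem.end()) ? 0 : it->second;
        if ((pte & 1) == 0) return WalkResult{std::nullopt, true};  // fault, no TLB fill
        user = user && (pte & 4);          // U/S must be set at every level
        writable = writable && (pte & 2);  // R/W must be set at every level
        table = pte & kAddrMask;
    }
    const TlbEntry e{table >> 12, user, writable};
    tlb[vaddr >> 12] = e;  // only successful walks are cached
    return WalkResult{e, false};
}
```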

Hope this helps.

Br,

Y.

EM
Eliot Moss
Mon, Oct 9, 2023 12:51 PM

You observed that the check on line 471 in tlb.cc did not seem to be the one
causing the fault in the case you were looking at.  It occurs to me that the
line 471 check is for a resident page.  If the page is not resident, some
other check would apply, and the fault might be raised when the OS examines
the PTE to determine what to do with a disallowed access to a non-resident
page.

Could that be the scenario you were looking at? That would indeed seem to be
more involved, though at the point where gem5 handles a non-resident page
(one not in the TLB) you might be able to check the PTE more directly. To do
that you would need to emulate walking the page tables (hoping that all the
relevant page-table pages are themselves resident).
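To make the idea of emulating the walk concrete, here is a toy C++ model. It is not gem5's walker (src/arch/x86/pagetable_walker.cc). The names walk, PhysMem and WalkResult are hypothetical, and the model assumes 4-level x86-64 paging with 4 KiB pages, tracking only the Present and U/S bits. It illustrates that a non-present entry at an upper level, such as an all-zero PML4 entry, ends the walk with a not-present fault before any U/S bit further down is consulted. The U/S check only matters once every level is present.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical toy model of a 4-level x86-64 page walk (PML4 -> PDPT -> PD -> PT)
// with 4 KiB pages. Only Present (bit 0) and User/Supervisor (bit 2) are
// modeled; large pages, NX, reserved bits and A/D updates are ignored.
using Table = std::array<uint64_t, 512>;
using PhysMem = std::unordered_map<uint64_t, Table>;  // table base -> entries

constexpr uint64_t kPresent = 1ull << 0;
constexpr uint64_t kUser = 1ull << 2;
constexpr uint64_t kFrameMask = 0x000ffffffffff000ull;  // address bits 51:12

enum class WalkResult { Ok, NotPresent, Protection };

struct Walk {
    WalkResult result;
    int level;       // level where the walk stopped: 3 = PML4 ... 0 = PT
    uint64_t paddr;  // meaningful only when result == WalkResult::Ok
};

Walk walk(const PhysMem& mem, uint64_t cr3, uint64_t vaddr, bool inUser) {
    // This model only handles canonical 48-bit addresses.
    assert((vaddr >> 47) == 0 || (vaddr >> 47) == 0x1ffff);
    uint64_t table = cr3 & kFrameMask;
    bool userAllowed = true;  // user access needs U/S = 1 at every level
    for (int level = 3; level >= 0; --level) {
        uint64_t idx = (vaddr >> (12 + 9 * level)) & 0x1ff;
        auto it = mem.find(table);
        uint64_t entry = (it == mem.end()) ? 0 : it->second[idx];
        if (!(entry & kPresent)) {
            // Not present (e.g. an all-zero entry): the walk stops here,
            // before any lower-level U/S bit is ever looked at.
            return {WalkResult::NotPresent, level, 0};
        }
        userAllowed = userAllowed && (entry & kUser) != 0;
        table = entry & kFrameMask;
    }
    if (inUser && !userAllowed)
        return {WalkResult::Protection, 0, 0};  // mapped, but supervisor-only
    return {WalkResult::Ok, 0, table | (vaddr & 0xfff)};
}
```

If the PML4 slot covering a kernel address is zero, walk(...) returns NotPresent at level 3 regardless of how the kernel's own lower-level entries are marked.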

Yes, possibly a bit of a mess ...

EM

RG
reverent.green@web.de
Wed, Oct 25, 2023 10:58 AM

I have used more debug flags, which increased the execution time by a lot, but I got some new information out of it:

Addresses : var = 39b765b0, start = 198325b0, phys = 198325b0 (output in meltdown "reliability.c" code, after line 39)

O3CPU: Ticking main, O3CPU.

15059411234500: system.repeat_switch_cpus1.mmu.dtb: Translating vaddr 0x7ffe39b765b0.

15059411234500: system.repeat_switch_cpus1.mmu.dtb: In protected mode.

15059411234500: system.repeat_switch_cpus1.mmu.dtb: Paging enabled.

15059411234500: system.repeat_switch_cpus1.mmu.dtb: pageAlignedVaddr for lookup: 0x7ffe39b76000

15059411234500: system.repeat_switch_cpus1.mmu.dtb: Handling a TLB miss for address 0x7ffe39b765b0 at pc 0x401b34. <--- First a TLB miss

15059411234500: system.repeat_switch_cpus1: Scheduling next tick!

[...]

O3CPU: Ticking main, O3CPU.

15059411262000: system.repeat_switch_cpus1: Scheduling next tick!

15059411262500: system.repeat_switch_cpus1.mmu.dtb.walker: Got long mode PTE entry 0x00000019832067.

15059411262500: system.repeat_switch_cpus1.mmu.dtb: Translating vaddr 0x7ffe39b765b0.

15059411262500: system.repeat_switch_cpus1.mmu.dtb: In protected mode.

15059411262500: system.repeat_switch_cpus1.mmu.dtb: Paging enabled.

15059411262500: system.repeat_switch_cpus1.mmu.dtb: pageAlignedVaddr for lookup: 0x7ffe39b76000

15059411262500: system.repeat_switch_cpus1.mmu.dtb: Entry found with paddr 0x19832000, doing protection checks.

15059411262500: system.repeat_switch_cpus1.mmu.dtb: inUser = 1 | entry_user = 1 | badWrite = 0

15059411262500: system.repeat_switch_cpus1.mmu.dtb: Translated 0x7ffe39b765b0 -> 0x198325b0. <--- Translated virt to phys

[...]

O3CPU: Ticking main, O3CPU.

15059514670500: system.repeat_switch_cpus1.mmu.dtb: Translating vaddr 0xffff8800198325b0.

15059514670500: system.repeat_switch_cpus1.mmu.dtb: In protected mode.

15059514670500: system.repeat_switch_cpus1.mmu.dtb: Paging enabled.

15059514670500: system.repeat_switch_cpus1.mmu.dtb: pageAlignedVaddr for lookup: 0xffff880019832000

15059514670500: system.repeat_switch_cpus1.mmu.dtb: Handling a TLB miss for address 0xffff8800198325b0 at pc 0x402e09.

15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e09=>0x402e10).(1=>2) [sn:251369]

15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e10=>0x402e13).(0=>1) [sn:251370]

15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e13=>0x402e15).(0=>1) [sn:251371]

15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e15=>0x402e17).(0=>1) [sn:251372]

15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e15=>0x402e17).(1=>2) [sn:251373]

15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e15=>0x402e17).(2=>3) [sn:251374]

15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e17=>0x402e1e).(0=>1) [sn:251375]

15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251369] PC (0x402e09=>0x402e10).(1=>2)

15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251370] PC (0x402e10=>0x402e13).(0=>1)

15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251371] PC (0x402e13=>0x402e15).(0=>1)

15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251372] PC (0x402e15=>0x402e17).(0=>1)

15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251373] PC (0x402e15=>0x402e17).(1=>2)

15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251374] PC (0x402e15=>0x402e17).(2=>3)

15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251375] PC (0x402e17=>0x402e1e).(0=>1)

15059514670500: system.repeat_switch_cpus1: Scheduling next tick!

[...]

O3CPU: Ticking main, O3CPU.

15059514683000: system.repeat_switch_cpus1: Scheduling next tick!

15059514683500: system.repeat_switch_cpus1.mmu.dtb.walker: Got long mode PML4 entry 0x00000000000000.

15059514683500: system.repeat_switch_cpus1.mmu.dtb.walker: Raising page fault.

[...]

O3CPU: Ticking main, O3CPU.

15059514688500: Page-Fault: RIP 0x402e1e: vector 14: #PF(0x4) at 0xffff8800198325b0

15059514688500: system.repeat_switch_cpus1: Scheduling next tick!

>>> This is a snippet of the debugging output.
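The raw values in this snippet can be decoded by hand using the x86-64 paging-entry and page-fault error-code layouts from the Intel SDM. The helpers below are hypothetical, and they assume that the value printed in #PF(...) is the architectural error code. Under that assumption, #PF(0x4) is a user-mode read of a not-present page rather than a U/S protection violation. The leaf entry 0x00000019832067 is present, writable and user-accessible (consistent with entry_user = 1), and the all-zero PML4 entry has its Present bit clear.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Decodes the low three bits of the x86 page-fault (#PF, vector 14) error code.
std::string describePfErrorCode(uint32_t code) {
    std::string s = (code & 0x1) ? "protection violation" : "not-present page";
    s += (code & 0x2) ? ", write" : ", read";
    s += (code & 0x4) ? ", user mode" : ", supervisor mode";
    return s;
}

// Selected fields of a 64-bit x86 paging-structure entry.
struct PteFlags {
    bool present;    // bit 0
    bool writable;   // bit 1 (R/W)
    bool user;       // bit 2 (U/S)
    bool accessed;   // bit 5
    bool dirty;      // bit 6 (meaningful in the leaf PTE)
    uint64_t frame;  // physical address bits 51:12
};

PteFlags decodePte(uint64_t e) {
    return {(e & 0x1) != 0, (e & 0x2) != 0, (e & 0x4) != 0,
            (e & 0x20) != 0, (e & 0x40) != 0, e & 0x000ffffffffff000ull};
}
```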

For more context, see https://github.com/IAIK/meltdown/blob/master/reliability.c (KASLR is disabled on the kernel command line of the gem5 full-system simulation).

  • First, the address is translated from virtual to physical without a problem (line 30).

  • Next, the code tries to access the translated kernel address (line 49). This seems to be where the problem lies: the access misses in the TLB, and the PageTableWalker then reads the PML4 entry 0x00000000000000 and raises a page fault.

  • My expectation (and goal) is that, during the read of the kernel address, the page table walk succeeds all the way down to the page table entry (PTE).
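The addresses in the log fit a fixed direct-map offset: 0xffff8800198325b0 = 0xffff880000000000 + 0x198325b0. The sketch below assumes that base (taken from the log, not from the guest kernel's configuration) and computes which paging-structure index each level uses for a 48-bit virtual address. The helper names are hypothetical. With it, the user buffer 0x7ffe39b765b0 falls in PML4 slot 255, while its kernel alias falls in slot 272. The kernel access therefore depends on a PML4 entry that the earlier user-space translation never read.

```cpp
#include <cassert>
#include <cstdint>

// Assumed direct-map base, inferred from the log (0xffff8800198325b0 - 0x198325b0).
constexpr uint64_t kDirectMapBase = 0xffff880000000000ull;

uint64_t physToDirectMap(uint64_t paddr) { return kDirectMapBase + paddr; }

// Index into the paging structure at `level` (3 = PML4, 2 = PDPT, 1 = PD, 0 = PT)
// for a 48-bit virtual address with 4 KiB pages: 9 index bits per level.
unsigned tableIndex(uint64_t vaddr, int level) {
    assert(level >= 0 && level <= 3);
    return static_cast<unsigned>((vaddr >> (12 + 9 * level)) & 0x1ff);
}
```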

Now I have a few questions:

  1. After the TLB miss at tick 15059514670500, the CPU removes many committed instructions, starting at the PC where the miss occurred. Why are these instructions committed even though a page fault is being raised?

  2. Does anyone have an idea why the page fault already occurs at the PML4 entry level, and why this entry is 0x0?
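Regarding question 1, the usual out-of-order model is that a fault detected during execution is only recorded with the instruction and acted upon at commit, so older instructions still retire normally. The sketch below is a hypothetical toy reorder buffer, not gem5's O3 commit stage, illustrating that ordering. In the log, the last removed instruction is PC (0x402e17=>0x402e1e), and 0x402e1e is exactly the RIP reported in the page-fault message. That would be consistent with these being older instructions that committed before the faulting access; this reading is an assumption, not something confirmed in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical, heavily simplified reorder buffer (ROB); not gem5's O3 code.
// A fault found during execution is only recorded on the instruction and is
// acted upon when that instruction reaches the head of the ROB.
struct Inst {
    uint64_t seqNum;  // program order, like the [sn:...] numbers in the log
    bool executed;
    bool faulted;     // set at execute time, e.g. by a failed translation
};

struct CommitResult {
    std::vector<uint64_t> committed;  // older instructions, retired in order
    std::vector<uint64_t> squashed;   // faulting instruction and all younger ones
};

CommitResult commit(std::deque<Inst>& rob) {
    CommitResult r;
    while (!rob.empty() && rob.front().executed) {
        const Inst& head = rob.front();
        if (head.faulted) {
            // Take the trap: nothing from here on becomes architecturally visible.
            for (const Inst& i : rob) r.squashed.push_back(i.seqNum);
            rob.clear();
            break;
        }
        assert(r.committed.empty() || r.committed.back() < head.seqNum);
        r.committed.push_back(head.seqNum);
        rob.pop_front();
    }
    return r;
}
```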

Thank you again in advance. I would be very happy if someone could help or clarify this.

Kind regards,

Robin

Sent: Monday, October 9, 2023 at 14:29
From: "Yuan Yao via gem5-users" <gem5-users@gem5.org>
To: "The gem5 Users mailing list" <gem5-users@gem5.org>
Cc: "Yuan Yao" <yuan.yao@it.uu.se>
Subject: [gem5-users] Re: Squashing Instructions after Page Table Fault

Hi Robin,

The "Page-Fault" message is printed in the constructor of a fault, so setting a breakpoint on that line in gdb and moving up the stack frames can help.

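By the way, a page fault can also be generated during page walks (see https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/x86/pagetable_walker.cc#L491C22-L491C22). The faulty PTE is not inserted into the TLB. The PageTableWalker debug flag traces all of these events.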

Hope this helps.

Br,

Y.

On 10/9/23 13:37, reverent.green--- via gem5-users wrote:

Hey Eliot,

Thank you for your help. I experimented with the checks and was a bit surprised that the page fault does not seem to be raised after an unsuccessful user/supervisor check. After enabling the necessary debug flags and adding more debug statements to the code, I observed that the page fault is raised before the if-statement is entered, not after it. Here is a short snippet of my output:

14442496349500: system.repeat_switch_cpus5.mmu.dtb: inUser = 1 | entry_user = 1 | badWrite = 0 (Line 470)

14442496349500: system.repeat_switch_cpus5.mmu.dtb: Checks done! (Line 485)

14442496350000: system.repeat_switch_cpus5.mmu.dtb: inUser = 1 | entry_user = 1 | badWrite = 0

14442496350000: system.repeat_switch_cpus5.mmu.dtb: Checks done!

14442496361000: Page-Fault: RIP 0x402da9: vector 14: #PF(0x4) at 0xffff880019688110

14442496387000: system.repeat_switch_cpus5.mmu.itb: inUser = 1 | entry_user = 0 | badWrite = 1

14442496387000: system.repeat_switch_cpus5.mmu.itb: ***************************** If [Line 471]. *****************************************

14442496424000: system.repeat_switch_cpus5.mmu.dtb: inUser = 0 | entry_user = 0 | badWrite = 1

14442496424000: system.repeat_switch_cpus5.mmu.dtb: Checks done!

14442496464000: system.repeat_switch_cpus5.mmu.dtb: inUser = 0 | entry_user = 0 | badWrite = 1

14442496464000: system.repeat_switch_cpus5.mmu.dtb: Checks done!

I expected the page fault to be raised at line 476, but that does not seem to be the case.

For further context, my goal is to get this code (https://github.com/IAIK/meltdown/blob/master/reliability.c) working in gem5. Currently, "libkdump_read" (https://github.com/IAIK/meltdown/blob/master/libkdump/libkdump.c#L528) only returns 0 in gem5.

My guess is that I need to change much more than I initially thought. Based on Yuan's answer, I suspect that I also need to change things in the call chain that handles a fault. Can anyone confirm this?

Best regards,

Robin

Sent: Wednesday, October 4, 2023 at 17:00
From: "Eliot Moss via gem5-users" <gem5-users@gem5.org>
To: "The gem5 Users mailing list" <gem5-users@gem5.org>, yuan.yao@it.uu.se
Cc: reverent.green@web.de, "Eliot Moss" <moss@cs.umass.edu>
Subject: [gem5-users] Re: Squashing Instructions after Page Table Fault

On 10/4/2023 10:03 AM, reverent.green--- via gem5-users wrote:
> Hi Yuan,

> thank you very much for your detailed response. My understanding of the
> fault handling in gem5 is getting better and better. Using debug flags, I
> can trace the control flow during the execution of my code.

> I am currently inspecting tlb.cc in further detail, but I am still searching
> for the exact check for my problem. To make my question more specific:

> During an attempt to access kernel memory, the “user/supervisor” (U/S)
> page-table attribute is used to check whether the page belongs to kernel
> memory. If I try to access that memory, a page fault should be raised. I am
> looking for this specific check. My goal is to experiment with gem5 and to
> customize it. Currently, the instruction is not executed when a page fault
> is raised. As a first step, I want to change the check so that the
> instruction is executed even though it accesses kernel memory. So I am
> specifically looking for this check in the call chain of the page-fault
> handling.

> Thank you very much in advance.

> Best regards

> Robin

Assuming we're talking about the x86 architecture, line 471 in tlb.cc is where
the check in question happens:

https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/x86/tlb.cc#L471

Note that the raw bits of the PTE have been abstracted out in the gem5 TLB
entry data structure, hence properties such as entry->user.
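To illustrate this abstraction, the sketch below decodes a raw PTE into a few named fields and applies a protection check shaped like the inUser / entry_user / badWrite values printed in this thread. It is a simplified, hypothetical model: the struct and function names are not gem5's, and the badWrite rule (read-only page, written from user mode or with CR0.WP set) is an assumption that is consistent with the printed values; tlb.cc at the linked commit has the real logic. Because such a check only runs on an entry that is already in the TLB, a fault raised during the page walk never reaches it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded TLB entry: raw PTE bits become named fields,
// similar in spirit to gem5's entry->user (field names are illustrative).
struct TlbEntry {
    uint64_t paddr;  // page frame address, bits 51:12 of the PTE
    bool writable;   // R/W, bit 1
    bool user;       // U/S, bit 2
};

TlbEntry fromRawPte(uint64_t pte) {
    assert(pte & 0x1);  // only present translations ever get cached
    return {pte & 0x000ffffffffff000ull, (pte & 0x2) != 0, (pte & 0x4) != 0};
}

// Assumed protection rule, shaped like the thread's debug output
// (inUser | entry_user | badWrite); not copied from gem5's tlb.cc.
bool protectionFault(const TlbEntry& e, bool inUser, bool isWrite, bool cr0Wp) {
    bool badWrite = !e.writable && (inUser || cr0Wp);
    return (inUser && !e.user) || (isWrite && badWrite);
}
```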

HTH

Eliot Moss


gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org

CAUTION: Do not click on links or open attachments unless you recognise the sender and know the content is safe.

E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here: http://www.uu.se/en/about-uu/data-protection-policy