BSOD ntoskrnl.exe IRQL_NOT_LESS_OR_EQUAL

Channji

Member
Just take a restore point then.

If it's BSODing that often, there's something else I'd like you to try. Can you do one of the following:

Either go into the BIOS and disable C-States completely.

Or, in the active Windows power plan, expand the Processor Power Management section and set both the minimum and maximum processor state to 99%.
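If you'd rather make the power-plan change from the command line, Windows' built-in powercfg tool can probably do the same thing. This is a minimal sketch, not a tested tool. It assumes the standard powercfg aliases SUB_PROCESSOR, PROCTHROTTLEMIN and PROCTHROTTLEMAX (check them with powercfg /aliases on your own machine). It only prints the commands unless you run it on Windows with --apply, which is a flag invented for this sketch.

```python
"""Sketch: build powercfg commands that pin min/max processor state to a percentage.

Assumes the standard powercfg aliases SUB_PROCESSOR, PROCTHROTTLEMIN and
PROCTHROTTLEMAX; verify them with `powercfg /aliases` before relying on this.
"""
import subprocess
import sys


def build_powercfg_commands(percent: int = 99) -> list[list[str]]:
    """Return powercfg argument lists that set both processor-state limits."""
    if not 0 <= percent <= 100:
        raise ValueError("processor state percentage must be between 0 and 100")
    commands = []
    for setting in ("PROCTHROTTLEMIN", "PROCTHROTTLEMAX"):
        # AC = plugged in, DC = on battery (irrelevant on a desktop, but harmless).
        for verb in ("/setacvalueindex", "/setdcvalueindex"):
            commands.append(
                ["powercfg", verb, "SCHEME_CURRENT", "SUB_PROCESSOR", setting, str(percent)]
            )
    # Re-apply the active plan so the new values take effect.
    commands.append(["powercfg", "/setactive", "SCHEME_CURRENT"])
    return commands


if __name__ == "__main__":
    # "--apply" is this sketch's own (hypothetical) flag; without it we only print.
    apply = "--apply" in sys.argv and sys.platform == "win32"
    for cmd in build_powercfg_commands(99):
        print(" ".join(cmd))
        if apply:
            # May need an administrator prompt depending on how the plan is owned.
            subprocess.run(cmd, check=True)
```

Printing first and applying only on request lets you compare the commands with what the Power Options dialog shows before anything changes.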

Both of these steps will stop the processors from entering a low power state when idle. Most of your dumps show failures coming out of an idle state, so let's see whether this stops them.
I've taken a restore point and have Driver Verifier running now. I'll see if I can get one of the BSOD codes you mentioned.

I'll try your suggestion too, thanks!
 

Channji

Member
I've been using your suggestion of setting the minimum and maximum processor state to 99% and haven't had a BSOD yet. I was wondering, would it be an idea to revert the change, since we want it to BSOD while Driver Verifier is running? Also, could this change actually be a solution to the problem I've been experiencing, or is it just sort of putting a blanket over it?
 

ubuysa

The BSOD Doctor
Leave the power settings at 99%. If you have a flaky driver then Driver Verifier will find it regardless. After 48 hours with no BSODs you can disable Driver Verifier. However, do be sure to load every game, use every device and open every app, because drivers are only checked when they are loaded.

You'll need to run with this power change for about a week with no BSODs to be sure that it has resolved them. However, this is a workaround, not a solution. Your blanket analogy is a good one. If this was the problem, and we don't know that yet, then the cause is a flaky CPU that doesn't handle the power transition out of an idle C-State well. I've seen this several times before, more often with AMD CPUs but also on Intel CPUs.

Don't jump the gun though; we need much more run time to have any confidence that this has stopped the BSODs. It's possible that the BSODs will return when you disable Driver Verifier. With some flaky drivers, the extra checking that Driver Verifier does masks the flakiness of the driver and all seems well. When you disable Driver Verifier, the flakiness returns. I've seen that happen before as well.
 

Channji

Member
Okay, I'll leave everything as it is at the moment. Still no BSOD yet!

With the processor running close to maximum, if this turns out to be the solution, would it be sustainable? Won't the CPU die sooner?
 

ubuysa

The BSOD Doctor
The power settings don't control how hard the CPU works, only how much power it uses. What you've done is to tell the CPU to stay in the C0 (running) power state permanently. That will use a tad more power and generate a tad more heat but it shouldn't noticeably shorten the life of your CPU.

That said, since this is a PCS build that's still in warranty, IF we determine that low power C-States are the problem, then I would advise contacting PCS for an RMA of the CPU. That would involve an RMA of the whole PC, because I doubt they'd let you change the CPU. BUT WE'RE NOT THERE YET!
 

Channji

Member
I understand. Following your previous reply, I'll keep Driver Verifier on until Monday morning and post an update. Thank you!!
 

Channji

Member
I had no BSODs whatsoever on Saturday, Sunday and Monday, and was using the PC as I normally would. I've had Driver Verifier turned off since Monday morning and was using the PC pretty much all day Monday with no issues. Yesterday I was away from the computer for a few hours; usually I would turn it off, but I didn't on this occasion. When I came back I noticed it had rebooted, so I checked the minidump folder and a new dump had been created. I'm not sure whether a minidump is only written when a BSOD happens, but I have run the Sysnative program again: SysnativeFileCollectionApp.zip
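To see when dumps were written, you can list the dump folder by date. This is a minimal sketch that assumes the default small-dump location, %SystemRoot%\Minidump. That location can be changed under Startup and Recovery, and reading it may need an administrator prompt. It lists the .dmp files newest first.

```python
"""Sketch: list crash dump files, newest first.

Assumes the default small-dump folder %SystemRoot%\\Minidump; pass another
folder on the command line if your Startup and Recovery settings differ.
"""
import os
import sys
from datetime import datetime
from pathlib import Path


def list_minidumps(folder: Path) -> list[tuple[datetime, Path]]:
    """Return (modification time, path) pairs for *.dmp files, newest first."""
    if not folder.is_dir():
        return []
    dumps = [
        (datetime.fromtimestamp(p.stat().st_mtime), p)
        for p in folder.glob("*.dmp")
    ]
    return sorted(dumps, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    default = Path(os.environ.get("SystemRoot", r"C:\Windows")) / "Minidump"
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else default
    for when, path in list_minidumps(folder):
        print(f"{when:%Y-%m-%d %H:%M}  {path.name}")
```

Matching a dump's timestamp against the time of an unexpected reboot is a quick way to tie the two together.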
 

ubuysa

The BSOD Doctor
OneDrive seems to be having issues. I'll try and download the file later.

Are the power setting mitigations still active (C-States disabled, processor power both at 99%)?
 

Channji

Member
I never touched the C-States, but yes, the processor power remains at 99%.
 

ubuysa

The BSOD Doctor
Can you upload the Sysnative file output again to a different cloud service? OneDrive still reports problems for me on any browser.
 

ubuysa

The BSOD Doctor
Yep, got that. Thanks.

The most recent BSOD (16th April) is very similar to the earlier dumps in that the processor involved (processor 12) is coming out of idle. To explain what (I think) is happening I'll go into some detail. If you're not interested, skip to the Summary heading lower down. If I show you the call stack (the list of function calls leading up to the bugcheck, which you read from the bottom up), you'll see what was going on...
Code:
12: kd> knL
 # Child-SP          RetAddr               Call Site
00 fffff003`d694f258 fffff805`1f82e269     nt!KeBugCheckEx
01 fffff003`d694f260 fffff805`1f829705     nt!KiBugCheckDispatch+0x69
02 fffff003`d694f3a0 fffff805`1f8235a0     nt!KiPageFault+0x485
03 fffff003`d694f538 fffff805`1f6e1b08     nt!guard_dispatch_icall
04 fffff003`d694f540 fffff805`1f6e121b     nt!PpmIdleExecuteTransition+0x888
05 fffff003`d694f990 fffff805`1f81d3a4     nt!PoIdle+0x68b
06 fffff003`d694fb80 00000000`00000000     nt!KiIdleLoop+0x54
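Because the debugger prints the most recent call at the top, it can help to flip the stack into call order. Here is a small sketch that does this. Its parsing rule is an assumption based on the knL layout shown above: the call site is the last column, and the header row starts with #.

```python
"""Sketch: print a WinDbg knL stack in call order (oldest frame first)."""

STACK = """\
 # Child-SP          RetAddr               Call Site
00 fffff003`d694f258 fffff805`1f82e269     nt!KeBugCheckEx
01 fffff003`d694f260 fffff805`1f829705     nt!KiBugCheckDispatch+0x69
02 fffff003`d694f3a0 fffff805`1f8235a0     nt!KiPageFault+0x485
03 fffff003`d694f538 fffff805`1f6e1b08     nt!guard_dispatch_icall
04 fffff003`d694f540 fffff805`1f6e121b     nt!PpmIdleExecuteTransition+0x888
05 fffff003`d694f990 fffff805`1f81d3a4     nt!PoIdle+0x68b
06 fffff003`d694fb80 00000000`00000000     nt!KiIdleLoop+0x54
"""


def call_sites_in_call_order(stack_text: str) -> list[str]:
    """Extract the Call Site column and reverse it so the first caller comes first."""
    sites = []
    for line in stack_text.splitlines():
        fields = line.split()
        # Frame lines have a frame number, Child-SP, RetAddr and call site;
        # the header row starts with "#" and is skipped.
        if len(fields) >= 4 and fields[0] != "#":
            sites.append(fields[-1])
    return sites[::-1]


if __name__ == "__main__":
    # Indent each frame one step deeper than its caller.
    for depth, site in enumerate(call_sites_in_call_order(STACK)):
        print("  " * depth + site)
```

Read this way, the idle loop comes first and the bugcheck comes last, which is the order the explanation below walks through.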
You can see that we start in the idle loop (nt!KiIdleLoop+0x54 and then nt!PoIdle+0x68b), and then there's a processor power management call (nt!PpmIdleExecuteTransition+0x888) to transition the processor to the running state. The next function call, nt!guard_dispatch_icall, is part of the (relatively new) Control Flow Guard (CFG) feature. CFG is designed to protect against code injection, buffer overruns, and similar exploits in trusted modules by checking the target of each indirect call against a bitmap of valid call targets, and this is where the bugcheck actually happens...
Code:
TRAP_FRAME:  fffff003d694f3a0 -- (.trap 0xfffff003d694f3a0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=fffff8051f680120 rbx=0000000000000000 rcx=0000000000000001
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff8051f8235a0 rsp=fffff003d694f538 rbp=0000000000000000
 r8=0000000000000000  r9=ffffd083fafb3000 r10=0000000000000002
r11=0000000000000018 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up di ng nz ac pe cy
nt!guard_dispatch_icall:
fffff805`1f8235a0 4c8b1d59e39d00  mov     r11,qword ptr [nt!guard_icall_bitmap (fffff805`20201900)] ds:fffff805`20201900=????????????????
Resetting default scope
The trap frame above shows the failing instruction. You can see it's in the nt!guard_dispatch_icall function, and what we see is a MOV instruction copying the address of the CFG bitmap (held in the kernel variable nt!guard_icall_bitmap at fffff805`20201900) into register r11, but that memory location can't be read (????????????????). That's HOW the BSOD happened; the question is WHY.
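As a cross-check, the address in square brackets can be recomputed from the raw instruction bytes. The bytes 4c8b1d59e39d00 encode a RIP-relative MOV. In that encoding, the operand address is the address of the next instruction plus a signed 32-bit displacement. The sketch below handles only this one encoding: a REX.W prefix, opcode 8B, and a ModRM byte with mod=00 and r/m=101. It is not a general x64 decoder.

```python
"""Sketch: recompute the memory operand of the faulting MOV from its bytes.

Handles only REX.W 8B /r with a RIP-relative ModRM (mod=00, r/m=101).
"""
import struct

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_rip_relative_mov(insn: bytes, rip: int) -> tuple[str, int]:
    """Return (destination register, effective address) for the 7-byte MOV."""
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if not (0x40 <= rex <= 0x4F and rex & 0x08 and opcode == 0x8B):
        raise ValueError("not a REX.W MOV r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if (mod, rm) != (0, 5):
        raise ValueError("not a RIP-relative operand")
    dest = REG64[reg | (((rex >> 2) & 1) << 3)]  # REX.R extends the reg field
    disp32 = struct.unpack("<i", insn[3:7])[0]   # little-endian, signed
    next_rip = rip + 7                            # REX + opcode + ModRM + disp32
    return dest, (next_rip + disp32) & 0xFFFF_FFFF_FFFF_FFFF


if __name__ == "__main__":
    reg, addr = decode_rip_relative_mov(bytes.fromhex("4c8b1d59e39d00"), 0xFFFFF8051F8235A0)
    print(f"mov {reg}, qword ptr [{addr:#018x}]")
```

With the rip value from the trap frame this gives r11 and 0xfffff80520201900, which agrees with the nt!guard_icall_bitmap address the debugger shows.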

The entire Windows kernel is compiled with CFG enabled, and the kernel CFG bitmap is loaded at boot time. By the way, we know that it's the kernel CFG bitmap here, rather than a user-mode app's CFG bitmap, because the address is in kernel space (i.e. it starts with 0xFFFF). It seems that there has been a problem either with loading the CFG bitmap, with the RAM in which the bitmap is stored, or with the CFG feature itself.
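To make the CFG check more concrete, here is a deliberately simplified toy model. It uses a bitmap with one bit per 16-byte-aligned code address and consults that bitmap before every indirect call. The real Windows bitmap uses a different encoding, and the names CfgBitmap, guarded_call and BitmapUnreadable are hypothetical. The model shows one thing: if the bitmap itself can't be read, the check fails before the call target is ever examined. That is consistent with the trap frame above.

```python
"""Toy model of a Control Flow Guard-style indirect-call check (not the real layout)."""


class BitmapUnreadable(Exception):
    """Stands in for the page fault taken when the bitmap memory can't be read."""


class CfgBitmap:
    """One bit per 16-byte-aligned address; a set bit marks a valid call target."""

    def __init__(self, valid_targets):
        self.bits = 0
        for target in valid_targets:
            self.bits |= 1 << (target >> 4)

    def is_valid(self, target: int) -> bool:
        # Unaligned targets are rejected outright in this toy model.
        return (target & 0xF) == 0 and ((self.bits >> (target >> 4)) & 1) == 1


def guarded_call(bitmap, target, functions):
    """Check target against the bitmap, then make the indirect call."""
    if bitmap is None:
        # Equivalent of the failing MOV: the bitmap pointer itself can't be read.
        raise BitmapUnreadable("CFG bitmap not readable")
    if not bitmap.is_valid(target):
        raise PermissionError(f"invalid indirect call target {target:#x}")
    return functions[target]()


if __name__ == "__main__":
    funcs = {0x1000: lambda: "ok", 0x1010: lambda: "also ok"}
    bitmap = CfgBitmap([0x1000])
    print(guarded_call(bitmap, 0x1000, funcs))
    for bm, target in ((bitmap, 0x1010), (None, 0x1000)):
        try:
            guarded_call(bm, target, funcs)
        except (PermissionError, BitmapUnreadable) as exc:
            print(type(exc).__name__, exc)
```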

Summary
I'd suggest modifying some BIOS settings, if you're comfortable doing that:
  • If you are able to disable just processor #12, then do so and see how things go. Processor #12 is the active processor in ALL these BSODs.
  • I think your BIOS has the Intel EIST (Enhanced Intel SpeedStep Technology) control; if so, please disable it (it can drop the processors into an even lower power state when idle).
  • You might also try disabling the Race To Halt/Energy Efficient Turbo setting. This also controls CPU power saving settings.
I'm not sure whether it's a flaky CPU, flaky RAM, or just a corrupted Windows (CFG) system. My BIOS suggestions are intended to test whether it's a flaky CPU. The Memtest86 tests suggest that your RAM is probably good, but if none of the above helps, then do the following...
  • Remove one RAM stick for at least three days, or until you get a BSOD.
  • Then swap RAM sticks and run on just the other stick for at least three days, or until you get a BSOD.
This is because Memtest86 isn't perfect; no RAM tester can be.

TBH I still think this is a bad CPU, mainly because it's always processor #12 that fails and because the low power mitigations do seem to have helped.
 

Channji

Member
Hi, apologies for not replying; I haven't had much time to do anything this past week!

For the past couple of days I have been using one of your suggestions, disabling C-States, and haven't had a crash since. Before that, a friend recommended disabling Hyper-Threading; that just caused BSODs as soon as I logged in.

I'm afraid I was unable to find an option to disable a specific processor. I only found an option to disable a number of the P-cores and E-cores, I think? I might be misremembering the exact terminology there.

I'll give your suggestions here a try over the next couple of days! Thank you!!
 

ubuysa

The BSOD Doctor
Disabling C-States is a workaround, not a solution. If that has eliminated, or even significantly reduced, the BSODs, then that's a very strong indication that your CPU struggles with low power states. In that case, and in my opinion, you want it replaced under warranty. You'll have to RMA the entire PC for that, however. Give PCS a call, point them at this thread and ask to RMA the PC for a replacement CPU.
 