I hadn't upgraded in like a month, so there were hundreds of package changes between the Fedora versions. I "manually bisected" by installing different Fedora versions and seeing if they booted or not. The version numbers are based on year-month-day, so I installed different days' versions. I did this using:
sudo rpm-ostree deploy 40.20240617.0
Eventually I found out that 40.20240616.0 booted and 40.20240617.0 did not boot, so it must have been due to package changes in the June 17th version.
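The manual bisection described above can be sketched as a small helper that picks the next build date to try. This is only a sketch under assumptions, not necessarily the commands used here: the `midpoint_date` name is made up for illustration, it relies on GNU `date`, and it assumes a build with a `.0` suffix exists for the chosen day, which may not always be true.

```shell
# Print the date halfway between a known-good and a known-bad build date.
# Dates are YYYYMMDD, matching the 40.YYYYMMDD.0 version scheme.
# Returns 1 when the two builds are adjacent (nothing left to bisect).
# Hypothetical helper; requires GNU date. Uses global variables for simplicity.
midpoint_date() {
    good_s=$(date -u -d "$1" +%s) || return 2
    bad_s=$(date -u -d "$2" +%s) || return 2
    days=$(( ($bad_s - $good_s) / 86400 ))
    [ "$days" -gt 1 ] || return 1
    date -u -d "$1 +$(( days / 2 )) days" +%Y%m%d
}

# Example (20240520 is a hypothetical known-good date):
#   sudo rpm-ostree deploy "40.$(midpoint_date 20240520 20240624).0"
# Reboot, note whether it boots, then repeat with the narrowed range.
```

Each round halves the range, so a month of daily builds takes about five reboots to narrow down.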
I was doing this on June 24th, one week after June 17th. Since the June 24th version still did not boot, I figured that the problem was likely to be niche or only affect specific systems. If it were making everybody's computers unbootable, it probably would have been resolved or reverted upstream already. A more niche problem is less likely to have been solved.
I checked the package changes between 40.20240616.0 and 40.20240617.0. The only package changed was the Linux kernel, which was upgraded from 6.8.11 to 6.9.4.
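One way to check package changes between two versions, sketched under assumptions: save a package list from each deployment (for example the output of `rpm -qa`) and compare the two files. The `changed_packages` helper and the file names are hypothetical. rpm-ostree also has a `db diff` subcommand for comparing commits, which may be more convenient if both versions are still available locally.

```shell
# Print the lines that differ between two package lists.
# Each argument is a text file with one package per line.
# Lines only in the old list are unindented; lines only in the new list
# are prefixed with a tab (this is how `comm -3` formats its output).
changed_packages() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    # comm requires sorted input.
    sort "$1" > "$old_sorted"
    sort "$2" > "$new_sorted"
    comm -3 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}

# Example with hypothetical file names:
#   changed_packages packages-0616.txt packages-0617.txt
```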
I'm probably not the only person with this problem, so I started searching the web to figure out if other people had problems with the June 17th version or with kernel 6.9.4. I found at least 4 problems reported with this version.
Since the 39.20240617.0 and 40.20240617.0 updates for Atomic Desktops and the 40.20240617.0 update for IoT, systems with Secure Boot enabled may fail to boot if they have been installed before Fedora Linux 40.
Oh, that sounds like my problem! I'm using an Atomic Desktop that I installed before Fedora 40, and that's the exact version that doesn't boot for me!
I followed the instructions in some Fedora Discussion threads. These threads have since been summarised into this article on Fedora Magazine. The solution involves manually copying some files into the EFI partition. I dutifully did this and rebooted. However, there was no change in behaviour. I still saw the black screen like before.
I thought I was just bad at following instructions and had done it wrong. The other workaround is to turn off Secure Boot, so I went into the BIOS and found the option. It was already off. I guess this problem didn't apply to me in the first place. Oops.
Encrypted file system not mounted on boot? cryptsetup issues? Known issue in SELinux-policy
Oh, that sounds like my problem! My root file system is encrypted, so if there was a new issue where it wasn't getting decrypted on boot, my computer wouldn't be able to do anything at all. The theory here is that initramfs, loaded from unencrypted /boot, normally displays the decryption password prompt. A new Linux kernel version would have a new initramfs, and maybe the new one is broken and not displaying the password prompt. Sure, seems plausible.
This points to a Red Hat Bugzilla entry with a bunch of logs and suggested commands that I don't understand. As it seemed to be related to SELinux, I tried setting SELinux to permissive (i.e. not enforcing the potentially broken security policies) but that didn't change the behaviour at all.
There's a Reddit post talking about how their computer didn't boot after upgrading kernel 6.8.11 to 6.9.4, and how this wasn't related to the SELinux issue. Oh, that sounds like my problem! I have all of those things! Except... that I don't use Nvidia. So maybe not my problem. Well, no harm in trying anyway.
I tried various video kernel parameters including removing quiet rhgb, adding nomodeset, and adding rd.debug, but none of these changed the behaviour at all. It was still a black screen after picking the entry in Grub.
Oh, that sounds like my problem! I'm using amdgpu and kernel 6.9. The thread doesn't seem to have a definitive solution though, so even if this was my problem, I can't do anything about it...
I went to the Koji build server and downloaded the RPM package files for kernel, kernel-core, kernel-modules, kernel-modules-core, and kernel-modules-extra 6.8.11. I then updated to the latest Fedora Kinoite release and installed the older kernel over that before rebooting. I used these commands:
sudo ostree deploy fedora:fedora/40/x86_64/kinoite
sudo rpm-ostree update
sudo rpm-ostree override replace kernel*6.8.11-300.fc40.x86_64.rpm
Then I rebooted, and this booted up fine. OK, so it's DEFINITELY a kernel change that caused this problem, it wasn't another component of Fedora.
Or was it Fedora? After all, when I search online for just kernel 6.9 not booting, most of the links point back to Fedora in some way. There's only one true way to test this. I could boot a different distro that has kernel 6.9 and see what happens. Ideally I don't want to install that distro on my computer, so I'd rather just boot a live USB. Since I can't do package upgrades on a live USB, I'll need to find a recent live USB that already has kernel 6.9 built into it. I checked DistroWatch and found that EndeavourOS got a new release on June 30th and includes kernel 6.9.6 - sounds great.
I created a live USB using Fedora Media Writer and booted from it. Black screen. OK, this is TOTALLY the kernel's fault, not the distro's.
But what can I do about it? While ostree rollback or the kernel 6.8.11 manual override keep my computer usable, those probably aren't sustainable solutions.
Also, isn't it weird
As mentioned before, I had already tried a BIOS update and switching GPU, neither of which helped. But clearly the new kernel is working for other people's computers. So I guess it's hardware specific?
Indeed, I plugged the USB into another computer and it booted just fine there.
OK, so, what have we learned? What can I even try next? It's definitely an interaction between my computer hardware and changes in kernel 6.9. Well if it doesn't work on my computer, does it work in a virtual machine on my computer?
I download GNOME Boxes and select the same EndeavourOS ISO that I had written to the live USB. Boxes says that KVM isn't enabled. I think KVM is a BIOS setting relating to virtualisation? After looking up what that option is called in my BIOS, I find that it's called SVM Mode, tucked away in the Advanced CPU Frequency submenu. I turn on the option.
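For reference, whether KVM is usable can usually be checked from a running system without opening Boxes. A minimal sketch, assuming the common Linux behaviour that the `/dev/kvm` device node only exists once the KVM kernel module has initialised successfully (which typically fails when virtualisation is disabled in firmware). The `kvm_status` name is hypothetical, and the path argument exists only to make the function easy to try out.

```shell
# Report whether the KVM device node is present.
# Takes an optional path so the check can be exercised against any file;
# defaults to /dev/kvm. Hypothetical helper for illustration.
kvm_status() {
    dev=${1:-/dev/kvm}
    if [ -e "$dev" ]; then
        echo "available"
    else
        echo "unavailable"
    fi
}

# Example:
#   kvm_status
```

If this reports unavailable on a CPU that supports virtualisation, the firmware setting (SVM Mode on AMD boards) is a likely place to look.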
I don't know why, but before going back into my regular OS to try setting up the virtual machine again, I decide to try the live USB. Just in case, I guess. I plug it in. It boots.
What.
I go back to my main Fedora installation. I update and remove my kernel override.
sudo rpm-ostree update
sudo rpm-ostree override reset -a
I reboot. Selecting the new OSTree option in Grub, I hold my breath.
It boots.
» uname -a
Linux glimmer 6.9.8-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jul 5 16:20:11 UTC 2024 x86_64 GNU/Linux
What? SVM is the virtualisation option. Did that really make the difference? I go back to BIOS and turn off SVM Mode. The new OSTree entry does not boot. I turn on SVM Mode. It boots again.
Okay... I have no idea why it's doing this. So when I tried the live USB on that other computer and it worked fine, I guess the other computer did have SVM Mode enabled already?
I go back to the other computer and dig through the ASUS-brand BIOS. SVM Mode: Disabled.
Then how did it...
You know what, I don't care. I will never know. I'm happy kernel 6.9 works on my computer. Please just never do that again.
— Cadence