We have an Azure Linux Virtual Machine that has suddenly become inaccessible.
Symptoms:
- SSH connection to the VM times out.
- HTTP/HTTPS services hosted on the VM are not accessible.
- Azure Run Command hangs and does not complete.
- Redeploy + Reapply has already been attempted but did not resolve the issue.
- A snapshot of the OS disk has been taken before troubleshooting.
Network checks performed:
- VM power state is "Running".
- NSG inbound rules allow ports 22, 80, and 443.
- Public IP is attached to the VM.
- Connectivity from the internet still times out.
Serial Console / Boot Diagnostics:
- The Serial Console shows a Linux kernel panic with: "Kernel panic - not syncing: Attempted to kill init!"
Azure Agent logs also repeatedly report:
"WALinuxAgent launched with command 'python3 -u /usr/sbin/waagent -run-exthandlers' failed with exception: [Errno 2] No such file or directory: 'python3'"
Business impact:
- This VM hosts live production data and services.
- The application is currently unavailable to users.
- We need to recover the existing VM and data with minimal downtime and without data loss if possible.
We request assistance with:
- Determining the root cause of the kernel panic.
- Restoring access to the VM (SSH or console).
- Recovering the operating system and Azure VM Agent.
- Advising on the safest recovery procedure while preserving existing data.
2 answers
-
SUNOJ KUMAR YELURU 18,336 Reputation points • MVP • Volunteer Moderator
Hello @Vipul Om,
Thank you for reaching out on the Q&A forum.
Step 1: Check the VM Power State
In the Azure Portal → Virtual Machines → your VM → Overview, confirm the Status field.
Status | Meaning
Running | OS-level or network issue
Stopped (deallocated) | VM was stopped; just start it
Failed | Platform-level provisioning failure
Step 2: Open Boot Diagnostics (Serial Console)
Azure Portal → VM → Support + troubleshooting → Boot diagnostics → Serial log
This is the fastest way to determine the root cause:
What you see in the serial log | Likely cause
Kernel panic - not syncing | OS/kernel corruption; needs a rescue VM
GRUB menu stuck | Boot loader issue
Give root password for maintenance | fstab mount failure
Started Session / login prompt | OS booted fine; network/NSG issue
cloud-init errors | VM agent / provisioning failure
Blank / no output | Host-level issue or GPU/disk failure
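The lookup in the table above can be sketched as a small shell helper that scans a saved copy of the serial log. This is only an illustration: the function name `classify_serial_log`, the output labels, and the exact grep patterns are assumptions made for this sketch, and the "GRUB menu stuck" row is not covered because it has no single reliable log string.

```shell
# classify_serial_log FILE
# Print a coarse label for the first table row whose pattern appears in FILE.
# The labels and patterns are illustrative assumptions, not an official mapping.
classify_serial_log() {
  log_file="$1"
  if [ ! -s "$log_file" ]; then
    # Missing or empty file: corresponds to "Blank / no output"
    echo "blank-or-no-output"
  elif grep -q "Kernel panic - not syncing" "$log_file"; then
    echo "kernel-panic"
  elif grep -q "Give root password for maintenance" "$log_file"; then
    echo "fstab-mount-failure"
  elif grep -Eqi "cloud-init.*(error|fail)" "$log_file"; then
    echo "cloud-init-errors"
  elif grep -q "login:" "$log_file"; then
    echo "booted"
  else
    echo "unknown"
  fi
}
```

For example, after saving the serial log to a file (for instance with `az vm boot-diagnostics get-boot-log`, using your own resource group and VM name), `classify_serial_log serial.log` prints `kernel-panic` for the panic message quoted in the question.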
Step 3:
Kernel panic or filesystem errors → OS-level recovery needed
- Detach the OS disk and attach it to a rescue VM as a data disk.
- Run fsck, audit /etc/fstab, and check /boot for a missing initramfs.
- See the detailed recovery steps in my previous response above.
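The "/boot for missing initramfs" check can be sketched as follows. It assumes the broken OS disk's root filesystem is already mounted on the rescue VM at some path (for example a hypothetical `/mnt/rescue`) and that `/boot` is part of that mount. The function name `check_initramfs` is made up for this sketch; the file naming it looks for is the usual one (`initrd.img-<version>` on Debian/Ubuntu, `initramfs-<version>.img` on RHEL-family systems).

```shell
# check_initramfs ROOT
# For every kernel image under ROOT/boot, report whether a non-empty
# initramfs with the same version string exists next to it.
# Returns 1 if at least one kernel has no matching initramfs.
check_initramfs() {
  root_dir="${1:-/}"
  result=0
  for kernel in "$root_dir"/boot/vmlinuz-*; do
    # Skip the literal pattern when no kernel images are present
    [ -e "$kernel" ] || continue
    version="${kernel##*/vmlinuz-}"
    if [ -s "$root_dir/boot/initrd.img-$version" ] ||
       [ -s "$root_dir/boot/initramfs-$version.img" ]; then
      echo "OK $version"
    else
      echo "MISSING $version"
      result=1
    fi
  done
  return "$result"
}
```

Running `check_initramfs /mnt/rescue` prints one line per installed kernel, so a `MISSING` line for the default kernel would be consistent with the panic seen at boot.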
VM appears booted but SSH/HTTP still times out → Network issue
Check these in order:
- NSG rules: Does an inbound rule allow port 22 (SSH) or 80/443 from your source IP? (Not just "any")
- Azure Firewall / UDR: Is there a route table sending traffic to a firewall that may be blocking it?
- sshd service: Use Azure Run Command (if available) to check:
    systemctl status sshd
- Host-based firewall: Check iptables or firewalld rules inside the VM via Run Command.
VM was recently resized, updated, or a disk was attached → Configuration change issue
- Review the Azure Activity Log (Portal → VM → Activity log) for any recent changes.
- Check whether a failed apt upgrade / yum update broke a kernel or removed Python 3.
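One way to check whether Python 3 was removed is sketched below, assuming a Debian/Ubuntu image whose root filesystem is mounted on a rescue VM. Both function names are hypothetical. The second function relies on dpkg recording `remove` and `purge` actions in `/var/log/dpkg.log`; on yum/dnf-based images the package history lives elsewhere and this sketch does not apply.

```shell
# python3_present ROOT
# Succeeds if ROOT/usr/bin/python3 exists. The -L test also accepts a
# symlink whose absolute target only resolves from inside the mounted
# system, so a dangling link is reported as present too.
python3_present() {
  root_dir="${1:-/}"
  [ -x "$root_dir/usr/bin/python3" ] || [ -L "$root_dir/usr/bin/python3" ]
}

# find_python3_removals [DPKG_LOG]
# Print dpkg log lines that record removal or purge of python3 packages.
find_python3_removals() {
  dpkg_log="${1:-/var/log/dpkg.log}"
  [ -r "$dpkg_log" ] || { echo "cannot read $dpkg_log" >&2; return 2; }
  grep -E ' (remove|purge) python3[^ ]* ' "$dpkg_log"
}
```

With the disk mounted at a hypothetical `/mnt/rescue`, `python3_present /mnt/rescue || echo "python3 missing"` and `find_python3_removals /mnt/rescue/var/log/dpkg.log` together show whether the interpreter is gone and when a package operation removed it.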
If this answers your query, please click Accept Answer and Up-Vote. If you have any further questions, do let us know.
-
Léo 0 Reputation points
Hello,
The issue may come from a broken initramfs or from python3 not being installed (which can break initramfs).
You can try to access the VM via the Serial Console (which bypasses NSG rules and SSH timeouts), then apply these steps:
- While the VM is booting, repeatedly press ESC or Shift to bring up the GRUB menu.
- Select Recovery Mode if available; if not, select a previous kernel version.
- Fix the python3/initramfs installation:
    apt update
    apt install --reinstall python3 python3-minimal initramfs-tools
- Update initramfs:
    update-initramfs -u -f
    reboot
If the kernel panic still appears after trying Recovery Mode or a previous kernel in GRUB, the initramfs is most likely fully broken. In that case, create a rescue VM and repair the OS disk from there; the snapshot you took gives you a fallback if the repair goes wrong.
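For the rescue VM route, the same reinstall steps can be run inside a chroot on the attached disk. The sketch below only prints a command plan so it can be reviewed before anything is executed. The function name, the device path, and the mount point are hypothetical, and it assumes a Debian/Ubuntu image with `/boot` on the root partition (a separate `/boot` or `/boot/efi` partition would need extra mount lines). It uses `update-initramfs -u -k all` to rebuild the initramfs for every installed kernel.

```shell
# print_chroot_repair_plan DEVICE [MOUNTPOINT]
# Print (not run) the commands for repairing python3 and the initramfs
# from a rescue VM. DEVICE is the root partition of the attached OS disk.
# /run is bind-mounted so that DNS resolution works inside the chroot on
# images where /etc/resolv.conf points into /run.
print_chroot_repair_plan() {
  dev="$1"
  mnt="${2:-/mnt/rescue}"
  [ -n "$dev" ] || { echo "usage: print_chroot_repair_plan DEVICE [MOUNTPOINT]" >&2; return 2; }
  cat <<EOF
mkdir -p $mnt
mount $dev $mnt
mount --bind /dev $mnt/dev
mount --bind /proc $mnt/proc
mount --bind /sys $mnt/sys
mount --bind /run $mnt/run
chroot $mnt apt-get update
chroot $mnt apt-get install --reinstall -y python3 python3-minimal initramfs-tools
chroot $mnt update-initramfs -u -k all
umount $mnt/run
umount $mnt/sys
umount $mnt/proc
umount $mnt/dev
umount $mnt
EOF
}
```

A typical use would be `print_chroot_repair_plan /dev/sdc1 > plan.sh`, checking with `lsblk` that the device really is the attached OS disk's root partition, and then running the reviewed plan as root. After the plan completes, the disk can be detached and swapped back onto the original VM.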
