SSH access blocked- locked out of the system
I am completely locked out of my Azure servers.
My SSH port is open from the config. I also tried to run the troubleshooting config, and it only keeps telling me that my server is very much healthy.
But all attempts to log in to the system fail, through SSH using different tools, and also through the serial console.
I have tried logging in with username and password and using PEM and also tried resetting and regenerating both.
All efforts in vain.
Desperately in need of help. My server remains inaccessible and it happens for 2 of my servers.
-
Manish Deshpande 7,010 Reputation points • Microsoft External Staff • Moderator
Thanks for reaching out to Microsoft Q&A. We are looking into the issue and will get back to you shortly with an update.
-
Manish Deshpande 7,010 Reputation points • Microsoft External Staff • Moderator
Hello @Sharvil Khamkar
Thanks for the detailed context. This definitely sounds frustrating, especially when the standard reset steps aren't sticking. The fact that password resets aren't applying (even after regenerating) is a strong signal that the Azure VM Agent may not be running or responding on your VM. The VM Agent is what actually processes those reset extension commands behind the scenes. If it's unhealthy, resets will appear to succeed in the portal but never take effect on the OS.
Since SSH and serial console are both out, here's a practical path forward:
1. Use Run Command to check the VM from within (no SSH needed)
Even without SSH, Azure's Run Command feature lets you execute shell commands directly on the VM through the Azure fabric. Try this in the portal:
Azure Portal > Your VM > Run Command > RunShellScript
Run these commands to check agent and auth status:
systemctl status walinuxagent
journalctl -u walinuxagent -n 50 --no-pager
This will tell you whether the agent is alive and what's actually rejecting logins.
2. Check Boot Diagnostics for OS-level errors:
Head to Boot Diagnostics > Serial log under your VM's Support + Troubleshooting section. Look for filesystem errors, kernel panics, or PAM/SSH daemon messages. If the OS is failing to fully boot, that would explain why serial console also isn't letting you in.
https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/boot-diagnostics-overview
3. If the VM Agent is broken, use Azure VM repair to recover the disk:
If Run Command also fails, the cleanest path is to use the Azure Virtual Machine repair CLI extension. It automatically creates a recovery VM, attaches your OS disk, and lets you repair the filesystem or re-enable the agent offline without losing your data.
az extension add -n vm-repair
az vm repair create -g <ResourceGroup> -n <VMName> --repair-username <user> --repair-password <pass>
az vm repair restore -g <ResourceGroup> -n <VMName>
Run the restore command only after you have finished the fixes on the recovery VM, since it swaps the repaired disk back onto the original VM.
Since this is happening on 2 servers simultaneously, I'd also check whether a recent custom script extension, policy, or automation job ran across both VMs around the time this started. That would point to a config-level change rather than independent failures.
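For the serial log check in step 2, here is a minimal sketch of how a downloaded log could be filtered for those failure classes. The function name scan_boot_log and the pattern list are illustrative assumptions, not an official set of signatures, so treat a clean result as a hint rather than proof:

```shell
#!/bin/sh
# Sketch: flag serial-log lines matching the failure classes named in step 2
# (filesystem errors, kernel panics, PAM/SSH daemon messages).
# ASSUMPTION: the pattern list below is illustrative, not exhaustive.

scan_boot_log() {
  # Reads a serial log on stdin and prints matching lines with line numbers.
  # Exit status is grep's: 0 if at least one line matched, 1 if none did.
  grep -n -i -E 'kernel panic|ext4-fs error|xfs.*(corrupt|error)|emergency mode|no space left on device|pam_[a-z_]+\(|sshd\['
}

# Usage (needs the Azure CLI and boot diagnostics enabled on the VM):
#   az vm boot-diagnostics get-boot-log -g <ResourceGroup> -n <VMName> | scan_boot_log
```

If nothing matches, that only means none of these particular patterns appeared. It is still worth reading the tail of the log by hand.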
Let me know what the Run Command output shows and we can narrow it down from there. Happy to help you through the next steps.
Thanks,
Manish.
-
Sharvil Khamkar 0 Reputation points
Thanks Manish for writing back.
We do have similar jobs and automation scripts on other servers too, and they do not fail or crash.
The first simple "systemctl status walinuxagent" command stays at "Script execution in progress..." and does nothing, even when we use the "Azure Portal > Your VM > Run Command > RunShellScript" route.
We downloaded the boot logs, and they do not show anything besides the failed login attempts.
[User's image]
Here is also a quick snapshot of the config check performed within Azure.
-
Manish Deshpande 7,010 Reputation points • Microsoft External Staff • Moderator
Hello @Sharvil Khamkar
Thank you for the update. This is actually very helpful. The fact that Run Command is stuck at 'Script execution in progress...' is a critical clue. Run Command depends on the Azure VM Agent to relay scripts to the guest OS. If it hangs indefinitely, it almost certainly means the VM Agent (waagent) is completely unresponsive: not just unhealthy, but fully down. This also explains why password and key resets never actually applied, since those extensions go through the same agent.
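One way to confirm this from outside the guest is the agent status that the platform records in the VM's instance view. This is a rough sketch, assuming the Azure CLI is installed and logged in. The helper names are hypothetical, and the exact status strings (I'd expect "Ready" or "Not Ready") should be checked against what your subscription actually returns:

```shell
#!/bin/sh
# Sketch: read the VM Agent status recorded by the Azure platform,
# without going through the guest OS.
# ASSUMPTION: agent_status and classify_agent_status are illustrative
# helper names, and "Ready" is taken to be the healthy status string.

agent_status() {
  # $1 = resource group, $2 = VM name
  az vm get-instance-view -g "$1" -n "$2" \
    --query "instanceView.vmAgent.statuses[0].displayStatus" -o tsv
}

classify_agent_status() {
  # Maps the raw status string to a coarse verdict.
  case "$1" in
    Ready) echo "agent-ok" ;;
    "")    echo "agent-not-reporting" ;;
    *)     echo "agent-unhealthy" ;;
  esac
}

# Usage, once per affected VM:
#   classify_agent_status "$(agent_status <ResourceGroup> <VMName>)"
```

If both VMs come back as anything other than agent-ok, that supports the dead-agent theory. If they report Ready while Run Command still hangs, the problem is more likely inside the guest itself.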
1. Try Entra ID-based SSH (your screenshot shows this is already configured)
Since your VM has the Microsoft Entra ID SSH extension set up, you can try bypassing traditional password/key auth entirely using:
az ssh vm -n <VMName> -g <ResourceGroup>
This authenticates via your Azure AD/Entra ID token, completely independent of the OS password or PEM key. If the SSH daemon is running, this has a good chance of getting you in.
https://learn.microsoft.com/en-us/azure/active-directory/devices/howto-vm-sign-in-azure-ad-linux
2. Force restart the VM (different from a normal restart)
A standard restart still goes through the guest OS. A forced restart power-cycles the VM at the hypervisor level, which can revive a hung VM agent:
az vm restart -g <ResourceGroup> -n <VMName> --force
After this, wait 3-5 minutes and retry Run Command. If the agent comes back, you can reset credentials cleanly.
https://learn.microsoft.com/en-us/cli/azure/vm#az-vm-restart
3. Offline repair via az vm repair (guaranteed path if the above fail)
Since Run Command is dead, this is now your most reliable recovery option. It attaches your OS disk to a healthy repair VM so you can fix things without needing the agent at all.
Start with Option 1 (Entra ID SSH) since it's already configured on your VM and takes 30 seconds to try. If that doesn't work, go straight to Option 3, the offline repair. Given that both VMs are in the same state, something likely changed on the OS side (disk full, a failed update, or a script that inadvertently stopped waagent).
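When retrying Run Command after the forced restart in step 2, it may help to cap the wait so a dead agent does not leave the call hanging. A small sketch, assuming GNU coreutils `timeout` (which exits with status 124 on expiry); run_with_deadline is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: run any command with a deadline and report a coarse outcome.
# ASSUMPTION: GNU coreutils `timeout` is available; it returns 124 when
# the deadline is hit. run_with_deadline is an illustrative name.

run_with_deadline() {
  # $1 = deadline in seconds, remaining args = command to run.
  # Prints "ok", "timed-out", or "failed"; command output is discarded.
  _rwd_secs="$1"
  shift
  _rwd_rc=0
  timeout "$_rwd_secs" "$@" >/dev/null 2>&1 || _rwd_rc=$?
  case "$_rwd_rc" in
    0)   echo "ok" ;;
    124) echo "timed-out" ;;
    *)   echo "failed" ;;
  esac
}

# Usage (needs the Azure CLI; 300 seconds is an arbitrary choice):
#   run_with_deadline 300 az vm run-command invoke \
#     -g <ResourceGroup> -n <VMName> --command-id RunShellScript \
#     --scripts "systemctl status walinuxagent --no-pager"
```

A "timed-out" result after the forced restart suggests the agent is still down, which is the cue to move on to the offline repair.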
Thanks,
Manish.
-
Manish Deshpande 7,010 Reputation points • Microsoft External Staff • Moderator
I wanted to check if my last response made sense. I'd be glad to assist further or explain anything in more detail.
1 answer
-
Marcin Policht 92,630 Reputation points • MVP • Volunteer Moderator
Make sure to step through the full troubleshooting sequence described at https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection
If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.
hth
Marcin
-
Sharvil Khamkar 0 Reputation points
Thank you for your reply @Marcin Policht
I have already gone over all the troubleshooting steps mentioned in the article multiple times.
It simply doesn't work. I have tried restarting, redeploying, connecting through serial console and Azure CLI.
It only keeps me locked out by saying that the password is incorrect.
I have tried resetting the keys, configuration, and password, and tried logging in with each of the reset passwords to make sure we are using the right one, and nothing seems to work. We are completely locked out of the system.
