Virtual System fails to start on ESXi with “no space left on the device” message.

  • Category : VMware

In very rare situations a VM can fail to start because of a problem with its swap file.

Basically, VMware's philosophy is to maintain a swap file for each virtual machine. This file is used during memory contention on the ESXi host. The hypervisor has to guarantee that your system stays up and running even when the other reclamation techniques cannot reclaim enough memory for the VM. Using the swap file always slows the system down; how much depends on your scenario, but it usually hurts performance badly.

The swap file is named vmname-randomString.vswp. By default its size equals the VM's configured memory, and it can be reduced by setting a memory reservation for the VM: the file covers only the unreserved part. If you configure a full memory reservation, the swap file will be empty. With a 20% memory reservation, the swap file will be 80% of the configured memory, and so on.
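The sizing rule above can be written down as a small calculation. This is only a sketch of the arithmetic described in this post; the function name is made up for illustration, and it ignores any additional per-VM overhead files the host may create.

```python
def expected_swap_size_mb(configured_memory_mb: int, reservation_mb: int = 0) -> int:
    """Return the expected .vswp size: configured memory minus reservation.

    Hypothetical helper that mirrors the rule described in the text.
    """
    if configured_memory_mb < 0 or reservation_mb < 0:
        raise ValueError("memory values must not be negative")
    if reservation_mb > configured_memory_mb:
        raise ValueError("reservation cannot exceed configured memory")
    return configured_memory_mb - reservation_mb


if __name__ == "__main__":
    # A 10 GB VM with a 20% (2 GB) reservation leaves 80% to be backed by swap.
    print(expected_swap_size_mb(10240, 2048))
    # A full reservation leaves nothing for the swap file.
    print(expected_swap_size_mb(4096, 4096))
```

This is the amount of free space the swap location needs at power-on, which is why a full reservation is a quick way to test whether swap space is the problem.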

Additionally, the swap file can be placed in different locations. By default it is stored with the rest of the virtual machine's files. From my observation this is usually the best place because it limits the points of failure: we depend on just one datastore to keep the system running instead of several datastores that can fail or fill up.

Another option is to keep swap files on a dedicated datastore (specified at the host or cluster level). This way you can keep things in order (e.g. for reporting purposes) and dedicate the fastest storage to swap. The opposite approach is to use low-cost storage if we expect never to use the swap file but do not want to reserve memory because of the licensing model (in some scenarios, VMware can charge you by the memory you are using).

The third option is to configure the swap location in the virtual machine's configuration file (.vmx, as an advanced option):

sched.swap.dir = /vmfs/volumes/datastore_name/dir_name
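If you prefer to prepare the change outside the client, the edit can be sketched as a small text transformation. This is a hedged illustration, not a VMware tool: set_vmx_option is a hypothetical helper, it assumes the usual key = "value" line format of .vmx files, and the VM should be powered off before its configuration file is modified.

```python
def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with `key` set to `value`, adding the line if missing.

    Hypothetical helper; assumes one key = "value" entry per line.
    """
    new_line = f'{key} = "{value}"'
    result = []
    replaced = False
    for line in vmx_text.splitlines():
        name = line.split("=", 1)[0].strip()
        if name.lower() == key.lower():
            # Replace the first occurrence and drop any duplicates.
            if not replaced:
                result.append(new_line)
                replaced = True
        else:
            result.append(line)
    if not replaced:
        result.append(new_line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    original = 'memSize = "4096"\n'
    # The path below is the placeholder from the text, not a real datastore.
    print(set_vmx_option(original, "sched.swap.dir",
                         "/vmfs/volumes/datastore_name/dir_name"))
```

Work on a copy of the .vmx file and keep the original, so the change is easy to revert if the VM still does not start.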

So, back to the topic: there can be a situation in which you are not able to start a virtual system because of a swap issue. This can be the result of a general problem with the datastore that holds the swap file, or of a lack of free space on it.

In the log file you can see something similar to this:

  • Could not power on VM : Invalid metadata. Failed to power on VM.
  • Could not power on VM : No space left on device
    Failed to power on VM.
    Failed to extend swap file from 0 KB to yyyyy KB.
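To spot these messages quickly in a long log, a simple scan is enough. The sketch below only searches for the message fragments quoted above; the function name and the marker list are assumptions for illustration, and real logs may word the errors differently.

```python
# Message fragments taken from the log excerpt above.
SWAP_ERROR_MARKERS = (
    "No space left on device",
    "Failed to extend swap file",
    "Invalid metadata",
)


def find_swap_errors(log_lines):
    """Return (line_number, marker) pairs for lines that contain a known marker."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        lowered = line.lower()
        for marker in SWAP_ERROR_MARKERS:
            if marker.lower() in lowered:
                hits.append((number, marker))
                break  # report each line once
    return hits


if __name__ == "__main__":
    sample = [
        "Could not power on VM : No space left on device",
        "Failed to power on VM.",
        "Failed to extend swap file from 0 KB to yyyyy KB.",
    ]
    for number, marker in find_swap_errors(sample):
        print(f"line {number}: {marker}")
```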

Possible solutions in this situation:

  1. Try configuring a full memory reservation for the affected virtual system (just for testing) and start the VM.
  2. Try reconfiguring the swap location using the sched.swap.dir parameter.
  3. Try migrating the VM to another datastore. I was once in a situation where options 1 and 2 did not work, and only migrating the system allowed it to start. In the end it turned out that the VMFS filesystem was broken, and we had to migrate all VMs running on the affected datastore.
