VMware Cloud Foundation Bringup – Resolving VCF_ERROR_INTERNAL_SERVER_ERROR

Overview

I was rebuilding my nested VCF lab when validation returned a Failed status after only four items. Well okay, sometimes that happens. I re-ran it and received the same error. I checked the JSON file for anything out of sorts and didn’t see anything, so I tried the Excel file instead. Same thing happened. I went as far as redeploying the Cloud Builder appliance (after checking the file hash value), but the same error occurred.

Log Time

Okay, what is going on here? Let’s review the log and see what’s up by tailing the in-flight validation.

tail -f /var/log/vmware/vcf/bringup/vcf-bringup.log
Collecting processing task errors: VCF_ERROR_INTERNAL_SERVER_ERROR for validation aggregation

I started scrolling backwards for more details and saw this line:

Error occurred while validating ESX host esxi-104.aaronrombaut.com
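
To list every affected host without scrolling, the log can be filtered for that message. This is only a minimal sketch: failed_hosts is a hypothetical helper name, and it assumes each matching log line contains the phrase exactly as shown above, immediately followed by the host name.

```shell
#!/bin/sh
# failed_hosts (hypothetical helper): print each ESX host named in a
# validation error, once, in sorted order.
# $1: path to the bring-up log
failed_hosts() {
    sed -n 's/.*Error occurred while validating ESX host \([^ ]*\).*/\1/p' "$1" | sort -u
}

# Example, run on the Cloud Builder appliance:
# failed_hosts /var/log/vmware/vcf/bringup/vcf-bringup.log
```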

I realized all of my hosts had the same message. Naturally, this led me to verify that the Cloud Builder appliance could even communicate with the hosts.

root@cb-254 [ /var/log/vmware/vcf/bringup ]# ping esxi-101.aaronrombaut.com
ping: esxi-101.aaronrombaut.com: Temporary failure in name resolution
root@cb-254 [ /var/log/vmware/vcf/bringup ]#

Ah ha! Found the issue…name resolution is not working.
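
Since every host reported the same error, it can help to test them all in one pass. The sketch below assumes getent is available on the appliance; getent hosts goes through the system's normal name-service lookup. check_resolution is a hypothetical helper name, and the example host names other than esxi-101 and esxi-104 are assumptions about the lab's numbering.

```shell
#!/bin/sh
# check_resolution (hypothetical helper): try to resolve each host name
# given as an argument and report the result per host.
# Returns non-zero if any name fails to resolve.
check_resolution() {
    failed=0
    for host in "$@"; do
        if getent hosts "$host" >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Example (esxi-102 and esxi-103 are assumed names):
# check_resolution esxi-101.aaronrombaut.com esxi-102.aaronrombaut.com \
#                  esxi-103.aaronrombaut.com esxi-104.aaronrombaut.com
```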

Troubleshooting

Let’s check the current configuration with resolvectl, the command-line tool for systemd-resolved, which is the service responsible for resolving DNS queries.

root@cb-254 [ /var/log/vmware/vcf/bringup ]# resolvectl status
Global
       Protocols: -LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: uplink

Link 2 (eth0)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Ok, let’s check /etc/resolv.conf as well.

root@cb-254 [ /var/log/vmware/vcf/bringup ]# cat /etc/resolv.conf
# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

# No DNS servers known.
search .

Yikes, “No DNS servers known”? Let’s check the vApp Options because surely we added those values when we deployed the appliance.

My VCF lab is managed by an external vCenter, so I navigate to the Cloud Builder appliance there and choose Configure > Settings > vApp Options. Scroll down in the main window to the Properties section and look for the following Keys.

  • guestinfo.domain
  • guestinfo.searchpath
  • guestinfo.DNS

While looking for DNS related keys, I also noticed the guestinfo.ntp key was not set!

Power down the Cloud Builder virtual machine and Set Values for the above settings. Once the values are set, power on the virtual machine and wait a few moments before connecting to the SSH console to test.

Welp, none of that worked! Guess we will have to force it.

Resolution

NTP

Let’s start with the NTP configuration.

root@cb-254 [ /home/admin ]# vi /etc/ntp.conf

tinker panic 0
restrict default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict -6 ::1
driftfile /var/lib/ntp/drift/ntp.drift

# Added by Aaron Rombaut
server 10.10.92.10 

Restart ntpd.service.

systemctl restart ntpd.service

Query the NTP configuration.

root@cb-254 [ /home/admin ]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.10.92.10     91.189.91.157    3 u   57   64    1    2.465  +395.17   0.000

Quick notes:

  • st is the stratum level of the remote host.
  • t is the type; u is for a unicast server.
  • offset is the time difference between the client and the server, in milliseconds. This should be close to zero (0).

If the time is off by more than 30 seconds, there will be issues with the bring-up. Wait for the offset to settle near zero, which indicates the client has synchronized with the server.
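
The wait can be scripted by reading the offset column. A minimal sketch, assuming the ten-column ntpq -p layout shown above; ntp_offset_ok is a hypothetical helper name, and the 30000 ms limit in the example simply mirrors the 30-second figure.

```shell
#!/bin/sh
# ntp_offset_ok (hypothetical helper): succeed only if at least one peer
# is listed and every peer's absolute offset is at or below the limit.
# stdin: output of `ntpq -p`; $1: maximum absolute offset in milliseconds
ntp_offset_ok() {
    awk -v max="$1" '
        NR > 2 {                       # skip the two header lines
            off = $9 + 0               # ninth column is the offset (ms)
            if (off < 0) off = -off
            if (off > max) bad = 1
            seen = 1
        }
        END { exit (seen && !bad) ? 0 : 1 }
    '
}

# Example: poll until the offset is within 30 seconds (30000 ms).
# until ntpq -p | ntp_offset_ok 30000; do sleep 10; done
```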


resolvectl (DNS Resolution)

Back in the day, we could simply edit the /etc/resolv.conf file for name resolution. Now, things are only slightly more complicated. The file is managed by systemd-resolved, so we configure DNS through the resolvectl command instead.

Let’s start by telling it which server or servers to query. The command below sets one; to add a second DNS server, list both addresses separated by a space.

resolvectl dns eth0 192.168.92.10

Next, let’s provide the domain, which becomes the search path. If a lookup is given only a hostname, this domain will be appended.

resolvectl domain eth0 aaronrombaut.com

Finally, let’s check our progress by viewing the /etc/resolv.conf file again.

root@cb-254 [ /home/admin ]# cat /etc/resolv.conf
# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 192.168.92.10
nameserver 192.168.92.11
search aaronrombaut.com
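
The same check can be scripted, for example before kicking off validation again. A minimal sketch; count_nameservers is a hypothetical helper name, and it defaults to /etc/resolv.conf when no file is given.

```shell
#!/bin/sh
# count_nameservers (hypothetical helper): print how many nameserver
# entries a resolv.conf-style file contains.
# $1: file to inspect (defaults to /etc/resolv.conf)
count_nameservers() {
    # grep -c prints 0 and returns non-zero when nothing matches, so the
    # exit status doubles as "is at least one server configured?"
    grep -c '^nameserver[[:space:]]' "${1:-/etc/resolv.conf}"
}

# Example:
# count_nameservers || echo "No DNS servers configured"
```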

Additionally, let’s ping a host by its fully qualified domain name again.

root@cb-254 [ /home/admin ]# ping esxi-101.aaronrombaut.com
PING esxi-101.aaronrombaut.com (172.16.11.101) 56(84) bytes of data.
64 bytes from esxi-101.aaronrombaut.com (172.16.11.101): icmp_seq=1 ttl=63 time=0.321 ms
64 bytes from esxi-101.aaronrombaut.com (172.16.11.101): icmp_seq=2 ttl=63 time=0.395 ms
^C
--- esxi-101.aaronrombaut.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.321/0.358/0.395/0.037 ms

Nice work! We can now go back to our bring-up and hopefully continue without any further issues.

