Issue fixed in latest vSAN 8 release

Note that the findings on the Entry Persistence Daemon below were first reported privately to VMware, and have been resolved in later vSAN 8 releases. The notes below come from the troubleshooting steps I took to work out the details. Hopefully they can help.

vSAN 8 Entry Persistence Daemon, epdDom and securityDom

After upgrading my vSAN cluster from ESXi 7U3.x to ESXi 8.0.0 (build 20513097) yesterday, the vSAN Health check reported an issue with “vSAN Daemon Liveness”: the EPD status was “Abnormal”. A quick check with /etc/init.d/epd status confirmed that EPD was not running.

When I tried to start the Entry Persistence Daemon service manually with /etc/init.d/epd start, the script printed "epd started", but re-checking with /etc/init.d/epd status showed that it was actually not running (see below).

[root@esx-11:~] vmware -vl
VMware ESXi 8.0.0 build-20513097
VMware ESXi 8.0 GA
 
[root@esx-11:~] date
Wed Nov 23 15:45:49 UTC 2022
 
[root@esx-11:~] /etc/init.d/epd status
epd is not running
 
[root@esx-11:~] /etc/init.d/epd start
INIT: EPD uses a ramdisk for the db file
INIT: Using /locker as persistent storage
INIT: Using existing EPD ramdisk at /epd.
INIT: EPD using Security domain:  ID:
epd started
 
[root@esx-11:~] /etc/init.d/epd status
epd is not running
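Since the init.d-script prints "epd started" even when the daemon exits right away, the start output alone cannot be trusted. Below is a minimal sketch of a start-then-verify check. It assumes only that the script's status output contains "is running" or "is not running", as shown above; start_and_verify is a hypothetical helper, not something shipped with ESXi.

```shell
# Hypothetical helper (not part of ESXi): start a service through its
# init.d-script, then poll "status" several times to confirm the daemon
# actually stays up instead of trusting the "started" message.
start_and_verify() {
   script="$1"          # e.g. /etc/init.d/epd
   tries="${2:-3}"      # number of status checks
   delay="${3:-2}"      # seconds to wait before each check

   # Exit status and output ignored on purpose: "epd started" is printed
   # even when the daemon dies immediately afterwards.
   "$script" start >/dev/null 2>&1

   i=0
   while [ "$i" -lt "$tries" ]; do
      sleep "$delay"
      status_out=$("$script" status 2>&1)
      case "$status_out" in
         *"is not running"*)      # test this first: it also contains "running"
            echo "FAIL: $status_out"
            return 1
            ;;
         *"is running"*)
            ;;                    # still up, keep checking
         *)
            echo "UNKNOWN: $status_out"
            return 2
            ;;
      esac
      i=$((i + 1))
   done
   echo "OK: $status_out"
   return 0
}
```

On the host this would be run as start_and_verify /etc/init.d/epd; given the status output above, I would expect it to report FAIL.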

Checking the log in /var/log/epd.log (see the attached file for the full output), I noticed the following:

2022-11-23T15:46:56.495Z No(13) epd[1057623]: EntryDBPrintDBFileInfo: Using db file '/scratch/epd-storeV2.db'
2022-11-23T15:46:56.495Z No(13) epd[1057623]: EntryDB_Open: Failed to open db '/scratch/epd-storeV2.db' : unable to open database file (14)
2022-11-23T15:46:56.495Z No(13) epd[1057623]: EPDStoreOpen: Failed to open db (/scratch/epd-storeV2.db): Failure
2022-11-23T15:46:56.495Z No(13) epd[1057623]: EPDModuleInit: init for store-mgmt failed: Failure
2022-11-23T15:46:56.495Z No(13) epd[1057623]: main: initialization failed: Failure
2022-11-23T15:46:56.495Z No(13) epd[1057623]: main: exiting..
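To pull the failing path and error code out of the log without reading it by hand, a small sketch follows. It assumes the line format shown in the excerpt above (other builds may log differently), and epd_open_error is a hypothetical name. For what it is worth, code 14 with the text "unable to open database file" matches SQLite's SQLITE_CANTOPEN result code, which hints that the EPD store is an SQLite file, though I have not confirmed that.

```shell
# Hypothetical helper: print "<db path> <error code>" from the most recent
# "EntryDB_Open: Failed to open db" line of an epd log.
# The line format is assumed from the log excerpt above.
epd_open_error() {
   log="${1:-/var/log/epd.log}"
   line=$(grep 'EntryDB_Open: Failed to open db' "$log" | tail -n 1)
   [ -n "$line" ] || return 1

   # The db path is the text between the single quotes after "db "
   path=$(printf '%s\n' "$line" | sed -n "s/.*db '\([^']*\)'.*/\1/p")
   # The error code is the trailing "(NN)" at the end of the line
   code=$(printf '%s\n' "$line" | sed -n 's/.*(\([0-9][0-9]*\))[[:space:]]*$/\1/p')

   echo "$path $code"
}
```

Run against the log above, this should print the /scratch/epd-storeV2.db path followed by 14.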

Info

Because of the error EntryDB_Open: Failed to open db '/scratch/epd-storeV2.db' : unable to open database file (14), I initially investigated the scratch area (in the ramdisk), but did not find any issues.

  • Q: Maybe the file never got created(?)
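One way to answer that question is to check the path directly. The sketch below is hypothetical (check_db_path is my own name) and defaults to the path from the log. Note that it runs with the permissions of the calling shell: a root shell may see the directory as writable even if the daemon itself, which 8.0 starts with a ++securitydom parameter, is not allowed to write there.

```shell
# Hypothetical check for "maybe the file never got created":
# report whether the db file's parent directory exists and is writable
# for the current shell, and whether the db file itself exists.
check_db_path() {
   db="${1:-/scratch/epd-storeV2.db}"
   dir=$(dirname "$db")

   if [ -d "$dir" ]; then echo "dir: present"; else echo "dir: missing"; fi
   if [ -w "$dir" ]; then echo "dir: writable"; else echo "dir: not writable"; fi
   if [ -e "$db" ]; then echo "file: present"; else echo "file: missing"; fi
}
```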

Also, at the very top of the log:

2022-11-23T15:45:56.962Z In(9) epd[1057396]: INIT: EPD uses a ramdisk for the db file
2022-11-23T15:45:57.103Z In(9) epd[1057400]: INIT: Using /locker as persistent storage
2022-11-23T15:45:57.282Z In(9) epd[1057406]: INIT: Using existing EPD ramdisk at /epd.
2022-11-23T15:45:57.674Z In(9) epd[1057413]: INIT: EPD using Security domain: ID:

Notice that the variables $EPD_SEC_DOM_NAME and $EPD_SEC_DOM_ID are blank (EPD using Security domain: ID:).

A: That makes sense, since neither variable is ever assigned in /etc/init.d/epd (see below):

[root@esx-11:~] grep DOM /etc/init.d/epd
   syslog "EPD using Security domain: $EPD_SEC_DOM_NAME ID: $EPD_SEC_DOM_ID"
[root@esx-11:~] grep -i security /etc/init.d/epd
   EPD_SEC_PARAM="++securitydom=epdDom"
   syslog "EPD using Security domain: $EPD_SEC_DOM_NAME ID: $EPD_SEC_DOM_ID"
[root@esx-11:~] grep -i epdDom /etc/init.d/epd
   EPD_SEC_PARAM="++securitydom=epdDom"
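The grep checks above can be generalized into a small sketch that reports, per variable, whether a script ever assigns it. It is a text heuristic only (it looks for NAME= at the start of a line, optionally after export or local), and report_unassigned is a hypothetical helper. Based on the grep output above, I would expect it to report EPD_SEC_PARAM as assigned and both EPD_SEC_DOM_* variables as never assigned in /etc/init.d/epd.

```shell
# Hypothetical helper: for each variable name given, report whether the
# script contains an assignment "NAME=" at the start of a line (optionally
# indented and/or prefixed with "export" or "local"). Variables that are
# referenced but never assigned expand to empty strings, which is what the
# "Security domain:  ID:" log line shows.
report_unassigned() {
   script="$1"
   shift
   for var in "$@"; do
      if grep -Eq "^[[:space:]]*((export|local)[[:space:]]+)?${var}=" "$script"; then
         echo "$var: assigned"
      else
         echo "$var: never assigned"
      fi
   done
}
```

Usage: report_unassigned /etc/init.d/epd EPD_SEC_DOM_NAME EPD_SEC_DOM_ID EPD_SEC_PARAM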

Info

My reason for looking at the securityDom in the first place was a quick diff of the /etc/init.d/epd init.d-script between ESXi 8.0.0 (esx-11) and 7.0.3 (esx-23):

diff epd-esx-11-800 epd-esx-23-703
271,274c271
<    EPD_SEC_PARAM="++securitydom=epdDom"
<
<    syslog "EPD using Security domain: $EPD_SEC_DOM_NAME ID: $EPD_SEC_DOM_ID"
<    /sbin/watchdog.sh ++memreliable,group=${EPD_RP} -d -s "${EPD_TAG}" "${EPD}" "${EPD_SEC_PARAM}" "${EPD_PARAM}" >/dev/null 2>&1
---
>    /sbin/watchdog.sh ++memreliable,group=${EPD_RP} -d -s "${EPD_TAG}" "${EPD}" "${EPD_PARAM}" >/dev/null 2>&1

Question

Based on the diff above, I guess that "${EPD_SEC_PARAM}" is a new parameter in 8.x, as it does not exist in the very same init.d-script in 7.0.3.

Removing the securityDom parameter (workaround)

A bit desperate for a “quick fix” yesterday, I tried modifying the script back to how it was in 7.0.3: I copied the init.d-script to /tmp, removed "${EPD_SEC_PARAM}" from the watchdog.sh line (line 274), and retried starting the EPD service, this time using the modified script. To my (slight) surprise, it worked!
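The manual edit can be scripted roughly as below. This is a sketch, assuming the argument appears exactly as ` "${EPD_SEC_PARAM}"` (with a leading space) on the watchdog.sh line, as in the diff above; make_epd_workaround is a hypothetical name. It only writes a copy, so /etc/init.d/epd itself stays untouched, and because /tmp on ESXi lives in a ramdisk, the copy does not survive a reboot.

```shell
# Hypothetical helper: write a copy of the EPD init.d-script with the
# "${EPD_SEC_PARAM}" argument removed, so watchdog.sh launches epd without
# ++securitydom, as the 7.0.3 script did. The original file is not modified.
make_epd_workaround() {
   src="${1:-/etc/init.d/epd}"
   dst="${2:-/tmp/epd}"
   # Remove the first ' "${EPD_SEC_PARAM}"' on each line; in this script
   # only the watchdog.sh line (line 274) contains it.
   sed 's/ "\${EPD_SEC_PARAM}"//' "$src" > "$dst" && chmod +x "$dst"
}
```

After make_epd_workaround, the copy is started with /tmp/epd start, exactly as in the output below.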

[root@esx-11:~] /tmp/epd start
INIT: EPD uses a ramdisk for the db file
INIT: Using /locker as persistent storage
INIT: Using existing EPD ramdisk at /epd.
epd started
[root@esx-11:~] /tmp/epd status
epd is running

Checking for securityDom in other init.d-scripts

Some additional research this morning (2022-11-24):

[root@esx-11:~] /bin/secpolicytools -D "epdDom"
9
  • Looking up the domain with /bin/secpolicytools -D "epdDom" works and returns ID 9.
    • Q: Maybe the init.d-script needs to evaluate this lookup and pass the result as a parameter/variable?
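If the answer is yes, the lookup result would need to be captured and validated before it is used. A minimal sketch, assuming secpolicytools -D prints only the numeric ID on success (as with 9 above) and exits non-zero on failure (the sandboxd snippet below relies on the exit status in the same way); resolve_secdom_id is hypothetical, and its optional second argument (the tool path) exists only so the function can be tried off-host.

```shell
# Hypothetical helper: resolve a security domain name to its numeric ID.
# Fails (return 1) if the tool fails or prints anything other than digits.
resolve_secdom_id() {
   name="$1"
   tool="${2:-/bin/secpolicytools}"
   id=$("$tool" -D "$name" 2>/dev/null) || return 1
   case "$id" in
      ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric output
   esac
   echo "$id"
}
```

On the host, resolve_secdom_id epdDom would be expected to print 9, matching the manual lookup above.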

Checking /etc/init.d/sandboxd (I did a quick check of all init.d-scripts with grep -Hinr securityDom /etc/init.d/* to find some examples):

Snippet:

SANDBOXD_SECURITY_DOM=$(/bin/secpolicytools -D "${SANDBOXD_SECURITY_DOM_NAME}")
if [ $? -eq 0 ]; then
   SANDBOXD_SECURITY_DOM_PARAM="++securitydom=${SANDBOXD_SECURITY_DOM}"
else
   SANDBOXD_SECURITY_DOM_PARAM=""
fi

Notice!

Notice that it resolves the domain with /bin/secpolicytools -D "${SANDBOXD_SECURITY_DOM_NAME}", a call I did not find in the EPD init.d-script (i.e. nothing like /bin/secpolicytools -D epdDom).

Checking ALL init.d-scripts for a similar function

Notice!

The EPD init.d-script is NOT on this list.

[root@esx-11:~] grep -Hinr secpolicytools /etc/init.d/*
/etc/init.d/attestd:36:   SEC_DOM=$(/bin/secpolicytools -D attestdDom)
/etc/init.d/cdp:7:SECPOLICYTOOLS="/sbin/secpolicytools"
/etc/init.d/cdp:9:CDP_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${CDP_SECURITY_DOM_NAME}")
/etc/init.d/dcbd:14:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/dcbd:18:      SECURITY_DOM_ID=$("${SECPOLICYTOOLS}" -D "${DCBD_SECURITY_DOM_NAME}")
/etc/init.d/dpd:25:SECPOLICYTOOLS="/sbin/secpolicytools"
/etc/init.d/dpd:151:   DPD_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${DPD_SECURITY_DOM_NAME}")
/etc/init.d/esxTokenCPS:15:   SECURITY_DOM=$(/bin/secpolicytools -D esxtokendDom)
/etc/init.d/health:16:SECPOLICYTOOLS="/sbin/secpolicytools"
/etc/init.d/health:41:      ${SECPOLICYTOOLS} -p
/etc/init.d/health:64:   HEALTHD_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${HEALTHD_SECURITY_DOM_NAME}")
/etc/init.d/kmxa:13:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/kmxa:28:   KMXA_SECURITY_DOM=$($SECPOLICYTOOLS -D $KMXA_SECURITY_DOM_NAME)
/etc/init.d/kmxd:35:   SEC_DOM=$(/bin/secpolicytools -D kmxdDom)
/etc/init.d/lacp:8:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/lacp:13:      LACP_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${LACP_SECURITY_DOM_NAME}")
/etc/init.d/nfcd:7:SECPOLICYTOOLS="/sbin/secpolicytools"
/etc/init.d/nfcd:27:      NFCD_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${NFCD_SECURITY_DOM_NAME}")
/etc/init.d/nfsgssd:32:GSSD_SECURITY_DOM=$(/bin/secpolicytools -D nfsgssdDom)
/etc/init.d/nicmgmtd:22:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/nicmgmtd:31:      SECURITY_DOM_ID=$($SECPOLICYTOOLS -D $SECURITY_DOM_NAME)
/etc/init.d/nscd:16:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/nscd:19:   NSCD_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${NSCD_SECURITY_DOM_NAME}")
/etc/init.d/osfsd:13:SECPOLICYTOOLS="/sbin/secpolicytools"
/etc/init.d/osfsd:18:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/pcscd:24:      PCSCD_SECURITY_DOM_PARAM="securitydom=$("/bin/secpolicytools" -D "${PCSCD_SECURITY_DOM_NAME}")"
/etc/init.d/pmemGarbageCollection:56:      SECURITY_DOM_ID=$(/bin/secpolicytools -D "${SECURITY_DOM_NAME}")
/etc/init.d/sandboxd:38:   SANDBOXD_SECURITY_DOM=$(/bin/secpolicytools -D "${SANDBOXD_SECURITY_DOM_NAME}")
/etc/init.d/sdrsInjector:13:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/sdrsInjector:23:   INJECTOR_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${INJECTOR_SECURITY_DOM_NAME}")
/etc/init.d/settingsd:8:SECPOLICYTOOLS="/sbin/secpolicytools"
/etc/init.d/settingsd:50:   SETTINGSD_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${SETTINGSD_SECURITY_DOM_NAME}")
/etc/init.d/smartd:14:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/smartd:18:      SECURITY_DOM_ID=$("${SECPOLICYTOOLS}" -D "${SMARTD_SECURITY_DOM_NAME}")
/etc/init.d/storageRM:13:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/storageRM:23:   STORAGERM_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${STORAGERM_SECURITY_DOM_NAME}")
/etc/init.d/swapobjd:12:SECPOLICYTOOLS="/sbin/secpolicytools"
/etc/init.d/usbarbitrator:15:   USBARB_SECURITY_DOM_PARAM="securitydom=$(/bin/secpolicytools -D "${USBARB_SECURITY_DOM_NAME}")"
/etc/init.d/vaai-nasd:29:VAAI_NASD_SECURITY_DOM=$(/bin/secpolicytools -D vaainasdDom)
/etc/init.d/vdtc:17:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/vdtc:40:   SECURITY_DOM_ID=$("${SECPOLICYTOOLS}" -D "${SECURITY_DOM_NAME}")
/etc/init.d/vltd:25:SECPOLICYTOOLS="/sbin/secpolicytools"
/etc/init.d/vltd:133:   SECURITY_DOM=$(/bin/secpolicytools -D "${SEC_DOM_NAME}")
/etc/init.d/vmfstraced:76:   VMFS_SECURITY_DOM=$(/bin/secpolicytools -D "${VMFS_SEC_DOM_NAME}")
/etc/init.d/vsanmgmtd:113:   SECURITY_DOM=$(/bin/secpolicytools -D "${SEC_DOM_NAME}")
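To turn this comparison into a repeatable check, the sketch below lists scripts that mention securitydom but never call secpolicytools, which is the pattern the EPD script shows. It is a text heuristic (grep, no parsing), and find_unresolved_secdom is a hypothetical helper; on this host I would expect /etc/init.d/epd to be among its results.

```shell
# Hypothetical audit: print every regular file in an init.d directory that
# mentions "securitydom" (case-insensitive) but never mentions
# "secpolicytools", i.e. scripts that may pass a domain name instead of
# a looked-up domain ID.
find_unresolved_secdom() {
   dir="${1:-/etc/init.d}"
   for f in "$dir"/*; do
      [ -f "$f" ] || continue
      if grep -qi 'securitydom' "$f" && ! grep -q 'secpolicytools' "$f"; then
         echo "$f"
      fi
   done
}
```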

Trying to “patch” the EPD init.d-script by adding a securityDom lookup

After comparing how other init.d-scripts look up the domain, I added the following to my custom init.d-script for the Entry Persistence Daemon:

# Resolve the domain name to its numeric ID, the same way sandboxd does
EPD_SEC_DOM_NAME="epdDom"
EPD_SEC_DOM_ID=$(/bin/secpolicytools -D "${EPD_SEC_DOM_NAME}")
if [ $? -eq 0 ]; then
   EPD_SEC_PARAM="++securitydom=${EPD_SEC_DOM_ID}"
else
   EPD_SEC_PARAM=""
fi

I then restored the original line 274:

/sbin/watchdog.sh ++memreliable,group=${EPD_RP} -d -s "${EPD_TAG}" "${EPD}" "${EPD_SEC_PARAM}" "${EPD_PARAM}" >/dev/null 2>&1
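One side note on this line: if the lookup ever fails, the else branch sets EPD_SEC_PARAM="", and because line 274 quotes the variable, watchdog.sh would still receive an empty argument in that position, unlike the 7.0.3 line, which has no such argument at all. In my case the lookup succeeded (ID 9), so this only matters on the failure path, and I have not checked how watchdog.sh treats an empty argument. The sketch below only demonstrates the shell behaviour; count_args is a hypothetical helper used for the demo.

```shell
# Demo of quoting behaviour: a quoted expansion of an empty variable still
# produces one (empty) argument, an unquoted one produces none.
count_args() {
   echo "$#"
}

EPD_SEC_PARAM=""
count_args "a" "${EPD_SEC_PARAM}" "b"   # prints 3: a, "", b
count_args "a" ${EPD_SEC_PARAM} "b"     # prints 2: a, b
```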

I then retried starting the Entry Persistence Daemon using my (updated) custom init.d-script, and checked the logs.

[root@esx-11:~] /tmp/epd start
INIT: EPD uses a ramdisk for the db file
INIT: Using /locker as persistent storage
INIT: Using existing EPD ramdisk at /epd.
INIT: EPD using Security domain: epdDom ID: 9
epd started
[root@esx-11:~] /tmp/epd status
epd is not running

Notice INIT: EPD using Security domain: epdDom ID: 9 in the output: the domain name and ID are now populated, yet epd is still not running.

The log shows the same:

[root@esx-11:~] cat /var/log/epd.log |grep domain|tail -n1
2022-11-24T09:21:13.666Z In(9) epd[1111551]: INIT: EPD using Security domain: epdDom ID: 9
[root@esx-11:~] /tmp/epd status
epd is not running
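To compare runs quickly, the sketch below extracts the domain name and ID from the most recent "EPD using Security domain:" line and flags an empty ID, as logged before the patch. Since status still says not running here, a populated ID alone is evidently not sufficient; the check only tells which variant of the line was logged last. The line format is assumed from the log excerpts above, and epd_last_secdom is a hypothetical helper.

```shell
# Hypothetical check: print the security domain name and ID from the last
# "EPD using Security domain:" line of an epd log; return 1 if the ID is empty.
epd_last_secdom() {
   log="${1:-/var/log/epd.log}"
   line=$(grep 'EPD using Security domain:' "$log" | tail -n 1)
   [ -n "$line" ] || { echo "no security domain line"; return 1; }

   # Domain name: the word between "Security domain:" and "ID:" (may be empty)
   name=$(printf '%s\n' "$line" | sed -n 's/.*Security domain:[[:space:]]*\([^[:space:]]*\)[[:space:]]*ID:.*/\1/p')
   # Domain ID: digits after "ID:" at the end of the line (may be empty)
   id=$(printf '%s\n' "$line" | sed -n 's/.*ID:[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p')

   if [ -z "$id" ]; then
      echo "domain='${name}' id=<empty>"
      return 1
   fi
   echo "domain='${name}' id=${id}"
}
```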