Issue fixed in latest vSAN 8 release
Note that the Entry Persistence Daemon findings below were first reported privately to VMware and were resolved in later vSAN 8 releases. The notes below come from the troubleshooting steps I took to narrow down the details. Hopefully they can help.
vSAN 8 Entry Persistence Daemon, epdDom and securityDom
After upgrading my vSAN cluster from 7U3.x to ESXi 8.0.0 (build 20513097) yesterday, the vSAN Health check reported an issue under "vSAN Daemon Liveness": the EPD status was "Abnormal". A quick check with /etc/init.d/epd status confirmed that EPD was not running.
When I try to start the Entry Persistence Daemon service manually with /etc/init.d/epd start, the script prints epd started, but a re-check with /etc/init.d/epd status shows that it is not actually running (see below).
[root@esx-11:~] vmware -vl
VMware ESXi 8.0.0 build-20513097
VMware ESXi 8.0 GA
[root@esx-11:~] date
Wed Nov 23 15:45:49 UTC 2022
[root@esx-11:~] /etc/init.d/epd status
epd is not running
[root@esx-11:~] /etc/init.d/epd start
INIT: EPD uses a ramdisk for the db file
INIT: Using /locker as persistent storage
INIT: Using existing EPD ramdisk at /epd.
INIT: EPD using Security domain: ID:
epd started
[root@esx-11:~] /etc/init.d/epd status
epd is not running
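Because the start action prints epd started even when the daemon exits right away, it can help to never trust that message on its own. The following is a minimal sketch of a start-then-verify wrapper. The function name start_and_verify and the 2-second default delay are my own assumptions, and the match relies on the status wording shown in the transcript above ("is running" / "is not running").

```shell
# Hypothetical helper (not part of ESXi): start a service through its init
# script, then confirm via "status" that it is really up.
# $1 = path to the init script, $2 = seconds to wait before checking (default 2)
start_and_verify() {
    script="$1"
    delay="${2:-2}"

    "$script" start
    sleep "$delay"

    # "epd is not running" does not contain the phrase "is running",
    # so this match only succeeds for a live daemon.
    if "$script" status | grep -q 'is running'; then
        echo "verified: running"
    else
        echo "start reported success, but the service is not running" >&2
        return 1
    fi
}
```

On the host this would be called as start_and_verify /etc/init.d/epd; a non-zero return code means the daemon died after the script claimed it had started.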
Checking the logs in /var/log/epd.log (see attached file for full output), I noticed:
2022-11-23T15:46:56.495Z No(13) epd[1057623]: EntryDBPrintDBFileInfo: Using db file '/scratch/epd-storeV2.db'
2022-11-23T15:46:56.495Z No(13) epd[1057623]: EntryDB_Open: Failed to open db '/scratch/epd-storeV2.db' : unable to open database file (14)
2022-11-23T15:46:56.495Z No(13) epd[1057623]: EPDStoreOpen: Failed to open db (/scratch/epd-storeV2.db): Failure
2022-11-23T15:46:56.495Z No(13) epd[1057623]: EPDModuleInit: init for store-mgmt failed: Failure
2022-11-23T15:46:56.495Z No(13) epd[1057623]: main: initialization failed: Failure
2022-11-23T15:46:56.495Z No(13) epd[1057623]: main: exiting..
Info
I initially investigated issues related to the scratch area (in ramdisk) because of the EntryDB_Open: Failed to open db '/scratch/epd-storeV2.db' : unable to open database file (14) error, but did not find any.
- Q: Maybe the file never got created(?)
Also, at the very top of the log:
2022-11-23T15:45:56.962Z In(9) epd[1057396]: INIT: EPD uses a ramdisk for the db file
2022-11-23T15:45:57.103Z In(9) epd[1057400]: INIT: Using /locker as persistent storage
2022-11-23T15:45:57.282Z In(9) epd[1057406]: INIT: Using existing EPD ramdisk at /epd.
2022-11-23T15:45:57.674Z In(9) epd[1057413]: INIT: EPD using Security domain: ID:
Notice that the variables $EPD_SEC_DOM_NAME and $EPD_SEC_DOM_ID are blank (EPD using Security domain: ID:).
A: That makes sense, since neither variable is ever assigned in /etc/init.d/epd (see below).
[root@esx-11:~] grep DOM /etc/init.d/epd
syslog "EPD using Security domain: $EPD_SEC_DOM_NAME ID: $EPD_SEC_DOM_ID"
[root@esx-11:~] grep -i security /etc/init.d/epd
EPD_SEC_PARAM="++securitydom=epdDom"
syslog "EPD using Security domain: $EPD_SEC_DOM_NAME ID: $EPD_SEC_DOM_ID"
[root@esx-11:~] grep -i epdDom /etc/init.d/epd
EPD_SEC_PARAM="++securitydom=epdDom"
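The same check (a variable that is referenced but never assigned) can be made mechanical. This is a rough sketch only: the function name unassigned_vars is hypothetical, and the approach is a plain text match, so it will also flag variables that come from the environment, from a sourced file, or from constructs such as for and read.

```shell
# Hypothetical helper: list variables that a script references but never
# assigns with a plain NAME=... statement.
# $1 = path to the script to inspect
unassigned_vars() {
    script="$1"

    # Collect every $NAME or ${NAME reference and reduce it to the bare name.
    grep -oE '\$\{?[A-Za-z_][A-Za-z0-9_]*' "$script" | tr -d '${' | sort -u |
    while read -r name; do
        # Print the name if no "NAME=" assignment appears anywhere in the file.
        grep -qE "(^|[^A-Za-z0-9_])${name}=" "$script" || echo "$name"
    done
}
```

Run against a copy of the 8.0.0 init script, I would expect EPD_SEC_DOM_NAME and EPD_SEC_DOM_ID to show up in the output, possibly alongside some false positives of the kinds mentioned above.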
Info
The reason I looked at the securityDom in the first place is that I did a quick diff of the /etc/init.d/epd init.d-script between ESXi 703 and 800:
diff epd-esx-11-800 epd-esx-23-703
271,274c271
< EPD_SEC_PARAM="++securitydom=epdDom"
<
< syslog "EPD using Security domain: $EPD_SEC_DOM_NAME ID: $EPD_SEC_DOM_ID"
< /sbin/watchdog.sh ++memreliable,group=${EPD_RP} -d -s "${EPD_TAG}" "${EPD}" "${EPD_SEC_PARAM}" "${EPD_PARAM}" >/dev/null 2>&1
---
> /sbin/watchdog.sh ++memreliable,group=${EPD_RP} -d -s "${EPD_TAG}" "${EPD}" "${EPD_PARAM}" >/dev/null 2>&1
Question
Based on the diff above, I guess that "${EPD_SEC_PARAM}" is a new parameter in 8.x, as it does not exist in the same init.d-script in 703.
Removing the securityDom parameter (workaround)
A bit desperate for a "quick fix" yesterday, I tried changing the script back to how it was in 703: I simply copied the init.d script to /tmp, removed "${EPD_SEC_PARAM}" from the copy (line 274), and retried starting the EPD service, this time using the modified script. To my surprise (well, a little), it worked!
[root@esx-11:~] /tmp/epd start
INIT: EPD uses a ramdisk for the db file
INIT: Using /locker as persistent storage
INIT: Using existing EPD ramdisk at /epd.
epd started
[root@esx-11:~] /tmp/epd status
epd is running
- ✓ Quick fix. Check.
- ✓ Entry Persistence Daemon now seems happy.
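The manual edit described above could also be scripted, which makes it easier to repeat on several hosts. This is a sketch under assumptions: the function name strip_sec_param is made up, and the sed expression assumes the argument appears exactly as "${EPD_SEC_PARAM}" preceded by a single space, as in the diff above.

```shell
# Hypothetical helper: write a copy of an init script with the
# security-domain argument removed from the command line.
# $1 = original script, $2 = path for the modified copy
strip_sec_param() {
    src="$1"
    dst="$2"

    # Delete the literal token "${EPD_SEC_PARAM}" together with its quotes
    # and the space in front of it; everything else is left untouched.
    sed 's/ "\${EPD_SEC_PARAM}"//' "$src" > "$dst"
    chmod +x "$dst"
}
```

For example, strip_sec_param /etc/init.d/epd /tmp/epd would produce the modified copy used in the transcript. The EPD_SEC_PARAM assignment itself stays in the file, since only the use on the watchdog line is removed.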
Checking for securityDom in other init.d-scripts
Doing some additional research this morning (2022-11-24)
[root@esx-11:~] /bin/secpolicytools -D "epdDom"
9
- Looking up the domain with /bin/secpolicytools -D "epdDom" works.
- Q: Maybe the init.d-script needs to evaluate this as a parameter/variable?
Checking /etc/init.d/sandboxd
(I did a quick check on all init.d-scripts using grep -Hinr securityDom /etc/init.d/* to get some examples.)
Snippet.
SANDBOXD_SECURITY_DOM=$(/bin/secpolicytools -D "${SANDBOXD_SECURITY_DOM_NAME}")
if [ $? -eq 0 ]; then
SANDBOXD_SECURITY_DOM_PARAM="++securitydom=${SANDBOXD_SECURITY_DOM}"
else
SANDBOXD_SECURITY_DOM_PARAM=""
fi
Notice!
Notice it's using /bin/secpolicytools -D "${SANDBOXD_SECURITY_DOM_NAME}", which I did not find in the EPD init.d-script (that is, there is no /bin/secpolicytools -D epdDom call).
Checking ALL init.d-scripts for a similar function
Notice!
The EPD init.d-script is NOT on this list.
[root@esx-11:~] grep -Hinr secpolicytools /etc/init.d/*
/etc/init.d/attestd:36: SEC_DOM=$(/bin/secpolicytools -D attestdDom)
/etc/init.d/cdp:7:SECPOLICYTOOLS="/sbin/secpolicytools"
/etc/init.d/cdp:9:CDP_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${CDP_SECURITY_DOM_NAME}")
/etc/init.d/dcbd:14:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/dcbd:18: SECURITY_DOM_ID=$("${SECPOLICYTOOLS}" -D "${DCBD_SECURITY_DOM_NAME}")
/etc/init.d/dpd:25:SECPOLICYTOOLS="/sbin/secpolicytools"
/etc/init.d/dpd:151: DPD_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${DPD_SECURITY_DOM_NAME}")
/etc/init.d/esxTokenCPS:15: SECURITY_DOM=$(/bin/secpolicytools -D esxtokendDom)
/etc/init.d/health:16:SECPOLICYTOOLS="/sbin/secpolicytools"
/etc/init.d/health:41: ${SECPOLICYTOOLS} -p
/etc/init.d/health:64: HEALTHD_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${HEALTHD_SECURITY_DOM_NAME}")
/etc/init.d/kmxa:13:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/kmxa:28: KMXA_SECURITY_DOM=$($SECPOLICYTOOLS -D $KMXA_SECURITY_DOM_NAME)
/etc/init.d/kmxd:35: SEC_DOM=$(/bin/secpolicytools -D kmxdDom)
/etc/init.d/lacp:8:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/lacp:13: LACP_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${LACP_SECURITY_DOM_NAME}")
/etc/init.d/nfcd:7:SECPOLICYTOOLS="/sbin/secpolicytools"
/etc/init.d/nfcd:27: NFCD_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${NFCD_SECURITY_DOM_NAME}")
/etc/init.d/nfsgssd:32:GSSD_SECURITY_DOM=$(/bin/secpolicytools -D nfsgssdDom)
/etc/init.d/nicmgmtd:22:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/nicmgmtd:31: SECURITY_DOM_ID=$($SECPOLICYTOOLS -D $SECURITY_DOM_NAME)
/etc/init.d/nscd:16:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/nscd:19: NSCD_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${NSCD_SECURITY_DOM_NAME}")
/etc/init.d/osfsd:13:SECPOLICYTOOLS="/sbin/secpolicytools"
/etc/init.d/osfsd:18:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/pcscd:24: PCSCD_SECURITY_DOM_PARAM="securitydom=$("/bin/secpolicytools" -D "${PCSCD_SECURITY_DOM_NAME}")"
/etc/init.d/pmemGarbageCollection:56: SECURITY_DOM_ID=$(/bin/secpolicytools -D "${SECURITY_DOM_NAME}")
/etc/init.d/sandboxd:38: SANDBOXD_SECURITY_DOM=$(/bin/secpolicytools -D "${SANDBOXD_SECURITY_DOM_NAME}")
/etc/init.d/sdrsInjector:13:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/sdrsInjector:23: INJECTOR_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${INJECTOR_SECURITY_DOM_NAME}")
/etc/init.d/settingsd:8:SECPOLICYTOOLS="/sbin/secpolicytools"
/etc/init.d/settingsd:50: SETTINGSD_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${SETTINGSD_SECURITY_DOM_NAME}")
/etc/init.d/smartd:14:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/smartd:18: SECURITY_DOM_ID=$("${SECPOLICYTOOLS}" -D "${SMARTD_SECURITY_DOM_NAME}")
/etc/init.d/storageRM:13:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/storageRM:23: STORAGERM_SECURITY_DOM=$("${SECPOLICYTOOLS}" -D "${STORAGERM_SECURITY_DOM_NAME}")
/etc/init.d/swapobjd:12:SECPOLICYTOOLS="/sbin/secpolicytools"
/etc/init.d/usbarbitrator:15: USBARB_SECURITY_DOM_PARAM="securitydom=$(/bin/secpolicytools -D "${USBARB_SECURITY_DOM_NAME}")"
/etc/init.d/vaai-nasd:29:VAAI_NASD_SECURITY_DOM=$(/bin/secpolicytools -D vaainasdDom)
/etc/init.d/vdtc:17:SECPOLICYTOOLS="/bin/secpolicytools"
/etc/init.d/vdtc:40: SECURITY_DOM_ID=$("${SECPOLICYTOOLS}" -D "${SECURITY_DOM_NAME}")
/etc/init.d/vltd:25:SECPOLICYTOOLS="/sbin/secpolicytools"
/etc/init.d/vltd:133: SECURITY_DOM=$(/bin/secpolicytools -D "${SEC_DOM_NAME}")
/etc/init.d/vmfstraced:76: VMFS_SECURITY_DOM=$(/bin/secpolicytools -D "${VMFS_SEC_DOM_NAME}")
/etc/init.d/vsanmgmtd:113: SECURITY_DOM=$(/bin/secpolicytools -D "${SEC_DOM_NAME}")
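Rather than reading the list by eye, the two greps can be combined to print only the scripts that mention securitydom without ever calling secpolicytools. This is a minimal sketch; the function name scripts_without_lookup is hypothetical, and a script that resolves its domain in some other way would be flagged as well.

```shell
# Hypothetical helper: print scripts in a directory that mention
# "securitydom" but never call secpolicytools to resolve the domain.
# $1 = directory holding the init scripts
scripts_without_lookup() {
    dir="$1"

    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # First grep: the script passes a security domain somewhere.
        # Second grep (negated): it never looks the domain up.
        if grep -qi 'securitydom' "$f" && ! grep -q 'secpolicytools' "$f"; then
            echo "$f"
        fi
    done
}
```

Calling scripts_without_lookup /etc/init.d on the 8.0.0 host should list /etc/init.d/epd, based on the grep output above. I have not checked whether any other script would appear.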
Trying to "patch" the EPD init.d-script by adding securityDom
After comparing how other init.d-scripts look up the domain, I simply added the following to my custom init.d-script for Entry Persistence Daemon:
EPD_SEC_DOM_NAME="epdDom"
EPD_SEC_DOM_ID=$(/bin/secpolicytools -D "${EPD_SEC_DOM_NAME}")
if [ $? -eq 0 ]; then
EPD_SEC_PARAM="++securitydom=${EPD_SEC_DOM_ID}"
else
EPD_SEC_PARAM=""
fi
I then restored the original line 274:
/sbin/watchdog.sh ++memreliable,group=${EPD_RP} -d -s "${EPD_TAG}" "${EPD}" "${EPD_SEC_PARAM}" "${EPD_PARAM}" >/dev/null 2>&1
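One detail of this pattern worth keeping in mind: when the lookup fails and the else-branch leaves EPD_SEC_PARAM empty, the quoted "${EPD_SEC_PARAM}" on the watchdog line still passes an empty string as a separate argument. Whether watchdog.sh or epd cares about that is an assumption I have not verified. The sketch below only demonstrates the shell behaviour, using a made-up count_args function and a generic SEC_PARAM variable.

```shell
# Print how many arguments were received.
count_args() { echo "$#"; }

SEC_PARAM=""   # what the else-branch leaves behind when the lookup fails

# A quoted empty variable is still passed as one (empty) argument: prints 2
count_args "${SEC_PARAM}" "other"

# ${VAR:+...} expands to nothing at all when VAR is empty: prints 1
count_args ${SEC_PARAM:+"${SEC_PARAM}"} "other"
```

If the empty argument turned out to matter, the second form would be a drop-in replacement for "${EPD_SEC_PARAM}" on the watchdog line.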
I then retried starting Entry Persistence Daemon, using my (updated) custom init.d-script, and verified the logs.
[root@esx-11:~] /tmp/epd start
INIT: EPD uses a ramdisk for the db file
INIT: Using /locker as persistent storage
INIT: Using existing EPD ramdisk at /epd.
INIT: EPD using Security domain: epdDom ID: 9
epd started
[root@esx-11:~] /tmp/epd status
epd is not running
Notice INIT: EPD using Security domain: epdDom ID: 9 in the output.
From the logs (same):
[root@esx-11:~] cat /var/log/epd.log |grep domain|tail -n1
2022-11-24T09:21:13.666Z In(9) epd[1111551]: INIT: EPD using Security domain: epdDom ID: 9
- CON: But Entry Persistence Daemon is not running.
- QUESTION: Typo? Permissions? Re-check variables later.
[root@esx-11:~] /tmp/epd status
epd is not running
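One way to follow up on the "typo or permissions" question is to trace the script and look at the exact watchdog command line it builds after variable expansion. This is a sketch, assuming the init script runs under plain sh; the function name trace_watchdog_cmd is hypothetical. Note that it really does run the start action.

```shell
# Hypothetical helper: run an init script's "start" action under shell
# tracing and show only the expanded watchdog command line(s).
# $1 = path to the init script
trace_watchdog_cmd() {
    script="$1"

    # sh -x prints each command after expansion, prefixed with "+", on the
    # shell's own stderr, so the ">/dev/null 2>&1" on the watchdog line
    # itself does not hide the trace.
    sh -x "$script" start 2>&1 | grep 'watchdog'
}
```

With trace_watchdog_cmd /tmp/epd, the traced line should show whether ++securitydom=9 (or something else) is what actually reaches watchdog.sh.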