Two critical diagnostic/correctness improvements, prompted by a user field report describing:
- '20s latency between inventory syncs with a full test inventory'
- 'duplication on throw + deposit in chest'
- 'bad sync on fast inter-server transfer if disconnect too quickly after modification'
(1) Real durations — 'completed in 0ms' was a lie
Every SyncLogger.saveCompleted / restoreCompleted call hardcoded 0 for the
duration field. The log line always showed 'in 0ms' regardless of actual
latency, making the user's 20s-latency reports impossible to reproduce from
logs alone. Fixed across all 4 save paths (LOGOUT / SHUTDOWN / DEATH /
EMERGENCY_FLUSH) and the RESTORE path. Durations are measured from the
start of the background task (or, for restores, from the start of the
restore lock acquisition) to just before the success log line.
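A minimal sketch of the measurement pattern described above, not the actual
patch. SaveTimer and its injectable clock are hypothetical names introduced
here for illustration; the real code may simply read System.nanoTime() inline.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical helper: captures a start instant and reports elapsed ms. */
final class SaveTimer {
    private final LongSupplier nanoClock;
    private final long startNanos;

    /** The clock is injectable so the timer can be tested without sleeping. */
    SaveTimer(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.startNanos = nanoClock.getAsLong();
    }

    /** Starts a timer on the monotonic JVM clock. */
    static SaveTimer startNow() {
        return new SaveTimer(System::nanoTime);
    }

    /** Elapsed time since construction, truncated to whole milliseconds. */
    long elapsedMs() {
        return TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong() - startNanos);
    }
}
```

The timer would be created as the first statement of the background task and
elapsedMs() read just before the success log call, so the value passed in
replaces the hardcoded 0. System.nanoTime() is used rather than
System.currentTimeMillis() because it is monotonic and unaffected by wall-clock
adjustments.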
- New info log 'Logout save completed for {uuid} in {n}ms'
- New warn log '[perf-restore] slow restore for {uuid} ({n}ms)', emitted
  above 1s
- New info log '[perf-logout] core=Xms backpacks=Yms ss=Zms rs2=Wms total=Nms',
  emitted above 200 ms: a per-stage breakdown so we can pinpoint which
  downstream write accounts for the time in the reported 20s cases.
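The thresholded perf lines above could be produced along these lines. This is
an illustrative sketch only: PerfLogLines, its method names, and the choice to
compare the total (rather than any single stage) against the 200 ms threshold
are assumptions, and the real code presumably writes through the project's
logger instead of returning strings.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical formatter for the two thresholded perf log lines. */
final class PerfLogLines {
    static final long SLOW_LOGOUT_MS = 200L;    // "above 200 ms"
    static final long SLOW_RESTORE_MS = 1_000L; // "above 1s"

    /** Returns the breakdown line only when the total exceeds the threshold. */
    static Optional<String> logoutBreakdown(long coreMs, long backpacksMs,
                                            long ssMs, long rs2Ms, long totalMs) {
        if (totalMs <= SLOW_LOGOUT_MS) {
            return Optional.empty();
        }
        return Optional.of(String.format(Locale.ROOT,
                "[perf-logout] core=%dms backpacks=%dms ss=%dms rs2=%dms total=%dms",
                coreMs, backpacksMs, ssMs, rs2Ms, totalMs));
    }

    /** Returns the slow-restore warning only when the restore exceeded 1s. */
    static Optional<String> slowRestore(String uuid, long elapsedMs) {
        if (elapsedMs <= SLOW_RESTORE_MS) {
            return Optional.empty();
        }
        return Optional.of(String.format(Locale.ROOT,
                "[perf-restore] slow restore for %s (%dms)", uuid, elapsedMs));
    }
}
```

The total is passed in separately instead of being summed from the stages,
since the measured total may include time not attributed to any one stage.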
(2) Phase 9 force-takeover could CAUSE duplication
Phase 9 aimed to fix 30-60s join waits when the previous server was alive
but the player was ghost-online there. It force-claimed after 5s. But if
the peer was midway through a LEGITIMATE logout save (writeSnapshotToDB
with setOffline=true commits the snapshot and the online=1 -> online=0
flip in one atomic step), force-claiming before that commit read STALE
DB data and restored the player from the PRE-disconnect state. For
example, an item the player dropped just before disconnect came back in
the inventory, duplicating the ItemEntity the peer had already spawned
in the world.
Fix: the wait cap is now ADVISORY, not a hard force-claim. Past the cap,
we only force-claim when the peer's heartbeat has FROZEN (age > cap ms)
— meaning the peer's process is actually dead or stuck mid-tick, not
just slow to flush. If the peer is still heartbeating normally, we keep
waiting: writeSnapshotToDB + online=0 is an atomic UPDATE, so the flush
WILL land; we just need to be patient. A warn line every 20 attempts
(10s at default interval) tells admins the save is taking a long time
so they can profile the peer's DB connection.
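The advisory-cap rule described above can be sketched as a pure decision
function. TakeoverPolicy, Action, and the parameter names are hypothetical;
only the rule itself (force-claim past the cap only when the heartbeat age
also exceeds the cap, warn every 20 attempts) comes from the description.

```java
/** Hypothetical sketch of the advisory wait cap; not the actual patch. */
final class TakeoverPolicy {
    enum Action { KEEP_WAITING, FORCE_CLAIM }

    static final int WARN_EVERY_ATTEMPTS = 20;

    /**
     * Below the cap we always wait. Past the cap we force-claim only if the
     * peer's heartbeat is frozen (age > cap); a live peer is left to finish
     * its atomic logout save.
     */
    static Action decide(long waitedMs, long capMs, long peerHeartbeatAgeMs) {
        if (waitedMs <= capMs) {
            return Action.KEEP_WAITING;
        }
        return peerHeartbeatAgeMs > capMs ? Action.FORCE_CLAIM : Action.KEEP_WAITING;
    }

    /** True on every 20th attempt, when the "still waiting" warn line is due. */
    static boolean shouldWarn(int attempt) {
        return attempt > 0 && attempt % WARN_EVERY_ATTEMPTS == 0;
    }
}
```

Under this sketch a missing heartbeat row, reported as Long.MAX_VALUE, always
counts as frozen once the cap has elapsed.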
New helper peerHeartbeatAgeMs(id) returns the age of the peer's last
heartbeat in ms, or Long.MAX_VALUE if the peer has no heartbeat row. It
is used to decide between force-claiming and continuing to wait.
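A rough sketch of the helper's contract, using an in-memory map as a stand-in
for the heartbeat rows. HeartbeatTable, recordHeartbeat, and the explicit nowMs
parameter are assumptions made so the example runs on its own; the real helper
takes only the peer id and presumably reads the database. Clamping negative
ages to 0 is likewise a choice made here to tolerate clock skew between
servers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for the heartbeat table. */
final class HeartbeatTable {
    // serverId -> timestamp (ms) of the last heartbeat seen from that server
    private final Map<String, Long> lastBeatMs = new ConcurrentHashMap<>();

    void recordHeartbeat(String serverId, long nowMs) {
        lastBeatMs.put(serverId, nowMs);
    }

    /**
     * Age of the peer's last heartbeat in ms, or Long.MAX_VALUE when the peer
     * has no heartbeat row at all.
     */
    long peerHeartbeatAgeMs(String serverId, long nowMs) {
        Long last = lastBeatMs.get(serverId);
        if (last == null) {
            return Long.MAX_VALUE;
        }
        return Math.max(0L, nowMs - last); // clamp in case of clock skew
    }
}
```

Returning Long.MAX_VALUE for a missing row means callers can compare the result
against a threshold without a separate null or "absent" branch.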