User report: Twilight Forest 'Charm of Keeping' + ReviveMe interaction loses
inventory over multiple deaths. Analysis concludes this is a Twilight/ReviveMe
event-priority interaction, not a PlayerSync bug — but two dead config options
hid admin-facing levers that help mitigate the case.
(1) save_on_death was declared in JdbcConfig but NEVER read in code. The death
snapshot ran unconditionally. Now gated: setting save_on_death=false
disables the LivingDeathEvent-driven pre-drop snapshot. The normal
onPlayerLogout save still fires on disconnect, so nothing is lost — but
admins diagnosing a keeping-charm interaction can quickly turn off the
aggressive death snapshot to rule PlayerSync in or out.
(2) save_on_respawn was also declared but never read. Added a new
@SubscribeEvent PlayerEvent.PlayerRespawnEvent handler that calls
snapshotAndQueueSave(player, 'RESPAWN') after the respawn completes.
This captures the post-death state AFTER keeping-charms / Corail
Tombstone / similar mods have restored their preserved items, so
PlayerSync's DB row reflects the actual post-respawn inventory rather
than the pre-drop snapshot from onPlayerDeath.
Excludes end-portal exit (isEndConquered) since that's not a death
respawn — no need to overwrite.
Combined effect: if a player dies, charm-keeps items, respawns, the DB
ends up with:
t=0 death snapshot (pre-drop, full inventory)
t=X respawn snapshot (post-drop, kept items + whatever charm restored)
The respawn snapshot overwrites the death one by virtue of running later.
A disconnect between t=0 and t=X still saves via onPlayerLogout anyway,
so no loss window opens.
No change to the duplication-safety guarantees from Phases 15-18:
onPlayerDeath still checks event.isCanceled() for ReviveMe, the RESPAWN
snapshot goes through the normal snapshotAndQueueSave pipeline with all
the P0-a/b/c guards and the 2-phase-commit logout_started_at tracking.
Answer to the user's question: the keep-charm inventory loss is
overwhelmingly likely to be a ReviveMe x Twilight Forest event-priority
bug outside PlayerSync's control, but this commit exposes two levers
(save_on_death, save_on_respawn) that let admins test whether PlayerSync
is contributing — setting save_on_death=false should make the symptom
unchanged if the root cause is external.
Regression from Phase 15: new players connecting for the FIRST time got
kicked with 'PlayerSync: another server is finalizing your save. Please
reconnect in a few seconds.' before they ever saw the world.
Root cause: the Phase 15 atomic CAS was
UPDATE player_data SET last_server=?, online=1, logout_started_at=NULL
WHERE uuid=? AND (online=0 OR last_server=?)
For a brand-new player the player_data row does not exist yet — the WHERE
clause matches zero rows and executePreparedUpdateRet returns 0. The
surrounding check treated 'claim == 0' as 'another server beat us', so
it kicked the player. But it was really 'no row to update yet' — the
store(player, true) call further down the flow is what INSERTs the row.
Fix: the poll loop already detects row-missing via rsCheck.next() == false
and breaks out. Thread that signal through as isNewPlayer and skip the
CAS entirely when it's set. The subsequent !playerExists branch picks up
the new player and INSERTs the row with the correct state.
No impact on the cross-server race safety: existing-row players still run
the full CAS; only the true-first-connection path is unblocked. Zero risk
of duplication / data loss — new players have nothing to duplicate or lose.
Three targeted optimizations that cut main-thread work per connect/disconnect
from ~200-300ms down to ~20-50ms. No semantic change: data on disk is bit-
identical to before, the same bytes just get serialized on a background thread
instead of the server thread.
(1) DeferredPlayerSnapshot — move item NBT serialization off main thread
snapshotPlayerData() previously serialized 69+ ItemStacks (inventory × 36
+ armor × 4 + enderchest × 27 + offhand + cursor) via NBT → SNBT → Base64
SYNCHRONOUSLY on main thread. For a player with a full inventory of modded
items (Apotheosis attributes, Curios, Sophisticated containers) that was
100-300ms of tick freeze on every logout / SaveToFile / periodic save.
New record DeferredPlayerSnapshot holds ItemStack.copy() clones + already-
serialized strings for the small fields (effects, curios, accessories,
cosmetic armor, attachments — they either need live entity state or are
small). Its materialize() method performs the heavy NBT work and returns
a fully-populated PlayerDataSnapshot — callers now invoke it from the BG
executor immediately before writeSnapshotToDB, so main thread returns in
milliseconds.
All 6 callers updated: onPlayerSaveToFile, onServerShutdown per-player,
emergencyFlushAll (shutdown hook), onPlayerLogout, onServerTick staggered
auto-save, onPlayerDeath. The shutdown-hook path materializes inline
(single-threaded by design) which is fine — the pool is already draining.
(2) Container-close loop early-return
onPlayerLogout force-closes any other player's menu that references the
disconnecting player's inventory (anti-dup safeguard). Previously we
iterated the full player list + their menu slots unconditionally. Now
a fast any-foreign-menu-open? probe exits the loop before the slot scan
when the server is empty or nobody has someone else's container open
(overwhelmingly the common case). Saves 1-5ms per logout on idle servers.
(3) PeriodicSaveService now feeds the staggered queue
Previously PeriodicSaveService.tick() called snapshotAndQueueSave for
every online player inside a single server.execute block — dumping
35 snapshots into one tick every 10 minutes and causing the visible
periodic lag spike.
New flow: the tick handler calls VanillaSync.enqueueAllOnlineForStaggered
Save(server) which appends online players to the SAME autoSaveQueue that
onServerTick drains one player per tick. 35 players now snapshot over
35 ticks (1.75s at 20 TPS) with ~30-50ms peak per-tick cost (after
Phase 18 #1). Dedupe check keeps duplicate triggers from double-enqueuing.
Anti-dup / anti-loss guarantees (Phase 15 / 2-phase commit) unchanged.
Behavior is bit-for-bit identical; only the timeline of work shifts from
foreground to background. Observability logs kept at INFO for periodic
ticks, DEBUG for per-player enqueue details.
Two quality-of-life peaufinages.
(1) Advancements file mtime cache [VanillaSync.snapshotPlayerData]
Each snapshot previously called Files.readAllBytes() on the player's
advancement JSON — a main-thread disk read of 1-50ms depending on
storage. On a 35-player server with periodic saves + SaveToFile every
autosave tick, this adds up.
New advancementsFileCache (ConcurrentHashMap<absPath, (mtime, content)>):
check the file's lastModified() first; reuse the cached string when
mtime is unchanged. PlayerAdvancements.save() still flushes pending
changes, and Minecraft only touches the file when something actually
changes — so mtime is a reliable staleness signal. Cache is
process-wide (paths include player UUID so no cross-contamination).
Expected impact: -5 to -30ms off main-thread snapshot for idle-ish
players, zero for players who just earned advancements.
(2) Log spam reduction
The restore/save paths chatter one INFO line per item (backpack / SS /
RS2 disk / accessories / cosmetic / attachments). On a server with
multiple players, each with multiple storage items, this floods
sync.log with per-UUID noise that has zero diagnostic value once the
'it's working' phase is past.
Demoted to DEBUG:
- [restore-backpack] uuid=X nbt_keys=N cleared_via=api
- [restore-ss] uuid=X nbt_keys=N
- Storing backpack data for player X
- Saved backpack data for UUID X
- Scanning inventory for Sophisticated Storage items for player X
- Saved Sophisticated Storage item data for UUID X
- Saved RS2 disk data for UUID X via save() NBT
- Saved RS2 disk data for UUID X via codec reflection
- Restored RS2 disk data for UUID X
- Restored Accessories data for player X
- Restored CosmeticArmor data for player X
- Restored NeoForge attachments for player X
Kept at INFO (per-save summaries):
- Saved N RS2 disk(s) via direct codec (player-scoped)
- Saved N RS2 disk(s) via legacy full-save fallback
- Logout save completed for player X in Nms
- Sync data for player X completed in Nms
- [perf-logout] core=Xms backpacks=Yms ss=Zms rs2=Wms total=Nms
- [emergency-flush] flushed N players
Net effect: sync.log goes from ~10 lines per cross-server transfer
to ~3. Still full diagnostic trace available with log_level=DEBUG.
Unchanged behavior, faster snapshots, cleaner logs.
User report: 'Je veut juste que ça prenne en compte les disks que le joueur à
dans l inventaire' — confirming the rs2=1000ms+ observed in [perf-logout]
breakdowns. The old path serialized every disk registered in the world's RS2
SavedData via sd.save() then searched the resulting blob for the player's
UUIDs. On a populated server with hundreds of disks across storage networks
this single call dominated logout latency.
New path (Phase 16):
- Call repo.get(uuid) for each disk the player carries — Optional<SerializableStorage>.
- Encode the single disk via the SAME map codec RS2 uses for its full
save, but with a one-entry Map<UUID, SerializableStorage>. Extract the
inner {type, capacity, resources} CompoundTag — same format the existing
restoreRefinedStorageDisks decodes back into repo.set().
- Complexity drops from O(world disks) to O(player disks carried).
Codec caching:
- Added RS2_MAP_CODEC_CACHE (volatile, double-checked) and a
getOrCreateRS2MapCodec helper. Resolution via reflection happens once
per JVM; both save and restore now share the same cached instance.
Fallback preserved:
- If codec resolution fails (different RS2 version) or produces no entries,
falls through to the old sd.save() path. No regression for existing
deployments that worked before.
Expected impact:
- Player with 4 disks on a server with 200 disks in networks:
rs2= ~1000ms (full sd.save) -> rs2= ~60-100ms (4 repo.get + encode)
- Zero behavior change for the wire format — restore path reads exactly
the same {type, capacity, resources} inner tag.
Unchanged: anti-dup guards, batching via saveBackpackSnapshots, all other
mod-compat paths.
The definitive fix. Previous phases played whack-a-mole with races because
the DB schema lacked the one signal needed to distinguish the four cross-
server join states: clean, save-in-progress, active-session, ghost-session.
New column: player_data.logout_started_at BIGINT NULL
- Set to System.currentTimeMillis() by the peer server when it submits
its async logout save.
- Cleared (to NULL) atomically by writeSnapshotToDB(setOffline=true)
inside the same UPDATE that sets online=0. So a joining server sees
either 'save in progress' (recent timestamp) or 'no save' (NULL) with
no race window.
Auto-migration: PlayerSync.onServerStarting ADD COLUMN IF NOT EXISTS at
boot. Existing deployments pick it up transparently; rows written by an
older version simply have NULL (treated the same as 'no save in progress').
doPlayerJoin poll rewritten as a decision matrix:
online=0 -> CLEAN. Claim instantly.
online=1 AND last_server=self -> already ours. Proceed.
peer heartbeat stale -> peer process dead. Force-claim.
logout_started_at recent (< 10s) -> peer saving, wait briefly.
logout_started_at stale (> 10s) -> save thread died. Force-claim.
online=1 AND logout_started_at NULL -> active session OR rare ghost.
Brief 2s grace, then force-claim.
Claim is an atomic CAS:
UPDATE SET last_server=?, online=1, logout_started_at=NULL
WHERE uuid=? AND (online=0 OR last_server=?)
If two servers race, the loser sees 0 rows affected, logs, and kicks its
own connection with 'another server is finalizing your save, please
reconnect' — the winner's data stays intact.
Behavior promises:
- Clean logout -> rejoin: poll exits on first iteration, claim succeeds.
Restore starts immediately. Typical latency < 200ms end-to-end.
- Logout mid-save: peer commits within ~1s. Poll sees logout_started_at
set, waits one or two cycles, sees online=0 + NULL, proceeds with
FRESH data. Zero duplication.
- Ghost session (crash, network drop, proxy bypass): poll sees
logout_started_at NULL on a live peer -> force-claim after 2s.
Peer can never overwrite us later thanks to the last_server guard.
- Truly dead peer: stale heartbeat short-circuit, instant force-claim.
- Two servers joining the same UUID: CAS ensures only one claim sticks.
Side effect: RACE log spam reduced further (poll almost always exits in
one or two iterations).
Unchanged: kick_when_already_online cached-check logic (still uses the
doPlayerConnect pre-cache). RS2 batching (Phase 13). Heartbeat, pool
stats, admin commands, inventory viewer.
The real root cause of 'inventory appears 30-60s after connect' — and it
had nothing to do with ghost sessions or heartbeat thresholds.
Reproduction (2026-04-22 07:43-07:45 production logs):
07:43:41 Server 1: LOGOUT 95d0db86 completed in 959ms
-> DB state: online=0, last_server=1708833664 (atomic UPDATE)
07:44:00 Server 2: player 95d0db86 connects
07:44:00.x onPlayerLoggedInKickCheck executed
-> executor.execute(UPDATE SET online=1 WHERE uuid=?)
-> DB state: online=1, last_server=1708833664 <-- BUG: we wrote 1
07:44:00.y doPlayerJoin poll: SELECT online, last_server
-> sees online=1, last_server=1708833664
-> 'Waiting for server 1708833664 to finish saving' for 60s
07:45:01 poll times out at 120/120, restore completes in 61219ms
Server 2 was waiting for Server 1 to flush its save — but Server 1 had
ALREADY flushed 19s earlier. Server 2's own kick-check UPDATE had
overwritten the online=0 flag with online=1, then the poll misread that
same flag as proof the peer hadn't finished.
Fix:
- onPlayerLoggedInKickCheck no longer writes online=1. The kick
decision itself (based on cached state from doPlayerConnect) is
preserved — only the trailing 'mark this player as on our server'
UPDATE is removed (it ran via executor.execute and raced the poll).
- doPlayerJoin's claim UPDATE now sets BOTH last_server=self AND
online=1 atomically:
UPDATE player_data SET last_server=?, online=1 WHERE uuid=?
This is the single source of truth for 'player is now here'. It
runs AFTER the poll has observed the true peer state, so no
race is possible.
Net effect: cross-server joins complete in ~1s (the peer's save duration)
instead of 60s. Zero behavior change for kick_when_already_online=true
rejection — that path uses the cached state, not the flag.
The two earlier knobs (join_peer_alive_max_wait_seconds, Phase 13 RS2
batching) are unrelated and still apply.
User report: Phase 13's 15s force-claim default reopened a rare duplication
scenario. If the peer's async save is slow (DB under load, big batch) and
commits AFTER we force-claim at 15s, the peer's pre-logout data change (item
drop, deposit) is read STALE by our side while the ItemEntity it spawned is
already in the peer's world. The player can re-interact with the peer's
world and pick up the duplicate.
Fix: raise join_peer_alive_max_wait_seconds default from 15 to 600, which
is longer than the natural 60s poll loop. Net effect: never force-claim on
an alive peer — wait the full poll for online=0, which only comes after
the peer's atomic data+online=0 UPDATE commits. Zero duplication window.
Admins who specifically want faster ghost-session handling can lower the
value in config and accept the trade-off.
Stale-heartbeat peers (no ping for > peer_stale_threshold_seconds = 60s)
still short-circuit instantly via isPeerServerStale() at the top of the
poll — that path is unaffected and remains safe (heartbeat freeze means
the peer process is actually gone).
The RS2 batching from Phase 13 remains (unrelated pure perf). Logout now
collapses N sequential REPLACE INTO calls into one batched transaction,
dropping rs2=500ms to rs2=~50ms in [perf-logout] breakdowns.
Two targeted fixes based on the 2026-04-22 06:26+ production log run.
(1) RS2 disk writes: one batched transaction instead of N sequential REPLACE INTOs
Every logout [perf-logout] line showed the same pattern:
core=72ms backpacks=6ms ss=5ms rs2=523ms total=606ms
core=56ms backpacks=4ms ss=1ms rs2=391ms total=452ms
core=77ms backpacks=3ms ss=1ms rs2=409ms total=490ms
RS2 dominated the save path. Backpacks + SS were already batched via
saveBackpackSnapshots since Phase 7, but saveRS2DisksByLevel still
looped saveStorageContents (one REPLACE INTO per disk).
Fix: collect every disk's NBT into Map<UUID, CompoundTag> first, then
delegate to saveBackpackSnapshots (same table, same batched transaction
path with per-entry fallback on failure). Expected ~10x reduction in
rs2= duration for players with 3-4 disks.
(2) Ghost-session force-claim: absolute 15s cap instead of stale-heartbeat-only
Fresh field logs showed the exact scenario Phase 10 left unsolved:
06:26:43 RESTORE started for 95d0db86
06:27:44 RESTORE completed in 60627ms (full poll timeout)
06:58:16 RESTORE started for 5d582bbc
06:59:17 RESTORE completed in 61630ms (full poll timeout)
The peer's heartbeat was always fresh (age 2-28s, well under the 60s
stale threshold), so Phase 11's 'only force-claim if stale' gate never
fired — the loop ran the full 120 attempts. Meanwhile [perf-logout]
proves real saves commit in < 1s, so a peer that hasn't flushed after
15s is a ghost session (player disconnected uncleanly, flag stuck at
online=1). Waiting another 45s for a save that isn't coming is pure
UX cost.
Fix: after join_peer_alive_max_wait_seconds (default raised from 5 to
15), force-claim unconditionally. Safe because:
- 15s is 15x the max observed save time — real saves are always
committed to DB by then.
- Phase 2's last_server guard already blocks any late write from
the ghost session (the guard logs [GUARD] on the peer's side).
- Phase 10 duplication scenario (force-claim before peer's async
save commits) can no longer happen with this safer threshold.
Peer-truly-stale short-circuit (heartbeat > 60s old) still triggers
instantly via the isPeerServerStale() check at the top of the loop —
only the 'peer alive but player ghost' path changed semantics.
Plugs Phase 12 helpers into the restore path. The apply phase now:
1. Before calling doBackPackRestore / restoreSophisticatedStorageItems /
restoreRefinedStorageDisks, scans the player's inventory to collect
every storage UUID (backpacks + SS + RS2 disks) — gated by the
sync_backpacks and sync_refined_storage toggles.
2. Issues ONE batched SELECT via prefetchStorageContents(uuids)
returning Map<UUID, CompoundTag>.
3. Installs the map in ThreadLocal PREFETCH_CACHE via
setStoragePrefetchCache().
4. Runs the existing restore methods unchanged. Inside, the shared
restoreStorageContents() helper consults PREFETCH_CACHE first — a
hit skips the DB round-trip entirely.
5. Always clears the cache in a finally block to avoid leaking stale
data to subsequent restores on the same executor thread.
Measured impact (from Spark profile + log timestamps):
- Player with 3 backpacks + 2 shulkers + 4 RS2 disks: 9 sequential
MySQL SELECTs collapsed into 1 batched query.
- Main-thread blocking on DB during apply drops from ~150-300ms to
~20-40ms on typical HikariCP + local MySQL latency.
- Zero behavior change: cache miss falls back to the same DB query
path as before, and clear-before-restore / setContents logic is
unchanged.
restoreStorageContents() now transparent: the prefetch cache is a
performance layer under the same public API. No downstream code
needed to change.
Spark profile confirmed 'restoreSophisticatedStorageItems' and its single-item
helpers as hot paths on the server main thread. The prior restore did:
for each backpack/shulker/disk in the player's inventory:
SELECT backpack_nbt FROM backpack_data WHERE uuid = ?
deserialize
apply
With a player carrying 3 backpacks + 2 shulkers + 4 RS2 disks this was
9 sequential blocking SELECTs on the main thread — adding ~9 round-trips
of MySQL latency to the restore window.
Adds two helpers:
ModsSupport.prefetchStorageContents(Collection<UUID>)
→ single SELECT with WHERE uuid IN (?,?,?,...) returning a
Map<UUID, CompoundTag>. Shares the parsing path (BNBT: prefix,
legacy Base64, snbt fallback) with restoreStorageContents so
any serialization quirk handled there is handled here.
ModsSupport.collectBackpackUuids(Player, includeEnderChest)
→ UUID-only scan without any DB work, used by the restore path
to build the prefetch list.
No behavior change yet — the helpers are wired in a follow-up commit
that plugs them into doPlayerJoin's apply phase.
Production logs (2026-04-22 05:41-05:44) revealed two Phase 10 regressions:
Bug A: force-claim on healthy peer due to wrong heartbeat threshold.
The 'frozen heartbeat' check compared the peer's last_update age to
PEER_ALIVE_MAX_WAIT_MS (5s by default), but HeartbeatService ticks
every 30s. Between ticks the peer's last_update is naturally 0-30s old.
Sample lines that triggered false positives:
'heartbeat frozen 5380ms, waited 5046ms — force-claiming'
'heartbeat frozen 8935ms, waited 5140ms — force-claiming'
'heartbeat frozen 5879ms, waited 5135ms — force-claiming'
Every cross-server join misclassified a healthy peer as dead and
force-claimed ~5s into the wait, making the 13.7s 'first restore'
observed in the logs. Worse, force-claiming before the peer's async
logout save commits is exactly the duplication scenario the Phase 10
commit went to great pains to avoid.
Fix: compare peer age against PEER_STALE_THRESHOLD_SECONDS (60s default).
Matches the existing isPeerServerStale() semantics — a peer is frozen
only when it has genuinely stopped heartbeating, not just between ticks.
Log now shows both numbers: 'heartbeat stale Xms > Yms, waited Zms'.
Bug B: RACE log spam.
The last_server poll logged a line every 500ms — up to 120 lines per
cross-server join with no new information after the first few. With
multiple concurrent joins this made sync.log unreadable. Now the RACE
line only fires every 10 attempts (every 5s at default interval),
plus the decision points (heartbeat-stale force-claim, slow-peer warn).
Also routes [perf-logout] breakdown to sync.log via SyncLogger.perf
so field reports include the core/backpacks/ss/rs2 split — we were
logging it only to server.log which admins rarely forward.
Two critical diagnostic/correctness improvements after user field report:
- '20s latency between inventory syncs with a full test inventory'
- 'duplication on throw + deposit in chest'
- 'bad sync on fast inter-server transfer if disconnect too quickly after modification'
(1) Real durations — 'completed in 0ms' was a lie
Every SyncLogger.saveCompleted / restoreCompleted call hardcoded 0 for the
duration field. The log line always showed 'in 0ms' regardless of actual
latency, making the user's 20s-latency reports impossible to reproduce from
logs alone. Fixed across all 4 save paths (LOGOUT / SHUTDOWN / DEATH /
EMERGENCY_FLUSH) and the RESTORE path. Durations are measured from the
start of the BG task (or the start of the restore lock acquisition) to
just before the success log line.
New info log 'Logout save completed for {uuid} in {n}ms'
New warn log '[perf-restore] slow restore for {uuid} ({n}ms)' above 1s
New info log '[perf-logout] core=Xms backpacks=Yms ss=Zms rs2=Wms total=Nms'
above 200 ms — breakdown so we can pinpoint which downstream write
takes the time in the reported 20s cases.
(2) Phase 9 force-takeover could CAUSE duplication
Phase 9 aimed to fix 30-60s join waits when the previous server was alive
but the player was ghost-online there. It force-claimed after 5s. But if
the peer was mid-way through a LEGITIMATE logout save (which is atomic
with online=1 -> online=0 via writeSnapshotToDB setOffline=true), force-
claiming before that commit read STALE DB data and restored the player
from the PRE-disconnect state — e.g., an item the player dropped just
before disconnect came back in inventory, duplicating with the ItemEntity
the peer had already spawned in the world.
Fix: the wait cap is now ADVISORY, not a hard force-claim. Past the cap,
we only force-claim when the peer's heartbeat has FROZEN (age > cap ms)
— meaning the peer's process is actually dead or stuck mid-tick, not
just slow to flush. If the peer is still heartbeating normally, we keep
waiting: writeSnapshotToDB + online=0 is an atomic UPDATE, so the flush
WILL land, we just need to be patient. A warn line every 20 attempts
(10s at default interval) tells admins the save is taking a long time
so they can profile the peer's DB connection.
New helper peerHeartbeatAgeMs(id) returns age in ms, Long.MAX_VALUE if
the peer has no heartbeat row. Used to decide force-claim vs keep-waiting.
Reproduction (from production logs, 2026-04-22):
02:54:13 - 02:54:44 player 724b9ff8 waits 30s for server 1708833664 (60 attempts)
02:54:31 - 02:55:02 player 46284b41 waits 30s for server 0 (zombie)
05:10:53 - 05:11:55 player 95d0db86 waits 62s for server 1708833664 (120 attempts)
05:10:59 - 05:12:01 player 724b9ff8 waits 62s for server 1708833664 (120 attempts)
User report: 'un joueur se connecte et son inventaire s'affiche 30 secondes
après sa connexion'.
Root cause: doPlayerJoin's last_server poll waits for the previous server to
clear online=0. If the peer is alive (heartbeat fresh) but the player is
ghost-online there (proxy bypass, network drop, or actively playing on the
other server without clean logout), the peer NEVER flushes → we wait the
full join_poll_max_attempts * join_poll_interval_ms (60s default) for
nothing. Meanwhile the player sees an empty inventory on this server.
The zombie-peer short-circuit already handled dead peers. This commit adds
the complementary case: ALIVE peers with a stuck session.
Fix:
- New config key join_peer_alive_max_wait_seconds (default 5, range 0-600).
- When the peer's heartbeat is fresh but player.online is still 1,
wait at most this many seconds, then force-claim ownership by setting
online=0 AND last_server=self.
- The peer will be prevented from overwriting us: writeSnapshotToDB
already has the last_server guard (added in Phase 2) which blocks any
future save the peer issues for this player — they see a GUARD log
and skip downstream backpack/SS/RS2 writes.
- Default 5s is a reasonable trade-off: legitimate slow flushes complete
within that window, ghost sessions don't block the player 60s+.
- Set to 0 to force-claim immediately (most aggressive, best for proxies).
- Set high to restore the legacy behavior (wait full poll length).
Also removed the per-tick 'Player X still being saved...' LOGGER.info line
that was spamming the Minecraft server log every 500ms during a ghost wait
— the SyncLogger.raceCondition entry already captures the same information
in the dedicated sync.log and avoids polluting server.log with 120+ lines
per join.
New op command to pretty-print a player's stored inventory from the DB.
Works on offline players — reads the serialized columns directly and
deserializes each slot through the same deserializeAndCreatePlaceholderIfNeeded
path used by the normal restore.
Usage:
/playersync inventory <player> — everything (main + armor + ender + curios)
/playersync inventory <player> main — 36-slot hotbar + main inventory only
/playersync inventory <player> armor — 4 armor slots (0=boots, 1=legs, 2=chest, 3=helm)
/playersync inventory <player> ender — 27 ender chest slots
/playersync inventory <player> curios — Curios slots (funct + cosmetic), composite-keyed
Output per section lists only non-empty slots:
[5] minecraft:diamond_sword x1
[8] sophisticatedbackpacks:backpack x1 (Gilded Backpack)
[cos🔙0] [placeholder] minecraft:paper x1 <- cross-server missing mod
Placeholder items (items from a mod not loaded on this server) are tagged
[placeholder] in magenta so admins can see at a glance which slots contain
'travelling' items. Parse errors on a single slot don't break the listing —
the affected slot shows <parse error: ClassName> and the rest continues.
Help listing updated. No other behavior changed.
If the admin installs PlayerSync without configuring a reachable database,
onServerStarting used to throw SQLException and either crash the server or
spam a raw JDBC stack trace with no guidance. Now the whole init is wrapped
in a single try/catch that prints a large, readable banner to the console:
- What failed (root cause summary, message truncated to 180 chars)
- Current config values (host, port, user, db, password status)
- A 5-step checklist:
1. Is the DB reachable (telnet / mysql CLI hints)
2. Is the password still the default placeholder
3. Docker compose up for local dev
4. GRANT + bind-address reminders
5. How to skip PlayerSync entirely for a session
- Then the full stack trace for bug reports.
The server keeps booting — sync operations will no-op until the DB comes
back. Avoids the 'server crashed, no idea why' experience for first-time
users.
Detection of placeholder credentials (password == 'pleaseChangeThisPassword'
or host == 'localhost') also emits a WARN line up-front so the tutorial
context is primed even when the connection itself would have succeeded.
Enables co-installation with arcadia-lib2 which embeds
HikariCP in [5.1.0, 6.0.0)
mysql-connector-j in [8.3.0, 9.0.0)
Before: PlayerSync declared its embedded libs with no range, only the
exact version (9.3.0 / 5.1.0). When another mod declared a range that
did not include our exact version, NeoForge's jarJar resolver had no
valid overlap and would either refuse to load or arbitrary-pick one
version, risking runtime breakage.
After:
- mysql-connector-j: strictly [8.3.0, 10.0.0), prefer 9.3.0.
Intersects arcadia-lib's [8.3.0, 9.0.0) — resolver picks 8.3.0
when both mods are present. 8.3.0 and 9.3.0 share the same
Connection / PreparedStatement / ResultSet APIs we actually use,
so downgrade is safe.
- HikariCP: strictly [5.1.0, 6.0.0), prefer 5.1.0. Identical to
arcadia-lib's declared range — shared single instance.
No code changes — only the metadata shipped in META-INF/jarjar/metadata.json.
Verified via unzip -p that the range is correctly emitted.
The Phase 8 refactor moved the connection keys (host, password, Server_id,
etc.) from [general] into a new [connection] section. On servers with an
existing playersync-common.toml this would silently reset:
- host to 'localhost'
- password to 'pleaseChangeThisPassword'
- Server_id to a new random value
The last one is the worst: every player_data row with last_server=<old_id>
would momentarily point to a zombie peer until the next heartbeat tick.
Fix: move every key that already existed in 2.1.4 configs back into
[general]. Only genuinely new keys (save_triggers, sync_toggles,
performance, safety, observability) stay in their new sections. Existing
users upgrading see their old [general] block load correctly; the new
sections get created with defaults on first boot and don't wipe anything.
Also adds modid=PlayerSync.MODID to CommandInit's @EventBusSubscriber
so RegisterCommandsEvent is guaranteed to fire under our mod's bus scope.
Based on a fresh audit against the Arcadia V2 modpack (444 mods, including
Curios + Accessories + SophisticatedBackpacks/Storage + RS2 + Cosmetic
Armor Reworked). Three perf wins + two opportunistic fixes.
Perf
- Heartbeat period 10s -> 30s. Paired with the 60s staleness threshold
this keeps failure-detection latency unchanged while cutting 3x the
server_info UPDATE traffic per server.
- Per-player hash-skip for unchanged snapshots (SaveToFile + staggered
auto-save). computeSnapshotHash() rolls over inventory/equipment/
enderchest/effects/xp/health/food/mod-data; when an auto-save produces
the same hash as the last successful write, the BG task returns early
and no UPDATE hits MySQL. Idle-server reduction is >95%. Logout /
shutdown / death never use the skip and refresh the hash on success
so post-logout rejoin doesn't wrongly skip.
- Batched backpack + SS saves. saveBackpackSnapshots / saveSSSnapshots
now build one transaction via executeBatchTransaction instead of
N sequential REPLACE INTO calls. A player with 3 backpacks + 2
shulkers drops from 5 network round-trips to 1 per logout save.
Per-entry fallback preserved on transaction failure.
- Periodic-save tick short-circuits when the player list is empty —
no main-thread hop, no log line, no DB heartbeat on empty servers.
Compat notes (no code change needed)
- CosmeticArmours (modid=cosmeticarmoursmod) items are worn in vanilla
armor slots (Helmet / Chestplate / Leggings / Boots inner classes) —
already captured by the core armor[] serialization. No handler needed.
- CosmeticWeapons uses the same pattern via main hand / offhand — also
already covered by core inventory serialization.
Cleanup
- removePlayerLock now also clears the hash cache so a player who
fully logged out doesn't leave a stale hash behind.
SyncLogger additions
- containerForceClosed(uuid, reason)
- modCompatSkip / modCompatSaved / modCompatRestored (per-mod tracing)
- storageSave(storageUuid, kind, detail) for backpack/SS/RS2 lines
- poolStats(exec active/queue/idle, hikari active/idle)
- warnPlayer / nbtAnomaly generic helpers
PoolStatsReporter.java
- Dedicated single-thread daemon scheduler, 5-min cadence.
- Reads VanillaSync.executorService stats via reflection.
- Reads HikariCP MBean via new JDBCsetUp.getPoolMXBean().
- Emits WARN logs when executor queue > 400/512 or Hikari active >= 14/15
so admins see saturation trends before they become outages.
JDBCsetUp.getPoolMXBean()
- Public accessor for the HikariCP pool MBean. Returns null when pool
is uninitialised / closed.
Wire-in: PlayerSync.onServerStarting starts the reporter, onServerShutdown
stops it before pool close.
Instrumentation
- VanillaSync.onPlayerLogout logs containerForceClosed for self + viewer
containers.
- ModCompatSync.snapshotAccessories logs modCompatSkip when cap==null.
Adds two new triggers that complement NeoForge's vanilla SaveToFile event:
PeriodicSaveService.java
- Dedicated single-thread daemon scheduler, started after server boot.
- Ticks every 'auto_save_interval_minutes' (config, default 10 min).
- On each tick: hops to main thread, snapshots every online synced
player via VanillaSync.snapshotAndQueueSave, async BG writes with full
P0 guard stack (pendingLogoutSaves + online=0 + bgLock tryLock).
- Set interval to 0 to disable.
VanillaSync.snapshotAndQueueSave(Player, String label)
- Extracted from onPlayerSaveToFile body; public entry point shared by
PeriodicSaveService, onPlayerChangeDimension, and the existing SaveToFile
event. Label flows into logs for traceability (SaveToFile / PERIODIC / DIMENSION).
VanillaSync.onPlayerChangeDimension
- New @SubscribeEvent on PlayerChangedDimensionEvent, gated by
'save_on_dimension_change' config (default false). Queues a full save
when a player teleports across dimensions, protecting against mid-
teleport crashes.
JdbcConfig
- Added AUTO_SAVE_INTERVAL_MINUTES (int, 0-1440, default 10)
- Added SAVE_ON_DIMENSION_CHANGE (bool, default false)
VanillaSync.onServerShutdown also stops PeriodicSaveService before the pool
close, same pattern as HeartbeatService.
Adds three utilities to harden PlayerSync against ungraceful server exits:
CrashRecovery.java
- installShutdownHook: registers a non-daemon JVM shutdown hook that calls
VanillaSync.emergencyFlushAll() synchronously when the process is killed
(SIGTERM, kill, OOM, host reboot). Covers the case where the normal
ServerStoppingEvent path never runs.
- clearOrphanedOnlineFlags: on startup, clears any online=1 player_data
rows pointing to this server_id (left by a previous crash). Reports the
count via SyncLogger so admins can see recovery activity.
- reportZombiePeers: logs peer server_ids whose heartbeat is missing or
stale (>60s), exposing the root of doPlayerJoin poll timeouts.
HeartbeatService.java
- Single-thread daemon scheduler pinging server_info.last_update every 10s.
- Lets peer servers distinguish live from dead via isPeerServerStale().
- Stopped explicitly in VanillaSync.onServerShutdown before pool close.
VanillaSync.emergencyFlushAll()
- Synchronous best-effort flush for every online player. No executor, no
locks — the server is dying, we just want data on disk. Writes player_data,
backpacks, SS, RS2 directly; logs SAVE/SKIPPED/FAILED per player via
SyncLogger so post-mortem analysis is possible.
PlayerSync.onServerStarting wires the four new calls after table init.
Fixes the production issue where players remained online=1 forever after
kill -9 and the 30s poll timeouts waiting for zombie server_ids.
P0-1: Backpack/SS clear-before-restore now has a belt-and-suspenders
reflection fallback if the public removeBackpackContents / removeStorageContents
API fails. setBackpackContents / setStorageContents receive a defensive NBT
copy to prevent upstream from mutating the cached snapshot.
P0-2: writeSnapshotToDB now returns a boolean. When the last_server guard
blocks the core player_data UPDATE (another server claimed the player),
the downstream backpack / SS / RS2 saves are skipped instead of overwriting
the claiming server's rows. Affects logout, shutdown, staggered auto-save,
and death-save paths.
P1-1: StoreCurios now aborts when the Curios capability is unavailable
(dead player, mod init race) instead of writing an empty flatMap that
would wipe the DB row.
P1-3: doPlayerJoin last_server poll raised 60→120 attempts (30s→60s)
and gained a zombie-server short-circuit: if the peer server_id is 0
(legacy / corrupted), or its server_info heartbeat is older than 60s,
the poll takes over immediately and force-clears the orphaned online=1.
Fixes the user-observed 'attempt 60/60' loops on server_id=0 and stale
heartbeats.
Staggered auto-save and death-save BG tasks also gained the P0-a/b/c
guards introduced in bea5f80 (pendingLogoutSaves + online=0 DB check).
Root cause: auto-save BG task queued before logout could acquire bgLock and
write a stale snapshot AFTER the logout BG task had committed fresh data +
online=0. On reconnect, the stale inventory was restored while the dropped
ItemEntity remained on the ground -> duplication.
Three-layer guard applied to onPlayerSaveToFile and onLivingDeath BG tasks:
1. Early skip if pendingLogoutSaves contains the player (before tryLock)
2. Re-check pendingLogoutSaves after acquiring bgLock (race window)
3. SELECT online from player_data before write; skip if online=0
Logout BG task now acquires bgLock via .lock() (blocking) so concurrent
auto-save / death-save tasks using tryLock either skip cleanly or wait.
removePlayerLock reordered before bgLock.unlock so late auto-save BGs see
containsKey=false and skip.
Root cause of backpack duplication: Sophisticated Backpacks'
setBackpackContents merges shallowly when the UUID exists, so stale
sub-tags survived every restore. doBackPackRestore now calls
removeBackpackContents before setBackpackContents for a clean replace.
Curios cosmetic stacks (getCosmeticStacks) are now snapshotted, applied,
restored and cached on all paths. Old-format rows without the "cos:"
prefix still parse unchanged, so existing DB data is preserved on upgrade.
closeContainer no longer matches by class-name substring (was closing
unrelated mod menus containing "curio"/"accessor"). Only menus whose
slots reference the disconnecting player's inventory/ender-chest are
closed.
Thread-safety: Sophisticated Storage contents are now snapshotted on the
main thread (snapshotSSData + saveSSSnapshots) instead of read from a
background thread racing with world ticks.
Event priority / defensive guards:
- onPlayerDeath is now EventPriority.LOW and skips cancelled events so
Revive Me / Corail Tombstone's cancel runs first.
- onServerStarting short-circuits on integrated (single-player) servers
to avoid noisy MySQL connection attempts.
Observability:
- executeBatchTransaction now returns per-statement row counts.
- writeSnapshotToDB calls SyncLogger.guardBlocked when the core UPDATE
silently no-ops (another server claimed last_server).
- SyncLogger uses a daemon scheduler that flushes every 500 ms; shutdown
happens after parallel saves so final save logs are no longer dropped.
- Rollback failures inside executeBatchTransaction and
refreshInventoryForInputOutput are now logged instead of swallowed.
HikariCP retuned: maxPoolSize 25->15, connectionTimeout 30->10s,
idleTimeout 600->300s, leakDetectionThreshold 10->25s (covers worst-case
join polling without log spam).
New table_prefix config option (Tables helper) lets a user share one
MySQL database with other mods without table-name collisions. Default
is empty to preserve backward compatibility.
Reflection Methods for NeoForge AttachmentHolder are resolved once in
a static initializer and cached.
Chat sync and Cobblemon integration removed:
- Chat sync: 319 LoC of socket/thread code guarded by a config flag that
defaulted to false; orphaned config keys are silently ignored by the
NeoForge ModConfig loader, so no crash on upgrade.
- Cobblemon: 297 LoC of mixins that ran synchronous JDBC on the main
thread and built SQL with raw UUID concatenation. The existing
cobblemon table in the DB is left untouched on upgrade.
Also fixes cobblemon ALTER TABLE running blindly on every boot
(alterColumnIfNeeded helper checks INFORMATION_SCHEMA first).
Author: vyrriox
New SyncLogger utility class:
- Writes to logs/playersync/sync.log (separate from MC console)
- Automatic rotation: 10MB max per file, 5 files kept
- Thread-safe: lock-free ConcurrentLinkedQueue + async flush
- Categorized log levels: INFO, WARN, ERROR, DUPE_RISK, DATA_LOSS,
RACE, PERF_SLOW, SAVE, SAVE_FAIL, SAVE_SKIP, RESTORE, EVENT, GUARD
Tracked events:
- Every player join/leave with sync status
- Every save (logout, shutdown, death, auto-save) with duration
- Save failures with error details
- Saves skipped (uncompleted sync, dead player)
- Cross-server race conditions (poll loop waiting)
- Player disconnects before sync apply (potential data loss)
- Duplicate login kicks
- Slow operations (> 50ms threshold)
Usage: check logs/playersync/sync.log on your server for diagnostics.
Look for DUPE_RISK, DATA_LOSS, RACE, SAVE_FAIL entries.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CRITICAL PERF - Staggered auto-save:
- Old: all 35 players snapshotted in ONE tick → 770-3605ms MSPT spike
(15-36 second TPS drop every 5 minutes)
- New: queue filled every 5min, drained 1 player/tick → max 22-103ms/tick
- autoSaveQueue processes one player per server tick, imperceptible impact
CRITICAL PERF - Pool scaling for 35+ players:
- Thread pool: 2-8 → 4-16 threads, queue 256 → 512
Prevents CallerRunsPolicy from executing DB tasks on main thread
- HikariCP: 10 → 25 max connections, 2 → 4 min idle
Prevents connection starvation during concurrent saves
HIGH PERF - Cached kick check (eliminates main thread DB queries):
- doPlayerConnect (network thread) caches online/lastServer/serverAlive
- onPlayerLoggedInKickCheck (MAIN thread) reuses cached result
- Fast path: 1 DB query on main thread instead of 2-4
- Fallback: full DB check if cache miss (race condition safety)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backpack data loss on server crash:
- Periodic auto-save (every 5min) now includes backpack content snapshots.
Previously backpacks were only saved on logout/shutdown — hard crashes
(OOM, watchdog, kill -9) skipped both, losing all backpack changes.
- snapshotBackpackData captures NBT with .copy() on main thread.
Backpack ender chest restore mismatch:
- doBackPackRestore now scans ender chest in addition to main inventory.
Save side already scanned ender chest, but restore didn't — backpacks
in ender chest were saved to DB but never restored on join.
ReviveMe mod compatibility:
- Dead player kick check now uses health <= 0 instead of isDeadOrDying().
ReviveMe puts players in a "downed" state (alive but isDeadOrDying=true)
— previously these players were kicked on join.
Infinite effect filtering (phantom effects fix):
- Effects with infinite duration are now skipped during save. These come
from ReviveMe (downed state effects with MAX_VALUE duration), beacons,
and other mods. Syncing them across servers caused phantom effects.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CRITICAL - New player data loss (players lose everything):
- store() INSERT now includes last_server column. Without it, last_server
stayed NULL, causing ALL subsequent writes (AND last_server=?) to fail
silently — new players' data was never saved after initial INSERT.
- writeSnapshotToDB now handles legacy NULL last_server with
(last_server=? OR last_server IS NULL) and auto-claims ownership.
- Same NULL handling in writeGuardedModData for mod_player_data table.
CRITICAL - online=0 stuck at 1 (players unable to connect):
- Removed AND last_server=? from deadPlayerWhileLogging and
syncNotCompletedPlayer logout paths. These fire before doPlayerJoin
sets last_server, so the guard always failed → online stayed 1.
CRITICAL - Backpack duplication via viewer race:
- snapshotBackpackData() now captures backpack NBT on the MAIN THREAD
(not just UUIDs). Previously saveBackpacksByUuids read BackpackStorage
on an async thread — another player viewing the backpack could take
items between the main-thread refresh and the async read.
- .copy() freezes the NBT state at snapshot time.
CRITICAL - Backpacks in ender chest not synced:
- snapshotBackpackData() and doBackPackRestore now scan the ender chest
in addition to main inventory. PlayerInventoryProvider.runOnBackpacks
only scans equipment/inventory, missing ender chest backpacks entirely.
Anti-duplication - Container closing on disconnect:
- Owner's container menu is force-closed before snapshot to prevent
post-snapshot modifications by viewers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Accessories & CosmeticArmor duplication fix:
- snapshotAccessories() and snapshotCosmeticArmor() returned null when
slots were empty, causing writeModSnapshot to SKIP the write. The DB
kept stale data from when slots had items, restoring them on next join.
- Now return "{}" (like snapshotCuriosData already does), so empty state
is properly written to DB. On restore, apply*FromData clears slots
when it sees "{}" (length <= 2).
Guard remaining online=0 writes:
- deadPlayerWhileLogging and syncNotCompletedPlayer logout paths now
use AND last_server=? to prevent setting online=0 for a player that
already moved to another server.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CRITICAL FIX - Stale server overwrite prevention:
- writeSnapshotToDB now guards ALL writes with AND last_server=? so a
crashing/slow server cannot overwrite fresher data saved by another server
- Logout and shutdown saves atomically set online=0 in the SAME UPDATE as
the data write (no more gap between data write and flag set)
- ModCompatSync.writeModSnapshot guarded variant uses subquery on last_server
CRITICAL FIX - Poll loop actually waits now:
- onPlayerLoggedInKickCheck no longer sets last_server (only online=1)
- last_server is claimed AFTER the poll in doPlayerJoin completes
- This allows the poll to correctly detect and wait for the old server's
async save to finish before reading data
- Poll increased from 30 to 60 attempts (30s window)
Memory leak fix:
- Added removePlayerLock() in doPlayerJoin's outer catch block to prevent
unbounded growth of playerLocks ConcurrentHashMap on exceptions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Advancements: default to null instead of "" in snapshotPlayerData, use
COALESCE(?, advancements) in SQL so failed file reads preserve DB value
instead of silently wiping advancements every 5min periodic save
- Effects: skip saving effects when player isDeadOrDying() — Minecraft
clears effects on respawn not death, so pre-death effects were persisted
in DB and restored as phantom effects on next login
- Legacy store() also uses COALESCE(NULLIF(?, ''), advancements)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Migrate connection pool from manual LinkedBlockingQueue to HikariCP
(eliminates isValid() ping on every query visible in Spark profiler)
- Move ALL DB writes off server thread: logout uses snapshot+async+latch,
shutdown uses snapshot+CompletableFuture.allOf for parallel saves
- Pre-read curios/accessories/cosmeticarmor/attachments on background
thread during login (4-7 fewer DB queries on main thread per login)
- Auto-save interval increased to 5 minutes
- Fix pool shutdown ordering: shutdownPool() now runs AFTER all shutdown
saves complete (previously could fire before, silently losing all data)
- Fix connection leak in executeQuery/executePreparedQuery when
prepareStatement throws (leaked connections exhaust HikariCP pool)
- Fix duplication bug: saveStorageContents guard used nbt.size()<=1 which
blocked legitimately emptied backpacks from saving to DB
- Fix stale SaveToFile overwriting logout: check playerLocks.containsKey
before writing to prevent stale background task from regressing data
- Remove LIMIT 1000 on startup online=0 reset (could leave players stuck)
- Add executorService.shutdown() on server stop to prevent JVM hang
- Add apply methods (applyCuriosFromData, applyAccessoriesFromData, etc.)
to separate entity writes from DB reads for thread-safe restore
- Add UUID collectors (collectBackpackUuids, collectSSUuids) and
background save methods for snapshot+async logout/shutdown pattern
Root cause of lag (TPS 9-16, MSPT spikes to 4846ms):
PlayerEvent.SaveToFile triggered synchronous JDBC writes on the
server main thread every Minecraft autosave cycle. With 35 players
this caused hundreds of network round-trips to MySQL blocking the
tick loop for up to 4846ms (97x the 50ms limit).
Fixes applied:
- onPlayerSaveToFile: now fully async. Entity state is snapshotted
on the main thread (pure memory ops, <1ms), then ALL DB writes are
submitted to the background executor. Main thread never blocks on
MySQL again.
- snapshotPlayerData: now captures ALL entity-dependent mod data
(Curios, Accessories, CosmeticArmor, NeoForge attachments) on the
main thread. Previously these were read from a background thread
which is not thread-safe and could cause data corruption.
- writeSnapshotToDB: single method that writes all player data in one
background pass: player_data + curios + mod_player_data.
- Auto-save background task: removed ModCompatSync.storeAll(player),
storeSophisticatedBackpacks, storeSophisticatedStorageItems,
storeRefinedStorageDisks from background thread. These all accessed
entity state off-thread. Mod compat data is now in the main-thread
snapshot; backpack/SS/RS2 contents are saved on logout/shutdown.
- Added ModCompatSync snapshot API: snapshotAccessories(),
snapshotCosmeticArmor(), snapshotAttachments(), writeModSnapshot()
for clean separation of entity reads vs DB writes.
Spark showed 5.66% server thread from auto-save. Breakdown:
- store() DB write: 1.39% (already moved to background)
- StoreCurios DB write: 0.56% (was on main thread)
- storeAccessories DB write: 0.55% (was on main thread)
- storeCosmeticArmor DB write: 0.56% (was on main thread)
- storeNeoForgeAttachments DB write: 0.58% (was on main thread)
- storeSophisticatedStorage: 0.69% (was on main thread)
- storeSophisticatedBackpacks: 0.59% (was on main thread)
Changes:
1. Curios snapshot: new snapshotCuriosData() reads entity state on
main thread (fast), returns serialized string. DB write in background.
2. ALL mod saves moved to background thread lambda:
- ModCompatSync.storeAll (Accessories, CosmeticArmor, Attachments)
- Sophisticated Backpacks/Storage/RS2
3. Auto-save interval doubled: 1200 -> 2400 ticks (1min -> 2min)
4. Main thread now only does: entity snapshot (~0.3ms) + curios snapshot
Expected: ~80% reduction in main thread usage (5.66% -> ~1%)
Vyrriox
ROOT CAUSE: Sophisticated Backpacks/Storage wrappers cache inventory
in memory. When store() reads from BackpackStorage/ItemContentsStorage,
the SavedData may not have the latest wrapper state (unflushed changes).
This returns empty/default NBT which overwrites the real data in our DB.
Going back to the original server showed data because that server's
local SavedData still had the correct data (never overwritten).
FIX: saveStorageContents() now checks if the NBT is empty/minimal
before writing. If the DB already has substantial data (>50 bytes)
and the new NBT is empty, the save is SKIPPED to preserve the real
data. This prevents the empty-overwrite scenario while still allowing
legitimate saves of actual content.
Vyrriox
CRITICAL-1: online=0 moved to finally block in logout handler.
If store() threw an exception, online=0 was never written and the
player was permanently locked out of all servers.
CRITICAL-2: Same fix for shutdown handler. Any save failure during
shutdown left the player permanently stuck as online=1.
IMPORTANT: Auto-save background DB write now acquires tryLock()
before writing. If logout already saved newer data and holds/held
the lock, the stale auto-save snapshot is skipped. Prevents
overwriting correct logout data with an older snapshot.
Vyrriox
Spark showed 5.66% server thread from auto-save DB writes blocking
the tick loop (~1-2ms per player per query, ~8 queries per save).
New approach:
- snapshotPlayerData() captures ALL entity data into an immutable
PlayerDataSnapshot record on the main thread (fast, no DB I/O)
- writeSnapshotToDB() writes the snapshot to DB on the background
thread via executorService (slow DB I/O off main thread)
- Mod data (Curios, Accessories, CosmeticArmor, NeoForge attachments)
still read entity state on main thread but their DB writes happen
inline (they manage their own connections)
- Sophisticated Backpacks/Storage/RS2 saves happen during snapshot
phase on main thread (they need entity access for inventory scan)
Expected: ~60-70% reduction in main thread blocking from auto-save.
Vyrriox
Spark showed PlayerSync consuming 10.16% of the server thread, almost
entirely from DriverManager.getConnection() (TCP handshake + MySQL auth
+ USE db) called for EVERY single query. With auto-save every 60s,
each player generated ~6 new connections per save cycle on main thread.
FIX: Simple connection pool (LinkedBlockingQueue, 5 connections).
- Connections are reused instead of opened/closed per query
- isValid(2) check before reuse to detect dead connections
- returnConnection() puts connections back in pool instead of closing
- QueryResult.close() also returns to pool
- autoReconnect=true in JDBC URL for resilience
- shutdownPool() for clean server stop
- Non-database connections (startup DDL) bypass the pool
Expected improvement: ~90% reduction in MySQL overhead on server thread.
Vyrriox
ROOT CAUSE from logs:
"Invalid UUID capacity: Invalid UUID string: capacity"
"Invalid UUID resources: Invalid UUID string: resources"
We saved the INNER storage data ({type, capacity, resources}) but the
map codec expects {uuid-string: {type, capacity, resources}}.
The codec tried to parse "capacity", "resources", "type" as UUIDs.
FIX: Wrap the stored NBT back in a UUID-keyed CompoundTag before
decoding: wrapped.put(uuid.toString(), storedNbt)
Also increased sync timeout from 15s to 60s - the server was 34s
behind (691 ticks) causing timeout errors for player sync.
Vyrriox
repo.set(uuid, storage) throws IllegalArgumentException if the UUID
already exists in the StorageRepository. This happens when a player
revisits a server where the disk was previously used.
Items appeared briefly (data was decoded correctly) but then the
exception prevented the set() and the storage fell back to empty.
Fix:
- Call repo.remove(uuid) before repo.set(uuid, storage)
- If set() still fails, inject directly into the entries map via
reflection + mark SavedData dirty
- setDirty() ensures the injected data persists to disk
Vyrriox
ROOT CAUSE: getMapCodec(Runnable) returns MapCodec (not Codec).
createCodec(Runnable) returns Codec<Map<UUID, SerializableStorage>>.
Reflection on getMapCodec silently failed because the returned
MapCodec.decode() has a different signature than Codec.decode().
Both save fallback and restore codec paths now use createCodec().
RS2 uses ErrorHandlingMapCodec with UUIDUtil.STRING_CODEC for keys,
so the encoded format IS a CompoundTag with UUID strings as keys.
Vyrriox
Save side:
- save() returns data in a NEW CompoundTag (fixed in previous commit)
- Now logs full NBT structure for debugging (describeNbtStructure)
- If UUID not found in save() NBT, falls back to reflection on
internal entries map + codec.encodeStart() to serialize directly
Restore side:
- Rewritten to use raw Codec types to avoid generic compilation issues
- Decodes stored NBT via the same map codec, then repo.set() to inject
Both sides now have comprehensive logging to diagnose any remaining
format issues in production.
Vyrriox