Crash Recovery

This page explains the crash recovery system for Live Trading mode, which ensures that active trading positions survive process crashes and restarts. When a live trading bot crashes, all in-memory state is lost. Without crash recovery, open positions would be abandoned with no monitoring of take profit (TP) or stop loss (SL) conditions, leading to uncontrolled risk exposure and potential financial losses.

The crash recovery system persists signal state to disk and restores it on restart, preserving critical timing information (pendingAt, scheduledAt) to maintain correct signal lifetime calculations. For information about the overall Live Trading execution flow, see Live Execution Flow. For details on how restored signals are monitored, see Real-time Monitoring.

Key Point: Crash recovery is only active in Live mode. Backtest mode operates entirely in-memory with no persistence, as documented in Backtest Execution Flow.


Without crash recovery, a process crash during live trading causes:

| Problem | Impact | Financial Risk |
| --- | --- | --- |
| Lost Active Signals | Open positions become unmonitored | No TP/SL enforcement, unlimited loss potential |
| Blocked Risk Limits | Risk counters never decremented | Cannot open new positions, strategy deadlock |
| Fee Accumulation | Entry fees paid, but no exit trade | Guaranteed loss from fees alone |
| Timing Reset | Signal lifetime recalculated from restart | Premature closure, wasted entry fees |

The crash recovery system prevents all these issues by:

  1. Persisting signal state atomically before acknowledging position opens
  2. Restoring signals on restart with original timing information intact
  3. Validating restored signals so that stale data from a different exchange or strategy is discarded
  4. Triggering lifecycle callbacks to notify monitoring systems of restoration

The crash recovery system uses two specialized persistence adapters:

| Event | Pending Signal Persistence | Scheduled Signal Persistence |
| --- | --- | --- |
| Signal Creation | No write (not yet opened) | Write immediately (setScheduledSignal) |
| Signal Open | Write immediately (setPendingSignal) | Delete schedule file, write pending file |
| Signal Active | No action (already persisted) | N/A |
| Signal Close | Delete file (setPendingSignal(null)) | N/A (already converted to pending) |
| Signal Cancel | N/A | Delete file (setScheduledSignal(null)) |
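The transitions in the table can be sketched with an in-memory stand-in for the two persisted files. This is only an illustration: the `SignalStore` class, the `SignalRecord` shape, and the `activate` helper are hypothetical, and only the method names `setScheduledSignal` and `setPendingSignal` come from the table.

```typescript
// Hypothetical in-memory stand-in for the two persisted files.
// Only the names setScheduledSignal / setPendingSignal come from the table above.
interface SignalRecord {
  id: string;
  scheduledAt: number;
  pendingAt: number;
}

class SignalStore {
  scheduled: SignalRecord | null = null; // stands in for the schedule file
  pending: SignalRecord | null = null; // stands in for the pending file

  // Signal Creation writes the record; Signal Cancel passes null to delete it.
  setScheduledSignal(signal: SignalRecord | null): void {
    this.scheduled = signal;
  }

  // Signal Open writes the record; Signal Close passes null to delete it.
  setPendingSignal(signal: SignalRecord | null): void {
    this.pending = signal;
  }

  // Signal Open for a scheduled signal: delete the schedule record,
  // then write the pending record (same order as the table row).
  activate(activationTime: number): void {
    if (this.scheduled === null) return;
    const activated: SignalRecord = { ...this.scheduled, pendingAt: activationTime };
    this.setScheduledSignal(null);
    this.setPendingSignal(activated);
  }
}

const store = new SignalStore();
store.setScheduledSignal({ id: "s1", scheduledAt: 1000, pendingAt: 1000 });
store.activate(5000); // scheduled -> pending
const afterOpen = { scheduled: store.scheduled, pending: store.pending };
store.setPendingSignal(null); // position closed, pending record deleted
```

After `activate`, only the pending record exists and it keeps the original `scheduledAt` alongside the new `pendingAt`, which is the state a restart would need to find on disk.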

When ClientStrategy initializes in Live mode, the waitForInit() function restores persisted signals before processing any ticks:

| Function | Location | Purpose |
| --- | --- | --- |
| WAIT_FOR_INIT_FN | src/client/ClientStrategy.ts:491-552 | Main restoration logic, checks backtest mode |
| PersistSignalAdapter.readSignalData | src/client/ClientStrategy.ts:497-509 | Reads pending signal from disk |
| PersistScheduleAdapter.readScheduleData | src/client/ClientStrategy.ts:525-537 | Reads scheduled signal from disk |
| callbacks.onActive | src/client/ClientStrategy.ts:512-522 | Notifies monitoring of restored active position |
| callbacks.onSchedule | src/client/ClientStrategy.ts:540-550 | Notifies monitoring of restored scheduled signal |
| strategy.waitForInit() | src/lib/services/connection/StrategyConnectionService.ts:154 | Called before first tick |

The restoration process includes critical validation to prevent stale signal data:

// Validation checks from ClientStrategy.ts:503-508
if (pendingSignal.exchangeName !== self.params.method.context.exchangeName) {
  return; // Discard signal - different exchange
}
if (pendingSignal.strategyName !== self.params.method.context.strategyName) {
  return; // Discard signal - different strategy
}

Why this matters: If you switch strategies or exchanges, old persisted signals must not be restored. The exchangeName and strategyName fields in persisted data act as ownership markers.
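As a self-contained illustration of the ownership check, the sketch below restores a signal only when both markers match the current run and then notifies a callback. The `PersistedSignal` and `RunContext` shapes and the `restoreIfOwned` helper are hypothetical; only the `exchangeName` and `strategyName` fields and the idea of an `onActive` callback come from this page.

```typescript
// Hypothetical shapes; only exchangeName and strategyName are field names from the text.
interface PersistedSignal {
  id: string;
  exchangeName: string;
  strategyName: string;
}

interface RunContext {
  exchangeName: string;
  strategyName: string;
}

// Returns the signal only when both ownership markers match the current run.
function restoreIfOwned(
  persisted: PersistedSignal | null,
  context: RunContext,
  onActive: (signal: PersistedSignal) => void,
): PersistedSignal | null {
  if (persisted === null) return null; // nothing was persisted
  if (persisted.exchangeName !== context.exchangeName) return null; // different exchange
  if (persisted.strategyName !== context.strategyName) return null; // different strategy
  onActive(persisted); // notify monitoring that an active position was restored
  return persisted;
}

const notified: string[] = [];
const onDisk: PersistedSignal = { id: "sig-1", exchangeName: "binance", strategyName: "trend-v1" };

// Same exchange and strategy: the signal is restored and the callback fires.
const kept = restoreIfOwned(
  onDisk,
  { exchangeName: "binance", strategyName: "trend-v1" },
  (s) => notified.push(s.id),
);

// Strategy was switched: the stale signal is discarded and no callback fires.
const dropped = restoreIfOwned(
  onDisk,
  { exchangeName: "binance", strategyName: "trend-v2" },
  (s) => notified.push(s.id),
);
```

Discarding silently mirrors the early `return` in the excerpt above; a real deployment might also want to log the discarded signal so an operator can close the orphaned position manually.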


The most important aspect of crash recovery is preserving timing information. Every ISignalRow has two timestamps:

| Timestamp | When Set | Used For |
| --- | --- | --- |
| scheduledAt | Signal first created | Tracking total signal lifecycle, scheduled timeout |
| pendingAt | Position activated at priceOpen | Signal lifetime calculation (minuteEstimatedTime) |

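Without crash recovery, a restart would lose the original pendingAt, so the signal lifetime would be counted again from the restart time instead of from the moment the position opened. The following code paths keep pendingAt intact: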

During Signal Activation (src/client/ClientStrategy.ts:681-774):

// Line 734: Update pendingAt when scheduled signal activates
const activatedSignal: ISignalRow = {
  ...scheduled,
  pendingAt: activationTime, // NEW timestamp when position opens
  _isScheduled: false,
};

await self.setPendingSignal(activatedSignal); // PERSIST with correct pendingAt

During Time Expiration Check (src/client/ClientStrategy.ts:901-920):

// Line 907: CRITICAL - uses pendingAt, NOT scheduledAt
const signalTime = signal.pendingAt; // Start counting from activation time
const maxTimeToWait = signal.minuteEstimatedTime * 60 * 1000;
const elapsedTime = currentTime - signalTime;

if (elapsedTime >= maxTimeToWait) {
  // Close signal by time_expired
}

During Restoration (src/client/ClientStrategy.ts:497-509):

// Line 498: Read persisted signal with ORIGINAL pendingAt intact
const pendingSignal = await PersistSignalAdapter.readSignalData(
  self.params.execution.context.symbol,
  self.params.strategyName,
);
// pendingSignal.pendingAt is PRESERVED from before crash

Test Validation (test/e2e/timing.test.mjs:416-505):

// Line 424: Create signal that was activated 12 hours ago
const twelveHoursAgo = now - 12 * 60 * 60 * 1000;

// Line 442: Persist with pendingAt = twelveHoursAgo
pendingAt: twelveHoursAgo, // Activated 12 hours ago

// Line 478: After restoration, verify remaining time is correct
const remainingTime = expectedTime - elapsedTime;
// Should be ~12 hours remaining (not restarting from 24h)
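The arithmetic behind this test can be worked through in isolation. The sketch below is a minimal, assumed helper (`remainingLifetimeMs` is not part of the library) that applies the same formula as the expiration check and compares a preserved pendingAt against one that was reset at restart.

```typescript
const MS_PER_MINUTE = 60 * 1000;
const MS_PER_HOUR = 60 * MS_PER_MINUTE;

// Remaining lifetime in milliseconds, counted from pendingAt (never from the restart time).
function remainingLifetimeMs(
  pendingAt: number,
  minuteEstimatedTime: number,
  currentTime: number,
): number {
  const maxTimeToWait = minuteEstimatedTime * MS_PER_MINUTE;
  const elapsedTime = currentTime - pendingAt;
  return Math.max(0, maxTimeToWait - elapsedTime);
}

const restartTime = Date.UTC(2024, 0, 2, 12, 0, 0); // arbitrary restart instant
const twelveHoursBefore = restartTime - 12 * MS_PER_HOUR;

// pendingAt survived the crash: 24h lifetime minus 12h already elapsed.
const restoredHours = remainingLifetimeMs(twelveHoursBefore, 1440, restartTime) / MS_PER_HOUR;

// pendingAt reset to the restart time: the full 24h would start over.
const resetHours = remainingLifetimeMs(restartTime, 1440, restartTime) / MS_PER_HOUR;

console.log(restoredHours, resetHours); // 12 24
```

With minuteEstimatedTime = 1440, the maximum wait is 86,400,000 ms; 43,200,000 ms have elapsed, leaving 43,200,000 ms, or 12 hours.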

Crash recovery is disabled in Backtest mode to maximize performance:

| Code Path | Location | Behavior |
| --- | --- | --- |
| waitForInit() backtest check | src/client/ClientStrategy.ts:493-495 | if (backtest) return; |
| setPendingSignal() no-op | Referenced in ClientStrategy | Skips file write when backtest=true |
| Backtest signal handling | Backtest Execution Flow | All signals processed in-memory |

Why backtest skips persistence:

  • Backtests iterate through thousands of candles rapidly
  • A backtest replays deterministic historical data, so an interrupted run can simply be re-run
  • Writing a file on every signal change would slow backtests dramatically (by 100x or more)
  • Results are reproducible without state persistence
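The mode check can be sketched as a setter that always updates in-memory state but calls the persistence function only in live mode. The `createPendingSetter` factory and its `persist` parameter are hypothetical names used for illustration, not the library's API.

```typescript
type Persist<T> = (value: T | null) => void;

// Hypothetical factory: the returned setter skips persistence when backtest is true.
function createPendingSetter<T>(backtest: boolean, persist: Persist<T>) {
  let current: T | null = null;
  return {
    setPendingSignal(value: T | null): void {
      current = value; // in-memory state is updated in both modes
      if (backtest) return; // backtest: no file I/O at all
      persist(value); // live: hand the value to the persistence layer
    },
    getPendingSignal(): T | null {
      return current;
    },
  };
}

// Record "writes" in arrays instead of touching the disk.
const liveWrites: Array<string | null> = [];
const backtestWrites: Array<string | null> = [];

const liveState = createPendingSetter<string>(false, (v) => liveWrites.push(v));
const backtestState = createPendingSetter<string>(true, (v) => backtestWrites.push(v));

liveState.setPendingSignal("sig-1");
backtestState.setPendingSignal("sig-1");
```

Both setters end up holding `"sig-1"` in memory, but only the live one produced a write, so strategy logic behaves the same in both modes while the backtest pays no I/O cost.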

The persistence layer uses atomic file writes to prevent partial/corrupted data:

The atomic write pattern ensures:

  • No partial writes: The file holds either the complete new data or the complete old data, never a mix
  • No race conditions: A file rename within the same filesystem is atomic at the OS level
  • Safe concurrent reads: Readers always see complete JSON
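A common way to get these guarantees is to write to a temporary file in the same directory and then rename it over the target. The sketch below shows that pattern with Node's synchronous `fs` API; it is an assumed illustration of the technique, not the actual PersistBase code, and `writeJsonAtomic` and `readJson` are hypothetical helper names.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Write JSON to a temporary sibling file, then rename it over the target.
// The rename swaps the target in one step, so a reader sees the old or the new file.
function writeJsonAtomic(filePath: string, data: unknown): void {
  const tmpPath = `${filePath}.${process.pid}.tmp`; // same directory, so same filesystem
  fs.writeFileSync(tmpPath, JSON.stringify(data), "utf8");
  fs.renameSync(tmpPath, filePath);
}

// Read the persisted value, or null when nothing has been written yet.
function readJson<T>(filePath: string): T | null {
  if (!fs.existsSync(filePath)) return null;
  return JSON.parse(fs.readFileSync(filePath, "utf8")) as T;
}

// Usage: persist a signal twice into a throwaway directory and read it back.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "signal-"));
const file = path.join(dir, "BTCUSDT.json");

writeJsonAtomic(file, { id: "sig-1", pendingAt: 1700000000000 });
writeJsonAtomic(file, { id: "sig-1", pendingAt: 1700000060000 });

const restored = readJson<{ id: string; pendingAt: number }>(file);
const leftovers = fs.readdirSync(dir); // no .tmp file remains after the rename
```

If durability across power loss matters, the temporary file can additionally be flushed with `fs.fsyncSync` before the rename; the sketch omits that step for brevity.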

Persistence Base Class: PersistBase provides the atomic write implementation (referenced in Persistence Layer).


The test suite validates crash recovery behavior using custom persistence adapters:

// test/e2e/timing.test.mjs:416-505
test("Restored pending signal preserves 24h timing from pendingAt", async ({ pass, fail }) => {
  const now = Date.now();
  const twelveHoursAgo = now - 12 * 60 * 60 * 1000;

  // Inject custom adapter that returns a pre-existing signal
  PersistSignalAdapter.usePersistSignalAdapter(class {
    async readValue() {
      return {
        id: "restored-signal",
        position: "long",
        minuteEstimatedTime: 1440, // 24 hours total
        pendingAt: twelveHoursAgo, // Activated 12 hours ago
        // ... other fields
      };
    }
  });

  // Start live trading - signal should restore with 12h remaining
  await Live.background("BTCUSDT", {
    strategyName: "test-strategy",
    exchangeName: "test-exchange",
  });

  // Verify: remaining time should be ~12h, not reset to 24h
});

| Test Case | File | Line Range | Validates |
| --- | --- | --- | --- |
| Pending signal restoration | test/e2e/timing.test.mjs | 416-505 | pendingAt preservation, 12h remaining |
| Scheduled signal restoration | test/spec/scheduled.test.mjs | 211-362 | Scheduled signal lifecycle tracking |
| Timing calculations | test/e2e/timing.test.mjs | 34-201 | minuteEstimatedTime from pendingAt |

Crash recovery behavior is controlled by global configuration:

| Parameter | Default | Purpose |
| --- | --- | --- |
| CC_SCHEDULE_AWAIT_MINUTES | 120 | Max time scheduled signal waits for activation before cancellation |
| CC_MAX_SIGNAL_LIFETIME_MINUTES | 10080 (7 days) | Max minuteEstimatedTime to prevent eternal signals |
| TICK_TTL | 60000ms | Interval between tick processing (affects restore frequency) |
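To make the first two limits concrete, the sketch below applies them to a signal restored after a restart. The default values are copied from the table, but the helper functions and the exact comparison operators are assumptions made for illustration; the library's own checks may differ.

```typescript
// Defaults copied from the table above; the helper functions are hypothetical.
const CC_SCHEDULE_AWAIT_MINUTES = 120;
const CC_MAX_SIGNAL_LIFETIME_MINUTES = 10080;
const MINUTE_MS = 60 * 1000;

// A scheduled signal that has waited at least the await window is due for cancellation.
// Assumption: the window is measured from scheduledAt and the boundary is inclusive.
function isScheduleExpired(scheduledAt: number, currentTime: number): boolean {
  return currentTime - scheduledAt >= CC_SCHEDULE_AWAIT_MINUTES * MINUTE_MS;
}

// A signal whose estimated lifetime exceeds the cap is not allowed.
function isLifetimeAllowed(minuteEstimatedTime: number): boolean {
  return minuteEstimatedTime > 0 && minuteEstimatedTime <= CC_MAX_SIGNAL_LIFETIME_MINUTES;
}

const restartAt = Date.UTC(2024, 0, 1, 12, 0, 0); // arbitrary restart instant

// Scheduled 3 hours before the restart: past the 120-minute window.
const staleSchedule = isScheduleExpired(restartAt - 180 * MINUTE_MS, restartAt);

// Scheduled 30 minutes before the restart: still inside the window.
const freshSchedule = isScheduleExpired(restartAt - 30 * MINUTE_MS, restartAt);

const weekLong = isLifetimeAllowed(10080); // exactly 7 days: allowed
const tooLong = isLifetimeAllowed(10081); // one minute over the cap: rejected
```

Because scheduledAt is persisted along with the signal, the await window keeps counting across a crash rather than restarting from zero.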

Related: See Timing Parameters for detailed configuration reference.


The crash recovery system is a critical safety mechanism for live trading:

  1. Persists signals atomically when positions open (src/client/ClientStrategy.ts:740)
  2. Restores signals on startup with validation (src/client/ClientStrategy.ts:497-537)
  3. Preserves timing information to prevent premature closure (src/client/ClientStrategy.ts:734-738)
  4. Triggers lifecycle callbacks for monitoring integration (src/client/ClientStrategy.ts:512-522)
  5. Disabled in backtest mode for performance (src/client/ClientStrategy.ts:493-495)

Without crash recovery, process crashes would abandon open positions, leading to uncontrolled risk exposure and guaranteed losses from entry fees. The system ensures that live trading operations are resilient to infrastructure failures while maintaining correct position lifetime calculations.