This page explains the crash recovery system for Live Trading mode, which ensures that active trading positions survive process crashes and restarts. When a live trading bot crashes, all in-memory state is lost. Without crash recovery, open positions would be abandoned with no monitoring of take profit (TP) or stop loss (SL) conditions, leading to uncontrolled risk exposure and potential financial losses.
The crash recovery system persists signal state to disk and restores it on restart, preserving critical timing information (pendingAt, scheduledAt) to maintain correct signal lifetime calculations. For information about the overall Live Trading execution flow, see Live Execution Flow. For details on how restored signals are monitored, see Real-time Monitoring.
Key Point: Crash recovery is only active in Live mode. Backtest mode operates entirely in-memory with no persistence, as documented in Backtest Execution Flow.
Without crash recovery, a process crash during live trading causes:
| Problem | Impact | Financial Risk |
|---|---|---|
| Lost Active Signals | Open positions become unmonitored | No TP/SL enforcement, unlimited loss potential |
| Blocked Risk Limits | Risk counters never decremented | Cannot open new positions, strategy deadlock |
| Fee Accumulation | Entry fees paid, but no exit trade | Guaranteed loss from fees alone |
| Timing Reset | Signal lifetime recalculated from restart | Premature closure, wasted entry fees |
The crash recovery system prevents all these issues by:
The crash recovery system uses two specialized persistence adapters:
| Event | Pending Signal Persistence | Scheduled Signal Persistence |
|---|---|---|
| Signal Creation | No write (not yet opened) | Write immediately (setScheduledSignal) |
| Signal Open | Write immediately (setPendingSignal) |
Delete schedule file, write pending file |
| Signal Active | No action (already persisted) | N/A |
| Signal Close | Delete file (setPendingSignal(null)) |
N/A (already converted to pending) |
| Signal Cancel | N/A | Delete file (setScheduledSignal(null)) |
When ClientStrategy initializes in Live mode, the waitForInit() function restores persisted signals before processing any ticks:
| Function | Location | Purpose |
|---|---|---|
WAIT_FOR_INIT_FN |
src/client/ClientStrategy.ts:491-552 | Main restoration logic, checks backtest mode |
PersistSignalAdapter.readSignalData |
src/client/ClientStrategy.ts:497-509 | Reads pending signal from disk |
PersistScheduleAdapter.readScheduleData |
src/client/ClientStrategy.ts:525-537 | Reads scheduled signal from disk |
callbacks.onActive |
src/client/ClientStrategy.ts:512-522 | Notifies monitoring of restored active position |
callbacks.onSchedule |
src/client/ClientStrategy.ts:540-550 | Notifies monitoring of restored scheduled signal |
strategy.waitForInit() |
src/lib/services/connection/StrategyConnectionService.ts:154 | Called before first tick |
The restoration process includes critical validation to prevent stale signal data:
// Validation checks from ClientStrategy.ts:503-508
if (pendingSignal.exchangeName !== self.params.method.context.exchangeName) {
return; // Discard signal - different exchange
}
if (pendingSignal.strategyName !== self.params.method.context.strategyName) {
return; // Discard signal - different strategy
}
Why this matters: If you switch strategies or exchanges, old persisted signals must not be restored. The exchangeName and strategyName fields in persisted data act as ownership markers.
pendingAt FieldThe most important aspect of crash recovery is preserving timing information. Every ISignalRow has two timestamps:
| Timestamp | When Set | Used For |
|---|---|---|
scheduledAt |
Signal first created | Tracking total signal lifecycle, scheduled timeout |
pendingAt |
Position activated at priceOpen |
Signal lifetime calculation (minuteEstimatedTime) |
Without crash recovery, this timing bug occurs:
During Signal Activation (src/client/ClientStrategy.ts:681-774):
// Line 734: Update pendingAt when scheduled signal activates
const activatedSignal: ISignalRow = {
...scheduled,
pendingAt: activationTime, // NEW timestamp when position opens
_isScheduled: false,
};
await self.setPendingSignal(activatedSignal); // PERSIST with correct pendingAt
During Time Expiration Check (src/client/ClientStrategy.ts:901-920):
// Line 907: CRITICAL - uses pendingAt, NOT scheduledAt
const signalTime = signal.pendingAt; // Start counting from activation time
const maxTimeToWait = signal.minuteEstimatedTime * 60 * 1000;
const elapsedTime = currentTime - signalTime;
if (elapsedTime >= maxTimeToWait) {
// Close signal by time_expired
}
During Restoration (src/client/ClientStrategy.ts:497-509):
// Line 498: Read persisted signal with ORIGINAL pendingAt intact
const pendingSignal = await PersistSignalAdapter.readSignalData(
self.params.execution.context.symbol,
self.params.strategyName,
);
// pendingSignal.pendingAt is PRESERVED from before crash
Test Validation (test/e2e/timing.test.mjs:416-505):
// Line 424: Create signal that was activated 12 hours ago
const twelveHoursAgo = now - 12 * 60 * 60 * 1000;
// Line 442: Persist with pendingAt = twelveHoursAgo
pendingAt: twelveHoursAgo, // Activated 12 hours ago
// Line 478: After restoration, verify remaining time is correct
const remainingTime = expectedTime - elapsedTime;
// Should be ~12 hours remaining (not restarting from 24h)
Crash recovery is disabled in Backtest mode to maximize performance:
| Code Path | Location | Behavior |
|---|---|---|
waitForInit() backtest check |
src/client/ClientStrategy.ts:493-495 | if (backtest) return; |
setPendingSignal() no-op |
Referenced in ClientStrategy |
Skips file write when backtest=true |
| Backtest signal handling | Backtest Execution Flow | All signals processed in-memory |
Why backtest skips persistence:
The persistence layer uses atomic file writes to prevent partial/corrupted data:
The atomic write pattern ensures:
Persistence Base Class: PersistBase provides the atomic write implementation (referenced in Persistence Layer).
The test suite validates crash recovery behavior using custom persistence adapters:
// test/e2e/timing.test.mjs:416-505
test("Restored pending signal preserves 24h timing from pendingAt", async ({ pass, fail }) => {
const now = Date.now();
const twelveHoursAgo = now - 12 * 60 * 60 * 1000;
// Inject custom adapter that returns a pre-existing signal
PersistSignalAdapter.usePersistSignalAdapter(class {
async readValue() {
return {
id: "restored-signal",
position: "long",
minuteEstimatedTime: 1440, // 24 hours total
pendingAt: twelveHoursAgo, // Activated 12 hours ago
// ... other fields
};
}
});
// Start live trading - signal should restore with 12h remaining
await Live.background("BTCUSDT", {
strategyName: "test-strategy",
exchangeName: "test-exchange",
});
// Verify: remaining time should be ~12h, not reset to 24h
});
| Test Case | File | Line Range | Validates |
|---|---|---|---|
| Pending signal restoration | test/e2e/timing.test.mjs | 416-505 | pendingAt preservation, 12h remaining |
| Scheduled signal restoration | test/spec/scheduled.test.mjs | 211-362 | Scheduled signal lifecycle tracking |
| Timing calculations | test/e2e/timing.test.mjs | 34-201 | minuteEstimatedTime from pendingAt |
Crash recovery behavior is controlled by global configuration:
| Parameter | Default | Purpose |
|---|---|---|
CC_SCHEDULE_AWAIT_MINUTES |
120 | Max time scheduled signal waits for activation before cancellation |
CC_MAX_SIGNAL_LIFETIME_MINUTES |
10080 (7 days) | Max minuteEstimatedTime to prevent eternal signals |
TICK_TTL |
60000ms | Interval between tick processing (affects restore frequency) |
Related: See Timing Parameters for detailed configuration reference.
The crash recovery system is a critical safety mechanism for live trading:
Without crash recovery, process crashes would abandon open positions, leading to uncontrolled risk exposure and guaranteed losses from entry fees. The system ensures that live trading operations are resilient to infrastructure failures while maintaining correct position lifetime calculations.