Version: Next

14. Job Migration Guide

14.1 The runtime chain every migration follows

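Every migrated job follows the same chain: Hangfire recurring job → bridge (e.g. CampaignCleanupPipelineJob) → pipeline → job adapter (e.g. CampaignCleanupJobAdapter) → unchanged legacy worker → unchanged domain service. In all six completed migrations, the legacy worker class and its interface were never deleted, so rollback always means restoring the old RecurringJob.AddOrUpdate line, never recreating deleted code.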
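The chain can be modeled as a stack of thin async calls. The sketch below is a minimal, hypothetical model written as a C# top-level program. The local-function names, the trace list, and the delegate shapes are illustrative assumptions and are not the framework's API. It also shows why rollback is cheap: the worker is the same code whether it is reached through the chain or called directly by a restored legacy registration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical model of the runtime chain; names are illustrative, not the framework's API.
var trace = new List<string>();

// Unchanged legacy worker: the same method a legacy RecurringJob.AddOrUpdate line would target.
Task WorkerAsync(CancellationToken ct)
{
    trace.Add("worker");
    return Task.CompletedTask;
}

// Adapter: maps the pipeline's generic call onto one specific worker method.
async Task AdapterAsync(CancellationToken ct)
{
    trace.Add("adapter");
    await WorkerAsync(ct);
}

// Pipeline: assumed to be the generic execution layer shared by every job.
async Task PipelineAsync(Func<CancellationToken, Task> adapter, CancellationToken ct)
{
    trace.Add("pipeline");
    await adapter(ct);
}

// Bridge: the method the new "-pipeline" recurring job registration invokes.
Task BridgeAsync(CancellationToken ct)
{
    trace.Add("bridge");
    return PipelineAsync(AdapterAsync, ct);
}

await BridgeAsync(CancellationToken.None);
var chained = string.Join(" -> ", trace);

// Rollback: a restored legacy registration calls the worker directly, bypassing the chain.
trace.Clear();
await WorkerAsync(CancellationToken.None);
var direct = string.Join(" -> ", trace);

Console.WriteLine(chained); // bridge -> pipeline -> adapter -> worker
Console.WriteLine(direct);  // worker
```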

14.2 The six completed migrations

1. Notification Dead Letter (cleanup) — Phase 3.2 (pilot) / 3.3 (cutover)

Before: two separate legacy Hangfire registrations called NotificationDeadLetterCleanupWorker directly: notification-deadletter-cleanup → CleanupDeadLettersAsync() and notification-retry-cleanup → CleanupStaleRetriesAsync(), both daily at 02:00 UTC.

What changed: this was the framework's pilot migration. No "ideal" pilot candidate existed, so this job was chosen as a user-approved exception. New chain: NotificationDeadLetterCleanupPipelineJob (bridge) → pipeline → NotificationDeadLetterCleanupJobAdapter (calls CleanupDeadLettersAsync, then CleanupStaleRetriesAsync, in sequence) → unchanged worker → unchanged INotificationCleanupService.
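Here is a minimal sketch of how the pilot adapter behaves. It assumes both worker methods are async and accept a CancellationToken, because their real parameters are not shown in this chapter. The recording fake and the ExecuteAdapterAsync name are hypothetical. The two calls are awaited one after the other. As a result, in this sketch a failure in CleanupDeadLettersAsync means CleanupStaleRetriesAsync does not run in that execution, and the exception reaches the caller. Check the real adapter before relying on that ordering guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the two worker methods; signatures are assumed.
var calls = new List<string>();
var failFirst = false;

Task CleanupDeadLettersAsync(CancellationToken ct)
{
    calls.Add(nameof(CleanupDeadLettersAsync));
    if (failFirst) throw new InvalidOperationException("simulated dead-letter cleanup failure");
    return Task.CompletedTask;
}

Task CleanupStaleRetriesAsync(CancellationToken ct)
{
    calls.Add(nameof(CleanupStaleRetriesAsync));
    return Task.CompletedTask;
}

// Adapter: one pipeline job covering both former legacy job IDs, strictly in order.
async Task ExecuteAdapterAsync(CancellationToken ct)
{
    await CleanupDeadLettersAsync(ct);
    await CleanupStaleRetriesAsync(ct);
}

await ExecuteAdapterAsync(CancellationToken.None);
var happyPath = string.Join(",", calls);

// Failure path: the first call throws, so the second is never reached.
calls.Clear();
failFirst = true;
Exception? failure = null;
try { await ExecuteAdapterAsync(CancellationToken.None); }
catch (InvalidOperationException ex) { failure = ex; }
var failurePath = string.Join(",", calls);

Console.WriteLine(happyPath);   // CleanupDeadLettersAsync,CleanupStaleRetriesAsync
Console.WriteLine(failurePath); // CleanupDeadLettersAsync
```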

Current Hangfire job ID: notification-deadletter-cleanup-pipeline, cron 0 2 * * *. A single job now covers both former legacy IDs. It was introduced additively in Phase 3.2. The legacy jobs kept running in parallel, and because the cleanup is idempotent, double execution was harmless. Both legacy IDs were formally removed only in Phase 3.3, which established the additive → exclusive cutover pattern that every later migration reused, with the later migrations folding both steps into a single phase.

Concurrency model: no distributed lock, because the cleanup is idempotent. If two runs overlap, any row already cleaned up by one run is simply absent for the other, so the second pass finds nothing left to do. Runs at the host level (tenantId: null).
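An in-memory model shows why idempotency removes the need for a lock. In this sketch, a ConcurrentDictionary stands in for the dead-letter table and TryRemove stands in for a conditional delete. That is an assumption for illustration, not the real data access. Two overlapping runs without any lock still remove each expired row exactly once, and a later run finds nothing.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical dead-letter table: id -> expired flag (even ids are expired).
var store = new ConcurrentDictionary<int, bool>();
for (var i = 1; i <= 1000; i++) store[i] = i % 2 == 0;

// One cleanup run: snapshot the rows it sees as expired, then delete each conditionally.
int RunCleanup()
{
    var removed = 0;
    foreach (var id in store.Where(kv => kv.Value).Select(kv => kv.Key).ToList())
    {
        if (store.TryRemove(id, out _)) removed++; // only one run can remove a given row
    }
    return removed;
}

// Two overlapping runs with no lock, followed by a later run.
var results = await Task.WhenAll(Task.Run(() => RunCleanup()), Task.Run(() => RunCleanup()));
var third = RunCleanup();

Console.WriteLine($"{results[0]} + {results[1]} = {results.Sum()}, later run: {third}");
```

However the two runs interleave, their counts always sum to the number of expired rows. Which run wins a given row is irrelevant.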

2. Notification Retry Expire — Phase 3.4

Before: RecurringJob.AddOrUpdate<INotificationRetryWorker>("notification-retry-expire", w => w.ExpireStaleRetriesAsync(...), "0 * * * *") — direct, hourly.

What changed: NotificationRetryExpirePipelineJob (bridge) + NotificationRetryExpireJobAdapter, calling only ExpireStaleRetriesAsync. Legacy job removed in the same phase (pilot and cutover folded into one, unlike the dead-letter job).

Current Hangfire job ID: notification-retry-expire-pipeline, cron 0 * * * *.

Noteworthy: INotificationRetryWorker exposes two methods (ExpireStaleRetriesAsync and ProcessDueRetriesAsync). This migration established the isolation-testing discipline reused by every later job that shares a worker interface with a sibling job: an explicit test proves the adapter calls only its one assigned method.
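The isolation discipline comes down to a recording fake plus an assertion on every call it saw. In this sketch, local functions stand in for the two INotificationRetryWorker methods and for the two sibling adapters: retry-expire here and the retry processor of #3. Their signatures are assumed, because the chapter elides them.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Recording fake for the two INotificationRetryWorker methods (signatures assumed).
var invoked = new List<string>();
Task ExpireStaleRetriesAsync(CancellationToken ct) { invoked.Add("Expire"); return Task.CompletedTask; }
Task ProcessDueRetriesAsync(CancellationToken ct) { invoked.Add("Process"); return Task.CompletedTask; }

// Hypothetical adapter bodies: each sibling job maps to exactly one worker method.
Task ExpireAdapterAsync(CancellationToken ct) => ExpireStaleRetriesAsync(ct);
Task ProcessorAdapterAsync(CancellationToken ct) => ProcessDueRetriesAsync(ct);

// Isolation check: run one adapter, then inspect every call the fake recorded.
await ExpireAdapterAsync(CancellationToken.None);
var expireOnly = invoked.Count == 1 && invoked[0] == "Expire";

invoked.Clear();
await ProcessorAdapterAsync(CancellationToken.None);
var processOnly = invoked.Count == 1 && invoked[0] == "Process";

Console.WriteLine($"expire adapter isolated: {expireOnly}, processor adapter isolated: {processOnly}");
```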

3. Notification Retry Processor — Phase 3.5 (assessment) / 3.6 (migration)

Before: RecurringJob.AddOrUpdate<INotificationRetryWorker>("notification-retry-processor", w => w.ProcessDueRetriesAsync(...), "*/2 * * * *") — every 2 minutes, performs live external dispatch (Email/SMS/WhatsApp).

Assessment first (documentation-only, no code changed): the first job where failures are user-visible (missed notifications) and duplication is user-visible (duplicate sends). Verdict: "Ready With Conditions" — 5 conditions, most importantly that exception propagation to Hangfire must be explicitly tested, and that no job-level distributed lock is needed because concurrency is already handled by per-entry optimistic claiming (IRetryRepository.ClaimAsync(entryId, workerToken)).
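A chain that never catches illustrates the exception-propagation condition. This is a hypothetical model: the failing ProcessDueRetriesAsync stub, the local-function layers, and the catch block standing in for the scheduler are all assumptions. The check is that the worker's original exception type reaches the bridge unchanged. That is what lets Hangfire record the run as a failed attempt instead of a silent success.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical failing worker call, e.g. a delivery provider outage.
Task ProcessDueRetriesAsync(CancellationToken ct) =>
    Task.FromException(new TimeoutException("simulated SMS provider timeout"));

// Adapter and pipeline await without catching, so the fault travels upward unchanged.
async Task AdapterAsync(CancellationToken ct) => await ProcessDueRetriesAsync(ct);
async Task PipelineAsync(CancellationToken ct) => await AdapterAsync(ct);

// Bridge: the method Hangfire invokes; an exception escaping it marks the attempt as failed.
async Task BridgeAsync(CancellationToken ct) => await PipelineAsync(ct);

Exception? seenByScheduler = null;
try
{
    await BridgeAsync(CancellationToken.None);
}
catch (Exception ex)
{
    seenByScheduler = ex; // stands in for Hangfire's failure handling
}

Console.WriteLine(seenByScheduler?.GetType().Name); // TimeoutException
```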

Migration: NotificationRetryProcessorPipelineJob (bridge) + NotificationRetryProcessorJobAdapter (calls only ProcessDueRetriesAsync, never ExpireStaleRetriesAsync). Legacy job removed same phase.

Current Hangfire job ID: notification-retry-processor-pipeline, cron */2 * * * *.

This migration was called the "highest-risk" one in the whole sequence. Concurrency is handled by entry-level optimistic claiming, not by a job-level lock. A distributed lock was explicitly evaluated and rejected as redundant with, and potentially interfering with, the existing per-entry claim model. A secondary idempotency guard exists in IDeliveryEngineService, keyed on (correlationId, attemptNumber, channel).
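A guard keyed on (correlationId, attemptNumber, channel) can be sketched as an atomic add-if-absent. This in-memory version is an assumption for illustration only. The chapter does not say how IDeliveryEngineService stores its keys, and an in-memory set would not survive restarts or span hosts, so treat this purely as a model of the keying rule.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical in-memory guard; only the key shape comes from the chapter.
var sent = new ConcurrentDictionary<(string CorrelationId, int AttemptNumber, string Channel), DateTime>();
var dispatchCount = 0;

bool TryDispatch(string correlationId, int attemptNumber, string channel)
{
    // TryAdd is atomic: only the first caller with a given key proceeds to the provider.
    if (!sent.TryAdd((correlationId, attemptNumber, channel), DateTime.UtcNow))
        return false; // duplicate of an attempt already dispatched
    dispatchCount++;  // stands in for the external Email/SMS/WhatsApp call
    return true;
}

var first = TryDispatch("corr-42", 1, "Sms");
var duplicate = TryDispatch("corr-42", 1, "Sms");      // same key: suppressed
var otherChannel = TryDispatch("corr-42", 1, "Email"); // different channel: allowed
var nextAttempt = TryDispatch("corr-42", 2, "Sms");    // a genuine retry: allowed

Console.WriteLine($"{first} {duplicate} {otherChannel} {nextAttempt}, dispatched {dispatchCount}");
```

Including attemptNumber in the key lets a legitimate retry through while still blocking a second send of the same attempt.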

4. Campaign Cleanup — Phase 3.7

Before: RecurringJob.AddOrUpdate<INotificationCampaignCleanupWorker>("campaign-cleanup", w => w.CleanupCompletedCampaignsAsync(...), "0 3 * * *") — daily 03:00 UTC, soft-deletes completed campaigns.

What changed: this was the first Campaign-domain job to be migrated. CampaignCleanupPipelineJob (bridge) + CampaignCleanupJobAdapter. The worker interface exposes a single method, so there is no isolation concern. Legacy removed same phase.

Current Hangfire job ID: campaign-cleanup-pipeline, cron 0 3 * * *.

Concurrency model: no distributed lock. The job is a set-based, idempotent soft-delete with no external delivery side effect at all. The assessment explicitly contrasts this with the retry processor, whose external sends require per-entry claiming.
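Set-based idempotency means the delete condition excludes rows that are already soft-deleted, so a rerun matches nothing. The sketch models that with an array of tuples. The Status and DeletedAt column names and the SQL in the comment are hypothetical, because the chapter does not show the campaign schema.

```csharp
using System;

// Hypothetical campaign rows: (Id, Status, DeletedAt).
var rows = new (int Id, string Status, DateTime? DeletedAt)[]
{
    (1, "Completed", null),
    (2, "Running", null),
    (3, "Completed", new DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc)), // already soft-deleted
    (4, "Completed", null),
};

// Models a set-based statement such as (hypothetical schema):
//   UPDATE Campaigns SET DeletedAt = @now WHERE Status = 'Completed' AND DeletedAt IS NULL
int SoftDeleteCompleted(DateTime now)
{
    var affected = 0;
    for (var i = 0; i < rows.Length; i++)
    {
        if (rows[i].Status == "Completed" && rows[i].DeletedAt is null)
        {
            rows[i].DeletedAt = now;
            affected++;
        }
    }
    return affected;
}

var runTime = new DateTime(2025, 6, 1, 3, 0, 0, DateTimeKind.Utc);
var firstRun = SoftDeleteCompleted(runTime);
var rerun = SoftDeleteCompleted(runTime);
Console.WriteLine($"first run: {firstRun}, rerun: {rerun}"); // first run: 2, rerun: 0
```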

5. Campaign Scheduler — Phase 3.8 (assessment, covering both remaining campaign jobs) / 3.9 (migration)

Before: RecurringJob.AddOrUpdate<INotificationCampaignSchedulerWorker>("campaign-scheduler", w => w.ProcessDueCampaignsAsync(...), "* * * * *") — every minute; creates NotificationCampaignExecution (Pending) records and advances NextScheduledAt per due campaign.

Assessment: rated the scheduler's concurrency risk Low-Medium, against the executor's HIGH (see #6). Duplicate Pending execution rows are possible but self-correcting, because the already-migrated campaign-cleanup job cleans them up. The assessment recommended migrating the scheduler first.
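The Low-Medium rating rests on a specific interleaving: two overlapping scheduler runs both read a campaign as due before either one advances NextScheduledAt. The sketch reproduces that interleaving deterministically with a single hypothetical campaign "C1". It assumes the next occurrence is computed from the schedule, so that both runs write the same NextScheduledAt. The chapter does not state this.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical single campaign "C1" that is due now.
var now = new DateTime(2025, 1, 1, 9, 0, 0, DateTimeKind.Utc);
var nextScheduledAt = now;
var executions = new List<(string CampaignId, string Status)>();

// Read phase of a scheduler run: is the campaign due? (no claim, no lock)
bool IsDue() => nextScheduledAt <= now;

// Write phase: create a Pending execution and advance NextScheduledAt.
void Schedule()
{
    executions.Add(("C1", "Pending"));
    nextScheduledAt = now.AddDays(1); // assumed: next occurrence derived from the schedule
}

// Overlapping runs: both read before either writes.
var runADue = IsDue();
var runBDue = IsDue();
if (runADue) Schedule();
if (runBDue) Schedule();
var overlappingCount = executions.Count;

// Sequential runs: the second sees the advanced NextScheduledAt and does nothing.
executions.Clear();
nextScheduledAt = now;
if (IsDue()) Schedule();
if (IsDue()) Schedule();
var sequentialCount = executions.Count;

Console.WriteLine($"overlapping runs: {overlappingCount} Pending rows, sequential runs: {sequentialCount}");
```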

Migration: CampaignSchedulerPipelineJob (bridge) + CampaignSchedulerJobAdapter. Legacy removed same phase; campaign-workflow-executor explicitly left untouched pending its own prerequisite fix.

Current Hangfire job ID: campaign-scheduler-pipeline, cron * * * * *.

Noteworthy: per-campaign errors are caught inside CampaignSchedulerService and never re-thrown — Hangfire only sees a failure if GetDueAsync itself throws (e.g. database unreachable). All transactions (per-campaign BeginTransactionAsync/CommitAsync) are owned entirely by the Notification Framework Core; the pipeline/adapter/bridge layers never open a transaction.
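The error boundary described above can be sketched as a try/catch inside the per-campaign loop, with nothing wrapped around the initial query. The names mirror the chapter, but the bodies, the "-broken" campaign, and the log list are hypothetical. Transaction handling is only indicated by a comment.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical model of CampaignSchedulerService's error boundaries.
var log = new List<string>();
var failGetDue = false;

Task<string[]> GetDueAsync()
{
    if (failGetDue) throw new InvalidOperationException("simulated database outage");
    return Task.FromResult(new[] { "C1", "C2-broken", "C3" });
}

Task ScheduleOneAsync(string campaignId)
{
    // In the real Core, per-campaign BeginTransactionAsync/CommitAsync would wrap this work.
    if (campaignId.EndsWith("-broken")) throw new InvalidOperationException($"simulated failure for {campaignId}");
    log.Add($"scheduled {campaignId}");
    return Task.CompletedTask;
}

async Task ProcessDueCampaignsAsync()
{
    var due = await GetDueAsync(); // not caught: a failure here reaches Hangfire
    foreach (var id in due)
    {
        try { await ScheduleOneAsync(id); }
        catch (Exception ex) { log.Add($"logged, not rethrown: {ex.Message}"); }
    }
}

await ProcessDueCampaignsAsync(); // completes normally despite C2-broken
Console.WriteLine(string.Join(Environment.NewLine, log));

failGetDue = true;
var jobFailed = false;
try { await ProcessDueCampaignsAsync(); }
catch (InvalidOperationException) { jobFailed = true; }
Console.WriteLine($"GetDueAsync failure surfaced: {jobFailed}");
```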

6. Campaign Workflow Executor — Phase 3.8 (blocked) → Phase 3.10 (prerequisite) → Phase 3.11 (migration)

Before: RecurringJob.AddOrUpdate<INotificationWorkflowExecutorWorker>("campaign-workflow-executor", w => w.ProcessPendingStepsAsync(...), "*/2 * * * *") — every 2 minutes; dispatches actual notifications per step × audience × recipient.

Blocking condition found in Phase 3.8: ICampaignExecutionRepository.GetPendingAsync had no per-execution claiming mechanism — two concurrent instances could both mark the same Pending execution Running and both dispatch, causing duplicate, irreversible notification deliveries (rated HIGH severity). Migration was explicitly blocked pending a Notification Framework Core fix.

Prerequisite (Phase 3.10): added ICampaignExecutionRepository.ClaimAsync — an atomic SQL UPDATE ... WHERE Status = Pending AND WorkerToken IS NULL — closing the gap.
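The claim is a compare-and-swap: the conditional UPDATE affects one row for exactly one caller and zero rows for everyone else. The sketch models that with ConcurrentDictionary.TryUpdate. The source elides the SET clause, so setting Status to Running and recording the caller's token are assumptions of this sketch.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical execution rows: id -> (Status, WorkerToken).
var rows = new ConcurrentDictionary<int, (string Status, string? WorkerToken)>();
rows[1] = ("Pending", null);

// Models UPDATE ... WHERE Status = Pending AND WorkerToken IS NULL.
// TryUpdate swaps only if the current value equals ("Pending", null), like a 0-or-1-row UPDATE.
bool Claim(int id, string workerToken) =>
    rows.TryUpdate(id, ("Running", workerToken), ("Pending", null));

// Several executor instances race for the same Pending execution.
var outcomes = await Task.WhenAll(
    Enumerable.Range(0, 8).Select(i => Task.Run(() => Claim(1, $"worker-{i}"))));

var winners = outcomes.Count(won => won);
Console.WriteLine($"claims won: {winners}, owner: {rows[1].WorkerToken}");
```

In the real repository the database's atomic evaluation of the WHERE clause provides this guarantee. The dictionary only models it.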

Migration (Phase 3.11): CampaignWorkflowExecutorPipelineJob (bridge) + CampaignWorkflowExecutorJobAdapter (the adapter body is literally two lines: a null-check and one await _worker.ProcessPendingStepsAsync(ct) call). Explicitly called "the last production recurring job in the current Background Jobs Framework migration scope" — completing the set of six.

Current Hangfire job ID: campaign-workflow-executor-pipeline, cron */2 * * * *.

Proven by test (TwoConcurrentBridgeRuns_EachInvokeWorkerOnce_NoAdditionalDedupAtThisLayer): the pipeline/adapter layer performs no deduplication of its own — the real race is resolved entirely inside ClaimAsync, which this phase deliberately left untouched.
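The test's name states the property directly. Below is a rough sketch of the same check. Hypothetical local functions replace the real bridge, adapter, and INotificationWorkflowExecutorWorker, and the bridge → pipeline → adapter hops are collapsed into one forwarding call for brevity. Two overlapping runs reach the worker twice, so any deduplication has to happen further down, in ClaimAsync.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Counts worker invocations; stands in for INotificationWorkflowExecutorWorker.
var workerCalls = 0;
Task ProcessPendingStepsAsync(CancellationToken ct)
{
    Interlocked.Increment(ref workerCalls);
    return Task.CompletedTask;
}

// Adapter: a single forwarded await, with no claim, lock, or dedup of its own.
Task AdapterExecuteAsync(CancellationToken ct) => ProcessPendingStepsAsync(ct);

// Bridge -> pipeline -> adapter, collapsed into one hop for brevity.
Task BridgeRunAsync(CancellationToken ct) => AdapterExecuteAsync(ct);

// Two overlapping bridge runs (e.g. one run outlasting the two-minute interval).
await Task.WhenAll(
    Task.Run(() => BridgeRunAsync(CancellationToken.None)),
    Task.Run(() => BridgeRunAsync(CancellationToken.None)));

Console.WriteLine($"worker invocations: {workerCalls}"); // 2
```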

14.3 Final registration table (all six active)

Job ID                                   | Cron        | Migrated in
-----------------------------------------|-------------|----------------
notification-deadletter-cleanup-pipeline | 0 2 * * *   | Phase 3.2 / 3.3
notification-retry-expire-pipeline       | 0 * * * *   | Phase 3.4
notification-retry-processor-pipeline    | */2 * * * * | Phase 3.6
campaign-cleanup-pipeline                | 0 3 * * *   | Phase 3.7
campaign-scheduler-pipeline              | * * * * *   | Phase 3.9
campaign-workflow-executor-pipeline      | */2 * * * * | Phase 3.11

14.4 The repeatable migration recipe

The recipe is formalized in the framework's own Extension Guide as the "Add a New Pipeline Job" process. Every migration phase document ends with an explicit "do NOT proceed without approval" list naming the jobs not yet touched; migrations were never batched.

Invariants every migration (and every future extension) must preserve:

  • Contracts stays a zero-dependency leaf.
  • Core and Adapters remain siblings — neither references the other.
  • The entry package references Core only, never Adapters directly.
  • No new adapter implements more than one adapter interface.
  • No job adapter contains business logic, repository access, or transaction management.
  • The framework never opens a database transaction.
  • The framework never implements per-row duplicate-execution claiming for a business entity — that belongs to the domain framework owning the row.
  • Legacy code is kept, not deleted, at every cutover.
  • All 5 packages version together.
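The structural invariants are the first three bullets and the shared-version rule, and these can be checked mechanically. The others need code review. Below is a minimal sketch of such a check over a reference graph. The package names Contracts, Core, Adapters, and Entry, the assumed edges, and the sample version strings are hypothetical. The fifth package is omitted because this chapter does not name it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical package reference graph: package -> direct references.
var references = new Dictionary<string, string[]>
{
    ["Contracts"] = Array.Empty<string>(),
    ["Core"] = new[] { "Contracts" },
    ["Adapters"] = new[] { "Contracts" },
    ["Entry"] = new[] { "Core" },
};

// Sample version numbers; all packages are expected to share one.
var versions = new Dictionary<string, string>
{
    ["Contracts"] = "2.1.0", ["Core"] = "2.1.0", ["Adapters"] = "2.1.0", ["Entry"] = "2.1.0",
};

bool References(string from, string to) => references[from].Contains(to);

List<string> CheckInvariants()
{
    var violations = new List<string>();
    if (references["Contracts"].Length != 0)
        violations.Add("Contracts must be a zero-dependency leaf");
    if (References("Core", "Adapters") || References("Adapters", "Core"))
        violations.Add("Core and Adapters must stay siblings");
    if (!References("Entry", "Core") || References("Entry", "Adapters"))
        violations.Add("Entry must reference Core, never Adapters directly");
    if (versions.Values.Distinct().Count() != 1)
        violations.Add("all packages must share one version");
    return violations;
}

var result = CheckInvariants();
Console.WriteLine(result.Count == 0 ? "invariants hold" : string.Join("; ", result));
```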

14.5 Why Product Import, Tenant Provisioning, and DatabaseInitializer were not migrated

These three are consistently described across the framework's governance documents as deferred work, not abandoned or permanently rejected — a distinction worth preserving precisely. Migrating them "requires a new, separately approved initiative or ADR to begin," even though the mechanical steps would resemble the six completed migrations. This is a materially different governance status from the two artifacts that are permanently excluded by name: UserJob/ImportJobHistory/ImportJobRowError persistence (permanently ERP-owned per ADR-003) and Hangfire Dashboard centralization (permanently ERP-owned per ADR-004) — see §15.

The closest available technical rationale, from the framework's Phase 0 ownership analysis:

  • Product Import (IProductImportBackgroundService/ProductImportBackgroundService) stays in the ERP host because it "contains Product/Unit/Category/Department/TaxGroup domain logic — exactly the kind of 'business job' the architecture standard says stays in the host and is registered against the framework, not migrated into it." Its tracking entities (ProductImportJobHistory, ProductImportJobRowError) are domain-specific by design.
  • Tenant Provisioning / DatabaseInitializer (DatabaseInitializer.InitializeApplicationDbForTenantAsync) stays in the ERP host because "tenant provisioning is core ERP/MultiTenancy business logic; it merely uses IBackgroundJobService.Enqueue, exactly the intended extension point." Its associated MediatR handlers are tied to ERP domain events.
  • The general principle these exemplify: "business job bodies... are domain logic that happens to run on a generic scheduler; the scheduler is generic, the job is not."

A caveat: the same reasoning (business logic stays in the host) applied equally to the six jobs that were migrated. Their business logic, campaign state, and transactions also stayed entirely in the Notification Framework/ERP; only the Hangfire scheduling/triggering layer moved. So "business logic belongs to the host" does not, on its own, explain why the triggering mechanism of these three jobs was not migrated the way the other six were. The best-supported statement available is this: these three were never inside the ownership-freeze scope set in the initiative's Phase 0.5, and every phase from 3.2 onward treats "add a new job to the migration list" as requiring fresh approval, regardless of mechanical similarity. This is a scope/governance boundary, not a documented technical risk comparison. Confirm with the architecture owner before asserting a more specific technical reason in any future revision of this chapter.