Version: 1.1

18. Troubleshooting

18.1 "PendingModelChangesWarning appears at ERP host startup after a Background Jobs package update"

Not caused by this framework — it owns no EF Core schema at all (ADR-003, ADR-005). If this warning appears right after a version bump, the actual cause is unrelated, pre-existing model/snapshot drift in the ERP's own ApplicationDbContext, coincidentally surfaced by the deploy. Fix by reconciling the EF model snapshot to the current model — do not ship unrelated schema DDL to make the warning disappear.

18.2 "A migrated job stops appearing in the Hangfire Dashboard"

Check ApplicationBuilderExtensions.UseInfrastructure() for that job's RecurringJob.AddOrUpdate<I{Job}PipelineJob>(...) call. The most common cause is a RemoveIfExists call that was meant to remove a different job's legacy ID but was pasted with the wrong ID string, so it deregisters the current job instead. Cross-check against the final registration table in §14.3 for the authoritative list of all 6 current job IDs.
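One way to catch this class of mistake mechanically is to compare the IDs passed to RemoveIfExists against the IDs that are supposed to stay registered. The sketch below is a minimal, framework-independent illustration: RecurringJobIdAudit and FindCollisions are hypothetical names, and both ID lists are assumed to be collected by hand (or by a test) from UseInfrastructure() and the §14.3 table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; nothing here is part of the Background Jobs Framework.
public static class RecurringJobIdAudit
{
    // Returns every ID that is both passed to a RemoveIfExists call and
    // expected to be a live recurring job. Any hit means a cleanup call
    // is deregistering a current job rather than a legacy one.
    public static IReadOnlyList<string> FindCollisions(
        IEnumerable<string> idsPassedToRemoveIfExists,
        IEnumerable<string> currentJobIds)
    {
        var current = new HashSet<string>(currentJobIds, StringComparer.Ordinal);
        return idsPassedToRemoveIfExists
            .Where(id => current.Contains(id))
            .Distinct(StringComparer.Ordinal)
            .ToList();
    }
}
```

An empty result means no cleanup call targets a current ID. A non-empty result names the job that is being deregistered by mistake.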

18.3 "A job runs but its business worker never actually gets called"

Check the job's adapter ({Job}JobAdapter.cs) for calling the wrong method on a shared worker interface. This exact bug class is explicitly guarded against for the two jobs sharing INotificationRetryWorker (notification-retry-processor vs. notification-retry-expire) via dedicated isolation tests. When adding a new job that shares a worker interface with an existing one, add the equivalent isolation test — see §7.6.
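The shape of such an isolation test can be sketched as follows. This is an illustration under assumptions, not the framework's actual test: ISharedRetryWorker stands in for INotificationRetryWorker, and the method names ProcessAsync and ExpireAsync, the adapter class names, and the recording fake are all hypothetical.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a worker interface shared by two jobs.
public interface ISharedRetryWorker
{
    Task ProcessAsync(CancellationToken ct);
    Task ExpireAsync(CancellationToken ct);
}

// Test double that records which interface method was invoked.
public sealed class RecordingWorker : ISharedRetryWorker
{
    public List<string> Calls { get; } = new List<string>();

    public Task ProcessAsync(CancellationToken ct)
    {
        Calls.Add(nameof(ProcessAsync));
        return Task.CompletedTask;
    }

    public Task ExpireAsync(CancellationToken ct)
    {
        Calls.Add(nameof(ExpireAsync));
        return Task.CompletedTask;
    }
}

// Hypothetical adapters: each must call exactly one method on the shared worker.
public sealed class RetryProcessorAdapter
{
    private readonly ISharedRetryWorker _worker;
    public RetryProcessorAdapter(ISharedRetryWorker worker) => _worker = worker;
    public Task ExecuteAsync(CancellationToken ct) => _worker.ProcessAsync(ct);
}

public sealed class RetryExpireAdapter
{
    private readonly ISharedRetryWorker _worker;
    public RetryExpireAdapter(ISharedRetryWorker worker) => _worker = worker;
    public Task ExecuteAsync(CancellationToken ct) => _worker.ExpireAsync(ct);
}

public static class AdapterIsolationChecks
{
    // True when the processor adapter called ProcessAsync once and nothing else.
    public static async Task<bool> ProcessorCallsOnlyProcessAsync()
    {
        var worker = new RecordingWorker();
        await new RetryProcessorAdapter(worker).ExecuteAsync(CancellationToken.None);
        return worker.Calls.Count == 1 && worker.Calls[0] == nameof(ISharedRetryWorker.ProcessAsync);
    }

    // True when the expire adapter called ExpireAsync once and nothing else.
    public static async Task<bool> ExpireCallsOnlyExpireAsync()
    {
        var worker = new RecordingWorker();
        await new RetryExpireAdapter(worker).ExecuteAsync(CancellationToken.None);
        return worker.Calls.Count == 1 && worker.Calls[0] == nameof(ISharedRetryWorker.ExpireAsync);
    }
}
```

Asserting on the full call list, rather than only on the expected method, is what makes the test fail when an adapter is wired to its sibling's method.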

18.4 "Two workers appear to process the same business row twice"

This is explicitly not a Background Jobs Framework concern to fix directly. Per-row duplicate-execution prevention belongs to whichever domain framework owns that row. For the two jobs where this has mattered (campaign-workflow-executor, notification-retry-processor), the fix lives in the Notification Framework's own atomic ClaimAsync implementation. Do not add a distributed lock in Background Jobs as a workaround — this is a permanently rejected idea. See §9.4.
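To illustrate why a per-row claim is sufficient without a job-level lock, here is a minimal in-memory sketch of the idea. It is not the Notification Framework's ClaimAsync: InMemoryRowClaims and TryClaim are hypothetical names, and a real implementation would presumably perform the equivalent atomic step in the database.

```csharp
using System.Collections.Concurrent;

// Hypothetical in-memory model of an atomic per-row claim.
public sealed class InMemoryRowClaims
{
    private readonly ConcurrentDictionary<long, string> _ownerByRowId =
        new ConcurrentDictionary<long, string>();

    // Returns true only for the first worker to claim the row.
    // ConcurrentDictionary.TryAdd is atomic, so concurrent callers
    // cannot both succeed for the same row ID.
    public bool TryClaim(long rowId, string workerId) =>
        _ownerByRowId.TryAdd(rowId, workerId);
}
```

A worker that receives false simply skips the row. Because the decision is made per row by the owner of that row's data, two overlapping job executions can run safely without the Background Jobs Framework coordinating them.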

18.5 "DependencyGraphTests are failing"

DependencyGraph_HasNoCircularDependencies or DependencyGraph_FrameworkDoesNotRegisterAdapters failing means a genuine architecture violation was just introduced. Never suppress or skip these tests — trace which new ProjectReference or DI registration crossed a forbidden boundary and revert it. See §3.7 for the allowed dependency directions.

18.6 "Exceptions from a migrated job are surfacing in the Hangfire dashboard"

BackgroundJobExecutionPipeline always re-throws — it never suppresses an exception. An exception on a migrated job originates inside the job's own worker/business service (the Notification Framework, for all 6 current jobs), not inside the pipeline itself. Start the trace at the worker, not the bridge.
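The re-throw behavior described above can be pictured with a small sketch. This is not the source of BackgroundJobExecutionPipeline: the class name, the RunAsync signature, and the logging callback are assumptions made for illustration only.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of a pipeline that observes failures but never swallows them.
public static class RethrowingPipelineSketch
{
    public static async Task RunAsync(
        Func<CancellationToken, Task> worker,
        Action<Exception> log,
        CancellationToken ct)
    {
        try
        {
            await worker(ct);
        }
        catch (Exception ex)
        {
            log(ex);
            // Bare "throw" re-throws the same exception and keeps the original
            // stack trace, so the top frames still point into the worker.
            throw;
        }
    }
}
```

With this shape, the exception shown in the dashboard carries the worker's own stack frames, which is why the worker is the right place to start reading.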

18.7 "Cancellation isn't propagating in production"

Every migrated job's test suite includes a cancellation-propagation test from bridge to worker. If this still passes locally but cancellation genuinely isn't reaching a worker in production, the break is downstream inside the business worker's own per-entry/per-execution loop — not in the Background Jobs Framework's plumbing, which is already proven to propagate the token correctly.
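A worker loop that honors the token typically checks it once per entry and also passes it to each awaited call. The sketch below shows that pattern under assumptions: WorkerLoopSketch, ProcessEntriesAsync, and the handleEntry callback are hypothetical and do not correspond to any specific worker in the Notification Framework.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical per-entry loop inside a business worker.
public static class WorkerLoopSketch
{
    public static async Task<int> ProcessEntriesAsync(
        IReadOnlyList<string> entries,
        Func<string, CancellationToken, Task> handleEntry,
        CancellationToken ct)
    {
        var processed = 0;
        foreach (var entry in entries)
        {
            // Stop before starting the next entry once cancellation is requested.
            ct.ThrowIfCancellationRequested();

            // Forward the same token so long-running I/O can also be interrupted.
            await handleEntry(entry, ct);
            processed++;
        }
        return processed;
    }
}
```

When reviewing a worker, the usual culprits are a loop that never checks the token, or an inner call that receives CancellationToken.None instead of the token handed down from the bridge.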

18.8 "DI resolution is failing"

Run the framework repo's own HostDiVerificationTests and ConstructorCoverageTests/AdapterConstructorCoverageTests locally. These reflection-sweep every public service and adapter, and they isolate the failure faster than debugging a live host startup. On the ERP consumer side, BackgroundJobsHostDiTests (Phase 3.1, 12 tests) is the equivalent DI verification suite. Also check the DI registration order in §13.4: a missing-service exception at job-run time (not at startup) usually means adapters were registered before their prerequisites (Hangfire, MultiTenancy, Logging).

18.9 "A tenant-scoped job seems to be running for the wrong tenant"

Treat any change to tenant-context resolution with the same scrutiny as a change to an EF Core global query filter — a worker-DI-scope misconfiguration here is a potential multi-tenancy data-isolation failure (running against the wrong tenant's data), not merely a job failure. Verify IMultiTenantContextAccessor is being populated correctly for the execution before assuming the bug is in BackgroundJobTenantResolverAdapter itself. See §8 — Tenant Context.

18.10 "No explicit retry policy seems to be configured for any job"

This is accurate, not a misconfiguration: none of the six migrated jobs configures [AutomaticRetry(...)] explicitly. Whole-job-failure retry behavior is therefore Hangfire's own library default (10 attempts, with the delay growing on each attempt) for every one of the six jobs today. This is a known, documented gap (AP-04), not a bug to fix locally. See §20 (Appendix).

18.11 Debug checklist template

[ ] Confirm the job's ID exists in the final registration table (§14.3) — not a stale legacy ID
[ ] Confirm the job's adapter calls the correct worker method (check shared-interface isolation if applicable)
[ ] Confirm DI registration order: Hangfire → MultiTenancy → Logging → AddShumoulBackgroundJobsFramework() → AddBackgroundJobsAdapters()
[ ] Check the Hangfire Dashboard for the job's actual last-run status and exception (if any)
[ ] If an exception surfaced, start tracing at the business worker, not the pipeline/bridge
[ ] If duplicate processing occurred, check the domain framework's own per-row claim, not this framework's lock primitive
[ ] Run DependencyGraphTests and HostDiVerificationTests locally before assuming a production-only issue
[ ] Confirm tenantId on the execution context matches the job's intended scope (null = host-level, by design for all 6 current jobs)