18. Troubleshooting
18.1 "PendingModelChangesWarning appears at ERP host startup after a Background Jobs package update"
This warning is not caused by this framework, which owns no EF Core schema at all (ADR-003, ADR-005). If it
appears right after a version bump, the actual cause is unrelated, pre-existing drift between the model and the
model snapshot of the ERP's own ApplicationDbContext, which the deploy merely happened to surface. Fix it by
reconciling the EF Core model snapshot with the current model; do not ship unrelated schema DDL just to make
the warning disappear.
18.2 "A migrated job stops appearing in the Hangfire Dashboard"
Check ApplicationBuilderExtensions.UseInfrastructure() for that job's
RecurringJob.AddOrUpdate<I{Job}PipelineJob>(...) call. The most common cause is a RemoveIfExists call that was
meant to remove a different job's legacy ID but was pasted with the wrong ID string, so it deregisters a live job
instead. Cross-check every ID against the final registration table in §14.3, which is the authoritative list of
all 6 current job IDs.
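The failure mode above can be caught before a deploy with a small guard. The guard intersects the IDs passed to RemoveIfExists with the current registration table. The following is a minimal Python sketch of that check, not the framework's C# code. The function name and the example ID sets are hypothetical: only two of the six current IDs are used, and "old-notification-retry" is an invented legacy ID. In a real host, the two sets would come from UseInfrastructure() and the §14.3 table.

```python
def removals_that_hit_live_jobs(current_ids: set[str], removed_legacy_ids: set[str]) -> set[str]:
    """Return IDs scheduled for removal that are actually current job IDs.

    A non-empty result means a RemoveIfExists-style call would deregister
    a job that should still be running.
    """
    return current_ids & removed_legacy_ids


# Hypothetical example data (not the full §14.3 table).
current = {"notification-retry-processor", "notification-retry-expire"}
legacy_removals = {"old-notification-retry", "notification-retry-expire"}  # second entry is a paste error

print(sorted(removals_that_hit_live_jobs(current, legacy_removals)))  # ['notification-retry-expire']
```

Run as a unit test, a non-empty result would fail the build before the wrong job is ever deregistered.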
18.3 "A job runs but its business worker never actually gets called"
Check whether the job's adapter ({Job}JobAdapter.cs) calls the wrong method on a shared worker interface.
This exact bug class is explicitly guarded against for the two jobs that share INotificationRetryWorker
(notification-retry-processor and notification-retry-expire) by dedicated isolation tests. When adding a
new job that shares a worker interface with an existing one, add the equivalent isolation test (see §7.6).
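An isolation test of this kind can be sketched as follows. A recording fake implements the shared worker interface, and the test asserts that each adapter calls exactly its own method and nothing else. This is a minimal Python sketch, not the actual §7.6 tests. The method names process_pending and expire_stale, and both adapter classes, are hypothetical stand-ins for INotificationRetryWorker's real members and the real adapters.

```python
class RecordingRetryWorker:
    """Fake for a shared worker interface; records which method each adapter calls."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def process_pending(self) -> None:  # hypothetical method name
        self.calls.append("process_pending")

    def expire_stale(self) -> None:  # hypothetical method name
        self.calls.append("expire_stale")


class ProcessorAdapter:
    def __init__(self, worker: RecordingRetryWorker) -> None:
        self._worker = worker

    def run(self) -> None:
        self._worker.process_pending()


class ExpireAdapter:
    def __init__(self, worker: RecordingRetryWorker) -> None:
        self._worker = worker

    def run(self) -> None:
        self._worker.expire_stale()


def test_each_adapter_calls_only_its_own_method() -> None:
    for adapter_type, expected in [(ProcessorAdapter, "process_pending"), (ExpireAdapter, "expire_stale")]:
        worker = RecordingRetryWorker()  # fresh fake per adapter, so calls cannot bleed between cases
        adapter_type(worker).run()
        # Exactly one call, and it is the adapter's own method.
        assert worker.calls == [expected], (adapter_type.__name__, worker.calls)


test_each_adapter_calls_only_its_own_method()
```

Asserting on the full call list, rather than only "the expected method was called", is what catches an adapter that calls both methods or the sibling's method.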
18.4 "Two workers appear to process the same business row twice"
This is explicitly not a Background Jobs Framework concern to fix directly: per-row duplicate-execution
prevention belongs to whichever domain framework owns that row. For the two jobs where this has mattered
(campaign-workflow-executor, notification-retry-processor), the fix lives in the Notification Framework's
own atomic ClaimAsync implementation. Do not add a distributed lock in Background Jobs as a workaround;
that idea has been permanently rejected (see §9.4).
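ClaimAsync itself is not shown in this document. One common way to implement an atomic per-row claim is a conditional update: a single UPDATE that only matches while the row is still pending, so exactly one caller wins. The following Python sketch illustrates that technique and is not the Notification Framework's implementation. The table, its columns, and the claim() helper are hypothetical, and in-memory SQLite stands in for the real database.

```python
import sqlite3


def claim(conn: sqlite3.Connection, row_id: int, worker_id: str) -> bool:
    """Atomically move a row from 'pending' to 'claimed'; True only for the winner."""
    cur = conn.execute(
        "UPDATE retry_rows SET status = 'claimed', claimed_by = ? "
        "WHERE id = ? AND status = 'pending'",
        (worker_id, row_id),
    )
    conn.commit()
    # rowcount is 1 for the caller whose UPDATE matched the pending row, 0 for everyone else.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retry_rows (id INTEGER PRIMARY KEY, status TEXT, claimed_by TEXT)")
conn.execute("INSERT INTO retry_rows (id, status) VALUES (1, 'pending')")
conn.commit()

print(claim(conn, 1, "worker-a"))  # True
print(claim(conn, 1, "worker-b"))  # False: the row is already claimed
```

In a typical relational database, the guarantee comes from the database evaluating the status predicate and the write as one statement. That is why the protection belongs next to the row's owner rather than in a lock in the job layer.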
18.5 "DependencyGraphTests are failing"
A failure in DependencyGraph_HasNoCircularDependencies or DependencyGraph_FrameworkDoesNotRegisterAdapters
means a genuine architecture violation has just been introduced. Never suppress or skip these tests: trace which
new ProjectReference or DI registration crossed a forbidden boundary and revert it. See §3.7 for the allowed
dependency directions.
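The internals of DependencyGraph_HasNoCircularDependencies are not shown here. A check of that kind essentially comes down to cycle detection over a directed graph of project references. The following is a minimal Python sketch using a depth-first search with three node colors. When a cycle exists, it returns the offending path, which tells you which new reference to revert. The project names in the example are hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle (first node repeated at the end), or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so this edge closes a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical project reference graph.
references = {
    "BackgroundJobs.Framework": ["BackgroundJobs.Abstractions"],
    "BackgroundJobs.Adapters": ["BackgroundJobs.Framework", "Notification.Framework"],
    "Notification.Framework": ["BackgroundJobs.Abstractions"],
    "BackgroundJobs.Abstractions": [],
}
print(find_cycle(references))  # None
references["BackgroundJobs.Abstractions"].append("BackgroundJobs.Adapters")  # a forbidden new reference
print(find_cycle(references))  # reports Framework -> Abstractions -> Adapters -> Framework
```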
18.6 "Exceptions from a migrated job are surfacing in the Hangfire Dashboard"
BackgroundJobExecutionPipeline always re-throws and never suppresses an exception, so a failing job is expected
to reach Hangfire and appear in the Dashboard. The exception itself originates inside the job's own
worker/business service (the Notification Framework, for all 6 current jobs), not inside the pipeline. Start the
trace at the worker, not the bridge.
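The re-throw contract means the exception Hangfire records is the worker's own exception object, unchanged. The following Python sketch illustrates that contract and is not BackgroundJobExecutionPipeline itself. Python's bare `raise` plays the role of C#'s `throw;`, keeping the original exception and traceback. The names run_pipeline and failing_worker are hypothetical.

```python
import logging
from typing import Callable

log = logging.getLogger("pipeline-sketch")


def run_pipeline(job_id: str, worker: Callable[[], None]) -> None:
    """Run a worker, log any failure, and always re-raise it unchanged."""
    try:
        worker()
    except Exception:
        log.exception("job %s failed inside its worker", job_id)
        raise  # bare raise: same exception object, original traceback preserved


def failing_worker() -> None:
    raise ValueError("bad notification row")


try:
    run_pipeline("notification-retry-processor", failing_worker)
except ValueError as exc:
    print(f"scheduler sees: {exc!r}")  # scheduler sees: ValueError('bad notification row')
```

Because nothing is wrapped or swallowed, the top frames of the recorded stack trace point into the worker, which is where the trace should start.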
18.7 "Cancellation isn't propagating in production"
Every migrated job's test suite includes a cancellation-propagation test from the bridge to the worker. If
these tests still pass locally but cancellation genuinely is not reaching a worker in production, the break is
downstream, inside the business worker's own per-entry/per-execution loop. It is not in the Background Jobs
Framework's plumbing, which those tests already show propagates the token correctly.
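The part that must live in the worker is the check inside its own loop. A worker that never observes the token between entries runs to completion however promptly the bridge passes the token along. The following Python sketch illustrates that loop. threading.Event stands in for a .NET CancellationToken, and OperationCancelled stands in for OperationCanceledException. process_entries is a hypothetical name.

```python
import threading
from typing import Callable, Iterable


class OperationCancelled(Exception):
    """Stand-in for OperationCanceledException."""


def process_entries(entries: Iterable[str], cancelled: threading.Event,
                    handle: Callable[[str], None]) -> int:
    """Handle entries one by one, checking for cancellation before each entry."""
    done = 0
    for entry in entries:
        if cancelled.is_set():  # per-entry check: the piece that is missing when cancellation "doesn't work"
            raise OperationCancelled(f"stopped after {done} entries")
        handle(entry)
        done += 1
    return done


token = threading.Event()
handled = []


def handle(entry: str) -> None:
    handled.append(entry)
    if len(handled) == 2:
        token.set()  # simulate a shutdown request arriving mid-run


try:
    process_entries(["a", "b", "c", "d"], token, handle)
except OperationCancelled as exc:
    print(exc, handled)  # stopped after 2 entries ['a', 'b']
```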
18.8 "DI resolution is failing"
Run the framework repository's own HostDiVerificationTests, ConstructorCoverageTests, and
AdapterConstructorCoverageTests locally. These reflection-sweep every public service and adapter, and a failure
is faster to isolate with them than by debugging a live host startup failure. On the ERP consumer side,
BackgroundJobsHostDiTests (Phase 3.1, 12 tests) is the equivalent DI verification suite. Also check §13.4 (DI
registration order): a missing-service exception at job-run time (not at startup) usually means adapters were
registered before their prerequisites (Hangfire, MultiTenancy, Logging).
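A recorded registration sequence can be checked against the required order mechanically. The following is a minimal Python sketch of such a check, assuming you can capture the order of the registration calls somehow (for example by wrapping each call); how to capture it is left open. "Hangfire", "MultiTenancy", and "Logging" are plain labels rather than real method names. The last two entries are the calls named in the §18.11 checklist.

```python
from typing import List

# Prerequisite order from the §18.11 checklist; each step depends on the ones before it.
REQUIRED_ORDER = [
    "Hangfire",
    "MultiTenancy",
    "Logging",
    "AddShumoulBackgroundJobsFramework",
    "AddBackgroundJobsAdapters",
]


def registration_order_problems(actual: List[str]) -> List[str]:
    """Compare a recorded registration sequence against REQUIRED_ORDER."""
    problems = [f"missing: {step}" for step in REQUIRED_ORDER if step not in actual]
    known = [step for step in actual if step in REQUIRED_ORDER]
    # A sequence is in order exactly when every adjacent pair is in order.
    for first, second in zip(known, known[1:]):
        if REQUIRED_ORDER.index(first) > REQUIRED_ORDER.index(second):
            problems.append(f"{first} registered before its prerequisite {second}")
    return problems


print(registration_order_problems([
    "Hangfire", "AddBackgroundJobsAdapters", "MultiTenancy", "Logging", "AddShumoulBackgroundJobsFramework",
]))  # ['AddBackgroundJobsAdapters registered before its prerequisite MultiTenancy']
```

The adjacent-pair test reliably detects that something is out of order, but it reports only the inversions it meets between neighbours, not every misplaced pair.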
18.9 "A tenant-scoped job seems to be running for the wrong tenant"
Treat any change to tenant-context resolution with the same scrutiny as a change to an EF Core global query
filter. A worker DI-scope misconfiguration here is a potential multi-tenancy data-isolation failure (running
against another tenant's data), not merely a job failure. Verify that IMultiTenantContextAccessor is populated
correctly for the execution before assuming the bug is in BackgroundJobTenantResolverAdapter itself. See §8
(Tenant Context).
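The two properties worth verifying are:

- The ambient tenant is set for exactly one execution and restored afterwards, so it cannot leak into the next execution.
- A worker can fail loudly when the ambient tenant is not the one it expects.

The following Python sketch illustrates both properties. It assumes an ambient, AsyncLocal-style accessor, modelled here with contextvars. current_tenant, run_execution, require_tenant, and the tenant IDs are hypothetical. None means host-level, which is the intended scope for all 6 current jobs.

```python
import contextvars
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Ambient tenant slot: a stand-in for an AsyncLocal-backed IMultiTenantContextAccessor.
current_tenant = contextvars.ContextVar("current_tenant", default=None)


class TenantMismatch(Exception):
    """Raised when the ambient tenant is not the one this execution was scheduled for."""


def require_tenant(expected: Optional[str]) -> None:
    """Guard a worker can call before touching tenant data."""
    actual = current_tenant.get()
    if actual != expected:
        raise TenantMismatch(f"expected tenant {expected!r}, ambient context has {actual!r}")


def run_execution(tenant_id: Optional[str], work: Callable[[], T]) -> T:
    """Populate the ambient tenant for one execution and always restore it afterwards."""
    reset_token = current_tenant.set(tenant_id)
    try:
        return work()
    finally:
        # Without this reset, the tenant would leak into whatever runs next in this context.
        current_tenant.reset(reset_token)


print(run_execution("tenant-a", current_tenant.get))  # tenant-a
print(current_tenant.get())  # None: nothing leaked out of the execution
try:
    # A job meant for tenant-b whose execution context was populated with tenant-a.
    run_execution("tenant-a", lambda: require_tenant("tenant-b"))
except TenantMismatch as exc:
    print(exc)  # expected tenant 'tenant-b', ambient context has 'tenant-a'
```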
18.10 "No explicit retry policy seems to be configured for any job"
This is accurate, not a misconfiguration: none of the six migrated jobs explicitly configures
[AutomaticRetry(...)]. For every one of the six jobs today, retry behavior after a whole-job failure is entirely
Hangfire's own library default (10 attempts with an exponential-ish backoff). This is a known, documented gap
(AP-04), not a bug to fix locally; see §20 (Appendix).
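To make "10 attempts, exponential-ish backoff" concrete, the following Python sketch computes that default schedule. It is an illustration only and does not change the guidance above to leave the AP-04 gap alone.

The sketch assumes our reading of Hangfire's AutomaticRetryAttribute is correct. Under that reading, the default delay before retry n is (n − 1)^4 + 15 + r·n seconds, with r a random integer in [0, 30). That growth is quartic rather than truly exponential, hence "exponential-ish". Verify the formula against the AutomaticRetryAttribute source of the Hangfire version you actually ship.

```python
def hangfire_default_delay_seconds(attempt: int, jitter: int) -> int:
    """Delay before retry `attempt` (1-based), assuming Hangfire's default formula.

    jitter stands in for Hangfire's random component, an integer in [0, 29].
    """
    if not 0 <= jitter <= 29:
        raise ValueError("jitter must be in [0, 29]")
    return (attempt - 1) ** 4 + 15 + jitter * attempt


fastest = [hangfire_default_delay_seconds(n, 0) for n in range(1, 11)]
slowest = [hangfire_default_delay_seconds(n, 29) for n in range(1, 11)]
print(fastest)  # [15, 16, 31, 96, 271, 640, 1311, 2416, 4111, 6576]
print(round(sum(fastest) / 3600, 1), round(sum(slowest) / 3600, 1))  # 4.3 4.7
```

Under this assumption, a job that fails every time keeps retrying for roughly 4.3 to 4.7 hours before Hangfire gives up.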
18.11 Debug checklist template
[ ] Confirm the job's ID exists in the final registration table (§14.3) — not a stale legacy ID
[ ] Confirm the job's adapter calls the correct worker method (check shared-interface isolation if applicable)
[ ] Confirm DI registration order: Hangfire → MultiTenancy → Logging → AddShumoulBackgroundJobsFramework() → AddBackgroundJobsAdapters()
[ ] Check the Hangfire Dashboard for the job's actual last-run status and exception (if any)
[ ] If an exception surfaced, start tracing at the business worker, not the pipeline/bridge
[ ] If duplicate processing occurred, check the domain framework's own per-row claim, not this framework's lock primitive
[ ] Run DependencyGraphTests and HostDiVerificationTests locally before assuming a production-only issue
[ ] Confirm tenantId on the execution context matches the job's intended scope (null = host-level, by design for all 6 current jobs)