OTEL Tracing

This document describes Queuert’s OpenTelemetry tracing implementation. Tracing provides end-to-end visibility into job chain execution, including job dependencies, retry attempts, and blocker relationships.

Queuert uses a five-level span hierarchy:

PRODUCER: create chain.{type}                     ← Chain published (ends immediately)
├── PRODUCER: create job.{type}                   ← Job published (ends immediately)
│   │
│   ├── PRODUCER: await chain.{type}              ← Blocker dependency
│   │   │   links: [blocker chain]
│   │   └── CONSUMER: resolve chain.{type}        ← Blocker resolved
│   │
│   ├── CONSUMER: start job-attempt.{type}        ← Worker processes attempt (has duration)
│   │   ├── INTERNAL: prepare
│   │   └── INTERNAL: complete
│   │
│   └── CONSUMER: start job-attempt.{type}        ← Retry attempt
│       ├── INTERNAL: prepare
│       └── INTERNAL: complete
│
├── PRODUCER: create job.{type}                   ← Continuation job
│   │
│   └── CONSUMER: start job-attempt.{type} (final)
│       ├── INTERNAL: prepare
│       ├── INTERNAL: complete
│       └── CONSUMER: complete chain.{type}       ← Chain completion

Span kinds use OpenTelemetry’s PRODUCER/CONSUMER/INTERNAL semantics. The chain has both a PRODUCER (creation) and CONSUMER (completion) span for symmetry.

| Span | Kind | Created | Ended | Duration |
| --- | --- | --- | --- | --- |
| create chain.{type} | PRODUCER | startJobChain() | Immediately | ~0ms |
| create job.{type} | PRODUCER | startJobChain(), continueWith() | Immediately | ~0ms |
| await chain.{type} | PRODUCER | startJobChain() with blockers | Immediately | ~0ms |
| resolve chain.{type} | CONSUMER | Blocker chain completes | Immediately | ~0ms |
| start job-attempt.{type} | CONSUMER | Worker claims job | Attempt completes/fails | Processing time |
| prepare | INTERNAL | prepare() called | prepare() returns | Transaction time |
| complete | INTERNAL | complete() called | complete() returns | Transaction time |
| complete job.{type} | CONSUMER | Workerless completion | Immediately | ~0ms |
| complete chain.{type} | CONSUMER | Final job completes | Immediately | ~0ms |
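
The span table can also be read as data. The following is a minimal sketch, not Queuert's own code: `SpanSpec`, `SPAN_SPECS`, and `spanName` are hypothetical names introduced here only to show how span names and kinds pair up.

```typescript
// Hypothetical model of the span table; none of these names come from Queuert.
type SpanKind = "PRODUCER" | "CONSUMER" | "INTERNAL";

type SpanOperation =
  | "create chain"
  | "create job"
  | "await chain"
  | "resolve chain"
  | "start job-attempt"
  | "prepare"
  | "complete"
  | "complete job"
  | "complete chain";

interface SpanSpec {
  kind: SpanKind;
  instantaneous: boolean; // true for ~0ms marker spans
  typed: boolean; // true when the span name ends in ".{type}"
}

const SPAN_SPECS: Record<SpanOperation, SpanSpec> = {
  "create chain": { kind: "PRODUCER", instantaneous: true, typed: true },
  "create job": { kind: "PRODUCER", instantaneous: true, typed: true },
  "await chain": { kind: "PRODUCER", instantaneous: true, typed: true },
  "resolve chain": { kind: "CONSUMER", instantaneous: true, typed: true },
  "start job-attempt": { kind: "CONSUMER", instantaneous: false, typed: true },
  "prepare": { kind: "INTERNAL", instantaneous: false, typed: false },
  "complete": { kind: "INTERNAL", instantaneous: false, typed: false },
  "complete job": { kind: "CONSUMER", instantaneous: true, typed: true },
  "complete chain": { kind: "CONSUMER", instantaneous: true, typed: true },
};

// Builds the span name for an operation, appending ".{type}" where the table does.
function spanName(operation: SpanOperation, type?: string): string {
  if (!SPAN_SPECS[operation].typed) return operation;
  if (type === undefined) {
    throw new Error('"' + operation + '" spans require a type name');
  }
  return operation + "." + type;
}

const exampleName = spanName("start job-attempt", "process-order");
```

Only the attempt span and its two INTERNAL children have a measurable duration; every other entry is a point-in-time marker.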

When a job has blockers (dependencies on other chains), each blocker gets a PRODUCER/CONSUMER span pair as a child of the blocked job’s PRODUCER span. The PRODUCER (await chain.{type}) is created at startJobChain time with a link to the blocker chain. The CONSUMER (resolve chain.{type}) is created when the blocker chain completes, so the time between them represents the blocking duration.

The blocker PRODUCER span’s trace context is persisted in the job_blocker table so the CONSUMER can be created later by a different process (the one completing the blocker chain).
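
To make the persistence step concrete, here is a minimal sketch of turning a span context into a storable string and back. The source does not say how the stored trace context is encoded; this example assumes a W3C `traceparent`-style string, and `serializeTraceContext` and `parseTraceContext` are hypothetical helper names.

```typescript
// Hypothetical helpers. Assumption: the stored value is a W3C `traceparent`
// string; Queuert's actual encoding is not specified in this document.
interface SpanContext {
  traceId: string; // 32 lowercase hex characters
  spanId: string; // 16 lowercase hex characters
  traceFlags: number; // 0x01 means "sampled"
}

// Format: version "00", trace id, span id, two hex digits of flags.
function serializeTraceContext(ctx: SpanContext): string {
  const flags = ("0" + (ctx.traceFlags & 0xff).toString(16)).slice(-2);
  return "00-" + ctx.traceId + "-" + ctx.spanId + "-" + flags;
}

const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns undefined for malformed input instead of throwing, so a corrupt
// row degrades to "no parent span" rather than failing the unblock step.
function parseTraceContext(value: string): SpanContext | undefined {
  const match = TRACEPARENT.exec(value);
  if (match === null) return undefined;
  const traceId = match[1] ?? "";
  const spanId = match[2] ?? "";
  const flags = match[3] ?? "00";
  // All-zero ids are invalid in W3C Trace Context.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;
  return { traceId, spanId, traceFlags: parseInt(flags, 16) };
}

// Round trip: what one process stores, another process can restore.
const awaitSpanContext: SpanContext = {
  traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
  spanId: "00f067aa0ba902b7",
  traceFlags: 1,
};
const stored = serializeTraceContext(awaitSpanContext);
const restored = parseTraceContext(stored);
```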

EXTERNAL span (e.g., HTTP request)
├── PRODUCER: create chain.process-order ──────────────
│   │
│   └── PRODUCER: create job.process-order
│       │
│       ├── PRODUCER: await chain.fetch-user ──link──→ chain fetch-user
│       │   └── CONSUMER: resolve chain.fetch-user
│       │
│       ├── PRODUCER: await chain.fetch-inventory ──link──→ chain fetch-inventory
│       │   └── CONSUMER: resolve chain.fetch-inventory
│       │
│       └── CONSUMER: start job-attempt.process-order
│           │   job.blockers contains resolved blocker outputs
│           ├── INTERNAL: prepare
│           ├── INTERNAL: complete ✓
│           └── CONSUMER: complete chain.process-order
│
├── PRODUCER: create chain.fetch-user ─────────────────
│   │
│   └── PRODUCER: create job.fetch-user
│       │
│       └── CONSUMER: start job-attempt.fetch-user ✓
│           ├── INTERNAL: prepare
│           ├── INTERNAL: complete
│           └── CONSUMER: complete chain.fetch-user
│
└── PRODUCER: create chain.fetch-inventory ────────────
    └── PRODUCER: create job.fetch-inventory
        └── CONSUMER: start job-attempt.fetch-inventory ✓
            ├── INTERNAL: prepare
            ├── INTERNAL: complete
            └── CONSUMER: complete chain.fetch-inventory

Each blocker span pair goes through three steps:

  1. PRODUCER created and ended in startJobChain when the job has blockers: one PRODUCER span per blocker, created as a child of the job’s PRODUCER span, with a link to the blocker chain’s trace context.
  2. Persisted: the PRODUCER span context is stored in the job_blocker table (trace_context column) so that another process can create the CONSUMER.
  3. CONSUMER created when unblockJobs detects that the blocker chain has completed: the PRODUCER span context is read from job_blocker, and a CONSUMER span is created as its child.
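
The three steps above can be sketched with an in-memory stand-in for the job_blocker table. This is an illustration under stated assumptions, not Queuert's implementation: `BlockerTracer`, its methods, and the `span-N` ids are all hypothetical, and spans are plain records rather than real OpenTelemetry spans.

```typescript
// Hypothetical simulation; an array stands in for the job_blocker table.
interface SpanContext { traceId: string; spanId: string }

interface SpanRecord {
  name: string;
  kind: "PRODUCER" | "CONSUMER";
  context: SpanContext;
  parent: SpanContext;
  links: SpanContext[];
  timeMs: number; // instantaneous marker: start and end coincide
}

interface BlockerRow {
  jobId: string;
  blockerChainId: string;
  blockerChainType: string;
  traceContext: SpanContext; // context of the PRODUCER "await chain" span
}

class BlockerTracer {
  readonly spans: SpanRecord[] = [];
  private readonly rows: BlockerRow[] = [];
  private counter = 0;

  private childOf(parent: SpanContext): SpanContext {
    this.counter += 1;
    return { traceId: parent.traceId, spanId: "span-" + this.counter };
  }

  // Steps 1 and 2: emit the PRODUCER span and persist its context.
  awaitBlocker(
    jobId: string,
    jobSpan: SpanContext,
    blocker: { chainId: string; chainType: string; context: SpanContext },
    nowMs: number,
  ): SpanRecord {
    const span: SpanRecord = {
      name: "await chain." + blocker.chainType,
      kind: "PRODUCER",
      context: this.childOf(jobSpan),
      parent: jobSpan,
      links: [blocker.context],
      timeMs: nowMs,
    };
    this.spans.push(span);
    this.rows.push({
      jobId,
      blockerChainId: blocker.chainId,
      blockerChainType: blocker.chainType,
      traceContext: span.context,
    });
    return span;
  }

  // Step 3: when the blocker chain completes, read the stored contexts and
  // emit one CONSUMER span per waiting job, each a child of its PRODUCER.
  resolveBlocker(blockerChainId: string, nowMs: number): SpanRecord[] {
    const created: SpanRecord[] = [];
    for (const row of this.rows) {
      if (row.blockerChainId !== blockerChainId) continue;
      const span: SpanRecord = {
        name: "resolve chain." + row.blockerChainType,
        kind: "CONSUMER",
        context: this.childOf(row.traceContext),
        parent: row.traceContext,
        links: [],
        timeMs: nowMs,
      };
      this.spans.push(span);
      created.push(span);
    }
    return created;
  }
}

const tracer = new BlockerTracer();
const jobSpan: SpanContext = { traceId: "trace-order", spanId: "job-process-order" };
const awaitSpan = tracer.awaitBlocker(
  "job-1",
  jobSpan,
  { chainId: "chain-1", chainType: "fetch-user", context: { traceId: "trace-order", spanId: "chain-fetch-user" } },
  1000,
);
const resolved = tracer.resolveBlocker("chain-1", 1250);
// Blocking duration is the gap between the two instantaneous markers.
const blockingDurationsMs = resolved.map((span) => span.timeMs - awaitSpan.timeMs);
```

Because both spans are instantaneous, the blocking time is not stored on either span; it is derived from the difference between their timestamps.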

When a job continues to another job via continueWith, the continuation links to its origin:

PRODUCER: create chain.multi-step ────────────────────────
├── PRODUCER: create job.step-one
│   └── CONSUMER: start job-attempt.step-one #1
│       ├── INTERNAL: prepare
│       └── INTERNAL: complete (calls continueWith)
│
└── PRODUCER: create job.step-two
    │   links: [job step-one]                     ← origin link
    └── CONSUMER: start job-attempt.step-two #1 (final)
        ├── INTERNAL: prepare
        ├── INTERNAL: complete
        └── CONSUMER: complete chain.multi-step

The origin link shows the causal flow: “step-two was created by step-one’s completion”.
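
The difference between the parent relationship and the origin link can be shown with plain records. This is a minimal sketch; `JobSpan` and `createJobSpan` are hypothetical names, not Queuert APIs.

```typescript
// Hypothetical record types illustrating parent vs. link.
interface SpanContext { traceId: string; spanId: string }

interface JobSpan {
  name: string;
  kind: "PRODUCER";
  context: SpanContext;
  parent: SpanContext; // structural: every job span hangs off the chain span
  links: SpanContext[]; // causal: which job created this one
}

function createJobSpan(
  chainSpan: SpanContext,
  jobType: string,
  spanId: string,
  origin?: SpanContext,
): JobSpan {
  return {
    name: "create job." + jobType,
    kind: "PRODUCER",
    context: { traceId: chainSpan.traceId, spanId },
    parent: chainSpan,
    links: origin === undefined ? [] : [origin],
  };
}

const chainSpan: SpanContext = { traceId: "trace-multi-step", spanId: "chain" };
// The first job in the chain has no origin.
const stepOne = createJobSpan(chainSpan, "step-one", "job-1");
// The continuation is a sibling of step-one, linked back to it.
const stepTwo = createJobSpan(chainSpan, "step-two", "job-2", stepOne.context);
```

Both job spans share the chain span as parent, so the tree stays flat at the job level while the link preserves the order in which jobs created each other.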

When a job is completed via completeJobChain (without a worker), there is no job-attempt. Instead, a CONSUMER job span marks the completion, and if the chain is fully completed, a CONSUMER chain span closes the trace:

PRODUCER: create chain.approve-order ─────────────────────
└── PRODUCER: create job.approve-order
    └── CONSUMER: complete job.approve-order      ← Workerless completion
        └── CONSUMER: complete chain.approve-order

The CONSUMER job span is a child of the PRODUCER job span and carries the same chain/job attributes. When continueWith is called during workerless completion, the CONSUMER chain span is omitted (the chain continues):

PRODUCER: create chain.multi-step ────────────────────────
├── PRODUCER: create job.step-one
│   │
│   └── CONSUMER: complete job.step-one           ← Workerless completion (continueWith)
│
└── PRODUCER: create job.step-two
    │   links: [job step-one]
    └── ...

This uses the completeJobSpan adapter method rather than startAttemptSpan, reflecting that no attempt processing occurred.
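
The rule for which CONSUMER spans a workerless completion produces can be written as a small function. This is a sketch of the rule as described here, with a hypothetical function name; it does not reproduce the adapter's actual signature.

```typescript
// Hypothetical helper expressing the workerless-completion rule.
interface CompletionSpan {
  name: string;
  kind: "CONSUMER";
}

function workerlessCompletionSpans(
  chainType: string,
  jobType: string,
  continuedWith: boolean,
): CompletionSpan[] {
  // The job span is always emitted: it stands in for the missing attempt.
  const spans: CompletionSpan[] = [
    { name: "complete job." + jobType, kind: "CONSUMER" },
  ];
  // The chain span is emitted only when the chain is fully completed.
  if (!continuedWith) {
    spans.push({ name: "complete chain." + chainType, kind: "CONSUMER" });
  }
  return spans;
}

const finalCompletion = workerlessCompletionSpans("approve-order", "approve-order", false);
const continuedCompletion = workerlessCompletionSpans("multi-step", "step-one", true);
```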

With create chain at start and complete chain at end, total chain duration is calculated as:

Chain Duration = complete chain.startTime - create chain.startTime

This provides end-to-end visibility even though individual PRODUCER/CONSUMER spans are instantaneous markers.
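
As a worked example of the formula, the sketch below computes the duration from a list of exported span timings. `SpanTiming` and `chainDurationMs` are hypothetical names, and timestamps are assumed to be epoch milliseconds.

```typescript
// Hypothetical helper; timestamps are assumed to be epoch milliseconds.
interface SpanTiming {
  name: string;
  startTimeMs: number;
}

function chainDurationMs(spans: SpanTiming[], chainType: string): number | undefined {
  let createdAt: number | undefined;
  let completedAt: number | undefined;
  for (const span of spans) {
    if (span.name === "create chain." + chainType) createdAt = span.startTimeMs;
    if (span.name === "complete chain." + chainType) completedAt = span.startTimeMs;
  }
  // A chain that is still running has no "complete chain" span yet.
  if (createdAt === undefined || completedAt === undefined) return undefined;
  return completedAt - createdAt;
}

const timings: SpanTiming[] = [
  { name: "create chain.process-order", startTimeMs: 1000 },
  { name: "create job.process-order", startTimeMs: 1000 },
  { name: "start job-attempt.process-order", startTimeMs: 1200 },
  { name: "complete chain.process-order", startTimeMs: 4500 },
];
const orderDurationMs = chainDurationMs(timings, "process-order"); // 4500 - 1000 = 3500
```

The duration includes queue wait, blocking time, and retries, since it spans everything between the two markers.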

When startJobChain is called with deduplication options and a matching chain already exists, no new chain is created. The span must reflect this outcome correctly.

Deduplication is not an error: it is expected behavior, and the call successfully returns the existing chain. Per OpenTelemetry status conventions, the span status should remain UNSET (not ERROR), with an attribute indicating that deduplication occurred.

When deduplication occurs, the create chain span:

  1. Adds the attribute queuert.chain.deduplicated: true
  2. References the existing chain’s IDs
  3. Optionally links to the existing chain’s trace context

Caller requests startJobChain with deduplication key "user-123":

First call (creates new chain):
PRODUCER create chain.process-user [0ms] ──────────────
│   queuert.chain.id: "abc-123"
│   queuert.chain.deduplicated: false
└── ... (normal processing)

Second call (deduplicated):
PRODUCER create chain.process-user [0ms] ──────────────
    queuert.chain.id: "abc-123"               ← same as existing
    queuert.chain.deduplicated: true
    links: [chain abc-123]                    ← link to existing chain
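
The two calls can be modelled as one function that describes the create chain span for either outcome. This is a sketch only: `describeCreateChainSpan` and the record shapes are hypothetical, while the attribute keys are the ones documented here.

```typescript
// Hypothetical description of the create chain span for both outcomes.
type SpanStatus = "UNSET" | "OK" | "ERROR";

interface SpanContext { traceId: string; spanId: string }

interface CreateChainSpan {
  name: string;
  kind: "PRODUCER";
  status: SpanStatus;
  attributes: {
    "queuert.chain.id": string;
    "queuert.chain.type": string;
    "queuert.chain.deduplicated": boolean;
  };
  links: SpanContext[];
}

function describeCreateChainSpan(
  chainType: string,
  chainId: string,
  existing?: { traceContext?: SpanContext },
): CreateChainSpan {
  const deduplicated = existing !== undefined;
  const link = existing === undefined ? undefined : existing.traceContext;
  return {
    name: "create chain." + chainType,
    kind: "PRODUCER",
    // Deduplication is a successful outcome, so the status is never ERROR.
    status: "UNSET",
    attributes: {
      "queuert.chain.id": chainId,
      "queuert.chain.type": chainType,
      "queuert.chain.deduplicated": deduplicated,
    },
    // The link is optional: it is added only when the existing chain's
    // trace context is available.
    links: link === undefined ? [] : [link],
  };
}

const firstCall = describeCreateChainSpan("process-user", "abc-123");
const secondCall = describeCreateChainSpan("process-user", "abc-123", {
  traceContext: { traceId: "trace-first-call", spanId: "chain-abc-123" },
});
```

Both calls report the same queuert.chain.id, which lets a trace backend group the deduplicated request with the chain that actually ran.
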

Chain attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| queuert.chain.id | string | Job chain ID |
| queuert.chain.type | string | Job chain type name |
| queuert.chain.deduplicated | boolean | true when chain was deduplicated |

Job attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| queuert.job.id | string | Job ID |
| queuert.job.type | string | Job type name |
| queuert.job.attempt | number | Attempt number (on attempt spans) |

Worker attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| queuert.worker.id | string | Worker ID processing the attempt |

Attempt result attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| queuert.attempt.result | string | "completed" or "failed" |
| queuert.rescheduled_at | string | ISO 8601 timestamp of next retry (on failure) |
| queuert.rescheduled_after_ms | number | Delay in ms before next retry (on failure) |

Continuation attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| queuert.continued_with.job_id | string | ID of the continuation job |
| queuert.continued_with.job_type | string | Type name of the continuation job |

Blocker attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| queuert.blocker.chain.id | string | Blocker chain ID |
| queuert.blocker.chain.type | string | Blocker chain type name |
| queuert.blocker.index | number | Index of the blocker in the blockers array |
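
As an illustration of how the retry attributes relate to each other, the sketch below builds the attribute set for a failed attempt. It assumes the job, worker, and result attributes all sit on the same attempt span, which is not stated explicitly here; `AttemptAttributes` and `failedAttemptAttributes` are hypothetical names.

```typescript
// Hypothetical builder. Assumption: job, worker, and result attributes
// are all attached to the same attempt span.
interface AttemptAttributes {
  "queuert.job.id": string;
  "queuert.job.type": string;
  "queuert.job.attempt": number;
  "queuert.worker.id": string;
  "queuert.attempt.result": "completed" | "failed";
  "queuert.rescheduled_at"?: string;
  "queuert.rescheduled_after_ms"?: number;
}

function failedAttemptAttributes(
  job: { id: string; type: string; attempt: number },
  workerId: string,
  failedAt: Date,
  retryDelayMs: number,
): AttemptAttributes {
  return {
    "queuert.job.id": job.id,
    "queuert.job.type": job.type,
    "queuert.job.attempt": job.attempt,
    "queuert.worker.id": workerId,
    "queuert.attempt.result": "failed",
    // The two retry attributes describe the same moment in two forms:
    // an absolute ISO 8601 timestamp and a relative delay.
    "queuert.rescheduled_at": new Date(failedAt.getTime() + retryDelayMs).toISOString(),
    "queuert.rescheduled_after_ms": retryDelayMs,
  };
}

const failedAttempt = failedAttemptAttributes(
  { id: "job-1", type: "process-order", attempt: 2 },
  "worker-a",
  new Date(Date.UTC(2024, 0, 1, 0, 0, 0)),
  30000,
);
// failedAttempt["queuert.rescheduled_at"] is "2024-01-01T00:00:30.000Z"
```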