Releases: oban-bg/oban
v2.16.0
🐑 Oban Instance Module
New facade modules allow you to call Oban functions on instances with custom names, e.g. not Oban, without passing a t:Oban.name/0 as the first argument.
For example, rather than calling Oban.config/1, you'd call MyOban.config/0:
MyOban.config()
It also makes piping into Oban functions far more convenient:
%{some: :args}
|> MyWorker.new()
|> MyOban.insert()
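A minimal sketch of how such a facade might be defined and started, assuming the `use Oban, otp_app: ...` macro from Oban's documentation for this release. The names `MyOban`, `:my_app`, and `MyApp.Repo` are placeholders, and the three fragments would normally live in separate files:

```elixir
# lib/my_oban.ex (hypothetical path): the facade module itself.
defmodule MyOban do
  use Oban, otp_app: :my_app
end

# config/config.exs (assumed layout): configuration is keyed by the facade module.
config :my_app, MyOban,
  repo: MyApp.Repo,
  queues: [default: 10]

# In the application's supervision tree, start the facade in place of Oban.
children = [MyApp.Repo, MyOban]
```

Because the instance name is baked into the module, calls such as MyOban.insert/1 need no name argument.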
🧩 Partial Matches in Testing Assertions
It's now possible to match a subset of fields on args or meta with all_enqueued, assert_enqueued, and refute_enqueued. For example, the first two assertions below now match, while the third still does not:
# Given a job with these args: %{id: 123, mode: "active"}
assert_enqueued args: %{id: 123} #=> true
assert_enqueued args: %{mode: "active"} #=> true
assert_enqueued args: %{id: 321, mode: "active"} #=> false
The change applies to args and meta queries for the all_enqueued/2, assert_enqueued/2, and refute_enqueued/2 helpers.
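The matching rule can be illustrated with plain maps. This is only a sketch of subset semantics, not Oban's actual query logic; the module name `PartialMatchSketch` is made up for the example:

```elixir
defmodule PartialMatchSketch do
  # Returns true when every key/value pair in `expected` is also present in `actual`.
  def subset?(actual, expected) when is_map(actual) and is_map(expected) do
    Enum.all?(expected, fn {key, value} -> Map.fetch(actual, key) == {:ok, value} end)
  end
end

args = %{id: 123, mode: "active"}

PartialMatchSketch.subset?(args, %{id: 123})                 #=> true
PartialMatchSketch.subset?(args, %{mode: "active"})          #=> true
PartialMatchSketch.subset?(args, %{id: 321, mode: "active"}) #=> false
```

Extra fields in the job's args are ignored, but every field named in the assertion must match exactly.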
⏲️ Unique Timestamp Option
Jobs are frequently scheduled for a time far in the future, and it's often desirable to consider scheduled jobs for uniqueness, but unique jobs only checked the :inserted_at timestamp.
Now unique has a timestamp option that allows checking the :scheduled_at timestamp instead:
use Oban.Worker, unique: [period: 120, timestamp: :scheduled_at]
Bug Fixes
- [Reindexer] Correct relname match for reindexer plugin

  We can safely assume all indexes start with oban_jobs. The previous pattern was based on an outdated index format from older migrations.

- [Testing] Support repo, prefix, and log query options in use Oban.Testing
v2.15.3
Enhancements
- [Pruner] Prune jobs using the scheduled_at timestamp regardless of state.

  The previous pruning query checked a different timestamp field for each prunable state, e.g. cancelled used cancelled_at. There aren't any indexes for those timestamps, let alone the combination of each state and timestamp, which led to slow pruning queries in larger databases.

  In a database with a mixture of ~1.2m prunable jobs the updated query is 130x faster, reducing the query time from 177ms down to 1.3ms.

- [Lite] Avoid unnecessary transactions during staging and pruning operations

  Contention between SQLite3 transactions causes deadlocks that lead to periodic errors. Avoiding transactions when there isn't anything to write minimizes contention.
Bug Fixes
- [Foreman] Explicitly pause queues when shutdown begins.

  A call to Producer.shutdown/1 was erroneously removed during the DynamicSupervisor queue refactor.

- [Job] Preserve explicit state set along with schedule_in time.

  The presence of a schedule_in value would always set the state to scheduled, even when an explicit state was passed.
v2.15.2
Enhancements
- [Repo] Pass oban: true option to all queries.

  Telemetry options are exposed in Ecto instrumentation but aren't obviously available in the opts passed to prepare_query. Now all queries have an oban: true option so users can ignore them in multi-tenancy setups.

- [Engine] Generate a UUID for all Basic and Lite queue instances to aid in identifying orphaned jobs or churning queue producers.

- [Oban] Use Logger.warning/2 and replace deprecated use of :warn level with :warning across all modules.
Bug Fixes
- [Job] Validate changesets during Job.to_map/1 conversion

  The Job.to_map/1 function converts jobs to a map "entry" suitable for use in insert_all. Previously, that function didn't perform any validation and would allow inserting (or attempting to insert) invalid jobs during insert_all. Aside from inconsistency with insert and insert!, insert_all could insert invalid jobs that would never run successfully.

  Now to_map/1 uses apply_action!/2 to apply the changeset with validation and raises an exception identical to insert!, but before calling the database.

- [Notifier] Store PG notifier state in the registry for non-blocking lookup

  We pull the state from the notifier's registry metadata to avoid GenServer.call timeouts when the system is under high load.

- [Migration] Add primary_key explicitly during SQLite3 migrations

  If a user has configured Ecto's :migration_primary_key to something other than bigserial the schema is incompatible with Oban's job schema.
v2.15.1
Enhancements
- [Telemetry] Add [:oban, :stager, :switch] telemetry event and use it for logging changes.

  Aside from an instrumentable event, the new logs are structured for consistent parsing on external aggregators.

- [Job] Remove default priority from job schema to allow changing defaults through the database
Bug Fixes
- [Basic] Restore attempt < max_attempts condition when fetching jobs

  In some situations, a condition to ensure the attempts don't exceed max attempts is still necessary. By checking the attempts outside the CTE we maintain optimal query performance for the large scan that finds jobs, and only apply the check in the small outer query.

- [Pruner] Add missing :interval to t:Oban.Plugins.Pruner.option/0
v2.15.0
🗜️ Notification Compression
Oban uses notifications across most core functionality, from job staging to cancellation. Some notifications, such as gossip, contain massive redundancy that compresses nicely. For example, this table breaks down the compression ratios for a fairly standard gossip payload containing data from ten queues:
Mode | Bytes | % of Original |
---|---|---|
Original | 4720 | 100% |
Gzip | 307 | 7% |
Encode 64 | 412 | 9% |
Minimizing notification payloads is especially important for Postgres because it applies an 8kb limit to all messages. Now all pub/sub notifications are compressed automatically, with a safety mechanism for compatibility with external notifiers, namely Postgres triggers.
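As a rough illustration of why redundant payloads shrink so much, the sketch below gzips a payload and then Base64-encodes the result so it can travel as text. It assumes the "Encode 64" row refers to Base64-encoding the gzipped bytes. The module name and the sample payload are invented, and this is not Oban's internal implementation:

```elixir
defmodule CompressionSketch do
  # Gzip the payload, then Base64-encode it so it is safe to send as text.
  def encode(payload) when is_binary(payload) do
    payload |> :zlib.gzip() |> Base.encode64()
  end

  # Reverse the steps: Base64-decode, then gunzip.
  def decode(encoded) when is_binary(encoded) do
    encoded |> Base.decode64!() |> :zlib.gunzip()
  end
end

# A made-up, highly repetitive payload standing in for gossip data from ten queues.
payload =
  1..10
  |> Enum.map(fn n -> ~s({"queue":"queue_#{n}","limit":10,"paused":false,"running":[]}) end)
  |> Enum.join(",")

encoded = CompressionSketch.encode(payload)

IO.puts("original: #{byte_size(payload)} bytes, encoded: #{byte_size(encoded)} bytes")

# Round-tripping returns the original payload unchanged.
^payload = CompressionSketch.decode(encoded)
```

Base64 adds roughly a third back on top of the gzipped size, which is consistent with the gap between the Gzip and Encode 64 rows in the table.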
🗃️ Query Improvements
There has been an ongoing issue with systems recording a job attempt twice, when it only executed once. While that sounds minor, it could break an entire queue when the attempt exceeded max attempts because it would violate a database constraint.
Apparently, the Postgres planner may choose to generate a plan that executes a nested loop over the LIMITing subquery, causing more UPDATEs than LIMIT. That could cause unexpected updates, including attempts > max_attempts in some cases. The solution is to use a CTE as an "optimization fence" that forces Postgres not to optimize the query.
We also worked in a few additional query improvements:
- Use an index only scan for job staging to safely handle tables with millions of scheduled jobs.
- Remove unnecessary row locking from staging and pruning queries.
🪶 New Engine Callbacks for SQL Compatibility
We're pleased to share improvements in Oban's SQLite integration. A few SQLite pioneers identified pruning and staging compatibility bugs, and instead of simply patching around the issues with conditional logic, we tackled them with new engine callbacks: stage_jobs/3 and prune_jobs/3. The result is safer, optimized queries for each specific database.
Introducing new engine callbacks with database-specific queries paves the way for working with other databases. There's even an open issue for MySQL support...
v2.15.0 — 2023-04-13
Enhancements
- [Oban] Use DynamicSupervisor to supervise queues for optimal shutdown

  Standard supervisors shut down in a fixed order, which could make shutting down queues with active jobs and a lengthy grace period very slow. This switches to a DynamicSupervisor for queue supervision so queues can shut down simultaneously while still respecting the grace period.

- [Executor] Retry acking infinitely after job execution

  After jobs execute the producer must record their status in the database. Previously, if acking failed due to a connection error after 10 retries it would orphan the job. Now, acking retries infinitely (with backoff) until the function succeeds. The result is stronger execution guarantees with backpressure during periods of database fragility.

- [Oban] Accept a Job struct as well as a job id for cancel_job/1 and retry_job/1

  Now it's possible to write Oban.cancel_job(job) directly rather than Oban.cancel_job(job.id).

- [Worker] Allow snoozing jobs for zero seconds.

  Returning {:snooze, 0} immediately reschedules a job without any delay.

- [Notifier] Accept arbitrary channel names for notifications, e.g. "my-channel"

- [Telemetry] Add detach_default_logger/0 to programmatically disable an attached logger.

- [Testing] Avoid unnecessary query for "happy path" assertion errors in assert_enqueued/2

- [Testing] Inspect charlists as lists in testing assertions

  Args frequently contain lists of integers like [123], which was curiously displayed as '{'.
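The charlist display issue comes from how Elixir inspects lists of printable integers. A small sketch of the behaviour, assuming assertion messages are built with something like inspect/2 (how Oban formats them internally is not shown in these notes):

```elixir
args = %{"ids" => [123]}

# By default, a list of printable code points is rendered as a charlist,
# which is why [123] could show up as the character "{".
# With charlists: :as_lists, integer lists are always shown as lists.
inspect(args, charlists: :as_lists)
#=> "%{\"ids\" => [123]}"
```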
Bug Fixes
- [Executor] Correctly raise "unknown worker" errors.

  Unknown workers triggered an unknown case error rather than the appropriate "unknown worker" runtime error.

- [Testing] Allow assert_enqueued with a scheduled_at time for available jobs

  The use of Job.new to normalize query fields would change assertions with a "scheduled_at" date to only check scheduled, never "available".

- [Telemetry] Remove :worker from engine and plugin query meta.

  The worker isn't part of any query indexes and prevents optimal index usage.

- [Job] Correct priority type to cover default of 0
For changes prior to v2.15 see the v2.14 docs.
v2.14.2
Bug Fixes
- [Oban] Always disable peering with plugins: false. There's no reason to enable peering when plugins are fully disabled.

- [Notifier] Notify Global peers when the leader terminates.

  Now the Global leader sends a down message to all connected nodes when the process terminates cleanly. This behaviour prevents up to 30s of downtime without a leader and matches how the Postgres peer operates.

- [Notifier] Allow compilation in a SQLite application when the postgrex package isn't available.

- [Engine] Include jobs in fetch_jobs event metadata
Changes
- [Notifier] Pass pid in instead of relying on from for Postgres notifications.

  This prepares Oban for the upcoming Postgrex.SimpleConnection switch to use gen_statem.
v2.14.1
Bug Fixes
- [Repo] Prevent logging SQL queries by correctly handling default opts

  The query dispatch call included opts in the args list, rather than separately. That passed options to Repo.query correctly, but it missed any default options such as log: false, which made for noisy development logs.
v2.14.0
Time marches on, and we minimally support Elixir 1.12+, PostgreSQL 12+, and SQLite 3.37.0+
🪶 SQLite3 Support with the Lite Engine
Increasingly, developers are choosing SQLite for small to medium-sized projects, not just in the
embedded space where it's had utility for many years. Many of Oban's features, such as isolated
queues, scheduling, cron, unique jobs, and observability, are valuable in smaller or embedded
environments. That's why we've added a new SQLite3 storage engine to bring Oban to smaller,
stand-alone, or embedded environments where PostgreSQL isn't ideal (or possible).
There's frighteningly little configuration needed to run with SQLite3. Migrations, queues, and
plugins all "Just Work™".
To get started, add the ecto_sqlite3 package to your deps and configure Oban to use the
Oban.Engines.Lite engine:
config :my_app, Oban,
  engine: Oban.Engines.Lite,
  queues: [default: 10],
  repo: MyApp.Repo
Presto! Run the migrations, include Oban in your application's supervision tree, and then start
inserting and executing jobs as normal.
Please report any issues or gaps in documentation.
👩‍🔬 Smarter Job Fetching
The most common cause of "jobs not processing" is when PubSub isn't available. Our troubleshooting
section instructed people to investigate their PubSub and optionally include the Repeater
plugin. That kind of manual remediation isn't necessary now! Instead, we automatically switch back
to local polling mode when PubSub isn't available—if it is a temporary glitch, then fetching
returns to the optimized global mode after the next health check.
Along with smarter fetching, Stager is no longer a plugin. It wasn't ever really a plugin, as
it's core to Oban's operation, but it was treated as a plugin to simplify configuration and
testing. If you're in the minority that tweaked the staging interval, don't worry, the existing
plugin configuration is automatically translated for backward compatibility. However, if you're a
stickler for avoiding deprecated options, you can switch to the top-level stage_interval:
  config :my_app, Oban,
    queues: [default: 10],
-   plugins: [{Stager, interval: 5_000}]
+   stage_interval: 5_000
📡 Comprehensive Telemetry Data
Oban has exposed telemetry data that allows you to collect and track metrics about jobs and queues
since the very beginning. Telemetry events followed a job's lifecycle from insertion through
execution. Still, there were holes in the data—it wasn't possible to track the exact state of your
entire Oban system through telemetry data.
Now that's changed. All operations that change job state, whether inserting, deleting, scheduling,
or processing jobs, report complete state-change events for every job including queue, state, and
worker details. Even bulk operations such as insert_all_jobs, cancel_all_jobs, and retry_all_jobs
return a subset of fields for all modified jobs, rather than a simple count.
See the 2.14 upgrade guide for step-by-step instructions (all two of them).
Enhancements
- [Oban] Store a {:cancel, :shutdown} error and emit [:oban, :job, :stop] telemetry when jobs are manually cancelled with cancel_job/1 or cancel_all_jobs/1.

- [Oban] Include "did you mean" suggestions for Oban.start_link/1 and all nested plugins when a similar option is available.

      Oban.start_link(rep: MyApp.Repo, queues: [default: 10])
      ** (ArgumentError) unknown option :rep, did you mean :repo?
          (oban 2.14.0-dev) lib/oban/validation.ex:46: Oban.Validation.validate!/2
          (oban 2.14.0-dev) lib/oban/config.ex:88: Oban.Config.new/1
          (oban 2.14.0-dev) lib/oban.ex:227: Oban.start_link/1
          iex:1: (file)

- [Oban] Support scoping queue actions to a particular node.

  In addition to scoping to the current node with :local_only, it is now possible to scope pause, resume, scale, start, and stop queue actions to a single node using the :node option.

      Oban.scale_queue(queue: :default, node: "worker.123")

- [Oban] Remove retry_job/1 and retry_all_jobs/1 restriction around retrying scheduled jobs.

- [Job] Restrict replace option to specific states when unique jobs have a conflict.

      # Replace the scheduled time only if the job is still scheduled
      SomeWorker.new(args, replace: [scheduled: [:schedule_in]], schedule_in: 60)

      # Change the args only if the job is still available
      SomeWorker.new(args, replace: [available: [:args]])

- [Job] Introduce format_attempt/1 helper to standardize error and attempt formatting across engines

- [Repo] Wrap nearly all Ecto.Repo callbacks.

  Now every Ecto.Repo callback, aside from a handful that are only used to manage a Repo instance, is wrapped with code generation that omits any typespecs. Slight inconsistencies between the wrapper's specs and Ecto.Repo's own specs caused dialyzer failures when nothing was genuinely broken. Furthermore, many functions were missing because it was tedious to manually define every wrapper function.

- [Peer] Emit telemetry events for peer leadership elections.

  Both peer modules, Postgres and Global, now emit [:oban, :peer, :election] events during leader election. The telemetry meta includes a leader? field for start and stop events to indicate if a leadership change took place.

- [Notifier] Allow passing a single channel to listen/2 rather than a list.

- [Registry] Add lookup/2 for conveniently fetching registered {pid, value} pairs.
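The peer election events mentioned above could be consumed with a standard telemetry handler. The sketch below assumes the :telemetry package is available (Oban depends on it). The event name and the leader? metadata key come from the notes above, while the module name, handler id, and log format are hypothetical:

```elixir
defmodule PeerElectionLogger do
  require Logger

  # Attach a handler for the stop event of each leadership election.
  def attach do
    :telemetry.attach(
      "peer-election-logger",
      [:oban, :peer, :election, :stop],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Log whether this node is the leader once an election completes.
  def handle_event([:oban, :peer, :election, :stop], _measurements, meta, _config) do
    Logger.info("Oban peer election finished, leader?=#{inspect(meta[:leader?])}")
  end
end
```

Calling PeerElectionLogger.attach() once during application start would be enough. Any other metadata keys on the event are not assumed here.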
Bug Fixes
- [Basic] Capture StaleEntryError on unique replace.

  Replacing while a job is updated externally, e.g. it starts executing, could occasionally raise an Ecto.StaleEntryError within the Basic engine. Now, that exception is translated into an error tuple and bubbles up to the insert call site.

- [Job] Update t:Oban.Job/0 to indicate timestamp fields are nullable.
Deprecations
- [Stager] Deprecate the Stager plugin as it's part of the core supervision tree and may be configured with the top-level stage_interval option.

- [Repeater] Deprecate the Repeater plugin as it's no longer necessary with hybrid staging.

- [Migration] Rename Migrations to Migration, but continue delegating functions for backward compatibility.
v2.13.6
v2.13.4
Bug Fixes
- [Oban] Fix dialyzer ambiguity for insert_all/2 when using a custom name rather than options.

- [Testing] Increment attempt when executing with :inline testing mode

  Inline testing mode neglected incrementing the attempt and left it at 0. That caused jobs with a single attempt to erroneously report failure rather than a discard telemetry event.

- [Reindexer] Correct namespace reference in reindexer query.