Releases: oban-bg/oban

v2.19

17 Jan 11:31

The minimum Elixir version is now v1.15. The official policy is to only support the three latest versions of Elixir.

🐬 MySQL Support

Oban officially supports MySQL with the new Dolphin engine. Oban supports modern (read "with full JSON support") MySQL versions from 8.4 on, and has been tested on the highly scalable PlanetScale database.

Running on MySQL is as simple as specifying the Dolphin engine in your configuration:

config :my_app, Oban,
  engine: Oban.Engines.Dolphin,
  queues: [default: 10],
  repo: MyApp.Repo

With this addition, Oban can run in an estimated 10% more Elixir applications!

⚗️ Automated Installer

Installing Oban into a new application is simplified with a new igniter-powered mix task. The new oban.install task handles installing and configuring a standard Oban installation, and it will deduce the correct engine and notifier automatically based on the database adapter.

mix igniter.install oban

This oban.install task is currently the recommended way to install Oban. As a bonus, the task composes together with other igniter installers, making it possible to install phoenix, ash, oban, and other packages with a single command:

mix igniter.install phoenix ash_phoenix ash_postgres ash_oban

Look at the Mix.Oban.Install docs for full usage and options.

📔 Logging Enhancements

Logging in a busy system may be noisy due to job events, but there are other events that are particularly useful for diagnosing issues. A new events option for attach_default_logger/1 allows selective event logging, so it's possible to receive important notices such as notifier connectivity issues, without logging all job activity:

Oban.Telemetry.attach_default_logger(events: ~w(notifier peer stager)a)

Along with filtering, there are new events to make diagnosing operational problems easier.

A peer:election event logs leadership changes to indicate when nodes gain or lose leadership. Leadership issues are rare, but insidious, and make diagnosing production problems especially tricky.

[
  message: "peer became leader",
  source: "oban",
  event: "peer:election",
  node: "worker.1",
  leader: true,
  was_leader: false
]

Helpfully, plugin:stop events are now logged for all core plugins via an optional callback, and plugin:exception events are logged for all plugins regardless of whether they implement the callback. Runtime information is logged for Cron, Lifeline, Pruner, Stager, and Reindexer plugins.

For example, every time Cron runs successfully it will output details about the execution time and all of the inserted job ids:

[
  source: "oban",
  duration: 103,
  event: "plugin:stop",
  plugin: "Oban.Plugins.Cron",
  jobs: [1, 2, 3]
]

⛵️ Official JSON

Oban will default to using the official JSON module built into Elixir v1.18+ when available.

A new Oban.JSON module detects whether the official Elixir JSON module is available at compile time. If it isn't available, then it falls back to Jason, and if Jason isn't available (which is extremely rare) then it warns about a missing module.

This approach was chosen over a config option for backward compatibility because Oban will only support the JSON module once the minimum supported Elixir version is v1.18.
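
The detection order described above can be sketched roughly as follows. This is a minimal illustration and not Oban's actual source: the MyApp.JSONPicker module and its function names are hypothetical, and only the order of the checks (built-in JSON, then Jason, then a warning) comes from the notes.

```elixir
defmodule MyApp.JSONPicker do
  # Hypothetical module: a sketch of compile-time detection, not Oban.JSON itself.
  # The `cond` runs while the module compiles, so only one set of functions is defined.
  cond do
    # The built-in JSON module ships with Elixir v1.18+.
    Code.ensure_loaded?(JSON) ->
      def library, do: JSON
      def encode!(term), do: JSON.encode!(term)
      def decode!(binary), do: JSON.decode!(binary)

    # Fall back to Jason when it is a dependency of the project.
    Code.ensure_loaded?(Jason) ->
      def library, do: Jason
      def encode!(term), do: Jason.encode!(term)
      def decode!(binary), do: Jason.decode!(binary)

    # Neither library is present: warn at compile time, fail at call time.
    true ->
      IO.warn("no JSON library found, add :jason or upgrade to Elixir v1.18+")

      def library, do: nil
      def encode!(_term), do: raise("no JSON library available")
      def decode!(_binary), do: raise("no JSON library available")
  end
end
```

Because the choice is made at compile time, there is no per-call lookup and no configuration to keep in sync.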

v2.19.0 — 2025-01-16

Enhancements

  • [Oban] Start all queues in parallel on initialization.

    The midwife now starts queues using an async stream to parallelize startup and minimize boot time for applications with many queues.

  • [Oban] Safely return nil from check_queue/2 when checking queues that aren't running.

    Checking on a queue that wasn't currently running on the local node now returns nil rather than causing a crash. This makes it safer to check whether a queue is running at all without a try/catch clause.

  • [Oban] Add check_all_queues/1 to gather all queue status in a single function.

    This new helper gathers the "check" details from all running queues on the local node. While it was previously possible to pull the queues list from config and call check_queue/2 on each entry, this more accurately pulls from the registry and checks each producer concurrently.

  • [Oban] Add delete_job/2 and delete_all_jobs/2 operations.

    This adds Oban.delete_job/2, Oban.delete_all_jobs/2, Engine callbacks, and associated operations for all native engines. Deleting jobs is now easier and safer, due to automatic state protections.

  • [Engine] Record when a queue starts shutting down

    Queue producer metadata now includes a shutdown_started_at field to indicate that a queue isn't just paused, but is actually shutting down as well.

  • [Engine] Add rescue_jobs/3 callback for all engines.

    The Lifeline plugin formerly used two queries to rescue jobs—one to mark jobs with remaining attempts as available and another that discarded the remaining stuck jobs. Those are now combined into a single callback, with the base definition in the Basic engine.

    MySQL won't accept a select in an update statement. The Dolphin implementation of rescue_jobs/3 uses multiple queries to return the relevant telemetry data and make multiple updates.

  • [Cron] Introduce Oban.Cron with schedule_interval/4

    The new Cron module allows processes, namely plugins, to get cron-like scheduled functionality with a single function call. This will allow plugins to remove boilerplate around parsing, scheduling, and evaluating for cron behavior.

  • [Registry] Add select/1 to simplify querying for registered modules.

  • [Testing] Add build_job/3 helper for easier testing.

    Extract the mechanism for verifying and building jobs out of perform_job/3 so that it's usable in isolation. This also introduces perform_job/2 for executing built jobs.

  • [Telemetry] Add information on leadership changes to oban.peer.election event.

    An additional was_leader? field is included in [:oban, :peer, :election | _] event metadata to make hooking into leadership change events simpler.

  • [Telemetry] Add callback powered logging for plugin events.

    Events are now logged for plugins that implement a new optional callback, and exceptions are logged for all plugins regardless of whether they implement the callback.

    This adds logging for Cron, Lifeline, Pruner, Stager, and Reindexer.

  • [Telemetry] Add peer election logging to default logger.

    The default logger now includes leadership events to make identifying the leader, and leadership changes between nodes, easier.

  • [Telemetry] Add option to restrict logging to certain events.

    Logging in a busy system may be noisy due to job events, but there are other events that are particularly useful for diagnosing issues. This adds an events option to attach_default_logger/1 to allow selective event logging.

  • [Telemetry] Expose default_handler_id/0 for telemetry testing.

    Simplifies testing whether the default logger is attached or detached in application code.
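
As a rough illustration of the new queue-checking and deletion functions listed above, the sketch below wraps them in a small helper. The MyApp.ObanAdmin module is hypothetical, it assumes the default Oban instance name, and the return values of the delete functions are passed through untouched rather than assumed.

```elixir
defmodule MyApp.ObanAdmin do
  # Hypothetical helper module; assumes the default `Oban` instance is running.
  import Ecto.Query

  # check_queue/2 now returns nil for a queue that isn't running on this node.
  def running_locally?(queue) do
    case Oban.check_queue(queue: queue) do
      nil -> false
      %{} -> true
    end
  end

  # Collect the names of every paused queue on the local node.
  def paused_queues do
    for %{paused: true, queue: queue} <- Oban.check_all_queues(), do: queue
  end

  # Delete a single job by id, then every discarded job in the named queue.
  # `queue` is expected to be a string such as "default".
  def purge(job_id, queue) do
    Oban.delete_job(job_id)

    Oban.Job
    |> where([j], j.queue == ^queue and j.state == "discarded")
    |> Oban.delete_all_jobs()
  end
end
```

Consult the Oban module docs for the exact return values before matching on them.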

Chores

  • [Peer] The default database-backed peer was renamed from Postgres to Database because it is also used for MySQL databases.

Bug Fixes

  • [Oban] Allow overriding all arities of the insert/* functions after use Oban.

  • [Node] Correctly handle :node option for scale_queue/2

    Scoping scale_queue/2 calls to a single node didn't work as advertised due to some extra validation for producer meta compatibility.

  • [Migration] Fix version query for databases with non-unique oid

    Use pg_catalog.obj_description(object_oid, catalog_name), introduced in PostgreSQL 7.2, to specify the pg_class catalog so only the oban_jobs description is returned.

  • [Pruner] Use state specific fields when querying for prunable jobs.

    Using scheduled_at is not correct in all situations. Depending on job state, one of cancelled_at, discarded_at, or scheduled_at should be used.

  • [Peer] Conditionally return the current node as leader for isolated peers.

    Prevents returning the current node name when leadership is disabled.

  • [Testing] Retain time as microseconds for scheduled_at tests.

    Include microseconds in the begin and until times used for scheduled_at tests with a delta. The prior version would truncate, which rounded the until down and broke microsecond level checks.

  • [Telemetry] Correct spelling of "elapsed" in oban.queue.shutdown metadata.

v2.18.3

13 Sep 13:34

Enhancements

  • [Basic] Use the shared concat operator when appending errors.

    The standard push operation for updates is designed for arrays and uses array_append internally. This replaces all use of push with a fragment that uses the || operator instead, which works for both arrays and jsonb.

    CockroachDB doesn't support arrays of jsonb, but it does support simple jsonb columns. Now we can append to the errors column in either format for CRDB compatibility.

Bug Fixes

  • [Queue] Link the dynamic queue supervisor and Midwife for automatic restarts.

    When a producer crashes it brings the queue's supervisor down with it. With enough database errors, the producer may crash repeatedly enough to exhaust restarts and bring down the DynamicSupervisor in charge of all queues.

    Now the supervisor is linked to the midwife to ensure that the midwife restarts as well, and it restarts all of the queues.

  • [Testing] Handle insert_all/3 with streams for the :inline testing engine.

    The inline engine's insert_all_jobs callback incorrectly expected changesets to always be a list and couldn't handle streams.

v2.18.2

16 Aug 15:23

Bug Fixes

  • [Repo] Prevent debug noise by ensuring default opts for standard transactions.

    Without default opts each transaction is logged. Many standard operations execute each second, which makes for noisy logs. Now transaction opts are passed as a third argument to ensure defaults are applied.

  • [Repo] Increase transaction retry delay and increase with each attempt.

    Bump the base transaction retry from 100ms to 500ms, and increase linearly between each successive attempt to provide deeper backoff. This alleviates pressure on smaller connection pools and gives more time to recover from contention failures.
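
To make the linear increase concrete, here is a small worked sketch. The 500ms base comes from the note above; the `base * attempt` formula and the MyApp.RetryDelay module are assumptions for illustration, not Oban's implementation.

```elixir
defmodule MyApp.RetryDelay do
  # Hypothetical module. The base delay matches the release note;
  # multiplying it by the attempt number is an assumed form of "linear".
  @base_delay 500

  # Delay in milliseconds before the given (1-based) retry attempt.
  def delay(attempt) when is_integer(attempt) and attempt > 0 do
    @base_delay * attempt
  end
end

Enum.map(1..3, &MyApp.RetryDelay.delay/1)
# => [500, 1000, 1500]
```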

v2.18.1

15 Aug 13:44

Enhancements

  • [Repo] Automatically retry all transactions with backoff.

    Avoid both expected and unexpected database errors by automatically retrying transactions. Some errors, such as serialization and lock-not-available errors, are likely to occur during standard use depending on how a database is configured. Other errors happen infrequently due to pool contention or flickering connections, and those should also be retried for increased safety.

    This change is applied to Oban.Repo.transaction/3 itself, so it will apply to every location that uses transactions.

  • [Migration] Declare tags as an array of text rather than varchar.

    We don't provide a limit on the size of tags and they could conceivably be larger than 256 characters. Externally the types are interchangeable, but internally there are minor advantages to using the text type.

    There isn't a new migration; this change is only for new tables.

Bug Fixes

  • [Repo] Correctly dispatch query!/4 to query! rather than query without a bang.

v2.18.0

26 Jul 12:43

🔭 Queue Shutdown Telemetry

A new queue shutdown event, [:oban, :queue, :shutdown], is emitted by each queue when it terminates. The event originates from the watchman process, which tracks the total elapsed time from when termination starts to when all jobs complete or the allotted period is exhausted.

Any jobs that take longer than the :shutdown_grace_period (by default 15 seconds) are brutally killed and left as orphans. The ids of jobs left in an executing state are listed in the event's orphaned meta.

This also adds queue:shutdown logging to the default logger. Only queues that shut down with orphaned jobs are logged, which makes it easier to detect orphaned jobs and which jobs were affected:

[
  message: "jobs were orphaned because they didn't finish executing in the allotted time",
  queue: "alpha",
  source: "oban",
  event: "queue:shutdown",
  ellapsed: 500,
  orphaned: [101, 102, 103]
]
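
Applications that want to react to orphaned jobs, rather than only log them, could attach their own handler to the shutdown event. The sketch below is one possible shape: the module name and handler id are hypothetical, the orphaned metadata key comes from the description above, and the queue key is assumed from the log output.

```elixir
defmodule MyApp.ObanOrphans do
  # Hypothetical module name and handler id.
  require Logger

  # Call once during application start, after :telemetry is available.
  def attach do
    :telemetry.attach(
      "my-app-oban-orphans",
      [:oban, :queue, :shutdown],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Warn only when the queue left jobs behind in the executing state.
  def handle_event([:oban, :queue, :shutdown], _measurements, meta, _config) do
    case Map.get(meta, :orphaned, []) do
      [] ->
        :ok

      ids ->
        # The :queue key is an assumption based on the logged output.
        queue = Map.get(meta, :queue)
        Logger.warning("queue #{inspect(queue)} orphaned jobs: #{inspect(ids)}")
    end
  end
end
```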

🚚 Distributed PostgreSQL Support

It's now possible to run Oban in distributed PostgreSQL databases such as Yugabyte. This is made possible by a few simple changes to the Basic engine, and a new unlogged migration option.

Some PostgreSQL compatible databases don't support unlogged tables. Making oban_peers unlogged isn't a requirement for Oban to operate, so it can be disabled with a migration flag:

defmodule MyApp.Repo.Migrations.AddObanTables do
  use Ecto.Migration

  def up do
    Oban.Migration.up(version: 12, unlogged: false)
  end
end

🧠 Job Observability

Job stop and exception telemetry now includes the reported memory and total reductions from the job's process. Values are pulled with Process.info/2 after the job executes and safely fall back to 0 in the event the process has crashed. Reductions are a rough proxy for CPU load, and the new measurements will make it easier to identify computationally expensive or memory hungry jobs.

In addition, thanks to the addition of Process.set_label in recent Elixir versions, the worker name is set as the job's process label. That makes it possible to identify which job is running in a pid via observer or live dashboard.
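
One way to use the new measurements is a handler that flags memory-hungry jobs. This is a sketch under assumptions: the module name, handler id, and 50 MB threshold are hypothetical, and the :memory and :reductions measurement keys are taken from the description above.

```elixir
defmodule MyApp.ObanJobStats do
  # Hypothetical module, handler id, and threshold.
  require Logger

  # Process memory is reported in bytes; 50 MB is an arbitrary example limit.
  @memory_limit 50_000_000

  def attach do
    :telemetry.attach(
      "my-app-oban-job-stats",
      [:oban, :job, :stop],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Returns :heavy when a job exceeded the limit, :ok otherwise.
  def handle_event([:oban, :job, :stop], measurements, %{job: job}, _config) do
    memory = Map.get(measurements, :memory, 0)
    reductions = Map.get(measurements, :reductions, 0)

    if memory > @memory_limit do
      Logger.warning(
        "#{job.worker} used #{memory} bytes and #{reductions} reductions"
      )

      :heavy
    else
      :ok
    end
  end
end
```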

v2.18.0 — 2024-07-26

Enhancements

  • [Job] Support simple unique: true and unique: false declarations

    Uniqueness can now be enabled with unique: true and disabled with unique: false from job options or a worker definition. The unique: true option uses all the standard defaults, but sets the period to :infinity for compatibility with Oban Pro's new simple unique mode.

  • [Cron] Remove forced uniqueness when inserting scheduled jobs.

    Using uniqueness by default prevents being able to use the Cron plugin with databases that don't support uniqueness because of advisory locks. Luckily, uniqueness hasn't been necessary for safe cron insertion since leadership was introduced and scheduling changed to top-of-the-minute many versions ago.

  • [Engine] Introduce check_available/1 engine callback

    The check_available/1 callback allows engines to customize the query used to find jobs in the available state. That makes it possible for alternative engines, such as Oban Pro's Smart engine, to check for available jobs in a fraction of the time with large queues.

  • [Peer] Add Oban.Peer.get_leader/2 for checking leadership

    The get_leader/2 function makes it possible to check which node is currently the leader regardless of the Peer implementation, and without having to query the database.

  • [Producer] Log a warning for unhandled producer messages.

    Some messages are falling through to the catch-all handle_info/2 clause. Previously, they were silently ignored and it degraded producer functionality because inactive jobs with dead pids were still tracked as running in the producer.

  • [Oban] Use structured messages for most logger warnings.

    A standard structure for warning logs makes it easier to search for errors or unhandled messages from Oban or a particular module.
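
The simple uniqueness declarations from the [Job] entry above might look like this in practice. The worker module, queue, and args are hypothetical; only the unique: true and unique: false options come from the notes.

```elixir
defmodule MyApp.Workers.SyncAccount do
  # Hypothetical worker. `unique: true` enables uniqueness with the standard defaults.
  use Oban.Worker, queue: :default, unique: true

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"account_id" => _account_id}}) do
    # Real work would go here.
    :ok
  end
end

# Uniqueness can be switched off for a single job through its options:
%{account_id: 1}
|> MyApp.Workers.SyncAccount.new(unique: false)
|> Oban.insert()
```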

Bug Fixes

  • [Job] Include all fields in the unique section of Job.t/0.

    The unique spec lacked types for both keys and timestamp keys.

  • [Basic] Remove materialized option from fetch_jobs/3.

    The MATERIALIZED clause for CTEs didn't make a meaningful difference in job fetching accuracy. In some situations it caused a performance regression (which is why it was removed from Pro's Smart engine a while ago).

v2.17.11

25 Jun 15:25

Bug Fixes

  • [Oban] Handle deprecation warnings from Elixir 1.17

  • [Notifier] Prevent noisy logging about switching between modes.

    There's an apparent race condition in Sonar between pruning stale nodes on :ping and updating the status after a notification. This primarily happens in development for two reasons:

    1. Development laptops are most prone to time warp because of system sleep.
    2. Apps only run a single node in development.

    Using monotonic_time/1 instead of system_time/1 guards against clock drift/time warp effects.

  • [Stager] Prevent notification status timeouts from bubbling into the Stager.

    A clogged Ecto pool could cause cascading errors on startup due to a sequence of calls between the Notifier, Sonar, and Stager.

    1. Sonar sends a notification in handle_continue on startup.
    2. The notification is blocked while the Notifier waits for a connection from the Ecto pool.
    3. Stager checks for the connection status on startup, which would eventually time out because the Sonar hadn't finished initializing.
    4. The Stager crashes from the timeout error.

    This makes the following changes to prevent this sequence of events:

    1. The Stager no longer gets the sonar status during startup.
    2. The Notifier catches timeout errors from Sonar checks, warns about it, then returns an :unknown status.

  • [Engine] Defensively check the process dictionary during inline testing.

    Not all processes are guaranteed to return a value for the process dictionary. Sometimes a value was missing during inline testing, which would crash the test.

  • [Basic] Set conflict? flag when encountering a unique advisory lock.

    The conflict? flag wasn't set when inserting a unique job was blocked by an advisory lock. Now the flag is set on either a fetched duplicate, or when the advisory lock is set.

  • [Job] Correct replace_by_state_option type by switching from keyword to tuples.

  • [Config] Correctly type shutdown_grace_period as an integer rather than a timeout.
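
The clock change in the [Notifier] fix above can be illustrated with a small staleness check. This is a sketch with a hypothetical module, not Sonar's code. It measures elapsed time with System.monotonic_time/1, which never moves backwards, whereas System.system_time/1 can jump when the wall clock is adjusted.

```elixir
defmodule MyApp.Staleness do
  # Hypothetical module illustrating elapsed-time checks on the monotonic clock.

  # Monotonic timestamps are only comparable within a single running VM.
  def now, do: System.monotonic_time(:millisecond)

  # A node is stale when more than `ttl` milliseconds passed since `last_seen`.
  def stale?(last_seen, now, ttl), do: now - last_seen > ttl
end

last_seen = MyApp.Staleness.now()
MyApp.Staleness.stale?(last_seen, MyApp.Staleness.now(), 30_000)
```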

v2.17.10

25 Jun 15:26

Enhancements

  • [Oban] Make all generated functions from use Oban overridable.

    Now the functions generated by use Oban are all marked with defoverridable for extensibility.

Bug Fixes

  • [Testing] Use $callers rather than $ancestors for ancestry tree check.

    We care about Tasks for inline testing checks, not normal supervision tree ancestry. The $callers entry is the appropriate mechanism to find the trail of calling processes.

v2.17.9

20 Apr 11:18

Enhancements

  • [Testing] Check process ancestry tree for with_testing_mode override.

    Cascade the with_testing_mode block to nested processes that make use of $ancestors in the process dictionary, i.e. tasks. Now enqueuing a job within spawned processes like Task.async or Task.async_stream will honor the testing mode specified in with_testing_mode/2.

  • [PG] Support alternative namespacing in PG notifier

    By default, all Oban instances using the same prefix option would receive notifications from each other. Now you can use the namespace option to separate instances that are in the same cluster without changing the prefix.
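
Both enhancements above are easiest to see in short snippets. In this first sketch, a job enqueued from inside a task honors the surrounding testing mode; MyApp.Worker is a hypothetical worker module.

```elixir
# Inside a test: the task inherits the :inline mode from the enclosing block.
Oban.Testing.with_testing_mode(:inline, fn ->
  task = Task.async(fn -> Oban.insert(MyApp.Worker.new(%{id: 1})) end)

  Task.await(task)
end)
```

For the PG notifier, a namespace could be set in configuration along these lines. The app name, repo, and namespace value are placeholders.

```elixir
config :my_app, Oban,
  notifier: {Oban.Notifiers.PG, namespace: :my_namespace},
  repo: MyApp.Repo
```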

Bug Fixes

  • [Oban] Restore zero arity version of pause_all_queues/0

    Both pause and resume variants lost their default argument in a refactor that shifted around guard clauses.

  • [Oban] Add :oban_draining to process dict while draining

    The flag marks the test process while draining to give hints to the executor and engines. It fixes an incompatibility between Oban.drain_queue/2 and Pro's Testing.drain_jobs/2.

v2.17.8

08 Apr 17:31

Enhancements

  • [Backoff] Backoff retry on DBConnection and Postgrex errors from GenServer calls.

    GenServer calls that result in a ConnectionError or Postgrex.Error should also be caught and retried rather than crashing on the first attempt.

Bug Fixes

  • [Notifier] Check for a live notifier process and propagate notify errors.

    The Notifier.notify/1 spec showed it would always return :ok, but that wasn't the case when the notifier was disconnected or the process was no longer running. Now an error tuple is returned when a notifier process isn't running.

    This situation happened most frequently during shutdown, particularly from external usage of the Notifier like an application or the oban_met package.

    In addition, the errors bubble up through top level Oban functions like scale_queue/1, pause_queue/1, etc. to indicate that the operation can't actually succeed.

  • [Peers.Postgres] Rescue DBConnection.ConnectionError in peer leadership check.

    Previously, only Postgrex.Error exceptions were rescued and other standard connection errors were ignored, crashing the Peer. Because leadership is checked immediately after the peer initializes, any connection issues would trigger a crash loop that could bring down the rest of the supervision tree.
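
Since notifier errors now bubble up through the top-level functions, callers can handle them explicitly. The sketch below assumes an {:error, reason} tuple as described in the [Notifier] entry; the MyApp.QueueControl module is hypothetical.

```elixir
defmodule MyApp.QueueControl do
  # Hypothetical module; assumes the default Oban instance name.
  require Logger

  # Pause a queue and surface notifier failures instead of assuming success.
  def pause(queue) do
    case Oban.pause_queue(queue: queue) do
      :ok ->
        :ok

      {:error, reason} ->
        Logger.warning("unable to pause #{inspect(queue)}: #{inspect(reason)}")

        {:error, reason}
    end
  end
end
```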

v2.17.7

27 Mar 13:16

Bug Fixes

  • [Notifier] Prevent Sonar from running in :testing modes.

    Sonar has no purpose during tests, and it can cause sandbox issues when tests run with the Postgres notifier.

  • [Oban] Correctly handle pause and resume all with opts.

    The primary clause had two default arguments and it was impossible to call pause_all_queues/1 or resume_all_queues/1 with opts and no name.