Skip to content

Commit

Permalink
Add an ADR to keep document history in sync
Browse files Browse the repository at this point in the history
  • Loading branch information
pezholio committed Dec 11, 2024
1 parent 2bebe2f commit 905691f
Showing 1 changed file with 147 additions and 0 deletions.
147 changes: 147 additions & 0 deletions docs/adr/0005-keep-document-history-in-sync-with-rabbit-mq.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
# 5. Keep document history in sync with Publishing API via RabbitMQ

Date: 2025-11-27

## Status

Accepted

## Context

When Content Blocks created in the Content Block Manager are used in documents, we want to be able to
record when a change to a content block triggers an update to the host document. Currently this works
like so:

* Content block is updated
* We find all documents that use the content block
* Each document is then represented to the content store with the updated content block details

This all happens in Publishing API, so there is no record in Whitehall (or any other publishing apps)
of when a change to a document has been triggered by an update to a content block.

With this in mind, we need to find some way of enabling two-way communication between Publishing API
and Whitehall, so publishers can see when content blocks that their document uses have been updated.

There are two potential solutions, each with their own advantages and drawbacks:

### Solution 1: Interweave content block updates in with Whitehall's history

In order to do this, we need to update the Publishing API to record an event when a document has been
republished as a result to a change to a content block, as well as add an endpoint that allows us to
see the events for a particular document, as well as filtering by event type and date.

An JSON representation of event object will look like this:

```json
{
"id": 115,
"action": "HostContentUpdateJob",
"user_uid": null,
"created_at": "2024-11-28T14:14:11.375Z",
"updated_at": "2024-11-28T14:14:11.375Z",
"request_id": "91cfbab2f3ff8889ff55a1c7b308d60c",
"content_id": "0c643225-b5ae-4bd4-8c5d-9d8911433e28",
"payload": {
"locale": "en",
"message": "Host content updated by content block update",
"content_id": "0c643225-b5ae-4bd4-8c5d-9d8911433e28",
"source_block": {
"title": " Universal Credit Helpline ",
"content_id": "a55a917b-740f-466b-9b31-9a9df4526de4",
}
}
}
```

When a document is loaded in Whitehall, we could then call the API and weave these events into the timeline.
However, this is complicated by the fact that Whitehall's document history is paginated, so we won't necessarily
have the full Whitehall history at load time and won't necessarily know the full date window of Publishing events
to fetch. For example:

A document has the following range of event datetimes for the first page:

```
2024-03-23T09:23:00
.....
2023-12-10T11:13:00
```

And a range of event datetimes for the second page

```
2023-11-22T12:27:00
...
2023-09-12T15:17:00
```

If we have an event that happens between `2023-11-22T12:27:00` (the newest event for the second page) and
`2023-12-10T11:13:00` (the oldest event for the first page) it won't get picked up because it doesn't occur
within that range of events.

We could get around this by making a request to get the datetime of the first event on the next page, thus
giving us a full window of dates to interleave, but this makes an already [complex class][1] harder to understand.

Additionally, making an extra database query and calling out to an API endpoint could have performance impacts.

It's also worth considering that currently, we display 10 events on each "page" of results. If we are interleaving
new events with each page of results, this could be confusing for the user if they only expect to see 10 results.

Another solution could be sending a request to the Publishing API endpoint before we fetch the history and then creating
new events, however:

1. This will result in an API call every time a user views a document; and
2. Carrying out an INSERT query on a GET request isn't a pattern we want to encourage

## Solution 2: Add a new message consumer in Whitehall

This would involve setting up a new RabbitMQ message topic in Publishing API that sends
messages when a content block update triggers a change to a document. This would be a brand new
topic that contains a thin message that includes the `content_id` of the document that has
been updated, when it was updated and information about the content block that triggered the update:

```json
{
"locale": "en",
"content_id": "0c643225-b5ae-4bd4-8c5d-9d8911433e28",
"updated_at": "2024-11-28T14:14:11.375Z",
"content_block": {
"title": " Universal Credit Helpline ",
"content_id": "a55a917b-740f-466b-9b31-9a9df4526de4",
}
}
```

We will then set up a queue in Whitehall to listen for events with the relevant key. When an
event has been received, we create a new event in Whitehall (something like an `EditorialRemark`)
for the document with that `content_id`.

This will require a bit more work on both the Publishing API and Whitehall side and will involve
a degree of opacity (as well as extra lines on an architecture graph), but this will avoid complexity
when rendering the history of the document.

## Decision

We propose going with Solution 2.

## Consequences

We will need to set up a RabbitMQ consumer in Whitehall, which will require some minor work on the
ops side of things. It will also mean we will need to consider two-way communication between the
two applications when thinking about the publishing platform architecture.

However, once this is set up, this could potentially open up the possibility of more two way
communication between Whitehall and Publishing API in the future, such as feeding back to
the user when something has not published successfully.

## Alternatives considered

We could remove pagination entirely from the events, or carry out in-memory pagination, but these
options could result in performance issues, especially with older documents. We would also have to
make an API call to Publishing API each time a document is loaded, which could slow things down.

Another option could be to treat Publishing API as the source of truth for the history of a document,
but this could be a considerably more complex piece of work, which we would have limited resource for.
If we decided in the future that it was worth the investment of time, we could still do this further
down the line.

[1]: https://github.com/alphagov/whitehall/blob/main/app/models/document/paginated_timeline.rb

0 comments on commit 905691f

Please sign in to comment.