
API Logic

mark-bell-tna edited this page Jun 2, 2023 · 14 revisions

Introduction to Set notation

Set notation

The API logic will be described in Set notation due to its theoretical basis and compactness.

For our purposes, a set is a collection of distinct objects. Sets will be named using capital letters, $A$, $B$ etc.

Subscripts may also be used to distinguish specific subsets, e.g. $A_{1}$.

The Union of two sets is a set containing all of the objects which are contained in either, or both, of the two sets. The $\cup$ symbol is used to signify a union: $A \cup B$.

The Intersection of two sets is a set containing all of the objects which are contained in both sets. The $\cap$ symbol is used to signify an intersection: $A \cap B$.

Two important results from Set theory are that:

$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$

$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
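Python's built-in `set` type supports union (`|`) and intersection (`&`) directly, so the two identities above can be checked on a small example. The archive identifiers below are made up purely for illustration.

```python
# Hypothetical example sets of archive identifiers.
A = {"a1", "a2", "a3"}
B = {"a2", "a3", "a4"}
C = {"a3", "a5"}

# In Python, | is union and & is intersection.
lhs_1, rhs_1 = A & (B | C), (A & B) | (A & C)
lhs_2, rhs_2 = A | (B & C), (A | B) & (A | C)

print(lhs_1 == rhs_1, sorted(lhs_1))  # True ['a2', 'a3']
print(lhs_2 == rhs_2, sorted(lhs_2))  # True ['a1', 'a2', 'a3']
```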

If all of the members of a set, $A$, are also members of another set, $B$, we say $A$ is a subset of $B$ and use the notation $A \subseteq B$.

The reverse of this notation is $A \supseteq B$, which means that the set $A$ is a superset of $B$ (every member of $B$ is also a member of $A$).

A property, $p$, is a characteristic that applies to members of a given set, $X$, and is defined through a function which maps each member of the set to true (it has the property) or false (it does not). The notation for this is $p: X \to \{\text{true}, \text{false}\}$.

We will use $p_{i}$ when defining and referencing properties.
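A property can be sketched as a predicate function, and the subset of members that have the property as a set comprehension, $\{x \in X \mid p(x)\}$. The set of titles and the property `p_1` below are hypothetical examples, not part of the API.

```python
from typing import Callable, Set

# Hypothetical example set X: document titles (made up for illustration).
X = {"Tithe map of Kent", "Census return 1851", "Kent quarter sessions"}


def p_1(x: str) -> bool:
    """Hypothetical property p_1: True if the title mentions Kent."""
    return "Kent" in x


def subset_with(X: Set[str], p: Callable[[str], bool]) -> Set[str]:
    """Return {x in X | p(x)}: the members of X that have property p."""
    return {x for x in X if p(x)}


X_1 = subset_with(X, p_1)
print(X_1 <= X)     # True: X_1 is a subset of X
print(X >= X_1)     # True: X is a superset of X_1
print(sorted(X_1))  # ['Kent quarter sessions', 'Tithe map of Kent']
```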

API notation

Specific to this API we will work with the following sets:

| Set | Description |
| --- | --- |
| $A$ | The set of all archives |
| $C$ | The set of all collections |
| $D$ | The set of all documents |
| $E$ | The set of all entities (actors, locations, events, objects) |

There is an explicit hierarchical structure in the first three entries (which we will call Hierarchical Entities when necessary): Archives contain Collections, which contain Documents. When generalising to any hierarchical entity we will use the set $H = A \cup C \cup D$; when generalising to all entities (i.e. $H \cup E$) we will use $X$.

Since each of the four sets $A, C, D, E$ is a collection of a different type of object, we will introduce some shorthand to keep the notation clean.

Subsets of $A, C, D, E$ will be defined through properties, and will be named using subscripts: $A_{1}, A_{2}, C_{1}, ...$ etc.

For two sets of different types (e.g. $A_{1}$ and $D$), the intersection $A_{1} \cap D$ is shorthand for $\{d \in D \mid d \text{ is contained in an Archive in } A_{1}\}$, i.e. the set of Documents contained in the Archives in the set $A_{1}$. ($\mid$ stands for "such that", so the statement can be read as "Documents $d$ such that $d$ is contained in an Archive in the set $A_{1}$".)
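The cross-type shorthand can be sketched as a filter over documents that walks up the hierarchy. This assumes a hypothetical minimal data model in which each document records its collection and each collection records its archive; the class and field names are illustrative only and not taken from the API.

```python
from dataclasses import dataclass
from typing import Set


# Hypothetical minimal data model: each collection records the archive that
# contains it, and each document records its collection. Archives are
# represented only by their identifiers. The real API's model may differ.
@dataclass(frozen=True)
class Collection:
    identifier: str
    archive_id: str


@dataclass(frozen=True)
class Document:
    identifier: str
    collection: Collection


def documents_in_archives(A_1: Set[str], D: Set[Document]) -> Set[Document]:
    """Cross-type shorthand A_1 ∩ D: documents contained in an archive in A_1."""
    return {d for d in D if d.collection.archive_id in A_1}


c1 = Collection("c1", archive_id="a1")
c2 = Collection("c2", archive_id="a2")
D = {Document("d1", c1), Document("d2", c1), Document("d3", c2)}

result = documents_in_archives({"a1"}, D)
print(sorted(d.identifier for d in result))  # ['d1', 'd2']
```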

Where the two sets are of the same type, intersection (and union) are used as normal. Two sets of the same type will use the same capital letter but different subscripts (e.g. $A_{1}, A_{2}$).

Entities are a separate case as they may be related to any of Archives, Collections or Documents. For example, a document may have been written by a person; there may also be a collection of documents belonging to a person, and in exceptional cases an archive dedicated to a person.

In linked data, Entities are related to the hierarchical entities, but for the purposes of the API they will be modelled as properties. Again, for the sake of clean notation we will use shorthand: the Intersection symbol between a Hierarchical Entity and an Entity implies that the hierarchical entity has that entity as a property.

We will use the notation $X_{i} = X \cap p_{i}$ to mean the set of entities (members of $X$) which have the property $p_{i}$.

As a more concrete example, if $E_{1}$ is the entity representing a specific location, then $A \cap E_{1}$ is the set of Archives with that location as a property (i.e. archives linked or related to that location). This could be their physical address, or the archive could be a local archive representing that location (a county archive, for example).
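A minimal sketch of the shorthand $A \cap E_{1}$, assuming entities are stored on each archive as a set of linked entities (modelled as properties). The `Entity` and `Archive` classes, and the example locations, are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Entity:
    identifier: str
    name: str
    type: str  # e.g. "Actor", "Location", "Event", "Object"


@dataclass(frozen=True)
class Archive:
    identifier: str
    # Hypothetical: entities linked to the archive, modelled as properties.
    entities: FrozenSet[Entity]


def with_entities(H: Set[Archive], E_1: Set[Entity]) -> Set[Archive]:
    """Shorthand H ∩ E_1: members of H that have an entity from E_1 as a property."""
    return {h for h in H if h.entities & E_1}


kent = Entity("e1", "Kent", "Location")
london = Entity("e2", "London", "Location")
A = {
    Archive("a1", frozenset({kent})),  # e.g. a county archive for Kent
    Archive("a2", frozenset({london})),
    Archive("a3", frozenset()),
}

A_kent = with_entities(A, {kent})
print(sorted(a.identifier for a in A_kent))  # ['a1']
```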

Since entities can be linked to any level of the hierarchy, the symbol $E$ will be shorthand for $E \cap (A \cup C \cup D)$, i.e. entities linked to an Archive, a Collection or a Document.

Pictorial representation

```mermaid
graph TD;
    Archive-->Collection;
    Collection-->Document;
    Archive-->Entity;
    Collection-->Entity;
    Document-->Entity;
    Entity-->Entity;
```

API Query Logic

An API Query will be of the form:

$(A \cap A_{1} \cap ... ) \cap$

$(C \cap C_{1} \cap ... ) \cap$

$(D \cap D_{1} \cap ... ) \cap$

$(E \cap E_{1} \cap ... ) $

where the subsets $A_{1}, \dots, C_{1}, \dots$ etc. are optional.
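Each bracketed term above is a chain of intersections within one level, where the subsets are optional. A minimal sketch of one such term, using hypothetical archive identifiers: with no subsets supplied, the term reduces to the full set. Combining the terms across levels relies on the cross-type shorthand and is not shown here.

```python
from functools import reduce
from typing import Set


def level_filter(S: Set[str], *subsets: Set[str]) -> Set[str]:
    """Return S ∩ S_1 ∩ S_2 ∩ ...; with no optional subsets this is just S."""
    return reduce(lambda acc, s: acc & s, subsets, set(S))


A = {"a1", "a2", "a3"}  # hypothetical archive identifiers
A_1 = {"a1", "a2"}      # e.g. archives with some property p_1
A_2 = {"a2", "a3"}      # e.g. archives with some property p_2

print(sorted(level_filter(A, A_1, A_2)))  # ['a2']
print(sorted(level_filter(A)))            # ['a1', 'a2', 'a3']
```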

A subset $H_{i}$ is defined through properties. If we generalise the term Entity as including hierarchical entities, then properties are either Entity properties, or Entities themselves. All entities have an Identifier, a Name, and a Type; these are Entity properties. They may also have other properties which are other Entities (actor, location, event, object).

For example, a Document has a Name (the title) but was also created by a person (Actor), at/during a time (Event). That person has a Name, but they in turn were born somewhere (Location), on a day (Event).

When building a query, each filter applied generally results in an Intersection operation between sets. The exception is when multiple filters of the same type are included, in which case they are combined with a Union operation.

Example:

Set $H_{1} = H \cap E_{1} \cap E_{2}$ where $E_{1}$ is a Location and $E_{2}$ is an Event

Set $H_{1} = H \cap (E_{1} \cup E_{2})$ where $E_{1}$ is a Location and $E_{2}$ is also a Location. This is the same as $(H \cap E_{1}) \cup (H \cap E_{2})$
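The combination rule above can be sketched as follows: group the entity filters by type, take a union within each group, and intersect across groups. This is one reading of the rule, not the API's implementation; the `Entity` and `Record` classes and the example data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Entity:
    identifier: str
    type: str  # e.g. "Location", "Event"


@dataclass(frozen=True)
class Record:
    """Hypothetical member of H (an archive, collection or document)."""
    identifier: str
    entities: FrozenSet[Entity]


def apply_filters(H: Set[Record], filters: Iterable[Entity]) -> Set[Record]:
    # Group the requested entity filters by type.
    by_type: Dict[str, Set[Entity]] = defaultdict(set)
    for e in filters:
        by_type[e.type].add(e)

    result = set(H)
    for same_type in by_type.values():
        # Same type: union, H ∩ (E_1 ∪ E_2), i.e. the record has any of them.
        matching = {h for h in H if h.entities & same_type}
        # Different types: intersect the per-type results.
        result &= matching
    return result


loc_1, loc_2 = Entity("e1", "Location"), Entity("e2", "Location")
event_1 = Entity("e3", "Event")
H = {
    Record("r1", frozenset({loc_1, event_1})),
    Record("r2", frozenset({loc_2})),
    Record("r3", frozenset({loc_2, event_1})),
}


def ids(records: Set[Record]) -> List[str]:
    return sorted(r.identifier for r in records)


print(ids(apply_filters(H, [loc_1, event_1])))  # ['r1']
print(ids(apply_filters(H, [loc_1, loc_2])))    # ['r1', 'r2', 'r3']
```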

The final part of the API call is the filter by Entity, $(E \cap E_{1} \cap ... )$. Remember that $E$ is shorthand for $E \cap (A \cup C \cup D)$. When $E_{i}$ is applied as a property of a member of $H$, the bracketed term is absorbed by the rest of the expression (every member of $H$ already belongs to $A \cup C \cup D$), but this is not the case when $E_{i}$ is a filter in its own right. The effect is as follows:

For subsets $A_{1}, C_{1}, D_{1}$ and $E_{1} [= E_{1} \cap (A \cup C \cup D)]$ the resulting query is:

$(A_{1} \cap E_{1}) \cup (C_{1} \cap E_{1}) \cup (D_{1} \cap E_{1})$
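A minimal sketch of this expression, assuming each hierarchical entity is a hypothetical `Record` carrying its level and the identifiers of its linked entities. The final line checks that, by the distributive law from the introduction, the result equals $(A_{1} \cup C_{1} \cup D_{1}) \cap E_{1}$.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Record:
    """Hypothetical hierarchical entity; level is "archive", "collection" or "document"."""
    level: str
    identifier: str
    entities: FrozenSet[str]  # identifiers of linked entities


def with_entities(S: Set[Record], E_1: Set[str]) -> Set[Record]:
    """S ∩ E_1: members of S that have an entity in E_1 as a property."""
    return {s for s in S if s.entities & E_1}


def entity_filter(A_1: Set[Record], C_1: Set[Record], D_1: Set[Record],
                  E_1: Set[str]) -> Set[Record]:
    """(A_1 ∩ E_1) ∪ (C_1 ∩ E_1) ∪ (D_1 ∩ E_1)."""
    return with_entities(A_1, E_1) | with_entities(C_1, E_1) | with_entities(D_1, E_1)


A_1 = {Record("archive", "a1", frozenset({"e1"})), Record("archive", "a2", frozenset())}
C_1 = {Record("collection", "c1", frozenset({"e1", "e2"}))}
D_1 = {Record("document", "d1", frozenset({"e2"})),
       Record("document", "d2", frozenset({"e1"}))}

result = entity_filter(A_1, C_1, D_1, {"e1"})
print(sorted((r.level, r.identifier) for r in result))
# [('archive', 'a1'), ('collection', 'c1'), ('document', 'd2')]

# By the distributive law, this equals (A_1 ∪ C_1 ∪ D_1) ∩ E_1.
print(result == with_entities(A_1 | C_1 | D_1, {"e1"}))  # True
```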

Return types

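The API will return one of Archive, Collection, Document or Entity. For a query $Q$ returning a response of type $R$ (where $R$ is one of $A$, $C$, $D$ or $E$), the final response will be $R \cap Q$. Using the cross-type shorthand above, this is the set of members of $R$ that contain, or are related to, the results of $Q$ (written loosely as $Q \subseteq R$).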

Query Examples

Q1. Archives linked to a location with name "X"
> $p_{1}$ is the property with name = "X"; $p_{2}$ is the property with type = "Location"; $E_{1} = E \cap p_{1} \cap p_{2}$

> The query returns $A_{1} = A \cap E_{1}$

Q2. Archives with documents created in year Y
> $p_{1}$ is the property with name = "created", date value = Y and type = "Event"; $E_{1} = E \cap p_{1}$

> This returns Documents, $D_{1} = D \cap E_{1}$

> The final query returns $A \cap D_{1}$ which is the set of all Archives containing the documents in $D_{1}$
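Q2 can be sketched in two steps, mirroring the notation: first $D_{1} = D \cap E_{1}$, then $A \cap D_{1}$. This assumes a hypothetical model in which each document stores the identifier of its containing archive and its linked entities, and each event entity stores a year; none of these names come from the API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Entity:
    name: str
    type: str
    year: int  # hypothetical: the year of the entity's date value


@dataclass(frozen=True)
class Document:
    identifier: str
    archive_id: str  # hypothetical: identifier of the containing archive
    entities: FrozenSet[Entity]


def p_1(e: Entity, year: int) -> bool:
    """The property from Q2: name = "created", type = "Event", date value = year."""
    return e.name == "created" and e.type == "Event" and e.year == year


def archives_with_documents_created_in(A: Set[str], D: Set[Document],
                                       year: int) -> Set[str]:
    # D_1 = D ∩ E_1: documents that have a matching "created" event.
    D_1 = {d for d in D if any(p_1(e, year) for e in d.entities)}
    # A ∩ D_1: archives containing at least one document in D_1.
    return {a for a in A if any(d.archive_id == a for d in D_1)}


A = {"a1", "a2", "a3"}
D = {
    Document("d1", "a1", frozenset({Entity("created", "Event", 1851)})),
    Document("d2", "a2", frozenset({Entity("created", "Event", 1900)})),
    Document("d3", "a3", frozenset({Entity("accessioned", "Event", 1851)})),
}

print(archives_with_documents_created_in(A, D, 1851))  # {'a1'}
```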