From 7441c47f7a2cbca4773e399d66508d8719a18f0a Mon Sep 17 00:00:00 2001 From: Brett Sun Date: Mon, 18 Jul 2016 17:54:50 +0200 Subject: [PATCH 1/5] Edit JSON-LD and schema.org sections --- coala-ip/README.md | 198 +++++++++++++++++++++------------------------ 1 file changed, 94 insertions(+), 104 deletions(-) diff --git a/coala-ip/README.md b/coala-ip/README.md index 027585d..094e675 100644 --- a/coala-ip/README.md +++ b/coala-ip/README.md @@ -311,10 +311,10 @@ highlight some of its main features. #### JSON Linked Data -[JSON-Linked Data](https://www.w3.org/TR/json-ld/) (short form: JSON-LD) is a data structure merging -the concepts of the [Resource Description Framework](https://www.w3.org/TR/rdf11-concepts/) with -[JSON](https://tools.ietf.org/html/rfc7159). Using the concept of a "context", it allows to provide -additional mappings by linking JSON-object properties to RDF schemata in an ontology. +[JSON-Linked Data](https://www.w3.org/TR/json-ld/) (JSON-LD) is a data structure merging the +concepts of the [Resource Description Framework](https://www.w3.org/TR/rdf11-concepts/) with +[JSON](https://tools.ietf.org/html/rfc7159). Using the concept of a `context`, it allows users to +link a JSON object's property to their corresponding RDF schemata in an ontology. Lets assume we have the following set of data: @@ -328,15 +328,14 @@ Lets assume we have the following set of data: ``` -Now, for a human it's obvious that this set of data is about a person named "Andy Warhol" who was -born on the 6th August 1928. For a machine that is lacking the intuition and _context_ of a human, -resolving this representation is rather difficult. +For a human it's obvious that this is about a person named "Andy Warhol" who was born on August +6th, 1928. However, for a machine that lacks the intuition and *context* of a human, +resolving this representation into the same conclusion is rather difficult. -JSON-LD solves this problem by introducing the concept of a "context" into JSON documents. On a high -level, this allows to link data to already defined schemata. In order to include "context" into a -JSON-object a key called `@context`, needs to be included that defines or references the schema of -the underlying data. Using JSON-LD to define our previously mentioned example, it would look like -this: +JSON-LD solves this problem by introducing a `context` into JSON documents; on a high level, this +allows data to be linked to already defined schemata. Adding a special `@context` key to the +document provides a reference to the schema of the underlying data. Transforming our previous +example to use JSON-LD would result in: ```javascript @@ -349,40 +348,44 @@ this: ``` -Using the JSON-LD-specific keyword `@context` - pointing to a resource that defines how our data -should look like - a JSON-LD parser could `GET http://schema.org/Person` the schema and validate the -attached data against it. If some other application developer were to be handling this kind of data -for their users, they could rely on the same schema definition. This would unify data representation -across services to enable cross-service data exchange without the need for data-transformation. - -Think of it like this: Twitter, Facebook, Github, Instagram - they all have the notion of a user -model for example. Some of them might name the key of the user's birthday `birthDay`, while others -name it `dayOfBirth`, while again others would name it `birth_day`. 
All those keys however, have the -same semantic meaning for a user model, as they define when the user was born. Even worse, imagine -they'd all use different formats for the user's birthday value (e.g. not being not compliant with -[ISO 8601](http://www.iso.org/iso/catalogue_detail?csnumber=40874). This would mean that not only -for mapping keys custom logic would have to be written, but for most value fields as well. - -Since JSON-LD is simply a serialization format of RDF, and since [RDF's primitive data types are -based on XML schema](https://www.w3.org/TR/rdf11-concepts/#section-Datatypes), the problem is -circumvented at the base, as all advanced data types are derived from primitive data types. - -Going back to the example, a remaining question is: How does JSON-LD know how to map our -self-defined key (`givenName`, `familyName` and `birthDate`) names to the properties of schema.org's -Person? Turns out we didn't choose those key names randomly. They're already part of the -schema.org's Person definition, hence a JSON-LD parser is capable to map them automatically and then -execute validation against it. +Upon seeing this data, a JSON-LD parser could use the `@context` property and send a `GET` to +`http://schema.org/Person` to receive the defined schema and perform validation. Now, if another +application developer were to handle this data, they could also rely on the same schema definition +rather than their own; over time, as more and more services use JSON-LD, data representations across +services would begin to unify to improve cross-service data interoperability. + +Think of it like this: Twitter, Facebook, Github, Instagram, etc---they all have the notion of a +user model. Some might use `birthday` as the key for the user's birthday while others use +`dayOfBirth` or others still `birth_day`. All those keys, however, have the same semantic meaning on +a user model: they all define when the user was born. Even worse, imagine if they all used different +formats for the user's birthday value (i.e. not being not compliant with [ISO 8601](http://www.iso.org/iso/catalogue_detail?csnumber=40874). +Custom logic would have to be written to not only handle mapping different keys to each other, but +also to convert their value fields into normalized representations. + +However, with JSON-LD, as it's simply a serialization format of RDF, and as [RDF's primitive data +types are based on XML schema](https://www.w3.org/TR/rdf11-concepts/#section-Datatypes), this +problem is circumvented at the data format level because all advanced data types must derive from +primitive data types. + +Going back to the Andy Warhol example above, one remaining piece of magic to be explained is how +JSON-LD maps our self-defined keys (`givenName`, `familyName` and `birthDate`) to the properties of +schema.org's `Person`. If you look at schema.org's `Person` definition, you'll see we didn't exactly +choose random key names; they were already part of the definition. In this case, a JSON-LD parser +is able to automatically map and execute validation against these properties by using the schema +definition. For more clarity, let's see how a JSON-LD parser would look at this example: +1. Notice `@context` contains `http://schema.org/Person` 1. `GET http://schema.org/Person` -2. 
For each of the user-defined keys, check if they map to the keys provided in the schema
-  2.1 If this is the case, traverse the schema until a leaf node (the JSON-LD specification calls
-  this an `identifying blank node`) is found
-  2.2 "Expand" the data, replacing a keys name with an URI to its more granular schema definition
+1. For each of the user-defined keys, check if they map to any keys provided in the schema
+    1. If this is the case, traverse the schema until a leaf node (the JSON-LD specification calls
+       this an `identifying blank node`) is found
+    1. "Expand" the data, replacing keys' names with URIs to their more granular schema definitions
 
-To give a practical example, this is how our previously defined set of data would look like:
+Going along with the example, this is how our previously defined set of data would look after
+expansion:
 
 
 ```javascript
@@ -406,29 +409,25 @@ To give a practical example, this is how our previously defined set of data woul
 
 
 ```
 
-What we end up with is a much more verbose form of our set of data. In fact, the JSON-LD
-specification gives certain forms names. The form that is shown above is called _expanded_ form, as
-it was expanded using Person's schema.org definition defined in `@context`. The form in the example
-we gave earlier (where we defined `@context`) is called _compacted_ form.
+We end up with a much more verbose form of our set of data---what the JSON-LD specification calls
+*expanded* form, as the original object's been expanded with its `@context`. The original object's
+form, still with an `@context`, is defined by the specification as *compacted* form.
 
-So essentially, what happens is that the JSON-LD parser assumes we defined the correctly named keys
-for Person already, so when expanding the compacted version of our JSON-object, it just individually
-looks them up at `http://schema.org/Person` and if they're defined, replaces them with more detailed
-URIs to their schema definition. What we end up with is a automatically mapped set of data by
-simply using what is already out there. Since every key of a given value now points to a leaf node
-of a schema ontology and since leaf nodes are only allowed to define the most basic types (like
-string, boolean, number), the parser can then easily traverse the document and validate each
-occurrence of `@value`.
+In summary, the JSON-LD parser assumes we've defined the correctly named keys for a `Person` and
+uses `http://schema.org/Person` to individually replace each of our properties with their more
+detailed schema definition URIs. The result is an automatically mapped set of data that uses an
+already available schema. As every key of a given value now points to a leaf node on a schema
+ontology, and as leaf nodes are only allowed to define the most basic types, such as string,
+boolean, integer, etc, the parser can now easily traverse the document and validate each occurrence
+of `@value`.
 
 
 ##### Final thoughts
 
-With this example, we've just shown you the tip of the iceberg.
-
-JSON-LD has tremendous powers (Aliasing, Self-Referencing, Built-in types, Indexing, ...) to do all
-kinds of crazy things. Since this specification will make use of JSON-LD heavily, we encourage to
-learn more about JSON-LD before continuing reading this document. Useful links can be found in the
-**Sources** section below.
+This example is only the tip of the iceberg; JSON-LD has tremendous power (e.g. aliasing,
+self-referencing, built-in types, indexing, etc) and can do all sorts of crazy things. A short,
+code-level sketch of the expansion and compaction steps is given below.
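This sketch assumes the third-party `pyld` library (not otherwise used in this document) and the
generic `http://schema.org/` context; expansion requires network access to fetch the remote
`@context`:

```python
from pyld import jsonld

person = {
    "@context": "http://schema.org/",
    "@type": "Person",
    "givenName": "Andy",
    "familyName": "Warhol",
    "birthDate": "1928-08-06"
}

# Expansion replaces the short property names with full schema.org URIs by
# looking them up in the (remotely fetched) @context.
expanded = jsonld.expand(person)

# Compaction is the inverse operation: it folds the URIs back into short names
# under the supplied @context.
compacted = jsonld.compact(expanded, "http://schema.org/")

print(expanded)
print(compacted)
```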
Before +continuing, we encourage you to learn more about JSON-LD as the rest of this document will rely +heavily on it. Useful links can be found in the **Sources** section below. **Sources:** @@ -439,7 +438,7 @@ learn more about JSON-LD before continuing reading this document. Useful links c May 2016 -#### schema.org +#### Schema.org - TODOs in this section: - Just describing schema.org is I think way to narrow here. This section should be about linked @@ -448,20 +447,20 @@ learn more about JSON-LD before continuing reading this document. Useful links c (http://wiki.dbpedia.org/). Obviously mention schema.org as a prefered source though. -schema.org is a collaborative initiative with the mission to create, maintain and promote schemata +Schema.org is a collaborative initiative with the mission to create, maintain and promote schemata for structured data on the Internet. It's vocabulary is defined as an ontology, connecting different concepts using links. It can be used with different encodings, including RDFa, Microdata and -_JSON-LD_. +*JSON-LD*. ##### Available Schemas -Schema.org includes the following schemata that could be helpful in defining a digital intellectual -property specification based on LCC's EM/RRM: +Schema.org includes the following schemata that are closely related to LCC RRM's `Entity` types; +these will be used later to help define the COALA IP specification: - [schema.org/Person](http://schema.org/Person): See LCC RRM `Party` - [schema.org/Organization](http://schema.org/Organization): See LCC RRM `Party` (A `Person` can be - member of an `Organization`) + a member of an `Organization`) - [schema.org/CreativeWork](http://schema.org/CreativeWork): See LCC RRM `Creation` - [schema.org/Article](http://schema.org/Article) - [schema.org/Blog](http://schema.org/Blog) @@ -494,58 +493,49 @@ property specification based on LCC's EM/RRM: - [schema.org/Place](http://schema.org/Place): See LCC RRM `Place` -*A full list of all core schema.org schemata can be found [here](https://schema.org/docs/full.html).* - +In summary: -##### Extensibility of schema.org +- **What schema.org helps us with:** + - **LCC Party:** [schema.org/Organization](http://schema.org/Organization) and + [schema.org/Person](http://schema.org/Person) + - **LCC Creation:** [schema.org/CreativeWork](http://schema.org/CreativeWork) and all its + subschemata could be used + - **LCC Place:** [schema.org/Place](http://schema.org/Place) + - **LCC Assertion:** [schema.org/AssessAction](http://schema.org/AssessAction) +- **What schema.org *doesn't* help us with (yet?):** + - **LCC Right** + - **LCC RightsAssignment** + - **LCC RightsConflict** -As it is the goal of this specification to convert LCC's RRM to a linked data ontology, we need to -be able to model the seven main LCC RRM entities: Party, Creation, Place, Right, RightsAssignment, -Assertion, RightsConflict. While enumerating schema.org's schemata in the previous section, for -schemata that have similarities to the LCC's definition we already marked them. 
For clarity though,
-we're listing them in this section again to also highlight what schema.org **doesn't** provide yet:
+*A full list of all core schema.org schemata can be found [here](https://schema.org/docs/full.html).*
 
-**What schema.org helps us with:**
-
-- **LCC Party:** [schema.org/Organization](http://schema.org/Organization) and [schema.org/Person](http://schema.org/Person)
-- **LCC Creation:** [schema.org/CreativeWork](http://schema.org/CreativeWork) and all its
-  subschemata could be used
-- **LCC Place:** [schema.org/Place](http://schema.org/Place)
-- **LCC Assertion:** [schema.org/AssessAction](http://schema.org/AssessAction)
-
-
-**What schema.org _doesn't_ help us with (yet?):**
-
-- **LCC Right**
-- **LCC RightsAssignment**
-- **LCC RightsConflict**
+##### Extensibility of schema.org
 
-So even though schema.org already helps us by defining some of the LCC models some do not even exist
-at all (specifically Rights, RightsAssignment and RightsConflict). Although, this seems like a
-problem at first, it is not. schema.org's schemata are easily extensible. schema.org [even
-encourages](http://schema.org/docs/extension.html) subclassing their 'core' schemata into so called
-'hosted' and 'external' extensions. In general, there are three types of schemata on schema.org:
+Although some of the `Entity` types do not exist in schema.org yet (specifically Rights,
+RightsAssignment and RightsConflict), schema.org's schemata are easily extensible and we can create
+our own schemata to fit the needs of the LCC RRM. Schema.org [even encourages](http://schema.org/docs/extension.html)
+others to subclass their *core* schemata into so-called *hosted* and *external* extensions. In
+general, there are three types of schemata on schema.org:
 
-- **Core:** A basic vocabulary for describing the kind of entities the most common web applications
-  need
-- **Hosted:** Subclassed models from Core that have their own namespace (e.g.
-  http://health-lifesci.schema.org/) and were reviewed by the schema.org community. Hosted
-  extensions should be application-agnostic
-- **External:** Subclassed models from Core/Hosted that do have an application-specific namespace
-  (e.g. http://schema.bigchaindb.com). External extensions may be application-specific
+- **Core:** A basic vocabulary for describing the kind of entities most common web applications need
+- **Hosted:** Subclassed models from Core that have their own namespace on schema.org (e.g.
+  http://health-lifesci.schema.org/) and are reviewed by the schema.org community; should be
+  application-agnostic
+- **External:** Subclassed models from Core/Hosted that have an application-specific namespace (e.g.
+  http://schema.bigchaindb.com); may be application-specific
 
-Applied to the contents of this specification, this would mean that application-agnostic schemata
-(so everything contained in LCC RRM) would ideally become a Hosted extension, while
-application-specific schemata (data models that are specific for a certain applications or services)
-would become External schemata. Fortunately, using schema.org in this way it also complies with
-rule five and six of the ten LCC targets for the rights data network, which say:
+Applied to the contents of this specification, any application-agnostic schemata---including all of
+the LCC RRM---would ideally become a *hosted* extension, while application-specific schemata---data
+models that are specific to a particular application or service---would become *external* schemata.
+A sketch of what such an *external* extension might look like is given below.
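The `coala` namespace URL and the `coala:Right` type in this sketch are placeholder assumptions of
ours, not a published schema.org extension or part of this specification; the snippet only
illustrates how an application-specific namespace could be referenced from a JSON-LD document
alongside schema.org terms:

```python
# A JSON-LD document (expressed as a Python dict) that mixes schema.org terms
# with a hypothetical application-specific ("external") extension namespace.
right = {
    "@context": [
        "http://schema.org/",
        {"coala": "http://example.com/coalaip-schema#"}
    ],
    "@type": "coala:Right",  # expands to the full IRI of the custom type
    "license": "https://creativecommons.org/licenses/by-sa/4.0/",
    "validFrom": "2016-01-01"
}
```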
Fortunately, leveraging schema.org in this way maintains compliance with rules
+five and six of the LCC's "Ten Targets", which say:
 
 - Rule 5: Links between identifiers are system agnostic and need to be authorized by participating
   consortiums
-- Rule 6: Meta data is system agnostic and its schema has to be authorized by participating parties
+- Rule 6: Metadata is system agnostic and its schema has to be authorized by participating parties
   or consortiums
 

From fd47ee943d0a037b604570735a6af07aad334ac7 Mon Sep 17 00:00:00 2001
From: Brett Sun
Date: Mon, 18 Jul 2016 17:57:29 +0200
Subject: [PATCH 2/5] Edit IPLD and fingerprinting sections

---
 coala-ip/README.md | 317 ++++++++++++++++++++++-----------------------
 1 file changed, 156 insertions(+), 161 deletions(-)

diff --git a/coala-ip/README.md b/coala-ip/README.md
index 094e675..eccbe72 100644
--- a/coala-ip/README.md
+++ b/coala-ip/README.md
@@ -548,14 +548,14 @@ five and six of the LCC's "Ten Targets", which say:
 
 ### Interplanetary Linked Data
 
-This section describes the functionality of Interplanetary Linked Data (short form: IPLD) and why it
-is useful in working with immutable data stores and Linked Data.
+This section describes the functionality of Interplanetary Linked Data (IPLD) and its use in working
+with immutable data stores and Linked Data.
 
 
-#### Motivation to use IPLD
+#### Motivation for IPLD
 
-IPLD is an attempt to make Linked Data happen on immutable ledgers using hashes for linking (so
-called "merkle-links"). Let's assume we have the following set of data describing a person:
+IPLD is an attempt to bring Linked Data to immutable ledgers using hashes for linking (so
+called "merkle-links"). Let's assume we have the following set of data describing a person:
 
 
 ```javascript
@@ -579,143 +579,142 @@ In addition, we have a set of data describing this person's work:
 ```
 
 
-As of now, both objects are not linked to each other, meaning there is no way to tell that Andy
-Warhol is the author of "32 Campbell's Soup Cans". Now, we could use the already introduced Linked
-Data approach using JSON-LD. We'd have to make both of the objects resolvable within the Internet,
-add `@id`s to their bodies as well as an `author` property to the creation pointing to a location
-where the person object is resolvable.
+Note that in the above, neither object contains a link to the other; there is no way to tell that
+Andy Warhol is the author of "32 Campbell's Soup Cans." We could use the aforementioned JSON-LD to
+create a link between the objects by making both of the objects resolvable within the Internet,
+adding `@id`s to their bodies, and adding an `author` property to the creation that points
+to a resolvable location of the person object.
 
-The problem with this solution is though that we'd have to trust the hosts that make these objects
-resolvable. While they'd return the correct objects at first, they'd be free to make any changes to
-the objects at any point, potentially allowing for the exploitation of the system. Since there is no
-way for resolving actors to integrity-check the object they're requesting, a host could return
-arbitrary data unnoticed at any time. Additionally, internal linking within objects, as well as
-internal linking from URIs turns out to be challenging using Linked Data protocols like JSON-LD.
-Hence in the next section, we're exploring a technology called Interplanetary Linked Data that
-promises to solve these problems.
+However, the problem with this approach is that we have to trust the hosts that make these objects +resolvable. While hosts might return the correct objects at first, they're free to make any changes +to the objects later---potentially exploiting the system. Since there is no way for resolving actors +to integrity-check the object they're requesting, a host could return arbitrary data at any time +and go unnoticed. Additionally, internal linking within objects, as well as internal linking from +URIs is challenging using Linked Data protocols like JSON-LD. Hence, in the rest of this section, we +explore the features of IPLD that promise to solve these problems. #### IPLD by Example -The following sections give a brief overview of the functionality of IPLD. We choose to demonstrate -the technology by demonstration as a [comprehensible specification of IPLD](https://github.com/ipfs/specs/tree/master/ipld) -exists already. +The following sections give a brief overview of IPLD's functionality. For more information, visit +[IPLD's specification draft](https://github.com/ipfs/specs/tree/master/ipld). ##### Creation of Linked Objects -Using the two objects presented in the example of the previous section, we have to perform the -following steps to link them using IPLD: +Using the person and creation objects previously presented, we perform the following steps to link +them using IPLD: 1. Serialize the person's object to a canonical form of [Concise Binary Object Representation](http://cbor.io/) - (short form: CBOR) + (CBOR) -```python -In [1]: import ipld + ```python + In [1]: import ipld -In [2]: person = { -...: "givenName": "Andy", -...: "familyName": "Warhol", -...: "birthDate": "1928-08-06" -...: } + In [2]: person = { + ...: "givenName": "Andy", + ...: "familyName": "Warhol", + ...: "birthDate": "1928-08-06" + ...: } -In [3]: serialized_person = ipld.marshal(person) -Out[3]: b'\xa3ibirthDatej1928-08-06jfamilyNamefWarholigivenNamedAndy' -``` + In [3]: serialized_person = ipld.marshal(person) + Out[3]: b'\xa3ibirthDatej1928-08-06jfamilyNamefWarholigivenNamedAndy' + ``` -For the purpose of demonstration, we're using an already existent library called [py-ipld](https://github.com/bigchaindb/py-ipld). -In this case `ipld.marshal` does nothing more than serialize the `person` object using a [CBOR -reference implementation](https://bitbucket.org/bodhisnarkva/cbor). As a result, we get a byte -array. + For the purposes of demonstration, we use [py-ipld](https://github.com/bigchaindb/py-ipld), an + already existing python library, to handle IPLD specifics and data transformations. In this + case, `ipld.marshal` does nothing more than serialize the `person` object using a [CBOR + reference implementation](https://bitbucket.org/bodhisnarkva/cbor). As a result, we get a byte + array. -2. Hash the serialized byte array using [multihash](https://github.com/jbenet/multihash) and encode +1. Hash the serialized byte array using [multihash](https://github.com/jbenet/multihash) and encode the hash to base58 -```python -In [4]: ipld.multihash(serialized_person) -Out[4]: 'QmRinxtytQFizqBbcRfJ3i1ts617W8AA8xt53DsPGTfisC' -``` + ```python + In [4]: ipld.multihash(serialized_person) + Out[4]: 'QmRinxtytQFizqBbcRfJ3i1ts617W8AA8xt53DsPGTfisC' + ``` -[Multihash](https://github.com/jbenet/multihash) is a protocol for differentiating outputs from -various well-established cryptographic hash functions. 
What this means is that every hash generated -with multihash contains a [hexadecimal prefix](https://github.com/jbenet/multihash#table-for-multihash-v100-rc-semver), -symbolizing which hash function has been used for generating it. This is great since hash functions -will often need to be upgraded. Additionally, it allows for multiple hash functions to coexist -within/across applications. + [Multihash](https://github.com/jbenet/multihash) is a protocol for differentiating outputs from + various, well-established, cryptographic hash functions; it prefixes each hash generated with a + [hexadecimal prefix](https://github.com/jbenet/multihash#table-for-multihash-v100-rc-semver) + that symbolizes the hash function that was used. This offers users the ability to identify hash + functions in the future, in case of upgrades, and allows for multiple hash functions to coexist + within/across applications. -Now, since we have converted the person object to an IPLD object and since we also have its hash, we -can link the creation to its author. + Now that we have converted the person object to an IPLD object and also have its hash, we can + link it to the creation as its author. -3. Link the creation object to its creator using the base58 hash representation of the person +1. Link the creation object to its creator using the base58 hash representation of the person -```python -In [5]: creation = { - "name":"32 Campbell's Soup Cans", - "dateCreated": "01-01-1962", - "exampleOfWork": "https://en.wikipedia.org/wiki/Campbell%27s_Soup_Cans#/media/File:Campbells_Soup_Cans_MOMA.jpg", - "author": { "/": "QmRinxtytQFizqBbcRfJ3i1ts617W8AA8xt53DsPGTfisC" } -} -``` + ```python + In [5]: creation = { + "name":"32 Campbell's Soup Cans", + "dateCreated": "01-01-1962", + "exampleOfWork": "https://en.wikipedia.org/wiki/Campbell%27s_Soup_Cans#/media/File:Campbells_Soup_Cans_MOMA.jpg", + "author": { "/": "QmRinxtytQFizqBbcRfJ3i1ts617W8AA8xt53DsPGTfisC" } + } + ``` -Using what is called a "merkle-link" we're connecting the creation to a person by using the hash -we've previously created. Generally, a merkle-link can be schematized like this: + We've now connected the creation to a person by using the person's hash value (thereby creating + a "merkle-link"). Generally, merkle-links can be schematized like this: -```javascript -Property = { - "key": , - "value": -} + ```javascript + Property = { + "key": , + "value": + } -MerkleLink = { - "key": "/", - "value" -} -``` + MerkleLink = { + "key": "/", + "value" + } + ``` -To make this object resolvable as well, what's left to be done is to serialize it also to CBOR and -multihash it. + Finally, to make this creation object also be resolvable, we repeat the first two steps on it: -4. Serialize the creation object to a canonical form of CBOR +1. 
Serialize the creation object to a canonical form of CBOR -```python -In [6]: serialized_creation = ipld.marshal(creation) -Out[6]: b"\xa4fauthor\xd9\x01\x02x.QmRinxtytQFizqBbcRfJ3i1ts617W8AA8xt53DsPGTfisCkdateCreatedj01-01-1962mexampleOfWorkx]https://en.wikipedia.org/wiki/Campbell%27s_Soup_Cans#/media/File:Campbells_Soup_Cans_MOMA.jpgdnamew32 Campbell's Soup Cans" -``` + ```python + In [6]: serialized_creation = ipld.marshal(creation) + Out[6]: b"\xa4fauthor\xd9\x01\x02x.QmRinxtytQFizqBbcRfJ3i1ts617W8AA8xt53DsPGTfisCkdateCreatedj01-01-1962mexampleOfWorkx]https://en.wikipedia.org/wiki/Campbell%27s_Soup_Cans#/media/File:Campbells_Soup_Cans_MOMA.jpgdnamew32 Campbell's Soup Cans" + ``` -This case is special, in that the merkle-link contained in `creation` is being replaced and -serialized using an [unassigned CBOR tag (258)](https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml), -to make the link retrievable more easily when deserializing the object later on. + Note that this case is special, with the merkle-link contained in `creation` being replaced by + an [unassigned CBOR tag (258)](https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml) to + make the link more easily retrievable on deserialization. -5. Hash the resulting serialized byte array using [multihash](https://github.com/jbenet/multihash) -and encode the hash to base58 +1. Hash the resulting serialized byte array using [multihash](https://github.com/jbenet/multihash) + and encode the hash to base58 -```python -In [7]: ipld.multihash(serialized_creation) -Out[7]: 'QmfMLNLyJZgvSPkNMvsJspRby2oqP6hWZ8Nd2PvKLhudmK' -``` + ```python + In [7]: ipld.multihash(serialized_creation) + Out[7]: 'QmfMLNLyJZgvSPkNMvsJspRby2oqP6hWZ8Nd2PvKLhudmK' + ``` ##### Retrieval of Linked Objects -To further explore IPLD, let's assume that we've put these objects into some kind of database - -actually let's just pretend it's IPFS for now, their identifiers being the hashes we've created. -What this allows us to do now, is using so called "merkle-paths", to resolve any object within IPFS -using its hash, but also dereference all its connecting edges by following the objects merkle-links. -Given the example above, the `author` of the creation can be found using this merkle-path: +To further explore IPLD, let's assume we've put these objects into some kind of database---actually +let's pretend it's IPFS since their identifiers are the hashes we created previously. Now we can use +paths of merkle-links, so-called "merkle-paths", to resolve any object within IPFS using its hash +value and also further dereference any of the object's connecting edges by following its +merkle-links. Given the example above, the `author` of the creation could be found through this +merkle-path: ```python @@ -727,59 +726,57 @@ Out [8]: ``` -As can be seen, both merkle-links (meaning hashes of objects) as well as an objects' properties can -be used to traverse the IPLD object. For addressing a network address format called [multiaddr](https://github.com/jbenet/multiaddr) -is being used. This allows for the construction of protocol-overarching paths to resources. Meaning -that an IPLD object, resolvable on IPFS could point to an IPLD object resolvable within other -ledgers like BigchainDB, Bitcoin, etc... +Notice that IPLD resolves any merkle link, here the creation's author, with the actual object to +make traversing into a merkle link feel similar to regular property access in objects. 
To address +across network addresses, a format called [multiaddr](https://github.com/jbenet/multiaddr) can be +adopted that allows for the construction of resource paths to be protocol-overarching. Doing so +would allow an IPLD object to maintain resolvable links even if its merkle links were pointing to +separate ledgers (e.g. IPFS, BigchainDB, Bitcoin, etc). #### Evaluation of IPLD -In summary, IPLD is a promising new technology. This section aims to discuss both benefits and -caveats of it: - +In summary, IPLD is a promising new technology albeit with a few cavets. - **Benefits:** - - Cryptographic integrity-checking of data using upgradeable hash functions (multihash) - - Addressability of content instead of "by location" through hashes - - Inter-ledger/database resolvability of data through merkle-paths and multiaddr - - Unification of object identifiers through canonicalized hashing strategy - - Built-in immutability by using merkle-dag data structure - - Future-proof due to usage of future-proof underlying concepts (multi-x) + - Cryptographic integrity-checking of data using upgradable hash functions (multihash) + - Addressability of content through hashes instead of "by location" + - Inter-ledger/database resolvability of data (multiaddr) + - Unification of object identifiers through a canonicalized hashing strategy + - Built-in immutability by using a merkle-dag data structure + - Future-proof due to future-proof underlying concepts (multi-x) - Potentially wide compatibility, even down to the UNIX file system path - Lightweight protocol drafts and implementations - - Deserialization to multitude of other data serialization formats (YML, JSON, XML, ...) + - Deserializes to a multitude of other data serialization formats (YAML, JSON, XML, etc) - **Caveats:** - - Non-standardized protocols (multihash, multiaddr, ...) + - Non-standardized protocols (multi-x) - [Overlap](https://interledger.org/five-bells-condition/spec.html#crypto-conditions-type-registry) with other to-be-standardized protocols - Breakage with exisiting and well-established protocols (e.g. URI vs. multiaddr) - - Non-compliance with existing Linked Data ontology due to immutability + - Non-compliance with existing Linked Data ontologies due to immutability - Opinionated CBOR serialization #### Compatibility of IPLD and JSON-LD -Even though the naming and concept of IPLD was inspired by JSON-LD, the two concepts have disjoint -sets of functionality. This section highlights how much JSON-LD is possible with IPLD. +Although the naming and concept of IPLD was inspired by JSON-LD, the two have disjoint sets of +functionality. This section highlights the limitations imposed on JSON-LD by IPLD. #### Self-identifying JSON-LD objects -As mentioned in one of the previous sections about JSON-LD already, a JSON-LD object can have a -self-identifying link that is added to the object using the `@id` property. This means that a -JSON-LD object itself is able to express where it's located at. The like is true with IPLD. By -complying to a canonicalized representation of CBOR and by multihashing this representation, an IPLD -object itself is also able to express where it can be resolved within the Internet - it's simply -Content-Addressing -, even though it cannot express the location it is stored under. Trying to -combine these two concepts however, is not possible since the `@id` JSON-LD identifier would need to -be replaced with the multihash of its object (itself containing that hash). 
The amount of processing
-required to solve this task would be incredible, which would render the identification of objects
-incredibly inefficient economically speaking, which is why a JSON-LD object using IPLD links cannot
-self-identify using the `@id` property. As mentioned though, this is not a problem as objects
-generally identify themselves in the world of Content-Addressing.
-
+JSON-LD objects can maintain a self-identifying link using the `@id` property, which allows the
+object to directly express its location to others. The same is possible for IPLD objects; by
+complying with a canonicalized representation of CBOR and multihashing this representation, an IPLD
+object is also able to express where it can be resolved on the Internet---its hash value is simply
+its Content-Address---even though it cannot directly express the location it is stored under.
+However, combining these two concepts is practically impossible since the `@id` JSON-LD identifier
+would need to be replaced with the multihash of its object; one could view this as a cryptographic
+puzzle of sorts: finding a value that would be hashed, along with the object's other properties, to
+the same value. The amount of processing required would be incredible, rendering the identification
+of objects incredibly inefficient; instead, we would rather disallow JSON-LD objects with IPLD links
+from self-identifying using an `@id` property. As mentioned though, this is usually not a
+problem as objects can also identify themselves through Content-Addressing.
 
 **Sources:**
 
@@ -804,46 +801,44 @@ generally identify themselves in the world of Content-Addressing.
 
 ### Fingerprinting
 
-Specifying the originality and provenance of a physical thing is challenging. This is even more true
-for a digital thing. While a physical fake might easily be identified by, for example using
-chemical procedures, the same is not true for digital files. With ease they can be copied,
-mashup'ed, cropped and so on, meaning that by identifying one manifestation of a work and computing
-an identifier based on that single manifestation, the actual body of work cannot sufficiently
-defined.
-
-The LCC is in line with this fact. In their document about the Ten Targets for a Global Rights
-Network, they talk about cross-standard identifiers that can, if needed, be *transformed* into
-corresponding different identifiers.
+Specifying the originality and provenance of a physical object is challenging. This is even more
+true for digital objects. While a physical fake might be easily identified by, for example, chemical
+procedures, the same is not true for digital files which may be exact copies. Moreover, one can
+easily modify others' works so that even if one manifestation of a work is identifiable, the actual
+body of work behind the manifestation may not be sufficiently identifiable.
 
-This section discusses a similar idea, namely the existence of an arbitrary complex vector used to
-link all the identifiers of manifestations to the single identifier of a work on a global rights
+The LCC is aware of this fact. In their "Ten Targets" document, they talk about cross-standard
+identifiers that can, if needed, be *transformed* into alternative identifiers. This section
+discusses a similar idea: the existence of an arbitrarily complex vector that can be used to link
+all the alternative identifiers of a manifestation to their single identifier on a global rights
 registry.
In this sense, every function that takes a digital asset as an input and yields a fixed length value
-(be it number, string or float) as an output could potentially then be called a **Fingerprinting
-function**. In it's simplest form, a hash function inspecting the arrangement of bytes in a digital
-asset creating a unique digest. In more elaborate forms:
-
-- [Image-match](https://github.com/ascribe/image-match): a simple Python module for finding
-  approximate image matches from a corpus
-- [pHash](http://www.phash.org/): A hash derived from various features of a digital asset
-- [dejavu](https://github.com/worldveil/dejavu): A audio fingerprinting and recognition algorithm in
-  Python
+(be it a string, integer, float, etc) could potentially be called a **Fingerprinting function**. In
+its simplest form: a hash function that inspects the arrangement of bytes in a digital asset and
+returns an integer. In more elaborate forms:
+
+- [Image-match](https://github.com/ascribe/image-match): An approximate image match algorithm
+  implemented in Python
+- [pHash](http://www.phash.org/): A hashing method using various features of a digital asset
+- [dejavu](https://github.com/worldveil/dejavu): An audio fingerprinting and recognition algorithm
+  implemented in Python
 
 - TODO: List more libraries
     - Find popular ones that do fingerprinting for all kinds of media types
 
-While a manifestation of a digital creation initially might only have a single fingerprint which was
-generated by feeding its bytes to an arbitrary hashing function, more elaborate fingerprinting
-technologies could help identifying the usage of creations within the Internet automatically. Paired
-with Linked Data, this would allow for storing all that information about the usages and
-manifestations of a work in an arbitrarily complex graph. Copies, mash-ups and modified versions of
-the asset could automatically be identified, paths easily located to the original manifestation of
-the work and the author be fairly attributed and compensated.
+While a manifestation of a digital creation may initially only have a single fingerprint generated
+by an arbitrary hashing function, more elaborate fingerprinting schemes could later be used to help
+automatically identify the creation's usage across the Internet. Paired with Linked Data, this would
+allow one to store and track all the information about a work's usages and manifestations in an
+arbitrarily complex graph. Copies, mash-ups, and modified versions of the asset could then be
+identified automatically as paths in the graph; traversing through these paths would yield the
+original manifestations and open up the possibility of fairly attributing and compensating the
+creators.
 
-As rightsholder information would be ubiquitously accessible, allowing rights users to acquire
-rights, involved players in the system would be incentivized to create more elaborate fingerprinting
-mechanisms, increasing transparency in the system further.
+Over time, as rightsholder information becomes ubiquitously accessible and rights users are allowed
+to acquire rights, involved players in the system would be incentivized to create more and more
+elaborate fingerprinting mechanisms to further increase transparency in the system. A minimal
+sketch of the simplest kind of fingerprint, a plain hash over a file's bytes, is given below.
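The sketch assumes Python's standard `hashlib`; the function name and the file name are illustrative
only, and a perceptual fingerprint (e.g. one of the libraries listed above) would replace the plain
hash for more elaborate matching:

```python
import hashlib


def simple_fingerprint(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's bytes, the simplest possible fingerprint."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Exact copies of an asset produce the same fingerprint, while even a one-byte
# modification produces a completely different one; recognizing derived works
# therefore requires the more elaborate, perceptual approaches mentioned above.
print(simple_fingerprint("32-campbells-soup-cans.jpg"))
```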
### The Interledger Protocol @@ -1077,7 +1072,7 @@ First off, lets look at some requirements various involved parties have given: (short form: IRI), either absolute or relative -**LCC's ten targets for the rights data network:** +**LCC's Ten Targets for the rights data network:** - A Party's identifier should be represented as an [International Standard Name Identifier](http://www.iso.org/iso/catalogue_detail?csnumber=44292) (short form: ISNI) linking to @@ -1176,7 +1171,7 @@ property isn't necessary: As lots of the users' data will be saved on public ledgers - meaning the user is required to sign -the meta data they're submitting - we'll need to make sure to map their cryptographical identity to +the metadata they're submitting - we'll need to make sure to map their cryptographical identity to their registered identity. Luckily, the [Friend of a Friend Project](http://www.foaf-project.org/) has us covered already by providing [an RDF ontology for signing RDF documents using Public Key Cryptography](http://xmlns.com/wot/0.1/), called Web of Trust RDF. Integrating this ontology into @@ -1195,7 +1190,7 @@ studying the LCC RRM document, it becomes clear that theoretically these require fulfilled, as there could be use cases where: - multiple Parties share a relationship (e.g. Party A and Party B created Creation C) -- Parties might provide Places as a meta data (think: their home location, a contact place or a +- Parties might provide Places as a metadata (think: their home location, a contact place or a billing address) - multiple Parties may be bundled to an Organization @@ -1604,7 +1599,7 @@ looking at schemata for asset transfers. As this specification's scope is to mak manageable on an immutable ledger though, we're not interested in defining this type of schema, as we think an immutable ledger must have this operation already built in from the start. This specification's goal is to be able to run on as many ledgers as possible. Both IPLD and the -Interledger Protocol were chosen consciously, to establish a meta data and licensing ontology that +Interledger Protocol were chosen consciously, to establish a metadata and licensing ontology that can potentially overspan many ledgers and immutable data stores. General requirements for a ledger's transactions are: @@ -1750,7 +1745,7 @@ What we'd end up with is the following: As changing any of the objects values of key would provoke a change in the object's IPLD hash and since mutating data is not possible anyways, we could also simply point the assertion to the object -itself. However then, we'd probably lose valuable meta data and it would be difficult to find out +itself. However then, we'd probably lose valuable metadata and it would be difficult to find out why an object was flagged by an asserter. From 388ac111eef68efdeed35b95b075c6def054c857 Mon Sep 17 00:00:00 2001 From: Brett Sun Date: Wed, 20 Jul 2016 14:51:22 +0200 Subject: [PATCH 3/5] Edit transformation introduction and Place transformation sections --- coala-ip/README.md | 108 +++++++++++++++++++++++++-------------------- 1 file changed, 60 insertions(+), 48 deletions(-) diff --git a/coala-ip/README.md b/coala-ip/README.md index eccbe72..f4f8b52 100644 --- a/coala-ip/README.md +++ b/coala-ip/README.md @@ -91,7 +91,7 @@ The implementation of COALA IP's vision could be distinguished into three major digitally handling licensing of intellectual property on immutable ledgers 1. 
Building a community to find and define a minimally-viable set of data for licensing intellectual property -1. Defining a free and open messaging/communication protocol for license-transactions +1. Defining a free and open messaging/communication protocol for licensing transactions ## Introduction @@ -125,8 +125,8 @@ which is composed of the following ten goals: 1. Every Party has a unique global identifier 2. Every Creation has a unique global identifier 3. Every Right has a unique global identifier -4. All identifiers have a [URI](https://www.w3.org/Addressing/URL/uri-spec.html) representation to persistently and predictably resolve them within the - Internet +4. All identifiers have a [URI](https://www.w3.org/Addressing/URL/uri-spec.html) representation to + persistently and predictably resolve them within the Internet 5. Links between identifiers are system agnostic and need to be authorized by participating consortiums 6. Metadata is system agnostic and its schema has to be authorized by participating parties or @@ -259,9 +259,9 @@ Description Framework (RDF), is briefly described in the following section. #### The Resource Description Framework [Resource Description Framework](https://www.w3.org/TR/rdf11-concepts/) (RDF) is a framework for -describing entities on the Web. Since it uses the Universal Resource Identifier (URI), a -generalization of the Universal Resource Location (URL), as a scheme to address resources, it is -exceptionally interoperable and extensible. +describing entities on the Web. Since it uses the [Universal Resource Identifier](https://tools.ietf.org/html/rfc1630) +(URI), a generalization of the Universal Resource Location (URL), as a scheme to address resources, +it is exceptionally interoperable and extensible. RDF's core data structure is a graph-based data model that uses sets of triplets, each consisting of a **subject**, **predicate** and an **object**, to construct subsets of the graph. In its smallest @@ -422,7 +422,7 @@ boolean, integer, etc, the parser can now easily traverse the document and valid of `@value`. -##### Final thoughts +##### Final Thoughts This example is only just the tip of the iceberg; JSON-LD has tremendous power (e.g. aliasing, self-referencing, built-in types, indexing, etc) and can do all sorts of crazy things. Before @@ -511,7 +511,7 @@ In summary: *A full list of all core schema.org schemata can be found [here](https://schema.org/docs/full.html).* -##### Extensibility of schema.org +##### Extensibility of Schema.org Although some of the `Entity` types do not exist in schema.org yet (specifically Rights, RightsAssignment and RightsConflict), their schemata are easily extendible and we can create our own @@ -763,7 +763,7 @@ Although the naming and concept of IPLD was inspired by JSON-LD, the two have di functionality. This section highlights the limitations imposed on JSON-LD by IPLD. -#### Self-identifying JSON-LD objects +#### Self-identifying JSON-LD Objects JSON-LD objects can maintain a self-identifying link using the `@id` property, which allows the object to directly express its location to others. The same is possible for IPLD objects; by @@ -849,48 +849,58 @@ elaborate fingerprinting mechanisms to further increase transparency in the syst - Basic same formalities as in all the sections before apply. -## Remodeling the LCC RRM using Linked Data +## COALA IP: Remodeling the LCC RRM with Linked Data -In this section we describe how LCC's Rights Reference Model can be modeled using JSON-LD, IPLD and -schema.org. 
In other words, we'll go over each model description given in the LCC Rights Reference
-Model document and discuss how the respective model can be translated into Linked Data.
+In this section we define and discuss guidelines for transforming each `Entity` type from the LCC's
+Rights Reference Model to Linked Data by using JSON-LD, IPLD and schema.org. This will form the
+basis for the COALA IP communication protocol.
 
+### What Linked Data Gives Us Out of the Box
+
+As a building block of RRM, the LCC first defines a generic, linkable Entity Model whose entities
+can be composed together to create an extensible data model for intellectual property. However, by
+using an RDF-based data structure, we can skip the transformation of these basic entities as RDF
+already provides us with a base data structure for linking entities.
 
 ### General Approach
 
-The section abstractly describes how to get from a LCC RRM model to a RDF-compatible JSON-LD/IPLD
-model. As mentioned earlier already, with their document "[LCC: Entity Model](http://doi.org/10.1000/285)",
-they defined a generic model to base their actual Rights Reference Model on. What this document in
-essence describes, is how to implement a data model that is fully extensible using a multitude of
-linked entities. Using an RDF-based data structure in turn, means that defining a base data
-structure for linking entities is not necessary anymore, as this is what RDF is all about already.
 
 To successfully redefine the LCC's Rights Reference Model, the following steps are required:
 
-- Identify RDF schemata that map to respective entities defined in the LCC RRM specification
-    - If appropriate RDF schemata are not available:
-        - Compose own RDF types from multiple RDF schemata
-        - Define own RDF schemata
-- Define how entities are identified and resolved
+- Identify RDF schemata that can be mapped to an LCC RRM `Entity` type
+    - If no appropriate RDF schemata exist, either:
+        - Compose our own RDF types from multiple, existing RDF schemata, or
+        - Define our own specialized RDF schemata
+- Define how entities can be identified and resolved
 - Resolve mismatches between the LCC RRM lingo and RDF schemata
 
-### The LCC Place Model
+A slight speed bump in the transformation process is ensuring support for links between entities;
+while the RRM defines the existence of links in a generic manner, e.g. as one-to-many (i.e. `0 - n`)
+links, RDF and Linked Data require these links to be explicitly named so as to express specific
+facts within their ontologies. A case in point is how schema.org's schemata often include a finite
+set of links that can be mapped to the RRM's links but cannot directly support the possibly
+infinite number of such links required by the RRM. However, we can overcome this issue with a bit of
+effort by extending the base JSON-LD schemata, or its underlying RDF implementation; such extensions
+could be hosted by schema.org as a *hosted* extension, or by others as *external* extensions.
+
 
-In the LCC Framework, a Place describes a localizable or virtual place. It has the following
+### The LCC Place `Entity`
+
+In the LCC RRM, a Place describes a localizable or virtual place.
It contains the following property: -- **PlaceType:** Defining the type of a Place - - `lcc:LocalizablePlace`: A Place in the universe that can be described using spatial +- **PlaceType:** Defines the type of a Place; is one of: + - `lcc:LocalizablePlace`: A Place in the physical universe that can be located by spatial coordinates - - `lcc:VirtualPlace`: A non-localizable Place at which a resource may be located under + - `lcc:VirtualPlace`: A non-localizable Place at which a resource may be located at -In addition, a Place can have the following outgoing reference to respective other entities: +In addition, a Place can have the following outgoing links to other entities: -- a self-referencing link (one-to-many) +- Links to other Places (`0 - n`; one-to-many): *RelatedPlace* -Visualized the LCC RRM Place looks like this: +Visualized, the RRM Place looks like: ![](media/lccrrmplace.png) @@ -898,26 +908,26 @@ Visualized the LCC RRM Place looks like this: #### Proposed Transformation -Compared to schema.org's definition of a Place, the LCC RRM Place both describes a physical as well -as a virtual Place. In this specification though, we need to separate the two concepts explicitly -upfront, to avoid confusions further in the transformation process. Neither a URI, nor a IPLD -merkle-link is able to represent a physical location which is why in the context of this -specification, they're links pointing to resources while the LCC Place model will solely be used to -point to a physical place. +Differing from schema.org's definition of a Place (a physical location), RRM's Place is able to +describe both physical as well as virtual Places. However, to avoid confusion later in the +transformation process, we explicitly separate these two concepts upfront. As neither URIs nor IPLD +merkle-links are able to represent physical locations, we use them solely as links pointing to +resources to let RRM Places unambiguously point to physical places. -For further reference, a: +In concrete terms, this means that an RRM Place with `PlaceType == lcc:LocalizablePlace` will be +transformed into an RDF representation, while an RRM Place with `PlaceType == lcc:VirtualPlace` will +be represented as a URI or IPLD hash that points to a dataset. -- **LLC RRM Place or Place** will be used to describe a localizable Place, meaning a Place in the - universe that can be described using spatial coordinates -- **Universal Resource Identifier** or **IPLD merkle-link** will be used to describe a virtual place - at which a resource may be located under +For further reference, we will use a: +- **RRM Place, modelled as a schema.org Place** to describe a *localizable* Place, i.e. a Place + in the physical universe that can be located by spatial coordinates +- **URI** or **IPLD merkle-link** to describe a *virtual* place at which a resource may be located + at -This implies that a LCC RRM Place of `PlaceType == lcc:LocalizablePlace` will be transformed to a -RDF Place, while a LCC RRM Place of `PlaceType == lcc:VirtualPlace` will just be URIs or hashes in -documents, linking in between data sets. 
-Using schema.org's Place, a transformation is straight forward (example taken from schema.org): +With schema.org's Place, the transformation of a *localizable* Place to RDF is straight forward +(example adapted from schema.org): ```javascript @@ -944,7 +954,9 @@ Using schema.org's Place, a transformation is straight forward (example taken fr } ``` -Using the special `containsPlace` property, self-referencing links to other Places are possible. +To support links to other Places, one can use either of the two already-defined properties on a +schema.org Place: `containsPlace` or `containedInPlace`, or extend the schema with their own +properties. ### The LCC Party Model From 1a438c9f25091b22be46932384f53cf00b772b40 Mon Sep 17 00:00:00 2001 From: Brett Sun Date: Wed, 20 Jul 2016 14:56:25 +0200 Subject: [PATCH 4/5] Edit Party transformation section --- coala-ip/README.md | 306 +++++++++++++++++++-------------------------- 1 file changed, 130 insertions(+), 176 deletions(-) diff --git a/coala-ip/README.md b/coala-ip/README.md index f4f8b52..aa7185e 100644 --- a/coala-ip/README.md +++ b/coala-ip/README.md @@ -959,61 +959,54 @@ schema.org Place: `containsPlace` or `containedInPlace`, or extend the schema wi properties. -### The LCC Party Model +### The LCC Party `Entity` -A recommendation of the LCC is that the Party model must be able to represent any class of party, -meaning: +The LCC recommends that a Party should be able to represent any of the following classes of parties: -- a rightsholder -- a licensor -- a administrator -- a user -- or any other participant doing something related to a right. +- Rightsholders +- Licensors +- Administrators +- Users +- Or any other participants related to rights -In the LCC RRM document a party has to have the following properties: +RRM Parties must have the following properties: -- **PartyType:** Defines whether the party is an individual or a group of individuals -- **DateOfBirth:** Only if `PartyType == 'lcc:Individual'` -- **DateOfDeath:** Only if `PartyType == 'lcc:Individual'` +- **PartyType:** Defines if the Party is an individual or a group of individuals +- **DateOfBirth:** Party's date of birth; only if `PartyType == 'lcc:Individual'` +- **DateOfDeath:** Party's date of death; only if `PartyType == 'lcc:Individual'` -Additionally, a Party can have the following outgoing references to respective other entities: +Additionally, a Party can have the following outgoing links to other entities: -- a self-referencing link (one-to-many relationship) -- a link to a Place (one-to-many relationship) +- Links to other Parties (`0 - n`; one-to-many): *RelatedParty* +- Links to Places (`0 - n`; one-to-many): *RelatedPlace* -Visualized the LCC Party model looks like this: +Visualized, the RRM Party looks like: ![](media/lccrrmparty.png) -Another feature of the LCC RRM Party model is that it must only have a `DateOfBirth` and a -`DateOfDeath` when its `PartyType` is `lcc:Individual`. Other features are that it may have -self-referencing links as well as links to LCC RRM Places. Note that using the property `PartyType` -an LCC RRM Party can both represent an individual as well as an organization. - - #### Proposed Transformation -*Side note: In this chapter, we describe the transformation of the LCC RRM Party model to a -JSON-LD/IPLD Person and/or Organization very literally, as we want to provide reasoning for -individual steps of the transformation. 
This will just be the case for this chapter, as in essence
-the rationale for transforming other models is fairly similar.*
+*Note: We will describe the transformation of an RRM Party into a JSON-LD/IPLD Person and
+Organization very literally, so as to provide reasoning for the individual steps taken in the
+transformation. This will only be the case for this `Entity`, as the rationale for transforming
+later `Entity` types will be fairly similar.*

-schema.org defines already both a [schema.org/Person](http://schema.org/Person) as well as a
-[schema.org/Organization](http://schema.org/Organization). Hence, there is no need to define both
-concepts as a single model and differentiate using `PartyType`. To value Separation of Concerns,
-lets first transform the LCC RRM Party model with `PartyType == 'lcc:Individual'`, then apply the
-discovered results to `PartyType == 'lcc:Organization'`.
+Schema.org makes both a [Person](http://schema.org/Person) and an [Organization](http://schema.org/Organization)
+available; hence, there is no need to define either concept as a single model differentiated by
+`PartyType`. To keep the transformation of the `Entity` into an RDF schema simple, let us first
+transform an RRM Party with `PartyType == 'lcc:Individual'` and then apply the learnings to an RRM
+Party with `PartyType == 'lcc:Organization'`.


-##### Transform LCC RRM Party to RDF Person
+##### Transformation of RRM Party to an RDF Person

-Using the minimum number of properties the LCC RRM describes, a `PartyType == 'lcc:Individual'` LCC
-RRM Party in JSON-LD/IPLD using schema.org's Person could look like this:
+Using the minimum number of properties described in the RRM, an RRM Party with `PartyType ==
+'lcc:Individual'` could look like this as a schema.org Person:


 ```javascript

@@ -1025,28 +1018,27 @@ RRM Party in JSON-LD/IPLD using schema.org's Person could look like this:
   },
   "@type": "http://schema.org/Person",
   "DateOfBirth": "1928-08-06",
-  "DateOfDeath": "1987-02-22",
+  "DateOfDeath": "1987-02-22"
 }

 // In IPLD
 {
   "@context": {
     "DateOfBirth": { "/": "" },
-    "DateOfDeath": { "/": "" },
+    "DateOfDeath": { "/": "" }
   },
   "@type": { "/": "" },
   "DateOfBirth": "1928-08-06",
-  "DateOfDeath": "1987-02-22",
+  "DateOfDeath": "1987-02-22"
 }

 ```

-Now, obviously mapping `birthDate` and `deathDate` of schema.org's Person to LCC's `DayOfBirth` and
-`DayOfDeath` doesn't make a lot of sense. Neither do they comply with the way JSON is usually
-formated (e.g. first letter is lower case), nor is it necessary to reinvent the wheel on top of
-schema.org (by for example coming up with new names for properties). So for simplicity purposes we
-simply get rid of the so called JSON-LD-'Aliasing', using the properties schema.org provides us
-with:
+While there's nothing technically wrong with the above, you may notice on a close inspection of
+schema.org/Person that the schema already contains the `birthDate` and `deathDate` properties.
+Rather than reinvent the wheel and remap `DateOfBirth` and `DateOfDeath` to these properties, we can
+remove the aliasing and instead use the properties directly defined on schema.org/Person. 
This gets +us: ```javascript @@ -1055,7 +1047,7 @@ with: "@type": "http://schema.org/Person", "@id": "https://en.wikipedia.org/wiki/Andy_Warhol", "birthDate": "1928-08-06", - "deathDate": "1987-02-22", + "deathDate": "1987-02-22" } // In IPLD @@ -1063,83 +1055,65 @@ with: "@type": { "/": "" }, "@id": "https://en.wikipedia.org/wiki/Andy_Warhol", "birthDate": "1928-08-06", - "deathDate": "1987-02-22", + "deathDate": "1987-02-22" } ``` -In the example, we used Andy Warhol's Wikipedia page as his Party identifier (`@id`). Considering -that fact that all we need to provide is a resolvable URI or an IPLD merkle-link, a JSON-LD parser -will validate this without complaining. Ideally though, `@id` is pointing to a location of the data -itself, showing a JSON-LD parser where its resolvable within the Internet. Since -`https://en.wikipedia.org/wiki/Andy_Warhol`, does not return the given data, we'll have to do -something about this. - -First off, lets look at some requirements various involved parties have given: - - -**JSON-LD:** - -- an `@id`'s value must be represented as an [Internationalized Resource Identifier](https://tools.ietf.org/html/rfc3987) - (short form: IRI), either absolute or relative - - -**LCC's Ten Targets for the rights data network:** - -- A Party's identifier should be represented as an [International Standard Name - Identifier](http://www.iso.org/iso/catalogue_detail?csnumber=44292) (short form: ISNI) linking to - the [International Standard Name Hub](http://www.isni.org) -- A Party's identifier should have an [Universal Resource Identifier](https://tools.ietf.org/html/rfc1630) - representation, so that it can be resolved predictably and persistently within the Internet - - -**LCC's Principles of identification:** - -- A Party should have at least one persistent unique public identifier that is both human- and - machine-readable -- Has a Party multiple public identifiers, then there should be a way that enables one identifier to - be automatically 'translated' to another -- A Party's identifier may have multiple designations (e.g. 
ISBN-10, ISBN-13, ISBN-A)
-- A Party's identifier should have an [Universal Resource Identifier](https://tools.ietf.org/html/rfc1630)
-  representation
-- A Party identifier's characters or elements have no intended meaning that could lead to
-  misinterpretation by humans
-- A Party identifier's characters or elements include no information about the Party itself or its
-  registration date
-- **TODO: There are even more requirements in this document that should be listed here!**
-
-
-**Interplanetary Linked Data:**
-
-- Any object is addressable using its [multihashed](https://github.com/jbenet/multihash) value
-  (encoded in base58)
-  - By using multihash (and later "multibase" - prefixing base-encoding functions) different hash
-    functions can interoperate and stay upgradeable
-
-
-As we're proposing this practical specification based on the LCC Framework, JSON-LD and IPLD with
-the background of saving all linked entity data on public ledgers (read: "Blockchains" or
-"Registries"), we'd like to add our own set of requirements:
-
-- Elements of the Party's identifier may represent the public part of an asymmetric cryptographic
-  key pair
-  - If so, the public key should be represented using a unified way of encoding (as inspiration
-    see [Bitcoin Address public key
-encoding](https://en.bitcoin.it/wiki/Technical_background_of_version_1_Bitcoin_addresses)
-- A Party must only allowed to be issued when providing at least one valid crypto-key pair
-
-
-As the combination of these requirements do not exist as a coherent system yet, we'll just pretend
-for the sake of completeness that there is in fact a system that fulfills them all. Hence for all
-following examples, we'll use an imaginary identity service that acts as a registry for the LCC RRM
-Party data. It:
-
-- lets users issue an identity that can be resolved using JSON-LD ([Content
-  Negotiation](https://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html)) or IPLD
-- lets users attach the public part of their key pairs to their identity.
-
-
-Notable services for this type of use case could be:
+In the example, we've used Andy Warhol's Wikipedia page as his Party identifier (`@id`). As an `@id`
+value is only required to be a resolvable URI or IPLD merkle-link, a JSON-LD parser would validate
+this without complaining; however, `@id` would ideally point to the location of the data itself to
+show the JSON-LD parser how it could be resolved within the Internet. Unfortunately, Wikipedia
+doesn't support this and `https://en.wikipedia.org/wiki/Andy_Warhol` doesn't return the required
+data so we'll have to implement this ourselves.
+
+To start off with, let's look at some limitations and requirements derived from the RRM and JSON-LD /
+IPLD:
+
+
+- **LCC's Ten Targets:**
+  - A Party's identifier should be linked to the [International Standard Name Identifier](http://www.iso.org/iso/catalogue_detail?csnumber=44292)
+    (ISNI) hub
+  - A Party's identifier should have a URI representation, so that it can be resolved predictably
+    and persistently within the Internet
+- **LCC's Principles of identification:**
+  - A Party should have at least one persistent unique public identifier that is both human- and
+    machine-readable
+  - If a Party has multiple public identifiers, then there should be a way of automatically
+    transforming one identifier to another
+  - A Party's identifier can have multiple designations (e.g. 
ISBN-10, ISBN-13, ISBN-A, etc)
+  - A Party's identifier should have a URI representation
+  - A Party's identifier should not have any intended meaning that could be misinterpreted by
+    humans
+  - A Party's identifier should not include any information about the Party itself or its
+    registration date
+  - **TODO: There are even more requirements in this document that should be listed here!**
+- **JSON-LD:**
+  - An `@id` value must be represented as an absolute or relative
+    [Internationalized Resource Identifier](https://tools.ietf.org/html/rfc3987) (IRI)
+- **IPLD:**
+  - Any object must be addressable using its [multihashed](https://github.com/jbenet/multihash)
+    value (encoded in base58)
+    - Multihash allows different hash functions to interoperate and stay upgradeable
+- And finally, our own requirements to allow for any linked entity data to be put on public ledgers
+  (e.g. blockchains or registries):
+  - Elements of the Party's identifier can represent the public part of an asymmetric cryptographic
+    key-pair
+    - If so, the public key should be represented by a unified encoding method; as inspiration,
+      see [Bitcoin's public key addressing](https://en.bitcoin.it/wiki/Technical_background_of_version_1_Bitcoin_addresses)
+  - A Party can only be created when at least one valid cryptographic key-pair is provided
+
+Although there is no system currently available that is able to fulfill all of these requirements
+and become a registry for RRM Party data, let's pretend, for the sake of completeness, that we have
+access to such an identity service in the following examples (preferably, in the future, a
+decentralized not-for-profit service is chosen). It lets users:
+
+- Issue an identity that can be resolved using JSON-LD ([Content Negotiation](https://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html))
+  or IPLD, and
+- Attach the public part of their key pairs to their identity.
+
+
+Notable, currently-existing services that could be extended to support our use case include:

 - https://pgp.mit.edu/
 - https://keybase.io/
@@ -1147,10 +1121,9 @@ Notable services for this type of use case could be:
 - https://ipdb.foundation/


-Preferably, a decentralized, non-profit service is chosen. Going back to the previous example and
-following the Linked Data JSON-LD approach would mean that we'd have to replace the `@id` of the
-data set representing the Party of Andy Warhol ideally with an URI pointing to the data set itself -
-being stored on an identity service:
+Now equipped with this identity service, we can go back to the example's JSON-LD representation and
+replace its `@id` value with a URI pointing to the dataset (the dataset itself living on the
+identity service):


 ```javascript

@@ -1166,8 +1139,9 @@ being stored on an identity service:
 }

 ```

-Using IPLD in contrast, any object can identify itself by being hashed, which means an `@id`
-property isn't necessary:
+On IPLD, however, we remove the `@id` property; not only are we restricted from using
+self-referencing links in IPLD, but such links are actually unnecessary as any object is able to
+identify itself by its own hash. Thus, we get:


 ```javascript

@@ -1182,68 +1156,33 @@ property isn't necessary:
 }

 ```

-As lots of the users' data will be saved on public ledgers - meaning the user is required to sign
-the metadata they're submitting - we'll need to make sure to map their cryptographical identity to
-their registered identity. 
Luckily, the [Friend of a Friend Project](http://www.foaf-project.org/) -has us covered already by providing [an RDF ontology for signing RDF documents using Public Key -Cryptography](http://xmlns.com/wot/0.1/), called Web of Trust RDF. Integrating this ontology into -our identity model, it could look like this: - -- TODO: - - Give an code example how WOT could look like in an immutable ledger - - Make sure that the *immutability* is not violated, the WOT ontology as of now only work with - mutability - - -Two other requirements we yet need to resolve are the links proposed in the LCC RRM Party model. As -mentioned previously, there can be a one-to-many relationship from a LCC RRM Party to other LCC RRM -Parties as well as a one-to-many relationship between a LCC RRM Party and LCC RRM Places. Now, when -studying the LCC RRM document, it becomes clear that theoretically these requirements need to be -fulfilled, as there could be use cases where: - -- multiple Parties share a relationship (e.g. Party A and Party B created Creation C) -- Parties might provide Places as a metadata (think: their home location, a contact place or a - billing address) -- multiple Parties may be bundled to an Organization +Finally, to complete the transformation, we need to include support for the possible outgoing links +of an RRM Party: links to other Parties (*RelatedParty*) and links to Places (*RelatedPlace*). To +give some context, a few potential use cases for these links include: +- Multiple Parties sharing a relationship (e.g. Party A and Party B created Creation C) +- Parties providing Places as part of their metadata (e.g. home location, contact place, or billing + address) +- Multiple Parties being bundled together as an Organization -Hence, in this context, it makes sense to define these relationships as a one-to-many relationship. -Using RDF and JSON-LD however, explicitly defining a one-to-many relationship doesn't work out. -Links need to be named and usually express a very specific logical fact in the ontology. While it -means that theoretically these relationships would still be possible, usually in JSON-LD they're -extended as needed, adjusting either the JSON-LD object itself or its underlying RDF implementation. -To give some examples: say we want to specify a Person's home addresses. What we can do in this case -is just use [schema.org/Person](http://schema.org/Person)'s `homeLocation` and specify either -[schema.org/Place](http://schema.org/Place)s or [schema.org/ContactPoint](http://schema.org/ContactPoint)s -and express the link using a named property on Person that points to a location or hash where that -object can be resolved. As another example, imagine we'd like to specify that Party A is a parent -of Party B. This could be useful when trying to express that usage rights of a Creation are -transferred after the death of Party B, to the heir, Party A. Fortunately, schema.org's Person has -us covered again. A [schema.org/Person](http://schema.org/Person) has a property `parent` (accepting -values of type `Person) that maps perfectly. +A few linking possibilities are already covered by schema.org, such as a Person's home address +(schema.org/Person's `homeLocation`; specifying a Place) or parents (schema.org/Person's `parent`; +specifying a Party). If we wanted to define relations that schema.org hadn't already provided, we +could also extend schema.org/Person with our own RDF schema. -Now, assume we wanted to define a relation, schema.org's Person doesn't provide us with yet. 
What we
-would need to do is extend schema.org's Person defining our own RDF schema and then host it
-somewhere resolveable within the Internet (IPFS, BigchainDB, a self-hosted webserver, ...).

-Since we've now covered all the edge cases of converting a Person to a RDF schema identity, we can
-now proceed transforming the `PartyType === lcc:Organization`.

+##### Transformation of RRM Party to an RDF Organization

-##### Transform LCC RRM Party to RDF Organization
-
-In essence, a LCC RRM Party of `PartyType == lcc:Organization` is a group of individuals represented
-by a single entity. Using the minimum number of properties the LCC RRM document describes, a LCC
-RRM Party of `PartyType == lcc:Organization` in JSON-LD using schema.org's Organization could look
-like this:
+An RRM Party with `PartyType == lcc:Organization` describes a single entity representing a group of
+individuals. Using the minimum number of properties listed in the RRM, the `Entity` type could look
+like this as a schema.org Organization:


 ```javascript

 // In JSON-LD
 {
-    "@context": "http://schema.org/",
-    "@type": "Organization",
+    "@type": "http://schema.org/Organization",
     "@id": "http://identityservice.com/organizations/w3c",
     "name": "World Wide Web Consortium",
     "founder": {
@@ -1274,9 +1213,24 @@ does the organization just bundle members that act like they were in an organiza
 independently?


-### The LCC Creation Model
+##### Allowing Parties to Sign Metadata
+
+As we envision future identity registries to be built on top of public ledgers, we need to ensure
+that users are able to include a cryptographic identity with any registered identities, allowing
+them to sign any submitted metadata. Luckily, the [Friend of a Friend Project](http://www.foaf-project.org/)
+has already thought about this and provides the [Web of Trust RDF ontology](http://xmlns.com/wot/0.1/)
+for signing RDF documents with public-key cryptography. Integrating this ontology into our identity
+model, we could get something like:
+
+- TODO:
+  - Give a code example of how WOT could look in an immutable ledger
+  - Make sure that the *immutability* is not violated; the WOT ontology as of now only works with
+    mutability
+
+
+### The LCC Creation `Entity`

-A LCC RRM Creation model describes something directly or indirectly made by human beings. According
+An RRM Creation model describes something directly or indirectly made by human beings. According
 to the specification, it has only a single required property:

 - **CreationMode:** Can take the values `lcc:Manifestation` or `lcc:Work`

From 0078b1a48d95a956642790b8b39c0d26cad79da0 Mon Sep 17 00:00:00 2001
From: tim
Date: Thu, 21 Jul 2016 16:35:31 +0200
Subject: [PATCH 5/5] Include feedback

---
 coala-ip/README.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/coala-ip/README.md b/coala-ip/README.md
index 64a06ee..3e875fa 100644
--- a/coala-ip/README.md
+++ b/coala-ip/README.md
@@ -776,7 +776,7 @@ processing required would be incredible, rendering the identification of
 objects inefficient. Instead, we prevent JSON-LD objects with IPLD links from self-identifying
 themselves using an `@id` property. This is usually not a problem as objects can also identify
 themselves through content addressing.
->>>>>>> origin/master
+


 **Sources:**

@@ -1003,7 +1003,7 @@ Organization very literally, so as to provide reasoning for the individual steps
 transformation. 
This will only be the case for this `Entity`, as the rationale for transforming
 later `Entity` types will be fairly similar.*

-Schema.org makes both a [Person](http://schema.org/Person) and an [Organization](http://schema.org/Organization)
+Schema.org makes both a [schema.org/Person](http://schema.org/Person) and a [schema.org/Organization](http://schema.org/Organization)
 available; hence, there is no need to define either concept as a single model differentiated by
 `PartyType`. To keep the transformation of the `Entity` into an RDF schema simple, let us first
 transform an RRM Party with `PartyType == 'lcc:Individual'` and then apply the learnings to an RRM
 Party with `PartyType == 'lcc:Organization'`.
@@ -1070,9 +1070,9 @@ us:
 In the example, we've used Andy Warhol's Wikipedia page as his Party identifier (`@id`). As an `@id`
 value is only required to be a resolvable URI or IPLD merkle-link, a JSON-LD parser would validate
 this without complaining; however, `@id` would ideally point to the location of the data itself to
-show the JSON-LD parser how it could be resolved within the Internet. Unfortunately, Wikipedia
-doesn't support this and `https://en.wikipedia.org/wiki/Andy_Warhol` doesn't return the required
-data so we'll have to implement this ourselves.
+show the JSON-LD parser where it could be resolved within the internet. Unfortunately, Wikipedia
+doesn't support this, so `https://en.wikipedia.org/wiki/Andy_Warhol` doesn't return the required
+data, which is why we'll have to look for another solution.

 To start off with, let's look at some limitations and requirements derived from the RRM and JSON-LD /
 IPLD:

@@ -1120,7 +1120,8 @@ decentralized not-for-profit service is chosen). It lets users:

 - Attach the public part of their key pairs to their identity.


-Notable, currently-existing services that could be extended to support our use case include:
+Notable existing services that could be extended to support our use case include
+(preferably, a decentralized, non-profit service would be chosen):

 - https://pgp.mit.edu/
 - https://keybase.io/