@tkuhn
Last active May 27, 2022
Knowledge Space

— Draft —

Link for sharing: https://w3id.org/knowledge-space/

Vision

As a researcher or citizen you might sometimes ask yourself questions like this:

Does tea improve human memory? What's the evidence and the scientific consensus on this?

The answer is probably out there on the web, but not in a way that we can automatically gather, interpret, and aggregate. That's why there are currently no websites that give you reliable and up-to-date answers to such questions.

The knowledge space is a vision to change that. It is an ecosystem that allows for sharing knowledge in a radically more efficient and more effective way.

To get a quick idea of how this will work, these rough comparisons might help:

  • The knowledge space is like the Semantic Web, but robust, scalable, and trust-aware via redundancy and cryptography
  • The knowledge space is like a knowledge graph, but open, decentralized, and collaborative
  • The knowledge space is like a blockchain, but without a chain and with logic statements instead of transactions
  • The knowledge space is like a container environment (such as Docker), but for knowledge instead of software

The vision of the knowledge space is described in more detail below in the form of general principles, process sketches, concrete examples, further discussion, and pointers to the existing partial implementations.

General Principles

Knowledge space

  • The knowledge space is an envisaged open and decentralized global socio-technical ecosystem to share human knowledge
  • In the knowledge space, everything is expressed and communicated in formal logic, in a manner that humans can understand and computers can interpret, using a universal and extensible vocabulary

Knowledge records

  • Statements in the knowledge space are expressed and communicated in small knowledge records, making each record individually reusable and referenceable
  • Apart from its main statement, each knowledge record also includes relevant metadata, including information about the creator of the knowledge record and the source of the main statement
  • Each knowledge record is immutable and represented by a unique and cryptographically strong content-based identifier

  • A knowledge record can include a declaration that it retracts or supersedes another knowledge record, thereby allowing for the representation of updates

Knowledge record collections

  • A knowledge record collection is expressed by linking a collection identifier to the identifiers of the knowledge records it includes, and by representing these links in knowledge records themselves

Knowledge agents

  • Users of the knowledge space are called knowledge agents and include people as well as automated agents
  • Knowledge agents connect to the knowledge space via the use of client software that runs locally under their control

Cryptographic signatures

  • A knowledge record may include a cryptographic signature by the knowledge agent that created it, covering the rest of the knowledge record's content
  • Knowledge records with a signature also include the public key needed to verify the signature
  • A knowledge record contains at most one signature, but additional signatures can be declared in separate knowledge records pointing to the knowledge record to be signed

Knowledge services

  • Knowledge agents can interact with the knowledge space via different kinds of online knowledge services
  • To qualify as such, a knowledge service needs to have several independent sister services with equivalent functionality
  • A lookup service is a kind of knowledge service that returns the full content of a knowledge record for a provided record identifier
  • To qualify as such, a lookup service needs to be free to use for any knowledge agent
  • A query service is a kind of knowledge service that returns data, possibly aggregated, by executing a specified query for the provided input
  • To qualify as such, a query service may only use the published knowledge records and publicly observable behavior of knowledge services as input data to run its query
  • Query services can support queries that target the main statement of knowledge records (e.g. that a certain relation is expressed or a certain entity mentioned), their metadata (e.g. that there is a valid signature or that it was created by a certain knowledge agent), their context (e.g. that it was not retracted or that it was positively assessed by somebody), as well as the behavior of knowledge services (e.g. their responsiveness or execution speed)
  • The results returned by query services can have the form of entire knowledge records, knowledge record identifiers, identifiers of things mentioned in knowledge records, aggregated values, or any other structures derived from the published knowledge records or the behavior of knowledge services
  • A publishing service is a kind of knowledge service that allows knowledge agents to permanently publish knowledge records
  • A knowledge record is considered published once it is available via several independent lookup services in a setting that facilitates further replication by others

Introduction records

  • Relevant entities, including knowledge agents and knowledge services, are introduced to the knowledge space by publishing knowledge records that describe them, which are called introduction records
  • An introduction record of a knowledge service includes the nature of the service, the kind of knowledge records it covers, and the conditions under which knowledge agents may use it
  • An introduction record of a query service includes or refers to a full specification of its query
  • An introduction record for a knowledge agent includes the public keys the knowledge agent uses or used to sign knowledge records
  • The public keys linked to a knowledge agent via its introduction record can be of three types: main key, secondary key, and obsolete key
  • The main key of a knowledge agent is the public key with which the knowledge agent signs its introduction record (and possibly other knowledge records)
  • A secondary key is any public key apart from the main key that the knowledge agent uses to sign some of its knowledge records
  • An obsolete key is any public key that the knowledge agent used in the past but that has since been compromised
  • Knowledge records signed with an obsolete key are only considered valid if they have additionally been signed with a main or secondary key by the respective knowledge agent via a separate knowledge record
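The three key types and the rule for obsolete keys can be sketched as follows. This is only an illustration under stated assumptions: `AgentKeys`, `signature_is_acceptable`, and the key labels `K-MAIN`, `K-SEC`, and `K-OLD` are hypothetical, and the cryptographic verification of each signature is assumed to have happened elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class AgentKeys:
    """Public keys listed in an agent's introduction record (hypothetical model)."""
    main: str
    secondary: set = field(default_factory=set)
    obsolete: set = field(default_factory=set)

    def key_type(self, pubkey):
        if pubkey == self.main:
            return "main"
        if pubkey in self.secondary:
            return "secondary"
        if pubkey in self.obsolete:
            return "obsolete"
        return None  # key is not linked to this agent


def signature_is_acceptable(keys, signing_key, re_signing_keys=()):
    """Decide whether a record signed with `signing_key` counts as valid.

    `re_signing_keys` are the keys of additional signatures that the same
    agent declared in separate knowledge records.
    """
    kind = keys.key_type(signing_key)
    if kind in ("main", "secondary"):
        return True
    if kind == "obsolete":
        # An obsolete key only counts when backed by a main or secondary key.
        return any(keys.key_type(k) in ("main", "secondary") for k in re_signing_keys)
    return False


agent_keys = AgentKeys(main="K-MAIN", secondary={"K-SEC"}, obsolete={"K-OLD"})
```

With these keys, a record signed only with `K-OLD` is rejected, while the same record becomes acceptable once a separate record signs it with `K-MAIN` or `K-SEC`.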

Assessments

  • Knowledge agents can provide assessments of entities by expressing a qualified link (e.g. a link representing approval) to an introduction record of the assessed entity, and by publishing this link as a knowledge record

Knowledge settings

  • A knowledge setting, described in an introduction record, serves as a starting point for establishing trust by providing references to other trusted entities
  • A knowledge setting refers to collections of introduction records of trusted knowledge agents and knowledge services, specifies a trust range algorithm, declares an update strategy, and directly includes the minimal information needed to access several lookup services for bootstrapping purposes
  • A trust range algorithm specifies which knowledge agents and services, and ultimately which knowledge records, should be considered trustworthy, given a knowledge setting and the published knowledge records
  • An update strategy specifies when and how a knowledge setting can be automatically replaced with a newer version

Process Sketches

Finding trusted knowledge agents and services

  • In order to connect to the knowledge space, a knowledge agent needs a local copy of a trusted knowledge setting (e.g. by receiving it from a person the knowledge agent trusts)
  • Via the locally running client software, the knowledge agent can then access the bootstrap lookup services that are listed in the knowledge setting to retrieve the content of the collections of trusted introduction records, thereby obtaining an initial set of trusted knowledge agents and knowledge services
  • The client can then run the trust range algorithm as specified in the knowledge setting, involving calls to the known query and lookup services, to arrive at a larger and more up-to-date set of trusted entities
  • Running the trust range algorithm typically involves querying for knowledge agents' assessments of other agents and services, calculating a score based on the nature and extent of the assessments each potentially trustworthy agent or service has received, applying a score threshold to delineate the trustworthy entities, and resolving any conflicts (e.g. when two different knowledge agents claim the same identifier)

Updating or changing the knowledge setting

  • The client software can regularly check via query services whether update candidates for the knowledge setting are available, and whether the specified update strategy allows for automatically replacing the current knowledge setting with the updated one
  • Criteria for the update strategy may include that the updated version is signed by the same knowledge agent, that it has received a sufficient number of positive assessments by trusted knowledge agents, and that a certain amount of time has elapsed since the update was first seen by the client
  • The knowledge agent can at any point manually switch to a different knowledge setting, override any automatic update that has happened, or define a new knowledge setting from scratch

Retrieving knowledge records

  • Given a knowledge record identifier, lookup services can be asked to provide the corresponding content, and due to the content-based nature of the identifier, the retrieved content can be automatically checked (and another lookup service can be tried if this check fails)
  • If the knowledge record is digitally signed, this digital signature can also be automatically checked by the client, and the knowledge record can be treated as invalid if the check fails
  • In the case of knowledge record collections, which are published as knowledge records too, this process can be recursively repeated to get the complete content of entire sets of knowledge records
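The retrieval loop described above could look roughly like this. It is a minimal sketch, assuming that identifiers are SHA-256 hex digests of the record text and that each lookup service can be modeled as a callable taking an identifier and returning content or `None`; neither assumption comes from the text, and `content_id` and `retrieve` are hypothetical names.

```python
import hashlib


def content_id(content):
    """Content-based identifier (assumption: SHA-256 over the UTF-8 text)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest().upper()


def retrieve(record_id, lookup_services):
    """Return the first response whose content matches the identifier.

    Responses that fail the check are discarded and the next lookup
    service is tried. Returns None if no service delivers matching content.
    """
    for lookup in lookup_services:
        content = lookup(record_id)
        if content is not None and content_id(content) == record_id:
            return content
    return None


record_text = "general/related-to(disease/Alzheimers, gene/APOE)"
record_id = content_id(record_text)

# Three hypothetical services: one returns wrong content, one has nothing,
# and one is honest.
tampering_service = lambda rid: "general/related-to(disease/Alzheimers, gene/XYZ)"
empty_service = lambda rid: None
honest_service = {record_id: record_text}.get

result = retrieve(record_id, [tampering_service, empty_service, honest_service])
```

Because the check depends only on the identifier and the returned content, the client does not need to trust any individual lookup service for integrity, only for availability.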

Querying knowledge

  • To query the knowledge space, a knowledge agent can use its client software to contact a trusted knowledge service by sending a query of a form that is supported by this service
  • A knowledge agent may decide to probe the correctness of received results by retrieving a sample of the respective knowledge records via lookup services, and then checking locally whether the query was correctly applied
  • A knowledge agent may decide to probe the correctness and completeness of received results by checking for discrepancies when querying other equivalent or related query services

Publishing knowledge records

  • To publish a new knowledge record, a knowledge agent can use its client software to send the content of the knowledge record to a publishing service
  • A knowledge agent may decide to probe the success of a publication request by using its client software to check whether the knowledge record is returned by several independent lookup and query services

Assessing knowledge services

  • When a knowledge agent has found evidence in favor or against the integrity or quality of a knowledge service, it can publish this finding as an assessment in a knowledge record so others can take it into account

Recovering from a compromised private key

  • If a main or secondary key of a knowledge agent has been compromised, in the sense that a third party gained access to the corresponding private key (and possibly made it unavailable to its legitimate owner), the affected knowledge agent can recover in three steps: by publishing a new introduction record that declares the compromised key as obsolete and announces a new key; by publishing knowledge records that re-sign, with a valid key, all legitimate knowledge records previously signed with the compromised key; and by convincing other trusted agents to disapprove of the introduction record that includes the compromised key and to approve of the new one

Full Picture

Concrete Examples

Concrete but hypothetical examples are given here in a notation that is deliberately disconnected from the currently existing implementations, in order to emphasize the conceptual core of the approach. Currently existing implementations are discussed afterwards.

Examples of statements

Statements in the examples shown here are written in a predicate logic notation like PREDICATE(ARG1, ARG2, ...). We restrict ourselves here to such atomic formulas and conjunctions thereof. It is easy to see how the expressivity can be arbitrarily expanded by introducing predicates like subclass-of(A, B) or implies(C, D) and agreeing on their semantics (where the latter is aided by technology but is ultimately a social process). Predicate names and constants consist of names with namespaces of the form NAMESPACE/NAME, allowing knowledge agents to define new namespaces when needed.

Constants can also be strings like "some text here". Such strings are formally treated as plain logical constants, but they may contain further informal knowledge for human readers.

Atomic formulas joined by logical conjunction (and) are written one per line:

general/related-to(disease/Alzheimers, gene/APOE)
general/related-to(disease/Alzheimers, gene/PSEN1)

Such groups of formulas can be given a label in the form of a hash calculated on their sequence of symbols:

5C2C20:
  general/related-to(disease/Alzheimers, gene/APOE)
  general/related-to(disease/Alzheimers, gene/PSEN1)

For the sake of readability of these examples, fake hashes with just six hexadecimal digits like 5C2C20 are shown here. In reality, these need to be hashes long enough to be considered secure and therefore not vulnerable to collisions. These hashes also serve as logical constants and can therefore appear in argument positions of predicates. They can also be used as namespaces, as in 1CBF40/thing.
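As a rough illustration of such a label, the following sketch hashes a formula group with SHA-256. The choice of hash function, the serialization (formulas joined by newlines, UTF-8 encoded), and the helper name `formula_group_label` are assumptions made for this example; the text above does not prescribe them.

```python
import hashlib


def formula_group_label(formulas, digits=None):
    """Return a hex label calculated on the sequence of symbols of a group.

    Assumption: the sequence of symbols is the formulas joined by newlines.
    `digits` truncates the label for display only; truncated labels are not
    collision-resistant.
    """
    data = "\n".join(formulas).encode("utf-8")
    label = hashlib.sha256(data).hexdigest().upper()
    return label if digits is None else label[:digits]


group = [
    "general/related-to(disease/Alzheimers, gene/APOE)",
    "general/related-to(disease/Alzheimers, gene/PSEN1)",
]

full_label = formula_group_label(group)              # 64 hex digits
short_label = formula_group_label(group, digits=6)   # readable, like the fake hashes here
```

Changing any symbol, or the order of the formulas, yields a different label, which is what makes the label usable as a content-based identifier.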

Formula groups can be nested, and indentation is used to clarify the nesting structure:

5F3EA4:
  5C2C20:
    general/related-to(disease/Alzheimers, gene/APOE)
    general/related-to(disease/Alzheimers, gene/PSEN1)
  E88197:
    prov/creator(5C2C20, agents/john-doe)

Public keys and signatures are treated and shown similarly to hashes:

sec/has-sig-for-pubkey(5F3EA4, F2E847, 075DF5)

Here, 5F3EA4 is the formula group being signed, F2E847 is the signature, and 075DF5 is the public key.

Hashes can represent statements that contain the hash itself, and signatures can cover statements that contain the signature. This is achieved with a simple trick: an otherwise unused symbol stands in for the hash to be calculated, another such symbol stands in for the signature to be calculated, and the respective replacement operations are performed during calculation and verification of the hash or signature.
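The placeholder trick for hashes can be sketched as follows. The placeholder symbol `$HASH$`, the use of SHA-256, and the function names are assumptions for illustration; the same idea would apply to signatures with a second placeholder.

```python
import hashlib

PLACEHOLDER = "$HASH$"  # hypothetical symbol that is otherwise unused


def compute_hash(template):
    """Hash the text while the placeholder still stands in for the hash."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest().upper()


def finalize(template):
    """Calculate the hash and substitute it for the placeholder."""
    digest = compute_hash(template)
    return digest, template.replace(PLACEHOLDER, digest)


def verify(digest, final_text):
    """Reverse the substitution, then recompute the hash and compare."""
    template = final_text.replace(digest, PLACEHOLDER)
    return compute_hash(template) == digest


# A statement that mentions the identifier of the group it belongs to.
template = "general/has-date($HASH$, date/20210618-112311)"
digest, record = finalize(template)
tampered = record.replace("date/20210618", "date/20220618")
```

Here `verify(digest, record)` succeeds, while `verify(digest, tampered)` fails because the recomputed hash no longer matches.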

Examples of knowledge records

This is an example of a knowledge record in the domain of diseases and genes:

66C05D:
  234F0C:
    general/related-to(disease/Alzheimers, gene/APOE)
  prov/creator(234F0C, agents/john-doe)
  general/has-date(66C05D, date/20210618-112311)

This is an example of a knowledge record collection (here and below we are showing only minimal metadata and are often omitting the signature for the sake of brevity of these examples):

E3448C:
  8BE0E8:
    collection/has-element(E3448C, 66C05D)
    collection/has-element(E3448C, 22EF2B)
    collection/has-element(E3448C, 0B268E)
  prov/creator(8BE0E8, agents/john-doe)

We are here using the knowledge record identifier E3448C also as the identifier for the collection. To compose collections out of other collections and to break down large collections into small records, we can use sub-collections:

21D64B:
  569B89:
    collection/has-all-elements-of(21D64B, E3448C)
    collection/has-element(21D64B, F0EC38)
  prov/creator(569B89, agents/john-doe)
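Resolving a collection that uses sub-collections amounts to a recursive expansion. The sketch below assumes the collection statements have already been retrieved and parsed into a dictionary that maps each collection identifier to its `(predicate, target)` pairs; this in-memory representation and the function name `expand_collection` are assumptions.

```python
def expand_collection(collection_id, statements, seen=None):
    """Return the set of element identifiers of a collection.

    `collection/has-element` contributes a single element, while
    `collection/has-all-elements-of` pulls in a whole sub-collection.
    """
    seen = set() if seen is None else seen
    if collection_id in seen:  # defensive guard against cyclic references
        return set()
    seen.add(collection_id)

    elements = set()
    for predicate, target in statements.get(collection_id, []):
        if predicate == "collection/has-element":
            elements.add(target)
        elif predicate == "collection/has-all-elements-of":
            elements |= expand_collection(target, statements, seen)
    return elements


# The two collections from the examples above.
statements = {
    "E3448C": [
        ("collection/has-element", "66C05D"),
        ("collection/has-element", "22EF2B"),
        ("collection/has-element", "0B268E"),
    ],
    "21D64B": [
        ("collection/has-all-elements-of", "E3448C"),
        ("collection/has-element", "F0EC38"),
    ],
}

elements = expand_collection("21D64B", statements)
```

For `21D64B`, the expansion yields the three elements of `E3448C` plus `F0EC38`.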

Published knowledge records cannot be deleted, but only retracted. Retraction happens by publishing another knowledge record stating that the agent retracts the previous record:

2DCBA3:
  E12BBB:
    general/retracts(agents/john-doe, 66C05D)
  prov/creator(E12BBB, agents/john-doe)

A published knowledge record can be updated by publishing a new one that declares to supersede the previous one:

8AD9DC:
  3BF4C8:
    general/related-to(disease/Alzheimers, gene/APOE2)
  prov/creator(3BF4C8, agents/john-doe)
  sec/has-sig-for-pubkey(8AD9DC, CB0CA6, 075DF5)
  general/supersedes(8AD9DC, 66C05D)

Examples of knowledge agents

Knowledge agents can use their own public/private key pair to digitally sign knowledge records:

0C9BBD:
  9977B3:
    general/related-to(disease/Alzheimers, gene/APOE)
  prov/creator(9977B3, agents/john-doe)
  sec/has-sig-for-pubkey(0C9BBD, 574E6A, 075DF5)

Before signing and publishing such knowledge records, knowledge agents should introduce themselves to the knowledge space by publishing an introduction record:

C69CD0:
  E12BBB:
    agent/is-person(agents/john-doe)
    sec/has-main-pubkey(agents/john-doe, 075DF5)
  prov/creator(E12BBB, agents/john-doe)
  general/introduces(C69CD0, agents/john-doe)
  sec/has-sig-for-pubkey(C69CD0, C4B48A, 075DF5)

Knowledge agents can also be bots, i.e. automated computational agents:

C81863:
  E71019:
    agent/is-bot(agents/sai-bot)
    agent/controls(agents/john-doe, agents/sai-bot)
    sec/has-main-pubkey(agents/sai-bot, 1B0B3C)
  prov/creator(E71019, agents/john-doe)
  general/introduces(C81863, agents/sai-bot)
  sec/has-sig-for-pubkey(C81863, B3E959, 1B0B3C)

Knowledge records can only be directly signed with one key pair, but additional signatures can be linked with separate records:

47A8C9:
  520268:
    general/signs(agents/jane-smith, 0C9BBD)
  prov/creator(520268, agents/jane-smith)
  sec/has-sig-for-pubkey(47A8C9, 5AD51E, 35DFD8)

To sign a larger number of knowledge records, they can be grouped in a collection (e.g. E3448C) and signed collectively:

D4DEE3:
  12A93A:
    general/signs-all(agents/jane-smith, E3448C)
  prov/creator(12A93A, agents/jane-smith)
  sec/has-sig-for-pubkey(D4DEE3, AE3275, 35DFD8)

Examples of knowledge services

We assume here that network addresses are represented as logical constants. Some constants, such as those representing knowledge services, can therefore be interpreted as places in the network to which requests can be sent and from which responses can be received. We use the following simple notation to show an example request sent to a network address together with an example response:

  REQUEST
>> ADDRESS >>
  RESPONSE

We assume here that requests and responses are logical constants or statements (possibly conjoined and/or nested).

This is an example of an invocation of a lookup service:

  0C9BBD
>> service/alpha-lookup >>
  0C9BBD:
    9977B3:
      general/related-to(disease/Alzheimers, gene/APOE)
    prov/creator(9977B3, agents/john-doe)
    sec/has-sig-for-pubkey(0C9BBD, 574E6A, 075DF5)

This is an example of a query service:

  general/related-to(disease/Alzheimers, var/x)
>> service/beta-query >>
  result/match-in(var/x, gene/APOE, 0C9BBD)
  result/match-in(var/x, gene/PSEN1, 329E88)
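One way to picture what such a query service does internally is simple pattern matching over the main statements of published records. The following is only a sketch: the tuple representation of statements, the in-memory `records` dictionary, and the functions `match` and `run_query` are assumptions, and a real service would also have to consider metadata and context.

```python
def match(pattern, statement):
    """Return variable bindings if the statement matches the pattern, else None."""
    (p_pred, p_args), (s_pred, s_args) = pattern, statement
    if p_pred != s_pred or len(p_args) != len(s_args):
        return None
    bindings = {}
    for p, s in zip(p_args, s_args):
        if p.startswith("var/"):
            # A variable must be bound to the same value everywhere it occurs.
            if bindings.setdefault(p, s) != s:
                return None
        elif p != s:
            return None
    return bindings


def run_query(pattern, records):
    """Return result/match-in tuples for all matching statements."""
    results = []
    for record_id, statements in records.items():
        for statement in statements:
            bindings = match(pattern, statement)
            if bindings is not None:
                for var, value in bindings.items():
                    results.append(("result/match-in", var, value, record_id))
    return results


records = {
    "0C9BBD": [("general/related-to", ("disease/Alzheimers", "gene/APOE"))],
    "329E88": [("general/related-to", ("disease/Alzheimers", "gene/PSEN1"))],
}
query = ("general/related-to", ("disease/Alzheimers", "var/x"))
results = run_query(query, records)
```

Run against these two records, the query produces the same two `result/match-in` entries as in the example response above.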

This is an example of a publishing service:

  0C9BBD:
    9977B3:
      general/related-to(disease/Alzheimers, gene/APOE)
    prov/creator(9977B3, agents/john-doe)
    sec/has-sig-for-pubkey(0C9BBD, 574E6A, 075DF5)
>> service/gamma-publish >>
  status/is-published-at(0C9BBD, service/alpha-lookup)
  status/is-published-at(0C9BBD, service/yellow-lookup)
  status/is-published-at(0C9BBD, service/vanilla-lookup)
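A client that wants to probe the reported publication status can ask the lookup services itself. This sketch assumes SHA-256 content identifiers, lookup services modeled as callables, and a threshold of two services for "several"; all three are assumptions, as is the name `is_published`.

```python
import hashlib


def content_id(content):
    """Content-based identifier (assumption: SHA-256 over the UTF-8 text)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest().upper()


def is_published(record_id, lookup_services, minimum=2):
    """Check whether enough lookup services return matching content."""
    confirmations = 0
    for lookup in lookup_services:
        content = lookup(record_id)
        if content is not None and content_id(content) == record_id:
            confirmations += 1
    return confirmations >= minimum


record_text = "general/related-to(disease/Alzheimers, gene/APOE)"
record_id = content_id(record_text)

# Hypothetical lookup services: two hold the record, one does not.
alpha = {record_id: record_text}.get
yellow = {record_id: record_text}.get
vanilla = {}.get

published = is_published(record_id, [alpha, yellow, vanilla])
```

The independence of the lookup services cannot be established by such a check alone; that part relies on the introduction records and assessments of the services.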

Examples of assessments

This is an assessment in the form of an approval of another knowledge record:

ECA7D7:
  8DECE4:
    general/approves-of(agents/john-doe, 95784D)
  prov/creator(8DECE4, agents/john-doe)

Approval can be seen as the simplest kind of positive assessment, but more detailed assessments are possible with more nuanced relations. A simple negative assessment can look as follows:

6D0C40:
  EEE9FF:
    general/disapproves-of(agents/jane-smith, 95784D)
  prov/creator(EEE9FF, agents/jane-smith)

Example of a knowledge setting

This is an example of a knowledge setting, A2F7B3/ks, introduced in the introduction record A2F7B3:

A2F7B3:
  346797:
    setting/has-trusted-agent-collection(A2F7B3/ks, EC02B0)
    setting/has-trusted-service-collection(A2F7B3/ks, 24385D)
    setting/has-trustrange-algorithm(A2F7B3/ks, setting/basic-tr-algorithm)
    setting/has-update-strategy(A2F7B3/ks, setting/basic-update-strategy)
    setting/has-bootstrap-lookup-service(A2F7B3/ks, service/alpha-lookup)
    setting/has-bootstrap-lookup-service(A2F7B3/ks, service/yellow-lookup)
    setting/has-bootstrap-lookup-service(A2F7B3/ks, service/vanilla-lookup)
  prov/creator(346797, agents/john-doe)
  general/introduces(A2F7B3, A2F7B3/ks)
  sec/has-sig-for-pubkey(A2F7B3, 574E6A, FF1A89)

Example of a trust range algorithm

Like everything in the knowledge space, a trust range algorithm is introduced with an introduction record:

AE952A:
  FAA335:
    setting/is-tr-algorithm(setting/basic-tr-algorithm)
    setting/has-definition(setting/basic-tr-algorithm,
      "Create a set T with all the knowledge agents and services from the given initial trusted
       knowledge records. Then execute these steps:
       1. For every agent in T, query the knowledge space to find all valid knowledge records
          where it expresses approval or disapproval of knowledge services or agents, and add
          these (dis)approval relations to a new set A. Try two suitable sister services in T
          for every query, and use the union of their responses.
       2. For every entity in T, if the number of received disapprovals in A exceeds approvals,
          remove it from T. For every entity not in T, if approvals exceed disapprovals by two
          or more, add it to T.
       3. For all agents in T that have a shared public key (main/secondary/obsolete), keep
          only the one with the most net-approvals in A and remove the others (if tied, remove
          them all). Similarly, remove extra entities in T that share the same identifier.
       All final elements in T are considered trusted entities. Knowledge records are
       considered trusted if and only if they are signed by a trusted knowledge agent or the
       number of trusted knowledge agents who expressed their approval exceeds the ones that
       expressed their disapproval.
      "
    )
  prov/creator(FAA335, agents/john-doe)
  general/introduces(AE952A, setting/basic-tr-algorithm)

By valid knowledge records, we mean here those that have a valid signature and for which no retracting or superseding knowledge record signed with the same public key has been published.
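This notion of validity can be expressed as a small filter. In the sketch below, the `Record` class, the identifiers `R1` to `R5`, and the keys `K1` and `K2` are hypothetical, and `signature_ok` stands for the outcome of a signature check performed elsewhere. It additionally assumes that a retracting or superseding record only takes effect if its own signature is valid.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    record_id: str
    pubkey: str                       # public key the record is signed with
    signature_ok: bool                # assumed result of a separate signature check
    retracts: Optional[str] = None    # identifier of a retracted record, if any
    supersedes: Optional[str] = None  # identifier of a superseded record, if any


def valid_records(records):
    """Return identifiers of records that are signed and not invalidated."""
    invalidated = set()
    for r in records:
        if not r.signature_ok:
            continue
        for target in (r.retracts, r.supersedes):
            if target is not None:
                # Only takes effect for records signed with the same key.
                invalidated.add((target, r.pubkey))
    return [
        r.record_id
        for r in records
        if r.signature_ok and (r.record_id, r.pubkey) not in invalidated
    ]


records = [
    Record("R1", "K1", True),
    Record("R2", "K1", True, supersedes="R1"),  # same key: R1 becomes invalid
    Record("R3", "K1", True),
    Record("R4", "K2", True, retracts="R3"),    # different key: R3 stays valid
    Record("R5", "K1", False),                  # invalid signature
]
valid = valid_records(records)
```

Only `R2`, `R3`, and `R4` remain valid in this example.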

The actual algorithm is here defined in a string, so a human has to implement this in client software before this algorithm is recognized and can be used. It is easy to imagine, however, how such algorithms can be parametrized or even fully specified in logic, and therefore allow for the definition of new algorithms without the need to change the client implementation.

Example of an update strategy

Update strategies are of course published as knowledge records too:

1861EA:
  370EC5:
    setting/is-update-strategy(setting/basic-update-strategy)
    setting/has-definition(setting/basic-update-strategy,
      "This basic update strategy works as follows:
       - Check once a day whether settings have been published that claim to be updates of the
         current one. Check for each of them how many valid assessments exist by a trusted
         knowledge agent.
       - Out of the update candidates that the client software has first seen more than 2 days
         ago, if exactly one of them has at least two net-approvals then replace the current
         setting with it. 
      "
    )
  prov/creator(370EC5, agents/john-doe)
  general/introduces(1861EA, setting/basic-update-strategy)
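The selection rule of this basic update strategy might be implemented along the following lines. The candidate dictionaries, their field names, and the function `choose_update` are assumptions for illustration; the daily check and the counting of valid assessments by trusted agents are assumed to happen outside this function.

```python
from datetime import datetime, timedelta


def choose_update(candidates, now, min_age=timedelta(days=2), min_net=2):
    """Return the id of the setting to switch to, or None to keep the current one.

    A candidate qualifies if the client first saw it more than `min_age` ago
    and it has at least `min_net` net approvals. The current setting is only
    replaced if exactly one candidate qualifies.
    """
    qualified = [
        c for c in candidates
        if now - c["first_seen"] > min_age
        and c["approvals"] - c["disapprovals"] >= min_net
    ]
    return qualified[0]["id"] if len(qualified) == 1 else None


now = datetime(2022, 5, 27, 12, 0)
candidates = [
    # Hypothetical update candidates with made-up identifiers.
    {"id": "UPDATE-A", "first_seen": now - timedelta(days=3), "approvals": 3, "disapprovals": 1},
    {"id": "UPDATE-B", "first_seen": now - timedelta(days=1), "approvals": 5, "disapprovals": 0},
    {"id": "UPDATE-C", "first_seen": now - timedelta(days=4), "approvals": 1, "disapprovals": 0},
]
chosen = choose_update(candidates, now)
```

Here only `UPDATE-A` qualifies: `UPDATE-B` has not been seen for long enough and `UPDATE-C` has too few net approvals. If two candidates qualified at once, the function would return `None` and the current setting would stay in place.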

Example of a process to determine trusted entities

To automatically determine which entities are to be trusted in the knowledge space, we only need a local copy of an introduction record describing a trusted knowledge setting and the locally running client software. For this example, we take the knowledge setting in example A2F7B3 above. The client software can find three bootstrap lookup services in it (service/alpha-lookup, service/yellow-lookup, and service/vanilla-lookup), which it can use to retrieve knowledge records. It can also find the identifier of a collection of trusted agents: EC02B0. The client software can now request the content of this record from one of the bootstrap lookup services:

  EC02B0
>> service/alpha-lookup >>
  EC02B0:
    EF3AAF:
      collection/has-element(EC02B0, C69CD0)
      collection/has-element(EC02B0, 562EDA)
      collection/has-element(EC02B0, 32C073)
      collection/has-element(EC02B0, E711E9)
      collection/has-element(EC02B0, 4C549F)
    prov/creator(EF3AAF, agents/sue-smith)

The client software first checks whether the content matches the content-based identifier (and if not discards the result and re-tries the request).

The client can now request the content of the introduction records of this collection one by one:

  C69CD0
>> service/yellow-lookup >>
  C69CD0:
    E12BBB:
      agent/is-person(agents/john-doe)
      sec/has-main-pubkey(agents/john-doe, 075DF5)
    prov/creator(E12BBB, agents/john-doe)
    general/introduces(C69CD0, agents/john-doe)
    sec/has-sig-for-pubkey(C69CD0, C4B48A, 075DF5)

  562EDA
>> service/vanilla-lookup >>
  562EDA:
    5911F8:
      agent/is-person(agents/kate-brown)
      sec/has-main-pubkey(agents/kate-brown, 3DEFFF)
    prov/creator(5911F8, agents/kate-brown)
    general/introduces(562EDA, agents/kate-brown)
    sec/has-sig-for-pubkey(562EDA, 915BE0, 3DEFFF)

...

The client then repeats the same process for the trusted service collection to get introduction records of knowledge services.

Next, the client checks the trust range algorithm, which in our case is setting/basic-tr-algorithm. We can assume here that the client recognizes this algorithm and has a module to execute it as specified in the introduction record AE952A above (if not, it would not be able to use this knowledge setting).

The client starts by creating an initial set of trusted agents and services T, which in this case could look as follows:

T = {
  agent-id-main-pubkey(agents/john-doe, 075DF5),
  agent-id-main-pubkey(agents/kate-brown, 5911F8),
  agent-id-main-pubkey(agents/sue-smith, 682EE1),
  agent-id-main-pubkey(agents/robin-lee, C5ABF5),
  agent-id-main-pubkey(agents/rob-jones, E51A43),
  service-id-type(service/alpha-lookup, servicetype/lookup),
  service-id-type(service/yellow-lookup, servicetype/lookup),
  service-id-type(service/vanilla-lookup, servicetype/lookup),
  service-id-type(service/lion-query, servicetype/basic-query),
  service-id-type(service/hippo-query, servicetype/basic-query),
  service-id-type(service/zebra-query, servicetype/basic-query),
  service-id-type(service/foo-publish, servicetype/publish),
  service-id-type(service/bar-publish, servicetype/publish)
}

Agents and services are here treated as tuples of their identifying information (identifier plus main public key for agents, and identifier plus type for services). Because they are stored in a set, multiple occurrences of identical identifying information are treated as a single element.

For each agent in this set, the client queries for the approvals and disapprovals it has published in order to execute step 1 of the algorithm:

  basic-query/get-approval-query(agents/john-doe, 075DF5)
>> service/lion-query >>
  general/approves-of-agent(agents/john-doe, 075DF5, agents/kate-brown, 5911F8)
  general/approves-of-agent(agents/john-doe, 075DF5, agents/alma-gomez, B36CC2)
  general/disapproves-of-agent(agents/john-doe, 075DF5, agents/ray-rich, 2F7EA0)
  general/approves-of-service(agents/john-doe, 075DF5, service/alpha-lookup, servicetype/lookup)
  general/approves-of-service(agents/john-doe, 075DF5, service/kudu-query, servicetype/basic-query)
  general/disapproves-of-service(agents/john-doe, 075DF5, service/vulture-query, servicetype/basic-query)
  ...

The client runs each of these queries on another service of the same type, e.g. service/hippo-query, and creates the union of their results. Finally, the results of all these queries are merged into a set A of (dis)approvals:

A = {
  general/approves-of-agent(agents/john-doe, 075DF5, agents/kate-brown, 5911F8),
  general/approves-of-agent(agents/kate-brown, 5911F8, agents/bill-taylor, 78F067),
  general/disapproves-of-agent(agents/sue-smith, 682EE1, agents/ray-rich, 2F7EA0),
  general/approves-of-service(agents/robin-lee, C5ABF5, service/hippo-query, servicetype/basic-query),
  general/disapproves-of-service(agents/rob-jones, E51A43, service/zebra-query, servicetype/basic-query),
  ...
}
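Step 1 can be sketched as a loop over the trusted agents that merges the answers of two sister query services. Modeling query services as Python callables, the contents of the two hypothetical services below, and the function name `collect_assessments` are assumptions for illustration.

```python
def collect_assessments(agents, query_services, per_query=2):
    """Build the set A as the union of the responses of sister services."""
    assessments = set()
    for agent in agents:
        for service in query_services[:per_query]:
            assessments |= set(service(agent))
    return assessments


john = ("agents/john-doe", "075DF5")
sue = ("agents/sue-smith", "682EE1")

# Hypothetical service contents; the second service misses one assessment
# and knows one that the first does not.
lion_data = {
    john: [("approves-of-agent", john, ("agents/kate-brown", "5911F8")),
           ("disapproves-of-agent", john, ("agents/ray-rich", "2F7EA0"))],
    sue: [("disapproves-of-agent", sue, ("agents/ray-rich", "2F7EA0"))],
}
hippo_data = {
    john: [("approves-of-agent", john, ("agents/kate-brown", "5911F8")),
           ("approves-of-agent", john, ("agents/alma-gomez", "B36CC2"))],
}
lion_query = lambda agent: lion_data.get(agent, [])
hippo_query = lambda agent: hippo_data.get(agent, [])

A = collect_assessments([john, sue], [lion_query, hippo_query])
```

Taking the union means that an assessment withheld by one service still ends up in A as long as the other service reports it; duplicates collapse because A is a set.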

Next, the client aggregates A to tally the (dis)approvals of the entities. This could look as follows:

a  d  T  entity
-------------------------------------------------------------------------
2  0  X  agent-id-main-pubkey(agents/john-doe, 075DF5)
2  1  X  agent-id-main-pubkey(agents/kate-brown, 5911F8)
3  0  X  agent-id-main-pubkey(agents/sue-smith, 682EE1)
3  1  X  agent-id-main-pubkey(agents/robin-lee, C5ABF5)
1  3  X  agent-id-main-pubkey(agents/rob-jones, E51A43)
1  0     agent-id-main-pubkey(agents/bill-taylor, 78F067)
1  1     agent-id-main-pubkey(agents/ray-rich, 2F7EA0)
3  1     agent-id-main-pubkey(agents/rob-jones, 871524)
2  0     agent-id-main-pubkey(agents/sue-smith, 5A7171)
1  0  X  service-id-type(service/alpha-lookup, servicetype/lookup)
0  0  X  service-id-type(service/yellow-lookup, servicetype/lookup)
1  1  X  service-id-type(service/vanilla-lookup, servicetype/lookup)
3  0  X  service-id-type(service/lion-query, servicetype/basic-query)
1  0  X  service-id-type(service/hippo-query, servicetype/basic-query)
1  2  X  service-id-type(service/zebra-query, servicetype/basic-query)
3  1  X  service-id-type(service/foo-publish, servicetype/publish)
2  0  X  service-id-type(service/bar-publish, servicetype/publish)
2  0     service-id-type(service/kudu-query, servicetype/basic-query)
1  1     service-id-type(service/vulture-query, servicetype/basic-query)

For step 2, the client checks whether the number of approvals a minus the number of disapprovals d is high enough for inclusion in the final set of trusted entities. The threshold differs depending on whether the entity is in the current version of T or not (as indicated in the third column). For entities in T, if the net approval count is negative, they are removed from T. This applies here to these two entities, which are consequently removed from T:

agent-id-main-pubkey(agents/rob-jones, E51A43)
service-id-type(service/zebra-query, servicetype/basic-query)

For entities not in T, if net approval is at least two, they are added to T. This is here the case for the following entities, which are therefore added to T:

agent-id-main-pubkey(agents/rob-jones, 871524)
agent-id-main-pubkey(agents/sue-smith, 5A7171)
service-id-type(service/kudu-query, servicetype/basic-query)
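The two thresholds of step 2 can be written down compactly. The sketch below uses only a subset of the rows of the tally above, with entities shortened to plain tuples; the function name `apply_step_2` and the data layout are assumptions.

```python
def apply_step_2(trusted, tally, add_margin=2):
    """Apply the removal and addition thresholds of step 2.

    `tally` maps each entity to (approvals, disapprovals). Decisions are
    based on membership in the original set, not the partially updated one.
    """
    updated = set(trusted)
    for entity, (approvals, disapprovals) in tally.items():
        if entity in trusted:
            if disapprovals > approvals:
                updated.discard(entity)
        elif approvals - disapprovals >= add_margin:
            updated.add(entity)
    return updated


# Subset of the tally shown above.
tally = {
    ("agents/kate-brown", "5911F8"): (2, 1),
    ("agents/rob-jones", "E51A43"): (1, 3),
    ("agents/ray-rich", "2F7EA0"): (1, 1),
    ("agents/rob-jones", "871524"): (3, 1),
    ("service/yellow-lookup", "servicetype/lookup"): (0, 0),
    ("service/zebra-query", "servicetype/basic-query"): (1, 2),
    ("service/kudu-query", "servicetype/basic-query"): (2, 0),
    ("service/vulture-query", "servicetype/basic-query"): (1, 1),
}
trusted = {
    ("agents/kate-brown", "5911F8"),
    ("agents/rob-jones", "E51A43"),
    ("service/yellow-lookup", "servicetype/lookup"),
    ("service/zebra-query", "servicetype/basic-query"),
}
updated = apply_step_2(trusted, tally)
```

Note the asymmetry: an entity already in T survives with a net approval of zero (such as service/yellow-lookup), whereas a new entity needs a net approval of at least two to be added.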

T now looks as follows, with the new entries shown at the bottom:

T = {
  agent-id-main-pubkey(agents/john-doe, 075DF5),
  agent-id-main-pubkey(agents/kate-brown, 5911F8),
  agent-id-main-pubkey(agents/sue-smith, 682EE1),
  agent-id-main-pubkey(agents/robin-lee, C5ABF5),
  service-id-type(service/alpha-lookup, servicetype/lookup),
  service-id-type(service/yellow-lookup, servicetype/lookup),
  service-id-type(service/vanilla-lookup, servicetype/lookup),
  service-id-type(service/lion-query, servicetype/basic-query),
  service-id-type(service/hippo-query, servicetype/basic-query),
  service-id-type(service/foo-publish, servicetype/publish),
  service-id-type(service/bar-publish, servicetype/publish),
  agent-id-main-pubkey(agents/rob-jones, 871524),
  agent-id-main-pubkey(agents/sue-smith, 5A7171),
  service-id-type(service/kudu-query, servicetype/basic-query)
}

Note that agents/rob-jones now has a new public key assigned, because the previous entry was removed and a new one added. As we will show in the example below, this can be the result of him successfully recovering from somebody else compromising his public key.

In the above version of T, two different agents claim the identity of agents/sue-smith, assigning it two different public keys. This is resolved in step 3. In such cases of identifier or public key collisions, only the entry with more net approvals is kept, which in this case is the one with public key 682EE1. This could have been, for example, the result of somebody wrongfully and unsuccessfully trying to steal the established identity of this agent. Such collisions can also happen with secondary and obsolete keys, with the same consequence, but for simplicity we only show the main keys here.

After having resolved this collision, we end up with our final set of trusted entities, consisting of five agents and eight services.
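The collision handling of step 3 could be sketched as follows. The function name `resolve_collisions` and the net approval numbers in the example are assumptions made for illustration; the text above only tells us that the entry with key 682EE1 has more net approvals than the one with key 5A7171. How ties are broken is not specified above; this sketch simply keeps the first entry it encounters.

```python
def resolve_collisions(entries):
    """Keep only the best-approved entry per identifier and per public key.

    entries: list of (agent_id, pubkey, net_approvals) tuples.
    """
    kept = list(entries)
    # Field 0 handles identifier collisions, field 1 handles public key collisions.
    for field in (0, 1):
        best = {}
        for entry in kept:
            key = entry[field]
            # Strict comparison: on a tie the earlier entry wins (an assumption).
            if key not in best or entry[2] > best[key][2]:
                best[key] = entry
        kept = [entry for entry in kept if best[entry[field]] is entry]
    return kept


# Net approval counts below are hypothetical.
entries = [
    ("agents/john-doe", "075DF5", 3),
    ("agents/sue-smith", "682EE1", 3),
    ("agents/sue-smith", "5A7171", 2),
]

for agent_id, pubkey, _ in resolve_collisions(entries):
    print(agent_id, pubkey)
```

With these assumed counts, the entry for agents/sue-smith with key 5A7171 is dropped and the one with key 682EE1 is kept.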

Example of a process of recovering from a compromised private key

As an example of a process to recover from a compromised private key, let us have a closer look at agent agents/rob-jones above (let us call him Rob). His introduction record looks as follows:

4C549F:
  53DD82:
    agent/is-person(agents/rob-jones)
    sec/has-main-pubkey(agents/rob-jones, E51A43)
    sec/has-secondary-pubkey(agents/rob-jones, 95063D)
  prov/creator(53DD82, agents/rob-jones)
  general/introduces(4C549F, agents/rob-jones)
  sec/has-sig-for-pubkey(4C549F, 4D4EB7, E51A43)

Rob declares his main key E51A43 and a secondary key 95063D. Now, let us assume the worst-case scenario of a malicious third party managing to get access to the private key of his main public key E51A43, and moreover making this private key inaccessible to Rob. The challenge is now to re-establish the identity of agents/rob-jones with non-compromised keys.

There are slightly less serious variations of this worst-case scenario, e.g. if a secondary key is affected or the private key is still accessible to its original owner, which are a bit easier to recover from, but for simplicity, we will only cover the worst-case scenario here.

To recover from this, Rob first has to create a new key as a replacement (871524) and publish a new introduction record in which the compromised key is labeled as obsolete:

D3258B:
  3B813A:
    agent/is-person(agents/rob-jones)
    sec/has-main-pubkey(agents/rob-jones, 871524)
    sec/has-secondary-pubkey(agents/rob-jones, 95063D)
    sec/has-obsolete-pubkey(agents/rob-jones, E51A43)
  prov/creator(3B813A, agents/rob-jones)
  general/introduces(D3258B, agents/rob-jones)
  sec/has-sig-for-pubkey(D3258B, E83246, 871524)

Next, Rob has to find all his legitimate knowledge records previously published and signed with the obsolete key. Because the malicious attacker can also use this key and it can therefore no longer be trusted, these knowledge records have to be re-signed with one of the valid keys. Rob therefore comes up with a (potentially large) set of knowledge records to be re-signed (here FE9F55, 39912B, 6C3DC6, ...) and publishes this set as a collection:

2DFD66:
  40D532:
    collection/has-element(2DFD66, FE9F55)
    collection/has-element(2DFD66, 39912B)
    collection/has-element(2DFD66, 6C3DC6)
    ...
  prov/creator(40D532, agents/rob-jones)

This now allows Rob to re-sign all these knowledge records with his new key in a single knowledge record (he could also re-sign them individually):

BDA007:
  32D52B:
    general/signs-all(agents/rob-jones, 2DFD66)
  prov/creator(32D52B, agents/rob-jones)
  sec/has-sig-for-pubkey(BDA007, 94DE0D, 871524)
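One possible way for a client to evaluate the records above is sketched below. It treats a knowledge record as covered if it is signed directly with a currently valid key, or if it is an element of a collection that a validly signed record re-signs via general/signs-all. This reading, the function name `is_covered`, and the record identifier AAAAAA (standing in for a record published by the attacker) are assumptions of this sketch and not part of the specification. Only the three collection elements named above are included.

```python
# Valid keys of agents/rob-jones according to introduction record D3258B.
valid_keys = {"871524", "95063D"}

# Which public key each knowledge record was signed with.
signed_with = {
    "FE9F55": "E51A43",  # legitimate, but signed with the now obsolete key
    "39912B": "E51A43",
    "6C3DC6": "E51A43",
    "AAAAAA": "E51A43",  # hypothetical record published by the attacker
    "BDA007": "871524",  # the re-signing record
}

# Collection 2DFD66 and the general/signs-all statement in BDA007.
collections = {"2DFD66": {"FE9F55", "39912B", "6C3DC6"}}
signs_all = {"BDA007": "2DFD66"}


def is_covered(record_id: str) -> bool:
    """Check whether a record is backed by a currently valid key."""
    # Case 1: the record itself carries a signature for a valid key.
    if signed_with.get(record_id) in valid_keys:
        return True
    # Case 2: a validly signed record re-signs a collection containing it.
    for signer_record, collection_id in signs_all.items():
        signer_is_valid = signed_with.get(signer_record) in valid_keys
        if signer_is_valid and record_id in collections.get(collection_id, set()):
            return True
    return False


print(is_covered("FE9F55"))  # True: re-signed via collection 2DFD66
print(is_covered("AAAAAA"))  # False: only signed with the obsolete key
```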

Given Rob's new introduction record, everything seems fixed, but so far other knowledge agents have no reason to trust it more than the previous one. In fact, it is likely that they will see the new record as an intruder and the old one as the legitimate one.

For Rob to fully recover his identity, he now has to use his connections to convince other trusted knowledge agents to disapprove of the old introduction record and approve of the new one. Convincing other agents is a social process that happens for the most part outside of the knowledge space.

After a few trusted knowledge agents have been convinced, this identity clash can show up in automatically created lists of contested identities (which can be implemented as a knowledge service) and this can trigger further attention by other trusted knowledge agents, who might do some further investigation and then contribute to the approval of the new identity.

Once a sufficient number of approvals for the new record (and disapprovals for the old one) have accumulated, trust range algorithms will start selecting the new introduction record as trustworthy and discarding the old one. At this point, Rob has succeeded in re-establishing his identity, with a new public key but with the same identifier.

Discussion

Trust Range Algorithms

The trust range algorithm described above is just an example. It lets you build a reasonable level of trust but it can be improved in various ways. For example, knowledge agents who are removed from T because they are deemed untrustworthy still have their vote counted when the algorithm decides which other entities should be deemed trustworthy. Moreover, the algorithm doesn't correct for the fact that knowledge services deemed untrustworthy and removed from T might have been used in earlier steps. Another shortcoming of this particular algorithm is that approvals from the initial trusted agents are needed for an entity to end up in the final set of trusted entities; second-degree approvals are not counted.

Despite these shortcomings, it is difficult for malicious agents to manipulate the system. If such an agent manages to get included in the initial collection of trusted entities, it gets just one vote and is easily outnumbered by the other members. As long as the non-malicious agents hold a solid majority that allows them to reach two net approvals against the malicious ones, the update strategy shown above allows them to fix any such problem and exclude malicious agents with a new update. Getting a malicious service included also only helps if there are at least two of them, in which case, with a bit of luck, both are chosen by a client and can then return manipulated results. But this can also be fixed with a simple update of the knowledge setting.

It is quite easy to imagine algorithms that handle the problems above in a better way, and therefore these are not inherent problems of the knowledge space. They can be solved by defining and implementing more advanced algorithms. Different situations and different problems might require different algorithms anyway. In the knowledge space, such algorithms can co-exist and compete in an open ecosystem.

While such an algorithm can be arbitrarily complex, there is no such thing as a perfect trust range algorithm. This is because one cannot be perfectly sure about anything, ever, no matter what system one is using. It is all about levels of trust.

Knowledge Settings

A knowledge agent's knowledge setting defines what will be shown as trustworthy to the agent by its client software and what will be shown as non-trustworthy or not shown at all. It can thus be perceived as something similar to a filter bubble, but with two important differences. First, it is fully transparent what is happening and why certain things are shown and others are not. Second, the knowledge setting can be freely changed by the knowledge agent to check out other perspectives. A knowledge agent can even define its own knowledge setting.

Knowledge settings can have a narrow focus, for example by including respected members of a given scientific field, or they can have a broader focus, such as including respected scientists of all kinds of disciplines and other kinds of trusted public figures. These two are not in conflict, as they simply provide different perspectives that the knowledge agents can choose and switch between at any moment.

Anybody can define knowledge settings but nobody is forced to use them. One can imagine that big international bodies, such as the UN or the European Commission, could publish such knowledge settings in the future, and they might provide a good default choice for knowledge agents to use.

Openness and Decentralization

The knowledge space is open for everybody to access. By their definition, lookup services are free to use, accessible to everybody, and available in the form of several independent and redundant instances. Query services on the other hand are not required to be free and open, as they can be arbitrarily complex and therefore also costly to run. However, by their definition, query services only work on data that has been published to the knowledge space and therefore is available via the free and open lookup services. Every query service is therefore open for competitors that fetch the same data and run the same query. Market forces can therefore make sure that all knowledge services are provided at a fair price, and we can assume that free instances will be available for the simpler kinds of queries.

Interpretations

For the knowledge space to work in the ways outlined above, there needs to be an agreement on the interpretation of some core predicates like general/supersedes, general/approves-of, and sec/has-sig-for-pubkey. Further predicates and namespaces can be introduced by the knowledge agents as needed, by publishing corresponding introduction records. These introduction records can conflict, however, when several agents introduce the same predicates in incompatible ways. The knowledge space does not provide a fixed definition of how this is resolved, but provides at least three techniques that allow knowledge agents to find agreement. First, newly minted predicates can use the identifier of their introduction record as namespace, e.g. 7BA3B2/has-property-x, which thereby reliably and unambiguously links to the definition stated in its introduction record. Second, as identifiers can also act as network locations, an identifier's corresponding network location can be set up in such a way that it returns the authoritative introduction record when asked to, thereby delegating the authority question to the networking layer. Third, agreement about an interpretation can be found collectively by publishing approving and disapproving assessments of the respective introduction records, as is done by trust range algorithms for knowledge agents and knowledge services. These techniques can be applied to single predicates but also to entire namespaces.
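The first technique can be illustrated with a small sketch that extracts the introduction record identifier from a predicate. It assumes, purely for illustration, that record identifiers are six uppercase hexadecimal characters as in the abbreviated examples in this document; the function name `introduction_record_of` is likewise made up.

```python
import re
from typing import Optional

# Assumption: record identifiers look like the abbreviated examples, e.g. 7BA3B2.
RECORD_ID = re.compile(r"[0-9A-F]{6}")


def introduction_record_of(predicate: str) -> Optional[str]:
    """Return the introduction record id used as namespace, if there is one."""
    namespace, separator, local_name = predicate.partition("/")
    if separator and local_name and RECORD_ID.fullmatch(namespace):
        return namespace
    # Named namespaces such as "general" need one of the other two techniques.
    return None


print(introduction_record_of("7BA3B2/has-property-x"))  # 7BA3B2
print(introduction_record_of("general/supersedes"))     # None
```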

Current Implementations

The knowledge space as a whole is still a vision, but most aspects already have partial or full implementations.

Identifiers with namespaces of the kind required by the knowledge space are implemented with IRIs/URIs. With HTTP(S), such identifiers can also serve as network locations, as required for knowledge services. The inclusion of hash values for content-based identifiers can be achieved with Trusty URIs.
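The general idea of a content-based identifier can be sketched as follows. This is not the Trusty URI algorithm, which defines its own normalization, hashing, and encoding rules; the sketch merely appends a plain SHA-256 hex digest to a hypothetical base URI (https://example.org/record/) to show why such identifiers make content verifiable.

```python
import hashlib

# Hypothetical base URI, used for illustration only.
BASE = "https://example.org/record/"


def make_identifier(content: bytes, base: str = BASE) -> str:
    """Build an identifier that embeds a hash of the content."""
    return base + hashlib.sha256(content).hexdigest()


def matches(identifier: str, content: bytes, base: str = BASE) -> bool:
    """Check that the content is exactly what the identifier refers to."""
    return identifier == make_identifier(content, base)


content = b"agent/is-person(agents/rob-jones)"
identifier = make_identifier(content)

print(matches(identifier, content))                # True
print(matches(identifier, content + b" tampered"))  # False
```

Because the identifier depends on the content, anyone who retrieves a record from any server can check that it has not been modified, which is what allows redundant and untrusted lookup services to be used safely.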

The Resource Description Framework (RDF) is an implementation of the logic language that is used to write statements that can then be published as knowledge records.

Nanopublications can be seen as an implementation of knowledge records. Knowledge record collections are implemented with nanopublication indexes.

Nanopub Server is an implementation of both a lookup service and a publishing service. The nanopub API is an implementation of a query service. Nanobench is an implementation of client software that assists knowledge agents in accessing the knowledge space.

License

This text is available under the CC BY 4.0 license. The images were created with Excalidraw, using several of its libraries of visual elements, and are available under the MIT license.

@lyubomirpenev

lyubomirpenev commented Nov 18, 2021

Knowledge records
Statements in the knowledge space are expressed and communicated in small knowledge records, making each record individually reusable and referenceable
Apart from its main statements, each knowledge record also includes relevant metadata, including information about who created the knowledge record ....

[ADD PERHAPS: ...and where from / how the statement has been created.

@lyubomirpenev

lyubomirpenev commented Nov 18, 2021

"Knowledge record" seems a bit too wide in meaning to me. Any record of knowledge (e.g. observation) can be easily termed "knowledge record" without being a formal, machine-readable piece of knowledge. Why deviate from the term "nanopublication"?

@tkuhn
Author

tkuhn commented Nov 19, 2021

Thank you for your comments, @lyubomirpenev. Much appreciated.

I agree with your first point on adding a requirement on information about how the statement came about. I wanted to keep it minimal, but I agree that this is important. I rephrased and expanded it as follows:

... including information about the creator of the knowledge record and the source of the main statement

I might have to expand the examples a bit too, in order to reflect that...

With respect to your doubts about the term "knowledge record", I understand that the intuitive reading would be broader than how I define it here, but that's almost always the case when defining technical terms. The sentences above about "knowledge records" are meant as technical definitions, and as such the intuitive reading of the term does not apply.

I did not want to use the term "nanopublication" here, because I wanted to keep it general and agnostic of any specific implementation or format, a bit like the FAIR principles do. While nanopublications were initially also defined in a general manner, they have since become associated with a particular structure and format (RDF + graphs). The FAIR principles don't mention RDF, not because there currently exists a better language for the task, but because we don't want to commit to it prematurely in the long run. I am applying the same considerations here, at the risk of some occasional confusion about the differences between "knowledge records" and "nanopublications".
