
@kirlat
Last active January 13, 2021 08:11
Annotation GraphQL types
# Custom scalars and enums
scalar Date
scalar Language
scalar URI

enum MIMEType {
  TEXT_PLAIN
  TEXT_HTML
}

enum FeatureType {
  WORD
  FULL_FORM
  HDWD
  PART
  NUMBER
  CASE
  GRMCASE
  DECLENSION
  GENDER
  TYPE
  CLASS
  GRMCLASS
  CONJUGATION
  COMPARISON
  TENSE
  VOICE
  MOOD
  PERSON
  FREQUENCY
  MEANING
  SOURCE
  FOOTNOTE
  DIALECT
  NOTE
  PRONUNCIATION
  AGE
  AREA
  GEO
  KIND
  DERIVTYPE
  STEMTYPE
  MORPH
  VAR
  RADICAL
  KAYLO
  STATE
}

# Output types
type FeatureValue {
  value: String!
}

type Feature {
  id: ID!
  schema: String!
  type: FeatureType!
  value: [FeatureValue!]!
}

type User {
  id: ID!
  nickname: String
}

type ResourceProvider {
  uri: URI!
  description: String!
  rights: String!
}

type Comment {
  id: ID!
  text: String!
  language: Language!
  dateTime: Date!
  author: User!
  replies: [Comment!]
}

type Assertion {
  id: ID!
  confidence: Int!
  dateTime: Date!
  author: User!
}

type Negation {
  id: ID!
  confidence: Int!
  dateTime: Date!
  author: User!
}

type Definition {
  id: ID!
  text: String!
  language: Language!
  format: MIMEType!
  lemma: Lemma! # The lemma that this definition text defines
  comments: [Comment!]
}

type DefinitionConnection {
  id: ID!
  definitions: [Definition!]
  assertions: [Assertion!]
  negations: [Negation!]
  comments: [Comment!]
}

type DefinitionSet {
  id: ID!
  lemmaWord: String!
  language: Language!
  shortDefs: [DefinitionConnection!]
  fullDefs: [DefinitionConnection!]
}

type Inflection {
  id: ID!
  language: Language!
  stem: String
  prefix: String
  suffix: String
  features: [String!]
  example: String
  comments: [Comment!]
}

type Lemma {
  id: ID!
  word: String!
  language: Language!
  principalParts: [String!]
  variants: [Lemma!] # Alternative versions of the lemma
  partOfSpeech: Feature!
  features: [Feature!] # The part of speech is not included in the features list
  comments: [Comment!]
}

type Lexeme {
  id: ID!
  lemma: Lemma!
  inflections: [Inflection!]
  meaning: DefinitionSet! # If there are no definitions, the DefinitionSet is empty
  comments: [Comment!]
}

type Word {
  id: ID!
  targetWord: String!
  lexemes: [Lexeme!]
}

# Input types
input AssertionInput {
  confidence: Int!
  dateTime: Date!
  authorID: ID!
}

input NegationInput {
  confidence: Int!
  dateTime: Date!
  authorID: ID!
}

input CommentInput {
  text: String!
  language: Language!
  dateTime: Date!
  authorID: ID!
}

input CommentReplyInput {
  commentID: ID!
  comment: CommentInput!
}

input LexemeCommentInput {
  lexemeID: ID!
  comment: CommentInput!
}

input DefinitionCommentInput {
  definitionID: ID!
  comment: CommentInput!
}

input DefinitionConnectionCommentInput {
  definitionConnectionID: ID!
  comment: CommentInput!
}

input DefinitionAssertionInput {
  definitionConnectionID: ID!
  assertion: AssertionInput!
}

input DefinitionNegationInput {
  definitionConnectionID: ID!
  negation: NegationInput!
}

# Mutations
type Mutation {
  # Creates a new comment and attaches it to the specified lexeme
  commentOnLexeme(input: LexemeCommentInput!): Comment
  # Creates a new assertion and attaches it to the specified definitionConnection
  assertDefinitionConnection(input: DefinitionAssertionInput!): Assertion
  # Creates a new negation and attaches it to the specified definitionConnection
  negateDefinitionConnection(input: DefinitionNegationInput!): Negation
  # Creates a new comment and attaches it to the specified definitionConnection
  commentOnDefinitionConnection(input: DefinitionConnectionCommentInput!): Comment
  # Creates a new comment and attaches it to the specified definition
  commentOnDefinition(input: DefinitionCommentInput!): Comment
  # Creates a new reply to an existing comment
  replyToComment(input: CommentReplyInput!): Comment
}
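
Note that the schema above declares a `Mutation` type but no `Query` type, while the GraphQL specification requires every schema to provide a query root operation type. A minimal sketch of one follows; the field names `lexeme` and `word` and their arguments are hypothetical, chosen only to show the shape:

```graphql
# Hypothetical query root: the field names and arguments are illustrative
# assumptions, not part of the agreed design.
type Query {
  # Returns a single lexeme by its identifier, or null if none exists
  lexeme(id: ID!): Lexeme
  # Returns the analysis of a target word in the given language
  word(targetWord: String!, language: Language!): Word
}
```

Because the root types would use the default names `Query` and `Mutation`, an explicit `schema { ... }` definition should not be needed.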
@balmas

balmas commented Jan 8, 2021

some initial thoughts:

For ResourceProvider, I'm not sure there is any difference between ID and URI. The URI is the unique identifier of the provider. In addition, I think the ResourceProvider needs to be part of the graph of the objects it provides, e.g. a Definition would have a ResourceProvider, as would an Inflection, etc.
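
One way to make the ResourceProvider part of the graph of the objects it provides is to give each provided type a reference back to its provider. A minimal sketch using type extensions against the schema above; the field name `provider` is a hypothetical choice:

```graphql
# Sketch: each provided object points back to the ResourceProvider it came from.
# The field name `provider` is an assumption.
extend type Definition {
  provider: ResourceProvider!
}

extend type Inflection {
  provider: ResourceProvider!
}
```

Making the field non-null assumes every definition and inflection can be attributed to a provider; if some cannot, a nullable `ResourceProvider` would be the safer choice.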

For Feature, I'm not sure about SortOrder; I don't think it should be required. I would also like to have a property on Feature that we can use to specify the ontology or schema the feature belongs to. The Feature names/values that we use now adhere to the AlpheiosLexicon schema, but we are going to need to be able to map to and/or support other ontologies (such as the Universal Dependencies tagset), so we should be explicit about that.

For Word, we need Language, and also the ability to optionally include context (prefix, suffix, source).
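
A sketch of these additions to Word, written as a type extension. The field names follow the comment above, but their types (in particular `URI` for `source`) are assumptions:

```graphql
extend type Word {
  # Language of the target word
  language: Language!
  # Optional context around the target word
  prefix: String   # text preceding the target word, if known
  suffix: String   # text following the target word, if known
  source: URI      # where the word was taken from, if known (type assumed)
}
```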

For Lexeme, I don't know if altLemmas belongs in the Lexeme graph. We need to be able to be explicit about the isLemmaVariant relationship between lemmas, so maybe this actually belongs in the Lemma graph? Also, I think the meaning (DefinitionSet) needs to be optional on a Lexeme because we might not have it in all cases (right now we often have that scenario, when we can't find the definition for a lemma).

For Lemma, I am wondering if we need to pull the part of speech out of features. A Lemma needs, at a minimum, the part of speech feature and may have other optional features.

I'm not sure about including the InflectionConstraints: those are mostly used for matching in the InflectionTables, and I'm not sure they belong here.

For Definition, we probably need both lemmaLanguage and definitionLanguage.

For Inflection, we should have a required Form: String property. Up until now we have been overloading the stem property, using it to hold the form when we can't identify the stem. Sometimes the stem and the form are one and the same when there isn't a suffix or prefix but sometimes that's not what is meant.
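
A sketch of the proposed required form property, added to the existing Inflection type via a type extension; the lowercase field name `form` is an assumption that follows the schema's naming style:

```graphql
extend type Inflection {
  # The complete inflected form, always present; `stem` stays nullable and
  # is set only when the stem can actually be identified
  form: String!
}
```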

For the Annotation, I'm not sure what we would have in the text field for the AnnotationTypes identified here so far. An assertion that a Definition validly belongs to a Lexeme is just that, and vice versa for a negation. The Comment is where someone would supply commentary text.

In addition for Annotation we're going to need a property for Confidence. I think it can be a Number.
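
GraphQL has no generic `Number` scalar: the built-in numeric scalars are `Int` (signed 32-bit) and `Float` (signed double-precision). The schema above uses `Int!` for `confidence`; if a fractional score turns out to be more useful, the input might look like this sketch (the 0 to 1 range is an assumption, not an agreed convention):

```graphql
input AssertionInput {
  confidence: Float!   # e.g. a score between 0 and 1 (assumed range)
  dateTime: Date!
  authorID: ID!
}
```

The same change would need to apply to `NegationInput`, `Assertion`, and `Negation` so that inputs and outputs agree.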

Also, I know we are focused on the Definition use case at the moment, but just a note that probably we are going to need a similar structure for Inflections (so that we can have annotations which assert or negate the relationship between a Lexeme and an Inflection, etc.)
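
A sketch of how the Definition structure could be mirrored for inflections. All names here (`InflectionConnection`, `InflectionAssertionInput`, `inflectionConnections`, `assertInflectionConnection`) are hypothetical, modeled on their Definition counterparts in the schema above:

```graphql
# Hypothetical counterpart of DefinitionConnection for inflections
type InflectionConnection {
  id: ID!
  inflections: [Inflection!]
  assertions: [Assertion!]
  negations: [Negation!]
  comments: [Comment!]
}

input InflectionAssertionInput {
  inflectionConnectionID: ID!
  assertion: AssertionInput!
}

extend type Lexeme {
  inflectionConnections: [InflectionConnection!]
}

extend type Mutation {
  # Creates a new assertion and attaches it to the specified inflectionConnection
  assertInflectionConnection(input: InflectionAssertionInput!): Assertion
}
```

Negations and comments would follow the same pattern as their DefinitionConnection mutations.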

@kirlat
Author

kirlat commented Jan 11, 2021

> For ResourceProvider, I'm not sure there is any difference between ID and URI. The URI is the unique identifier of the provider. In addition, I think the ResourceProvider needs to be part of the graph of the objects it provides, e.g. a Definition would have a ResourceProvider, as would an Inflection, etc.

Let's leave the uri field as an ID (we can rename it to id if the DB/GraphQL implementation requires that). Maybe we should also add an optional rights field? I'm not sure about translations of the rights text, or where they would come from.

> For Feature, I'm not sure about SortOrder; I don't think it should be required. I would also like to have a property on Feature that we can use to specify the ontology or schema the feature belongs to. The Feature names/values that we use now adhere to the AlpheiosLexicon schema, but we are going to need to be able to map to and/or support other ontologies (such as the Universal Dependencies tagset), so we should be explicit about that.

Agree that sort order may not be needed. We can apply it later (maybe even dynamically) by matching the feature name with the sort order.
What do you think might be a good name for the field specifying the schema? Would we adhere to any standard here, such as schema.org?

> For Word, we need Language, and also the ability to optionally include context (prefix, suffix, source).

Agree, will add those fields.

> For Lexeme, I don't know if altLemmas belongs in the Lexeme graph. We need to be able to be explicit about the isLemmaVariant relationship between lemmas, so maybe this actually belongs in the Lemma graph? Also, I think the meaning (DefinitionSet) needs to be optional on a Lexeme because we might not have it in all cases (right now we often have that scenario, when we can't find the definition for a lemma).

I'm for moving altLemmas to the Lemma object; in my opinion it would make more sense there. Also, maybe we can rename it to something like variants, without the word lemma, because it will be clear enough that it pertains to the lemma?

Regarding DefinitionSet: currently, if there are no definitions available, we attach an empty DefinitionSet to the meaning field of the Lexeme. As a result, the Lexeme always has a DefinitionSet object in the meaning field, even when that object is empty. Should we keep it the same in GraphQL?

> For Lemma, I am wondering if we need to pull the part of speech out of features. A Lemma needs, at a minimum, the part of speech feature and may have other optional features.

I think it would be good to separate the part of speech out of the features list. Would partOfSpeech be an appropriate name for the field? I'm afraid pos might be too ambiguous.

> I'm not sure about including the InflectionConstraints: those are mostly used for matching in the InflectionTables, and I'm not sure they belong here.

That's a good point, I will remove it.

> For Definition, we probably need both lemmaLanguage and definitionLanguage.

If we need both the lemma word and its language, maybe it's better to include a simple Lemma object with only the required fields (ID, word, and language)? I think the structure of the Definition would be simpler this way. What do you think?

> For Inflection, we should have a required Form: String property. Up until now we have been overloading the stem property, using it to hold the form when we can't identify the stem. Sometimes the stem and the form are one and the same when there isn't a suffix or prefix but sometimes that's not what is meant.

Will add that.

> For the Annotation, I'm not sure what we would have in the text field for the AnnotationTypes identified here so far. An assertion that a Definition validly belongs to a Lexeme is just that, and vice versa for a negation. The Comment is where someone would supply commentary text.

If we use Comment for the commentary text, then we don't need the text field within an Annotation. I will remove it.

> In addition for Annotation we're going to need a property for Confidence. I think it can be a Number.

Will add that.

> Also, I know we are focused on the Definition use case at the moment, but just a note that probably we are going to need a similar structure for Inflections (so that we can have annotations which assert or negate the relationship between a Lexeme and an Inflection, etc.)

I can add related fields to the schema.

I'm also working on adding some mutations for the Definition use cases.

@balmas

balmas commented Jan 11, 2021

> > For ResourceProvider, I'm not sure there is any difference between ID and URI. The URI is the unique identifier of the provider. In addition, I think the ResourceProvider needs to be part of the graph of the objects it provides, e.g. a Definition would have a ResourceProvider, as would an Inflection, etc.

> Let's leave the uri field as an ID (we can rename it to id if the DB/GraphQL implementation requires that). Maybe we should also add an optional rights field? I'm not sure about translations of the rights text, or where they would come from.

Yes, we probably need description and rights fields. Currently these are defined in the adapter config, but we will need to be able to fully define a resource provider in the data store.

> > For Feature, I'm not sure about SortOrder; I don't think it should be required. I would also like to have a property on Feature that we can use to specify the ontology or schema the feature belongs to. The Feature names/values that we use now adhere to the AlpheiosLexicon schema, but we are going to need to be able to map to and/or support other ontologies (such as the Universal Dependencies tagset), so we should be explicit about that.

> Agree that sort order may not be needed. We can apply it later (maybe even dynamically) by matching the feature name with the sort order.
> What do you think might be a good name for the field specifying the schema? Would we adhere to any standard here, such as schema.org?

Probably not schema.org, but yes, it's possible that at some point we would switch to Universal Dependencies (https://universaldependencies.org/format.html#morphological-annotation) or another standard such as Lexinfo (lexinfo.net).

For field name, I think schema works as well as anything.

> > For Lexeme, I don't know if altLemmas belongs in the Lexeme graph. We need to be able to be explicit about the isLemmaVariant relationship between lemmas, so maybe this actually belongs in the Lemma graph? Also, I think the meaning (DefinitionSet) needs to be optional on a Lexeme because we might not have it in all cases (right now we often have that scenario, when we can't find the definition for a lemma).

> I'm for moving altLemmas to the Lemma object; in my opinion it would make more sense there. Also, maybe we can rename it to something like variants, without the word lemma, because it will be clear enough that it pertains to the lemma?

Yes, agreed.

> Regarding DefinitionSet: currently, if there are no definitions available, we attach an empty DefinitionSet to the meaning field of the Lexeme. As a result, the Lexeme always has a DefinitionSet object in the meaning field, even when that object is empty. Should we keep it the same in GraphQL?

I guess that's fine. We should probably have a standard approach to this across the board (i.e., whether to use nullable fields or empty lists).
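
In SDL the two conventions differ only in where the `!` goes. A sketch using hypothetical type names for illustration:

```graphql
# Convention A: absence is expressed as null.
type LexemeNullable {
  meaning: DefinitionSet    # null when no definitions were found
  comments: [Comment!]      # null when there are no comments
}

# Convention B: absence is expressed as an empty object or an empty list.
type LexemeEmpty {
  meaning: DefinitionSet!   # always present, possibly with no definitions
  comments: [Comment!]!     # always a list, possibly []
}
```

The schema above currently mixes the two: `meaning: DefinitionSet!` follows convention B, while list fields such as `comments: [Comment!]` follow convention A.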

> > For Lemma, I am wondering if we need to pull the part of speech out of features. A Lemma needs, at a minimum, the part of speech feature and may have other optional features.

> I think it would be good to separate the part of speech out of the features list. Would partOfSpeech be an appropriate name for the field? I'm afraid pos might be too ambiguous.

partOfSpeech is fine.
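
With partOfSpeech as its own field, a client can select it separately from the remaining features. A sketch of a fragment against the Lemma type above; the fragment name is hypothetical, and it would be spread into a query once the schema has a query root:

```graphql
fragment LemmaSummary on Lemma {
  word
  language
  # Selected on its own; per the schema comment it is not repeated in `features`
  partOfSpeech {
    schema
    type
    value {
      value
    }
  }
  features {
    schema
    type
    value {
      value
    }
  }
  # Alternative versions of the lemma
  variants {
    word
  }
}
```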

> > For Definition, we probably need both lemmaLanguage and definitionLanguage.

> If we need both the lemma word and its language, maybe it's better to include a simple Lemma object with only the required fields (ID, word, and language)? I think the structure of the Definition would be simpler this way. What do you think?

Yes.
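
With a nested Lemma, the language of the definition and the language of the lemma become two separate fields. A fragment sketch (the fragment name is hypothetical) that selects both:

```graphql
fragment DefinitionWithLemma on Definition {
  text
  format
  language       # language the definition is written in
  lemma {
    id
    word
    language     # language of the lemma being defined
  }
}
```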

@balmas

balmas commented Jan 11, 2021

Thanks.

I think we still should make sortOrder on Feature optional.

Regarding the mutations, I understand the recommendation from the cited article (https://www.apollographql.com/blog/designing-graphql-mutations-e09de826ed97/) (and others I've read) to suggest that mutations should have single, nested inputs and outputs, as in

```graphql
commentOnLexeme(input: {
  lexemeID: ID!
  comment: CommentInput!
}) {
  comment: Comment!
}
```
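
Written out as SDL, the pattern in the snippet above might look like the following sketch. The names `CommentOnLexemeInput` and `CommentOnLexemePayload` are hypothetical, and this `Mutation` type would replace the current `commentOnLexeme` signature rather than extend it:

```graphql
# A single input object in...
input CommentOnLexemeInput {
  lexemeID: ID!
  comment: CommentInput!
}

# ...and a payload object out, which can gain fields later without
# changing the mutation's return type
type CommentOnLexemePayload {
  comment: Comment!
}

type Mutation {
  commentOnLexeme(input: CommentOnLexemeInput!): CommentOnLexemePayload
}
```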

@kirlat
Author

kirlat commented Jan 11, 2021

> I think we still should make sortOrder on Feature optional.

I removed it from the Feature, but left it within the FeatureValue. Should I remove it from there too? Would we not need it there too? What do you think?

> Regarding the mutations, I understand the recommendation from the cited article (https://www.apollographql.com/blog/designing-graphql-mutations-e09de826ed97/) (and others I've read) to suggest that mutations should have single, nested inputs and outputs

That's correct, but I was not sure whether to follow it. It seemed a little too radical to me, and it felt at odds with the ideas behind the SDL, which allows multiple input parameters. I agree that it's good to keep the number of parameters to a minimum, but I'm not sure we should always use only one: wrapping several variables so they can be presented as a single argument seemed to add extra verbosity. I also don't see any benefit for versioning, because for that we can create a new mutation. Some other guides I've seen use several mutation parameters: https://www.apollographql.com/docs/apollo-server/schema/schema/#designing-mutations. The GitLab GraphQL API style guide also uses multiple arguments: https://docs.gitlab.com/ee/development/api_graphql_styleguide.html#arguments. So it does not seem to be a clear-cut choice. I don't have a strong opinion on this, nor do I have much experience with GraphQL, so I try to keep an open mind. What do you think? Would it be better for us to always use a single input variable?
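
For comparison, a sketch of the multiple-argument style from the linked guides applied to the same mutation. The argument names are taken from LexemeCommentInput; the signature itself is hypothetical:

```graphql
type Mutation {
  # Each value is a separate top-level argument; no wrapper input type
  commentOnLexeme(lexemeID: ID!, comment: CommentInput!): Comment
}
```

The trade-off shows up at the call site: with a single input object the client binds one variable, while here each argument is bound separately.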

@balmas

balmas commented Jan 11, 2021

> I removed it from the Feature, but left it within the FeatureValue. Should I remove it from there too? Would we not need it there too? What do you think?

The thing about sortOrder is that it doesn't make sense in the context of a single feature. It belongs to the display domain rather than the data. Its presence in the morphology service output is really a legacy thing.

@balmas

balmas commented Jan 11, 2021

> > Regarding the mutations, I understand the recommendation from the cited article (https://www.apollographql.com/blog/designing-graphql-mutations-e09de826ed97/) (and others I've read) to suggest that mutations should have single, nested inputs and outputs

> That's correct, but I was not sure whether to follow it. It seemed a little too radical to me, and it felt at odds with the ideas behind the SDL, which allows multiple input parameters. I agree that it's good to keep the number of parameters to a minimum, but I'm not sure we should always use only one: wrapping several variables so they can be presented as a single argument seemed to add extra verbosity. I also don't see any benefit for versioning, because for that we can create a new mutation. Some other guides I've seen use several mutation parameters: https://www.apollographql.com/docs/apollo-server/schema/schema/#designing-mutations. The GitLab GraphQL API style guide also uses multiple arguments: https://docs.gitlab.com/ee/development/api_graphql_styleguide.html#arguments. So it does not seem to be a clear-cut choice. I don't have a strong opinion on this, nor do I have much experience with GraphQL, so I try to keep an open mind. What do you think? Would it be better for us to always use a single input variable?

I have read a number of things which support the nesting concept. But as with everything, there are always multiple perspectives. I think we'll see what works best for us as we go.

@kirlat
Author

kirlat commented Jan 12, 2021

> I have read a number of things which support the nesting concept. But as with everything, there are always multiple perspectives. I think we'll see what works best for us as we go.

Then I think we'd better use the nested input. If it doesn't work for us, we can switch to using multiple arguments.
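
With the nested input, a client can pass the whole input object as a single variable. A sketch of such an operation against the current `commentOnLexeme` mutation; the operation name and the selected fields are illustrative:

```graphql
mutation CommentOnLexeme($input: LexemeCommentInput!) {
  commentOnLexeme(input: $input) {
    id
    text
    language
    dateTime
    author {
      id
      nickname
    }
  }
}
```

The variables object then has a single `input` key holding `lexemeID` and a nested `comment` object.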

I've also removed the sortOrder field.
