Based on Speech and Language Processing (3rd edition draft) by Daniel Jurafsky et al.
Word sense:
A discrete representation of one aspect of the meaning of a word.
The meaning of a word can be defined by its co-occurrences, the counts of words that often occur nearby => Word embedding models like Word2Vec or GloVe.
- Using glosses -- a translation or explanation of a word or expression;
- Defining a sense through its relationship with other senses.
Discrete senses:
We might consider two senses discrete if they have independent truth conditions, different syntactic behavior, and independent sense relations, or if they exhibit antagonistic meanings.
One practical technique for determining if two senses are distinct is to conjoin two uses of a word in a single sentence. (This kind of conjunction of antagonistic readings is called zeugma)
e.g. Given three sentences:
- Which of those flights serve breakfast?
- Does Air France serve Philadelphia?
- 【?】Does Air France serve breakfast and Philadelphia?
For educational purposes, dictionaries tend to capture subtle meaning differences and use many fine-grained senses; for computational purposes, we often group or cluster senses instead.
Synonym:
Two senses of two different words (lemmas) are synonyms when they are identical or nearly identical in meaning.
Antonym:
Antonyms are words with an opposite meaning.
Note: Automatically distinguishing synonyms from antonyms can be difficult, because although antonyms differ completely with respect to one aspect of their meaning (position on a scale or direction), they are otherwise very similar, sharing almost all other aspects of meaning.
Hyponym & Hypernym:
A word (or sense) is a hyponym of another word (or sense) if the first is more specific, denoting a subclass of the other.
e.g.
- "dog" is a hyponym of "animal"
- "animal" is a hypernym of "dog"
Since hyponym and hypernym are easily confused, the terms superordinate & subordinate are often used instead.
Meronym & Holonym:
Meronymy represents the part-whole relation.
e.g.
- "wheel" is a meronym of "car"
- "car" is a holonym of "wheel"
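These relations can be represented as simple directed links; a toy sketch (the tiny lexicon here is illustrative, not a real resource like WordNet):

```python
# Toy sketch of the IS-A (hypernym) and PART-OF (meronym) relations.
# The lexicon below is made up for illustration.

hypernym_of = {"dog": "mammal", "mammal": "animal"}  # IS-A links
meronym_of = {"wheel": "car"}                        # PART-OF links

def hypernyms(word):
    """Walk up the IS-A chain to collect every superordinate."""
    chain = []
    while word in hypernym_of:
        word = hypernym_of[word]
        chain.append(word)
    return chain

print(hypernyms("dog"))  # ['mammal', 'animal']: "dog" is a hyponym of both
```

Note that hypernymy is transitive, so "dog" is (indirectly) a hyponym of "animal" as well.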
Structured Polysemy:
We call the relationship between semantically related senses of a word structured polysemy.
e.g. "bank" could represent:
- An organization
- The building associated with an organization
Metonymy:
The use of one aspect of a concept or entity to refer to other aspects of the entity or to the entity itself.
WordNet is a lexical database, and the English WordNet contains three databases, one each for nouns and verbs, and a third for adjectives and adverbs. (Closed class words are not included)
Input:
A word in context and a fixed inventory of potential word senses
Output:
The correct word sense in context
Lexical sample task:
Given a small pre-selected set of target words and an inventory of senses for each word from the lexicon, disambiguate occurrences of those target words in a corpus.
All-words task:
Given an entire text and a lexicon with an inventory of senses for each entry, disambiguate all words in the text.
Semantic concordance:
A corpus in which each open-class word in each sentence is labeled with its word sense from a specific dictionary or thesaurus, most often WordNet.
- Supervised machine learning
- Unsupervised machine learning
- Thesaurus / Dictionary-based techniques
- Selectional association
- Lightly supervised
- Compute F1 score against hand-labeled sense tags in a held-out set, such as the SemCor corpus or SemEval corpora;
- Another strong baseline is majority vote;
- One sense per discourse: A word appearing multiple times in a text or discourse often appears with the same sense. It works better for coarse-grained senses and particularly for cases of homonymy rather than polysemy.
1-nearest-neighbor algorithm
Training
- Embed each token in a sense-labeled training corpus
- Average the embeddings of all tokens of each sense of each word to produce a sense embedding
Testing
- Compare test embedding with training embeddings
- Return sense of the nearest neighbor based on a similarity metric such as cosine
Note: For unseen test words, we could
- Fall back to the Most Frequent Sense baseline (majority vote);
- Impute the missing sense embeddings via WordNet taxonomy and supersenses.
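The training and testing steps above can be sketched as follows (the toy 2-d vectors stand in for real contextual token embeddings, and the sense names for "bank" are illustrative):

```python
import math

# Sketch of 1-NN WSD: average the token vectors of each sense into a
# sense embedding, then label a test token with the sense of its
# nearest neighbor under cosine similarity.

def build_sense_embeddings(labeled_tokens):
    """labeled_tokens: iterable of (sense_label, token_vector)."""
    sums, counts = {}, {}
    for sense, vec in labeled_tokens:
        if sense not in sums:
            sums[sense], counts[sense] = [0.0] * len(vec), 0
        sums[sense] = [s + x for s, x in zip(sums[sense], vec)]
        counts[sense] += 1
    return {s: [x / counts[s] for x in sums[s]] for s in sums}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def predict_sense(token_vec, sense_embeddings):
    """Return the sense of the nearest sense embedding by cosine."""
    return max(sense_embeddings,
               key=lambda s: cosine(token_vec, sense_embeddings[s]))

train = [("bank.financial", [1.0, 0.1]),
         ("bank.financial", [0.9, 0.2]),
         ("bank.river", [0.1, 1.0])]
senses = build_sense_embeddings(train)
print(predict_sense([0.8, 0.3], senses))  # bank.financial
```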
A simple representation for each instance of a target word
Collocational features:
Features about words at specific positions near target word.
e.g. ... guitar and bass player stand ...
=> [guitar, NN, and, CC, player, NN, stand, VB]
Bag-of-words features:
Features about words that occur anywhere in the window (regardless of position)
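The two feature types can be sketched for the example above, with "bass" as the target word (the POS tags come from the example; a real system would run a tagger):

```python
# Sketch of collocational vs. bag-of-words features for WSD.

def collocational_features(words, tags, i):
    """Words and POS tags at fixed positions (-2, -1, +1, +2)
    relative to the target word at index i."""
    feats = []
    for offset in (-2, -1, 1, 2):
        j = i + offset
        if 0 <= j < len(words):
            feats += [words[j], tags[j]]
    return feats

def bag_of_words_features(words, i, vocab, window=2):
    """Binary indicators: does each vocab word occur anywhere in the
    window around index i, regardless of position?"""
    context = set(words[max(0, i - window):i] + words[i + 1:i + 1 + window])
    return [1 if w in context else 0 for w in vocab]

words = ["guitar", "and", "bass", "player", "stand"]
tags = ["NN", "CC", "NN", "NN", "VB"]
print(collocational_features(words, tags, 2))
# ['guitar', 'NN', 'and', 'CC', 'player', 'NN', 'stand', 'VB']
print(bag_of_words_features(words, 2, ["guitar", "player", "piano"]))
# [1, 1, 0]
```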
Most frequent sense, one sense per discourse, Lesk algorithm, ...
- Assume we have some sense-labeled data (like SemCor);
- Take all the sentences with the relevant word sense;
- Now add these to the gloss + examples for each sense, call it the "signature" of a sense;
- Choose sense with most word overlap between context and signature.
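A minimal sketch of this overlap-based (simplified/extended Lesk) scoring; the signatures for "bank" below are made-up toy data, not real WordNet glosses:

```python
# Lesk-style WSD: score each sense by word overlap between the
# context and the sense's "signature" (gloss + examples + any
# sense-labeled sentences), then pick the highest-scoring sense.

STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "is", "that"}

def overlap_score(context, signature):
    return len((set(context) - STOPWORDS) & (set(signature) - STOPWORDS))

def lesk(context_words, signatures):
    """signatures: dict mapping sense -> list of signature words."""
    return max(signatures,
               key=lambda s: overlap_score(context_words, signatures[s]))

signatures = {
    "bank.financial": "a financial institution that accepts deposits".split(),
    "bank.river": "sloping land beside a body of water".split(),
}
print(lesk("he sat on the river bank beside the water".split(), signatures))
# bank.river
```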
If we don't have enough data to train a system:
- Pick a word that might co-occur with the target word in a particular sense;
- Grep through the corpus for the target word and the hypothesized word;
- Assume that the hypothesized sense tag is the right one;
- Generalize from a small hand-labeled seed set.
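These bootstrapping steps can be sketched as follows (the cue words for "bass" are hypothesized seeds in the style of Yarowsky bootstrapping; the names and tiny corpus are toy assumptions):

```python
# Seed-based bootstrapping: grep for the target word together with a
# hypothesized cue word, assume the cue's sense is correct, and use
# the hits as a labeled seed set to generalize from.

seeds = {"fish": "bass.fish", "play": "bass.music"}  # hypothesized cues

def bootstrap_label(sentences, target="bass"):
    labeled = []
    for sent in sentences:
        words = sent.lower().split()
        if target in words:
            for cue, sense in seeds.items():
                if cue in words:
                    labeled.append((sent, sense))
    return labeled  # small hand-seeded training set

corpus = ["We caught a huge bass fish yesterday",
          "She will play bass in the band"]
print(bootstrap_label(corpus))
```

In a real system the seed set would then train a classifier whose confident predictions label more data, iterating until convergence.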
Static word embeddings have a problem with antonyms. For example, "expensive" is often very close in embedding cosine to its antonym "cheap".
To improve both static and contextual word embeddings, we have two families of solutions:
- Retraining: Modify the static embedding loss function for Word2Vec, or modify contextual embedding training;
- Retrofitting / Counterfitting: After embeddings are trained, use a thesaurus to learn a second mapping that shifts antonyms apart and synonyms closer.
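A toy illustration of the problem and of a single counterfitting-style repulsion step (the vectors and step size are made up; real counterfitting optimizes a combined objective over synonym and antonym constraints from a thesaurus):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# Distributional vectors: antonyms occur in similar contexts,
# so their embeddings end up close together.
expensive = [0.9, 0.8, 0.1]
cheap = [0.85, 0.75, 0.2]
before = cosine(expensive, cheap)

# One repulsion step: push the antonym pair apart along their difference.
step = 0.5
diff = [x - y for x, y in zip(expensive, cheap)]
expensive = [x + step * d for x, d in zip(expensive, diff)]
cheap = [y - step * d for y, d in zip(cheap, diff)]
after = cosine(expensive, cheap)
print(before > after)  # True: the pair is now less similar
```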
To disambiguate a particular token t of w we again have three steps:
Thematic roles are a way to capture the semantic commonality between event participants such as "breakers" and "eaters"; below are some commonly used thematic roles and their examples:
Thematic grid/Case frame: The set of thematic role arguments taken by a verb.
e.g. Possible realization of arguments of the verb "break":
- AGENT, THEME
- AGENT, THEME, INSTRUMENT
- INSTRUMENT, THEME
- THEME
Verb alternations/Diathesis alternations: Sometimes verbs can realize the same arguments in different ways, and we call these multiple argument structure realizations verb alternations or diathesis alternations.
- It is difficult to come up with a standard set of roles, and equally difficult to produce a formal definition of roles like AGENT, THEME, or INSTRUMENT; (e.g. There seem to be at least two kinds of INSTRUMENTS)
- We would like to reason about and generalize across semantic roles, but the finite discrete lists of roles don't let us do this;
- It is difficult to formally define the thematic roles.
There are alternative semantic role models that use either many fewer or many more roles.
- Define generalized semantic roles that abstract over the specific thematic roles;
- Define semantic roles that are specific to a particular verb or a particular group of semantically related verbs or nouns.
Proposition bank/PropBank: A resource of sentences annotated with semantic roles.
In general:
- arg0 - PROTO-AGENT
- arg1 - PROTO-PATIENT
- arg2 - The benefactive, instrument, attribute, or end state
- arg3 - The start point
- arg4 - The end point
PropBank focuses on verbs, while NomBank adds annotations for noun predicates.
Frame: The holistic background knowledge that unites groups of words. A frame in FrameNet is a background knowledge structure that defines a set of frame-specific roles, called frame elements.
Core roles: Semantic roles that are frame specific.
Non-core roles: Semantic roles which are more like the Arg-M arguments in PropBank, expressing more general properties of time, location, and so on.
FrameNet also codes relationships between frames, allowing frames to inherit from each other, or representing relations between frames like causation.
Semantic role labeling (SRL) is the task of automatically finding the semantic roles of each argument of each predicate in a sentence.
- FrameNet employs many frame-specific frame elements as roles;
- PropBank uses a smaller number of numbered argument labels that can be interpreted as verb specific labels, along with the more general ARGM labels.
- A useful shallow semantic representation
- Improves downstream NLP tasks (like machine translation and question answering)
- Pruning;
- Identification;
- Classification
There is a common final stage that enforces global consistency, since the algorithm classifies everything locally -- each decision about a constituent is made independently of all the others.
To do the joint inference, we could rerank labels:
- The first stage produces multiple possible labels for each constituent
- The second stage chooses the best global label assignment for all constituents
- Each argument label must be assigned to exactly the correct word sequence or parse constituent
- We could use precision, recall and F-measure
- Two commonly used datasets for evaluation are CoNLL-2005 and CoNLL-2012
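Exact-match scoring can be sketched as follows (the span representation is an assumption for illustration):

```python
# Exact-match SRL scoring: an argument counts as correct only if both
# its span and its label match the gold annotation exactly.

def srl_scores(gold, predicted):
    """gold, predicted: sets of (start, end, label) argument spans."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {(0, 1, "ARG0"), (3, 5, "ARG1")}
pred = {(0, 1, "ARG0"), (3, 4, "ARG1")}  # wrong span for ARG1
print(srl_scores(gold, pred))  # (0.5, 0.5, 0.5)
```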
Problem: Consider the two interpretations of "I want to eat someplace nearby":
- "someplace nearby" is a location adjunct of "eat" (the intended reading);
- "someplace nearby" is the direct object (THEME) of "eat", as if the speaker were Godzilla eating a location.
We could add a new term to the representation:
However, it has two problems:
- Using FOL to perform the simple task of enforcing selectional restrictions is overkill;
- This approach presupposes a large, logical knowledge base of facts about the concepts that make up selectional restrictions.
A more practical approach is to state selectional restrictions in terms of WordNet synsets rather than as logical concepts.
- Kullback-Leibler divergence: A measure of the difference between two probability distributions
- Selectional preference: How much information the verb expresses about the semantic class of its argument
- Selectional association of a verb with a class: The relative contribution of the class to the general preference of the verb
To compute selectional association:
- A probabilistic measure of the strength of association between a predicate and a semantic class of its argument
- A model represents the association of predicate v with a noun n
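One concrete formulation (Resnik's, as presented in Jurafsky & Martin) defines selectional preference strength as the KL divergence between the class distribution conditioned on the verb and the prior class distribution, and selectional association as a class's relative contribution to that strength:

```latex
S_R(v) = D\big(P(c \mid v) \,\|\, P(c)\big)
       = \sum_{c} P(c \mid v) \log \frac{P(c \mid v)}{P(c)}

A_R(v, c) = \frac{1}{S_R(v)} \, P(c \mid v) \log \frac{P(c \mid v)}{P(c)}
```

Here $P(c \mid v)$ is the probability of noun class $c$ appearing as the argument of predicate $v$, and $P(c)$ is its prior probability.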
To evaluate:
- Pseudowords
- Compare to human preferences
Primitive decomposition/Componential analysis is the idea of decomposing meaning into sets of primitive semantic elements or features.
- Discourse structure
- Rhetorical structure
- Entity structure
When a referent is first mentioned in a discourse, a representation is evoked in the model.
- Lexical factors
- Reference type: Inferrability, discontinuous set, generics, one anaphora, pronouns,...
- Discourse factors
- Recency
- Focus/Topic structure, digression
- Repeated mention
- Syntactic factors
- Agreement: Gender, number, person, case
- Parallel construction
- Grammatical role
- Semantic/Lexical factors
- Selectional restrictions
- Verb semantics, thematic role
Finding in a text all the referring expressions that have one and the same denotation.
- Input: Text
- Output: All entities and the coreference links between them (create clusters)
This is the first stage of coreference: finding the spans of text that constitute each mention.
- Input: A candidate anaphor and a candidate antecedent
- Output: Probabilistic binary decision about coreference
- Machine learning supervised classifiers
- Need a heuristic for sampling training examples due to class imbalance
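A common sampling heuristic (in the style of Soon et al.'s mention-pair training, as usually described) is sketched below; the mention ids and gold clusters are toy data:

```python
# For each anaphor, the closest preceding gold antecedent yields one
# positive pair, and every mention in between yields a negative pair.
# This limits the otherwise huge number of negative pairs.

def make_training_pairs(mentions, cluster_of):
    """mentions: mention ids in textual order.
    cluster_of: dict mapping mention id -> gold cluster id."""
    pairs = []
    for j, m in enumerate(mentions):
        # find the closest preceding mention in the same gold cluster
        antecedent = None
        for i in range(j - 1, -1, -1):
            if cluster_of[mentions[i]] == cluster_of[m]:
                antecedent = i
                break
        if antecedent is None:
            continue
        pairs.append((mentions[antecedent], m, 1))  # positive pair
        for i in range(antecedent + 1, j):
            pairs.append((mentions[i], m, 0))       # negative pairs
    return pairs

mentions = ["A", "B", "C"]
gold = {"A": 1, "B": 2, "C": 1}
print(make_training_pairs(mentions, gold))  # [('A', 'C', 1), ('B', 'C', 0)]
```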