Skip to content

Instantly share code, notes, and snippets.

@wenkokke
Last active December 21, 2015 13:09
Show Gist options
  • Save wenkokke/6310866 to your computer and use it in GitHub Desktop.
Save wenkokke/6310866 to your computer and use it in GitHub Desktop.
An attempt to encode Combinatory Categorial Grammar in Agda... which isn't as pretty as I'd like, but this is probably due to CCG. ^^
# Encoding CCGs in Agda.
Bla bla bla.
First attempt using type system to check validity of sentences. But sentences may be ambiguous, and
type raising cannot be properly handled due to the implementation of Agda implicits.
Since this blog post is literate Agda, we'll have some administrive business to get out of the way.
\begin{code}
module CCG where
open import Function using (_$_; _∘_; id; const; flip)
open import Category.Monad
open import Data.Bool using (Bool; true; false; _∧_; _∨_; not)
open import Data.Nat using (ℕ; zero; suc)
open import Data.Fin using (Fin; zero; suc; raise)
open import Data.Vec using (Vec; []; _∷_) renaming (map to vmap; foldr to vfoldr)
import Data.String as String using (String; _++_)
open String using (String)
import Data.List as List using (List; _∷_; []; [_]; _++_; foldr; any; all; map; concatMap; monadPlus)
open List using (List; _∷_; [])
open import Data.Empty using (⊥)
open import Data.Unit using (⊤)
open import Data.Product using (Σ; _,_; proj₁; proj₂; ∃)
open import Relation.Nullary using (yes; no)
open import Relation.Binary using (Decidable)
open import Relation.Binary.PropositionalEquality using (_≡_; refl; cong)
\end{code}
Furthermore, we'll need an implementation of the list monad. Since Agda's support
for typeclasses is currently still lacking and the implementations in the standard
library are a bit obtruse, we'll implement them below.
\begin{code}
listMonadPlus = List.monadPlus
open RawMonadPlus {{...}}
\end{code}
Now, without further ado, an implementation of CCG in Agda.
We can define a datatype of tectogrammatical types as follows.
\begin{code}
infixl 7 _//_
infixr 7 _\\_
data Tecto : Set where
S : Tecto
N : Tecto
NP : Tecto
_\\_ : Tecto → Tecto → Tecto
_//_ : Tecto → Tecto → Tecto
\end{code}
And define a few commonly used aliases. Note that the difference
types `GQ₀` and `GQ₁` are required for the use of generalized
quantifiers in subject versus object position (and one more deinition)
would be needed for usage in the second object position.
\begin{code}
VP : Tecto
VP = NP \\ S
TV : Tecto
TV = VP // NP
DTV : Tecto
DTV = TV // NP
Q₀ : Tecto
Q₀ = S // VP
Q₁ : Tecto
Q₁ = TV \\ VP
GQ₀ : Tecto
GQ₀ = Q₀ // N
GQ₁ : Tecto
GQ₁ = Q₁ // N
\end{code}
We can prove that for the set of tectogrammatical types, equality is decidable.
To do this, we will need a set of elimination rules for the more complex types.
\begin{code}
elimr-\\ : Tecto → Tecto
elimr-\\ (_ \\ b) = b
elimr-\\ a = a
eliml-\\ : Tecto → Tecto
eliml-\\ (a \\ _) = a
eliml-\\ a = a
elimr-// : Tecto → Tecto
elimr-// (_ // a) = a
elimr-// a = a
eliml-// : Tecto → Tecto
eliml-// (b // _) = b
eliml-// a = a
\end{code}
Using these rules, we can construct an instance of decidable equality
for the Tecto datatype.
\begin{code}
_≟_ : Decidable {A = Tecto} _≡_
S ≟ S = yes refl
S ≟ N = no (λ ())
S ≟ NP = no (λ ())
S ≟ (_ \\ _) = no (λ ())
S ≟ (_ // _) = no (λ ())
N ≟ N = yes refl
N ≟ S = no (λ ())
N ≟ NP = no (λ ())
N ≟ (_ \\ _) = no (λ ())
N ≟ (_ // _) = no (λ ())
NP ≟ NP = yes refl
NP ≟ S = no (λ ())
NP ≟ N = no (λ ())
NP ≟ (_ \\ _) = no (λ ())
NP ≟ (_ // _) = no (λ ())
(_ \\ _) ≟ S = no (λ ())
(_ \\ _) ≟ N = no (λ ())
(_ \\ _) ≟ NP = no (λ ())
(_ \\ _) ≟ (_ // _) = no (λ ())
(a \\ b) ≟ ( c \\ d) with (a ≟ c) | (b ≟ d)
(a \\ b) ≟ (.a \\ .b) | yes refl | yes refl = yes refl
(a \\ b) ≟ ( c \\ d) | no ¬ac | _ = no (¬ac ∘ cong eliml-\\)
(a \\ b) ≟ ( c \\ d) | _ | no ¬db = no (¬db ∘ cong elimr-\\)
(_ // _) ≟ S = no (λ ())
(_ // _) ≟ N = no (λ ())
(_ // _) ≟ NP = no (λ ())
(_ // _) ≟ (_ \\ _) = no (λ ())
(b // a) ≟ ( d // c) with (a ≟ c) | (b ≟ d)
(b // a) ≟ (.b // .a) | yes refl | yes refl = yes refl
(b // a) ≟ ( d // c) | no ¬ac | _ = no (¬ac ∘ cong elimr-//)
(b // a) ≟ ( d // c) | _ | no ¬db = no (¬db ∘ cong eliml-//)
\end{code}
Using our definition of tectogrammatical types we can define denotations
as being a pair of a tectogrammatical type and an interpretation function
over that type, storing only a value of the interpreted type.
\begin{code}
record Den (I : Tecto → Set) (T : Tecto) : Set where
constructor den
field
intp : I T
\end{code}
Because we cannot use the type system to check the validity of our (possibly
ambious) sentences, we hide our representations in a datatype that pairs
a tectogrammatical type with a denotation for that type.
Furthermore, to encode the possibility of faillure, we shall wrap this pair
in a monad. For this we have several options: using the maybe monad will give
us the first valid derivation, if any. In this case we will use the list monad,
which computes the list of *all possible derivations*.
\begin{code}
Repr : (Tecto → Set) → Set
Repr I = List (∃ I)
repr : ∀ {Intp} → (T : Tecto) → Intp T → Repr (Den Intp)
repr T t = return (T , den t)
reprs : ∀ {Intp} → List (Σ Tecto Intp) → Repr (Den Intp)
reprs rs = List.concatMap (λ x → repr (proj₁ x) (proj₂ x)) rs
\end{code}
Now we define a "typeclass" for composition of representations, which should be
instantiated for every interpretation.
\begin{code}
record Composable (Intp : Tecto → Set) : Set where
constructor withRules
infixr 5 _⊕_
field
rules : List ((A B : Tecto) → Intp A → Intp B → Repr Intp)
\end{code}
And we can order all these rules of composition using the alternative
combinator `∣`, and use this to define a composition function from
intepretations to (fallible) representations.
\begin{code}
compose : (A B : Tecto) → Intp A → Intp B → Repr Intp
compose A B a b = List.foldr (λ r rs → rs ∣ r A B a b) ∅ rules
_⊕_ : Repr (Den Intp) → Repr (Den Intp) → Repr (Den Intp)
_⊕_ ma mb with ma >>= λ ja →
mb >>= λ jb →
let A = proj₁ ja
B = proj₁ jb
a = Den.intp (proj₂ ja)
b = Den.intp (proj₂ jb)
in compose A B a b
_⊕_ _ _ | mx = (λ x → (proj₁ x) , (den $ proj₂ x)) <$> mx
open Composable {{...}} using (_⊕_)
\end{code}
Let's try to write a simple interpretation now.
\begin{code}
apply-// : ∀ {I}
→ (∀ {A B} → I (B // A) → I A → I B)
→ (S T : Tecto) → I S → I T → Repr I
apply-// _ (B // A₀) A₁ f x with A₀ ≟ A₁
apply-// ap (B // A ) .A f x | yes refl = return (B , ap f x)
apply-// _ (_ // _ ) _ _ _ | no ¬p = ∅
apply-// _ _ _ _ _ = ∅
apply-\\ : ∀ {I}
→ (∀ {A B} → I A → I (A \\ B) → I B)
→ (S T : Tecto) → I S → I T → Repr I
apply-\\ _ A₀ (A₁ \\ B) x f with A₀ ≟ A₁
apply-\\ ap .A (A \\ B) x f | yes refl = return (B , ap x f)
apply-\\ _ _ (_ \\ _) _ _ | no ¬p = ∅
apply-\\ _ _ _ _ _ = ∅
compose-// : ∀ {I}
→ (∀ {A B C} → I (C // B) → I (B // A) → I (C // A))
→ (S T : Tecto) → I S → I T → Repr I
compose-// _ (C // B₀) ( B₁ // A) g f with B₀ ≟ B₁
compose-// cm (C // B ) (.B // A) g f | yes refl = return (C // A , cm g f)
compose-// _ (_ // _ ) ( _ // _) _ _ | no ¬p = ∅
compose-// _ _ _ _ _ = ∅
compose-\\ : ∀ {I}
→ (∀ {A B C} → I (A \\ B) → I (B \\ C) → I (A \\ C))
→ (S T : Tecto) → I S → I T → Repr I
compose-\\ _ (A \\ B₀) ( B₁ \\ C) f g with B₀ ≟ B₁
compose-\\ cm (A \\ B ) (.B \\ C) f g | yes refl = return (A \\ C , cm f g)
compose-\\ _ (_ \\ _ ) ( _ \\ _) _ _ | no ¬p = ∅
compose-\\ _ _ _ _ _ = ∅
\end{code}
In CCG, the phenological interpretation of any tectogrammatical type
is always the string type.
Below we will define the phenological interpretation, and prove its
correctness with respect to the `Pheno?` predicate.
This (trivial) correctness proof shows that `Pheno.Intp` maps every
tectogrammatical type to a phenogrammatical type.
\begin{code}
module Pheno where
Intp : Tecto → Set
Intp _ = String
Pheno : Tecto → Set
Pheno = Den Intp
data Pheno? : Set → Set₁ where
ST : Pheno? String
correctness : ∀ {T} → Pheno? (Intp T)
correctness = ST
\end{code}
Now we define several rules of composition. Of the six rules of CCG
we will define two (we'll leave type raising for a later time).
Most of these rules are simply restricted versions of string concatination.
\begin{code}
_∙_ : String → String → String
_∙_ a b = String._++_ a (String._++_ " " b)
rules : List ((A B : Tecto) → Intp A → Intp B → Repr Intp)
rules = apply-// _∙_
∷ apply-\\ _∙_
∷ compose-// _∙_
∷ compose-\\ _∙_
∷ []
Composable-Pheno : Composable Intp
Composable-Pheno = withRules rules
\end{code}
Finally we'll import the `Pheno` type and the composablility instance
into scope.
\begin{code}
open Pheno using (Pheno; Composable-Pheno)
\end{code}
\begin{code}
module Sem where
Intp : ℕ → Tecto → Set
Intp _ S = Bool
Intp n N = Fin n → Bool
Intp n NP = Fin n
Intp n (A \\ B) = Intp n A → Intp n B
Intp n (B // A) = Intp n A → Intp n B
data Sem? : Set → Set₁ where
E : ∀ {n} → Sem? (Fin n)
T : Sem? Bool
AP : ∀ {A B} → Sem? A → Sem? B → Sem? (A → B)
correctness : ∀ {n} {T} → Sem? (Intp n T)
correctness {_} {S} = T
correctness {n} {N} = AP E T
correctness {n} {NP} = E
correctness {n} {a \\ b} = AP (correctness {n} {a}) (correctness {n} {b})
correctness {n} {b // a} = AP (correctness {n} {a}) (correctness {n} {b})
\end{code}
\begin{code}
Sem : ℕ → Tecto → Set
Sem n = Den (Intp n)
\end{code}
\begin{code}
rules : ∀ {n} → List ((A B : Tecto) → Intp n A → Intp n B → Repr (Intp n))
rules = apply-// (λ f x → f x)
∷ apply-\\ (λ x f → f x)
∷ compose-// (λ g f → g ∘ f)
∷ compose-\\ (λ f g → g ∘ f)
∷ []
Composable-Sem : ∀ {n} → Composable (Intp n)
Composable-Sem = withRules rules
\end{code}
\begin{code}
open Sem using (Sem; Composable-Sem)
\end{code}
Define a small toy-lexicon to test out our code.
\begin{code}
record Lexicon (I : Tecto → Set) : Set where
field
john : Repr I
mary : Repr I
bill : Repr I
person : Repr I
runs : Repr I
likes : Repr I
some : Repr I
every : Repr I
somebody : Repr I
everybody : Repr I
open Lexicon {{...}}
\end{code}
\begin{code}
Phenology : Lexicon Pheno
Phenology = record
{ john = repr NP "John"
; mary = repr NP "Mary"
; bill = repr NP "Bill"
; person = repr N "person"
; runs = repr VP "runs"
; likes = repr TV "likes"
; some = reprs ( ( GQ₀ , "some" ) ∷ ( GQ₁ , "some" ) ∷ [] )
; every = reprs ( ( GQ₀ , "every" ) ∷ ( GQ₁ , "every" ) ∷ [] )
; somebody = reprs ( ( Q₀ , "somebody" ) ∷ ( Q₁ , "somebody" ) ∷ [] )
; everybody = reprs ( ( Q₀ , "everybody" ) ∷ ( Q₁ , "everybody" ) ∷ [] )
}
\end{code}
\begin{code}
p₁ : Repr Pheno
p₁ = john ⊕ runs
p₂ : Repr Pheno
p₂ = john ⊕ likes ⊕ mary
p₃ : Repr Pheno
p₃ = somebody ⊕ likes ⊕ bill
p₄ : Repr Pheno
p₄ = mary ⊕ likes ⊕ somebody
\end{code}
\begin{code}
domain : {n : ℕ} → Vec (Fin n) n
domain {zero} = []
domain {suc n} = zero ∷ vmap (raise 1) domain
any : ∀ {ℓ} {n} {A : Set ℓ} → (A → Bool) → Vec A n → Bool
any p = vfoldr (const Bool) _∧_ true ∘ vmap p
all : ∀ {ℓ} {n} {A : Set ℓ} → (A → Bool) → Vec A n → Bool
all p = vfoldr (const Bool) _∨_ false ∘ vmap p
\end{code}
\begin{code}
Semantics : Lexicon (Sem 3)
Semantics = record
{ john = repr NP zero
; mary = repr NP (suc zero)
; bill = repr NP (suc (suc zero))
; person = repr N person'
; runs = repr VP runs'
; likes = repr TV _islikedby_
; some = reprs ( ( GQ₀ , some₀ ) ∷ ( GQ₁ , some₁ ) ∷ [] )
; every = reprs ( ( GQ₀ , every₀) ∷ ( GQ₁ , every₁) ∷ [] )
; somebody = reprs ( ( Q₀ , some₀ person' ) ∷ ( Q₁ , some₁ person' ) ∷ [] )
; everybody = reprs ( ( Q₀ , every₀ person' ) ∷ ( Q₁ , every₁ person' ) ∷ [] )
} where
person' : Fin 3 → Bool
person' _ = true
runs' : Fin 3 → Bool
runs' zero = true
runs' (suc zero) = false
runs' (suc (suc zero)) = false
runs' (suc (suc (suc ())))
_islikedby_ : Fin 3 → Fin 3 → Bool
zero islikedby suc zero = true
suc zero islikedby zero = true
suc zero islikedby suc (suc zero) = true
_ islikedby _ = false
some₀ : (Fin 3 → Bool) → (Fin 3 → Bool) → Bool
some₀ n p = any (λ x → n x ∧ p x) domain
every₀ : (Fin 3 → Bool) → (Fin 3 → Bool) → Bool
every₀ n p = all (λ x → not (n x) ∧ p x) domain
some₁ : (Fin 3 → Bool) → (Fin 3 → Fin 3 → Bool) → Fin 3 → Bool
some₁ n p₂ x = some₀ n (flip p₂ x)
every₁ : (Fin 3 → Bool) → (Fin 3 → Fin 3 → Bool) → Fin 3 → Bool
every₁ n p₂ x = every₀ n (flip p₂ x)
\end{code}
\begin{code}
s₁ : Repr (Sem 3)
s₁ = john ⊕ runs
s₂ : Repr (Sem 3)
s₂ = john ⊕ likes ⊕ mary
s₃ : Repr (Sem 3)
s₃ = somebody ⊕ likes ⊕ bill
s₄ : Repr (Sem 3)
s₄ = mary ⊕ likes ⊕ somebody
\end{code}
TODO: abstract over the multitude of `apply-*` and `compose-*` principles, in
such a fashion that the logic of when composition is possible (due to
the tecto types) is only written once, and hidden from the implementations.
title: CCG in Agda
author: Pepijn Kokke
date: 2013-09-01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment