Last active
December 21, 2015 13:09
-
-
Save wenkokke/6310866 to your computer and use it in GitHub Desktop.
An attempt to encode Combinatory Categorial Grammar in Agda... which isn't as pretty as I'd like, but this is probably due to CCG. ^^
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Encoding CCGs in Agda. | |
Bla bla bla. | |
First attempt using type system to check validity of sentences. But sentences may be ambiguous, and | |
type raising cannot be properly handled due to the implementation of Agda implicits. | |
Since this blog post is literate Agda, we'll have some administrive business to get out of the way. | |
\begin{code} | |
module CCG where | |
open import Function using (_$_; _∘_; id; const; flip) | |
open import Category.Monad | |
open import Data.Bool using (Bool; true; false; _∧_; _∨_; not) | |
open import Data.Nat using (ℕ; zero; suc) | |
open import Data.Fin using (Fin; zero; suc; raise) | |
open import Data.Vec using (Vec; []; _∷_) renaming (map to vmap; foldr to vfoldr) | |
import Data.String as String using (String; _++_) | |
open String using (String) | |
import Data.List as List using (List; _∷_; []; [_]; _++_; foldr; any; all; map; concatMap; monadPlus) | |
open List using (List; _∷_; []) | |
open import Data.Empty using (⊥) | |
open import Data.Unit using (⊤) | |
open import Data.Product using (Σ; _,_; proj₁; proj₂; ∃) | |
open import Relation.Nullary using (yes; no) | |
open import Relation.Binary using (Decidable) | |
open import Relation.Binary.PropositionalEquality using (_≡_; refl; cong) | |
\end{code} | |
Furthermore, we'll need an implementation of the list monad. Since Agda's support | |
for typeclasses is currently still lacking and the implementations in the standard | |
library are a bit obtruse, we'll implement them below. | |
\begin{code} | |
listMonadPlus = List.monadPlus | |
open RawMonadPlus {{...}} | |
\end{code} | |
Now, without further ado, an implementation of CCG in Agda. | |
We can define a datatype of tectogrammatical types as follows. | |
\begin{code} | |
infixl 7 _//_ | |
infixr 7 _\\_ | |
data Tecto : Set where | |
S : Tecto | |
N : Tecto | |
NP : Tecto | |
_\\_ : Tecto → Tecto → Tecto | |
_//_ : Tecto → Tecto → Tecto | |
\end{code} | |
And define a few commonly used aliases. Note that the difference | |
types `GQ₀` and `GQ₁` are required for the use of generalized | |
quantifiers in subject versus object position (and one more deinition) | |
would be needed for usage in the second object position. | |
\begin{code} | |
VP : Tecto | |
VP = NP \\ S | |
TV : Tecto | |
TV = VP // NP | |
DTV : Tecto | |
DTV = TV // NP | |
Q₀ : Tecto | |
Q₀ = S // VP | |
Q₁ : Tecto | |
Q₁ = TV \\ VP | |
GQ₀ : Tecto | |
GQ₀ = Q₀ // N | |
GQ₁ : Tecto | |
GQ₁ = Q₁ // N | |
\end{code} | |
We can prove that for the set of tectogrammatical types, equality is decidable. | |
To do this, we will need a set of elimination rules for the more complex types. | |
\begin{code} | |
elimr-\\ : Tecto → Tecto | |
elimr-\\ (_ \\ b) = b | |
elimr-\\ a = a | |
eliml-\\ : Tecto → Tecto | |
eliml-\\ (a \\ _) = a | |
eliml-\\ a = a | |
elimr-// : Tecto → Tecto | |
elimr-// (_ // a) = a | |
elimr-// a = a | |
eliml-// : Tecto → Tecto | |
eliml-// (b // _) = b | |
eliml-// a = a | |
\end{code} | |
Using these rules, we can construct an instance of decidable equality | |
for the Tecto datatype. | |
\begin{code} | |
_≟_ : Decidable {A = Tecto} _≡_ | |
S ≟ S = yes refl | |
S ≟ N = no (λ ()) | |
S ≟ NP = no (λ ()) | |
S ≟ (_ \\ _) = no (λ ()) | |
S ≟ (_ // _) = no (λ ()) | |
N ≟ N = yes refl | |
N ≟ S = no (λ ()) | |
N ≟ NP = no (λ ()) | |
N ≟ (_ \\ _) = no (λ ()) | |
N ≟ (_ // _) = no (λ ()) | |
NP ≟ NP = yes refl | |
NP ≟ S = no (λ ()) | |
NP ≟ N = no (λ ()) | |
NP ≟ (_ \\ _) = no (λ ()) | |
NP ≟ (_ // _) = no (λ ()) | |
(_ \\ _) ≟ S = no (λ ()) | |
(_ \\ _) ≟ N = no (λ ()) | |
(_ \\ _) ≟ NP = no (λ ()) | |
(_ \\ _) ≟ (_ // _) = no (λ ()) | |
(a \\ b) ≟ ( c \\ d) with (a ≟ c) | (b ≟ d) | |
(a \\ b) ≟ (.a \\ .b) | yes refl | yes refl = yes refl | |
(a \\ b) ≟ ( c \\ d) | no ¬ac | _ = no (¬ac ∘ cong eliml-\\) | |
(a \\ b) ≟ ( c \\ d) | _ | no ¬db = no (¬db ∘ cong elimr-\\) | |
(_ // _) ≟ S = no (λ ()) | |
(_ // _) ≟ N = no (λ ()) | |
(_ // _) ≟ NP = no (λ ()) | |
(_ // _) ≟ (_ \\ _) = no (λ ()) | |
(b // a) ≟ ( d // c) with (a ≟ c) | (b ≟ d) | |
(b // a) ≟ (.b // .a) | yes refl | yes refl = yes refl | |
(b // a) ≟ ( d // c) | no ¬ac | _ = no (¬ac ∘ cong elimr-//) | |
(b // a) ≟ ( d // c) | _ | no ¬db = no (¬db ∘ cong eliml-//) | |
\end{code} | |
Using our definition of tectogrammatical types we can define denotations | |
as being a pair of a tectogrammatical type and an interpretation function | |
over that type, storing only a value of the interpreted type. | |
\begin{code} | |
record Den (I : Tecto → Set) (T : Tecto) : Set where | |
constructor den | |
field | |
intp : I T | |
\end{code} | |
Because we cannot use the type system to check the validity of our (possibly | |
ambious) sentences, we hide our representations in a datatype that pairs | |
a tectogrammatical type with a denotation for that type. | |
Furthermore, to encode the possibility of faillure, we shall wrap this pair | |
in a monad. For this we have several options: using the maybe monad will give | |
us the first valid derivation, if any. In this case we will use the list monad, | |
which computes the list of *all possible derivations*. | |
\begin{code} | |
Repr : (Tecto → Set) → Set | |
Repr I = List (∃ I) | |
repr : ∀ {Intp} → (T : Tecto) → Intp T → Repr (Den Intp) | |
repr T t = return (T , den t) | |
reprs : ∀ {Intp} → List (Σ Tecto Intp) → Repr (Den Intp) | |
reprs rs = List.concatMap (λ x → repr (proj₁ x) (proj₂ x)) rs | |
\end{code} | |
Now we define a "typeclass" for composition of representations, which should be | |
instantiated for every interpretation. | |
\begin{code} | |
record Composable (Intp : Tecto → Set) : Set where | |
constructor withRules | |
infixr 5 _⊕_ | |
field | |
rules : List ((A B : Tecto) → Intp A → Intp B → Repr Intp) | |
\end{code} | |
And we can order all these rules of composition using the alternative | |
combinator `∣`, and use this to define a composition function from | |
intepretations to (fallible) representations. | |
\begin{code} | |
compose : (A B : Tecto) → Intp A → Intp B → Repr Intp | |
compose A B a b = List.foldr (λ r rs → rs ∣ r A B a b) ∅ rules | |
_⊕_ : Repr (Den Intp) → Repr (Den Intp) → Repr (Den Intp) | |
_⊕_ ma mb with ma >>= λ ja → | |
mb >>= λ jb → | |
let A = proj₁ ja | |
B = proj₁ jb | |
a = Den.intp (proj₂ ja) | |
b = Den.intp (proj₂ jb) | |
in compose A B a b | |
_⊕_ _ _ | mx = (λ x → (proj₁ x) , (den $ proj₂ x)) <$> mx | |
open Composable {{...}} using (_⊕_) | |
\end{code} | |
Let's try to write a simple interpretation now. | |
\begin{code} | |
apply-// : ∀ {I} | |
→ (∀ {A B} → I (B // A) → I A → I B) | |
→ (S T : Tecto) → I S → I T → Repr I | |
apply-// _ (B // A₀) A₁ f x with A₀ ≟ A₁ | |
apply-// ap (B // A ) .A f x | yes refl = return (B , ap f x) | |
apply-// _ (_ // _ ) _ _ _ | no ¬p = ∅ | |
apply-// _ _ _ _ _ = ∅ | |
apply-\\ : ∀ {I} | |
→ (∀ {A B} → I A → I (A \\ B) → I B) | |
→ (S T : Tecto) → I S → I T → Repr I | |
apply-\\ _ A₀ (A₁ \\ B) x f with A₀ ≟ A₁ | |
apply-\\ ap .A (A \\ B) x f | yes refl = return (B , ap x f) | |
apply-\\ _ _ (_ \\ _) _ _ | no ¬p = ∅ | |
apply-\\ _ _ _ _ _ = ∅ | |
compose-// : ∀ {I} | |
→ (∀ {A B C} → I (C // B) → I (B // A) → I (C // A)) | |
→ (S T : Tecto) → I S → I T → Repr I | |
compose-// _ (C // B₀) ( B₁ // A) g f with B₀ ≟ B₁ | |
compose-// cm (C // B ) (.B // A) g f | yes refl = return (C // A , cm g f) | |
compose-// _ (_ // _ ) ( _ // _) _ _ | no ¬p = ∅ | |
compose-// _ _ _ _ _ = ∅ | |
compose-\\ : ∀ {I} | |
→ (∀ {A B C} → I (A \\ B) → I (B \\ C) → I (A \\ C)) | |
→ (S T : Tecto) → I S → I T → Repr I | |
compose-\\ _ (A \\ B₀) ( B₁ \\ C) f g with B₀ ≟ B₁ | |
compose-\\ cm (A \\ B ) (.B \\ C) f g | yes refl = return (A \\ C , cm f g) | |
compose-\\ _ (_ \\ _ ) ( _ \\ _) _ _ | no ¬p = ∅ | |
compose-\\ _ _ _ _ _ = ∅ | |
\end{code} | |
In CCG, the phenological interpretation of any tectogrammatical type | |
is always the string type. | |
Below we will define the phenological interpretation, and prove its | |
correctness with respect to the `Pheno?` predicate. | |
This (trivial) correctness proof shows that `Pheno.Intp` maps every | |
tectogrammatical type to a phenogrammatical type. | |
\begin{code} | |
module Pheno where | |
Intp : Tecto → Set | |
Intp _ = String | |
Pheno : Tecto → Set | |
Pheno = Den Intp | |
data Pheno? : Set → Set₁ where | |
ST : Pheno? String | |
correctness : ∀ {T} → Pheno? (Intp T) | |
correctness = ST | |
\end{code} | |
Now we define several rules of composition. Of the six rules of CCG | |
we will define two (we'll leave type raising for a later time). | |
Most of these rules are simply restricted versions of string concatination. | |
\begin{code} | |
_∙_ : String → String → String | |
_∙_ a b = String._++_ a (String._++_ " " b) | |
rules : List ((A B : Tecto) → Intp A → Intp B → Repr Intp) | |
rules = apply-// _∙_ | |
∷ apply-\\ _∙_ | |
∷ compose-// _∙_ | |
∷ compose-\\ _∙_ | |
∷ [] | |
Composable-Pheno : Composable Intp | |
Composable-Pheno = withRules rules | |
\end{code} | |
Finally we'll import the `Pheno` type and the composablility instance | |
into scope. | |
\begin{code} | |
open Pheno using (Pheno; Composable-Pheno) | |
\end{code} | |
\begin{code} | |
module Sem where | |
Intp : ℕ → Tecto → Set | |
Intp _ S = Bool | |
Intp n N = Fin n → Bool | |
Intp n NP = Fin n | |
Intp n (A \\ B) = Intp n A → Intp n B | |
Intp n (B // A) = Intp n A → Intp n B | |
data Sem? : Set → Set₁ where | |
E : ∀ {n} → Sem? (Fin n) | |
T : Sem? Bool | |
AP : ∀ {A B} → Sem? A → Sem? B → Sem? (A → B) | |
correctness : ∀ {n} {T} → Sem? (Intp n T) | |
correctness {_} {S} = T | |
correctness {n} {N} = AP E T | |
correctness {n} {NP} = E | |
correctness {n} {a \\ b} = AP (correctness {n} {a}) (correctness {n} {b}) | |
correctness {n} {b // a} = AP (correctness {n} {a}) (correctness {n} {b}) | |
\end{code} | |
\begin{code} | |
Sem : ℕ → Tecto → Set | |
Sem n = Den (Intp n) | |
\end{code} | |
\begin{code} | |
rules : ∀ {n} → List ((A B : Tecto) → Intp n A → Intp n B → Repr (Intp n)) | |
rules = apply-// (λ f x → f x) | |
∷ apply-\\ (λ x f → f x) | |
∷ compose-// (λ g f → g ∘ f) | |
∷ compose-\\ (λ f g → g ∘ f) | |
∷ [] | |
Composable-Sem : ∀ {n} → Composable (Intp n) | |
Composable-Sem = withRules rules | |
\end{code} | |
\begin{code} | |
open Sem using (Sem; Composable-Sem) | |
\end{code} | |
Define a small toy-lexicon to test out our code. | |
\begin{code} | |
record Lexicon (I : Tecto → Set) : Set where | |
field | |
john : Repr I | |
mary : Repr I | |
bill : Repr I | |
person : Repr I | |
runs : Repr I | |
likes : Repr I | |
some : Repr I | |
every : Repr I | |
somebody : Repr I | |
everybody : Repr I | |
open Lexicon {{...}} | |
\end{code} | |
\begin{code} | |
Phenology : Lexicon Pheno | |
Phenology = record | |
{ john = repr NP "John" | |
; mary = repr NP "Mary" | |
; bill = repr NP "Bill" | |
; person = repr N "person" | |
; runs = repr VP "runs" | |
; likes = repr TV "likes" | |
; some = reprs ( ( GQ₀ , "some" ) ∷ ( GQ₁ , "some" ) ∷ [] ) | |
; every = reprs ( ( GQ₀ , "every" ) ∷ ( GQ₁ , "every" ) ∷ [] ) | |
; somebody = reprs ( ( Q₀ , "somebody" ) ∷ ( Q₁ , "somebody" ) ∷ [] ) | |
; everybody = reprs ( ( Q₀ , "everybody" ) ∷ ( Q₁ , "everybody" ) ∷ [] ) | |
} | |
\end{code} | |
\begin{code} | |
p₁ : Repr Pheno | |
p₁ = john ⊕ runs | |
p₂ : Repr Pheno | |
p₂ = john ⊕ likes ⊕ mary | |
p₃ : Repr Pheno | |
p₃ = somebody ⊕ likes ⊕ bill | |
p₄ : Repr Pheno | |
p₄ = mary ⊕ likes ⊕ somebody | |
\end{code} | |
\begin{code} | |
domain : {n : ℕ} → Vec (Fin n) n | |
domain {zero} = [] | |
domain {suc n} = zero ∷ vmap (raise 1) domain | |
any : ∀ {ℓ} {n} {A : Set ℓ} → (A → Bool) → Vec A n → Bool | |
any p = vfoldr (const Bool) _∧_ true ∘ vmap p | |
all : ∀ {ℓ} {n} {A : Set ℓ} → (A → Bool) → Vec A n → Bool | |
all p = vfoldr (const Bool) _∨_ false ∘ vmap p | |
\end{code} | |
\begin{code} | |
Semantics : Lexicon (Sem 3) | |
Semantics = record | |
{ john = repr NP zero | |
; mary = repr NP (suc zero) | |
; bill = repr NP (suc (suc zero)) | |
; person = repr N person' | |
; runs = repr VP runs' | |
; likes = repr TV _islikedby_ | |
; some = reprs ( ( GQ₀ , some₀ ) ∷ ( GQ₁ , some₁ ) ∷ [] ) | |
; every = reprs ( ( GQ₀ , every₀) ∷ ( GQ₁ , every₁) ∷ [] ) | |
; somebody = reprs ( ( Q₀ , some₀ person' ) ∷ ( Q₁ , some₁ person' ) ∷ [] ) | |
; everybody = reprs ( ( Q₀ , every₀ person' ) ∷ ( Q₁ , every₁ person' ) ∷ [] ) | |
} where | |
person' : Fin 3 → Bool | |
person' _ = true | |
runs' : Fin 3 → Bool | |
runs' zero = true | |
runs' (suc zero) = false | |
runs' (suc (suc zero)) = false | |
runs' (suc (suc (suc ()))) | |
_islikedby_ : Fin 3 → Fin 3 → Bool | |
zero islikedby suc zero = true | |
suc zero islikedby zero = true | |
suc zero islikedby suc (suc zero) = true | |
_ islikedby _ = false | |
some₀ : (Fin 3 → Bool) → (Fin 3 → Bool) → Bool | |
some₀ n p = any (λ x → n x ∧ p x) domain | |
every₀ : (Fin 3 → Bool) → (Fin 3 → Bool) → Bool | |
every₀ n p = all (λ x → not (n x) ∧ p x) domain | |
some₁ : (Fin 3 → Bool) → (Fin 3 → Fin 3 → Bool) → Fin 3 → Bool | |
some₁ n p₂ x = some₀ n (flip p₂ x) | |
every₁ : (Fin 3 → Bool) → (Fin 3 → Fin 3 → Bool) → Fin 3 → Bool | |
every₁ n p₂ x = every₀ n (flip p₂ x) | |
\end{code} | |
\begin{code} | |
s₁ : Repr (Sem 3) | |
s₁ = john ⊕ runs | |
s₂ : Repr (Sem 3) | |
s₂ = john ⊕ likes ⊕ mary | |
s₃ : Repr (Sem 3) | |
s₃ = somebody ⊕ likes ⊕ bill | |
s₄ : Repr (Sem 3) | |
s₄ = mary ⊕ likes ⊕ somebody | |
\end{code} | |
TODO: abstract over the multitude of `apply-*` and `compose-*` principles, in | |
such a fashion that the logic of when composition is possible (due to | |
the tecto types) is only written once, and hidden from the implementations. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
title: CCG in Agda | |
author: Pepijn Kokke | |
date: 2013-09-01 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment