Word cloud intersection II
John Christopher "Johnny" Depp II (born June 9, 1963) is an American actor, producer, and musician. He has won the Golden Globe Award and the Screen Actors Guild Award for Best Actor. He rose to prominence on the 1980s television series 21 Jump Street, becoming a teen idol.
Since then, Depp has taken on challenging and "larger-than-life" roles, starting with a supporting role in Oliver Stone's Vietnam War film Platoon in 1986, then playing the title character in the romantic dark fantasy Edward Scissorhands (1990). He later found box office success in the fantasy adventure film Sleepy Hollow (1999), the fantasy swashbuckler film Pirates of the Caribbean: The Curse of the Black Pearl (2003) and its sequels, the musical adventure film Charlie and the Chocolate Factory (2005), the fantasy film Alice in Wonderland (2010), and the animated action comedy western Rango (2011), in which he voiced the title character. He has collaborated on eight films with director and friend Tim Burton.
Depp is regarded as one of the world's biggest movie stars. He has gained worldwide critical acclaim for his portrayals of such people as screenwriter-director Ed Wood in Ed Wood, undercover FBI agent Joseph D. Pistone in Donnie Brasco, "gonzo" journalist Hunter S. Thompson in Fear and Loathing in Las Vegas, cocaine kingpin George Jung in Blow, Peter Pan author J. M. Barrie in Finding Neverland, and the Depression-era outlaw John Dillinger in Michael Mann's Public Enemies. Films featuring Depp have grossed over $3.1 billion at the United States box office and over $7.6 billion worldwide. His most commercially successful films are the Pirates of the Caribbean films, which have grossed $3 billion; Alice in Wonderland, which grossed $1 billion; Charlie and the Chocolate Factory, which grossed $474 million; and The Tourist, which grossed $278 million worldwide.
Depp has been nominated for major acting awards, including three nominations for the Academy Award for Best Actor. He won the Golden Globe Award for Best Actor – Motion Picture Musical or Comedy for Sweeney Todd: The Demon Barber of Fleet Street and the Screen Actors Guild Award for Outstanding Performance by a Male Actor in a Leading Role for Pirates of the Caribbean: The Curse of the Black Pearl. He was listed in the 2012 Guinness World Records as the highest-paid actor, with earnings of $75 million. Depp was inducted as a Disney Legend in 2015.
Depp was born in Owensboro, Kentucky, the youngest of four children of Betty Sue Palmer (née Wells), a waitress, and John Christopher Depp, a civil engineer. Depp moved frequently during his childhood; he and his siblings lived in more than 20 different places, eventually settling in Miramar, Florida, in 1970. In 1978, when he was 15, Depp's parents divorced. His mother later married Robert Palmer (died 2000), whom Depp has called "an inspiration to me".
With the gift of a guitar from his mother when he was 12, Depp began playing in various garage bands. A year after his parents' divorce, Depp dropped out of high school to become a rock musician. He attempted to go back to school two weeks later, but the principal told him to follow his dream of being a musician. He played with The Kids, a band that enjoyed modest local success. The Kids set out together for Los Angeles in pursuit of a record deal, changing their name to Six Gun Method, but the group split up before signing one. Depp subsequently collaborated with the band Rock City Angels and co-wrote their song "Mary", which appeared on Rock City Angels' debut Geffen Records album, Young Man's Blues.
On December 20, 1983, Depp married Lori Anne Allison, the sister of his band's bass player and singer. During their marriage she worked as a makeup artist while he worked a variety of odd jobs, including selling pens as a telemarketer. His wife introduced him to actor Nicolas Cage, who advised Depp to pursue an acting career. Depp and his wife divorced in 1985. Both Depp and his subsequent fiancée Sherilyn Fenn auditioned for the 1986 film Thrashin'. They were both cast, with Depp chosen by the film's director to star as the lead, which would have been Depp's second major role; he was later turned down by the film's producer, who overruled the director's decision.
Depp's first major role was in the 1984 classic horror film A Nightmare on Elm Street, as the boyfriend of heroine Nancy Thompson (played by Heather Langenkamp) and one of Freddy Krueger's victims. The director of the 1986 American skating drama Thrashin' then cast Depp in the film's lead role; however, his decision was later overridden by the film's producer. In 1986, Depp appeared in a secondary role as a Vietnamese-speaking private in Oliver Stone's Platoon. Depp's first release in 1990 was Cry-Baby. Although the film did not achieve high audience numbers upon its initial release, over the years it has gained cult-classic status. Depp's next release that year saw him undertake the quirky title role of Tim Burton's film Edward Scissorhands, a critical and commercial success that established Depp as a leading Hollywood actor and began his long association with Burton.
In 1994, Depp played the title role in Tim Burton's comedy-drama biographical film Ed Wood, about one of history's most inept film directors. It received immense critical acclaim, with Janet Maslin of The New York Times writing that Depp had "proved himself as an established, certified great actor" and "captured all the can-do optimism that kept Ed Wood going, thanks to an extremely funny ability to look at the silver lining of any cloud." Depp was nominated for the Golden Globe Award for Best Actor – Motion Picture Musical or Comedy for his performance.
In 1995, Depp starred in three films. He played opposite Marlon Brando in the box-office hit Don Juan DeMarco, as a man who believes he is Don Juan, the world's greatest lover. He next appeared in Dead Man, a Western shot entirely in black-and-white; it did poor business and received mixed reviews. Depp then appeared in the financial and critical failure Nick of Time, playing an accountant who is told to kill a politician to save his kidnapped daughter.
The 2003 Walt Disney Pictures film Pirates of the Caribbean: The Curse of the Black Pearl was a major success, and Depp's performance as the suave but shambling pirate Captain Jack Sparrow was highly praised. Studio bosses were ambivalent at first, but the character became popular with the movie-going public. According to a survey taken by Fandango, Depp was a major draw for audiences. The film's director, Gore Verbinski, has said that Depp's character closely resembles the actor's personality, but Depp said he modeled the character after The Rolling Stones' guitarist Keith Richards and the cartoon skunk Pepé Le Pew. Depp was nominated for an Academy Award for Best Actor for the role.
In 2004, Depp was again nominated for the Best Actor Academy Award for his performance as Scottish author J. M. Barrie in the film Finding Neverland. He next starred as Willy Wonka in 2005's Charlie and the Chocolate Factory, a major box-office success that earned him a nomination for the Golden Globe Award for Best Actor in a Musical or Comedy.
Depp reprised the role of Jack Sparrow in the Pirates sequels Dead Man's Chest (2006), At World's End (2007) and On Stranger Tides (2011), each of which was also a major success. Depp has said that Sparrow is "definitely a big part of me", and he even voiced the character in the video game Pirates of the Caribbean: The Legend of Jack Sparrow.
The swashbuckling sword talents Depp acquired for his role as Sparrow were highlighted in the documentary film Reclaiming the Blade. In the film, swordmaster Bob Anderson shared his experiences working with Depp on the choreography for The Curse of the Black Pearl, and described Depp's abilities as a sword-wielding actor as "about as good as you can get."
Depp and Gore Verbinski were executive producers of the album Rogue's Gallery: Pirate Ballads, Sea Songs and Chanteys. Depp played the title role of Sweeney Todd in Tim Burton's film adaptation of the musical, for which he won a Golden Globe Award for Best Actor – Motion Picture Musical or Comedy. Depp thanked the Hollywood Foreign Press Association and praised Tim Burton for his "unwavering trust and support."
In director Terry Gilliam's 2009 film The Imaginarium of Doctor Parnassus, Depp, Jude Law and Colin Farrell each played the character initially portrayed by their friend Heath Ledger, who died before the film was completed. All three actors gave their salaries to Ledger's daughter, Matilda.
Depp played the Mad Hatter in Burton's 2010 re-imagining of Alice in Wonderland, and the protagonist in the 2011 animated film Rango.
Depp played convicted Boston crime boss Whitey Bulger in director Scott Cooper's Black Mass (2015).
Depp has collaborated with director and close friend Tim Burton on several films, beginning with Edward Scissorhands (1990), opposite Winona Ryder and Vincent Price. His next role with Burton was in the 1994 film Ed Wood. Depp later said that "within 10 minutes of hearing about the project, I was committed." At the time, the actor was depressed about films and filmmaking. The part gave him a "chance to stretch out and have some fun"; he said working with Martin Landau "rejuvenated my love for acting". Producer Scott Rudin once said, "Basically Johnny Depp is playing Tim Burton in all his movies", although Burton personally disapproved of the comment. Depp, however, agrees with Rudin's statement. According to Depp, Edward Scissorhands represented Burton's inability to communicate as a teenager, and Ed Wood reflected Burton's relationship with Vincent Price (very similar to that between Edward D. Wood, Jr. and Bela Lugosi).
Depp's next venture with Burton was the role of Ichabod Crane in Sleepy Hollow (1999), opposite Christina Ricci. Sleepy Hollow reflected Burton's battle with the Hollywood studio system. For his performance, Depp took inspiration from Angela Lansbury, Roddy McDowall, and Basil Rathbone. Depp stated, "I always thought of Ichabod as a very delicate, fragile person who was maybe a little too in touch with his feminine side, like a frightened little girl."
Depp did not work with Burton again until Charlie and the Chocolate Factory (2005), in which he played Willy Wonka. The film was a box office success and received positive critical reception. Gene Wilder, who had played Willy Wonka in the 1971 film, initially criticized this version. Charlie and the Chocolate Factory was released in July, followed in September by Corpse Bride, for which Depp voiced the character Victor Van Dort.
Sweeney Todd: The Demon Barber of Fleet Street (2007) followed, bringing Depp his second major award win, the Golden Globe Award for Best Actor – Motion Picture Musical or Comedy, as well as his third nomination for the Academy Award for Best Actor. Burton had first given him an original cast recording of the 1979 stage musical in 2000. Although not a fan of the musical genre, Depp grew to like the tale's treatment. He cited Peter Lorre in Mad Love (1935) as his main influence for the role, and practiced the songs his character would perform while filming Pirates of the Caribbean: At World's End. Although he had performed in musical groups, Depp was initially unsure that he would be able to sustain Stephen Sondheim's lyrics. Depp recorded demos and worked with Bruce Witkin to shape his vocals without a qualified voice coach. In the DVD Reviews section, Entertainment Weekly's Chris Nashawaty gave the film an A minus, stating, "Depp's soaring voice makes you wonder what other tricks he's been hiding ... Watching Depp's barber wield his razors ... it's hard not to be reminded of Edward Scissorhands frantically shaping hedges into animal topiaries 18 years ago ... and all of the twisted beauty we would've missed out on had [Burton and Depp] never met." In his introduction to Burton on Burton, a book of interviews with the director, Depp called Burton "... a brother, a friend, ... and [a] brave soul". The next Depp-Burton collaboration was Alice in Wonderland (2010), in which Depp played the Mad Hatter alongside Helena Bonham Carter, Anne Hathaway, and Alan Rickman. In 2012, he starred in the Burton-directed Dark Shadows, a film based on the 1966–1971 gothic soap opera of the same name, alongside fellow Burton regular Helena Bonham Carter as well as Michelle Pfeiffer and Eva Green.
Daniel Robert "Danny" Elfman (born May 29, 1953) is an American composer, singer, songwriter, and record producer. He is known as the lead singer and songwriter of the rock band Oingo Boingo from 1976 to 1995, and later for scoring music for television and film, creating The Simpsons main title theme as well as the 1989 Batman film theme. He has scored the majority of his long-time friend Tim Burton's films.
Elfman entered the film industry in 1976, initially as an actor. He made his film scoring debut in 1982 for the film Forbidden Zone, directed by his older brother Richard Elfman. He has since been nominated for four Academy Awards and won a Grammy Award for Best Instrumental Composition Written for a Motion Picture, Television or Other Visual Media for Tim Burton's Batman, and an Emmy Award for his Desperate Housewives theme. Elfman was honored with the Richard Kirk Award at the 2002 BMI Film and TV Awards; the award is given annually to a composer who has made significant contributions to film and television music. He was also inducted as a Disney Legend in 2015.
Danny Elfman was born in Los Angeles, California, into a Jewish family. He is the son of Blossom Elfman (née Bernstein), a writer and teacher, and Milton Elfman, a teacher who was in the Air Force. He was raised in a racially mixed community in the Baldwin Hills area of Los Angeles. He spent much of his time in the local movie theatre, adoring the music of such film composers as Bernard Herrmann and Franz Waxman. He has said that he hung out with the "band geeks" in high school, where he started a ska band. After dropping out of high school, he followed his brother Richard to France, where he performed with Le Grand Magic Circus, an avant-garde musical theater group. Violin in tow, Elfman next journeyed to Africa, where he traveled through Ghana, Mali, and Upper Volta, absorbing new musical styles, including the Ghanaian highlife genre, which would eventually influence his own music.
He contracted malaria during his one-year stay and was often sick. Eventually he returned home to the United States, where he began to take Balinese music lessons at CalArts. During this time, he was romantically involved with Kim Gordon, who would later go on to form Sonic Youth. He was never officially a student at the institute; nonetheless, the instructor encouraged him to continue learning. Elfman stated, "He just laughed, and said, 'Sit. Play.' I continued to sit and play for a couple years." At this time, his brother was forming a new musical theater group.
In 1972, Richard Elfman founded the American new wave band and performance art group originally called The Mystic Knights of the Oingo Boingo. They played several shows throughout the 1970s, until Richard Elfman left the band to become a filmmaker. As a send-off to the band's original concept, Richard Elfman created the film Forbidden Zone, based on their stage performances. Danny Elfman composed his first score for the film and played the role of Satan (the other band members played his minions). By the time the movie was completed, they had taken the name Oingo Boingo and begun recording and touring as a rock group. From 1976 on, the band was led by Danny Elfman, until it suddenly retired in 1995. The semi-theatrical music and comedy troupe had transformed into a ska-influenced new wave band in 1979, and then changed again toward a more guitar-oriented rock sound in the late 1980s. Oingo Boingo, still led by Danny Elfman, performed as themselves in the 1986 movie Back to School.
In 1985, Tim Burton and Paul Reubens invited Elfman to write the score for their first feature film, Pee-wee's Big Adventure. Elfman was apprehensive at first because of his lack of formal training, but with orchestration assistance from Oingo Boingo guitarist and arranger Steve Bartek, he achieved his goal of emulating the mood of such composers as Nino Rota and Bernard Herrmann. In the booklet for the first volume of Music for a Darkened Theatre, Elfman described the first time he heard his music played by a full orchestra as one of the most thrilling experiences of his life. Elfman immediately developed a rapport with Burton and has gone on to score all but two of Burton's major studio releases: Ed Wood, which was in production while Elfman and Burton were having a serious disagreement, and Sweeney Todd. Elfman also provided the singing voice for Jack Skellington in Tim Burton's The Nightmare Before Christmas and the voices of both Barrel and the "Clown with the Tear-Away Face". Years later he provided the voice for Bonejangles the skeleton in Corpse Bride.
Burton has said of his relationship with Elfman: "We don't even have to talk about the music. We don't even have to intellectualize – which is good for both of us, we're both similar that way. We're very lucky to connect" (Breskin, 1997).
Modern classical composers, including Béla Bartók, Philip Glass, Lou Harrison, Carl Orff, Harry Partch, Sergei Prokofiev, Maurice Ravel, Erik Satie, Igor Stravinsky, and Pyotr Ilyich Tchaikovsky, have influenced the style of Elfman's music. Elfman has said he first took notice of film music when, as an eleven-year-old, he heard Bernard Herrmann's score for The Day the Earth Stood Still, and he has been a fan of film music ever since. Other influences from film music include Erich Wolfgang Korngold, Max Steiner, David Tamkin, and Franz Waxman. Nino Rota also served as a significant influence and was the main inspiration for Elfman's score for Pee-wee's Big Adventure.
When asked during a 2007 phone-in interview on XETRA-FM if he ever had any notions of performing in an Oingo Boingo reunion, Elfman immediately rejected the idea and stated that in the last few years with the band he had begun to develop significant and irreversible hearing damage as a result of continuous exposure to the high noise levels of performing in a rock band. He went on to say that he believes his hearing damage is partially due to a genetic predisposition to hearing loss, and that he will never return to the stage for fear of worsening not only his own condition but also that of his band mates.
Elfman composed the music for the Cirque du Soleil show Iris, performed at the Dolby Theatre in Hollywood; the production ran from July 21, 2011, to January 19, 2013. It was Elfman's most significant non-film work since he composed Serenada Schizophrana for the American Composers Orchestra, which was conducted by John Mauceri on its recording and by Steven Sloane at its premiere at Carnegie Hall in New York City on February 23, 2005. After its premiere, it was recorded in studio and released on SACD on October 3, 2006. The meeting with Mauceri proved fruitful: the composer was encouraged to write a new concert piece for Mauceri and the Hollywood Bowl Orchestra, an "overture to a non-existent musical" that he called "The Overeager Overture". He also continues to compose film scores in addition to these other projects. In November 2010, it was reported that Danny Elfman was writing the music for a planned musical based on the life of Harry Houdini, but as of January 2012 he was no longer attached to the project.
In October 2013, Elfman returned to the stage to sing his vocal parts to a handful of Nightmare Before Christmas songs as part of a concert titled Danny Elfman's Music from the Films of Tim Burton. He composed the film score for Oz the Great and Powerful (2013), and composed additional music for Avengers: Age of Ultron (2015).
Elfman has three children: Lola (born 1979), Mali (born 1984), and Oliver (born 2005). On November 29, 2003, he married actress Bridget Fonda. In 1997, he scored A Simple Plan, his only score for one of her films to date (although he did compose a cue for the film Army of Darkness, in which Fonda has a cameo). In the late 1960s and early 1970s, he dated Sonic Youth's Kim Gordon.
He is the uncle of actor Bodhi Elfman, who is married to actress Jenna Elfman.
Describing his politics during the 1980s, Elfman said, "I'm not a doomist. My attitude is always to be critical of what's around you, but not ever to forget how lucky we are. I've traveled around the world. I left thinking I was a revolutionary. I came back real right-wing patriotic. Since then, I've kind of mellowed in between." In 2008, he expressed support for Barack Obama and said that Sarah Palin was his "worst nightmare".
Elfman's scores for Batman and Edward Scissorhands were nominated for AFI's 100 Years of Film Scores.
word
danny
johnny
elfman's
depp
depp's
elfman
will
one
two
three
four
five
six
seven
eight
nine
ten
-
a
able
about
above
abst
accordance
according
accordingly
across
act
actually
added
adj
affected
affecting
affects
after
afterwards
again
against
ah
all
almost
alone
along
already
also
although
always
am
among
amongst
an
and
announce
another
any
anybody
anyhow
anymore
anyone
anything
anyway
anyways
anywhere
apparently
approximately
are
aren
arent
aren't
arise
around
as
aside
ask
asking
at
auth
available
away
awfully
b
back
be
became
because
become
becomes
becoming
been
before
beforehand
begin
beginning
beginnings
begins
behind
being
believe
below
beside
besides
between
beyond
biol
both
brief
briefly
but
by
c
ca
came
can
cannot
can't
cause
causes
certain
certainly
co
com
come
comes
contain
containing
contains
could
couldnt
couldn't
d
date
did
didn't
do
does
doesnt
doesn't
doing
done
dont
don't
down
downwards
due
during
e
each
ed
edu
effect
eg
eight
eighty
either
else
elsewhere
end
ending
enough
especially
et
et-al
etc
even
evenly
ever
every
everybody
everyone
everything
everywhere
except
f
far
few
ff
following
follows
for
former
formerly
found
from
further
furthermore
g
gave
get
gets
getting
give
given
gives
giving
go
goes
gone
got
gotten
h
had
happens
hardly
has
hasnt
hasn't
have
havent
haven't
having
he
hed
he'd
he'll
hence
her
here
hereafter
hereby
herein
heres
hereupon
hers
herself
hes
he's
hi
him
himself
his
how
how's
howbeit
however
i
id
i'd
ie
if
i'll
im
i'm
immediate
immediately
in
inc
indeed
index
instead
into
inward
is
isnt
isn't
it
itd
it'd
itll
it'll
its
it's
itself
ive
i've
j
just
k
keep
keeps
kept
know
known
knows
l
largely
last
lately
later
latter
latterly
least
less
lest
let
lets
like
liked
likely
line
little
'll
look
looking
looks
ltd
m
made
mainly
make
makes
many
may
maybe
me
mean
means
meantime
meanwhile
merely
mg
might
million
miss
ml
more
moreover
most
mostly
mr
mrs
much
mug
must
my
myself
n
na
name
namely
nay
nd
near
nearly
necessarily
necessary
need
needs
neither
never
nevertheless
new
next
no
nobody
non
none
nonetheless
noone
nor
normally
nos
not
noted
nothing
now
nowhere
o
obviously
of
off
often
oh
ok
okay
on
once
ones
one's
only
onto
or
ord
other
others
otherwise
ought
our
ours
ourselves
out
outside
over
overall
owing
own
p
part
particular
particularly
past
per
perhaps
please
plus
possible
possibly
potentially
pp
previously
primarily
probably
promptly
put
q
que
quickly
quite
qv
r
rather
rd
re
're
readily
really
recent
recently
ref
refs
regarding
regardless
regards
related
relatively
respectively
resulted
resulting
retweet
rt
s
's
said
same
saw
say
saying
says
sec
seem
seemed
seeming
seems
seen
self
selves
sent
several
shall
she
she'd
she'll
shes
she's
should
shouldn't
showed
shown
showns
shows
significant
significantly
similar
similarly
since
slightly
so
some
somebody
somehow
someone
somethan
something
sometime
sometimes
somewhat
somewhere
soon
sorry
specifically
specified
specify
specifying
still
stop
strongly
sub
substantially
successfully
such
sufficiently
sup
'sup
sure
t
take
taken
taking
tell
tends
th
than
thank
thanks
thanx
that
that'll
thats
that's
that've
the
their
theirs
them
themselves
then
thence
there
thereafter
thereby
thered
there'd
therefore
therein
there'll
thereof
therere
there're
theres
there's
thereto
thereupon
there've
these
they
theyd
they'd
they'll
theyre
they're
theyve
they've
think
thinks
this
those
thou
though
thoughh
thousand
throug
through
throughout
thru
thus
til
to
together
too
took
tooks
toward
towards
tried
tries
truly
try
trying
ts
twice
u
un
under
unfortunately
unless
unlike
unlikely
until
unto
up
upon
ups
us
use
used
useful
usefully
usefulness
uses
using
usually
v
value
various
've
very
via
viz
vol
vols
vs
w
want
wants
was
wasnt
wasn't
way
we
wed
we'd
welcome
we'll
well
went
were
we're
weren't
we've
what
whatever
what'll
whats
what's
when
whence
whenever
where
whereafter
whereas
whereby
wherein
wheres
where's
whereupon
wherever
whether
which
while
whim
who
whod
who'd
whoever
whole
who'll
whom
whomever
whos
who's
whose
why
widely
willing
wish
with
within
without
won't
words
world
would
wouldn't
x
y
yes
yet
you
youd
you'd
youll
you'll
your
you're
youre
yours
yourself
yourselves
youve
you've
z
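The list above is stored as a single-column CSV whose header row is "word", which is why index.coffee below can load it with d3.csv. A minimal plain-JavaScript sketch of the same lookup-table idea (the rows array is a hypothetical stand-in for d3's parsed output):

// Build an O(1) stopword index from parsed CSV rows of the form {word: '...'}.
var rows = [{word: 'a'}, {word: 'about'}, {word: 'the'}]; // stand-in for d3.csv output
var stopwords = {};
rows.forEach(function (r) { stopwords[r.word] = true; });

// Filtering a token list against the index:
var tokens = ['the', 'pirate', 'about', 'treemap'];
var content_words = tokens.filter(function (t) { return !(t in stopwords); });
console.log(content_words); // -> ['pirate', 'treemap']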
svg = d3.select('svg')

width = svg.node().getBoundingClientRect().width
height = svg.node().getBoundingClientRect().height

treemap = d3.layout.treemap()
  .size([width, height])
  .value((node) -> node.count)
  .sort((a,b) ->
    return +1 if a.name is 'a' or b.name is 'b'
    return -1 if a.name is 'b' or b.name is 'a'
    return a.count-b.count
  )
  .ratio(3)
  .padding((node) ->
    if node.depth is 0
      return [0,0,40,0] # make room for set labels
    else if node.depth is 1
      return 4
    else
      return 0
  )
  .round(false) # bugfix: d3 wrong ordering

correct_x = d3.scale.linear()
  .domain([0, width])
  .range([0, width*1.05])

correct_y = d3.scale.linear()
  .domain([0, height])
  .range([0, height*3/4])

# define a stable color scale to differentiate words and sets
color = (txt, set) ->
  iset = {'a': 0, 'intersection': 1, 'b': 2}[set]
  Math.seedrandom(txt+'abcdef')
  noise = (W) -> Math.random()*W - W/2
  d3.hcl(45+iset*90+noise(90), 40, 50)

# translate the viewBox to have (0,0) at the center of the vis
svg
  .attr
    viewBox: "#{-width/2} #{-height/2} #{width} #{height}"

# append a group for zoomable content
zoomable_layer = svg.append('g')

# define a zoom behavior
zoom = d3.behavior.zoom()
  .scaleExtent([1,10]) # min-max zoom
  .on 'zoom', () ->
    # GEOMETRIC ZOOM
    zoomable_layer
      .attr
        transform: "translate(#{zoom.translate()})scale(#{zoom.scale()})"

# bind the zoom behavior to the main SVG
svg.call(zoom)

# group the visualization
vis = zoomable_layer.append('g')
  .attr
    transform: "translate(#{-width/2},#{-height/2})"

d3.csv 'english_stopwords_long.txt', (stopwords_array) ->
  # build an index of stopwords
  stopwords = {}
  stopwords_array.forEach (w) -> stopwords[w.word] = true

  d3.text 'depp.txt', (infovis_txt) ->
    data_a = nlp.ngram(infovis_txt, {min_count: 1, max_size: 1})[0].filter (w) -> w.word not of stopwords
    index_a = {}
    data_a.forEach (d) ->
      index_a[d.word] = d

    d3.text 'elfman.txt', (hci_txt) ->
      data_b = nlp.ngram(hci_txt, {min_count: 1, max_size: 1})[0].filter (w) -> w.word not of stopwords
      index_b = {}
      data_b.forEach (d) ->
        index_b[d.word] = d

      diff_a = data_a.filter (a) -> a.word not of index_b
      diff_b = data_b.filter (b) -> b.word not of index_a

      intersection = []
      data_a.forEach (a) ->
        data_b.forEach (b) ->
          if a.word is b.word
            min = Math.min(a.count, b.count)
            intersection.push {word: a.word, count: min}
            if a.count-min > 0
              diff_a.push {word: a.word, count: a.count-min}
            if b.count-min > 0
              diff_b.push {word: b.word, count: b.count-min}

      a = {
        children: (diff_a.filter (d) -> d.count > 1),
        name: "a"
      }
      intersection = {
        children: (intersection.filter (d) -> d.count > 1),
        name: "intersection"
      }
      b = {
        children: (diff_b.filter (d) -> d.count > 1),
        name: "b"
      }
      tree = {
        children: [a,intersection,b],
        name: "root"
      }

      nodes_data = treemap.nodes(tree)

      labels = vis.selectAll('.label')
        .data(nodes_data.filter((node) -> node.depth is 2))

      enter_labels = labels.enter().append('svg')
        .attr
          class: 'label'

      enter_labels.append('text')
        .text((node) -> node.word.toUpperCase())
        .attr
          dy: '0.35em'
          fill: (node) -> color(node.word, node.parent.name)
        .each (node) ->
          bbox = this.getBBox()
          bbox_aspect = bbox.width / bbox.height
          node_bbox = {width: node.dx, height: node.dy}
          node_bbox_aspect = node_bbox.width / node_bbox.height
          rotate = bbox_aspect >= 1 and node_bbox_aspect < 1 or bbox_aspect < 1 and node_bbox_aspect >= 1
          node.label_bbox = {
            x: bbox.x+(bbox.width-correct_x(bbox.width))/2,
            y: bbox.y+(bbox.height-correct_y(bbox.height))/2,
            width: correct_x(bbox.width),
            height: correct_y(bbox.height)
          }
          if rotate
            node.label_bbox = {
              x: node.label_bbox.y,
              y: node.label_bbox.x,
              width: node.label_bbox.height,
              height: node.label_bbox.width
            }
            d3.select(this).attr('transform', 'rotate(-90)')

      enter_labels
        .attr
          x: (node) -> node.x
          y: (node) -> node.y
          width: (node) -> node.dx
          height: (node) -> node.dy
          viewBox: (node) -> "#{node.label_bbox.x} #{node.label_bbox.y} #{node.label_bbox.width} #{node.label_bbox.height}"
          preserveAspectRatio: 'none'

      # draw set labels
      vis.append('text')
        .text('Johnny Depp')
        .attr
          class: 'set_label'
          x: a.x + a.dx/2
          y: height - 22
          dy: '0.35em'

      vis.append('text')
        .text('D ∩ E')
        .attr
          class: 'set_label'
          x: intersection.x + intersection.dx/2
          y: height - 22
          dy: '0.35em'

      vis.append('text')
        .text('Danny Elfman')
        .attr
          class: 'set_label'
          x: b.x + b.dx/2
          y: height - 22
          dy: '0.35em'
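The core of index.coffee is multiset arithmetic on word counts: a word shared by both texts contributes min(count_a, count_b) to the intersection, and any surplus stays in that text's difference. A minimal plain-JavaScript sketch of that step, with made-up input arrays; it uses an index instead of the nested O(n·m) scan above, which gives the same output in O(n+m):

// Hypothetical word-count arrays in the same {word, count} shape nlp.ngram produces.
var data_a = [{word: 'film', count: 5}, {word: 'pirate', count: 3}];
var data_b = [{word: 'film', count: 2}, {word: 'score', count: 4}];

var intersection = [], diff_a = [], diff_b = [];
var counts_b = {};
data_b.forEach(function (b) { counts_b[b.word] = b.count; });

data_a.forEach(function (a) {
  if (a.word in counts_b) {
    // shared word: the overlap is the smaller count, the surplus stays in the diffs
    var min = Math.min(a.count, counts_b[a.word]);
    intersection.push({word: a.word, count: min});
    if (a.count > min) diff_a.push({word: a.word, count: a.count - min});
    if (counts_b[a.word] > min) diff_b.push({word: a.word, count: counts_b[a.word] - min});
  } else {
    diff_a.push(a); // only in text A
  }
});
data_b.forEach(function (b) {
  // words only in text B
  if (!data_a.some(function (a) { return a.word === b.word; })) diff_b.push(b);
});

console.log(intersection); // -> [{word: 'film', count: 2}]
console.log(diff_a);       // -> [{word: 'film', count: 3}, {word: 'pirate', count: 3}]
console.log(diff_b);       // -> [{word: 'score', count: 4}]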
svg {
  background: white;
}

.node {
  shape-rendering: crispEdges;
  vector-effect: non-scaling-stroke;
  stroke: white;
  stroke-width: 2;
}

.label {
  pointer-events: none;
  text-anchor: middle;
  font-family: Impact;
}

.set_label {
  fill: #444;
  font-family: serif;
  font-size: 26px;
  text-anchor: middle;
  font-weight: bold;
}
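The .label rule styles the gist's central trick: each word is rendered inside its own nested <svg> element whose viewBox is set to the text's measured bounding box and whose preserveAspectRatio is "none", so the browser stretches the glyphs to exactly fill the treemap cell. A reduced d3 v3 sketch of that technique in isolation (the cell geometry is hardcoded for illustration; this is not the gist's actual layout code):

// Stretch the word 'EXAMPLE' to fill a hypothetical 300x80 cell at (10,10).
// Assumes an <svg> element is already in the page, as in index.html.
var root = d3.select('svg');
var cell = root.append('svg')
  .attr('class', 'label')
  .attr({x: 10, y: 10, width: 300, height: 80})
  .attr('preserveAspectRatio', 'none');
var text = cell.append('text')
  .text('EXAMPLE')
  .attr('dy', '0.35em');
var bbox = text.node().getBBox(); // measure the glyphs at their natural size
// Mapping the inner viewBox to the text's own bbox makes it fill the outer cell.
cell.attr('viewBox', bbox.x + ' ' + bbox.y + ' ' + bbox.width + ' ' + bbox.height);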
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <script src="http://davidbau.com/encode/seedrandom-min.js"></script>
  <script src="http://d3js.org/d3.v3.min.js"></script>
  <script src="nlp.js"></script>
  <link rel="stylesheet" type="text/css" href="index.css">
  <title>Word cloud intersection II</title>
</head>
<body>
  <svg width="960px" height="500px"></svg>
  <script src="index.js"></script>
</body>
</html>
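seedrandom-min.js is included so that the color helper in index.coffee can reseed Math.random with the word itself, giving every word a stable hue across reloads. A minimal sketch of that idea (the 'abcdef' salt mirrors the one used above; stable_color is a hypothetical name, not part of the gist):

// Deterministic color per word: the same input always yields the same hue.
function stable_color(txt, iset) {
  Math.seedrandom(txt + 'abcdef');     // reseed the global PRNG with the word
  var noise = Math.random() * 90 - 45; // +/-45 degrees of hue jitter
  return d3.hcl(45 + iset * 90 + noise, 40, 50);
}
console.log(stable_color('pirate', 0) + ''); // prints the same hex string every run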
// Generated by CoffeeScript 1.10.0
(function() {
  var color, correct_x, correct_y, height, svg, treemap, vis, width, zoom, zoomable_layer;

  svg = d3.select('svg');

  width = svg.node().getBoundingClientRect().width;

  height = svg.node().getBoundingClientRect().height;

  treemap = d3.layout.treemap().size([width, height]).value(function(node) {
    return node.count;
  }).sort(function(a, b) {
    if (a.name === 'a' || b.name === 'b') {
      return +1;
    }
    if (a.name === 'b' || b.name === 'a') {
      return -1;
    }
    return a.count - b.count;
  }).ratio(3).padding(function(node) {
    if (node.depth === 0) {
      return [0, 0, 40, 0];
    } else if (node.depth === 1) {
      return 4;
    } else {
      return 0;
    }
  }).round(false);

  correct_x = d3.scale.linear().domain([0, width]).range([0, width * 1.05]);

  correct_y = d3.scale.linear().domain([0, height]).range([0, height * 3 / 4]);

  color = function(txt, set) {
    var iset, noise;
    iset = {
      'a': 0,
      'intersection': 1,
      'b': 2
    }[set];
    Math.seedrandom(txt + 'abcdef');
    noise = function(W) {
      return Math.random() * W - W / 2;
    };
    return d3.hcl(45 + iset * 90 + noise(90), 40, 50);
  };

  svg.attr({
    viewBox: (-width / 2) + " " + (-height / 2) + " " + width + " " + height
  });

  zoomable_layer = svg.append('g');

  zoom = d3.behavior.zoom().scaleExtent([1, 10]).on('zoom', function() {
    return zoomable_layer.attr({
      transform: "translate(" + (zoom.translate()) + ")scale(" + (zoom.scale()) + ")"
    });
  });

  svg.call(zoom);

  vis = zoomable_layer.append('g').attr({
    transform: "translate(" + (-width / 2) + "," + (-height / 2) + ")"
  });

  d3.csv('english_stopwords_long.txt', function(stopwords_array) {
    var stopwords;
    stopwords = {};
    stopwords_array.forEach(function(w) {
      return stopwords[w.word] = true;
    });
    return d3.text('depp.txt', function(infovis_txt) {
      var data_a, index_a;
      data_a = nlp.ngram(infovis_txt, {
        min_count: 1,
        max_size: 1
      })[0].filter(function(w) {
        return !(w.word in stopwords);
      });
      index_a = {};
      data_a.forEach(function(d) {
        return index_a[d.word] = d;
      });
      return d3.text('elfman.txt', function(hci_txt) {
        var a, b, data_b, diff_a, diff_b, enter_labels, index_b, intersection, labels, nodes_data, tree;
        data_b = nlp.ngram(hci_txt, {
          min_count: 1,
          max_size: 1
        })[0].filter(function(w) {
          return !(w.word in stopwords);
        });
        index_b = {};
        data_b.forEach(function(d) {
          return index_b[d.word] = d;
        });
        diff_a = data_a.filter(function(a) {
          return !(a.word in index_b);
        });
        diff_b = data_b.filter(function(b) {
          return !(b.word in index_a);
        });
        intersection = [];
        data_a.forEach(function(a) {
          return data_b.forEach(function(b) {
            var min;
            if (a.word === b.word) {
              min = Math.min(a.count, b.count);
              intersection.push({
                word: a.word,
                count: min
              });
              if (a.count - min > 0) {
                diff_a.push({
                  word: a.word,
                  count: a.count - min
                });
              }
              if (b.count - min > 0) {
                return diff_b.push({
                  word: b.word,
                  count: b.count - min
                });
              }
            }
          });
        });
        a = {
          children: diff_a.filter(function(d) {
            return d.count > 1;
          }),
          name: "a"
        };
        intersection = {
          children: intersection.filter(function(d) {
            return d.count > 1;
          }),
          name: "intersection"
        };
        b = {
          children: diff_b.filter(function(d) {
            return d.count > 1;
          }),
          name: "b"
        };
        tree = {
          children: [a, intersection, b],
          name: "root"
        };
        nodes_data = treemap.nodes(tree);
        labels = vis.selectAll('.label').data(nodes_data.filter(function(node) {
          return node.depth === 2;
        }));
        enter_labels = labels.enter().append('svg').attr({
          "class": 'label'
        });
        enter_labels.append('text').text(function(node) {
          return node.word.toUpperCase();
        }).attr({
          dy: '0.35em',
          fill: function(node) {
            return color(node.word, node.parent.name);
          }
        }).each(function(node) {
          var bbox, bbox_aspect, node_bbox, node_bbox_aspect, rotate;
          bbox = this.getBBox();
          bbox_aspect = bbox.width / bbox.height;
          node_bbox = {
            width: node.dx,
            height: node.dy
          };
          node_bbox_aspect = node_bbox.width / node_bbox.height;
          rotate = bbox_aspect >= 1 && node_bbox_aspect < 1 || bbox_aspect < 1 && node_bbox_aspect >= 1;
          node.label_bbox = {
            x: bbox.x + (bbox.width - correct_x(bbox.width)) / 2,
            y: bbox.y + (bbox.height - correct_y(bbox.height)) / 2,
            width: correct_x(bbox.width),
            height: correct_y(bbox.height)
          };
          if (rotate) {
            node.label_bbox = {
              x: node.label_bbox.y,
              y: node.label_bbox.x,
              width: node.label_bbox.height,
              height: node.label_bbox.width
            };
            return d3.select(this).attr('transform', 'rotate(-90)');
          }
        });
        enter_labels.attr({
          x: function(node) {
            return node.x;
          },
          y: function(node) {
            return node.y;
          },
          width: function(node) {
            return node.dx;
          },
          height: function(node) {
            return node.dy;
          },
          viewBox: function(node) {
            return node.label_bbox.x + " " + node.label_bbox.y + " " + node.label_bbox.width + " " + node.label_bbox.height;
          },
          preserveAspectRatio: 'none'
        });
        vis.append('text').text('Johnny Depp').attr({
          "class": 'set_label',
          x: a.x + a.dx / 2,
          y: height - 22,
          dy: '0.35em'
        });
        vis.append('text').text('D ∩ E').attr({
          "class": 'set_label',
          x: intersection.x + intersection.dx / 2,
          y: height - 22,
          dy: '0.35em'
        });
        return vis.append('text').text('Danny Elfman').attr({
          "class": 'set_label',
          x: b.x + b.dx / 2,
          y: height - 22,
          dy: '0.35em'
        });
      });
    });
  });

}).call(this);
(function e(t,n,r){function s(o,u){if(!n[o]){if(!t[o]){var a=typeof require=="function"&&require;if(!u&&a)return a(o,!0);if(i)return i(o,!0);var f=new Error("Cannot find module '"+o+"'");throw f.code="MODULE_NOT_FOUND",f}var l=n[o]={exports:{}};t[o][0].call(l.exports,function(e){var n=t[o][1][e];return s(n?n:e)},l,l.exports,e,t,n,r)}return n[o].exports}var i=typeof require=="function"&&require;for(var o=0;o<r.length;o++)s(r[o]);return s})({1:[function(require,module,exports){
// nlp_comprimise by @spencermountain in 2014
// most files are self-contained modules that optionally export for nodejs
// this file loads them all together
// if we're server-side, grab files, otherwise assume they're prepended already
// console.time('nlp_boot')
var parents = require("./src/parents/parents")
var sentence_parser = require('./src/methods/tokenization/sentence');
var tokenize = require('./src/methods/tokenization/tokenize');
var ngram = require('./src/methods/tokenization/ngram');
//tokenize
var normalize = require('./src/methods/transliteration/unicode_normalisation')
var syllables = require('./src/methods/syllables/syllable');
//localization
var americanize = require('./src/methods/localization/americanize')
var britishize = require('./src/methods/localization/britishize')
//part of speech tagging
var pos = require('./src/pos');
//named_entity_recognition
var spot = require('./src/spot');
///
// define the api
var nlp = {
  noun: parents.noun,
  adjective: parents.adjective,
  verb: parents.verb,
  adverb: parents.adverb,
  value: parents.value,
  sentences: sentence_parser,
  ngram: ngram,
  tokenize: tokenize,
  americanize: americanize,
  britishize: britishize,
  syllables: syllables,
  normalize: normalize.normalize,
  denormalize: normalize.denormalize,
  pos: pos,
  spot: spot
}
//export it for client-side
if (typeof window!=="undefined") { //is this right?
  window.nlp = nlp
}
//export it for server-side
module.exports = nlp;
// console.timeEnd('nlp_boot')
// console.log( nlp.pos('she sells seashells by the seashore').sentences[0].negate().text() )
// console.log( nlp.pos('i will slouch'));
// console.log( nlp.pos('Sally Davidson sells seashells by the seashore. Joe Biden said so.').people() )
// console.log(nlp.pos("Tony Danza is great. He works in the bank.").sentences[1].tokens[0].analysis.reference_to())
// console.log(nlp.pos("the FBI was hacked. He took their drugs.").sentences[1].tokens[2].analysis.reference_to())
},{"./src/methods/localization/americanize":17,"./src/methods/localization/britishize":18,"./src/methods/syllables/syllable":19,"./src/methods/tokenization/ngram":20,"./src/methods/tokenization/sentence":21,"./src/methods/tokenization/tokenize":22,"./src/methods/transliteration/unicode_normalisation":23,"./src/parents/parents":35,"./src/pos":45,"./src/spot":48}],2:[function(require,module,exports){ | |
//the lexicon is a large hash of words and their predicted part-of-speech. | |
// it plays a bootstrap-role in pos tagging in this library. | |
// to save space, most of the list is derived from conjugation methods, | |
// and other forms are stored in a compact way | |
var multiples = require("./lexicon/multiples") | |
var values = require("./lexicon/values") | |
var demonyms = require("./lexicon/demonyms") | |
var abbreviations = require("./lexicon/abbreviations") | |
var honourifics = require("./lexicon/honourifics") | |
var uncountables = require("./lexicon/uncountables") | |
var firstnames = require("./lexicon/firstnames") | |
var irregular_nouns = require("./lexicon/irregular_nouns") | |
//verbs | |
var verbs = require("./lexicon/verbs") | |
var verb_conjugate = require("../parents/verb/conjugate/conjugate") | |
var verb_irregulars = require("../parents/verb/conjugate/verb_irregulars") | |
var phrasal_verbs = require("./lexicon/phrasal_verbs") | |
var adjectives = require("./lexicon/adjectives") | |
var adj_to_adv = require("../parents/adjective/conjugate/to_adverb") | |
var to_superlative = require("../parents/adjective/conjugate/to_superlative") | |
var to_comparative = require("../parents/adjective/conjugate/to_comparative") | |
var convertables = require("../parents/adjective/conjugate/convertables") | |
var main = { | |
"etc": "FW", //foreign words | |
"ie": "FW", | |
"there": "EX", | |
"better": "JJR", | |
"earlier": "JJR", | |
"has": "VB", | |
"more": "RBR", | |
"sounds": "VBZ" | |
} | |
var compact = {
  //conjunctions
  "CC": [
    "yet",
    "therefore",
    "or",
    "while",
    "nor",
    "whether",
    "though",
    "because",
    "but",
    "for",
    "and",
    "if",
    "however",
    "before",
    "although",
    "how",
    "plus",
    "versus",
    "not"
  ],
  "VBD": [
    "where'd",
    "when'd",
    "how'd",
    "what'd",
    "said",
    "had",
    "been",
    "began",
    "came",
    "did",
    "meant",
    "went"
  ],
  "VBN": [
    "given",
    "known",
    "shown",
    "seen",
    "born",
  ],
  "VBG": [
    "going",
    "being",
    "according",
    "resulting",
    "developing",
    "staining"
  ],
  //copula
  "CP": [
    "is",
    "will be",
    "are",
    "was",
    "were",
    "am",
    "isn't",
    "ain't",
    "aren't"
  ],
  //determiners
  "DT": [
    "this",
    "any",
    "enough",
    "each",
    "whatever",
    "every",
    "which",
    "these",
    "another",
    "plenty",
    "whichever",
    "neither",
    "an",
    "a",
    "least",
    "own",
    "few",
    "both",
    "those",
    "the",
    "that",
    "various",
    "what",
    "either",
    "much",
    "some",
    "else",
    "no",
    //some other languages (what could go wrong?)
    "la",
    "le",
    "les",
    "des",
    "de",
    "du",
    "el"
  ],
  //prepositions
  "IN": [
    "until",
    "onto",
    "of",
    "into",
    "out",
    "except",
    "across",
    "by",
    "between",
    "at",
    "down",
    "as",
    "from",
    "around",
    "with",
    "among",
    "upon",
    "amid",
    "to",
    "along",
    "since",
    "about",
    "off",
    "on",
    "within",
    "in",
    "during",
    "per",
    "without",
    "throughout",
    "through",
    "than",
    "via",
    "up",
    "unlike",
    "despite",
    "below",
    "unless",
    "towards",
    "besides",
    "after",
    "whereas",
    "'o",
    "amidst",
    "amongst",
    "apropos",
    "atop",
    "barring",
    "chez",
    "circa",
    "mid",
    "midst",
    "notwithstanding",
    "qua",
    "sans",
    "vis-a-vis",
    "thru",
    "till",
    "versus",
    "without",
    "w/o",
    "o'",
    "a'",
  ],
  //modal verbs
  "MD": [
    "can",
    "may",
    "could",
    "might",
    "will",
    "ought to",
    "would",
    "must",
    "shall",
    "should",
    "ought",
    "shouldn't",
    "wouldn't",
    "couldn't",
    "mustn't",
    "shan't",
    "shant",
    "lets", //arguable
    "who'd",
    "let's"
  ],
  //posessive pronouns
  "PP": [
    "mine",
    "something",
    "none",
    "anything",
    "anyone",
    "theirs",
    "himself",
    "ours",
    "his",
    "my",
    "their",
    "yours",
    "your",
    "our",
    "its",
    "nothing",
    "herself",
    "hers",
    "themselves",
    "everything",
    "myself",
    "itself",
    "her", //this one is pretty ambiguous
    "who",
    "whom",
    "whose"
  ],
  //personal pronouns (nouns)
  "PRP": [
    "it",
    "they",
    "i",
    "them",
    "you",
    "she",
    "me",
    "he",
    "him",
    "ourselves",
    "us",
    "we",
    "thou",
    "il",
    "elle",
    "yourself",
    "'em"
  ],
  //some manual adverbs (the rest are generated)
  "RB": [
    "now",
    "again",
    "already",
    "soon",
    "directly",
    "toward",
    "forever",
    "apart",
    "instead",
    "yes",
    "alone",
    "ago",
    "indeed",
    "ever",
    "quite",
    "perhaps",
    "where",
    "then",
    "here",
    "thus",
    "very",
    "often",
    "once",
    "never",
    "why",
    "when",
    "away",
    "always",
    "sometimes",
    "also",
    "maybe",
    "so",
    "just",
    "well",
    "several",
    "such",
    "randomly",
    "too",
    "rather",
    "abroad",
    "almost",
    "anyway",
    "twice",
    "aside",
    "moreover",
    "anymore",
    "newly",
    "damn",
    "somewhat",
    "somehow",
    "meanwhile",
    "hence",
    "further",
    "furthermore"
  ],
  //interjections
  "UH": [
    "uhh",
    "uh-oh",
    "ugh",
    "sheesh",
    "eww",
    "pff",
    "voila",
    "oy",
    "eep",
    "hurrah",
    "yuck",
    "ow",
    "duh",
    "oh",
    "hmm",
    "yeah",
    "whoa",
    "ooh",
    "whee",
    "ah",
    "bah",
    "gah",
    "yaa",
    "phew",
    "gee",
    "ahem",
    "eek",
    "meh",
    "yahoo",
    "oops",
    "d'oh",
    "psst",
    "argh",
    "grr",
    "nah",
    "shhh",
    "whew",
    "mmm",
    "yay",
    "uh-huh",
    "boo",
    "wow",
    "nope"
  ],
  //nouns that shouldnt be seen as a verb
  "NN": [
    "president",
    "dollar",
    "student",
    "patent",
    "funding",
    "morning",
    "banking",
    "ceiling",
    "energy",
    "secretary",
    "purpose",
    "friends",
    "event"
  ]
}
//unpack the compact terms into the main lexicon..
var i, arr;
var keys = Object.keys(compact)
var l = keys.length
for (i = 0; i < l; i++) {
  arr = compact[keys[i]]
  for (i2 = 0; i2 < arr.length; i2++) {
    main[arr[i2]] = keys[i];
  }
}
//add values
keys = Object.keys(values)
l = keys.length
for (i = 0; i < l; i++) {
  main[keys[i]] = "CD"
}
//add demonyms
l = demonyms.length
for (i = 0; i < l; i++) {
  main[demonyms[i]] = "JJ"
}
//add abbreviations
l = abbreviations.length
for (i = 0; i < l; i++) {
  main[abbreviations[i]] = "NNAB"
}
//add honourifics
l = honourifics.length
for (i = 0; i < l; i++) {
  main[honourifics[i]] = "NNAB"
}
//add uncountable nouns
l = uncountables.length
for (i = 0; i < l; i++) {
  main[uncountables[i]] = "NN"
}
//add irregular nouns
l = irregular_nouns.length
for (i = 0; i < l; i++) {
  main[irregular_nouns[i][0]] = "NN"
  main[irregular_nouns[i][1]] = "NNS"
}
//add firstnames
Object.keys(firstnames).forEach(function (k) {
  main[k] = "NNP"
})
//add multiple-word terms
Object.keys(multiples).forEach(function (k) {
  main[k] = multiples[k]
})
//add phrasal verbs
Object.keys(phrasal_verbs).forEach(function (k) {
  main[k] = phrasal_verbs[k]
})
//add verbs
//conjugate all verbs. takes ~8ms. triples the lexicon size.
var c;
l = verbs.length;
for (i = 0; i < l; i++) {
  //add conjugations
  c = verb_conjugate(verbs[i])
  main[c.infinitive] = main[c.infinitive] || "VBP"
  main[c.past] = main[c.past] || "VBD"
  main[c.gerund] = main[c.gerund] || "VBG"
  main[c.present] = main[c.present] || "VBZ"
  if (c.doer) {
    main[c.doer] = main[c.doer] || "NNA"
  }
  if (c.participle) {
    main[c.participle] = main[c.participle] || "VBN"
  }
}
//add irregular verbs
l = verb_irregulars.length;
for (i = 0; i < l; i++) {
  c = verb_irregulars[i]
  main[c.infinitive] = main[c.infinitive] || "VBP"
  main[c.gerund] = main[c.gerund] || "VBG"
  main[c.past] = main[c.past] || "VBD"
  main[c.present] = main[c.present] || "VBZ"
  if (c.doer) {
    main[c.doer] = main[c.doer] || "NNA"
  }
  if (c.participle) {
    main[c.future] = main[c.future] || "VB"
  }
}
//add adjectives
//conjugate all of these adjectives to their adverbs. (13ms)
var tmp, j;
l = adjectives.length;
for (i = 0; i < l; i++) {
  main[adjectives[i]] = "JJ"
}
keys = Object.keys(convertables)
l = keys.length;
for (i = 0; i < l; i++) {
  j = keys[i]
  main[j] = "JJ"
  //add adverb form
  tmp = adj_to_adv(j)
  if (tmp && tmp !== j && !main[tmp]) {
    main[tmp] = main[tmp] || "RB"
  }
  //add comparative form
  tmp = to_comparative(j)
  if (tmp && !tmp.match(/^more ./) && tmp !== j && !main[tmp]) {
    main[tmp] = main[tmp] || "JJR"
  }
  //add superlative form
  tmp = to_superlative(j)
  if (tmp && !tmp.match(/^most ./) && tmp !== j && !main[tmp]) {
    main[tmp] = main[tmp] || "JJS"
  }
}
module.exports = main;
// console.log(lexicon['once again']=="RB")
// console.log(lexicon['seven']=="CD")
// console.log(lexicon['sleep']=="VBP")
// console.log(lexicon['slept']=="VBD")
// console.log(lexicon['sleeping']=="VBG")
// // console.log(lexicon['completely'])
// console.log(lexicon['pretty']=="JJ")
// console.log(lexicon['canadian']=="JJ")
// console.log(lexicon['july']=="CD")
// console.log(lexicon[null]===undefined)
// console.log(lexicon["dr"]==="NNAB")
// console.log(lexicon["hope"]==="NN")
// console.log(lexicon["higher"]==="JJR")
// console.log(lexicon["earlier"]==="JJR")
// console.log(lexicon["larger"]==="JJR")
// console.log(lexicon["says"]==="VBZ")
// console.log(lexicon["sounds"]==="VBZ")
// console.log(lexicon["means"]==="VBZ")
// console.log(lexicon["look after"]==="VBP")
// console.log(Object.keys(lexicon).length)
// console.log(lexicon['prettier']=="JJR")
// console.log(lexicon['prettiest']=="JJS")
// console.log(lexicon['tony']=="NNP")
// console.log(lexicon['loaf']=="NN")
// console.log(lexicon['loaves']=="NNS")
// console.log(lexicon['he']=="PRP")
},{"../parents/adjective/conjugate/convertables":24,"../parents/adjective/conjugate/to_adverb":25,"../parents/adjective/conjugate/to_comparative":26,"../parents/adjective/conjugate/to_superlative":28,"../parents/verb/conjugate/conjugate":39,"../parents/verb/conjugate/verb_irregulars":42,"./lexicon/abbreviations":3,"./lexicon/adjectives":4,"./lexicon/demonyms":5,"./lexicon/firstnames":6,"./lexicon/honourifics":7,"./lexicon/irregular_nouns":8,"./lexicon/multiples":9,"./lexicon/phrasal_verbs":10,"./lexicon/uncountables":11,"./lexicon/values":12,"./lexicon/verbs":13}],3:[function(require,module,exports){
//these are common word shortenings used in the lexicon and sentence segmentation methods | |
//there are all nouns, or at the least, belong beside one. | |
var honourifics = require("./honourifics") //stored seperately, for 'noun.is_person()' | |
var main = [ | |
//common abbreviations | |
"arc", "al", "ave", "blvd", "cl", "ct", "cres", "exp", "rd", "st", "dist", "mt", "ft", "fy", "hwy", "la", "pd", "pl", "plz", "tce", "vs", "etc", "esp", "llb", "md", "bl", "ma", "ba", "lit", "fl", "ex", "eg", "ie", | |
//place main | |
"ala", "ariz", "ark", "cal", "calif", "col", "colo", "conn", "del", "fed", "fla", "ga", "ida", "ind", "ia", "kan", "kans", "ken", "ky", "la", "md", "mich", "minn", "mont", "neb", "nebr", "nev", "okla", "penna", "penn", "pa", "dak", "tenn", "tex", "ut", "vt", "va", "wash", "wis", "wisc", "wy", "wyo", "usafa", "alta", "ont", "que", "sask", "yuk", "bc", | |
//org names
"dept", "univ", "assn", "bros", "inc", "ltd", "co", "corp", | |
//proper nouns with exclamation marks | |
"yahoo", "joomla", "jeopardy" | |
] | |
//person titles like 'jr', (stored separately)
main = main.concat(honourifics) | |
module.exports = main; | |
},{"./honourifics":7}],4:[function(require,module,exports){ | |
//adjectives that either aren't covered by rules, or have superlative/comparative forms | |
//this list is the seed, from which various forms are conjugated | |
module.exports= [ | |
"colonial", | |
"moody", | |
"literal", | |
"actual", | |
"probable", | |
"apparent", | |
"usual", | |
"aberrant", | |
"ablaze", | |
"able", | |
"absolute", | |
"aboard", | |
"abrupt", | |
"absent", | |
"absorbing", | |
"abundant", | |
"accurate", | |
"adult", | |
"afraid", | |
"agonizing", | |
"ahead", | |
"aloof", | |
"amazing", | |
"arbitrary", | |
"arrogant", | |
"asleep", | |
"astonishing", | |
"average", | |
"awake", | |
"aware", | |
"awkward", | |
"back", | |
"bad", | |
"bankrupt", | |
"bawdy", | |
"beneficial", | |
"bent", | |
"best", | |
"better", | |
"bizarre", | |
"bloody", | |
"bouncy", | |
"brilliant", | |
"broken", | |
"burly", | |
"busy", | |
"cagey", | |
"careful", | |
"caring", | |
"certain", | |
"chief", | |
"chilly", | |
"civil", | |
"clever", | |
"closed", | |
"cloudy", | |
"colossal", | |
"commercial", | |
"common", | |
"complete", | |
"complex", | |
"concerned", | |
"concrete", | |
"congruent", | |
"constant", | |
"cooing", | |
"correct", | |
"cowardly", | |
"craven", | |
"cuddly", | |
"daily", | |
"damaged", | |
"damaging", | |
"dapper", | |
"dashing", | |
"deadpan", | |
"deeply", | |
"defiant", | |
"degenerate", | |
"delicate", | |
"delightful", | |
"desperate", | |
"determined", | |
"didactic", | |
"difficult", | |
"discreet", | |
"done", | |
"double", | |
"doubtful", | |
"downtown", | |
"dreary", | |
"east", | |
"eastern", | |
"elderly", | |
"elegant", | |
"elfin", | |
"elite", | |
"eminent", | |
"encouraging", | |
"entire", | |
"erect", | |
"ethereal", | |
"exact", | |
"expert", | |
"extra", | |
"exuberant", | |
"exultant", | |
"false", | |
"fancy", | |
"faulty", | |
"female", | |
"fertile", | |
"fierce ", | |
"financial", | |
"first", | |
"fit", | |
"fixed", | |
"flagrant", | |
"foamy", | |
"foolish", | |
"foregoing", | |
"foreign", | |
"former", | |
"fortunate", | |
"frantic", | |
"freezing", | |
"frequent", | |
"fretful", | |
"friendly", | |
"fun", | |
"furry", | |
"future", | |
"gainful", | |
"gaudy", | |
"giant", | |
"giddy", | |
"gigantic", | |
"gleaming", | |
"global", | |
"gold", | |
"gone", | |
"good", | |
"goofy", | |
"graceful", | |
"grateful", | |
"gratis", | |
"gray", | |
"grey", | |
"groovy", | |
"gross", | |
"guarded", | |
"half", | |
"handy", | |
"hanging", | |
"hateful", | |
"heady", | |
"heavenly", | |
"hellish", | |
"helpful", | |
"hesitant", | |
"highfalutin", | |
"homely", | |
"honest", | |
"huge", | |
"humdrum", | |
"hurried", | |
"hurt", | |
"icy", | |
"ignorant", | |
"ill", | |
"illegal", | |
"immediate", | |
"immense", | |
"imminent", | |
"impartial", | |
"imperfect", | |
"imported", | |
"initial", | |
"innate", | |
"inner", | |
"inside", | |
"irate", | |
"jolly", | |
"juicy", | |
"junior", | |
"juvenile", | |
"kaput", | |
"kindly", | |
"knowing", | |
"labored", | |
"languid", | |
"latter", | |
"learned", | |
"left", | |
"legal", | |
"lethal", | |
"level", | |
"lewd", | |
"likely", | |
"literate", | |
"lively", | |
"living", | |
"lonely", | |
"longing", | |
"loutish", | |
"lovely", | |
"loving", | |
"lowly", | |
"luxuriant", | |
"lying", | |
"macabre", | |
"madly", | |
"magenta", | |
"main", | |
"major", | |
"makeshift", | |
"male", | |
"mammoth", | |
"measly", | |
"meaty", | |
"medium", | |
"mere", | |
"middle", | |
"miniature", | |
"minor", | |
"miscreant", | |
"mobile", | |
"moldy", | |
"mute", | |
"naive", | |
"nearby", | |
"necessary", | |
"neighborly", | |
"next", | |
"nimble", | |
"nonchalant", | |
"nondescript", | |
"nonstop", | |
"north", | |
"nosy", | |
"obeisant", | |
"obese", | |
"obscene", | |
"observant", | |
"obsolete", | |
"offbeat", | |
"official", | |
"ok", | |
"open", | |
"opposite", | |
"organic", | |
"outdoor", | |
"outer", | |
"outgoing", | |
"oval", | |
"over", | |
"overall", | |
"overt", | |
"overweight", | |
"overwrought", | |
"painful", | |
"past", | |
"peaceful", | |
"perfect", | |
"petite", | |
"picayune", | |
"placid", | |
"plant", | |
"pleasant", | |
"polite", | |
"potential", | |
"pregnant", | |
"premium", | |
"present", | |
"pricey", | |
"prickly", | |
"primary", | |
"prior", | |
"private", | |
"profuse", | |
"proper", | |
"public", | |
"pumped", | |
"puny", | |
"quack", | |
"quaint", | |
"quickest", | |
"rabid", | |
"racial", | |
"ready", | |
"real", | |
"rebel", | |
"recondite", | |
"redundant", | |
"relevant", | |
"remote", | |
"resolute", | |
"resonant", | |
"right", | |
"rightful", | |
"ritzy", | |
"robust", | |
"romantic", | |
"roomy", | |
"rough", | |
"royal", | |
"salty", | |
"same", | |
"scary", | |
"scientific", | |
"screeching", | |
"second", | |
"secret", | |
"secure", | |
"sedate", | |
"seemly", | |
"selfish", | |
"senior", | |
"separate", | |
"severe", | |
"shiny", | |
"shocking", | |
"shut", | |
"shy", | |
"sick", | |
"significant", | |
"silly", | |
"sincere", | |
"single", | |
"skinny", | |
"slight", | |
"slimy", | |
"smelly", | |
"snobbish", | |
"social", | |
"somber", | |
"sordid", | |
"sorry", | |
"southern", | |
"spare", | |
"special", | |
"specific", | |
"spicy", | |
"splendid", | |
"squeamish", | |
"standard", | |
"standing", | |
"steadfast", | |
"steady", | |
"stereotyped", | |
"still", | |
"striped", | |
"stupid", | |
"sturdy", | |
"subdued", | |
"subsequent", | |
"substantial", | |
"sudden", | |
"super", | |
"superb", | |
"superficial", | |
"supreme", | |
"sure", | |
"taboo", | |
"tan", | |
"tasteful", | |
"tawdry", | |
"telling", | |
"temporary", | |
"terrific", | |
"tested", | |
"thoughtful", | |
"tidy", | |
"tiny", | |
"top", | |
"torpid", | |
"tranquil", | |
"trite", | |
"ugly", | |
"ultra", | |
"unbecoming", | |
"understood", | |
"uneven", | |
"unfair", | |
"unlikely", | |
"unruly", | |
"unsightly", | |
"untidy", | |
"unwritten", | |
"upbeat", | |
"upper", | |
"uppity", | |
"upset", | |
"upstairs", | |
"uptight", | |
"used", | |
"useful", | |
"utter", | |
"uttermost", | |
"vagabond", | |
"vanilla", | |
"various", | |
"vengeful", | |
"verdant", | |
"violet", | |
"volatile", | |
"wanting", | |
"wary", | |
"wasteful", | |
"weary", | |
"weekly", | |
"welcome", | |
"western", | |
"whole", | |
"wholesale", | |
"wiry", | |
"wistful", | |
"womanly", | |
"wooden", | |
"woozy", | |
"wound", | |
"wrong", | |
"wry", | |
"zany", | |
"sacred", | |
"unknown", | |
"detailed", | |
"ongoing", | |
"prominent", | |
"permanent", | |
"diverse", | |
"partial", | |
"moderate", | |
"contemporary", | |
"intense", | |
"widespread", | |
"ultimate", | |
"ideal", | |
"adequate", | |
"sophisticated", | |
"naked", | |
"dominant", | |
"precise", | |
"intact", | |
"adverse", | |
"genuine", | |
"subtle", | |
"universal", | |
"resistant", | |
"routine", | |
"distant", | |
"unexpected", | |
"soviet", | |
"blind", | |
"artificial", | |
"mild", | |
"legitimate", | |
"unpublished", | |
"superior", | |
"intermediate", | |
"everyday", | |
"dumb", | |
"excess", | |
"sexy", | |
"fake", | |
"monthly", | |
"premature", | |
"sheer", | |
"generic", | |
"insane", | |
"contrary", | |
"twin", | |
"upcoming", | |
"bottom", | |
"costly", | |
"indirect", | |
"sole", | |
"unrelated", | |
"hispanic", | |
"improper", | |
"underground", | |
"legendary", | |
"reluctant", | |
"beloved", | |
"inappropriate", | |
"corrupt", | |
"irrelevant", | |
"justified", | |
"obscure", | |
"profound", | |
"hostile", | |
"influential", | |
"inadequate", | |
"abstract", | |
"timely", | |
"authentic", | |
"bold", | |
"intimate", | |
"straightforward", | |
"rival", | |
"right-wing", | |
"racist", | |
"symbolic", | |
"unprecedented", | |
"loyal", | |
"talented", | |
"troubled", | |
"noble", | |
"instant", | |
"incorrect", | |
"dense", | |
"blond", | |
"deliberate", | |
"blank", | |
"rear", | |
"feminine", | |
"apt", | |
"stark", | |
"alcoholic", | |
"teenage", | |
"vibrant", | |
"humble", | |
"vain", | |
"covert", | |
"bland", | |
"trendy", | |
"foul", | |
"populist", | |
"alarming", | |
"hooked", | |
"wicked", | |
"deaf", | |
"left-wing", | |
"lousy", | |
"malignant", | |
"stylish", | |
"upscale", | |
"hourly", | |
"refreshing", | |
"cozy", | |
"slick", | |
"dire", | |
"yearly", | |
"inbred", | |
"part-time", | |
"finite", | |
"backwards", | |
"nightly", | |
"unauthorized", | |
"cheesy", | |
"indoor", | |
"surreal", | |
"bald", | |
"masculine", | |
"shady", | |
"spirited", | |
"eerie", | |
"horrific", | |
"smug", | |
"stern", | |
"hefty", | |
"savvy", | |
"bogus", | |
"elaborate", | |
"gloomy", | |
"pristine", | |
"extravagant", | |
"serene", | |
"advanced", | |
"perverse", | |
"devout", | |
"crisp", | |
"rosy", | |
"slender", | |
"melancholy", | |
"faux", | |
"phony", | |
"danish", | |
"lofty", | |
"nuanced", | |
"lax", | |
"adept", | |
"barren", | |
"shameful", | |
"sleek", | |
"solemn", | |
"vacant", | |
"dishonest", | |
"brisk", | |
"fluent", | |
"insecure", | |
"humid", | |
"menacing", | |
"moot", | |
"soothing", | |
"self-loathing", | |
"far-reaching", | |
"harrowing", | |
"scathing", | |
"perplexing", | |
"calming", | |
"unconvincing", | |
"unsuspecting", | |
"unassuming", | |
"surprising", | |
"unappealing", | |
"vexing", | |
"unending", | |
"easygoing", | |
"appetizing", | |
"disgruntled", | |
"retarded", | |
"undecided", | |
"unregulated", | |
"unsupervised", | |
"unrecognized", | |
"crazed", | |
"distressed", | |
"jagged", | |
"paralleled", | |
"cramped", | |
"warped", | |
"antiquated", | |
"fabled", | |
"deranged", | |
"diseased", | |
"ragged", | |
"intoxicated", | |
"hallowed", | |
"crowded", | |
"ghastly", | |
"disorderly", | |
"saintly", | |
"wily", | |
"sly", | |
"sprightly", | |
"ghostly", | |
"oily", | |
"hilly", | |
"grisly", | |
"earthly", | |
"friendly", | |
"unwieldy", | |
"many", | |
"most", | |
"last", | |
"expected", | |
"far", | |
"due", | |
"divine", | |
"all", | |
"together", | |
"only", | |
"outside", | |
"multiple", | |
"appropriate", | |
"evil", | |
"favorite", | |
"limited", | |
"random", | |
"republican", | |
"okay", | |
"essential", | |
"secondary", | |
"gay", | |
"south", | |
"pro", | |
"northern", | |
"urban", | |
"acute", | |
"prime", | |
"arab", | |
"overnight", | |
"mixed", | |
"crucial", | |
"behind", | |
"above", | |
"beyond", | |
"against", | |
"under", | |
"other", | |
"less" | |
] | |
},{}],5:[function(require,module,exports){ | |
//demonyms: adjectival forms of place names.
module.exports= [ | |
"afghan", | |
"albanian", | |
"algerian", | |
"argentine", | |
"armenian", | |
"australian", | |
"aussie", | |
"austrian", | |
"bangladeshi", | |
"belgian", | |
"bolivian", | |
"bosnian", | |
"brazilian", | |
"bulgarian", | |
"cambodian", | |
"canadian", | |
"chilean", | |
"chinese", | |
"colombian", | |
"croat", | |
"cuban", | |
"czech", | |
"dominican", | |
"egyptian", | |
"british", | |
"estonian", | |
"ethiopian", | |
"finnish", | |
"french", | |
"gambian", | |
"georgian", | |
"german", | |
"greek", | |
"haitian", | |
"hungarian", | |
"indian", | |
"indonesian", | |
"iranian", | |
"iraqi", | |
"irish", | |
"israeli", | |
"italian", | |
"jamaican", | |
"japanese", | |
"jordanian", | |
"kenyan", | |
"korean", | |
"kuwaiti", | |
"latvian", | |
"lebanese", | |
"liberian", | |
"libyan", | |
"lithuanian", | |
"macedonian", | |
"malaysian", | |
"mexican", | |
"mongolian", | |
"moroccan", | |
"dutch", | |
"nicaraguan", | |
"nigerian", | |
"norwegian", | |
"omani", | |
"pakistani", | |
"palestinian", | |
"filipino", | |
"polish", | |
"portuguese", | |
"qatari", | |
"romanian", | |
"russian", | |
"rwandan", | |
"samoan", | |
"saudi", | |
"scottish", | |
"senegalese", | |
"serbian", | |
"singaporean", | |
"slovak", | |
"somali", | |
"sudanese", | |
"swedish", | |
"swiss", | |
"syrian", | |
"taiwanese", | |
"thai", | |
"tunisian", | |
"ugandan", | |
"ukrainian", | |
"american", | |
"hindi", | |
"spanish", | |
"venezuelan", | |
"vietnamese", | |
"welsh", | |
"african", | |
"european", | |
"asian", | |
"californian", | |
] | |
},{}],6:[function(require,module,exports){ | |
// common first-names in compressed form. | |
//from http://www.ssa.gov/oact/babynames/limits.html and http://www.servicealberta.gov.ab.ca/pdf/vs/2001_Boys.pdf | |
//not sure what regional/cultural/demographic bias this has. Probably a lot. | |
// 73% of people are represented in the top 1000 names | |
//used to reduce redundant named-entities in longer text. (don't spot the same person twice.) | |
//used to identify gender for coreference resolution | |
var main = {}
//an ad-hoc prefix encoding for names. 2ms decompression of names | |
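//e.g. the key "will" with value "iam,ie,ard,is,iams" expands (in the loop below)
//to william, willie, willard, willis, williams, each tagged "m" for male.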
var male_names = { | |
"will": "iam,ie,ard,is,iams", | |
"fred": ",erick,die,rick,dy", | |
"marc": "us,,o,os,el", | |
"darr": "ell,yl,en,el,in", | |
"fran": "k,cis,cisco,klin,kie", | |
"terr": "y,ance,ence,ell", | |
"rand": "y,all,olph,al", | |
"brad": "ley,,ford,y", | |
"jeff": "rey,,ery,ry", | |
"john": ",ny,nie,athan", | |
"greg": "ory,,g,orio", | |
"mar": "k,tin,vin,io,shall,ty,lon,lin", | |
"car": "l,los,lton,roll,y,ey", | |
"ken": "neth,,t,ny,dall,drick", | |
"har": "old,ry,vey,ley,lan,rison", | |
"ste": "ven,phen,ve,wart,phan,rling", | |
"jer": "ry,emy,ome,emiah,maine,ald", | |
"mic": "hael,heal,ah,key,hel", | |
"dar": "yl,in,nell,win,ius", | |
"dan": "iel,ny,,e", | |
"wil": "bur,son,bert,fred,fredo", | |
"ric": "hard,ky,ardo,k,key", | |
"cli": "fford,nton,fton,nt,ff", | |
"cla": "rence,ude,yton,rk,y", | |
"ben": "jamin,,nie,ny,ito", | |
"rod": "ney,erick,olfo,ger,", | |
"rob": "ert,erto,bie,", | |
"gar": "y,ry,rett,land", | |
"sam": "uel,,my,mie", | |
"and": "rew,re,y,res", | |
"jos": "eph,e,hua,h", | |
"joe": ",l,y,sph", | |
"leo": "nard,n,,nardo", | |
"tom": ",my,as,mie", | |
"bry": "an,ant,ce,on", | |
"ant": "hony,onio,oine,on", | |
"jac": "k,ob,kson", | |
"cha": "rles,d,rlie,se", | |
"sha": "wn,ne,un", | |
"bre": "nt,tt,ndan,t", | |
"jes": "se,us,s", | |
"al": "bert,an,len,fred,exander,ex,vin,lan,fredo,berto,ejandro,fonso,ton,,onzo,i,varo", | |
"ro": "nald,ger,y,nnie,land,n,ss,osevelt,gelio,lando,man,cky,yce,scoe,ry", | |
"de": "nnis,rek,an,rrick,lbert,vin,wey,xter,wayne,metrius,nis,smond", | |
"ja": "mes,son,y,red,vier,ke,sper,mal,rrod", | |
"el": "mer,lis,bert,ias,ijah,don,i,ton,liot,liott,vin,wood", | |
"ma": "tthew,nuel,urice,thew,x,tt,lcolm,ck,son", | |
"do": "nald,uglas,n,nnie,ug,minic,yle,mingo,minick", | |
"er": "ic,nest,ik,nesto,ick,vin,nie,win", | |
"ra": "ymond,lph,y,mon,fael,ul,miro,phael", | |
"ed": "ward,win,die,gar,uardo,,mund,mond", | |
"co": "rey,ry,dy,lin,nrad,rnelius", | |
"le": "roy,wis,ster,land,vi", | |
"lo": "uis,nnie,renzo,ren,well,uie,u,gan", | |
"da": "vid,le,ve,mon,llas,mian,mien", | |
"jo": "nathan,n,rge,rdan,nathon,aquin", | |
"ru": "ssell,ben,dolph,dy,fus,ssel,sty", | |
"ke": "vin,ith,lvin,rmit", | |
"ar": "thur,nold,mando,turo,chie,mand", | |
"re": "ginald,x,ynaldo,uben,ggie", | |
"ge": "orge,rald,ne,rard,offrey,rardo", | |
"la": "rry,wrence,nce,urence,mar,mont", | |
"mo": "rris,ses,nte,ises,nty", | |
"ju": "an,stin,lio,lian,lius,nior", | |
"pe": "ter,dro,rry,te,rcy", | |
"tr": "avis,oy,evor,ent", | |
"he": "nry,rbert,rman,ctor,ath", | |
"no": "rman,el,ah,lan,rbert", | |
"em": "anuel,il,ilio,mett,manuel", | |
"wa": "lter,yne,rren,llace,de", | |
"mi": "ke,guel,lton,tchell,les", | |
"sa": "lvador,lvatore,ntiago,ul,ntos", | |
"ch": "ristopher,ris,ester,ristian,uck", | |
"pa": "ul,trick,blo,t", | |
"st": "anley,uart,an", | |
"hu": "gh,bert,go,mberto", | |
"br": "ian,uce,andon,ain", | |
"vi": "ctor,ncent,rgil,cente", | |
"ca": "lvin,meron,leb", | |
"gu": "y,illermo,stavo", | |
"lu": "is,ther,ke,cas", | |
"gr": "ant,ady,over,aham", | |
"ne": "il,lson,al,d", | |
"t": "homas,imothy,odd,ony,heodore,im,yler,ed,yrone,aylor,erence,immy,oby,eddy,yson", | |
"s": "cott,ean,idney,ergio,eth,pencer,herman,ylvester,imon,heldon,cotty,olomon", | |
"r": "yan", | |
"n": "icholas,athan,athaniel,ick,icolas", | |
"a": "dam,aron,drian,ustin,ngelo,braham,mos,bel,gustin,ugust,dolfo", | |
"b": "illy,obby,arry,ernard,ill,ob,yron,lake,ert,oyd,illie,laine,art,uddy,urton", | |
"e": "ugene,arl,verett,nrique,van,arnest,frain,than,steban", | |
"h": "oward,omer,orace,ans,al", | |
"p": "hillip,hilip,reston,hil,ierre", | |
"c": "raig,urtis,lyde,ecil,esar,edric,leveland,urt", | |
"j": "immy,im,immie", | |
"g": "lenn,ordon,len,ilbert,abriel,ilberto", | |
"m": "elvin,yron,erle,urray", | |
"k": "yle,arl,urt,irk,ristopher", | |
"o": "scar,tis,liver,rlando,mar,wen,rville,tto", | |
"l": "loyd,yle,ionel", | |
"f": "loyd,ernando,elix,elipe,orrest,abian,idel", | |
"w": "esley,endell,m,oodrow,inston", | |
"d": "ustin,uane,wayne,wight,rew,ylan", | |
"z": "achary", | |
"v": "ernon,an,ance", | |
"i": "an,van,saac,ra,rving,smael,gnacio,rvin", | |
"q": "uentin,uinton", | |
"x": "avier" | |
} | |
var female_names = { | |
"mari": "a,e,lyn,an,anne,na,ssa,bel,sa,sol,tza", | |
"kris": "ten,tin,tina,ti,tine,ty,ta,tie", | |
"jean": "ette,ne,nette,nie,ine,nine", | |
"chri": "stine,stina,sty,stie,sta,sti", | |
"marg": "aret,ie,arita,uerite,ret,o", | |
"ange": "la,lica,lina,lia,line", | |
"fran": "ces,cine,cisca", | |
"kath": "leen,erine,y,ryn,arine", | |
"sher": "ry,ri,yl,i,rie", | |
"caro": "l,lyn,line,le,lina", | |
"dian": "e,a,ne,na", | |
"jenn": "ifer,ie,y,a", | |
"luci": "lle,a,nda,le", | |
"kell": "y,i,ey,ie", | |
"rosa": ",lie,lind", | |
"jani": "ce,e,s,ne", | |
"stac": "y,ey,ie,i", | |
"shel": "ly,ley,ia", | |
"laur": "a,en,ie,el", | |
"trac": "y,ey,i,ie", | |
"jane": "t,,lle,tte", | |
"bett": "y,ie,e,ye", | |
"rose": "mary,marie,tta", | |
"joan": ",ne,n,na", | |
"mar": "y,tha,jorie,cia,lene,sha,yann,cella,ta,la,cy,tina", | |
"lor": "i,raine,etta,a,ena,ene,na,ie", | |
"sha": "ron,nnon,ri,wna,nna,na,una", | |
"dor": "othy,is,a,een,thy,othea", | |
"cla": "ra,udia,ire,rice,udette", | |
"eli": "zabeth,sa,sabeth,se,za", | |
"kar": "en,la,a,i,in", | |
"tam": "my,ara,i,mie,ika", | |
"ann": "a,,e,ie,ette", | |
"car": "men,rie,la,a,mela", | |
"mel": "issa,anie,inda", | |
"ali": "ce,cia,son,sha,sa", | |
"bri": "ttany,dget,ttney,dgette", | |
"lyn": "n,da,ne,ette", | |
"del": "ores,la,ia,oris", | |
"ter": "esa,ri,i", | |
"son": "ia,ya,ja,dra", | |
"deb": "orah,ra,bie,ora", | |
"jac": "queline,kie,quelyn,lyn", | |
"lat": "oya,asha,onya,isha", | |
"che": "ryl,lsea,ri,rie", | |
"vic": "toria,ki,kie,ky", | |
"sus": "an,ie,anne,ana", | |
"rob": "erta,yn", | |
"est": "her,elle,ella,er", | |
"lea": "h,,nne,nn", | |
"lil": "lian,lie,a,y", | |
"ma": "ureen,ttie,xine,bel,e,deline,ggie,mie,ble,ndy,ude,yra,nuela,vis,gdalena,tilda", | |
"jo": "yce,sephine,,di,dy,hanna,sefina,sie,celyn,lene,ni,die", | |
"be": "verly,rtha,atrice,rnice,th,ssie,cky,linda,ulah,rnadette,thany,tsy,atriz", | |
"ca": "therine,thy,ssandra,ndace,ndice,mille,itlin,ssie,thleen,llie", | |
"le": "slie,na,ona,ticia,igh,la,nora,ola,sley,ila", | |
"el": "aine,len,eanor,sie,la,ena,oise,vira,sa,va,ma", | |
"sa": "ndra,rah,ra,lly,mantha,brina,ndy,die,llie", | |
"mi": "chelle,ldred,chele,nnie,riam,sty,ndy,randa,llie", | |
"co": "nnie,lleen,nstance,urtney,ra,rinne,nsuelo,rnelia", | |
"ju": "lie,dith,dy,lia,anita,ana,stine", | |
"da": "wn,nielle,rlene,na,isy,rla,phne", | |
"re": "becca,nee,na,bekah,ba", | |
"al": "ma,lison,berta,exandra,yssa,ta", | |
"ra": "chel,mona,chael,quel,chelle", | |
"an": "drea,ita,a,gie,toinette,tonia", | |
"ge": "raldine,rtrude,orgia,nevieve,orgina", | |
"de": "nise,anna,siree,na,ana,e", | |
"ja": "smine,na,yne", | |
"lu": "cy,z,la,pe,ella,isa", | |
"je": "ssica,nifer,well,ri", | |
"ad": "a,rienne,die,ele,riana,eline", | |
"pa": "tricia,mela,ula,uline,tsy,m,tty,ulette,tti,trice,trica,ige", | |
"ke": "ndra,rri,isha,ri", | |
"mo": "nica,lly,nique,na,llie", | |
"lo": "uise,is,la", | |
"he": "len,ather,idi,nrietta,lene,lena", | |
"me": "gan,rcedes,redith,ghan,agan", | |
"wi": "lma,lla,nnie", | |
"ga": "il,yle,briela,brielle,le", | |
"er": "in,ica,ika,ma,nestine", | |
"ce": "cilia,lia,celia,leste,cile", | |
"ka": "tie,y,trina,yla,te", | |
"ol": "ga,ivia,lie,a", | |
"li": "nda,sa,ndsay,ndsey,zzie", | |
"na": "ncy,talie,omi,tasha,dine", | |
"la": "verne,na,donna,ra", | |
"vi": "rginia,vian,ola", | |
"ha": "rriet,nnah", | |
"pe": "ggy,arl,nny,tra", | |
"br": "enda,andi,ooke", | |
"ki": "mberly,m,mberley,rsten", | |
"au": "drey,tumn,dra", | |
"bo": "nnie,bbie,nita,bbi", | |
"do": "nna,lores,lly,minique", | |
"gl": "oria,adys,enda,enna", | |
"tr": "icia,ina,isha,udy", | |
"ta": "ra,nya,sha,bitha", | |
"ro": "sie,xanne,chelle,nda", | |
"am": "y,anda,ber,elia", | |
"fa": "ye,nnie,y", | |
"ni": "cole,na,chole,kki", | |
"ve": "ronica,ra,lma,rna", | |
"gr": "ace,etchen,aciela,acie", | |
"b": "arbara,lanca,arbra,ianca", | |
"r": "uth,ita,honda", | |
"s": "hirley,tephanie,ylvia,heila,uzanne,ue,tella,ophia,ilvia,ophie,tefanie,heena,ummer,elma,ocorro,ybil,imone", | |
"c": "ynthia,rystal,indy,harlene,ristina,leo", | |
"e": "velyn,mily,dna,dith,thel,mma,va,ileen,unice,ula,ssie,ffie,tta,ugenia", | |
"a": "shley,pril,gnes,rlene,imee,bigail,ida,bby,ileen", | |
"t": "heresa,ina,iffany,helma,onya,oni,herese,onia", | |
"i": "rene,da,rma,sabel,nez,ngrid,va,mogene,sabelle", | |
"w": "anda,endy,hitney", | |
"p": "hyllis,riscilla,olly", | |
"n": "orma,ellie,ora,ettie,ell", | |
"f": "lorence,elicia,lora,reda,ern,rieda", | |
"v": "alerie,anessa", | |
"j": "ill,illian", | |
"y": "vonne,olanda,vette", | |
"g": "ina,wendolyn,wen,oldie", | |
"l": "ydia", | |
"m": "yrtle,yra,uriel,yrna", | |
"h": "ilda", | |
"o": "pal,ra,felia", | |
"k": "rystal", | |
"d": "ixie,ina", | |
"u": "rsula" | |
} | |
var ambiguous = [ | |
"casey", | |
"jamie", | |
"lee", | |
"jaime", | |
"jessie", | |
"morgan", | |
"rene", | |
"robin", | |
"devon", | |
"kerry", | |
"alexis", | |
"guadalupe", | |
"blair", | |
"kasey", | |
"jean", | |
"marion", | |
"aubrey", | |
"shelby", | |
"jan", | |
"shea", | |
"jade", | |
"kenyatta", | |
"kelsey", | |
"shay", | |
"lashawn", | |
"trinity", | |
"regan", | |
"jammie", | |
"cassidy", | |
"cheyenne", | |
"reagan", | |
"shiloh", | |
"marlo", | |
"andra", | |
"devan", | |
"rosario", | |
"lee" | |
] | |
var i, arr, i2, l, keys; | |
//add data into the main obj | |
//males | |
keys = Object.keys(male_names) | |
l = keys.length | |
for (i = 0; i < l; i++) { | |
arr = male_names[keys[i]].split(',') | |
for (i2 = 0; i2 < arr.length; i2++) { | |
main[keys[i] + arr[i2]] = "m" | |
} | |
} | |
//females | |
keys = Object.keys(female_names) | |
l = keys.length | |
for (i = 0; i < l; i++) { | |
arr = female_names[keys[i]].split(',') | |
for (i2 = 0; i2 < arr.length; i2++) { | |
main[keys[i] + arr[i2]] = "f" | |
} | |
} | |
//unisex names | |
l = ambiguous.length | |
for (i = 0; i < l; i += 1) { | |
main[ambiguous[i]] = "a" | |
} | |
module.exports = main; | |
// console.log(firstnames['spencer']) | |
// console.log(firstnames['jill']) | |
// console.log(firstnames['sue']) | |
// console.log(firstnames['jan']) | |
// console.log(JSON.stringify(Object.keys(firstnames).length, null, 2)); | |
},{}],7:[function(require,module,exports){ | |
//these are common person titles used in the lexicon and sentence segmentation methods | |
//they are also used to identify that a noun is a person | |
var main = [ | |
//honourifics | |
"jr", | |
"mr", | |
"mrs", | |
"ms", | |
"dr", | |
"prof", | |
"sr", | |
"sen", | |
"corp", | |
"rep", | |
"gov", | |
"atty", | |
"supt", | |
"det", | |
"rev", | |
"col", | |
"gen", | |
"lt", | |
"cmdr", | |
"adm", | |
"capt", | |
"sgt", | |
"cpl", | |
"maj", | |
"miss", | |
"misses", | |
"mister", | |
"sir", | |
"esq", | |
"mstr", | |
"phd", | |
"adj", | |
"adv", | |
"asst", | |
"bldg", | |
"brig", | |
"comdr", | |
"hon", | |
"messrs", | |
"mlle", | |
"mme", | |
"op", | |
"ord", | |
"pvt", | |
"reps", | |
"res", | |
"sens", | |
"sfc", | |
"surg", | |
] | |
module.exports = main; | |
},{}],8:[function(require,module,exports){ | |
//nouns with irregular plural/singular forms | |
//used in noun.inflect, and also in the lexicon. | |
//compressed with '_' to reduce some redundancy. | |
var main=[ | |
["child", "_ren"], | |
["person", "people"], | |
["leaf", "leaves"], | |
["database", "_s"], | |
["quiz", "_zes"], | |
["child", "_ren"], | |
["stomach", "_s"], | |
["sex", "_es"], | |
["move", "_s"], | |
["shoe", "_s"], | |
["goose", "geese"], | |
["phenomenon", "phenomena"], | |
["barracks", "_"], | |
["deer", "_"], | |
["syllabus", "syllabi"], | |
["index", "indices"], | |
["appendix", "appendices"], | |
["criterion", "criteria"], | |
["man", "men"], | |
["sex", "_es"], | |
["rodeo", "_s"], | |
["epoch", "_s"], | |
["zero", "_s"], | |
["avocado", "_s"], | |
["halo", "_s"], | |
["tornado", "_s"], | |
["tuxedo", "_s"], | |
["sombrero", "_s"], | |
["addendum", "addenda"], | |
["alga", "_e"], | |
["alumna", "_e"], | |
["alumnus", "alumni"], | |
["bacillus", "bacilli"], | |
["cactus", "cacti"], | |
["beau", "_x"], | |
["château", "_x"], | |
["chateau", "_x"], | |
["tableau", "_x"], | |
["corpus", "corpora"], | |
["curriculum", "curricula"], | |
["echo", "_es"], | |
["embargo", "_es"], | |
["foot", "feet"], | |
["genus", "genera"], | |
["hippopotamus", "hippopotami"], | |
["larva", "_e"], | |
["libretto", "libretti"], | |
["loaf", "loaves"], | |
["matrix", "matrices"], | |
["memorandum", "memoranda"], | |
["mosquito", "_es"], | |
["opus", "opera"], | |
["ovum", "ova"], | |
["ox", "_en"], | |
["radius", "radii"], | |
["referendum", "referenda"], | |
["thief", "thieves"], | |
["tooth", "teeth"] | |
] | |
main = main.map(function (a) { | |
a[1] = a[1].replace('_', a[0]) | |
return a | |
}) | |
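//the map above expands the '_' shorthand, e.g.
//["child", "_ren"] -> ["child", "children"], and ["deer", "_"] -> ["deer", "deer"]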
module.exports = main; | |
},{}],9:[function(require,module,exports){ | |
//common terms that are multi-word, but one part-of-speech | |
//these should not include phrasal verbs, like 'looked out'. These are handled elsewhere. | |
module.exports = { | |
"of course": "RB", | |
"at least": "RB", | |
"no longer": "RB", | |
"sort of": "RB", | |
"at first": "RB", | |
"once again": "RB", | |
"once more": "RB", | |
"up to": "RB", | |
"by now": "RB", | |
"all but": "RB", | |
"just about": "RB", | |
"on board": "JJ", | |
"a lot": "RB", | |
"by far": "RB", | |
"at best": "RB", | |
"at large": "RB", | |
"for good": "RB", | |
"vice versa": "JJ", | |
"en route": "JJ", | |
"for sure": "RB", | |
"upside down": "JJ", | |
"at most": "RB", | |
"per se": "RB", | |
"at worst": "RB", | |
"upwards of": "RB", | |
"en masse": "RB", | |
"point blank": "RB", | |
"up front": "JJ", | |
"in situ": "JJ", | |
"in vitro": "JJ", | |
"ad hoc": "JJ", | |
"de facto": "JJ", | |
"ad infinitum": "JJ", | |
"ad nauseam": "RB", | |
"for keeps": "JJ", | |
"a priori": "FW", | |
"et cetera": "FW", | |
"off guard": "JJ", | |
"spot on": "JJ", | |
"ipso facto": "JJ", | |
"not withstanding": "RB", | |
"de jure": "RB", | |
"a la": "IN", | |
"ad hominem": "NN", | |
"par excellence": "RB", | |
"de trop": "RB", | |
"a posteriori": "RB", | |
"fed up": "JJ", | |
"brand new": "JJ", | |
"old fashioned": "JJ", | |
"bona fide": "JJ", | |
"well off": "JJ", | |
"far off": "JJ", | |
"straight forward": "JJ", | |
"hard up": "JJ", | |
"sui generis": "JJ", | |
"en suite": "JJ", | |
"avant garde": "JJ", | |
"sans serif": "JJ", | |
"gung ho": "JJ", | |
"super duper": "JJ", | |
"new york":"NN", | |
"new england":"NN", | |
"new hampshire":"NN", | |
"new delhi":"NN", | |
"new jersey":"NN", | |
"new mexico":"NN", | |
"united states":"NN", | |
"united kingdom":"NN", | |
"great britain":"NN", | |
"head start":"NN" | |
} | |
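//presumably the tagger matches two-word windows against this table,
//so "of course" becomes a single RB term rather than IN + NN.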
},{}],10:[function(require,module,exports){ | |
//phrasal verbs are two words that really mean one verb. | |
//'beef up' is one verb, and not some direction of beefing. | |
//by @spencermountain, 2015 mit | |
//many credits to http://www.allmyphrasalverbs.com/ | |
var verb_conjugate = require("../../parents/verb/conjugate/conjugate") | |
//start the list with some randoms | |
var main = [ | |
"be onto", | |
"fall behind", | |
"fall through", | |
"fool with", | |
"get across", | |
"get along", | |
"get at", | |
"give way", | |
"hear from", | |
"hear of", | |
"lash into", | |
"make do", | |
"run across", | |
"set upon", | |
"take aback", | |
"keep from" | |
] | |
//if there's a phrasal verb "keep on", there's often a "keep off" | |
var opposites = { | |
"away": "back", | |
"in": "out", | |
"on": "off", | |
"over": "under", | |
"together": "apart", | |
"up": "down" | |
} | |
//forms that have in/out symmetry | |
var symmetric = { | |
"away": "blow,bounce,bring,call,come,cut,drop,fire,get,give,go,keep,pass,put,run,send,shoot,switch,take,tie,throw", | |
"in": "bang,barge,bash,beat,block,book,box,break,bring,burn,butt,carve,cash,check,come,cross,drop,fall,fence,fill,give,grow,hand,hang,head,jack,keep,leave,let,lock,log,move,opt,pack,peel,pull,put,rain,reach,ring,rub,send,set,settle,shut,sign,smash,snow,strike,take,try,turn,type,warm,wave,wean,wear,wheel", | |
"on": "add,call,carry,catch,count,feed,get,give,go,grind,head,hold,keep,lay,log,pass,pop,power,put,send,show,snap,switch,take,tell,try,turn,wait", | |
"over": "come,go,look,read,run,talk", | |
"together": "come,pull,put", | |
"up": "add,back,beat,bend,blow,boil,bottle,break,bring,buckle,bundle,call,carve,clean,cut,dress,fill,flag,fold,get,give,grind,grow,hang,hold,keep,let,load,lock,look,man,mark,melt,move,pack,pin,pipe,plump,pop,power,pull,put,rub,scale,scrape,send,set,settle,shake,show,sit,slow,smash,square,stand,strike,take,tear,tie,turn,use,wash,wind", | |
} | |
Object.keys(symmetric).forEach(function (k) { | |
symmetric[k].split(',').forEach(function (s) { | |
//add the given form | |
main.push(s + " " + k) | |
//add its opposite form | |
main.push(s + " " + opposites[k]) | |
}) | |
}) | |
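//e.g. symmetric["on"] contains "keep", so both "keep on" and "keep off" are added.
//likewise "check in" also generates "check out".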
//forms that don't have in/out symmetry | |
var asymmetric = { | |
"about": "bring,fool,gad,go,root", | |
"after": "go,look,take", | |
"ahead": "get,go,press", | |
"along": "bring,move", | |
"apart": "fall,take", | |
"around": "ask,boss,bring,call,come,fool,get,horse,joke,lie,mess,play", | |
"away": "back,carry,file,frighten,hide,wash", | |
"back": "fall,fight,hit,hold,look,pay,stand,think", | |
"by": "drop,get,go,stop,swear,swing,tick,zip", | |
"down": "bog,calm,fall,hand,hunker,jot,knock,lie,narrow,note,pat,pour,run,tone,trickle,wear", | |
"for": "fend,file,gun,hanker,root,shoot", | |
"forth": "bring,come", | |
"forward": "come,look", | |
"in": "cave,chip,hone,jump,key,pencil,plug,rein,shade,sleep,stop,suck,tie,trade,tuck,usher,weigh,zero", | |
"into": "look,run", | |
"it": "go,have", | |
"off": "auction,be,beat,blast,block,brush,burn,buzz,cast,cool,drop,end,face,fall,fend,frighten,goof,jack,kick,knock,laugh,level,live,make,mouth,nod,pair,pay,peel,read,reel,ring,rip,round,sail,shave,shoot,sleep,slice,split,square,stave,stop,storm,strike,tear,tee,tick,tip,top,walk,work,write", | |
"on": "bank,bargain,egg,frown,hit,latch,pile,prattle,press,spring,spur,tack,urge,yammer", | |
"out": "act,ask,back,bail,bear,black,blank,bleed,blow,blurt,branch,buy,cancel,cut,eat,edge,farm,figure,find,fill,find,fish,fizzle,flake,flame,flare,flesh,flip,geek,get,help,hide,hold,iron,knock,lash,level,listen,lose,luck,make,max,miss,nerd,pan,pass,pick,pig,point,print,psych,rat,read,rent,root,rule,run,scout,see,sell,shout,single,sit,smoke,sort,spell,splash,stamp,start,storm,straighten,suss,time,tire,top,trip,trot,wash,watch,weird,whip,wimp,wipe,work,zone,zonk", | |
"over": "bend,bubble,do,fall,get,gloss,hold,keel,mull,pore,sleep,spill,think,tide,tip", | |
"round": "get,go", | |
"through": "go,run", | |
"to": "keep,see", | |
"up": "act,beef,board,bone,boot,brighten,build,buy,catch,cheer,cook,end,eye,face,fatten,feel,fess,finish,fire,firm,flame,flare,free,freeze,freshen,fry,fuel,gang,gear,goof,hack,ham,heat,hit,hole,hush,jazz,juice,lap,light,lighten,line,link,listen,live,loosen,make,mash,measure,mess,mix,mock,mop,muddle,open,own,pair,patch,pick,prop,psych,read,rough,rustle,save,shack,sign,size,slice,slip,snap,sober,spark,split,spruce,stack,start,stay,stir,stitch,straighten,string,suck,suit,sum,team,tee,think,tidy,tighten,toss,trade,trip,type,vacuum,wait,wake,warm,weigh,whip,wire,wise,word,write,zip", | |
} | |
Object.keys(asymmetric).forEach(function (k) { | |
asymmetric[k].split(',').forEach(function (s) { | |
main.push(s + " " + k) | |
}) | |
}) | |
//at this point all verbs are infinitive. let's make this explicit.
main = main.reduce(function (h, s) { | |
h[s] = "VBP" | |
return h | |
}, {}) | |
//conjugate every phrasal verb. takes ~30ms | |
var tags = { | |
present: "VB", | |
past: "VBD", | |
future: "VBF", | |
gerund: "VBG", | |
infinitive: "VBP", | |
} | |
var cache = {} //cache individual verbs to speed it up | |
var split, verb, particle, phrasal; | |
Object.keys(main).forEach(function (s) { | |
split = s.split(' ') | |
verb = split[0] | |
particle = split[1] | |
if (cache[verb] === undefined) { | |
cache[verb] = verb_conjugate(verb) | |
} | |
Object.keys(cache[verb]).forEach(function (k) { | |
phrasal = cache[verb][k] + " " + particle | |
main[phrasal] = tags[k] | |
}) | |
}) | |
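//a rough illustration (not from the source): "fool with" expands to
//"fools with" (VB), "fooled with" (VBD), "fooling with" (VBG), and so on.
//cf. the test comment near the lexicon export: lexicon["look after"]==="VBP"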
module.exports = main; | |
// console.log(JSON.stringify(phrasal_verbs, null, 2)) | |
},{"../../parents/verb/conjugate/conjugate":39}],11:[function(require,module,exports){ | |
//common nouns that have no plural form. These are surprisingly rare
//used in noun.inflect(), and added as nouns in lexicon | |
module.exports=[ | |
"aircraft", | |
"bass", | |
"bison", | |
"fowl", | |
"halibut", | |
"moose", | |
"salmon", | |
"spacecraft", | |
"tuna", | |
"trout", | |
"advice", | |
"information", | |
"knowledge", | |
"trouble", | |
"enjoyment", | |
"fun", | |
"recreation", | |
"relaxation", | |
"meat", | |
"rice", | |
"bread", | |
"cake", | |
"coffee", | |
"ice", | |
"water", | |
"oil", | |
"grass", | |
"hair", | |
"fruit", | |
"wildlife", | |
"equipment", | |
"machinery", | |
"furniture", | |
"mail", | |
"luggage", | |
"jewelry", | |
"clothing", | |
"money", | |
"mathematics", | |
"economics", | |
"physics", | |
"civics", | |
"ethics", | |
"gymnastics", | |
"mumps", | |
"measles", | |
"news", | |
"tennis", | |
"baggage", | |
"currency", | |
"soap", | |
"toothpaste", | |
"food", | |
"sugar", | |
"butter", | |
"flour", | |
"research", | |
"leather", | |
"wool", | |
"wood", | |
"coal", | |
"weather", | |
"homework", | |
"cotton", | |
"silk", | |
"patience", | |
"impatience", | |
"vinegar", | |
"art", | |
"beef", | |
"blood", | |
"cash", | |
"chaos", | |
"cheese", | |
"chewing", | |
"conduct", | |
"confusion", | |
"education", | |
"electricity", | |
"entertainment", | |
"fiction", | |
"forgiveness", | |
"gold", | |
"gossip", | |
"ground", | |
"happiness", | |
"history", | |
"honey", | |
"hospitality", | |
"importance", | |
"justice", | |
"laughter", | |
"leisure", | |
"lightning", | |
"literature", | |
"luck", | |
"melancholy", | |
"milk", | |
"mist", | |
"music", | |
"noise", | |
"oxygen", | |
"paper", | |
"pay", | |
"peace", | |
"peanut", | |
"pepper", | |
"petrol", | |
"plastic", | |
"pork", | |
"power", | |
"pressure", | |
"rain", | |
"recognition", | |
"sadness", | |
"safety", | |
"salt", | |
"sand", | |
"scenery", | |
"shopping", | |
"silver", | |
"snow", | |
"softness", | |
"space", | |
"speed", | |
"steam", | |
"sunshine", | |
"tea", | |
"thunder", | |
"time", | |
"traffic", | |
"trousers", | |
"violence", | |
"warmth", | |
"wine", | |
"steel", | |
"soccer", | |
"hockey", | |
"golf", | |
"fish", | |
"gum", | |
"liquid", | |
"series", | |
"sheep", | |
"species", | |
"fahrenheit", | |
"celcius", | |
"kelvin", | |
"hertz" | |
] | |
},{}],12:[function(require,module,exports){ | |
//terms that are "CD", a 'value' term | |
module.exports = [ | |
//numbers | |
'zero', | |
'one', | |
'two', | |
'three', | |
'four', | |
'five', | |
'six', | |
'seven', | |
'eight', | |
'nine', | |
'ten', | |
'eleven', | |
'twelve', | |
'thirteen', | |
'fourteen', | |
'fifteen', | |
'sixteen', | |
'seventeen', | |
'eighteen', | |
'nineteen', | |
'twenty', | |
'thirty', | |
'forty', | |
'fifty', | |
'sixty', | |
'seventy', | |
'eighty', | |
'ninety', | |
'hundred', | |
'thousand', | |
'million', | |
'billion', | |
'trillion', | |
'quadrillion', | |
'quintillion', | |
'sextillion', | |
'septillion', | |
'octillion', | |
'nonillion', | |
'decillion', | |
//months | |
"january", | |
"february", | |
// "march", | |
"april", | |
// "may", | |
"june", | |
"july", | |
"august", | |
"september", | |
"october", | |
"november", | |
"december", | |
"jan", "feb", "mar", "apr", "jun", "jul", "aug", "sep", "oct", "nov", "dec", "sept", "sep", | |
//days | |
"monday", | |
"tuesday", | |
"wednesday", | |
"thursday", | |
"friday", | |
"saturday", | |
"sunday" | |
].reduce(function (h, s) { | |
h[s] = "CD" | |
return h | |
}, {}) | |
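//the reduce turns this array into a lookup object, e.g. { "seven": "CD", "july": "CD", ... }
//cf. the lexicon test comments above.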
},{}],13:[function(require,module,exports){ | |
//most-frequent non-irregular verbs, to be conjugated for the lexicon | |
//this list is the seed, from which various forms are conjugated | |
module.exports = [ | |
"collapse", | |
"stake", | |
"forsee", | |
"suck", | |
"answer", | |
"argue", | |
"tend", | |
"examine", | |
"depend", | |
"form", | |
"figure", | |
"mind", | |
"surround", | |
"suspect", | |
"reflect", | |
"wonder", | |
"hope", | |
"end", | |
"thank", | |
"file", | |
"regard", | |
"report", | |
"imagine", | |
"consider", | |
"ensure", | |
"cause", | |
"work", | |
"enter", | |
"stop", | |
"defeat", | |
"surge", | |
"launch", | |
"turn", | |
"like", | |
"control", | |
"relate", | |
"remember", | |
"join", | |
"listen", | |
"train", | |
"spring", | |
"enjoy", | |
"fail", | |
"recognize", | |
"obtain", | |
"learn", | |
"fill", | |
"announce", | |
"prevent", | |
"achieve", | |
"realize", | |
"involve", | |
"remove", | |
"aid", | |
"visit", | |
"test", | |
"prepare", | |
"ask", | |
"carry", | |
"suppose", | |
"determine", | |
"raise", | |
"love", | |
"use", | |
"pull", | |
"improve", | |
"contain", | |
"offer", | |
"talk", | |
"pick", | |
"care", | |
"express", | |
"remain", | |
"operate", | |
"close", | |
"add", | |
"mention", | |
"support", | |
"decide", | |
"walk", | |
"vary", | |
"demand", | |
"describe", | |
"agree", | |
"happen", | |
"allow", | |
"suffer", | |
"study", | |
"press", | |
"watch", | |
"seem", | |
"occur", | |
"contribute", | |
"claim", | |
"compare", | |
"apply", | |
"direct", | |
"discuss", | |
"indicate", | |
"require", | |
"change", | |
"fix", | |
"reach", | |
"prove", | |
"expect", | |
"exist", | |
"play", | |
"permit", | |
"kill", | |
"charge", | |
"increase", | |
"believe", | |
"create", | |
"continue", | |
"live", | |
"help", | |
"represent", | |
"edit", | |
"serve", | |
"appear", | |
"cover", | |
"maintain", | |
"start", | |
"stay", | |
"move", | |
"extend", | |
"design", | |
"supply", | |
"suggest", | |
"want", | |
"approach", | |
"call", | |
"include", | |
"try", | |
"receive", | |
"save", | |
"discover", | |
"marry", | |
"need", | |
"establish", | |
"keep", | |
"assume", | |
"attend", | |
"unite", | |
"explain", | |
"publish", | |
"accept", | |
"settle", | |
"reduce", | |
"do", | |
"look", | |
"interact", | |
"concern", | |
"labor", | |
"return", | |
"select", | |
"die", | |
"provide", | |
"seek", | |
"wish", | |
"finish", | |
"follow", | |
"disagree", | |
"produce", | |
"attack", | |
"attempt", | |
"brake", | |
"brush", | |
"burn", | |
"bang", | |
"bomb", | |
"budget", | |
"comfort", | |
"cook", | |
"copy", | |
"cough", | |
"crush", | |
"cry", | |
"check", | |
"claw", | |
"clip", | |
"combine", | |
"damage", | |
"desire", | |
"doubt", | |
"drain", | |
"dance", | |
"decrease", | |
"defect", | |
"deposit", | |
"drift", | |
"dip", | |
"dive", | |
"divorce", | |
"dream", | |
"exchange", | |
"envy", | |
"exert", | |
"exercise", | |
"export", | |
"fold", | |
"flood", | |
"focus", | |
"forecast", | |
"fracture", | |
"grip", | |
"guide", | |
"guard", | |
"guarantee", | |
"guess", | |
"hate", | |
"heat", | |
"handle", | |
"hire", | |
"host", | |
"hunt", | |
"hurry", | |
"import", | |
"judge", | |
"jump", | |
"jam", | |
"kick", | |
"kiss", | |
"knock", | |
"laugh", | |
"lift", | |
"lock", | |
"lecture", | |
"link", | |
"load", | |
"loan", | |
"lump", | |
"melt", | |
"message", | |
"murder", | |
"neglect", | |
"overlap", | |
"overtake", | |
"overuse", | |
"print", | |
"protest", | |
"pump", | |
"push", | |
"post", | |
"progress", | |
"promise", | |
"purchase", | |
"regret", | |
"request", | |
"reward", | |
"roll", | |
"rub", | |
"rent", | |
"repair", | |
"sail", | |
"scale", | |
"screw", | |
"shock", | |
"sleep", | |
"slip", | |
"smash", | |
"smell", | |
"smoke", | |
"sneeze", | |
"snow", | |
"surprise", | |
"scratch", | |
"search", | |
"share", | |
"shave", | |
"spit", | |
"splash", | |
"stain", | |
"stress", | |
"switch", | |
"taste", | |
"touch", | |
"trade", | |
"trick", | |
"twist", | |
"trap", | |
"travel", | |
"tune", | |
"undergo", | |
"undo", | |
"uplift", | |
"vote", | |
"wash", | |
"wave", | |
"whistle", | |
"wreck", | |
"yawn", | |
"betray", | |
"restrict", | |
"perform", | |
"worry", | |
"point", | |
"activate", | |
"fear", | |
"plan", | |
"note", | |
"face", | |
"predict", | |
"differ", | |
"deserve", | |
"torture", | |
"recall", | |
"count", | |
"admit", | |
"insist", | |
"lack", | |
"pass", | |
"belong", | |
"complain", | |
"constitute", | |
"rely", | |
"refuse", | |
"range", | |
"cite", | |
"flash", | |
"arrive", | |
"reveal", | |
"consist", | |
"observe", | |
"notice", | |
"trust", | |
"display", | |
"view", | |
"stare", | |
"acknowledge", | |
"owe", | |
"gaze", | |
"treat", | |
"account", | |
"gather", | |
"address", | |
"confirm", | |
"estimate", | |
"manage", | |
"participate", | |
"sneak", | |
"drop", | |
"mirror", | |
"experience", | |
"strive", | |
"arch", | |
"dislike", | |
"favor", | |
"earn", | |
"emphasize", | |
"match", | |
"question", | |
"emerge", | |
"encourage", | |
"matter", | |
"name", | |
"head", | |
"line", | |
"slam", | |
"list", | |
"warn", | |
"ignore", | |
"resemble", | |
"feature", | |
"place", | |
"reverse", | |
"accuse", | |
"spoil", | |
"retain", | |
"survive", | |
"praise", | |
"function", | |
"please", | |
"date", | |
"remind", | |
"deliver", | |
"echo", | |
"engage", | |
"deny", | |
"yield", | |
"center", | |
"gain", | |
"anticipate", | |
"reason", | |
"side", | |
"thrive", | |
"defy", | |
"dodge", | |
"enable", | |
"applaud", | |
"bear", | |
"persist", | |
"pose", | |
"reject", | |
"attract", | |
"await", | |
"inhibit", | |
"declare", | |
"process", | |
"risk", | |
"urge", | |
"value", | |
"block", | |
"confront", | |
"credit", | |
"cross", | |
"amuse", | |
"dare", | |
"resent", | |
"smile", | |
"gloss", | |
"threaten", | |
"collect", | |
"depict", | |
"dismiss", | |
"submit", | |
"benefit", | |
"step", | |
"deem", | |
"limit", | |
"sense", | |
"issue", | |
"embody", | |
"force", | |
"govern", | |
"replace", | |
"bother", | |
"cater", | |
"adopt", | |
"empower", | |
"outweigh", | |
"alter", | |
"enrich", | |
"influence", | |
"prohibit", | |
"pursue", | |
"warrant", | |
"convey", | |
"approve", | |
"reserve", | |
"rest", | |
"strain", | |
"wander", | |
"adjust", | |
"dress", | |
"market", | |
"mingle", | |
"disapprove", | |
"evaluate", | |
"flow", | |
"inhabit", | |
"pop", | |
"rule", | |
"depart", | |
"roam", | |
"assert", | |
"disappear", | |
"envision", | |
"pause", | |
"afford", | |
"challenge", | |
"grab", | |
"grumble", | |
"house", | |
"portray", | |
"revel", | |
"base", | |
"conduct", | |
"review", | |
"stem", | |
"crave", | |
"mark", | |
"store", | |
"target", | |
"unlock", | |
"weigh", | |
"resist", | |
"drag", | |
"pour", | |
"reckon", | |
"assign", | |
"cling", | |
"rank", | |
"attach", | |
"decline", | |
"destroy", | |
"interfere", | |
"paint", | |
"skip", | |
"sprinkle", | |
"wither", | |
"allege", | |
"retire", | |
"score", | |
"monitor", | |
"expand", | |
"honor", | |
"pack", | |
"assist", | |
"float", | |
"appeal", | |
"stretch", | |
"undermine", | |
"assemble", | |
"boast", | |
"bounce", | |
"grasp", | |
"install", | |
"borrow", | |
"crack", | |
"elect", | |
"shout", | |
"contrast", | |
"overcome", | |
"relax", | |
"relent", | |
"strengthen", | |
"conform", | |
"dump", | |
"pile", | |
"scare", | |
"relive", | |
"resort", | |
"rush", | |
"boost", | |
"cease", | |
"command", | |
"excel", | |
"plug", | |
"plunge", | |
"proclaim", | |
"discourage", | |
"endure", | |
"ruin", | |
"stumble", | |
"abandon", | |
"cheat", | |
"convince", | |
"merge", | |
"convert", | |
"harm", | |
"multiply", | |
"overwhelm", | |
"chew", | |
"invent", | |
"bury", | |
"wipe", | |
"added", | |
"took", | |
"define", | |
"goes", | |
"measure", | |
"enhance", | |
"distinguish", | |
"avoid", | |
//contractions | |
"don't", | |
"won't", | |
"what's" //somewhat ambiguous (what does|what are) | |
] | |
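//illustration (not from the source): each seed verb is conjugated into the lexicon,
//e.g. "sleep" yields sleep (VBP), slept (VBD), sleeping (VBG).
//see the test comments near the lexicon export.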
},{}],14:[function(require,module,exports){ | |
//the parts of speech used by this library. mostly standard, but some changes. | |
module.exports = { | |
//verbs | |
"VB": { | |
"name": "verb, generic", | |
"parent": "verb", | |
"tag": "VB" | |
}, | |
"VBD": { | |
"name": "past-tense verb", | |
"parent": "verb", | |
"tense": "past", | |
"tag": "VBD" | |
}, | |
"VBN": { | |
"name": "past-participle verb", | |
"parent": "verb", | |
"tense": "past", | |
"tag": "VBN" | |
}, | |
"VBP": { | |
"name": "infinitive verb", | |
"parent": "verb", | |
"tense": "present", | |
"tag": "VBP" | |
}, | |
"VBF": { | |
"name": "future-tense verb", | |
"parent": "verb", | |
"tense": "future", | |
"tag": "VBF" | |
}, | |
"VBZ": { | |
"name": "present-tense verb", | |
"tense": "present", | |
"parent": "verb", | |
"tag": "VBZ" | |
}, | |
"CP": { | |
"name": "copula", | |
"parent": "verb", | |
"tag": "CP" | |
}, | |
"VBG": { | |
"name": "gerund verb", | |
"parent": "verb", | |
"tag": "VBG" | |
}, | |
//adjectives | |
"JJ": { | |
"name": "adjective, generic", | |
"parent": "adjective", | |
"tag": "JJ" | |
}, | |
"JJR": { | |
"name": "comparative adjective", | |
"parent": "adjective", | |
"tag": "JJR" | |
}, | |
"JJS": { | |
"name": "superlative adjective", | |
"parent": "adjective", | |
"tag": "JJS" | |
}, | |
//adverbs | |
"RB": { | |
"name": "adverb", | |
"parent": "adverb", | |
"tag": "RB" | |
}, | |
"RBR": { | |
"name": "comparative adverb", | |
"parent": "adverb", | |
"tag": "RBR" | |
}, | |
"RBS": { | |
"name": "superlative adverb", | |
"parent": "adverb", | |
"tag": "RBS" | |
}, | |
//nouns | |
"NN": { | |
"name": "noun, generic", | |
"parent": "noun", | |
"tag": "NN" | |
}, | |
"NNP": { | |
"name": "singular proper noun", | |
"parent": "noun", | |
"tag": "NNP" | |
}, | |
"NNA": { | |
"name": "noun, active", | |
"parent": "noun", | |
"tag": "NNA" | |
}, | |
"NNPA": { | |
"name": "noun, acronym", | |
"parent": "noun", | |
"tag": "NNPA" | |
}, | |
"NNPS": { | |
"name": "plural proper noun", | |
"parent": "noun", | |
"tag": "NNPS" | |
}, | |
"NNAB": { | |
"name": "noun, abbreviation", | |
"parent": "noun", | |
"tag": "NNAB" | |
}, | |
"NNS": { | |
"name": "plural noun", | |
"parent": "noun", | |
"tag": "NNS" | |
}, | |
"NNO": { | |
"name": "possessive noun", | |
"parent": "noun", | |
"tag": "NNO" | |
}, | |
"NNG": { | |
"name": "gerund noun", | |
"parent": "noun", | |
"tag": "VBG" | |
}, | |
"PP": { | |
"name": "possessive pronoun", | |
"parent": "noun", | |
"tag": "PP" | |
}, | |
//glue | |
"FW": { | |
"name": "foreign word", | |
"parent": "glue", | |
"tag": "FW" | |
}, | |
"CD": { | |
"name": "cardinal value, generic", | |
"parent": "value", | |
"tag": "CD" | |
}, | |
"DA": { | |
"name": "date", | |
"parent": "value", | |
"tag": "DA" | |
}, | |
"NU": { | |
"name": "number", | |
"parent": "value", | |
"tag": "NU" | |
}, | |
"IN": { | |
"name": "preposition", | |
"parent": "glue", | |
"tag": "IN" | |
}, | |
"MD": { | |
"name": "modal verb", | |
"parent": "verb", //dunno | |
"tag": "MD" | |
}, | |
"CC": { | |
"name": "co-ordating conjunction", | |
"parent": "glue", | |
"tag": "CC" | |
}, | |
"PRP": { | |
"name": "personal pronoun", | |
"parent": "noun", | |
"tag": "PRP" | |
}, | |
"DT": { | |
"name": "determiner", | |
"parent": "glue", | |
"tag": "DT" | |
}, | |
"UH": { | |
"name": "interjection", | |
"parent": "glue", | |
"tag": "UH" | |
}, | |
"EX": { | |
"name": "existential there", | |
"parent": "glue", | |
"tag": "EX" | |
} | |
} | |
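//e.g. exports["VBD"].parent === "verb", so a specific tag can be rolled up
//into its coarse category (verb, noun, adjective, adverb, value, glue).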
},{}],15:[function(require,module,exports){ | |
// word suffixes with a high pos signal, generated with wordnet | |
//by spencer kelly spencermountain@gmail.com 2014 | |
var data = { | |
"NN": [ | |
"ceae", | |
"inae", | |
"idae", | |
"leaf", | |
"rgan", | |
"eman", | |
"sman", | |
"star", | |
"boat", | |
"tube", | |
"rica", | |
"tica", | |
"nica", | |
"auce", | |
"tics", | |
"ency", | |
"ancy", | |
"poda", | |
"tude", | |
"xide", | |
"body", | |
"weed", | |
"tree", | |
"rrel", | |
"stem", | |
"cher", | |
"icer", | |
"erer", | |
"ader", | |
"ncer", | |
"izer", | |
"ayer", | |
"nner", | |
"ates", | |
"ales", | |
"ides", | |
"rmes", | |
"etes", | |
"llet", | |
"uage", | |
"ings", | |
"aphy", | |
"chid", | |
"tein", | |
"vein", | |
"hair", | |
"tris", | |
"unit", | |
"cake", | |
"nake", | |
"illa", | |
"ella", | |
"icle", | |
"ille", | |
"etle", | |
"scle", | |
"cell", | |
"bell", | |
"bill", | |
"palm", | |
"toma", | |
"game", | |
"lamp", | |
"bone", | |
"mann", | |
"ment", | |
"wood", | |
"book", | |
"nson", | |
"agon", | |
"odon", | |
"dron", | |
"iron", | |
"tion", | |
"itor", | |
"ator", | |
"root", | |
"cope", | |
"tera", | |
"hora", | |
"lora", | |
"bird", | |
"worm", | |
"fern", | |
"horn", | |
"wort", | |
"ourt", | |
"stry", | |
"etry", | |
"bush", | |
"ness", | |
"gist", | |
"rata", | |
"lata", | |
"tata", | |
"moth", | |
"lity", | |
"nity", | |
"sity", | |
"rity", | |
"city", | |
"dity", | |
"vity", | |
"drug", | |
"dium", | |
"llum", | |
"trum", | |
"inum", | |
"lium", | |
"tium", | |
"atum", | |
"rium", | |
"icum", | |
"anum", | |
"nium", | |
"orum", | |
"icus", | |
"opus", | |
"chus", | |
"ngus", | |
"thus", | |
"rius", | |
"rpus" | |
], | |
"JJ": [ | |
"liac", | |
"siac", | |
"clad", | |
"deaf", | |
"xial", | |
"hial", | |
"chal", | |
"rpal", | |
"asal", | |
"rial", | |
"teal", | |
"oeal", | |
"vial", | |
"phal", | |
"sial", | |
"heal", | |
"rbal", | |
"neal", | |
"geal", | |
"dial", | |
"eval", | |
"bial", | |
"ugal", | |
"kian", | |
"izan", | |
"rtan", | |
"odan", | |
"llan", | |
"zian", | |
"eian", | |
"eyan", | |
"ndan", | |
"eban", | |
"near", | |
"unar", | |
"lear", | |
"liar", | |
"-day", | |
"-way", | |
"tech", | |
"sick", | |
"tuck", | |
"inct", | |
"unct", | |
"wide", | |
"endo", | |
"uddy", | |
"eedy", | |
"uted", | |
"aled", | |
"rred", | |
"oned", | |
"rted", | |
"obed", | |
"oped", | |
"ched", | |
"dded", | |
"cted", | |
"tied", | |
"eked", | |
"ayed", | |
"rked", | |
"teed", | |
"mmed", | |
"tred", | |
"awed", | |
"rbed", | |
"bbed", | |
"axed", | |
"bred", | |
"pied", | |
"cked", | |
"rced", | |
"ened", | |
"fied", | |
"lved", | |
"mned", | |
"kled", | |
"hted", | |
"lied", | |
"eted", | |
"rded", | |
"lued", | |
"rved", | |
"azed", | |
"oked", | |
"ghed", | |
"sked", | |
"emed", | |
"aded", | |
"ived", | |
"mbed", | |
"pted", | |
"zled", | |
"ored", | |
"pled", | |
"wned", | |
"afed", | |
"nied", | |
"aked", | |
"gued", | |
"oded", | |
"oved", | |
"oled", | |
"ymed", | |
"lled", | |
"bled", | |
"cled", | |
"eded", | |
"toed", | |
"ited", | |
"oyed", | |
"eyed", | |
"ured", | |
"omed", | |
"ixed", | |
"pped", | |
"ined", | |
"lted", | |
"iced", | |
"exed", | |
"nded", | |
"amed", | |
"owed", | |
"dged", | |
"nted", | |
"eged", | |
"nned", | |
"used", | |
"ibed", | |
"nced", | |
"umed", | |
"dled", | |
"died", | |
"rged", | |
"aped", | |
"oted", | |
"uled", | |
"ided", | |
"nked", | |
"aved", | |
"rled", | |
"rned", | |
"aned", | |
"rmed", | |
"lmed", | |
"aged", | |
"ized", | |
"eved", | |
"ofed", | |
"thed", | |
"ered", | |
"ared", | |
"ated", | |
"eled", | |
"sted", | |
"ewed", | |
"nsed", | |
"nged", | |
"lded", | |
"gged", | |
"osed", | |
"fled", | |
"shed", | |
"aced", | |
"ffed", | |
"tted", | |
"uced", | |
"iled", | |
"uded", | |
"ired", | |
"yzed", | |
"-fed", | |
"mped", | |
"iked", | |
"fted", | |
"imed", | |
"hree", | |
"llel", | |
"aten", | |
"lden", | |
"nken", | |
"apen", | |
"ozen", | |
"ober", | |
"-set", | |
"nvex", | |
"osey", | |
"laid", | |
"paid", | |
"xvii", | |
"xxii", | |
"-air", | |
"tair", | |
"icit", | |
"knit", | |
"nlit", | |
"xxiv", | |
"-six", | |
"-old", | |
"held", | |
"cile", | |
"ible", | |
"able", | |
"gile", | |
"full", | |
"-ply", | |
"bbly", | |
"ggly", | |
"zzly", | |
"-one", | |
"mane", | |
"mune", | |
"rung", | |
"uing", | |
"mant", | |
"yant", | |
"uant", | |
"pant", | |
"urnt", | |
"awny", | |
"eeny", | |
"ainy", | |
"orny", | |
"siny", | |
"tood", | |
"shod", | |
"-toe", | |
"d-on", | |
"-top", | |
"-for", | |
"odox", | |
"wept", | |
"eepy", | |
"oopy", | |
"hird", | |
"dern", | |
"worn", | |
"mart", | |
"ltry", | |
"oury", | |
"ngry", | |
"arse", | |
"bose", | |
"cose", | |
"mose", | |
"iose", | |
"gish", | |
"kish", | |
"pish", | |
"wish", | |
"vish", | |
"yish", | |
"owsy", | |
"ensy", | |
"easy", | |
"ifth", | |
"edth", | |
"urth", | |
"ixth", | |
"00th", | |
"ghth", | |
"ilty", | |
"orty", | |
"ifty", | |
"inty", | |
"ghty", | |
"kety", | |
"afty", | |
"irty", | |
"roud", | |
"true", | |
"wful", | |
"dful", | |
"rful", | |
"mful", | |
"gful", | |
"lful", | |
"hful", | |
"kful", | |
"iful", | |
"yful", | |
"sful", | |
"tive", | |
"cave", | |
"sive", | |
"five", | |
"cive", | |
"xxvi", | |
"urvy", | |
"nown", | |
"hewn", | |
"lown", | |
"-two", | |
"lowy", | |
"ctyl" | |
], | |
"VB": [ | |
"wrap", | |
"hear", | |
"draw", | |
"rlay", | |
"away", | |
"elay", | |
"duce", | |
"esce", | |
"elch", | |
"ooch", | |
"pick", | |
"huck", | |
"back", | |
"hack", | |
"ruct", | |
"lict", | |
"nect", | |
"vict", | |
"eact", | |
"tect", | |
"vade", | |
"lude", | |
"vide", | |
"rude", | |
"cede", | |
"ceed", | |
"ivel", | |
"hten", | |
"rken", | |
"shen", | |
"open", | |
"quer", | |
"over", | |
"efer", | |
"eset", | |
"uiet", | |
"pret", | |
"ulge", | |
"lign", | |
"pugn", | |
"othe", | |
"rbid", | |
"raid", | |
"veil", | |
"vail", | |
"roil", | |
"join", | |
"dain", | |
"feit", | |
"mmit", | |
"erit", | |
"voke", | |
"make", | |
"weld", | |
"uild", | |
"idle", | |
"rgle", | |
"otle", | |
"rble", | |
"self", | |
"fill", | |
"till", | |
"eels", | |
"sult", | |
"pply", | |
"sume", | |
"dime", | |
"lame", | |
"lump", | |
"rump", | |
"vene", | |
"cook", | |
"look", | |
"from", | |
"elop", | |
"grow", | |
"adow", | |
"ploy", | |
"sorb", | |
"pare", | |
"uire", | |
"jure", | |
"lore", | |
"surf", | |
"narl", | |
"earn", | |
"ourn", | |
"hirr", | |
"tort", | |
"-fry", | |
"uise", | |
"lyse", | |
"sise", | |
"hise", | |
"tise", | |
"nise", | |
"lise", | |
"rise", | |
"anse", | |
"gise", | |
"owse", | |
"oosh", | |
"resh", | |
"cuss", | |
"uess", | |
"sess", | |
"vest", | |
"inst", | |
"gest", | |
"fest", | |
"xist", | |
"into", | |
"ccur", | |
"ieve", | |
"eive", | |
"olve", | |
"down", | |
"-dye", | |
"laze", | |
"lyze", | |
"raze", | |
"ooze" | |
], | |
"RB": [ | |
"that", | |
"oubt", | |
"much", | |
"diem", | |
"high", | |
"atim", | |
"sely", | |
"nely", | |
"ibly", | |
"lely", | |
"dely", | |
"ally", | |
"gely", | |
"imly", | |
"tely", | |
"ully", | |
"ably", | |
"owly", | |
"vely", | |
"cely", | |
"mely", | |
"mply", | |
"ngly", | |
"exly", | |
"ffly", | |
"rmly", | |
"rely", | |
"uely", | |
"time", | |
"iori", | |
"oors", | |
"wise", | |
"orst", | |
"east", | |
"ways" | |
] | |
} | |
//convert it to an easier format | |
module.exports = Object.keys(data).reduce(function (h, k) { | |
data[k].forEach(function (w) { | |
h[w] = k | |
}) | |
return h | |
}, {}) | |
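//the exported object maps each four-letter suffix to its likely tag,
//e.g. { "tion": "NN", "ally": "RB", ... }, presumably checked against a word's last letters.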
},{}],16:[function(require,module,exports){ | |
//regex patterns and their parts of speech
module.exports= [ | |
[".[cts]hy$", "JJ"], | |
[".[st]ty$", "JJ"], | |
[".[lnr]ize$", "VB"], | |
[".[gk]y$", "JJ"], | |
[".fies$", "VB"], | |
[".some$", "JJ"], | |
[".[nrtumcd]al$", "JJ"], | |
[".que$", "JJ"], | |
[".[tnl]ary$", "JJ"], | |
[".[di]est$", "JJS"], | |
["^(un|de|re)\\-[a-z]..", "VB"], | |
[".lar$", "JJ"], | |
["[bszmp]{2}y", "JJ"], | |
[".zes$", "VB"], | |
[".[icldtgrv]ent$", "JJ"], | |
[".[rln]ates$", "VBZ"], | |
[".[oe]ry$", "JJ"], | |
["[rdntkdhs]ly$", "RB"], | |
[".[lsrnpb]ian$", "JJ"], | |
[".[^aeiou]ial$", "JJ"], | |
[".[^aeiou]eal$", "JJ"], | |
[".[vrl]id$", "JJ"], | |
[".[ilk]er$", "JJR"], | |
[".ike$", "JJ"], | |
[".ends$", "VB"], | |
[".wards$", "RB"], | |
[".rmy$", "JJ"], | |
[".rol$", "NN"], | |
[".tors$", "NN"], | |
[".azy$", "JJ"], | |
[".where$", "RB"], | |
[".ify$", "VB"], | |
[".bound$", "JJ"], | |
[".ens$", "VB"], | |
[".oid$", "JJ"], | |
[".vice$", "NN"], | |
[".rough$", "JJ"], | |
[".mum$", "JJ"], | |
[".teen(th)?$", "CD"], | |
[".oses$", "VB"], | |
[".ishes$", "VB"], | |
[".ects$", "VB"], | |
[".tieth$", "CD"], | |
[".ices$", "NN"], | |
[".bles$", "VB"], | |
[".pose$", "VB"], | |
[".ions$", "NN"], | |
[".ean$", "JJ"], | |
[".[ia]sed$", "JJ"], | |
[".tized$", "VB"], | |
[".llen$", "JJ"], | |
[".fore$", "RB"], | |
[".ances$", "NN"], | |
[".gate$", "VB"], | |
[".nes$", "VB"], | |
[".less$", "RB"], | |
[".ried$", "JJ"], | |
[".gone$", "JJ"], | |
[".made$", "JJ"], | |
[".[pdltrkvyns]ing$", "JJ"], | |
[".tions$", "NN"], | |
[".tures$", "NN"], | |
[".ous$", "JJ"], | |
[".ports$", "NN"], | |
[". so$", "RB"], | |
[".ints$", "NN"], | |
[".[gt]led$", "JJ"], | |
["[aeiou].*ist$", "JJ"], | |
[".lked$", "VB"], | |
[".fully$", "RB"], | |
[".*ould$", "MD"], | |
["^-?[0-9]+(.[0-9]+)?$", "CD"], | |
["[a-z]*\\-[a-z]*\\-", "JJ"], | |
["[a-z]'s$", "NNO"], | |
[".'n$", "VB"], | |
[".'re$", "CP"], | |
[".'ll$", "MD"], | |
[".'t$", "VB"], | |
[".tches$", "VB"], | |
["^https?\:?\/\/[a-z0-9]", "CD"],//the colon is removed in normalisation | |
["^www\.[a-z0-9]", "CD"], | |
[".ize$", "VB"], | |
[".[^aeiou]ise$", "VB"], | |
[".[aeiou]te$", "VB"], | |
[".ea$", "NN"], | |
["[aeiou][pns]er$", "NN"], | |
[".ia$", "NN"], | |
[".sis$", "NN"], | |
[".[aeiou]na$", "NN"], | |
[".[^aeiou]ity$", "NN"], | |
[".[^aeiou]ium$", "NN"], | |
[".[^aeiou][ei]al$", "JJ"], | |
[".ffy$", "JJ"], | |
[".[^aeiou]ic$", "JJ"], | |
[".(gg|bb|zz)ly$", "JJ"], | |
[".[aeiou]my$", "JJ"], | |
[".[aeiou]ble$", "JJ"], | |
[".[^aeiou]ful$", "JJ"], | |
[".[^aeiou]ish$", "JJ"], | |
[".[^aeiou]ica$", "NN"], | |
["[aeiou][^aeiou]is$", "NN"], | |
["[^aeiou]ard$", "NN"], | |
["[^aeiou]ism$", "NN"], | |
[".[^aeiou]ity$", "NN"], | |
[".[^aeiou]ium$", "NN"], | |
[".[lstrn]us$", "NN"], | |
["..ic$", "JJ"], | |
["[aeiou][^aeiou]id$", "JJ"], | |
[".[^aeiou]ish$", "JJ"], | |
[".[^aeiou]ive$", "JJ"], | |
["[ea]{2}zy$", "JJ"], | |
].map(function(a) { | |
return { | |
reg: new RegExp(a[0], "i"), | |
pos: a[1] | |
} | |
}) | |
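//a hedged usage sketch (the name 'word_rules' and the require path are hypothetical): | |
// var word_rules = require("./word_rules") //hypothetical path | |
// var hit = word_rules.filter(function (r) { return "oddest".match(r.reg) })[0] | |
// console.log(hit && hit.pos) // "JJS", via the '.[di]est$' rule | |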
},{}],17:[function(require,module,exports){ | |
// convert british spellings into american ones | |
// built with patterns+exceptions from https://en.wikipedia.org/wiki/British_spelling | |
module.exports = function (str) { | |
var patterns = [ | |
// ise -> ize | |
{ | |
reg: /([^aeiou][iy])s(e|ed|es|ing)?$/, | |
repl: '$1z$2' | |
}, | |
// our -> or | |
{ | |
reg: /(..)our(ly|y|ite)?$/, | |
repl: '$1or$2' | |
}, | |
// re -> er | |
{ | |
reg: /([^cdnv])re(s)?$/, | |
repl: '$1er$2' | |
}, | |
// xion -> tion | |
{ | |
reg: /([aeiou])xion([ed])?$/, | |
repl: '$1tion$2' | |
}, | |
//logue -> log | |
{ | |
reg: /logue$/, | |
repl: 'log' | |
}, | |
// ae -> e | |
{ | |
reg: /([oa])e/, | |
repl: 'e' | |
}, | |
//eing -> ing | |
{ | |
reg: /e(ing|able)$/, | |
repl: '$1' | |
}, | |
// illful -> ilful | |
{ | |
reg: /([aeiou]+[^aeiou]+[aeiou]+)ll(ful|ment|est|ing|or|er|ed)$/, //must be second-syllable | |
repl: '$1l$2' | |
} | |
] | |
for (var i = 0; i < patterns.length; i++) { | |
if (str.match(patterns[i].reg)) { | |
return str.replace(patterns[i].reg, patterns[i].repl) | |
} | |
} | |
return str | |
} | |
// console.log(americanize("synthesise")=="synthesize") | |
// console.log(americanize("synthesised")=="synthesized") | |
},{}],18:[function(require,module,exports){ | |
// convert american spellings into british ones | |
// built with patterns+exceptions from https://en.wikipedia.org/wiki/British_spelling | |
// (some patterns are only safe to do in one direction) | |
module.exports = function (str) { | |
var patterns = [ | |
// ise -> ize | |
{ | |
reg: /([^aeiou][iy])z(e|ed|es|ing)?$/, | |
repl: '$1s$2' | |
}, | |
// our -> or | |
// { | |
// reg: /(..)our(ly|y|ite)?$/, | |
// repl: '$1or$2', | |
// exceptions: [] | |
// }, | |
// re -> er | |
// { | |
// reg: /([^cdnv])re(s)?$/, | |
// repl: '$1er$2', | |
// exceptions: [] | |
// }, | |
// xion -> tion | |
// { | |
// reg: /([aeiou])xion([ed])?$/, | |
// repl: '$1tion$2', | |
// exceptions: [] | |
// }, | |
//logue -> log | |
// { | |
// reg: /logue$/, | |
// repl: 'log', | |
// exceptions: [] | |
// }, | |
// ae -> e | |
// { | |
// reg: /([o|a])e/, | |
// repl: 'e', | |
// exceptions: [] | |
// }, | |
//eing -> ing | |
// { | |
// reg: /e(ing|able)$/, | |
// repl: '$1', | |
// exceptions: [] | |
// }, | |
// illful -> ilful | |
{ | |
reg: /([aeiou]+[^aeiou]+[aeiou]+)l(ful|ment|est|ing|or|er|ed)$/, //must be second-syllable | |
repl: '$1ll$2', | |
exceptions: [] | |
} | |
] | |
for (var i = 0; i < patterns.length; i++) { | |
if (str.match(patterns[i].reg)) { | |
return str.replace(patterns[i].reg, patterns[i].repl) | |
} | |
} | |
return str | |
} | |
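//a hedged mirror of the americanize tests above (the name 'britishize' is assumed): | |
// console.log(britishize("synthesize") === "synthesise") | |
// console.log(britishize("equalizing") === "equalising") | |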
},{}],19:[function(require,module,exports){ | |
//chop a string into pronounced syllables | |
module.exports = function (str) { | |
var all = [] | |
//suffix fixes | |
var postprocess = function (arr) { | |
//trim whitespace | |
arr = arr.map(function (w) { | |
w = w.replace(/^ */, '') | |
w = w.replace(/ *$/, '') | |
return w | |
}) | |
if (arr.length > 2) { | |
return arr | |
} | |
var ones = [ | |
/^[^aeiou]?ion/, | |
/^[^aeiou]?ised/, | |
/^[^aeiou]?iled/ | |
] | |
var l = arr.length | |
if (l > 1) { | |
var suffix = arr[l - 2] + arr[l - 1]; | |
for (var i = 0; i < ones.length; i++) { | |
if (suffix.match(ones[i])) { | |
arr[l - 2] = arr[l - 2] + arr[l - 1]; | |
arr.pop(); | |
} | |
} | |
} | |
return arr | |
} | |
var doer = function (str) { | |
var vow = /[aeiouy]$/ | |
if (!str) { | |
return | |
} | |
var chars = str.split('') | |
var before = ""; | |
var after = ""; | |
var current = ""; | |
for (var i = 0; i < chars.length; i++) { | |
before = chars.slice(0, i).join('') | |
current = chars[i] | |
after = chars.slice(i + 1, chars.length).join('') | |
var candidate = before + chars[i] | |
//rules for syllables- | |
//it's a consonant that comes after a vowel | |
if (before.match(vow) && !current.match(vow)) { | |
if (after.match(/^e[sm]/)) { | |
candidate += "e" | |
after = after.replace(/^e/, '') | |
} | |
all.push(candidate) | |
return doer(after) | |
} | |
//unblended vowels ('noisy' vowel combinations) | |
if (candidate.match(/(eo|eu|ia|oa|ua|ui)$/i)) { //'io' is noisy, not in 'ion' | |
all.push(before) | |
all.push(current) | |
return doer(after) | |
} | |
} | |
//if still running, end last syllable | |
if (str.match(/[aiouy]/) || str.match(/ee$/)) { //allow silent trailing e | |
all.push(str) | |
} else { | |
all[all.length - 1] = (all[all.length - 1] || '') + str; //append it to the last one | |
} | |
} | |
str.split(/\s\-/).forEach(function (s) { | |
doer(s) | |
}) | |
all = postprocess(all) | |
//for words like 'tree' and 'free' | |
if (all.length === 0) { | |
all = [str] | |
} | |
return all | |
} | |
// console.log(syllables("suddenly").length === 3) | |
// console.log(syllables("tree")) | |
//broken | |
// console.log(syllables("birchtree")) | |
},{}],20:[function(require,module,exports){ | |
//split a string into all possible parts | |
module.exports = function (text, options) { | |
options = options || {} | |
var min_count = options.min_count || 1; // minimum hit-count | |
var max_size = options.max_size || 5; // maximum gram count | |
var REallowedChars = /[^a-zA-Z'\-]+/g; //Invalid characters are replaced with a whitespace | |
var i, j, k, textlen, s; | |
var keys = [null]; | |
var results = []; | |
//max_size++; | |
for (i = 1; i <= max_size; i++) { | |
keys.push({}); | |
} | |
// clean the text | |
text = text.replace(REallowedChars, " ").replace(/^\s+/, "").replace(/\s+$/, ""); | |
text = text.toLowerCase() | |
// Create a hash | |
text = text.split(/\s+/); | |
for (i = 0, textlen = text.length; i < textlen; i++) { | |
s = text[i]; | |
keys[1][s] = (keys[1][s] || 0) + 1; | |
for (j = 2; j <= max_size; j++) { | |
if (i + j <= textlen) { | |
s += " " + text[i + j - 1]; | |
keys[j][s] = (keys[j][s] || 0) + 1; | |
} else { | |
break | |
} | |
} | |
} | |
// map to array | |
i = undefined; | |
for (k = 1; k <= max_size; k++) { | |
results[k] = []; | |
var key = keys[k]; | |
for (i in key) { | |
if (key.hasOwnProperty(i) && key[i] >= min_count) { | |
results[k].push({ | |
"word": i, | |
"count": key[i], | |
"size": k | |
}) | |
} | |
} | |
} | |
results = results.filter(function (s) { | |
return s !== null | |
}) | |
results = results.map(function (r) { | |
r = r.sort(function (a, b) { | |
return b.count - a.count | |
}) | |
return r; | |
}); | |
return results | |
} | |
// s = ngram("i really think that we all really think it's all good") | |
// console.log(s) | |
},{}],21:[function(require,module,exports){ | |
//(Rule-based sentence boundary segmentation) - chop given text into its proper sentences. | |
// Ignore periods/questions/exclamations used in acronyms/abbreviations/numbers, etc. | |
// @spencermountain 2015 MIT | |
module.exports = function(text) { | |
var abbreviations = require("../../data/lexicon/abbreviations") | |
var sentences = []; | |
//first do a greedy-split.. | |
var chunks = text.split(/(\S.+?[.\?!])(?=\s+|$|")/g); | |
//date abbrevs. | |
//these are added separately because they are not nouns | |
abbreviations = abbreviations.concat(["jan", "feb", "mar", "apr", "jun", "jul", "aug", "sep", "sept", "oct", "nov", "dec"]); | |
//detection of non-sentence chunks | |
var abbrev_reg = new RegExp("\\b(" + abbreviations.join("|") + ")[.!?] ?$", "i"); | |
var acronym_reg = new RegExp("[ |\\.][A-Z]\\.?$", "i") | |
var ellipses_reg = new RegExp("\\.\\.\\.*$") | |
//loop through these chunks, and join the non-sentence chunks back together.. | |
var chunks_length = chunks.length; | |
for (var i = 0; i < chunks_length; i++) { | |
if (chunks[i]) { | |
//trim whitespace | |
chunks[i] = chunks[i].replace(/^\s+|\s+$/g, ""); | |
//should this chunk be combined with the next one? | |
if (chunks[i + 1] && (chunks[i].match(abbrev_reg) || chunks[i].match(acronym_reg) || chunks[i].match(ellipses_reg))) { | |
chunks[i + 1] = ((chunks[i]||'') + " " + (chunks[i + 1]||'')).replace(/ +/g, " "); | |
} else if(chunks[i] && chunks[i].length>0){ //this chunk is a proper sentence.. | |
sentences.push(chunks[i]); | |
chunks[i] = ""; | |
} | |
} | |
} | |
//if we never got a sentence, return the given text | |
if (sentences.length === 0) { | |
return [text] | |
} | |
return sentences; | |
} | |
// console.log(sentence_parser('Tony is nice. He lives in Japan.').length === 2) | |
// console.log(sentence_parser('I like that Color').length === 1) | |
// console.log(sentence_parser("She was dead. He was ill.").length === 2) | |
// console.log(sentence_parser("i think it is good ... or else.").length == 1) | |
},{"../../data/lexicon/abbreviations":3}],22:[function(require,module,exports){ | |
//split a string into 'words' - as intended to be most helpful for this library. | |
var sentence_parser = require("./sentence") | |
var multiples = require("../../data/lexicon/multiples") | |
//these expressions ought to be one token, not two, because they are a distinct POS together | |
var multi_words = Object.keys(multiples).map(function (m) { | |
return m.split(' ') | |
}) | |
var normalise = function (str) { | |
if (!str) { | |
return "" | |
} | |
str = str.toLowerCase() | |
str = str.replace(/[,\.!:;\?\(\)]/g, '') | |
str = str.replace(/’/g, "'") | |
str = str.replace(/"/g, "") | |
if (!str.match(/[a-z0-9]/i)) { | |
return '' | |
} | |
return str | |
} | |
var sentence_type = function (sentence) { | |
if (sentence.match(/\?$/)) { | |
return "interrogative"; | |
} else if (sentence.match(/\!$/)) { | |
return "exclamative"; | |
} else { | |
return "declarative"; | |
} | |
} | |
//some multi-word tokens should be combined here | |
var combine_multiples = function (arr) { | |
var better = [] | |
var normalised = arr.map(function (a) { | |
return normalise(a) | |
}) //cached results | |
for (var i = 0; i < arr.length; i++) { | |
for (var o = 0; o < multi_words.length; o++) { | |
if (arr[i + 1] && normalised[i] === multi_words[o][0] && normalised[i + 1] === multi_words[o][1]) { // | |
//we have a match | |
arr[i] = arr[i] + ' ' + arr[i + 1] | |
arr[i + 1] = null | |
break | |
} | |
} | |
better.push(arr[i]) | |
} | |
return better.filter(function (w) { | |
return w | |
}) | |
} | |
var tokenize = function (str) { | |
var sentences = sentence_parser(str) | |
return sentences.map(function (sentence) { | |
var arr = sentence.split(' '); | |
arr = combine_multiples(arr) | |
var tokens = arr.map(function (w, i) { | |
return { | |
text: w, | |
normalised: normalise(w), | |
title_case: (w.match(/^[A-Z][a-z]/) !== null), //use for merge-tokens | |
noun_capital: i > 0 && (w.match(/^[A-Z][a-z]/) !== null), //use for noun signal | |
punctuated: (w.match(/[,;:\(\)"]/) !== null) || undefined, | |
end: (i === (arr.length - 1)) || undefined, | |
start: (i === 0) || undefined | |
} | |
}) | |
return { | |
sentence: sentence, | |
tokens: tokens, | |
type: sentence_type(sentence) | |
} | |
}) | |
} | |
module.exports = tokenize | |
// console.log(tokenize("i live in new york")[0].tokens.length==4) | |
// console.log(tokenize("I speak optimistically of course.")[0].tokens.length==4) | |
// console.log(tokenize("Joe is 9")[0].tokens.length==3) | |
// console.log(tokenize("Joe in Toronto")[0].tokens.length==3) | |
// console.log(tokenize("I am mega-rich")[0].tokens.length==3) | |
},{"../../data/lexicon/multiples":9,"./sentence":21}],23:[function(require,module,exports){ | |
// a hugely-ignorant, and widely subjective transliteration of Latin, Cyrillic, and Greek unicode characters to English ASCII. | |
//http://en.wikipedia.org/wiki/List_of_Unicode_characters | |
//https://docs.google.com/spreadsheet/ccc?key=0Ah46z755j7cVdFRDM1A2YVpwa1ZYWlpJM2pQZ003M0E | |
//approximate visual (not semantic) relationship between unicode and ascii characters | |
var compact = { | |
"2": "²ƻ", | |
"3": "³ƷƸƹƺǮǯЗҘҙӞӟӠӡȜȝ", | |
"5": "Ƽƽ", | |
"8": "Ȣȣ", | |
"!": "¡", | |
"?": "¿Ɂɂ", | |
"a": "ªÀÁÂÃÄÅàáâãäåĀāĂ㥹ǍǎǞǟǠǡǺǻȀȁȂȃȦȧȺΆΑΔΛάαλАДадѦѧӐӑӒӓƛɅ", | |
"b": "ßþƀƁƂƃƄƅɃΒβϐϦБВЪЬбвъьѢѣҌҍҔҕƥƾ", | |
"c": "¢©ÇçĆćĈĉĊċČčƆƇƈȻȼͻͼͽϲϹϽϾϿЄСсєҀҁҪҫ", | |
"d": "ÐĎďĐđƉƊȡƋƌǷ", | |
"e": "ÈÉÊËèéêëĒēĔĕĖėĘęĚěƎƏƐǝȄȅȆȇȨȩɆɇΈΕΞΣέεξϱϵ϶ЀЁЕЭеѐёҼҽҾҿӖӗӘәӚӛӬӭ", | |
"f": "ƑƒϜϝӺӻ", | |
"g": "ĜĝĞğĠġĢģƓǤǥǦǧǴǵ", | |
"h": "ĤĥĦħƕǶȞȟΉΗЂЊЋНнђћҢңҤҥҺһӉӊ", | |
"I": "ÌÍÎÏ", | |
"i": "ìíîïĨĩĪīĬĭĮįİıƖƗȈȉȊȋΊΐΪίιϊІЇії", | |
"j": "ĴĵǰȷɈɉϳЈј", | |
"k": "ĶķĸƘƙǨǩΚκЌЖКжкќҚқҜҝҞҟҠҡ", | |
"l": "ĹĺĻļĽľĿŀŁłƚƪǀǏǐȴȽΙӀӏ", | |
"m": "ΜϺϻМмӍӎ", | |
"n": "ÑñŃńŅņŇňʼnŊŋƝƞǸǹȠȵΝΠήηϞЍИЙЛПийлпѝҊҋӅӆӢӣӤӥπ", | |
"o": "ÒÓÔÕÖØðòóôõöøŌōŎŏŐőƟƠơǑǒǪǫǬǭǾǿȌȍȎȏȪȫȬȭȮȯȰȱΌΘΟΦΩδθοσόϕϘϙϬϭϴОФоѲѳѺѻѼѽӦӧӨөӪӫ¤ƍΏ", | |
"p": "ƤƿΡρϷϸϼРрҎҏÞ", | |
"q": "Ɋɋ", | |
"r": "ŔŕŖŗŘřƦȐȑȒȓɌɍЃГЯгяѓҐґҒғӶӷſ", | |
"s": "ŚśŜŝŞşŠšƧƨȘșȿςϚϛϟϨϩЅѕ", | |
"t": "ŢţŤťŦŧƫƬƭƮȚțȶȾΓΤτϮϯТт҂Ҭҭ", | |
"u": "µÙÚÛÜùúûüŨũŪūŬŭŮůŰűŲųƯưƱƲǓǔǕǖǗǘǙǚǛǜȔȕȖȗɄΰμυϋύϑЏЦЧцџҴҵҶҷҸҹӋӌӇӈ", | |
"v": "ƔνѴѵѶѷ", | |
"w": "ŴŵƜωώϖϢϣШЩшщѡѿ", | |
"x": "×ΧχϗϰХхҲҳӼӽӾӿ", | |
"y": "¥ÝýÿŶŷŸƳƴȲȳɎɏΎΥΨΫγψϒϓϔЎУучўѰѱҮүҰұӮӯӰӱӲӳ", | |
"z": "ŹźŻżŽžƩƵƶȤȥɀΖζ" | |
} | |
//decompress data into an array | |
var data = [] | |
Object.keys(compact).forEach(function (k) { | |
compact[k].split('').forEach(function (s) { | |
data.push([s, k]) | |
}) | |
}) | |
//convert array to two hashes | |
var normaler = {} | |
var greek = {} | |
data.forEach(function (arr) { | |
normaler[arr[0]] = arr[1] | |
greek[arr[1]] = arr[0] | |
}) | |
var normalize = function (str, options) { | |
options = options || {} | |
options.percentage = options.percentage || 50 | |
var arr = str.split('').map(function (s) { | |
var r = Math.random() * 100 | |
if (normaler[s] && r < options.percentage) { | |
return normaler[s] || s | |
} else { | |
return s | |
} | |
}) | |
return arr.join('') | |
} | |
var denormalize = function (str, options) { | |
options = options || {} | |
options.percentage = options.percentage || 50 | |
var arr = str.split('').map(function (s) { | |
var r = Math.random() * 100 | |
if (greek[s] && r < options.percentage) { | |
return greek[s] || s | |
} else { | |
return s | |
} | |
}) | |
return arr.join('') | |
} | |
module.exports = { | |
normalize: normalize, | |
denormalize: denormalize | |
} | |
// s = "ӳžŽżźŹźӳžŽżźŹźӳžŽżźŹźӳžŽżźŹźӳžŽżźŹź" | |
// s = "Björk" | |
// console.log(normalize.normalize(s, { | |
// percentage: 100 | |
// })) | |
// s = "The quick brown fox jumps over the lazy dog" | |
// console.log(normalize.denormalize(s, { | |
// percentage: 100 | |
// })) | |
},{}],24:[function(require,module,exports){ | |
//these are adjectives that can become comparative + superlative without "most/more" | |
//it's a whitelist for conjugation | |
//this data is shared between comparative/superlative methods | |
module.exports= [ | |
"absurd", | |
"aggressive", | |
"alert", | |
"alive", | |
"awesome", | |
"beautiful", | |
"big", | |
"bitter", | |
"black", | |
"blue", | |
"bored", | |
"boring", | |
"brash", | |
"brave", | |
"brief", | |
"bright", | |
"broad", | |
"brown", | |
"calm", | |
"charming", | |
"cheap", | |
"clean", | |
"cold", | |
"cool", | |
"cruel", | |
"cute", | |
"damp", | |
"deep", | |
"dear", | |
"dead", | |
"dark", | |
"dirty", | |
"drunk", | |
"dull", | |
"eager", | |
"efficient", | |
"even", | |
"faint", | |
"fair", | |
"fanc", | |
"fast", | |
"fat", | |
"feeble", | |
"few", | |
"fierce", | |
"fine", | |
"flat", | |
"forgetful", | |
"frail", | |
"full", | |
"gentle", | |
"glib", | |
"great", | |
"green", | |
"gruesome", | |
"handsome", | |
"hard", | |
"harsh", | |
"high", | |
"hollow", | |
"hot", | |
"impolite", | |
"innocent", | |
"keen", | |
"kind", | |
"lame", | |
"lean", | |
"light", | |
"little", | |
"loose", | |
"long", | |
"loud", | |
"low", | |
"lush", | |
"macho", | |
"mean", | |
"meek", | |
"mellow", | |
"mundane", | |
"near", | |
"neat", | |
"new", | |
"nice", | |
"normal", | |
"odd", | |
"old", | |
"pale", | |
"pink", | |
"plain", | |
"poor", | |
"proud", | |
"purple", | |
"quick", | |
"rare", | |
"rapid", | |
"red", | |
"rich", | |
"ripe", | |
"rotten", | |
"round", | |
"rude", | |
"sad", | |
"safe", | |
"scarce", | |
"scared", | |
"shallow", | |
"sharp", | |
"short", | |
"shrill", | |
"simple", | |
"slim", | |
"slow", | |
"small", | |
"smart", | |
"smooth", | |
"soft", | |
"sore", | |
"sour", | |
"square", | |
"stale", | |
"steep", | |
"stiff", | |
"straight", | |
"strange", | |
"strong", | |
"sweet", | |
"swift", | |
"tall", | |
"tame", | |
"tart", | |
"tender", | |
"tense", | |
"thick", | |
"thin", | |
"tight", | |
"tough", | |
"vague", | |
"vast", | |
"vulgar", | |
"warm", | |
"weak", | |
"wet", | |
"white", | |
"wide", | |
"wild", | |
"wise", | |
"young", | |
"yellow", | |
"easy", | |
"narrow", | |
"late", | |
"early", | |
"soon", | |
"close", | |
"empty", | |
"dry", | |
"windy", | |
"noisy", | |
"thirsty", | |
"hungry", | |
"fresh", | |
"quiet", | |
"clear", | |
"heavy", | |
"happy", | |
"funny", | |
"lucky", | |
"pretty", | |
"important", | |
"interesting", | |
"attractive", | |
"dangerous", | |
"intellegent", | |
"pure", | |
"orange", | |
"large", | |
"firm", | |
"grand", | |
"formal", | |
"raw", | |
"weird", | |
"glad", | |
"mad", | |
"strict", | |
"tired", | |
"solid", | |
"extreme", | |
"mature", | |
"true", | |
"free", | |
"curly", | |
"angry" | |
].reduce(function(h,s){ | |
h[s]=true | |
return h | |
},{}) | |
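//a hedged usage sketch (the require path is hypothetical): | |
// var convertables = require("./convertables") | |
// console.log(convertables["quick"] === true) //safe to inflect: quicker/quickest | |
// console.log(convertables["beautiful"] === true) | |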
},{}],25:[function(require,module,exports){ | |
//turn 'quick' into 'quickly' | |
var main = function (str) { | |
var irregulars = { | |
"idle": "idly", | |
"public": "publicly", | |
"vague": "vaguely", | |
"day": "daily", | |
"icy": "icily", | |
"single": "singly", | |
"female": "womanly", | |
"male": "manly", | |
"simple": "simply", | |
"whole": "wholly", | |
"special": "especially", | |
"straight": "straight", | |
"wrong": "wrong", | |
"fast": "fast", | |
"hard": "hard", | |
"late": "late", | |
"early": "early", | |
"well": "well", | |
"best": "best", | |
"latter": "latter", | |
"bad": "badly" | |
} | |
var dont = { | |
"foreign": 1, | |
"black": 1, | |
"modern": 1, | |
"next": 1, | |
"difficult": 1, | |
"degenerate": 1, | |
"young": 1, | |
"awake": 1, | |
"back": 1, | |
"blue": 1, | |
"brown": 1, | |
"orange": 1, | |
"complex": 1, | |
"cool": 1, | |
"dirty": 1, | |
"done": 1, | |
"empty": 1, | |
"fat": 1, | |
"fertile": 1, | |
"frozen": 1, | |
"gold": 1, | |
"grey": 1, | |
"gray": 1, | |
"green": 1, | |
"medium": 1, | |
"parallel": 1, | |
"outdoor": 1, | |
"unknown": 1, | |
"undersized": 1, | |
"used": 1, | |
"welcome": 1, | |
"yellow": 1, | |
"white": 1, | |
"fixed": 1, | |
"mixed": 1, | |
"super": 1, | |
"guilty": 1, | |
"tiny": 1, | |
"able": 1, | |
"unable": 1, | |
"same": 1, | |
"adult": 1 | |
} | |
var transforms = [{ | |
reg: /al$/i, | |
repl: 'ally' | |
}, { | |
reg: /ly$/i, | |
repl: 'ly' | |
}, { | |
reg: /(.{3})y$/i, | |
repl: '$1ily' | |
}, { | |
reg: /que$/i, | |
repl: 'quely' | |
}, { | |
reg: /ue$/i, | |
repl: 'uly' | |
}, { | |
reg: /ic$/i, | |
repl: 'ically' | |
}, { | |
reg: /ble$/i, | |
repl: 'bly' | |
}, { | |
reg: /l$/i, | |
repl: 'ly' | |
}] | |
var not_matches = [ | |
/airs$/, | |
/ll$/, | |
/ee.$/, | |
/ile$/ | |
] | |
if (dont[str]) { | |
return null | |
} | |
if (irregulars[str]) { | |
return irregulars[str] | |
} | |
if (str.length <= 3) { | |
return null | |
} | |
var i; | |
for (i = 0; i < not_matches.length; i++) { | |
if (str.match(not_matches[i])) { | |
return null | |
} | |
} | |
for (i = 0; i < transforms.length; i++) { | |
if (str.match(transforms[i].reg)) { | |
return str.replace(transforms[i].reg, transforms[i].repl) | |
} | |
} | |
return str + 'ly' | |
} | |
module.exports = main; | |
// console.log(adj_to_adv('direct')) | |
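//a few more expected outputs, for reference: | |
// console.log(adj_to_adv('public') === 'publicly') //irregular | |
// console.log(adj_to_adv('foreign') === null) //blacklisted | |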
},{}],26:[function(require,module,exports){ | |
//turn 'quick' into 'quicker' | |
var convertables = require("./convertables") | |
var main = function (str) { | |
var irregulars = { | |
"grey": "greyer", | |
"gray": "grayer", | |
"green": "greener", | |
"yellow": "yellower", | |
"red": "redder", | |
"good": "better", | |
"well": "better", | |
"bad": "worse", | |
"sad": "sadder" | |
} | |
var dont = { | |
"overweight": 1, | |
"main": 1, | |
"nearby": 1, | |
"asleep": 1, | |
"weekly": 1, | |
"secret": 1, | |
"certain": 1 | |
} | |
var transforms = [{ | |
reg: /y$/i, | |
repl: 'ier' | |
}, { | |
reg: /([aeiou])t$/i, | |
repl: '$1tter' | |
}, { | |
reg: /([aeou])de$/i, | |
repl: '$1der' | |
}, { | |
reg: /nge$/i, | |
repl: 'nger' | |
}] | |
var matches = [ | |
/ght$/, | |
/nge$/, | |
/ough$/, | |
/ain$/, | |
/uel$/, | |
/[au]ll$/, | |
/ow$/, | |
/old$/, | |
/oud$/, | |
/e[ae]p$/ | |
] | |
var not_matches = [ | |
/ary$/, | |
/ous$/ | |
] | |
if (dont.hasOwnProperty(str)) { | |
return null | |
} | |
//check irregulars before the generic rules, so 'grey' -> 'greyer' and 'bad' -> 'worse' | |
if (irregulars.hasOwnProperty(str)) { | |
return irregulars[str] | |
} | |
var i; | |
for (i = 0; i < transforms.length; i++) { | |
if (str.match(transforms[i].reg)) { | |
return str.replace(transforms[i].reg, transforms[i].repl) | |
} | |
} | |
if (convertables.hasOwnProperty(str)) { | |
if (str.match(/e$/)) { | |
return str + "r" | |
} else { | |
return str + "er" | |
} | |
} | |
for (i = 0; i < not_matches.length; i++) { | |
if (str.match(not_matches[i])) { | |
return "more " + str | |
} | |
} | |
for (i = 0; i < matches.length; i++) { | |
if (str.match(matches[i])) { | |
return str + "er" | |
} | |
} | |
return "more " + str | |
} | |
module.exports = main; | |
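//a hedged sketch of expected outputs, given the ordering above: | |
// console.log(main("happy") === "happier") //via the /y$/ transform | |
// console.log(main("bad") === "worse") //irregular | |
// console.log(main("curious") === "more curious") //fallback | |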
},{"./convertables":24}],27:[function(require,module,exports){ | |
//convert cute to cuteness | |
module.exports = function (w) { | |
var irregulars = { | |
"clean": "cleanliness", | |
"naivety": "naivety" | |
}; | |
if (!w) { | |
return ""; | |
} | |
if (irregulars.hasOwnProperty(w)) { | |
return irregulars[w]; | |
} | |
if (w.match(" ")) { | |
return w; | |
} | |
if (w.match(/w$/)) { | |
return w; | |
} | |
var transforms = [{ | |
"reg": /y$/, | |
"repl": 'iness' | |
}, { | |
"reg": /le$/, | |
"repl": 'ility' | |
}, { | |
"reg": /ial$/, | |
"repl": 'y' | |
}, { | |
"reg": /al$/, | |
"repl": 'ality' | |
}, { | |
"reg": /ting$/, | |
"repl": 'ting' | |
}, { | |
"reg": /ring$/, | |
"repl": 'ring' | |
}, { | |
"reg": /bing$/, | |
"repl": 'bingness' | |
}, { | |
"reg": /sing$/, | |
"repl": 'se' | |
}, { | |
"reg": /ing$/, | |
"repl": 'ment' | |
}, { | |
"reg": /ess$/, | |
"repl": 'essness' | |
}, { | |
"reg": /ous$/, | |
"repl": 'ousness' | |
}, ] | |
for (var i = 0; i < transforms.length; i++) { | |
if (w.match(transforms[i].reg)) { | |
return w.replace(transforms[i].reg, transforms[i].repl); | |
} | |
} | |
if (w.match(/s$/)) { | |
return w; | |
} | |
return w + "ness"; | |
}; | |
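//a hedged usage sketch (the name 'to_noun' is assumed): | |
// console.log(to_noun("happy") === "happiness") | |
// console.log(to_noun("able") === "ability") | |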
},{}],28:[function(require,module,exports){ | |
//turn 'quick' into 'quickest' | |
var convertables = require("./convertables") | |
module.exports = function (str) { | |
var irregulars = { | |
"nice": "nicest", | |
"late": "latest", | |
"hard": "hardest", | |
"inner": "innermost", | |
"outer": "outermost", | |
"far": "furthest", | |
"worse": "worst", | |
"bad": "worst", | |
"good": "best" | |
} | |
var dont = { | |
"overweight": 1, | |
"ready": 1 | |
} | |
var transforms = [{ | |
"reg": /y$/i, | |
"repl": 'iest' | |
}, { | |
"reg": /([aeiou])t$/i, | |
"repl": '$1ttest' | |
}, { | |
"reg": /([aeou])de$/i, | |
"repl": '$1dest' | |
}, { | |
"reg": /nge$/i, | |
"repl": 'ngest' | |
}] | |
var matches = [ | |
/ght$/, | |
/nge$/, | |
/ough$/, | |
/ain$/, | |
/uel$/, | |
/[au]ll$/, | |
/ow$/, | |
/oud$/, | |
/...p$/ | |
] | |
var not_matches = [ | |
/ary$/ | |
] | |
var generic_transformation = function (str) { | |
if (str.match(/e$/)) { | |
return str + "st" | |
} else { | |
return str + "est" | |
} | |
} | |
if (dont.hasOwnProperty(str)) { | |
return "most " + str | |
} | |
//check dont/irregulars before the generic rules, so 'ready' is skipped and 'good' -> 'best' | |
if (irregulars.hasOwnProperty(str)) { | |
return irregulars[str] | |
} | |
var i; | |
for (i = 0; i < transforms.length; i++) { | |
if (str.match(transforms[i].reg)) { | |
return str.replace(transforms[i].reg, transforms[i].repl) | |
} | |
} | |
if (convertables.hasOwnProperty(str)) { | |
return generic_transformation(str) | |
} | |
for (i = 0; i < not_matches.length; i++) { | |
if (str.match(not_matches[i])) { | |
return "most " + str | |
} | |
} | |
for (i = 0; i < matches.length; i++) { | |
if (str.match(matches[i])) { | |
return generic_transformation(str) | |
} | |
} | |
return "most " + str | |
} | |
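//a hedged sketch of expected outputs, given the ordering above (the name 'to_superlative' is assumed): | |
// console.log(to_superlative("happy") === "happiest") | |
// console.log(to_superlative("good") === "best") //irregular | |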
},{"./convertables":24}],29:[function(require,module,exports){ | |
//wrapper for Adjective's methods | |
var Adjective = function (str, sentence, word_i) { | |
var the = this | |
the.word = str || ''; | |
var to_comparative = require("./conjugate/to_comparative") | |
var to_superlative = require("./conjugate/to_superlative") | |
var adj_to_adv = require("./conjugate/to_adverb") | |
var adj_to_noun = require("./conjugate/to_noun") | |
var parts_of_speech = require("../../data/parts_of_speech") | |
the.conjugate = function () { | |
return { | |
comparative: to_comparative(the.word), | |
superlative: to_superlative(the.word), | |
adverb: adj_to_adv(the.word), | |
noun: adj_to_noun(the.word) | |
} | |
} | |
the.which = (function () { | |
if (the.word.match(/..est$/)) { | |
return parts_of_speech['JJS'] | |
} | |
if (the.word.match(/..er$/)) { | |
return parts_of_speech['JJR'] | |
} | |
return parts_of_speech['JJ'] | |
})() | |
return the; | |
}; | |
module.exports = Adjective; | |
// console.log(new Adjective("crazy")) | |
},{"../../data/parts_of_speech":14,"./conjugate/to_adverb":25,"./conjugate/to_comparative":26,"./conjugate/to_noun":27,"./conjugate/to_superlative":28}],30:[function(require,module,exports){ | |
//turns 'quickly' into 'quick' | |
module.exports = function (str) { | |
var irregulars = { | |
"idly": "idle", | |
"sporadically": "sporadic", | |
"basically": "basic", | |
"grammatically": "grammatical", | |
"alphabetically": "alphabetical", | |
"economically": "economical", | |
"conically": "conical", | |
"politically": "political", | |
"vertically": "vertical", | |
"practically": "practical", | |
"theoretically": "theoretical", | |
"critically": "critical", | |
"fantastically": "fantastic", | |
"mystically": "mystical", | |
"pornographically": "pornographic", | |
"fully": "full", | |
"jolly": "jolly", | |
"wholly": "whole" | |
} | |
var transforms = [{ | |
"reg": /bly$/i, | |
"repl": 'ble' | |
}, { | |
"reg": /gically$/i, | |
"repl": 'gical' | |
}, { | |
"reg": /([rsdh])ically$/i, | |
"repl": '$1ical' | |
}, { | |
"reg": /ically$/i, | |
"repl": 'ic' | |
}, { | |
"reg": /uly$/i, | |
"repl": 'ue' | |
}, { | |
"reg": /ily$/i, | |
"repl": 'y' | |
}, { | |
"reg": /(.{3})ly$/i, | |
"repl": '$1' | |
}] | |
if (irregulars.hasOwnProperty(str)) { | |
return irregulars[str] | |
} | |
for (var i = 0; i < transforms.length; i++) { | |
if (str.match(transforms[i].reg)) { | |
return str.replace(transforms[i].reg, transforms[i].repl) | |
} | |
} | |
return str | |
} | |
// console.log(to_adjective('quickly') === 'quick') | |
// console.log(to_adjective('marvelously') === 'marvelous') | |
},{}],31:[function(require,module,exports){ | |
//wrapper for Adverb's methods | |
var Adverb = function (str, sentence, word_i) { | |
var the = this | |
the.word = str || ''; | |
var to_adjective = require("./conjugate/to_adjective") | |
var parts_of_speech = require("../../data/parts_of_speech") | |
the.conjugate = function () { | |
return { | |
adjective: to_adjective(the.word) | |
} | |
} | |
the.which = (function () { | |
if (the.word.match(/..est$/)) { | |
return parts_of_speech['RBS'] | |
} | |
if (the.word.match(/..er$/)) { | |
return parts_of_speech['RBR'] | |
} | |
return parts_of_speech['RB'] | |
})() | |
return the; | |
} | |
module.exports = Adverb; | |
// console.log(new Adverb("suddenly").conjugate()) | |
// console.log(adverbs.conjugate('powerfully')) | |
},{"../../data/parts_of_speech":14,"./conjugate/to_adjective":30}],32:[function(require,module,exports){ | |
//converts nouns between singular and plural forms, and vice versa | |
//some regex borrowed from pksunkara/inflect | |
//https://github.com/pksunkara/inflect/blob/master/lib/defaults.js | |
var uncountables = require("../../../data/lexicon/uncountables") | |
var irregular_nouns = require("../../../data/lexicon/irregular_nouns") | |
var i; | |
//words that shouldn't ever inflect, for metaphysical reasons | |
var uncountable_nouns = uncountables.reduce(function (h, a) { | |
h[a] = true | |
return h | |
}, {}) | |
var titlecase = function (str) { | |
if (!str) { | |
return '' | |
} | |
return str.charAt(0).toUpperCase() + str.slice(1) | |
} | |
//these aren't nouns, but let's inflect them anyways | |
var irregulars = [ | |
["he", "they"], | |
["she", "they"], | |
["this", "these"], | |
["that", "these"], | |
["mine", "ours"], | |
["hers", "theirs"], | |
["his", "theirs"], | |
["i", "we"], | |
["move", "_s"], | |
["myself", "ourselves"], | |
["yourself", "yourselves"], | |
["himself", "themselves"], | |
["herself", "themselves"], | |
["themself", "themselves"], | |
["its", "theirs"], | |
["theirs", "_"] | |
] | |
irregulars = irregulars.concat(irregular_nouns) | |
var pluralize_rules = [ | |
[/(ax|test)is$/i, '$1es'], | |
[/(octop|vir|radi|nucle|fung|cact|stimul)us$/i, '$1i'], | |
[/(octop|vir)i$/i, '$1i'], | |
[/([rl])f$/i, '$1ves'], | |
[/(alias|status)$/i, '$1es'], | |
[/(bu)s$/i, '$1ses'], | |
[/(al|ad|at|er|et|ed|ad)o$/i, '$1oes'], | |
[/([ti])um$/i, '$1a'], | |
[/([ti])a$/i, '$1a'], | |
[/sis$/i, 'ses'], | |
[/(?:([^f])fe|([lr])f)$/i, '$1ves'], | |
[/(hive)$/i, '$1s'], | |
[/([^aeiouy]|qu)y$/i, '$1ies'], | |
[/(x|ch|ss|sh|s|z)$/i, '$1es'], | |
[/(matr|vert|ind|cort)(ix|ex)$/i, '$1ices'], | |
[/([m|l])ouse$/i, '$1ice'], | |
[/([m|l])ice$/i, '$1ice'], | |
[/^(ox)$/i, '$1en'], | |
[/^(oxen)$/i, '$1'], | |
[/(quiz)$/i, '$1zes'], | |
[/(antenn|formul|nebul|vertebr|vit)a$/i, '$1ae'], | |
[/(sis)$/i, 'ses'], | |
[/^(?!talis|.*hu)(.*)man$/i, '$1men'], | |
[/(.*)/i, '$1s'] | |
].map(function (a) { | |
return { | |
reg: a[0], | |
repl: a[1] | |
} | |
}) | |
var pluralize = function (str) { | |
var low = str.toLowerCase() | |
//uncountable | |
if (uncountable_nouns[low]) { | |
return str | |
} | |
//is it already plural? | |
if (is_plural(low) === true) { | |
return str | |
} | |
//irregular | |
var found = irregulars.filter(function (r) { | |
return r[0] === low | |
}) | |
if (found[0]) { | |
if (titlecase(low) === str) { //handle capitalisation properly | |
return titlecase(found[0][1]) | |
} else { | |
return found[0][1] | |
} | |
} | |
//inflect first word of preposition-phrase | |
if (str.match(/([a-z]*) (of|in|by|for) [a-z]/)) { | |
var first = (str.match(/^([a-z]*) (of|in|by|for) [a-z]/) || [])[1] | |
if (first) { | |
var better_first = pluralize(first) | |
return better_first + str.replace(first, '') | |
} | |
} | |
//regular | |
for (i = 0; i < pluralize_rules.length; i++) { | |
if (str.match(pluralize_rules[i].reg)) { | |
return str.replace(pluralize_rules[i].reg, pluralize_rules[i].repl) | |
} | |
} | |
} | |
var singularize_rules = [ | |
[/([^v])ies$/i, '$1y'], | |
[/ises$/i, 'isis'], | |
[/ives$/i, 'ife'], | |
[/(antenn|formul|nebul|vertebr|vit)ae$/i, '$1a'], | |
[/(octop|vir|radi|nucle|fung|cact|stimul)(i)$/i, '$1us'], | |
[/(buffal|tomat|tornad)(oes)$/i, '$1o'], | |
[/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i, '$1sis'], | |
[/(vert|ind|cort)(ices)$/i, '$1ex'], | |
[/(matr|append)(ices)$/i, '$1ix'], | |
[/(x|ch|ss|sh|s|z|o)es$/i, '$1'], | |
[/men$/i, 'man'], | |
[/(n)ews$/i, '$1ews'], | |
[/([ti])a$/i, '$1um'], | |
[/([^f])ves$/i, '$1fe'], | |
[/([lr])ves$/i, '$1f'], | |
[/([^aeiouy]|qu)ies$/i, '$1y'], | |
[/(s)eries$/i, '$1eries'], | |
[/(m)ovies$/i, '$1ovie'], | |
[/([m|l])ice$/i, '$1ouse'], | |
[/(cris|ax|test)es$/i, '$1is'], | |
[/(alias|status)es$/i, '$1'], | |
[/(ss)$/i, '$1'], | |
[/(ics)$/i, "$1"], | |
[/s$/i, ''] | |
].map(function (a) { | |
return { | |
reg: a[0], | |
repl: a[1] | |
} | |
}) | |
var singularize = function (str) { | |
var low = str.toLowerCase() | |
//uncountable | |
if (uncountable_nouns[low]) { | |
return str | |
} | |
//is it already singular? | |
if (is_plural(low) === false) { | |
return str | |
} | |
//irregular | |
var found = irregulars.filter(function (r) { | |
return r[1] === low | |
}) | |
if (found[0]) { | |
if (titlecase(low) === str) { //handle capitalisation properly | |
return titlecase(found[0][0]) | |
} else { | |
return found[0][0] | |
} | |
} | |
//inflect first word of preposition-phrase | |
if (str.match(/([a-z]*) (of|in|by|for) [a-z]/)) { | |
var first = str.match(/^([a-z]*) (of|in|by|for) [a-z]/) | |
if (first && first[1]) { | |
var better_first = singularize(first[1]) | |
return better_first + str.replace(first[1], '') | |
} | |
} | |
//regular | |
for (i = 0; i < singularize_rules.length; i++) { | |
if (str.match(singularize_rules[i].reg)) { | |
return str.replace(singularize_rules[i].reg, singularize_rules[i].repl) | |
} | |
} | |
return str | |
} | |
var is_plural = function (str) { | |
str = (str || '').toLowerCase() | |
//handle 'mayors of chicago' | |
var preposition = str.match(/([a-z]*) (of|in|by|for) [a-z]/) | |
if (preposition && preposition[1]) { | |
str = preposition[1] | |
} | |
// if it's a known irregular case | |
for (i = 0; i < irregulars.length; i++) { | |
if (irregulars[i][1] === str) { | |
return true | |
} | |
if (irregulars[i][0] === str) { | |
return false | |
} | |
} | |
//similar to plural/singularize rules, but not the same | |
var plural_indicators = [ | |
/(^v)ies$/i, | |
/ises$/i, | |
/ives$/i, | |
/(antenn|formul|nebul|vertebr|vit)ae$/i, | |
/(octop|vir|radi|nucle|fung|cact|stimul)i$/i, | |
/(buffal|tomat|tornad)oes$/i, | |
/(analy|ba|diagno|parenthe|progno|synop|the)ses$/i, | |
/(vert|ind|cort)ices$/i, | |
/(matr|append)ices$/i, | |
/(x|ch|ss|sh|s|z|o)es$/i, | |
/men$/i, | |
/news$/i, | |
/.tia$/i, | |
/(^f)ves$/i, | |
/(lr)ves$/i, | |
/(^aeiouy|qu)ies$/i, | |
/(m|l)ice$/i, | |
/(cris|ax|test)es$/i, | |
/(alias|status)es$/i, | |
/ics$/i | |
] | |
for (i = 0; i < plural_indicators.length; i++) { | |
if (str.match(plural_indicators[i])) { | |
return true | |
} | |
} | |
//similar to plural/singularize rules, but not the same | |
var singular_indicators = [ | |
/(ax|test)is$/i, | |
/(octop|vir|radi|nucle|fung|cact|stimul)us$/i, | |
/(octop|vir)i$/i, | |
/(rl)f$/i, | |
/(alias|status)$/i, | |
/(bu)s$/i, | |
/(al|ad|at|er|et|ed|ad)o$/i, | |
/(ti)um$/i, | |
/(ti)a$/i, | |
/sis$/i, | |
/(?:(^f)fe|(lr)f)$/i, | |
/hive$/i, | |
/(^aeiouy|qu)y$/i, | |
/(x|ch|ss|sh|z)$/i, | |
/(matr|vert|ind|cort)(ix|ex)$/i, | |
/(m|l)ouse$/i, | |
/(m|l)ice$/i, | |
/(antenn|formul|nebul|vertebr|vit)a$/i, | |
/.sis$/i, | |
/^(?!talis|.*hu)(.*)man$/i | |
] | |
for (i = 0; i < singular_indicators.length; i++) { | |
if (str.match(singular_indicators[i])) { | |
return false | |
} | |
} | |
// 'looks pretty plural' rules | |
if (str.match(/s$/) && !str.match(/ss$/) && str.length > 3) { //needs some lovin' | |
return true | |
} | |
return false | |
} | |
var inflect = function (str) { | |
if (uncountable_nouns[str]) { //uncountables shouldn't ever inflect | |
return { | |
plural: str, | |
singular: str | |
} | |
} | |
if (is_plural(str)) { | |
return { | |
plural: str, | |
singular: singularize(str) | |
} | |
} else { | |
return { | |
singular: str, | |
plural: pluralize(str) | |
} | |
} | |
} | |
module.exports = { | |
inflect: inflect, | |
is_plural: is_plural, | |
singularize: singularize, | |
pluralize: pluralize | |
} | |
// console.log(inflect.singularize('kisses')=="kiss") | |
// console.log(inflect.singularize('kiss')=="kiss") | |
// console.log(inflect.singularize('children')=="child") | |
// console.log(inflect.singularize('child')=="child") | |
// console.log(inflect.pluralize('gas')=="gases") | |
// console.log(inflect.pluralize('narrative')=="narratives") | |
// console.log(inflect.singularize('gases')=="gas") | |
// console.log(inflect.pluralize('video')=="videos") | |
// console.log(inflect.pluralize('photo')=="photos") | |
// console.log(inflect.pluralize('stomach')=="stomachs") | |
// console.log(inflect.pluralize('database')=="databases") | |
// console.log(inflect.pluralize('kiss')=="kisses") | |
// console.log(inflect.pluralize('towns')=="towns") | |
// console.log(inflect.pluralize('mayor of chicago')=="mayors of chicago") | |
// console.log(inflect.inflect('Index').plural=='Indices') | |
// console.log(inflect.is_plural('octopus')==false) | |
// console.log(inflect.is_plural('octopi')==true) | |
// console.log(inflect.is_plural('eyebrow')==false) | |
// console.log(inflect.is_plural('eyebrows')==true) | |
// console.log(inflect.is_plural('child')==false) | |
// console.log(inflect.is_plural('children')==true) | |
// console.log(inflect.singularize('mayors of chicago')=="mayor of chicago") | |
},{"../../../data/lexicon/irregular_nouns":8,"../../../data/lexicon/uncountables":11}],33:[function(require,module,exports){ | |
//chooses an indefinite article 'a/an' for a word | |
module.exports = function (str) { | |
if (!str) { | |
return null | |
} | |
var irregulars = { | |
"hour": "an", | |
"heir": "an", | |
"heirloom": "an", | |
"honest": "an", | |
"honour": "an", | |
"honor": "an", | |
"uber": "an" //german u | |
} | |
var is_acronym = function (s) { | |
//no periods | |
if (s.length <= 5 && s.match(/^[A-Z]*$/)) { | |
return true | |
} | |
//with periods | |
if (s.length >= 4 && s.match(/^([A-Z]\.)*$/)) { | |
return true | |
} | |
return false | |
} | |
//pronounced letters of acronyms that get a 'an' | |
var an_acronyms = { | |
A: true, | |
E: true, | |
F: true, | |
H: true, | |
I: true, | |
L: true, | |
M: true, | |
N: true, | |
O: true, | |
R: true, | |
S: true, | |
X: true | |
} | |
//'a' regexes | |
var a_regexs = [ | |
/^onc?e/i, //'wu' sound of 'o' | |
/^u[bcfhjkqrstn][aeiou]/i, // 'yu' sound for hard 'u' | |
/^eul/i | |
]; | |
//begin business time | |
//////////////////// | |
//explicit irregular forms | |
if (irregulars.hasOwnProperty(str)) { | |
return irregulars[str] | |
} | |
//spelled-out acronyms | |
if (is_acronym(str) && an_acronyms.hasOwnProperty(str.substr(0, 1))) { | |
return "an" | |
} | |
//'a' regexes | |
for (var i = 0; i < a_regexs.length; i++) { | |
if (str.match(a_regexs[i])) { | |
return "a" | |
} | |
} | |
//basic vowel-startings | |
if (str.match(/^[aeiou]/i)) { | |
return "an" | |
} | |
return "a" | |
} | |
// console.log(indefinite_article("wolf") === "a") | |
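// console.log(indefinite_article("hour") === "an") //irregular | |
// console.log(indefinite_article("FBI") === "an") //spelled-out acronym | |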
},{}],34:[function(require,module,exports){ | |
//wrapper for noun's methods | |
var Noun = function (str, sentence, word_i) { | |
var the = this | |
var token, next; | |
if (sentence !== undefined && word_i !== undefined) { | |
token = sentence.tokens[word_i] | |
next = sentence.tokens[word_i + 1] | |
} | |
the.word = str || ''; | |
var parts_of_speech = require("../../data/parts_of_speech") | |
var firstnames = require("../../data/lexicon/firstnames") | |
var honourifics = require("../../data/lexicon/honourifics") | |
var inflect = require("./conjugate/inflect") | |
var indefinite_article = require("./indefinite_article") | |
//personal pronouns | |
var prps = { | |
"it": "PRP", | |
"they": "PRP", | |
"i": "PRP", | |
"them": "PRP", | |
"you": "PRP", | |
"she": "PRP", | |
"me": "PRP", | |
"he": "PRP", | |
"him": "PRP", | |
"her": "PRP", | |
"us": "PRP", | |
"we": "PRP", | |
"thou": "PRP" | |
} | |
var blacklist = { | |
"itself": 1, | |
"west": 1, | |
"western": 1, | |
"east": 1, | |
"eastern": 1, | |
"north": 1, | |
"northern": 1, | |
"south": 1, | |
"southern": 1, | |
"the": 1, | |
"one": 1, | |
"your": 1, | |
"my": 1, | |
"today": 1, | |
"yesterday": 1, | |
"tomorrow": 1, | |
"era": 1, | |
"century": 1, | |
"it": 1 | |
} | |
//for resolution of obama -> he -> his | |
var possessives = { | |
"his": "he", | |
"her": "she", | |
"hers": "she", | |
"their": "they", | |
"them": "they", | |
"its": "it" | |
} | |
the.is_acronym = function () { | |
var s = the.word | |
//no periods | |
if (s.length <= 5 && s.match(/^[A-Z]*$/)) { | |
return true | |
} | |
//with periods | |
if (s.length >= 4 && s.match(/^([A-Z]\.)*$/)) { | |
return true | |
} | |
return false | |
} | |
the.is_entity = function () { | |
if (!token) { | |
return false | |
} | |
if (token.normalised.length < 3 || !token.normalised.match(/[a-z]/i)) { | |
return false | |
} | |
//prepositions | |
if (prps[token.normalised]) { | |
return false | |
} | |
//blacklist | |
if (blacklist[token.normalised]) { | |
return false | |
} | |
//discredit specific nouns forms | |
if (token.pos) { | |
if (token.pos.tag == "NNA") { //eg. 'singer' | |
return false | |
} | |
if (token.pos.tag == "NNO") { //eg. "spencer's" | |
return false | |
} | |
if (token.pos.tag == "NNG") { //eg. 'walking' | |
return false | |
} | |
if (token.pos.tag == "NNP") { //yes! eg. 'Edinburough' | |
return true | |
} | |
} | |
//distinct capital is very good signal | |
if (token.noun_capital) { | |
return true | |
} | |
//multiple-word nouns are very good signal | |
if (token.normalised.match(/ /)) { | |
return true | |
} | |
//if it has an acronym/abbreviation, like 'business ltd.' | |
if (token.normalised.match(/\./)) { | |
return true | |
} | |
//appears to be a non-capital acronym, and not just caps-lock | |
if (token.normalised.length < 5 && token.text.match(/^[A-Z]*$/)) { | |
return true | |
} | |
//acronyms are a-ok | |
if (the.is_acronym()) { | |
return true | |
} | |
//else, be conservative | |
return false | |
} | |
the.conjugate = function () { | |
return inflect.inflect(the.word) | |
} | |
the.is_plural = function () { | |
return inflect.is_plural(the.word) | |
} | |
the.article = function () { | |
if (the.is_plural()) { | |
return "the" | |
} else { | |
return indefinite_article(the.word) | |
} | |
} | |
the.pluralize = function () { | |
return inflect.pluralize(the.word) | |
} | |
the.singularize = function () { | |
return inflect.singularize(the.word) | |
} | |
//uses common first-name list + honourifics to guess if this noun is the name of a person | |
the.is_person = function () { | |
var i, l; | |
//remove things that are often named after people | |
var blacklist = [ | |
"center", | |
"centre", | |
"memorial", | |
"school", | |
"bridge", | |
"university", | |
"house", | |
"college", | |
"square", | |
"park", | |
"foundation", | |
"institute", | |
"club", | |
"museum", | |
"arena", | |
"stadium", | |
"ss", | |
"of", | |
"the", | |
"for", | |
"and", | |
"&", | |
"co", | |
"sons" | |
] | |
l = blacklist.length | |
for (i = 0; i < l; i++) { | |
if (the.word.match(new RegExp("\\b" + blacklist[i] + "\\b", "i"))) { | |
return false | |
} | |
} | |
//see if noun has an honourific, like 'jr.' | |
l = honourifics.length; | |
for (i = 0; i < l; i++) { | |
if (the.word.match(new RegExp("\\b" + honourifics[i] + "\\.?\\b", 'i'))) { | |
return true | |
} | |
} | |
//see if noun has a known first-name | |
var names = the.word.split(' ').map(function (a) { | |
return a.toLowerCase() | |
}) | |
if (firstnames[names[0]]) { | |
return true | |
} | |
//(test middle name too, if there's one) | |
if (names.length > 2 && firstnames[names[1]]) { | |
return true | |
} | |
//if it has an initial between two words | |
if (the.word.match(/[a-z]{3,20} [a-z]\.? [a-z]{3,20}/i)) { | |
return true | |
} | |
return false | |
} | |
//decides if it deserves a he, she, they, or it | |
the.pronoun = function () { | |
//if it's a person try to classify male/female | |
if (the.is_person()) { | |
var names = the.word.split(' ').map(function (a) { | |
return a.toLowerCase() | |
}) | |
if (firstnames[names[0]] === "m" || firstnames[names[1]] == "m") { | |
return "he" | |
} | |
if (firstnames[names[0]] === "f" || firstnames[names[1]] == "f") { | |
return "she" | |
} | |
//test some honourifics | |
if (the.word.match(/^(mrs|miss|ms|misses|mme|mlle)\.? /i)) { | |
return "she" | |
} | |
if (the.word.match(/\b(mr|mister|sr|jr)\b/i)) { | |
return "he" | |
} | |
//if it's a known unisex name, don't try guess it. be safe. | |
if (firstnames[names[0]] === "a" || firstnames[names[1]] == "a") { | |
return "they" | |
} | |
//if we think it's a person, but still don't know the gender, do a little guessing | |
if (names[0].match(/[aeiy]$/)) { //if it ends in a 'ee or ah', female | |
return "she" | |
} | |
if (names[0].match(/[ou]$/)) { //if it ends in a 'oh or uh', male | |
return "he" | |
} | |
if (names[0].match(/(nn|ll|tt)/)) { //if it has double-consonants, female | |
return "she" | |
} | |
//fallback to 'singular-they' | |
return "they" | |
} | |
//not a person | |
if (the.is_plural()) { | |
return "they" | |
} | |
return "it" | |
} | |
//list of pronouns that refer to this named noun. "[obama] is cool, [he] is nice." | |
the.referenced_by = function () { | |
//if it's named-noun, look forward for the pronouns pointing to it -> '... he' | |
if (token && token.pos.tag !== "PRP" && token.pos.tag !== "PP") { | |
var prp = the.pronoun() | |
//look at rest of sentence | |
var interested = sentence.tokens.slice(word_i + 1, sentence.tokens.length) | |
//add next sentence too, could go further.. | |
if (sentence.next) { | |
interested = interested.concat(sentence.next.tokens) | |
} | |
//find the matching pronouns, and break if another noun overwrites it | |
var matches = [] | |
for (var i = 0; i < interested.length; i++) { | |
if (interested[i].pos.tag === "PRP" && (interested[i].normalised === prp || posessives[interested[i].normalised] === prp)) { | |
//this pronoun points at our noun | |
matches.push(interested[i]) | |
} else if (interested[i].pos.tag === "PP" && posessives[interested[i].normalised] === prp) { | |
//this posessive pronoun ('his/her') points at our noun | |
matches.push(interested[i]) | |
} else if (interested[i].pos.parent === "noun" && interested[i].analysis.pronoun() === prp) { | |
//this noun stops our further pursuit | |
break | |
} | |
} | |
return matches | |
} | |
return [] | |
} | |
// a pronoun that points at a noun mentioned previously '[he] is nice' | |
the.reference_to = function () { | |
//if it's a pronoun, look backwards for the first mention '[obama]... <-.. [he]' | |
if (token && (token.pos.tag === "PRP" || token.pos.tag === "PP")) { | |
var prp = token.normalised | |
var possessives = { | |
"his": "he", | |
"her": "she", | |
"their": "they" | |
} | |
if (possessives[prp] !== undefined) { //support possessives | |
prp = possessives[prp] | |
} | |
//look at starting of this sentence | |
var interested = sentence.tokens.slice(0, word_i) | |
//add previous sentence, if applicable | |
if (sentence.last) { | |
interested = sentence.last.tokens.concat(interested) | |
} | |
//reverse the terms to loop through backward.. | |
interested = interested.reverse() | |
for (var i = 0; i < interested.length; i++) { | |
//it's a match | |
if (interested[i].pos.parent === "noun" && interested[i].pos.tag !== "PRP" && interested[i].analysis.pronoun() === prp) { | |
return interested[i] | |
} | |
} | |
} | |
} | |
//specifically which pos it is | |
the.which = (function () { | |
//possessive | |
if (the.word.match(/'s$/)) { | |
return parts_of_speech['NNO'] | |
} | |
//plural | |
// if (the.is_plural) { | |
// return parts_of_speech['NNS'] | |
// } | |
//generic | |
return parts_of_speech['NN'] | |
})() | |
return the; | |
} | |
module.exports = Noun; | |
// console.log(new Noun('farmhouse').is_entity()) | |
// console.log(new Noun("FBI").is_acronym()) | |
// console.log(new Noun("Tony Danza").is_person()) | |
// console.log(new Noun("Tony Danza").pronoun()=="he") | |
// console.log(new Noun("Tanya Danza").pronoun()=="she") | |
// console.log(new Noun("mrs. Taya Danza").pronoun()=="she") | |
// console.log(new Noun("Gool Tanya Danza").pronoun()=="she") | |
// console.log(new Noun("illi G. Danza").pronoun()=="she") | |
// console.log(new Noun("horses").pronoun()=="they") | |
},{"../../data/lexicon/firstnames":6,"../../data/lexicon/honourifics":7,"../../data/parts_of_speech":14,"./conjugate/inflect":32,"./indefinite_article":33}],35:[function(require,module,exports){ | |
//Parents are classes for each main part of speech, with appropriate methods | |
//load files if server-side, otherwise assume these are prepended already | |
var Adjective = require("./adjective/index"); | |
var Noun = require("./noun/index"); | |
var Adverb = require("./adverb/index"); | |
var Verb = require("./verb/index"); | |
var Value = require("./value/index"); | |
var parents = { | |
adjective: function(str, next, last, token) { | |
return new Adjective(str, next, last, token) | |
}, | |
noun: function(str, next, last, token) { | |
return new Noun(str, next, last, token) | |
}, | |
adverb: function(str, next, last, token) { | |
return new Adverb(str, next, last, token) | |
}, | |
verb: function(str, next, last, token) { | |
return new Verb(str, next, last, token) | |
}, | |
value: function(str, next, last, token) { | |
return new Value(str, next, last, token) | |
}, | |
glue: function(str, next, last, token) { | |
return {} | |
} | |
} | |
module.exports = parents; | |
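//a hedged usage sketch (the require path is hypothetical): | |
// var parents = require("./parents") | |
// console.log(parents.noun("dog").pluralize() === "dogs") | |
// console.log(parents.adjective("quick").conjugate().comparative === "quicker") | |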
},{"./adjective/index":29,"./adverb/index":31,"./noun/index":34,"./value/index":37,"./verb/index":44}],36:[function(require,module,exports){ | |
//generates properly-formatted dates from free-text date forms | |
//by spencer kelly 2014 | |
var months = "(january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|aug|sept|oct|nov|dec),?"; | |
var days = "([0-9]{1,2}),?"; | |
var years = "([12][0-9]{3})"; | |
var to_obj = function (arr, places) { | |
return Object.keys(places).reduce(function (h, k) { | |
h[k] = arr[places[k]]; | |
return h; | |
}, {}); | |
} | |
var regexes = [{ | |
reg: String(months) + " " + String(days) + "-" + String(days) + " " + String(years), | |
example: "March 7th-11th 1987", | |
process: function (arr) { | |
if (!arr) { | |
arr = []; | |
} | |
var places = { | |
month: 1, | |
day: 2, | |
to_day: 3, | |
year: 4 | |
}; | |
return to_obj(arr, places); | |
} | |
}, { | |
reg: String(days) + " of " + String(months) + " to " + String(days) + " of " + String(months) + " " + String(years), | |
example: "28th of September to 5th of October 2008", | |
process: function (arr) { | |
if (!arr) { | |
arr = []; | |
} | |
var places = { | |
day: 1, | |
month: 2, | |
to_day: 3, | |
to_month: 4, | |
to_year: 5 | |
}; | |
return to_obj(arr, places); | |
} | |
}, { | |
reg: String(months) + " " + String(days) + " to " + String(months) + " " + String(days) + " " + String(years), | |
example: "March 7th to june 11th 1987", | |
process: function (arr) { | |
if (!arr) { | |
arr = []; | |
} | |
var places = { | |
month: 1, | |
day: 2, | |
to_month: 3, | |
to_day: 4, | |
year: 5, | |
to_year: 5 | |
}; | |
return to_obj(arr, places); | |
} | |
}, { | |
reg: "between " + String(days) + " " + String(months) + " and " + String(days) + " " + String(months) + " " + String(years), | |
example: "between 13 February and 15 February 1945", | |
process: function (arr) { | |
if (!arr) { | |
arr = []; | |
} | |
var places = { | |
day: 1, | |
month: 2, | |
to_day: 3, | |
to_month: 4, | |
year: 5, | |
to_year: 5 | |
}; | |
return to_obj(arr, places); | |
} | |
}, { | |
reg: "between " + String(months) + " " + String(days) + " and " + String(months) + " " + String(days) + " " + String(years), | |
example: "between March 7th and june 11th 1987", | |
process: function (arr) { | |
if (!arr) { | |
arr = []; | |
} | |
var places = { | |
month: 1, | |
day: 2, | |
to_month: 3, | |
to_day: 4, | |
year: 5, | |
to_year: 5 | |
}; | |
return to_obj(arr, places); | |
} | |
}, { | |
reg: String(months) + " " + String(days) + " " + String(years), | |
example: "March 1st 1987", | |
process: function (arr) { | |
if (!arr) { | |
arr = []; | |
} | |
var places = { | |
month: 1, | |
day: 2, | |
year: 3 | |
}; | |
return to_obj(arr, places); | |
} | |
}, { | |
reg: String(days) + " - " + String(days) + " of " + String(months) + " " + String(years), | |
example: "3rd - 5th of March 1969", | |
process: function (arr) { | |
if (!arr) { | |
arr = []; | |
} | |
var places = { | |
day: 1, | |
to_day: 2, | |
month: 3, | |
year: 4 | |
}; | |
return to_obj(arr, places); | |
} | |
}, { | |
reg: String(days) + " of " + String(months) + " " + String(years), | |
example: "3rd of March 1969", | |
process: function (arr) { | |
if (!arr) { | |
arr = []; | |
} | |
var places = { | |
day: 1, | |
month: 2, | |
year: 3 | |
}; | |
return to_obj(arr, places); | |
} | |
}, { | |
reg: String(months) + " " + years + ",? to " + String(months) + " " + String(years), | |
example: "September 1939 to April 1945", | |
process: function (arr) { | |
if (!arr) { | |
arr = []; | |
} | |
var places = { | |
month: 1, | |
year: 2, | |
to_month: 3, | |
to_year: 4 | |
}; | |
return to_obj(arr, places); | |
} | |
}, { | |
reg: String(months) + " " + String(years), | |
example: "March 1969", | |
process: function (arr) { | |
if (!arr) { | |
arr = []; | |
} | |
var places = { | |
month: 1, | |
year: 2 | |
}; | |
return to_obj(arr, places); | |
} | |
}, { | |
reg: String(months) + " " + days, | |
example: "March 18th", | |
process: function (arr) { | |
if (!arr) { | |
arr = []; | |
} | |
var places = { | |
month: 1, | |
day: 2 | |
}; | |
return to_obj(arr, places); | |
} | |
}, { | |
reg: String(days) + " of " + months, | |
example: "18th of March", | |
process: function (arr) { | |
if (!arr) { | |
arr = []; | |
} | |
var places = { | |
month: 2, | |
day: 1 | |
}; | |
return to_obj(arr, places); | |
} | |
}, { | |
reg: years + " ?- ?" + String(years), | |
example: "1997-1998", | |
process: function (arr) { | |
if (!arr) { | |
arr = []; | |
} | |
var places = { | |
year: 1, | |
to_year: 2 | |
}; | |
return to_obj(arr, places); | |
} | |
}, { | |
reg: years, | |
example: "1998", | |
process: function (arr) { | |
if (!arr) { | |
arr = []; | |
} | |
var places = { | |
year: 1 | |
}; | |
return to_obj(arr, places); | |
} | |
}].map(function (o) { | |
o.reg = new RegExp(o.reg, "g"); | |
return o; | |
}); | |
//0 based months, 1 based days... | |
var months_obj = { | |
january: 0, | |
february: 1, | |
march: 2, | |
april: 3, | |
may: 4, | |
june: 5, | |
july: 6, | |
august: 7, | |
september: 8, | |
october: 9, | |
november: 10, | |
december: 11, | |
jan: 0, | |
feb: 1, | |
mar: 2, | |
apr: 3, | |
aug: 7, | |
sept: 8, | |
oct: 9, | |
nov: 10, | |
dec: 11 | |
}; | |
//thirty days hath september... | |
var last_dates = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]; | |
var preprocess = function (str) { | |
str = str.toLowerCase(); | |
str = str.replace(/([0-9])(th|rd|st)/g, '$1'); | |
return str; | |
}; | |
var postprocess = function (obj, options) { | |
var d; | |
d = new Date(); | |
options = options || {}; | |
obj.year = parseInt(obj.year, 10) || undefined; | |
obj.day = parseInt(obj.day, 10) || undefined; | |
obj.to_day = parseInt(obj.to_day, 10) || undefined; | |
obj.to_year = parseInt(obj.to_year, 10) || undefined; | |
obj.month = months_obj[obj.month]; | |
obj.to_month = months_obj[obj.to_month]; | |
//swap to_month and month | |
if (obj.to_month !== undefined && obj.month === undefined) { | |
obj.month = obj.to_month; | |
} | |
if (obj.to_month === undefined && obj.month !== undefined) { | |
obj.to_month = obj.month; | |
} | |
//swap to_year and year | |
if (obj.to_year && !obj.year) { | |
obj.year = obj.to_year; | |
} | |
if (!obj.to_year && obj.year && obj.to_month !== undefined) { | |
obj.to_year = obj.year; | |
} | |
if (options.assume_year && !obj.year) { | |
obj.year = d.getFullYear(); | |
} | |
//make sure date is in that month.. | |
if (obj.day !== undefined && (obj.day > 31 || (obj.month !== undefined && obj.day > last_dates[obj.month]))) { | |
obj.day = undefined; | |
} | |
//make sure to date is after from date. fail everything if so... | |
//todo: do this smarter | |
if (obj.to_month !== undefined && obj.to_month < obj.month) { | |
return {};
} | |
if (obj.to_year && obj.to_year < obj.year) { | |
obj.year = undefined; | |
obj.to_year = undefined; | |
} | |
//make sure date is in reasonable range (very opinionated) | |
if (obj.year > 2090 || obj.year < 1200) { | |
obj.year = undefined; | |
obj.to_year = undefined; | |
} | |
//format result better | |
obj = { | |
day: obj.day, | |
month: obj.month, | |
year: obj.year, | |
to: { | |
day: obj.to_day, | |
month: obj.to_month, | |
year: obj.to_year | |
} | |
}; | |
//add javascript date objects, if you can | |
if (obj.year && obj.day && obj.month !== undefined) { | |
//build it in one shot, avoiding setMonth() end-of-month rollover and the deprecated setYear()
obj.date_object = new Date(obj.year, obj.month, obj.day);
} | |
if (obj.to.year && obj.to.day && obj.to.month !== undefined) { | |
obj.to.date_object = new Date(obj.to.year, obj.to.month, obj.to.day);
} | |
//if we have enough data to return a result.. | |
if (obj.year || obj.month !== undefined) { | |
return obj; | |
} | |
return {}; | |
}; | |
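// illustrative (not in the original): for "3rd of march 1969", matched by the
// "(day) of (month) (year)" template above, postprocess yields roughly
//   { day: 3, month: 2, year: 1969,
//     to: { day: undefined, month: 2, year: 1969 },
//     date_object: <Date: March 3 1969> }
// (months are 0-based, and to.* mirrors the single date)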
//pass the string through a sequence of regexes until a template is matched..
module.exports = function (str, options) { | |
options = options || {}; | |
str = preprocess(str) | |
var arr, good, clone_reg, obj; | |
var l = regexes.length; | |
for (var i = 0; i < l; i += 1) { | |
obj = regexes[i] | |
if (str.match(obj.reg)) { | |
clone_reg = new RegExp(obj.reg.source, "i"); //this avoids a memory-leak | |
arr = clone_reg.exec(str); | |
good = obj.process(arr); | |
return postprocess(good, options); | |
} | |
} | |
}; | |
// console.log(date_extractor("1998")) | |
// console.log(date_extractor("1999")) | |
},{}],37:[function(require,module,exports){ | |
//wrapper for value's methods | |
var Value = function (str, sentence, word_i) { | |
var the = this | |
the.word = str || ''; | |
var to_number = require("./to_number") | |
var date_extractor = require("./date_extractor") | |
var parts_of_speech = require("../../data/parts_of_speech") | |
the.date = function (options) { | |
options = options || {} | |
return date_extractor(the.word, options) | |
} | |
the.is_date = function () { | |
var months = /(january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|aug|sept|oct|nov|dec)/i | |
var times = /1?[0-9]:[0-9]{2}/ | |
var days = /\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday|mon|tues|wed|thurs|fri|sat|sun)\b/i | |
if (the.word.match(months) || the.word.match(times) || the.word.match(days)) { | |
return true | |
} | |
return false | |
} | |
the.number = function () { | |
if (the.is_date()) { | |
return null | |
} | |
return to_number(the.word) | |
} | |
the.which = (function () { | |
var d = the.date()
//date_extractor returns {} when a template matched but held no usable date
if (d && (d.year !== undefined || d.month !== undefined)) {
return parts_of_speech['DA']
}
if (the.number()) { | |
return parts_of_speech['NU'] | |
} | |
return parts_of_speech['CD'] | |
})() | |
return the; | |
}; | |
module.exports = Value; | |
// console.log(new Value("fifty five").number()) | |
// console.log(new Value("june 5th 1998").date()) | |
},{"../../data/parts_of_speech":14,"./date_extractor":36,"./to_number":38}],38:[function(require,module,exports){ | |
// converts spoken numbers into numeric values: "fifty seven point eight" -> 57.8
// | |
// Spoken numbers take the following format | |
// [sixty five] (thousand) [sixty five] (hundred) [sixty five] | |
// aka: [one/teen/ten] (multiple) [one/teen/ten] (multiple) ... | |
// combine the [one/teen/ten]s as 'current_sum', then multiply it by the multiple that follows
// a given multiple may not repeat
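// worked example (added for illustration): "sixty five thousand and sixty five"
//   sixty(60) + five(5)  -> current_sum = 65
//   thousand             -> current_sum * 1000 = 65000, added to total
//   and                  -> skipped
//   sixty(60) + five(5)  -> current_sum = 65
//   end of input         -> total = 65000 + 65 = 65065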
"use strict"; | |
//these sets of numbers each have different rules | |
//[tenth, hundredth, thousandth..] are ambiguous because they could be ordinal like fifth, or decimal like one-one-hundredth, so are ignored
var ones = { | |
'a': 1, | |
'zero': 0, | |
'one': 1, | |
'two': 2, | |
'three': 3, | |
'four': 4, | |
'five': 5, | |
'six': 6, | |
'seven': 7, | |
'eight': 8, | |
'nine': 9, | |
"first": 1, | |
"second": 2, | |
"third": 3, | |
"fourth": 4, | |
"fifth": 5, | |
"sixth": 6, | |
"seventh": 7, | |
"eighth": 8, | |
"ninth": 9 | |
} | |
var teens = { | |
'ten': 10, | |
'eleven': 11, | |
'twelve': 12, | |
'thirteen': 13, | |
'fourteen': 14, | |
'fifteen': 15, | |
'sixteen': 16, | |
'seventeen': 17, | |
'eighteen': 18, | |
'nineteen': 19, | |
"eleventh": 11, | |
"twelfth": 12, | |
"thirteenth": 13, | |
"fourteenth": 14, | |
"fifteenth": 15, | |
"sixteenth": 16, | |
"seventeenth": 17, | |
"eighteenth": 18, | |
"nineteenth": 19 | |
} | |
var tens = { | |
'twenty': 20, | |
'thirty': 30, | |
'forty': 40, | |
'fifty': 50, | |
'sixty': 60, | |
'seventy': 70, | |
'eighty': 80, | |
'ninety': 90, | |
"twentieth": 20, | |
"thirtieth": 30, | |
"fourtieth": 40, | |
"fiftieth": 50, | |
"sixtieth": 60, | |
"seventieth": 70, | |
"eightieth": 80, | |
"ninetieth": 90 | |
} | |
var multiple = { | |
'hundred': 100, | |
'grand': 1000, | |
'thousand': 1000, | |
'million': 1000000, | |
'billion': 1000000000, | |
'trillion': 1000000000000, | |
'quadrillion': 1000000000000000, | |
'quintillion': 1000000000000000000, | |
'sextillion': 1000000000000000000000, | |
'septillion': 1000000000000000000000000, | |
'octillion': 1000000000000000000000000000, | |
'nonillion': 1000000000000000000000000000000, | |
'decillion': 1000000000000000000000000000000000 | |
} | |
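// note (added): 'quintillion' (1e18) and larger exceed JavaScript's
// Number.MAX_SAFE_INTEGER (2^53 - 1, about 9.007e15), so results in that
// range are approximate floating-point values, not exact integers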
// var decimal_multiple={'tenth':0.1, 'hundredth':0.01, 'thousandth':0.001, 'millionth':0.000001,'billionth':0.000000001}; | |
var main = function (s) { | |
//remember these concerns for possible errors | |
var ones_done = false | |
var teens_done = false | |
var tens_done = false | |
var multiple_done = {} | |
var total = 0 | |
var global_multiplier = 1 | |
//pretty-printed numbers | |
s = s.replace(/, ?/g, '') | |
//parse-out currency | |
s = s.replace(/[$£€]/, '') | |
//try to finish-fast | |
if (s.match(/[0-9]\.[0-9]/) && parseFloat(s) == s) { | |
return parseFloat(s) | |
} | |
if (parseInt(s, 10) == s) { | |
return parseInt(s, 10) | |
} | |
//try to die fast. (phone numbers or times) | |
if (s.match(/[0-9][\-:][0-9]/)) { | |
return null | |
} | |
//support global multipliers, like 'half-million' by doing 'million' then multiplying by 0.5 | |
var mults = [{ | |
reg: /^(minus|negative)[\s\-]/i, | |
mult: -1 | |
}, { | |
reg: /^(a\s)?half[\s\-](of\s)?/i, | |
mult: 0.5 | |
}, { | |
reg: /^(a\s)?quarter[\s\-]/i, | |
mult: 0.25 | |
}] | |
for (var i = 0; i < mults.length; i++) {
if (s.match(mults[i].reg)) { | |
global_multiplier = mults[i].mult | |
s = s.replace(mults[i].reg, '') | |
break; | |
} | |
} | |
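// eg. (added for illustration) "minus twenty" strips the prefix, parses 20,
// then applies the -1 multiplier -> -20; "half a million" parses "a million"
// as 1000000, then applies 0.5 -> 500000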
//do each word in turn.. | |
var words = s.toString().split(/[\s\-]+/); | |
var w, x; | |
var current_sum = 0; | |
var local_multiplier = 1 | |
var decimal_mode = false | |
for (var i = 0; i < words.length; i++) { | |
w = words[i] | |
//skip 'and' eg. five hundred and twelve | |
if (w == "and") { | |
continue; | |
} | |
//..we're doing decimals now | |
if (w == "point" || w == "decimal") { | |
if (decimal_mode) { | |
return null | |
} //two point one point six | |
decimal_mode = true | |
total += current_sum | |
current_sum = 0 | |
ones_done = false | |
local_multiplier = 0.1 | |
continue; | |
} | |
//handle special rules following a decimal | |
if (decimal_mode) { | |
x = null | |
//allow consecutive ones in decimals eg. 'two point zero five nine' | |
if (ones[w] !== undefined) { | |
x = ones[w] | |
} | |
if (teens[w] !== undefined) { | |
x = teens[w] | |
} | |
if (parseInt(w, 10) == w) { | |
x = parseInt(w, 10) | |
} | |
if (x === null) { //x may legitimately be 0, eg. 'point zero five'
return null | |
} | |
if (x < 10) { | |
total += x * local_multiplier | |
local_multiplier = local_multiplier * 0.1 // next number is next decimal place | |
current_sum = 0 | |
continue; | |
} | |
//two-digit decimals eg. 'two point sixteen' | |
if (x < 100) { | |
total += x * (local_multiplier * 0.1) | |
local_multiplier = local_multiplier * 0.01 // next number is next decimal place | |
current_sum = 0 | |
continue; | |
} | |
} | |
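// worked example (added for illustration), relying on the x === null check above
// so that 'zero' is accepted: "two point zero five nine"
//   two  -> current_sum = 2; point -> total = 2, local_multiplier = 0.1
//   zero -> total += 0 * 0.1   = 2
//   five -> total += 5 * 0.01  = 2.05
//   nine -> total += 9 * 0.001 = 2.059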
//if it's already an actual number | |
if (w.match(/^[0-9]+\.[0-9]+$/)) {
current_sum += parseFloat(w) | |
continue; | |
} | |
if (parseInt(w, 10) == w) { | |
current_sum += parseInt(w, 10) | |
continue; | |
} | |
//ones rules | |
if (ones[w] !== undefined) { | |
if (ones_done) { | |
return null | |
} // eg. five seven | |
if (teens_done) { | |
return null | |
} // eg. five seventeen | |
ones_done = true | |
current_sum += ones[w] | |
continue; | |
} | |
//teens rules | |
if (teens[w]) { | |
if (ones_done) { | |
return null | |
} // eg. five seventeen | |
if (teens_done) { | |
return null | |
} // eg. fifteen seventeen | |
if (tens_done) { | |
return null | |
} // eg. sixty fifteen | |
teens_done = true | |
current_sum += teens[w] | |
continue; | |
} | |
//tens rules | |
if (tens[w]) { | |
if (ones_done) { | |
return null | |
} // eg. five seventy | |
if (teens_done) { | |
return null | |
} // eg. fifteen seventy
if (tens_done) { | |
return null | |
} // eg. twenty seventy | |
tens_done = true | |
current_sum += tens[w] | |
continue; | |
} | |
//multiple rules | |
if (multiple[w]) { | |
if (multiple_done[w]) { | |
return null | |
} // eg. five hundred six hundred | |
multiple_done[w] = true | |
//reset our concerns. allow 'five hundred five' | |
ones_done = false | |
teens_done = false | |
tens_done = false | |
//case of 'hundred million', (2 consecutive multipliers) | |
if (current_sum === 0) { | |
total = total || 1 //dont ever multiply by 0 | |
total *= multiple[w] | |
} else { | |
current_sum *= multiple[w] | |
total += current_sum | |
} | |
current_sum = 0 | |
continue; | |
} | |
//if word is not a known thing now, die | |
return null | |
} | |
if (current_sum) { | |
total += current_sum * local_multiplier
} | |
//combine with global multiplier, like 'minus' or 'half' | |
total = total * global_multiplier | |
return total | |
} | |
//kick it into module | |
module.exports = main; | |
// console.log(to_number("sixteen hundred")) | |
// console.log(to_number("a hundred")) | |
// console.log(to_number("four point seven seven")) | |
},{}],39:[function(require,module,exports){ | |
//turn a verb into its other grammatical forms. | |
var verb_to_doer = require("./to_doer") | |
var verb_irregulars = require("./verb_irregulars") | |
var verb_rules = require("./verb_rules") | |
var suffix_rules = require("./suffix_rules") | |
//this method is the slowest in the whole library. TODO: speed it up
var predict = function (w) { | |
var endsWith = function (str, suffix) { | |
return str.indexOf(suffix, str.length - suffix.length) !== -1; | |
} | |
var arr = Object.keys(suffix_rules); | |
for (var i = 0; i < arr.length; i++) {
if (endsWith(w, arr[i])) { | |
return suffix_rules[arr[i]] | |
} | |
} | |
return "infinitive" | |
} | |
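// eg. (added for illustration) predict("walking") ends with "ing", the first
// suffix in suffix_rules, so it returns "gerund"; a word with no known ending
// falls through to "infinitive"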
//fallback to this transformation if it has an unknown prefix | |
var fallback = function (w) { | |
var infinitive; | |
if (w.length > 4) { | |
infinitive = w.replace(/ed$/, ''); | |
} else { | |
infinitive = w.replace(/d$/, ''); | |
} | |
var present, past, gerund, doer; | |
if (w.match(/[^aeiou]$/)) { | |
gerund = w + "ing" | |
past = w + "ed" | |
if (w.match(/ss$/)) { | |
present = w + "es" //'passes' | |
} else { | |
present = w + "s" | |
} | |
doer = verb_to_doer(infinitive) | |
} else { | |
gerund = w.replace(/[aeiou]$/, 'ing') | |
past = w.replace(/[aeiou]$/, 'ed') | |
present = w.replace(/[aeiou]$/, 'es') | |
doer = verb_to_doer(infinitive) | |
} | |
return { | |
infinitive: infinitive, | |
present: present, | |
past: past, | |
gerund: gerund, | |
doer: doer, | |
future: "will " + infinitive | |
} | |
} | |
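// eg. (added for illustration) fallback("talk") takes the consonant-final branch:
//   { infinitive: "talk", present: "talks", past: "talked",
//     gerund: "talking", doer: "talker", future: "will talk" }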
//make sure object has all forms | |
var fulfill = function (obj, prefix) {
if (!obj.infinitive) { | |
return obj | |
} | |
if (!obj.gerund) { | |
obj.gerund = obj.infinitive + 'ing' | |
} | |
if (!obj.doer) { | |
obj.doer = verb_to_doer(obj.infinitive) | |
} | |
if (!obj.present) { | |
obj.present = obj.infinitive + 's' | |
} | |
if (!obj.past) { | |
obj.past = obj.infinitive + 'ed' | |
} | |
//add the prefix to all forms, if it exists | |
if (prefix) { | |
Object.keys(obj).forEach(function (k) { | |
obj[k] = prefix + obj[k] | |
}) | |
} | |
//future is 'will'+infinitive | |
if (!obj.future) { | |
obj.future = "will " + obj.infinitive | |
} | |
//perfect is 'have'+past-tense | |
if (!obj.perfect) { | |
obj.perfect = "have " + obj.past | |
} | |
//pluperfect is 'had'+past-tense | |
if (!obj.pluperfect) { | |
obj.pluperfect = "had " + obj.past | |
} | |
//future perfect is 'will have'+past-tense | |
if (!obj.future_perfect) { | |
obj.future_perfect = "will have " + obj.past | |
} | |
return obj | |
} | |
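// eg. (added for illustration) fulfill({ infinitive: "walk" }) fills in
//   { gerund: "walking", doer: "walker", present: "walks", past: "walked",
//     future: "will walk", perfect: "have walked", pluperfect: "had walked",
//     future_perfect: "will have walked" }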
var main = function (w) { | |
if (w === undefined) { | |
return {} | |
} | |
//for phrasal verbs ('look out'), conjugate look, then append 'out' | |
var phrasal_reg = new RegExp("^(.*?) (in|out|on|off|behind|way|with|of|do|away|across|ahead|back|over|under|together|apart|up|upon|aback|down|about|before|after|around|to|forth|round|through|along|onto)$", 'i') | |
if (w.match(' ') && w.match(phrasal_reg)) { | |
var split = w.match(phrasal_reg)
var phrasal_verb = split[1] | |
var particle = split[2] | |
var result = main(phrasal_verb) //recursive | |
delete result["doer"] | |
Object.keys(result).forEach(function (k) { | |
if (result[k]) { | |
result[k] += " " + particle | |
} | |
}) | |
return result | |
} | |
//for pluperfect ('had tried') remove 'had' and call it past-tense | |
if (w.match(/^had [a-z]/i)) { | |
w = w.replace(/^had /i, '') | |
} | |
//for perfect ('have tried') remove 'have' and call it past-tense | |
if (w.match(/^have [a-z]/i)) { | |
w = w.replace(/^have /i, '') | |
} | |
//for future perfect ('will have tried') remove 'will have' and call it past-tense | |
if (w.match(/^will have [a-z]/i)) { | |
w = w.replace(/^will have /i, '') | |
} | |
//chop it if it's future-tense | |
w = w.replace(/^will /i, '') | |
//un-prefix the verb, and add it in later | |
var prefix = (w.match(/^(over|under|re|anti|full)\-?/i) || [])[0] | |
var verb = w.replace(/^(over|under|re|anti|full)\-?/i, '') | |
//check irregulars | |
var obj = {}; | |
var l = verb_irregulars.length | |
var x, i; | |
for (i = 0; i < l; i++) { | |
x = verb_irregulars[i] | |
if (verb === x.present || verb === x.gerund || verb === x.past || verb === x.infinitive) { | |
obj = JSON.parse(JSON.stringify(verb_irregulars[i])); // object 'clone' hack, to avoid mem leak | |
return fulfill(obj, prefix)
} | |
} | |
//guess the tense, so we know which transformation to make
var predicted = predict(w) || 'infinitive' | |
//check against suffix rules | |
l = verb_rules[predicted].length | |
var r, keys; | |
for (i = 0; i < l; i++) { | |
r = verb_rules[predicted][i]; | |
if (w.match(r.reg)) { | |
obj[predicted] = w; | |
keys = Object.keys(r.repl)
for (var o = 0; o < keys.length; o++) {
if (keys[o] === predicted) { | |
obj[keys[o]] = w | |
} else { | |
obj[keys[o]] = w.replace(r.reg, r.repl[keys[o]]) | |
} | |
} | |
return fulfill(obj);
} | |
} | |
//produce a generic transformation | |
return fallback(w) | |
}; | |
module.exports = main; | |
// console.log(module.exports("walking")) | |
// console.log(module.exports("overtook")) | |
// console.log(module.exports("watch out")) | |
// console.log(module.exports("watch")) | |
// console.log(module.exports("smash")) | |
// console.log(module.exports("word")) | |
// // broken | |
// console.log(module.exports("read")) | |
// console.log(module.exports("free")) | |
// console.log(module.exports("flesh")) | |
// console.log(module.exports("branch")) | |
// console.log(module.exports("spred")) | |
// console.log(module.exports("bog")) | |
// console.log(module.exports("nod")) | |
// console.log(module.exports("had tried")) | |
// console.log(module.exports("have tried")) | |
},{"./suffix_rules":40,"./to_doer":41,"./verb_irregulars":42,"./verb_rules":43}],40:[function(require,module,exports){ | |
//generated from test data | |
var compact = { | |
"gerund": [ | |
"ing" | |
], | |
"infinitive": [ | |
"ate", | |
"ize", | |
"tion", | |
"rify", | |
"ress", | |
"ify", | |
"age", | |
"nce", | |
"ect", | |
"ise", | |
"ine", | |
"ish", | |
"ace", | |
"ash", | |
"ure", | |
"tch", | |
"end", | |
"ack", | |
"and", | |
"ute", | |
"ade", | |
"ock", | |
"ite", | |
"ase", | |
"ose", | |
"use", | |
"ive", | |
"int", | |
"nge", | |
"lay", | |
"est", | |
"ain", | |
"ant", | |
"eed", | |
"er", | |
"le" | |
], | |
"past": [ | |
"ed", | |
"lt", | |
"nt", | |
"pt", | |
"ew", | |
"ld" | |
], | |
"present": [ | |
"rks", | |
"cks", | |
"nks", | |
"ngs", | |
"mps", | |
"tes", | |
"zes", | |
"ers", | |
"les", | |
"acks", | |
"ends", | |
"ands", | |
"ocks", | |
"lays", | |
"eads", | |
"lls", | |
"els", | |
"ils", | |
"ows", | |
"nds", | |
"ays", | |
"ams", | |
"ars", | |
"ops", | |
"ffs", | |
"als", | |
"urs", | |
"lds", | |
"ews", | |
"ips", | |
"es", | |
"ts", | |
"ns", | |
"s" | |
] | |
} | |
var suffix_rules = {} | |
var keys = Object.keys(compact) | |
var l = keys.length; | |
var l2, i; | |
for (i = 0; i < l; i++) { | |
l2 = compact[keys[i]].length | |
for (var o = 0; o < l2; o++) { | |
suffix_rules[compact[keys[i]][o]] = keys[i] | |
} | |
} | |
module.exports = suffix_rules; | |
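// eg. (added for illustration) the unpacked lookup reads
//   suffix_rules["ing"] === "gerund"
//   suffix_rules["eed"] === "infinitive"
//   suffix_rules["ed"]  === "past"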
},{}],41:[function(require,module,exports){ | |
//someone who does this verb
//turn 'walk' into 'walker' | |
module.exports = function (str) { | |
str = str || '' | |
var irregulars = { | |
"tie": "tier", | |
"dream": "dreamer", | |
"sail": "sailer", | |
"run": "runner", | |
"rub": "rubber", | |
"begin": "beginner", | |
"win": "winner", | |
"claim": "claimant", | |
"deal": "dealer", | |
"spin": "spinner" | |
} | |
var dont = { | |
"aid": 1, | |
"fail": 1, | |
"appear": 1, | |
"happen": 1, | |
"seem": 1, | |
"try": 1, | |
"say": 1, | |
"marry": 1, | |
"be": 1, | |
"forbid": 1, | |
"understand": 1, | |
"bet": 1 | |
} | |
var transforms = [{ | |
"reg": /e$/i, | |
"repl": 'er' | |
}, { | |
"reg": /([aeiou])([mlgp])$/i, | |
"repl": '$1$2$2er' | |
}, { | |
"reg": /([rlf])y$/i, | |
"repl": '$1ier' | |
}, { | |
"reg": /^(.?.[aeiou])t$/i, | |
"repl": '$1tter' | |
}] | |
if (dont.hasOwnProperty(str)) { | |
return null | |
} | |
if (irregulars.hasOwnProperty(str)) { | |
return irregulars[str] | |
} | |
for (var i = 0; i < transforms.length; i++) { | |
if (str.match(transforms[i].reg)) { | |
return str.replace(transforms[i].reg, transforms[i].repl) | |
} | |
} | |
return str + "er" | |
} | |
// console.log(verb_to_doer('set')) | |
// console.log(verb_to_doer('sweep')) | |
// console.log(verb_to_doer('watch')) | |
},{}],42:[function(require,module,exports){ | |
var types = [ | |
'infinitive', | |
'gerund', | |
'past', | |
'present', | |
'doer', | |
'future' | |
] | |
//list of irregular verb forms, compacted to save space. ('_' -> infinitive )
var compact = [ | |
[ | |
"arise", | |
"arising", | |
"arose", | |
"_s", | |
"_r" | |
], | |
[ | |
"babysit", | |
"_ting", | |
"babysat", | |
"_s", | |
"_ter" | |
], | |
[ | |
"be", | |
"_ing", | |
"was", | |
"is", | |
"" | |
], | |
[ | |
"beat", | |
"_ing", | |
"_", | |
"_s", | |
"_er" | |
], | |
[ | |
"become", | |
"becoming", | |
"became", | |
"_s", | |
"_r" | |
], | |
[ | |
"bend", | |
"_ing", | |
"bent", | |
"_s", | |
"_er" | |
], | |
[ | |
"begin", | |
"_ning", | |
"began", | |
"_s", | |
"_ner" | |
], | |
[ | |
"bet", | |
"_ting", | |
"_", | |
"_s", | |
"_ter" | |
], | |
[ | |
"bind", | |
"_ing", | |
"bound", | |
"_s", | |
"_er" | |
], | |
[ | |
"bite", | |
"biting", | |
"bit", | |
"_s", | |
"_r" | |
], | |
[ | |
"bleed", | |
"_ing", | |
"bled", | |
"_s", | |
"_er" | |
], | |
[ | |
"blow", | |
"_ing", | |
"blew", | |
"_s", | |
"_er" | |
], | |
[ | |
"break", | |
"_ing", | |
"broke", | |
"_s", | |
"_er" | |
], | |
[ | |
"breed", | |
"_ing", | |
"bred", | |
"_s", | |
"_er" | |
], | |
[ | |
"bring", | |
"_ing", | |
"brought", | |
"_s", | |
"_er" | |
], | |
[ | |
"broadcast", | |
"_ing", | |
"_", | |
"_s", | |
"_er" | |
], | |
[ | |
"build", | |
"_ing", | |
"built", | |
"_s", | |
"_er" | |
], | |
[ | |
"buy", | |
"_ing", | |
"bought", | |
"_s", | |
"_er" | |
], | |
[ | |
"catch", | |
"_ing", | |
"caught", | |
"_es", | |
"_er" | |
], | |
[ | |
"choose", | |
"choosing", | |
"chose", | |
"_s", | |
"_r" | |
], | |
[ | |
"come", | |
"coming", | |
"came", | |
"_s", | |
"_r" | |
], | |
[ | |
"cost", | |
"_ing", | |
"_", | |
"_s", | |
"_er" | |
], | |
[ | |
"cut", | |
"_ting", | |
"_", | |
"_s", | |
"_ter" | |
], | |
[ | |
"deal", | |
"_ing", | |
"_t", | |
"_s", | |
"_er" | |
], | |
[ | |
"dig", | |
"_ging", | |
"dug", | |
"_s", | |
"_ger" | |
], | |
[ | |
"do", | |
"_ing", | |
"did", | |
"_es", | |
"_er" | |
], | |
[ | |
"draw", | |
"_ing", | |
"drew", | |
"_s", | |
"_er" | |
], | |
[ | |
"drink", | |
"_ing", | |
"drank", | |
"_s", | |
"_er" | |
], | |
[ | |
"drive", | |
"driving", | |
"drove", | |
"_s", | |
"_r" | |
], | |
[ | |
"eat", | |
"_ing", | |
"ate", | |
"_s", | |
"_er" | |
], | |
[ | |
"fall", | |
"_ing", | |
"fell", | |
"_s", | |
"_er" | |
], | |
[ | |
"feed", | |
"_ing", | |
"fed", | |
"_s", | |
"_er" | |
], | |
[ | |
"feel", | |
"_ing", | |
"felt", | |
"_s", | |
"_er" | |
], | |
[ | |
"fight", | |
"_ing", | |
"fought", | |
"_s", | |
"_er" | |
], | |
[ | |
"find", | |
"_ing", | |
"found", | |
"_s", | |
"_er" | |
], | |
[ | |
"fly", | |
"_ing", | |
"flew", | |
"_s", | |
"flier" | |
], | |
[ | |
"forbid", | |
"_ing", | |
"forbade", | |
"_s", | |
], | |
[ | |
"forget", | |
"_ing", | |
"forgot", | |
"_s", | |
"_er" | |
], | |
[ | |
"forgive", | |
"forgiving", | |
"forgave", | |
"_s", | |
"_r" | |
], | |
[ | |
"freeze", | |
"freezing", | |
"froze", | |
"_s", | |
"_r" | |
], | |
[ | |
"get", | |
"_ting", | |
"got", | |
"_s", | |
"_ter" | |
], | |
[ | |
"give", | |
"giving", | |
"gave", | |
"_s", | |
"_r" | |
], | |
[ | |
"go", | |
"_ing", | |
"went", | |
"_es", | |
"_er" | |
], | |
[ | |
"grow", | |
"_ing", | |
"grew", | |
"_s", | |
"_er" | |
], | |
[ | |
"hang", | |
"_ing", | |
"hung", | |
"_s", | |
"_er" | |
], | |
[ | |
"have", | |
"having", | |
"had", | |
"has", | |
], | |
[ | |
"hear", | |
"_ing", | |
"_d", | |
"_s", | |
"_er" | |
], | |
[ | |
"hide", | |
"hiding", | |
"hid", | |
"_s", | |
"_r" | |
], | |
[ | |
"hit", | |
"_ting", | |
"_", | |
"_s", | |
"_ter" | |
], | |
[ | |
"hold", | |
"_ing", | |
"held", | |
"_s", | |
"_er" | |
], | |
[ | |
"hurt", | |
"_ing", | |
"_", | |
"_s", | |
"_er" | |
], | |
[ | |
"know", | |
"_ing", | |
"knew", | |
"_s", | |
"_er" | |
], | |
[ | |
"relay", | |
"_ing", | |
"_ed", | |
"_s", | |
"_er" | |
], | |
[ | |
"lay", | |
"_ing", | |
"laid", | |
"_s", | |
"_er" | |
], | |
[ | |
"lead", | |
"_ing", | |
"led", | |
"_s", | |
"_er" | |
], | |
[ | |
"leave", | |
"leaving", | |
"left", | |
"_s", | |
"_r" | |
], | |
[ | |
"lend", | |
"_ing", | |
"lent", | |
"_s", | |
"_er" | |
], | |
[ | |
"let", | |
"_ting", | |
"_", | |
"_s", | |
"_ter" | |
], | |
[ | |
"lie", | |
"lying", | |
"lay", | |
"_s", | |
"_r" | |
], | |
[ | |
"light", | |
"_ing", | |
"lit", | |
"_s", | |
"_er" | |
], | |
[ | |
"lose", | |
"losing", | |
"lost", | |
"_s", | |
"_r" | |
], | |
[ | |
"make", | |
"making", | |
"made", | |
"_s", | |
"_r" | |
], | |
[ | |
"mean", | |
"_ing", | |
"_t", | |
"_s", | |
"_er" | |
], | |
[ | |
"meet", | |
"_ing", | |
"met", | |
"_s", | |
"_er" | |
], | |
[ | |
"pay", | |
"_ing", | |
"paid", | |
"_s", | |
"_er" | |
], | |
[ | |
"put", | |
"_ting", | |
"_", | |
"_s", | |
"_ter" | |
], | |
[ | |
"quit", | |
"_ting", | |
"_", | |
"_s", | |
"_ter" | |
], | |
[ | |
"read", | |
"_ing", | |
"_", | |
"_s", | |
"_er" | |
], | |
[ | |
"ride", | |
"riding", | |
"rode", | |
"_s", | |
"_r" | |
], | |
[ | |
"ring", | |
"_ing", | |
"rang", | |
"_s", | |
"_er" | |
], | |
[ | |
"rise", | |
"rising", | |
"rose", | |
"_s", | |
"_r" | |
], | |
[ | |
"run", | |
"_ning", | |
"ran", | |
"_s", | |
"_ner" | |
], | |
[ | |
"say", | |
"_ing", | |
"said", | |
"_s", | |
], | |
[ | |
"see", | |
"_ing", | |
"saw", | |
"_s", | |
"_r" | |
], | |
[ | |
"sell", | |
"_ing", | |
"sold", | |
"_s", | |
"_er" | |
], | |
[ | |
"send", | |
"_ing", | |
"sent", | |
"_s", | |
"_er" | |
], | |
[ | |
"set", | |
"_ting", | |
"_", | |
"_s", | |
"_ter" | |
], | |
[ | |
"shake", | |
"shaking", | |
"shook", | |
"_s", | |
"_r" | |
], | |
[ | |
"shine", | |
"shining", | |
"shone", | |
"_s", | |
"_r" | |
], | |
[ | |
"shoot", | |
"_ing", | |
"shot", | |
"_s", | |
"_er" | |
], | |
[ | |
"show", | |
"_ing", | |
"_ed", | |
"_s", | |
"_er" | |
], | |
[ | |
"shut", | |
"_ting", | |
"_", | |
"_s", | |
"_ter" | |
], | |
[ | |
"sing", | |
"_ing", | |
"sang", | |
"_s", | |
"_er" | |
], | |
[ | |
"sink", | |
"_ing", | |
"sank", | |
"_s", | |
"_er" | |
], | |
[ | |
"sit", | |
"_ting", | |
"sat", | |
"_s", | |
"_ter" | |
], | |
[ | |
"slide", | |
"sliding", | |
"slid", | |
"_s", | |
"_r" | |
], | |
[ | |
"speak", | |
"_ing", | |
"spoke", | |
"_s", | |
"_er" | |
], | |
[ | |
"spend", | |
"_ing", | |
"spent", | |
"_s", | |
"_er" | |
], | |
[ | |
"spin", | |
"_ning", | |
"spun", | |
"_s", | |
"_ner" | |
], | |
[ | |
"spread", | |
"_ing", | |
"_", | |
"_s", | |
"_er" | |
], | |
[ | |
"stand", | |
"_ing", | |
"stood", | |
"_s", | |
"_er" | |
], | |
[ | |
"steal", | |
"_ing", | |
"stole", | |
"_s", | |
"_er" | |
], | |
[ | |
"stick", | |
"_ing", | |
"stuck", | |
"_s", | |
"_er" | |
], | |
[ | |
"sting", | |
"_ing", | |
"stung", | |
"_s", | |
"_er" | |
], | |
[ | |
"strike", | |
"striking", | |
"struck", | |
"_s", | |
"_r" | |
], | |
[ | |
"swear", | |
"_ing", | |
"swore", | |
"_s", | |
"_er" | |
], | |
[ | |
"swim", | |
"_ing", | |
"swam", | |
"_s", | |
"_mer" | |
], | |
[ | |
"swing", | |
"_ing", | |
"swung", | |
"_s", | |
"_er" | |
], | |
[ | |
"take", | |
"taking", | |
"took", | |
"_s", | |
"_r" | |
], | |
[ | |
"teach", | |
"_ing", | |
"taught", | |
"_s", | |
"_er" | |
], | |
[ | |
"tear", | |
"_ing", | |
"tore", | |
"_s", | |
"_er" | |
], | |
[ | |
"tell", | |
"_ing", | |
"told", | |
"_s", | |
"_er" | |
], | |
[ | |
"think", | |
"_ing", | |
"thought", | |
"_s", | |
"_er" | |
], | |
[ | |
"throw", | |
"_ing", | |
"threw", | |
"_s", | |
"_er" | |
], | |
[ | |
"understand", | |
"_ing", | |
"understood", | |
"_s", | |
], | |
[ | |
"wake", | |
"waking", | |
"woke", | |
"_s", | |
"_r" | |
], | |
[ | |
"wear", | |
"_ing", | |
"wore", | |
"_s", | |
"_er" | |
], | |
[ | |
"win", | |
"_ning", | |
"won", | |
"_s", | |
"_ner" | |
], | |
[ | |
"withdraw", | |
"_ing", | |
"withdrew", | |
"_s", | |
"_er" | |
], | |
[ | |
"write", | |
"writing", | |
"wrote", | |
"_s", | |
"_r" | |
], | |
[ | |
"tie", | |
"tying", | |
"_d", | |
"_s", | |
"_r" | |
], | |
[ | |
"obey", | |
"_ing", | |
"_ed", | |
"_s", | |
"_er" | |
], | |
[ | |
"ski", | |
"_ing", | |
"_ied", | |
"_s", | |
"_er" | |
], | |
[ | |
"boil", | |
"_ing", | |
"_ed", | |
"_s", | |
"_er" | |
], | |
[ | |
"miss", | |
"_ing", | |
"_ed", | |
"_", | |
"_er" | |
], | |
[ | |
"act", | |
"_ing", | |
"_ed", | |
"_s", | |
"_or" | |
], | |
[ | |
"compete", | |
"competing", | |
"_d", | |
"_s", | |
"competitor" | |
], | |
[ | |
"being", | |
"are", | |
"were", | |
"are", | |
], | |
[ | |
"imply", | |
"_ing", | |
"implied", | |
"implies", | |
"implier" | |
], | |
[ | |
"ice", | |
"icing", | |
"_d", | |
"_s", | |
"_r" | |
], | |
[ | |
"develop", | |
"_ing", | |
"_", | |
"_s", | |
"_er" | |
], | |
[ | |
"wait", | |
"_ing", | |
"_ed", | |
"_s", | |
"_er" | |
], | |
[ | |
"aim", | |
"_ing", | |
"_ed", | |
"_s", | |
"_er" | |
], | |
[ | |
"spill", | |
"_ing", | |
"spilt", | |
"_s", | |
"_er" | |
], | |
[ | |
"drop", | |
"_ping", | |
"_ped", | |
"_s", | |
"_per" | |
], | |
[ | |
"head", | |
"_ing", | |
"_ed", | |
"_s", | |
"_er" | |
], | |
[ | |
"log", | |
"_ging", | |
"_ged", | |
"_s", | |
"_ger" | |
], | |
[ | |
"rub", | |
"_bing", | |
"_bed", | |
"_s", | |
"_ber" | |
], | |
[ | |
"smash", | |
"_ing", | |
"_ed", | |
"_es", | |
"_er" | |
], | |
[ | |
"add", | |
"_ing", | |
"_ed", | |
"_s", | |
"_er" | |
], | |
[ | |
"word", | |
"_ing", | |
"_ed", | |
"_s", | |
"_er" | |
], | |
[ | |
"suit", | |
"_ing", | |
"_ed", | |
"_s", | |
"_er" | |
], | |
[ | |
"be", | |
"am", | |
"was", | |
"am", | |
"" | |
] | |
] | |
//expand compact version out | |
module.exports = compact.map(function (arr) { | |
var obj = {} | |
for (var i = 0; i < arr.length; i++) { | |
obj[types[i]] = arr[i].replace(/_/, arr[0]) | |
} | |
return obj | |
}) | |
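// eg. (added for illustration) the compact entry ["begin","_ning","began","_s","_ner"]
// expands to { infinitive: "begin", gerund: "beginning", past: "began",
//              present: "begins", doer: "beginner" }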
// console.log(JSON.stringify(verb_irregulars, null, 2)); | |
},{}],43:[function(require,module,exports){ | |
// regex rules for each part of speech that convert it to all other parts of speech. | |
// used in combination with the generic 'fallback' method | |
var verb_rules = { | |
"infinitive": [ | |
[ | |
"(eed)$", | |
{ | |
"pr": "$1s", | |
"g": "$1ing", | |
"pa": "$1ed", | |
"do": "$1er" | |
} | |
], | |
[ | |
"(e)(ep)$", | |
{ | |
"pr": "$1$2s", | |
"g": "$1$2ing", | |
"pa": "$1pt", | |
"do": "$1$2er" | |
} | |
], | |
[ | |
"(a[tg]|i[zn]|ur|nc|gl|is)e$", | |
{ | |
"pr": "$1es", | |
"g": "$1ing", | |
"pa": "$1ed" | |
} | |
], | |
[ | |
"([i|f|rr])y$", | |
{ | |
"pr": "$1ies", | |
"g": "$1ying", | |
"pa": "$1ied" | |
} | |
], | |
[ | |
"([td]er)$", | |
{ | |
"pr": "$1s", | |
"g": "$1ing", | |
"pa": "$1ed" | |
} | |
], | |
[ | |
"([bd]l)e$", | |
{ | |
"pr": "$1es", | |
"g": "$1ing", | |
"pa": "$1ed" | |
} | |
], | |
[ | |
"(ish|tch|ess)$", | |
{ | |
"pr": "$1es", | |
"g": "$1ing", | |
"pa": "$1ed" | |
} | |
], | |
[ | |
"(ion|end|e[nc]t)$", | |
{ | |
"pr": "$1s", | |
"g": "$1ing", | |
"pa": "$1ed" | |
} | |
], | |
[ | |
"(om)e$", | |
{ | |
"pr": "$1es", | |
"g": "$1ing", | |
"pa": "ame" | |
} | |
], | |
[ | |
"([aeiu])([pt])$", | |
{ | |
"pr": "$1$2s", | |
"g": "$1$2$2ing", | |
"pa": "$1$2" | |
} | |
], | |
[ | |
"(er)$", | |
{ | |
"pr": "$1s", | |
"g": "$1ing", | |
"pa": "$1ed" | |
} | |
], | |
[ | |
"(en)$", | |
{ | |
"pr": "$1s", | |
"g": "$1ing", | |
"pa": "$1ed" | |
} | |
] | |
], | |
"present": [ | |
[ | |
"(ies)$", | |
{ | |
"in": "y", | |
"g": "ying", | |
"pa": "ied" | |
} | |
], | |
[ | |
"(tch|sh)es$", | |
{ | |
"in": "$1", | |
"g": "$1ing", | |
"pa": "$1ed" | |
} | |
], | |
[ | |
"(ss)es$", | |
{ | |
"in": "$1", | |
"g": "$1ing", | |
"pa": "$1ed" | |
} | |
], | |
[ | |
"([tzlshicgrvdnkmu])es$", | |
{ | |
"in": "$1e", | |
"g": "$1ing", | |
"pa": "$1ed" | |
} | |
], | |
[ | |
"(n[dtk]|c[kt]|[eo]n|i[nl]|er|a[ytrl])s$", | |
{ | |
"in": "$1", | |
"g": "$1ing", | |
"pa": "$1ed" | |
} | |
], | |
[ | |
"(ow)s$", | |
{ | |
"in": "$1", | |
"g": "$1ing", | |
"pa": "ew" | |
} | |
], | |
[ | |
"(op)s$", | |
{ | |
"in": "$1", | |
"g": "$1ping", | |
"pa": "$1ped" | |
} | |
], | |
[ | |
"([eirs])ts$", | |
{ | |
"in": "$1t", | |
"g": "$1tting", | |
"pa": "$1tted" | |
} | |
], | |
[ | |
"(ll)s$", | |
{ | |
"in": "$1", | |
"g": "$1ing", | |
"pa": "$1ed" | |
} | |
], | |
[ | |
"(el)s$", | |
{ | |
"in": "$1", | |
"g": "$1ling", | |
"pa": "$1led" | |
} | |
], | |
[ | |
"(ip)es$", | |
{ | |
"in": "$1e", | |
"g": "$1ing", | |
"pa": "$1ed" | |
} | |
], | |
[ | |
"ss$", | |
{ | |
"in": "ss", | |
"g": "ssing", | |
"pa": "ssed" | |
} | |
], | |
[ | |
"s$", | |
{ | |
"in": "", | |
"g": "ing", | |
"pa": "ed" | |
} | |
] | |
], | |
"gerund": [ | |
[ | |
"pping$", | |
{ | |
"in": "p", | |
"pr": "ps", | |
"pa": "pped" | |
} | |
], | |
[ | |
"lling$", | |
{ | |
"in": "ll", | |
"pr": "lls", | |
"pa": "lled" | |
} | |
], | |
[ | |
"tting$", | |
{ | |
"in": "t", | |
"pr": "ts", | |
"pa": "t" | |
} | |
], | |
[ | |
"ssing$", | |
{ | |
"in": "ss", | |
"pr": "sses", | |
"pa": "ssed" | |
} | |
], | |
[ | |
"gging$", | |
{ | |
"in": "g", | |
"pr": "gs", | |
"pa": "gged" | |
} | |
], | |
[ | |
"([^aeiou])ying$", | |
{ | |
"in": "$1y", | |
"pr": "$1ies", | |
"pa": "$1ied", | |
"do": "$1ier" | |
} | |
], | |
[ | |
"(i.)ing$", | |
{ | |
"in": "$1e", | |
"pr": "$1es", | |
"pa": "$1ed" | |
} | |
], | |
[ | |
"(u[rtcb]|[bdtpkg]l|n[cg]|a[gdkvtc]|[ua]s|[dr]g|yz|o[rlsp]|cre)ing$", | |
{ | |
"in": "$1e", | |
"pr": "$1es", | |
"pa": "$1ed" | |
} | |
], | |
[ | |
"(ch|sh)ing$", | |
{ | |
"in": "$1", | |
"pr": "$1es", | |
"pa": "$1ed" | |
} | |
], | |
[ | |
"(..)ing$", | |
{ | |
"in": "$1", | |
"pr": "$1s", | |
"pa": "$1ed" | |
} | |
] | |
], | |
"past": [ | |
[ | |
"(ued)$", | |
{ | |
"pr": "ues", | |
"g": "uing", | |
"pa": "ued", | |
"do": "uer" | |
} | |
], | |
[ | |
"(e|i)lled$", | |
{ | |
"pr": "$1lls", | |
"g": "$1lling", | |
"pa": "$1lled", | |
"do": "$1ller" | |
} | |
], | |
[ | |
"(sh|ch)ed$", | |
{ | |
"in": "$1", | |
"pr": "$1es", | |
"g": "$1ing", | |
"do": "$1er" | |
} | |
], | |
[ | |
"(tl|gl)ed$", | |
{ | |
"in": "$1e", | |
"pr": "$1es", | |
"g": "$1ing", | |
"do": "$1er" | |
} | |
], | |
[ | |
"(ss)ed$", | |
{ | |
"in": "$1", | |
"pr": "$1es", | |
"g": "$1ing", | |
"do": "$1er" | |
} | |
], | |
[ | |
"pped$", | |
{ | |
"in": "p", | |
"pr": "ps", | |
"g": "pping", | |
"do": "pper" | |
} | |
], | |
[ | |
"tted$", | |
{ | |
"in": "t", | |
"pr": "ts", | |
"g": "tting", | |
"do": "tter" | |
} | |
], | |
[ | |
"gged$", | |
{ | |
"in": "g", | |
"pr": "gs", | |
"g": "gging", | |
"do": "gger" | |
} | |
], | |
[ | |
"(h|ion|n[dt]|ai.|[cs]t|pp|all|ss|tt|int|ail|ld|en|oo.|er|k|pp|w|ou.|rt|ght|rm)ed$", | |
{ | |
"in": "$1", | |
"pr": "$1s", | |
"g": "$1ing", | |
"do": "$1er" | |
} | |
], | |
[ | |
"(..[^aeiou])ed$", | |
{ | |
"in": "$1e", | |
"pr": "$1es", | |
"g": "$1ing", | |
"do": "$1er" | |
} | |
], | |
[ | |
"ied$", | |
{ | |
"in": "y", | |
"pr": "ies", | |
"g": "ying", | |
"do": "ier" | |
} | |
], | |
[ | |
"(.o)ed$", | |
{ | |
"in": "$1o", | |
"pr": "$1os", | |
"g": "$1oing", | |
"do": "$1oer" | |
} | |
], | |
[ | |
"(.i)ed$", | |
{ | |
"in": "$1", | |
"pr": "$1s", | |
"g": "$1ing", | |
"do": "$1er" | |
} | |
], | |
[ | |
"([rl])ew$", | |
{ | |
"in": "$1ow", | |
"pr": "$1ows", | |
"g": "$1owing" | |
} | |
], | |
[ | |
"([pl])t$", | |
{ | |
"in": "$1t", | |
"pr": "$1ts", | |
"g": "$1ting" | |
} | |
] | |
] | |
} | |
//unpack compressed form | |
verb_rules = Object.keys(verb_rules).reduce(function (h, k) {
h[k] = verb_rules[k].map(function (a) {
var obj = {
reg: new RegExp(a[0], "i"),
repl: {
infinitive: a[1]["in"],
present: a[1]["pr"],
past: a[1]["pa"],
gerund: a[1]["g"]
}
}
if (a[1]["do"]) {
obj.repl.doer = a[1]["do"]
}
return obj
})
return h
}, {})
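// eg. (added for illustration) the first infinitive rule ["(eed)$", {...}] unpacks to
//   { reg: /(eed)$/i, repl: { present: "$1s", past: "$1ed", gerund: "$1ing", doer: "$1er" } }
// (repl.infinitive stays undefined; the matched word itself is used), so a word
// like "succeed" would yield succeeds / succeeded / succeeding / succeeder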
module.exports = verb_rules; | |
// console.log(JSON.stringify(verb_rules, null, 2)); | |
},{}],44:[function(require,module,exports){ | |
//wrapper for verb's methods | |
var Verb = function (str, sentence, word_i) { | |
var the = this | |
var token, next; | |
if (sentence !== undefined && word_i !== undefined) { | |
token = sentence.tokens[word_i] | |
next = sentence.tokens[word_i + 1]
} | |
the.word = str || ''; | |
var verb_conjugate = require("./conjugate/conjugate") | |
var parts_of_speech = require("../../data/parts_of_speech") | |
var copulas = { | |
"is": "CP", | |
"will be": "CP", | |
"will": "CP", | |
"are": "CP", | |
"was": "CP", | |
"were": "CP" | |
} | |
var modals = { | |
"can": "MD", | |
"may": "MD", | |
"could": "MD", | |
"might": "MD", | |
"will": "MD", | |
"ought to": "MD", | |
"would": "MD", | |
"must": "MD", | |
"shall": "MD", | |
"should": "MD" | |
} | |
var tenses = { | |
past: "VBD", | |
participle: "VBN", | |
infinitive: "VBP", | |
present: "VBZ", | |
gerund: "VBG" | |
} | |
the.conjugate = function () { | |
return verb_conjugate(the.word) | |
} | |
the.to_past = function () { | |
if (the.form === "gerund") { | |
return the.word | |
} | |
return verb_conjugate(the.word).past | |
} | |
the.to_present = function () { | |
return verb_conjugate(the.word).present | |
} | |
the.to_future = function () { | |
return "will " + verb_conjugate(the.word).infinitive | |
} | |
//which conjugation | |
the.form = (function () { | |
//don't choose infinitive if infinitive==present | |
var order = [ | |
"past", | |
"present", | |
"gerund", | |
"infinitive" | |
] | |
var forms = verb_conjugate(the.word) | |
for (var i = 0; i < order.length; i++) { | |
if (forms[order[i]] === the.word) { | |
return order[i] | |
} | |
} | |
})() | |
//past/present/future tense
the.tense = (function () { | |
if (the.word.match(/\bwill\b/)) { | |
return "future" | |
} | |
if (the.form === "present") { | |
return "present" | |
} | |
if (the.form === "past") { | |
return "past" | |
} | |
return "present" | |
})() | |
//the most accurate part_of_speech | |
the.which = (function () { | |
if (copulas[the.word]) { | |
return parts_of_speech['CP'] | |
} | |
if (the.word.match(/([aeiou][^aeiouwyrlm])ing$/)) { | |
return parts_of_speech['VBG'] | |
} | |
var form = the.form | |
return parts_of_speech[tenses[form]] | |
})() | |
//is this verb negative already? | |
the.negative = function () { | |
if (the.word.match(/n't$/)) { | |
return true | |
} | |
if ((modals[the.word] || copulas[the.word]) && next && next.normalised === "not") { | |
return true | |
} | |
return false | |
} | |
return the; | |
} | |
module.exports = Verb; | |
// console.log(new Verb("will")) | |
// console.log(new Verb("stalking").tense) | |
},{"../../data/parts_of_speech":14,"./conjugate/conjugate":39}],45:[function(require,module,exports){ | |
var lexicon = require("./data/lexicon") | |
var values = require("./data/lexicon/values") | |
var tokenize = require("./methods/tokenization/tokenize"); | |
var parts_of_speech = require("./data/parts_of_speech") | |
var word_rules = require("./data/word_rules") | |
var wordnet_suffixes = require("./data/unambiguous_suffixes") | |
var Sentence = require("./sentence") | |
var Section = require("./section") | |
var parents = require("./parents/parents") | |
//possible 2nd part in a phrasal verb | |
var particles = ["in", "out", "on", "off", "behind", "way", "with", "of", "do", "away", "across", "ahead", "back", "over", "under", "together", "apart", "up", "upon", "aback", "down", "about", "before", "after", "around", "to", "forth", "round", "through", "along", "onto"] | |
particles = particles.reduce(function (h, s) { | |
h[s] = true | |
return h | |
}, {}) | |
var merge_tokens = function (a, b) { | |
a.text += " " + b.text | |
a.normalised += " " + b.normalised | |
a.pos_reason += "|" + b.pos_reason | |
a.start = a.start || b.start | |
a.noun_capital = (a.noun_capital && b.noun_capital) | |
a.punctuated = a.punctuated || b.punctuated | |
a.end = a.end || b.end | |
return a | |
} | |
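// eg. (added for illustration, with token fields trimmed) merging two NN tokens:
//   merge_tokens({ text: "Joe", normalised: "joe", pos_reason: "lexicon" },
//                { text: "Smith", normalised: "smith", pos_reason: "lexicon" })
//   -> { text: "Joe Smith", normalised: "joe smith", pos_reason: "lexicon|lexicon", ... }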
//combine adjacent neighbours, and special cases | |
var combine_tags = function (sentence) { | |
var arr = sentence.tokens || [] | |
for (var i = 0; i < arr.length; i++) {
var next = arr[i + 1] | |
if (arr[i] && next) { | |
var tag = arr[i].pos.tag | |
//'joe smith' are both NN, for example | |
if (tag === next.pos.tag && arr[i].punctuated !== true && arr[i].noun_capital == next.noun_capital) { | |
arr[i + 1] = merge_tokens(arr[i], arr[i + 1]) | |
arr[i] = null | |
} | |
//merge NNP and NN, like firstname, lastname | |
else if ((tag === "NNP" && next.pos.tag === "NN") || (tag === "NN" && next.pos.tag === "NNP")) { | |
arr[i + 1] = merge_tokens(arr[i], arr[i + 1]) | |
arr[i] = null | |
arr[i + 1].pos = parts_of_speech['NNP'] | |
} | |
//merge dates manually, which often have punctuation | |
else if (tag === "CD" && next.pos.tag === "CD") { | |
arr[i + 1] = merge_tokens(arr[i], arr[i + 1]) | |
arr[i] = null | |
} | |
//merge abbreviations with nouns manually, eg. "Joe jr." | |
else if ((tag === "NNAB" && next.pos.parent === "noun") || (arr[i].pos.parent === "noun" && next.pos.tag === "NNAB")) { | |
arr[i + 1] = merge_tokens(arr[i], arr[i + 1]) | |
arr[i] = null | |
} | |
//'will walk' -> future-tense verb | |
else if (arr[i].normalised === "will" && next.pos.parent === "verb") { | |
arr[i + 1] = merge_tokens(arr[i], arr[i + 1]) | |
arr[i] = null | |
} | |
//'hundred and fifty', 'march the 5th' | |
else if (tag === "CD" && (next.normalised === "and" || next.normalised === "the") && arr[i + 2] && arr[i + 2].pos.tag === "CD") { | |
arr[i + 1] = merge_tokens(arr[i], arr[i + 1]) | |
arr[i] = null | |
} | |
//capitals surrounding a preposition 'United States of America' | |
else if (tag == "NN" && arr[i].noun_capital && (next.normalised == "of" || next.normalised == "and") && arr[i + 2] && arr[i + 2].noun_capital) { | |
arr[i + 1] = merge_tokens(arr[i], arr[i + 1]) | |
arr[i] = null | |
arr[i + 2] = merge_tokens(arr[i + 1], arr[i + 2]) | |
arr[i + 1] = null | |
} | |
//capitals surrounding two prepositions 'Phantom of the Opera' | |
else if (arr[i].noun_capital && next.normalised == "of" && arr[i + 2] && arr[i + 2].pos.tag == "DT" && arr[i + 3] && arr[i + 3].noun_capital) { | |
arr[i + 1] = merge_tokens(arr[i], arr[i + 1]) | |
arr[i] = null | |
arr[i + 2] = merge_tokens(arr[i + 1], arr[i + 2]) | |
arr[i + 1] = null | |
arr[i + 3] = merge_tokens(arr[i + 2], arr[i + 3]) | |
arr[i + 2] = null | |
} | |
} | |
} | |
sentence.tokens = arr.filter(function (r) { | |
return r | |
}) | |
return sentence | |
} | |
//some prepositions are clumped onto the back of a verb "looked for", "looks at" | |
//they should be combined with the verb, sometimes. | |
//does not handle separated phrasal verbs ('take the coat off' -> 'take off')
var combine_phrasal_verbs = function (sentence) { | |
var arr = sentence.tokens || [] | |
for (var i = 1; i < arr.length; i++) { | |
if (particles[arr[i].normalised]) { | |
//it matches a known phrasal-verb | |
if (lexicon[arr[i - 1].normalised + " " + arr[i].normalised]) { | |
// console.log(arr[i-1].normalised + " " + arr[i].normalised) | |
arr[i] = merge_tokens(arr[i - 1], arr[i]) | |
arr[i - 1] = null | |
} | |
} | |
} | |
sentence.tokens = arr.filter(function (r) { | |
return r | |
}) | |
return sentence | |
} | |
var lexicon_pass = function (w) { | |
if (lexicon.hasOwnProperty(w)) { | |
return parts_of_speech[lexicon[w]] | |
} | |
//try to match it without a prefix - eg. outworked -> worked | |
if (w.match(/^(over|under|out|-|un|re|en).{4}/)) { | |
var attempt = w.replace(/^(over|under|out|.*?-|un|re|en)/, '') | |
return parts_of_speech[lexicon[attempt]] | |
} | |
} | |
var rules_pass = function (w) { | |
for (var i = 0; i < word_rules.length; i++) { | |
if (w.length > 4 && w.match(word_rules[i].reg)) { | |
return parts_of_speech[word_rules[i].pos] | |
} | |
} | |
} | |
var fourth_pass = function (token, i, sentence) { | |
var last = sentence.tokens[i - 1] | |
var next = sentence.tokens[i + 1] | |
var strong_determiners = { | |
"the": 1, | |
"a": 1, | |
"an": 1 | |
} | |
//resolve ambiguous 'march','april','may' with dates | |
if ((token.normalised == "march" || token.normalised == "april" || token.normalised == "may") && ((next && next.pos.tag == "CD") || (last && last.pos.tag == "CD"))) { | |
token.pos = parts_of_speech['CD'] | |
token.pos_reason = "may_is_date" | |
} | |
//if it's before a modal verb, it's a noun -> lkjsdf would | |
if (next && token.pos.parent !== "noun" && token.pos.parent !== "glue" && next.pos.tag === "MD") { | |
token.pos = parts_of_speech['NN'] | |
token.pos_reason = "before_modal" | |
} | |
//if it's after the word 'will' its probably a verb/adverb | |
if (last && last.normalised == "will" && !last.punctuated && token.pos.parent == "noun" && token.pos.tag !== "PRP" && token.pos.tag !== "PP") { | |
token.pos = parts_of_speech['VB'] | |
token.pos_reason = "after_will" | |
} | |
//if it's after the word 'i' its probably a verb/adverb | |
if (last && last.normalised == "i" && !last.punctuated && token.pos.parent == "noun") { | |
token.pos = parts_of_speech['VB'] | |
token.pos_reason = "after_i" | |
} | |
//if it's after an adverb, it's not a noun -> quickly acked | |
//support form 'atleast he is..' | |
if (last && token.pos.parent === "noun" && token.pos.tag !== "PRP" && token.pos.tag !== "PP" && last.pos.tag === "RB" && !last.start) { | |
token.pos = parts_of_speech['VB'] | |
token.pos_reason = "after_adverb" | |
} | |
//no consecutive, unpunctuated adjectives -> real good | |
if (next && token.pos.parent === "adjective" && next.pos.parent === "adjective" && !token.punctuated) { | |
token.pos = parts_of_speech['RB'] | |
token.pos_reason = "consecutive_adjectives" | |
} | |
//if it's after a determiner, it's not a verb -> the walk | |
if (last && token.pos.parent === "verb" && strong_determiners[last.pos.normalised] && token.pos.tag != "CP") { | |
token.pos = parts_of_speech['NN'] | |
token.pos_reason = "determiner-verb" | |
} | |
//copulas are followed by a determiner ("are a .."), or an adjective ("are good") | |
if (last && last.pos.tag === "CP" && token.pos.tag !== "DT" && token.pos.tag !== "RB" && token.pos.tag !== "PRP" && token.pos.parent !== "adjective" && token.pos.parent !== "value") { | |
token.pos = parts_of_speech['JJ'] | |
token.pos_reason = "copula-adjective" | |
} | |
//copula, adverb, verb -> copula adverb adjective -> is very lkjsdf | |
if (last && next && last.pos.tag === "CP" && token.pos.tag === "RB" && next.pos.parent === "verb") { | |
sentence.tokens[i + 1].pos = parts_of_speech['JJ'] | |
sentence.tokens[i + 1].pos_reason = "copula-adverb-adjective" | |
} | |
// the city [verb] him. | |
if (next && next.pos.tag == "PRP" && token.pos.tag !== "PP" && token.pos.parent == "noun" && !token.punctuated) { | |
token.pos = parts_of_speech['VB'] | |
token.pos_reason = "before_[him|her|it]" | |
} | |
//the misled worker -> misled is an adjective, not vb | |
if (last && next && last.pos.tag === "DT" && next.pos.parent === "noun" && token.pos.parent === "verb") { | |
token.pos = parts_of_speech['JJ'] | |
token.pos_reason = "determiner-adjective-noun" | |
} | |
//where's he gone -> gone=VB, not JJ | |
if (last && last.pos.tag === "PRP" && token.pos.tag === "JJ") { | |
token.pos = parts_of_speech['VB'] | |
token.pos_reason = "adjective-after-pronoun" | |
} | |
return token | |
} | |
//add a 'quiet' token for contractions so we can represent their grammar | |
var handle_contractions = function (arr) { | |
var contractions = { | |
"i'd": ["i", "would"], | |
"she'd": ["she", "would"], | |
"he'd": ["he", "would"], | |
"they'd": ["they", "would"], | |
"we'd": ["we", "would"], | |
"i'll": ["i", "will"], | |
"she'll": ["she", "will"], | |
"he'll": ["he", "will"], | |
"they'll": ["they", "will"], | |
"we'll": ["we", "will"], | |
"i've": ["i", "have"], | |
"they've": ["they", "have"], | |
"we've": ["we", "have"], | |
"should've": ["should", "have"], | |
"would've": ["would", "have"], | |
"could've": ["could", "have"], | |
"must've": ["must", "have"], | |
"i'm": ["i", "am"], | |
"we're": ["we", "are"], | |
"they're": ["they", "are"], | |
"cannot": ["can", "not"] | |
} | |
var before, after, fix; | |
for (var i = 0; i < arr.length; i++) { | |
if (contractions.hasOwnProperty(arr[i].normalised)) { | |
before = arr.slice(0, i) | |
after = arr.slice(i + 1, arr.length) | |
fix = [{ | |
text: arr[i].text, | |
normalised: contractions[arr[i].normalised][0], | |
start: arr[i].start | |
}, { | |
text: "", | |
normalised: contractions[arr[i].normalised][1], | |
start: undefined | |
}] | |
arr = before.concat(fix) | |
arr = arr.concat(after) | |
return handle_contractions(arr) //recursive | |
} | |
} | |
return arr | |
} | |
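// eg. (added for illustration) a token for "i'll" becomes two tokens:
//   [{ text: "i'll", normalised: "i" }, { text: "", normalised: "will" }]
// the second, 'quiet' token has empty text so the original string is preserved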
//these contractions require (some) grammatical knowledge to disambiguate properly (e.g. "he's" => "he is" or "he has")
var handle_ambiguous_contractions = function (arr) { | |
var ambiguous_contractions = { | |
"he's": "he", | |
"she's": "she", | |
"it's": "it", | |
"who's": "who", | |
"what's": "what", | |
"where's": "where", | |
"when's": "when", | |
"why's": "why", | |
"how's": "how" | |
} | |
var before, after, fix; | |
for (var i = 0; i < arr.length; i++) { | |
if (ambiguous_contractions.hasOwnProperty(arr[i].normalised)) { | |
before = arr.slice(0, i) | |
after = arr.slice(i + 1, arr.length) | |
//choose which verb this contraction should have.. | |
var chosen = "is" | |
//look for the next verb, and if it's past-tense (he's walked -> he has walked) | |
for (var o = i + 1; o < arr.length; o++) { | |
if (arr[o] && arr[o].pos && arr[o].pos.tag == "VBD") { //past tense | |
chosen = "has" | |
break | |
} | |
} | |
fix = [{ | |
text: arr[i].text, | |
normalised: ambiguous_contractions[arr[i].normalised], //the "he" part | |
start: arr[i].start, | |
pos: parts_of_speech[lexicon[ambiguous_contractions[arr[i].normalised]]], | |
pos_reason: "ambiguous_contraction" | |
}, { | |
text: "", | |
normalised: chosen, //is,was,or have | |
start: undefined, | |
pos: parts_of_speech[lexicon[chosen]], | |
pos_reason: "silent_contraction" | |
}] | |
arr = before.concat(fix) | |
arr = arr.concat(after) | |
return handle_ambiguous_contractions(arr) //recursive | |
} | |
} | |
return arr | |
} | |
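// eg. (added for illustration) "he's walked" finds a past-tense VBD ahead, so
// the silent token becomes "has" ("he has walked"); "he's cool" finds none,
// so it stays "is" ("he is cool")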
//////////////// | |
///party-time// | |
var main = function (text, options) { | |
options = options || {} | |
if (!text || !text.match(/[a-z0-9]/i)) { | |
return new Section([]) | |
} | |
var sentences = tokenize(text); | |
sentences.forEach(function (sentence) { | |
//first, let's handle the capitalisation-of-the-first-word issue | |
var first = sentence.tokens[0] | |
if (first) { | |
//if second word is a noun-capital, give more sympathy to this capital | |
if (sentence.tokens[1] && sentence.tokens[1].noun_capital && !lexicon_pass(first.normalised)) { | |
sentence.tokens[0].noun_capital = true; | |
} | |
} | |
//smart handling of contractions | |
sentence.tokens = handle_contractions(sentence.tokens) | |
//first pass, word-level clues | |
sentence.tokens = sentence.tokens.map(function (token) { | |
//it has a capital and isn't a month, etc. | |
if (token.noun_capital && !values[token.normalised]) { | |
token.pos = parts_of_speech['NN'] | |
token.pos_reason = "noun_capitalised" | |
return token | |
} | |
//known words list | |
var lex = lexicon_pass(token.normalised) | |
if (lex) { | |
token.pos = lex; | |
token.pos_reason = "lexicon" | |
//if it's an abbreviation, forgive the punctuation (eg. 'dr.') | |
if (token.pos.tag === "NNAB") { | |
token.punctuated = false | |
} | |
return token | |
} | |
//handle punctuation like ' -- ' | |
if (!token.normalised) { | |
token.pos = parts_of_speech['UH'] | |
token.pos_reason = "wordless_string" | |
return token | |
} | |
// suffix pos signals from wordnet | |
var len = token.normalised.length | |
if (len > 4) { | |
var suffix = token.normalised.substr(len - 4) //last 4 characters
if (wordnet_suffixes.hasOwnProperty(suffix)) { | |
token.pos = parts_of_speech[wordnet_suffixes[suffix]] | |
token.pos_reason = "wordnet suffix" | |
return token | |
} | |
} | |
// suffix regexes for words | |
var r = rules_pass(token.normalised); | |
if (r) { | |
token.pos = r; | |
token.pos_reason = "regex suffix" | |
return token | |
} | |
//see if it's a number | |
if (!isNaN(parseFloat(token.normalised))) {
token.pos = parts_of_speech['CD'] | |
token.pos_reason = "parsefloat" | |
return token | |
} | |
return token | |
}) | |
//second pass, wrangle results a bit | |
sentence.tokens = sentence.tokens.map(function (token, i) { | |
//set ambiguous 'ed' endings as either verb/adjective | |
if (token.pos_reason !== "lexicon" && token.normalised.match(/.ed$/)) { | |
token.pos = parts_of_speech['VB'] | |
token.pos_reason = "ed" | |
} | |
return token | |
}) | |
//split-out more difficult contractions, like "he's" -> "he is" / "he has"
// (now that we have enough pos data to do this) | |
sentence.tokens = handle_ambiguous_contractions(sentence.tokens) | |
//third pass, seek verb or noun phrases after their signals | |
var need = null | |
var reason = '' | |
sentence.tokens = sentence.tokens.map(function (token, i) { | |
var next = sentence.tokens[i + 1] | |
if (token.pos) { | |
//suggest noun after some determiners (a|the), possessive pronouns (her|my|its)
if (token.normalised == "the" || token.normalised == "a" || token.normalised == "an" || token.pos.tag === "PP") { | |
need = 'noun' | |
reason = token.pos.name | |
return token //proceed | |
} | |
//suggest verb after personal pronouns (he|she|they), modal verbs (would|could|should) | |
if (token.pos.tag === "PRP" && token.pos.tag !== "PP" || token.pos.tag === "MD") { | |
need = 'verb' | |
reason = token.pos.name | |
return token //proceed | |
} | |
} | |
//satisfy need on a conflict, and fix a likely error | |
if (token.pos) { | |
if (need == "verb" && token.pos.parent == "noun" && (!next || (next.pos && next.pos.parent != "noun"))) { | |
if (!next || !next.pos || next.pos.parent != need) { //ensure need not satisfied on the next one | |
token.pos = parts_of_speech['VB'] | |
token.pos_reason = "signal from " + reason | |
need = null | |
} | |
} | |
if (need == "noun" && token.pos.parent == "verb" && (!next || (next.pos && next.pos.parent != "verb"))) { | |
if (!next || !next.pos || next.pos.parent != need) { //ensure need not satisfied on the next one | |
token.pos = parts_of_speech["NN"] | |
token.pos_reason = "signal from " + reason | |
need = null | |
} | |
} | |
} | |
//satisfy need with an unknown pos | |
if (need && !token.pos) { | |
if (!next || !next.pos || next.pos.parent != need) { //ensure need not satisfied on the next one | |
token.pos = parts_of_speech[need] | |
token.pos_reason = "signal from " + reason | |
need = null | |
} | |
} | |
//set them back as satisfied.. | |
if (need === 'verb' && token.pos && token.pos.parent === 'verb') { | |
need = null | |
} | |
if (need === 'noun' && token.pos && token.pos.parent === 'noun') { | |
need = null | |
} | |
return token | |
}) | |
//next pass, identify missing clauses, fallback to noun
var has = {} | |
sentence.tokens.forEach(function (token) { | |
if (token.pos) { | |
has[token.pos.parent] = true | |
} | |
}) | |
sentence.tokens = sentence.tokens.map(function (token, i) { | |
if (!token.pos) { | |
//if there is no verb in the sentence, and there needs to be. | |
if (has['adjective'] && has['noun'] && !has['verb']) { | |
token.pos = parts_of_speech['VB'] | |
token.pos_reason = "need one verb" | |
has['verb'] = true | |
return token | |
} | |
//fallback to a noun | |
token.pos = parts_of_speech['NN'] | |
token.pos_reason = "noun fallback" | |
} | |
return token | |
}) | |
//fourth pass, error correction | |
sentence.tokens = sentence.tokens.map(function (token, i) { | |
return fourth_pass(token, i, sentence) | |
}) | |
//run the fourth-pass again! | |
sentence.tokens = sentence.tokens.map(function (token, i) { | |
return fourth_pass(token, i, sentence) | |
}) | |
}) | |
//combine neighbours | |
if (!options.dont_combine) { | |
sentences = sentences.map(function (s) { | |
return combine_tags(s) | |
}) | |
sentences = sentences.map(function (s) { | |
return combine_phrasal_verbs(s) | |
}) | |
} | |
//make them Sentence objects | |
sentences = sentences.map(function (s) { | |
var sentence = new Sentence(s.tokens) | |
sentence.type = s.type | |
return sentence | |
}) | |
//add analysis on each token | |
sentences = sentences.map(function (s) { | |
s.tokens = s.tokens.map(function (token, i) { | |
token.analysis = parents[token.pos.parent](token.normalised, s, i) | |
return token | |
}) | |
return s | |
}) | |
//add next-last references | |
sentences = sentences.map(function (sentence, i) { | |
sentence.last = sentences[i - 1] | |
sentence.next = sentences[i + 1] | |
return sentence | |
}) | |
//return a Section object, with its methods | |
return new Section(sentences) | |
} | |
module.exports = main; | |
// console.log( pos("George Clooney walked, quietly into a bank. It was cold.") )
// console.log( pos("it is a three-hundred and one").tags() )
// console.log( pos("funny funny funny funny").sentences[0].tokens )
// pos("In March 2009, while Secretary of State for Energy and Climate Change, Miliband attended the UK premiere of climate-change film The Age of Stupid, where he was ambushed").sentences[0].tokens.map(function(t){console.log(t.pos.tag + " "+t.text)})
// pos("the Energy and Climate Change, Miliband").sentences[0].tokens.map(function(t){console.log(t.pos.tag + " "+t.text)})
// console.log(pos("Energy and Climate Change, Miliband").sentences[0].tokens)
// console.log(pos("http://google.com").sentences[0].tokens)
// console.log(pos("may live").tags())
// console.log(pos("may 7th live").tags())
// console.log(pos("She and Marc Emery married on July 23, 2006.").tags())
// console.log(pos("Toronto is fun. Spencer and heather quickly walked. it was cool").sentences[0].referables())
// console.log(pos("a hundred").sentences[0].tokens)
// console.log(pos("Tony Reagan skates").sentences[0].tokens)
// console.log(pos("She and Marc Emery married on July 23, 2006").sentences[0].tokens)
// console.log(pos("Tony Hawk walked quickly to the store.").sentences[0].tokens)
// console.log(pos("jahn j. jacobheimer").sentences[0].tokens[0].analysis.is_person())
// pos("Dr. Conrad Murray received a guilty verdict").sentences[0].tokens.map(function(t){console.log(t.pos.tag + " "+t.text)})
// pos("the Phantom of the Opera").sentences[0].tokens.map(function(t){console.log(t.pos.tag + " "+t.text)})
// pos("Tony Hawk is nice").sentences[0].tokens.map(function(t){console.log(t.pos.tag + " "+t.text)})
// pos("tony hawk is nice").sentences[0].tokens.map(function(t){console.log(t.pos.tag + " "+t.text)})
// console.log(pos("look after a kid").sentences[0].tags())
// pos("Sather tried to stop the deal, but when he found out that Gretzky").sentences[0].tokens.map(function(t){console.log(t.pos.tag + " "+t.text+" "+t.pos_reason)})
// pos("Gretzky had tried skating").sentences[0].tokens.map(function(t){console.log(t.pos.tag + " "+t.text+" "+t.pos_reason)})
// pos("Sally and Tom fight a lot. She thinks he is her friend.").sentences[0].tokens.map(function(t){console.log(t.pos.tag + " "+t.text+" "+t.pos_reason)})
// console.log(pos("i think Tony Danza is cool. He rocks and he is golden.").sentences[0].tokens[2].analysis.referenced_by())
// console.log(pos("i think Tony Danza is cool and he is golden.").sentences[0].tokens[6].analysis.reference_to())
// console.log(pos("Tina grabbed her shoes. She is lovely.").sentences[0].tokens[0].analysis.referenced_by())
// console.log(pos("Sally and Tom fight a lot. She thinks he is her friend.").sentences[0].tokens[0].analysis.referenced_by())
// console.log(pos("it's gotten the best features").sentences[0].tokens[1].normalised=="has") //bug
// console.log(pos("he's fun").sentences[0].tokens[1].normalised=="is")
},{"./data/lexicon":2,"./data/lexicon/values":12,"./data/parts_of_speech":14,"./data/unambiguous_suffixes":15,"./data/word_rules":16,"./methods/tokenization/tokenize":22,"./parents/parents":35,"./section":46,"./sentence":47}],46:[function(require,module,exports){
//a section is a block of text, with an arbitrary number of sentences
//these methods are just wrappers around the ones in sentence.js
var Section = function(sentences) {
  var the = this
  the.sentences = sentences || [];
  the.text = function() {
    return the.sentences.map(function(s) {
      return s.text()
    }).join(' ')
  }
  the.tense = function() {
    return the.sentences.map(function(s) {
      return s.tense()
    })
  }
  //pluck out wanted data from sentences
  the.nouns = function() {
    return the.sentences.map(function(s) {
      return s.nouns()
    }).reduce(function(arr, a) {
      return arr.concat(a)
    }, [])
  }
  the.entities = function(options) {
    return the.sentences.map(function(s) {
      return s.entities(options)
    }).reduce(function(arr, a) {
      return arr.concat(a)
    }, [])
  }
  the.people = function() {
    return the.sentences.map(function(s) {
      return s.people()
    }).reduce(function(arr, a) {
      return arr.concat(a)
    }, [])
  }
  the.adjectives = function() {
    return the.sentences.map(function(s) {
      return s.adjectives()
    }).reduce(function(arr, a) {
      return arr.concat(a)
    }, [])
  }
  the.verbs = function() {
    return the.sentences.map(function(s) {
      return s.verbs()
    }).reduce(function(arr, a) {
      return arr.concat(a)
    }, [])
  }
  the.adverbs = function() {
    return the.sentences.map(function(s) {
      return s.adverbs()
    }).reduce(function(arr, a) {
      return arr.concat(a)
    }, [])
  }
  the.values = function() {
    return the.sentences.map(function(s) {
      return s.values()
    }).reduce(function(arr, a) {
      return arr.concat(a)
    }, [])
  }
  the.tags = function() {
    return the.sentences.map(function(s) {
      return s.tags()
    })
  }
  //transform the sentences
  the.negate = function() {
    the.sentences = the.sentences.map(function(s) {
      return s.negate()
    })
    return the
  }
  the.to_past = function() {
    the.sentences = the.sentences.map(function(s) {
      return s.to_past()
    })
    return the
  }
  the.to_present = function() {
    the.sentences = the.sentences.map(function(s) {
      return s.to_present()
    })
    return the
  }
  the.to_future = function() {
    the.sentences = the.sentences.map(function(s) {
      return s.to_future()
    })
    return the
  }
}
module.exports = Section;
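// a rough usage sketch (hypothetical inputs; pos() is the parser exported above, which
// returns a Section) - every Section method just fans out over its sentences:
// var section = pos("Toronto is fun. It was cold.")
// section.nouns()            // noun tokens from every sentence, concatenated
// section.to_past().text()   // transforms each sentence in place, then re-joins the text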
},{}],47:[function(require,module,exports){
// methods that hang on a parsed set of words
// accepts parsed tokens
var Sentence = function(tokens) {
  var the = this
  the.tokens = tokens || [];
  var capitalise = function(s) {
    return s.charAt(0).toUpperCase() + s.slice(1);
  }
  the.tense = function() {
    var verbs = the.tokens.filter(function(token) {
      return token.pos.parent === "verb"
    })
    return verbs.map(function(v) {
      return v.analysis.tense
    })
  }
  the.to_past = function() {
    the.tokens = the.tokens.map(function(token) {
      if (token.pos.parent === "verb") {
        token.text = token.analysis.to_past()
        token.normalised = token.text
      }
      return token
    })
    return the
  }
  the.to_present = function() {
    the.tokens = the.tokens.map(function(token) {
      if (token.pos.parent === "verb") {
        token.text = token.analysis.to_present()
        token.normalised = token.text
      }
      return token
    })
    return the
  }
  the.to_future = function() {
    the.tokens = the.tokens.map(function(token) {
      if (token.pos.parent === "verb") {
        token.text = token.analysis.to_future()
        token.normalised = token.text
      }
      return token
    })
    return the
  }
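  // e.g. (hypothetical input): each transform above rewrites only the verb tokens -
  // pos("he walked home").sentences[0].to_future().text()  // should come out like "he will walk home"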
  //insert a token at index i (an index of 0 is allowed)
  the.insert = function(token, i) {
    if (token && i !== undefined) {
      the.tokens.splice(i, 0, token);
    }
  }
  //negate makes the sentence mean the opposite thing.
  the.negate = function() {
    //these are cheap ways to negate the meaning
    //('none' is skipped, because its negation is ambiguous between 'all' and 'some')
    var logic_negate = {
      //some logical ones work
      "everyone": "no one",
      "everybody": "nobody",
      "someone": "no one",
      "somebody": "nobody",
      // everything:"nothing",
      "always": "never",
      //copulas
      "is": "isn't",
      "are": "aren't",
      "was": "wasn't",
      "will": "won't",
      //modals
      "didn't": "did",
      "wouldn't": "would",
      "couldn't": "could",
      "shouldn't": "should",
      "can't": "can",
      "won't": "will",
      "mustn't": "must",
      "shan't": "shall",
      "shant": "shall",
      "did": "didn't",
      "would": "wouldn't",
      "could": "couldn't",
      "should": "shouldn't",
      "can": "can't",
      "must": "mustn't"
    }
    //loop through each term..
    for (var i = 0; i < the.tokens.length; i++) {
      var tok = the.tokens[i]
      //turn 'is' into 'isn't', etc. - but make sure 'is' isn't already followed by a 'not'
      if (logic_negate[tok.normalised] && (!the.tokens[i + 1] || the.tokens[i + 1].normalised != "not")) {
        tok.text = logic_negate[tok.normalised]
        tok.normalised = logic_negate[tok.normalised]
        if (tok.capitalised) {
          tok.text = capitalise(tok.text)
        }
        return the
      }
      // find the first verb..
      if (tok.pos.parent == "verb") {
        // if verb is already negative, make it not negative
        if (tok.analysis.negative()) {
          if (the.tokens[i + 1] && the.tokens[i + 1].normalised == "not") {
            the.tokens.splice(i + 1, 1)
          }
          return the
        }
        //turn future-tense 'will go' into "won't go"
        if (tok.normalised.match(/^will /i)) {
          tok.text = tok.text.replace(/^will /i, "won't ")
          tok.normalised = tok.text
          if (tok.capitalised) {
            tok.text = capitalise(tok.text)
          }
          return the
        }
        // - INFINITIVE -
        // 'i walk' -> "i don't walk"
        // (the original compared form against "future", which is always true once
        //  form == "infinitive"; the tense check is what the comment implies)
        if (tok.analysis.form == "infinitive" && tok.analysis.tense != "future") {
          tok.text = "don't " + (tok.analysis.conjugate().infinitive || tok.text)
          tok.normalised = tok.text.toLowerCase()
          return the
        }
        // - GERUND -
        // if verb is gerund, 'walking' -> "not walking"
        if (tok.analysis.form == "gerund") {
          tok.text = "not " + tok.text
          tok.normalised = tok.text.toLowerCase()
          return the
        }
// if verb is past-tense, 'he walked' -> "he did't walk" | |
if (tok.analysis.tense == "past") { | |
tok.text = "didn't " + (tok.analysis.conjugate().infinitive || tok.text) | |
tok.normalised = tok.text.toLowerCase() | |
return the | |
} | |
        // - PRESENT -
        // if verb is present-tense, 'he walks' -> "he doesn't walk"
        if (tok.analysis.tense == "present") {
          tok.text = "doesn't " + (tok.analysis.conjugate().infinitive || tok.text)
          tok.normalised = tok.text.toLowerCase()
          return the
        }
        // - FUTURE -
        // if verb is future-tense, 'will go' -> won't go. easy-peasy
        if (tok.analysis.tense == "future") {
          if (tok.normalised == "will") {
            tok.normalised = "won't"
            tok.text = "won't"
          } else {
            tok.text = tok.text.replace(/^will /i, "won't ")
            tok.normalised = tok.normalised.replace(/^will /i, "won't ")
          }
          if (tok.capitalised) {
            tok.text = capitalise(tok.text);
          }
          return the
        }
        return the
      }
    }
    return the
  }
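  // e.g. (hypothetical inputs, tracing the rules above):
  // pos("everyone walked").sentences[0].negate().text()  // -> "no one walked"
  // pos("he walked").sentences[0].negate().text()        // should come out as "he didn't walk"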
  the.entities = function(options) {
    var spots = []
    options = options || {}
    the.tokens.forEach(function(token) {
      if (token.pos.parent === "noun" && token.analysis.is_entity()) {
        spots.push(token)
      }
    })
    if (options.ignore_gerund) {
      spots = spots.filter(function(t) {
        return t.pos.tag !== "VBG"
      })
    }
    return spots
  }
  //noun-entities that look like person names..
  the.people = function() {
    return the.entities({}).filter(function(o) {
      return o.analysis.is_person()
    })
  }
  the.text = function() {
    return the.tokens.map(function(s) {
      return s.text
    }).join(' ')
  }
  //sugar 'grab' methods
  the.verbs = function() {
    return the.tokens.filter(function(t) {
      return t.pos.parent == "verb"
    })
  }
  the.adverbs = function() {
    return the.tokens.filter(function(t) {
      return t.pos.parent == "adverb"
    })
  }
  the.nouns = function() {
    return the.tokens.filter(function(t) {
      return t.pos.parent == "noun"
    })
  }
  the.adjectives = function() {
    return the.tokens.filter(function(t) {
      return t.pos.parent == "adjective"
    })
  }
  the.values = function() {
    return the.tokens.filter(function(t) {
      return t.pos.parent == "value"
    })
  }
  the.tags = function() {
    return the.tokens.map(function(t) {
      return t.pos.tag
    })
  }
  //find the 'it', 'he', 'she', and 'they' of this sentence
  //these are the words that get 'exported' to be used in other sentences
  the.referables = function() {
    var pronouns = {
      he: undefined,
      she: undefined,
      they: undefined,
      it: undefined
    }
    the.tokens.forEach(function(t) {
      if (t.pos.parent == "noun" && t.pos.tag != "PRP") {
        pronouns[t.analysis.pronoun()] = t
      }
    })
    return pronouns
  }
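  // e.g. (hypothetical input): pos("Tina is lovely.").sentences[0].referables()
  //  should file the 'Tina' token under 'she', leaving the other slots undefined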
  return the
}
module.exports = Sentence;
},{}],48:[function(require,module,exports){
//just a wrapper for text -> entities
//most of this logic is in ./parents/noun
var pos = require("./pos");
var main = function (text, options) {
  options = options || {}
  //collect 'entities' from all nouns
  var sentences = pos(text, options).sentences
  var arr = sentences.reduce(function (arr, s) {
    return arr.concat(s.entities(options))
  }, [])
  //for people, ignore later instances of 'george' and 'bush' once 'george bush' has been seen
  var ignore = {}
  arr = arr.filter(function (o) {
    //add this person's name-parts to the blacklist
    if (o.analysis.is_person()) {
      o.normalised.split(' ').forEach(function (s) {
        ignore[s] = true
      })
    }
    if (ignore[o.normalised]) {
      return false
    }
    return true
  })
  return arr
}
module.exports = main;
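// e.g. (hypothetical trace): in the first example below, 'Tony Hawk' is spotted first,
// so the later bare 'Tony' lands on the blacklist and is dropped from the results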
// console.log(spot("Tony Hawk is cool. Tony eats all day.").map(function(s){return s}))
// console.log(spot("Tony eats all day. Tony Hawk is cool.").map(function(s){return s}))
// console.log(spot("My Hawk is cool").map(function(s){return s.normalised}))
},{"./pos":45}]},{},[1]);