@trinker
Last active March 24, 2020 18:18
Norah's Phoneme Question
## Norah's Question: Are there any words that start with chr (phonetically /k/ /r/) that don't have a short i sound following it?
library(openssl)
library(textshape)
library(tidyverse)
cmudict <- readLines('https://raw.githubusercontent.com/michelleful/ToBoldlyStress/master/stressed_spelling.txt')
cmudict7b <- readLines('http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b') %>% tail(-121) %>% head(-4)
stress_key <- tribble(
    ~stress_code, ~stress,
    0, 'No stress',
    1, 'Primary stress',
    2, 'Secondary stress'
)
## One row per phoneme for every word spelled with an initial 'ch'
phonemes <- tibble(
    ## Drop alternate pronunciations such as 'WORD(1)'
    raw = grep('\\(\\d+\\)', cmudict7b, value = TRUE, invert = TRUE)
) %>%
    filter(grepl('^CH', raw)) %>%
    mutate(
        id = openssl::md5(as.character(seq_len(n()))),
        word = tolower(gsub(' .+$', '', raw)),
        ## Strip the leading word so only the phonemes remain, then split them
        phoneme = strsplit(trimws(sub('^\\S+\\s+', '', raw)), '\\s+')
    ) %>%
    select(-raw) %>%
    unnest(cols = c(phoneme)) %>%
    mutate(
        ## Vowels carry a stress digit (e.g. 'OW1'); consonants become NA
        stress_code = as.integer(gsub('\\D', '', phoneme)),
        phoneme = gsub('\\d', '', phoneme)
    ) %>%
    left_join(stress_key, by = 'stress_code')

## Words spelled 'ch...' whose first phoneme is /k/
ch_k_sample <- phonemes %>%
    group_by(word) %>%
    slice(1) %>%
    filter(phoneme == 'K')
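
## /k/ words spelled 'chri...' or 'chry...'
ch_k_sample %>%
    filter(grepl('^chr[iy]', word)) %>%
    select(word) %>%
    print(n = Inf)

## Every other /k/ word; this also lists words that do not start with 'chr'
ch_k_sample %>%
    filter(!grepl('^chr[iy]', word)) %>%
    select(word) %>%
    print(n = Inf)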
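
## A hedged sketch, not part of the original analysis: the two listings above
## filter on spelling ('chri'/'chry'), while the question is about sound. One
## way to check the sound directly is to look at the phoneme that follows
## /k/ /r/. Assumptions: `sample_lines` is a tiny hand-typed sample in
## CMUdict format (word, two spaces, ARPAbet phonemes), and `kr_not_short_i`
## is an illustrative name. Swap in `cmudict7b` to run it on the full data.
sample_lines <- c(
    'CHRISTMAS  K R IH1 S M AH0 S',
    'CHROME  K R OW1 M',
    'CHRONIC  K R AA1 N IH0 K'
)
tokens <- strsplit(sample_lines, '\\s+')
words <- tolower(vapply(tokens, `[`, character(1), 1))
## Drop the word itself, then strip stress digits so 'IH1' becomes 'IH'
sounds <- lapply(tokens, function(x) gsub('\\d', '', x[-1]))
## TRUE where the first two phonemes are K R
starts_kr <- vapply(sounds, function(x) identical(x[1:2], c('K', 'R')), logical(1))
## Third phoneme of each word; 'IH' is the ARPAbet symbol for short i
third <- vapply(sounds, `[`, character(1), 3)
kr_not_short_i <- words[starts_kr & !(third %in% 'IH')]
print(kr_not_short_i)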