@RyanMcCarl
RyanMcCarl / How difficult is it to learn Chinese.md
Last active February 18, 2019 00:19
Ryan McCarl's answer to "How difficult is it to learn to write Chinese?"

How difficult is it to learn to write Chinese?

Learning to read Chinese fluently is a lifelong project for people who do not grow up in a Chinese-speaking environment. Learning to write Chinese fluently is even tougher.

You can only write a language if you can read it. So let’s start with reading. The primary obstacle to reading Chinese is learning to recognize, identify the meaning of, and pronounce each word you come across, including each character within the word. There are about 5,000 such characters [hanzi] used in everyday Chinese writing. Memorizing these is an enormous task. (Japanese, which uses around 2,000 Chinese characters [kanji] in addition to 71 phonetic kana characters, presents a similar obstacle.)

To get to an advanced level—but not fluency—you need to learn around 5,000 words and the 2,663 characters within those words. Those last two numbers come from the highest level of the HSK test, a standard for students learning Chinese as a second language.

**Memorizing characters is not

@RyanMcCarl
RyanMcCarl / article-summarizer.py
Created July 4, 2018 23:29 — forked from jackschultz/article-summarizer.py
Article summarizer written in python.
import nltk
from nltk.stem.wordnet import WordNetLemmatizer
import string


class SentenceRank(object):
    def __init__(self, body, title):
        self.body = body
        self.sentence_list = nltk.tokenize.sent_tokenize(self.body)[:]
        self.title = title
@RyanMcCarl
RyanMcCarl / EXAMPLE.js
Created June 13, 2018 06:39 — forked from redaktor/EXAMPLE.js
nlp_compromise metrics proposal as standalone example
// TODO - move logic_negate and abbreviations into the lexicon as a resource file (i18n, language-aware, separate data and logic)
// the best way might be a dictionary with flags, from which we can easily derive the lexicon with Object.keys and map, like
/* dictionary: {
  "CP": [
    {v:'is', weak: 1},
    ...
  ],
  ...
};
*/
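
The TODO above proposes a dictionary with flags from which the lexicon is derived via `Object.keys` and `map`. A minimal sketch of that idea follows, under stated assumptions: only the `"CP"` tag and the `{v:'is', weak: 1}` entry come from the preview; the `NEG` tag, the `negate` flag, the other entries, and the helper names `deriveLexicon` and `wordsWithFlag` are hypothetical and are not part of nlp_compromise.

```javascript
// Hypothetical flagged dictionary. Only "CP" and {v:'is', weak: 1} appear in
// the original comment; every other tag, flag, and word is illustrative.
const dictionary = {
  CP: [
    { v: 'is', weak: 1 },
    { v: 'are', weak: 1 },
  ],
  NEG: [
    { v: 'not', negate: 1 },
    { v: 'never', negate: 1 },
  ],
};

// Derive a flat word -> tag lexicon using Object.keys and map.
function deriveLexicon(dict) {
  const pairLists = Object.keys(dict).map((tag) =>
    dict[tag].map((entry) => [entry.v, tag])
  );
  // Flatten the per-tag pair lists and fold them into one plain object.
  return [].concat(...pairLists).reduce((lexicon, [word, tag]) => {
    lexicon[word] = tag;
    return lexicon;
  }, {});
}

// Collect every word that carries a given flag (for example, all negators).
function wordsWithFlag(dict, flag) {
  const wordLists = Object.keys(dict).map((tag) =>
    dict[tag].filter((entry) => entry[flag]).map((entry) => entry.v)
  );
  return [].concat(...wordLists);
}

const lexicon = deriveLexicon(dictionary);
const negators = wordsWithFlag(dictionary, 'negate');
console.log(lexicon, negators);
```

Keeping the flags on each entry means the data could live in a per-language resource file, while the derivation logic stays the same for every language.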
@RyanMcCarl
RyanMcCarl / bullshit_generator.py
Created June 13, 2018 06:36 — forked from SteveClement/bullshit_generator.py
Found this nice Bullshit Generator by Pierre Denis (https://mail.python.org/pipermail/python-list/2009-March/530858.html). Adapted it to Python 3.
'''
======================================================================
Bullshit Generator
by Pierre Denis, March 2009
======================================================================
'''
# --------------------------------------------------
# grammar engine
# --------------------------------------------------
@RyanMcCarl
RyanMcCarl / topokanji_avg.txt
Created December 28, 2017 09:43
topokanji_avg.txt
@RyanMcCarl
RyanMcCarl / all_kanji.txt
Last active December 28, 2017 03:53
all_kanji.txt
@RyanMcCarl
RyanMcCarl / kanji_topo.txt
Created December 28, 2017 03:32
kanji_topo.txt
@RyanMcCarl
RyanMcCarl / kanjidata_2017.csv
Created December 8, 2017 01:54
kanjidata_2017.csv
char pedAvg freqAvg mccarl heisig conning gsf kic news aozora twitter wikipedia novels
1 8 1 1 2 1 37 116 2 8 12 2
2 5 2 12 1 24 1 3971 5 2 2 23
4 2 3 1023 14 2 4 2 1 4 15 1
26 13 4 39 34 26 21 9 15 17 11 14
3 74 5 2 2 3 124 240 18 157 197 22
30 13 6 224 30 38 2 4 27 16 7 33
32 10 7 112 32 11 41 23 7 5 9 12
20 42 8 15 20 37 168 43 53 42 17 23
9 64 9 3 4 15 14 355 30 148 122 46