References for MCCMB 2021
1. E.Loper, S.Bird (2002) NLTK: The natural language toolkit. arXiv preprint, arXiv:cs/0205028.
2. K.W.Church (2017) Word2Vec. Natural Language Engineering, 23(1):155-162.
3. J.H.Lau, T.Baldwin (2016) An empirical evaluation of doc2vec with practical insights into document embedding generation. arXiv preprint, arXiv:1607.05368.
4. C.R.Huang, P.Šimon, S.K.Hsieh, L.Prévot (2007) Rethinking Chinese word segmentation: tokenization, character classification, or wordbreak identification. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pp. 69-72.
5. E.Asgari, M.R.Mofrad (2015) Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS ONE, 10(11):e0141287.
6. H.Iuchi et al. (2021) Representation learning applications in biological sequence analysis. bioRxiv.
7. P.Ng (2017) dna2vec: Consistent vector representations of variable-length k-mers. arXiv preprint, arXiv:1701.06279.
8. Y.Shibata, T.Kida, S.Fukamachi, M.Takeda, A.Shinohara, T.Shinohara, S.Arikawa (1999) Byte Pair encoding: A text compression scheme that accelerates pattern matching. In Technical Report DOI-TR-161, Department of Informatics, Kyushu University.
9. P.Bojanowski, E.Grave, A.Joulin, T.Mikolov (2017) Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135-146.
10. A.C.Gyllensten, A.Ekgren, M.Sahlgren (2019) R-grams: Unsupervised Learning of Semantic Units in Natural Language. In Proceedings of the 13th International Conference on Computational Semantics-Student Papers, pp. 52-62.
11. S.Tempel, B.Zerath, F.Zehraoui, F.Tahi (2015) miRBoost: boosting support vector machines for microRNA precursor classification. RNA, 21(5):775-785.
12. R.Řehůřek, P.Sojka (2011) Gensim: statistical semantics in Python. Retrieved from gensim.org.
13. F.Pedregosa et al. (2011) Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830.