@vmujadia
Last active June 13, 2020 09:20
IASNLP-2015 Project list

LTRC, IIIT-Hyderabad

Treebanking

#####1. Shallow Parsers for different languages#####

  • Description - POS tagging and chunking for Gujarati, Odia, Hindi, Bengali, Marathi, and Telugu (an individual project for each language).
  • We will implement several supervised algorithms, including CRF, HMM, MaxEnt, and SVM, then some semi-supervised classification methods, and finally an unsupervised one. We will try to implement a morph analyzer if time permits. Students need to annotate data, understand the challenges, and compare the results given by multiple algorithms.
  • Mentor: Pruthwik M
  • Students for each language - 1 / 2
  • Prerequisites - Data structures, knowledge of POS tagging, basics of Machine Learning
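
As a rough illustration of the HMM option listed above, the following is a minimal sketch of a count-based bigram HMM tagger decoded with the Viterbi algorithm. The toy English corpus, the tag names, and the add-one smoothing are assumptions made for illustration only; the project itself would use annotated data for the Indian languages listed.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus; real experiments would use annotated treebank data.
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]


def train_hmm(tagged_sentences):
    """Collect tag-transition counts, word-emission counts, and the vocabulary."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab


def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability (an assumed smoothing choice)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))


def viterbi(words, trans, emit, vocab):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    tags = sorted(emit)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unknown words
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        score = log_prob(trans["<s>"], tag, n_tags)
        score += log_prob(emit[tag], words[0], n_words)
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            score, prev = max(
                (best[i - 1][p][0] + log_prob(trans[p], tag, n_tags), p)
                for p in tags
            )
            score += log_prob(emit[tag], words[i], n_words)
            best[i][tag] = (score, prev)
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return list(reversed(path))
```

Calling `viterbi(["the", "cat", "barks"], *train_hmm(TOY_CORPUS))` tags a sentence that does not occur verbatim in the toy corpus. A CRF or MaxEnt tagger would replace the count-based scores with learned feature weights while keeping a similar decoding step.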

#####2. Sandhi Splitter for Dravidian Languages.

  • Description - Sandhi is a major problem in the computational processing of Dravidian languages as well as Sanskrit, where two or more words can join together to form a single string, with a change in character(s) at the point of joining. Sandhi splitting is the task of identifying and separating the individual words present in a string (if more than one word is present in the string). Currently, Sandhi splitters are available for Malayalam and Telugu, which are hybrid and statistical respectively. This project aims at creating statistical/hybrid Sandhi splitters for Kannada and Tamil, as well as improving the current Sandhi splitters for Malayalam and Telugu.
  • Faculty- Prof. Dipti Misra Sharma
  • Mentor - Devadath V V, Vigneshwaran M
  • Students - 2-3
  • Requirements - NLP, Python, and fluency in any Dravidian language (Telugu/Tamil/Malayalam/Kannada)
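
To make the task concrete, here is a minimal sketch of a rule-plus-lexicon splitter. It is not the statistical or hybrid approach the project targets: the two reverse rules are simplified Sanskrit vowel-sandhi rules in plain ASCII romanization (a + i → e, a + u → o), and the lexicon and function name are hypothetical. A splitter for a Dravidian language would need that language's own rules or a trained model.

```python
# Each rule is (surface form, end of left word, start of right word).
# Simplified Sanskrit vowel sandhi in ASCII romanization, for illustration only.
RULES = [
    ("e", "a", "i"),  # deva + indra -> devendra
    ("o", "a", "u"),  # surya + udaya -> suryodaya
]

# Hypothetical toy lexicon of known words.
LEXICON = {"deva", "indra", "surya", "udaya"}


def split_sandhi(token, lexicon=LEXICON, rules=RULES):
    """Return every (left, right) split of `token` whose parts are both in the lexicon."""
    candidates = []
    # Case 1: plain concatenation with no character change at the boundary.
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if left in lexicon and right in lexicon:
            candidates.append((left, right))
    # Case 2: undo a sandhi change at every position where the surface form occurs.
    for surface, left_end, right_start in rules:
        start = token.find(surface)
        while start != -1:
            left = token[:start] + left_end
            right = right_start + token[start + len(surface):]
            if left in lexicon and right in lexicon:
                candidates.append((left, right))
            start = token.find(surface, start + 1)
    return candidates
```

The lexicon check is what keeps the reverse rules from over-generating; a statistical splitter would instead score each candidate boundary and keep the most probable one.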

#####3. Combined Modelling of Sandhi Identification and Morph analysis in Dravidian languages

  • Description :: Dravidian languages are morphologically rich, and multiple words can combine to form a single token. This phenomenon is called Sandhi. Recognizing the individual words contained within a sandhi complex is imperative before even a basic NLP task such as POS tagging can begin. This project attempts to exploit certain morphological properties of these languages to model Sandhi boundary identification as well as morph analysis of the combining stems. (Languages: Telugu, Kannada, Tamil, Malayalam [separate project for each language])
  • Faculty :: Prof. Dipti Misra Sharma
  • Mentor :: Vigneshwaran M, Devadath V
  • Students :: 4 (maximum)
  • Resources to be read by summer school students :: Research papers
  • Skills :: NLP, Programming, Linguistic analysis

#####4. Drugs identification from discharge summary

  • Description :: Given a discharge summary, one needs to identify drug names using various linguistic features and knowledge bases such as UMLS and Wikipedia.
  • Faculty :: Prof. Dipti Misra Sharma
  • Mentor :: Mihir Shekhar
  • Students :: 3-4
  • Skills :: NLP, Machine learning, Modular programming, Java/Python

Parsing

#####5. Implementing dependency parser.#####

  • Description - One needs to create a CPG-based dependency parser, exploring different tools and resources. Advanced techniques that exploit a large monolingual corpus can be applied.
  • Faculty - Prof. Dipti Misra Sharma
  • Mentor - Aniruddh Tammewar
  • Students - 2-3
  • Requirements - NLP, Machine Learning, Python.

#####6. CPG dependency parsing for English.#####

  • Description - The CPG dependency framework has been well explored for Hindi, Telugu and a few more Indian languages. An English treebank of nearly 2000 sentences has been annotated with this framework, and a few parsing experiments have been done with the treebank. This project would explore dependency parsing techniques for English and may go on to compare the dependency scheme with other widely used dependency schemes such as Penn dependency tags and the Czech Tectogrammatical framework.
  • Faculty - Prof. Dipti Misra Sharma
  • Mentor - Himani Chodhry, Aniruddh Tammewar
  • Students - 2-3
  • Requirements - NLP, Machine Learning, Python.

#####7. Semantic Role Labeling using Parsing.#####

  • Description - Semantic Role Labeling involves automatically identifying the arguments of a verb in a sentence and then classifying them by labeling the arguments with semantic labels, also known as PropBank labels. Presently, the Hindi and Urdu PropBanks are built on top of HDT and UDT respectively, and this project aims at building robust statistical semantic role labellers for both languages. Furthermore, we can use the PropBank features to improve parsing and vice versa.
  • Faculty - Prof. Dipti Misra Sharma
  • Mentor - Maaz Anwar
  • Students - 2-3
  • Requirements - NLP, Machine Learning, Python.

Anusaaraka

#####8. Handling Idiom Expression using Grammatical Framework (GF) software for English-Hindi pair of languages.#####

  • Description :: English and Hindi have different sets of idioms. An idiom in one language might correspond to an idiom in the other language or have a different meaning altogether. The project involves mapping idioms from one language to the other by creating abstract definitions and then mapping these semantics to the available idioms or literal forms.
  • Faculty :: Dr. Soma Paul
  • Mentor :: Prateek Saxena and Shastri V.
  • Students :: 3-4
  • Resources to be read by summer school students :: Resources that give an idea of existing as well as the latest 'linguistic' and 'NLP' tools, GF slides
  • Skills :: NLP, GF Programming

#####9. Preparing Linguistic resources using GF on Android#####

  • Description :: (description not finalized, information not available)
  • Resources :: GF app for Android named "GF offline translator". One can use and play around with it.
  • Faculty :: Dr. Soma Paul
  • Mentor :: Ayushi Agrawal and Shivani Pathak

#####10. Generating English-Hindi Word aligned corpora using existing NLP resources#####

  • Description :: The existing word alignment algorithm produces word-aligned output from the output of 2 tools, namely Anusaaraka and a phrase table (generated by the SMT tool Phrasal). This algorithm will be provided to the interns, who have to test, evaluate, and improve it. The output of 2 more NLP tools has to be integrated: a parser using ERG (English Resource Grammar) and a Hindi parser. Interns will have to propose solutions for this integration and try to accomplish this ongoing task by improving the existing algorithm.
  • Faculty :: Dr. Soma Paul
  • Mentor :: Ayushi Agrawal and Shivani Pathak
  • Students :: 3-4
  • Resources to be read by summer school students :: Resources that give an idea of existing as well as the latest 'linguistic' and 'NLP' tools.
  • Skills :: NLP, Programming

Dialogue Processing

#####11. Syntactic and semantic processing for NLIDB in Telugu/Hindi#####

  • Description: An NLIDB (natural language interface to a database) is a system that translates a natural language query into a SQL query. Syntactic and semantic processing of the given NL query is important for the NLIDB system to translate it into a SQL query. This project aims at understanding the architecture of an NLIDB system in the CPG framework and also involves developing syntactic and semantic modules for the NLIDB system in Telugu/Hindi.
  • Students: 3-4
  • Skills: NLP, MySQL, Programming (Python).
  • Mentors: Arjun Reddy Akula, Ashish P
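
The natural-language-to-SQL translation described above can be sketched with a keyword-matching toy. This is an assumption-heavy illustration, not the CPG-based architecture of the project: the schema, the word-to-column lexicon, the `is` pattern for conditions, and the English input are all hypothetical, and a real system would rely on syntactic and semantic analysis of Telugu/Hindi queries.

```python
import re

# Hypothetical schema and lexicon; a real NLIDB would derive these from the database.
SCHEMA = {"students": {"name", "city", "marks"}}
COLUMN_WORDS = {"name": "name", "names": "name", "city": "city",
                "cities": "city", "marks": "marks"}


def to_sql(query):
    """Translate a simple English query into (sql, params) by keyword matching."""
    tokens = re.findall(r"\w+", query)
    lowered = [t.lower() for t in tokens]
    table = next((t for t in lowered if t in SCHEMA), None)
    if table is None:
        raise ValueError("no known table mentioned in the query")
    select_cols, conditions, params = [], [], []
    i = 0
    while i < len(tokens):
        col = COLUMN_WORDS.get(lowered[i])
        if col in SCHEMA[table]:
            # "<column> is <value>" becomes a WHERE condition.
            if i + 2 < len(tokens) and lowered[i + 1] == "is":
                conditions.append(col + " = ?")
                params.append(tokens[i + 2])
                i += 3
                continue
            # Any other column mention is treated as a requested output column.
            if col not in select_cols:
                select_cols.append(col)
        i += 1
    column_list = ", ".join(select_cols) if select_cols else "*"
    sql = "SELECT " + column_list + " FROM " + table
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    # Values are returned separately so they can be bound as parameters;
    # the "?" placeholder style is an assumption and depends on the database driver.
    return sql, params
```

Keeping the values out of the SQL string lets the database driver bind them, which avoids building queries by string concatenation of user input.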

Discourse

#####12. Discourse Argument Identification from Dependency Structure and Argument Span Selection.

  • Description :: From the given Hindi dependency output, identify the explicit discourse connectives and the span of their arguments. Implicit discourse connections are not handled as a part of this project.
  • Faculty :: Prof. Dipti Misra Sharma
  • Mentor :: Rohit and Vignesh
  • Students :: 2
  • Resources to be read by summer school students :: Research papers
  • Skills :: NLP, Programming, Linguistic analysis

#####13. Discourse Sense Identification from Sentential features and creating relation hierarchies based on Senses

  • Description :: From the given Hindi dependency output, identify the explicit discourse connectives and the span of their arguments. Implicit discourse connections are not handled as a part of this project.
  • Faculty :: Prof. Dipti Misra Sharma
  • Mentor :: Rohit and Vignesh
  • Students :: 2
  • Resources to be read by summer school students :: Research papers
  • Skills :: NLP, Programming, Linguistic analysis

#####14. Sentence level semantic similarity by Karka cluster classification in semantic vector space

  • Description :: Given two sentences of text, s1 and s2, the system needs to determine how similar s1 and s2 are, returning a similarity score and an optional confidence score. The annotations and systems will use a scale from 0 (no relation) to 5 (semantic equivalence), indicating the similarity between the two sentences.
  • Faculty :: Prof. Dipti Misra Sharma
  • Mentor :: Darshan A
  • Students :: 1
  • Skills :: NLP, Programming, Linguistic analysis, Moses, C++, Boost.
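
The input and output contract of the task (two sentences in, a score between 0 and 5 out) can be illustrated with a baseline. The sketch below uses bag-of-words cosine similarity rescaled to the 0-5 range; this is a stand-in chosen for illustration and not the cluster classification in semantic vector space that the project title names. The function name is hypothetical.

```python
import math
import re
from collections import Counter


def similarity_score(s1, s2):
    """Bag-of-words cosine similarity between two sentences, rescaled to [0, 5]."""
    v1 = Counter(re.findall(r"\w+", s1.lower()))
    v2 = Counter(re.findall(r"\w+", s2.lower()))
    # Dot product over the words the two sentences share.
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    norm = math.sqrt(sum(c * c for c in v1.values()))
    norm *= math.sqrt(sum(c * c for c in v2.values()))
    if norm == 0:
        return 0.0  # at least one sentence has no word tokens
    # Clamp to guard against floating-point overshoot above 5.
    return min(5.0, 5.0 * dot / norm)
```

A word-overlap baseline like this gives sentences with no shared words a score of 0 even when they are paraphrases, which is the gap that a semantic vector space representation is meant to close.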

#####15. Build a web interface for coreference resolution.#####

  • Description - Students need to build a web interface for a coreference resolution system.
  • Faculty - Prof. Dipti Misra Sharma
  • Mentors - Vandan Mujadia, Darshan Agrawal, Palash Gupta
  • Requirements - HTML5, PHP, jQuery
  • Students - 2

#####16. Domain Specific Sentiment Analysis of Telugu

  • Description: We aim to build a comprehensive sentiment analysis system for the Telugu language in specific domains, which, when given a text and a keyword, outputs all sentiment words (tagged with positive or negative sentiment) used to express sentiment about the keyword. Dependency parsing is used to extract the sentiment words.
  • Mentor: K. Hemant
  • Students: 2-3
  • Skills: NLP, Data Mining, Programming

Machine Translation

#####17. Integrating SMT in ILMT system

  • Description :: The student needs to understand an existing SMT system such as Moses and a modular MT system such as ILMT, and work towards improving the ILMT system by implementing various Moses feature functions.
  • Faculty :: Prof. Dipti Misra Sharma, Dr. Manish Shrivastava
  • Mentor :: Saumitra
  • Students :: 1
  • Skills :: NLP, Programming, Linguistic analysis, Moses, C++

Question Answering

#####18. Question Answering on Google

  • Description :: This project aims at building a simple Web-scale QA system that uses Google search results for answer extraction and ranks the results with a purpose-designed algorithm. It has four main steps: 1) Question Classification, in which a Li-Roth based classifier using SVM gives the coarse-grained as well as the fine-grained class of the question. 2) Answer Retrieval, in which the complete user question is sent to Google through the Google web search API, which returns a maximum of 8 results per page request. 3) Phrase & Named Entity Extraction, which extracts the noun phrases from the search result content using the NLTK chunker and then extracts named entities using the Stanford Named Entity Recognizer. 4) Answer Extraction & Ranking, which compares the entity type of each extracted noun phrase with the answer type of the question and ranks the matched noun phrases by how frequently they occur across the different search results. The highest-ranked noun phrases are given as output to the user.
  • Faculty :: Prof. Manish Shrivastava, Manoj Chinnakotla
  • Mentor :: Harish Yenala, Avinash Kamineni, Abhishek Kannan, Teja
  • Students :: 4-5
  • Skills :: Basic idea of NLTK and its usage, Python
  • Resources to be read :: Li-Roth question classification paper, SVM algorithm, knowledge of chunkers and NER, papers on "QA on unstructured web content"
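
The final ranking step of the pipeline can be sketched in isolation. The sketch below assumes that the earlier steps have already produced, for each search result, a list of (noun phrase, entity type) pairs; the function name, the entity type labels, and the case-insensitive matching are assumptions, and the chunker and NER calls are deliberately left out.

```python
from collections import Counter


def rank_answers(expected_type, results):
    """Rank candidate phrases by the number of search results they occur in.

    `results` holds one list per search result; each list contains
    (phrase, entity_type) pairs produced by upstream chunking and NER.
    Only phrases whose entity type matches `expected_type` are kept.
    """
    counts = Counter()
    for candidates in results:
        # A set ensures each phrase is counted at most once per search result.
        matched = {phrase.lower() for phrase, etype in candidates
                   if etype == expected_type}
        counts.update(matched)
    # Most frequent first; ties are broken alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [phrase for phrase, _ in ordered]
```

Counting each phrase once per search result, rather than once per mention, follows the description's ranking by occurrence across different search results and keeps one long page from dominating the ranking.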

Speech processing

#####19. Speech recognition using Sphinx.

  • Description :: Speech recognition means speech to text conversion. This project will help to implement Hidden Markov Model (HMM) based speech recognition using MFCC features. SPHINX tool will be used for its implementation.
  • Faculty :: Prof. Anil Kumar Vuppala
  • Mentor :: A. Raju
  • Students :: 3-4
  • Resources to be read by summer school students :: Research papers

#####20. Speaker recognition using GMM.

  • Description :: Speaker recognition means identifying the speaker from speech. This project will help to implement Gaussian Mixture Model (GMM) based speaker identification using MFCC features.
  • Faculty :: Prof. Anil Kumar Vuppala
  • Mentor :: V. Raju
  • Students :: 3-4
  • Resources to be read by summer school students :: Research papers
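
The identification step can be sketched as follows, assuming one diagonal-covariance GMM per speaker has already been trained: the test utterance is assigned to the speaker whose model gives its feature frames the highest average log-likelihood. The model format (a list of weight, mean, variance triples) and the function names are hypothetical, and both MFCC extraction and EM training are omitted.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at feature vector x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of `frames` under a GMM.

    `gmm` is a list of (weight, mean, variance) components.
    """
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        # Log-sum-exp keeps the mixture sum numerically stable.
        top = max(comps)
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total / len(frames)


def identify_speaker(frames, speaker_models):
    """Return the speaker whose GMM scores the frames highest."""
    return max(speaker_models,
               key=lambda s: gmm_log_likelihood(frames, speaker_models[s]))
```

In practice each frame would be an MFCC vector and the component parameters would come from EM training on each speaker's enrollment speech; the scoring rule stays the same.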

#####21. Prosody modification of speech.

  • Description :: Prosody means the supra-segmental features of speech, namely energy, duration and pitch. This project will help to implement prosody modification, i.e., changing pitch values, duration, etc., using the SOLA technique.
  • Faculty :: Prof. Anil Kumar Vuppala
  • Mentor :: Hari Krishna
  • Students :: 3-4
  • Resources to be read by summer school students :: Research papers

#####22. Speech enhancement.

  • Description :: Speech enhancement means enhancing speech in noisy conditions. There are three different kinds of noise, namely background noise, multi-speaker noise and reverberant noise. This project helps to enhance degraded speech using spectral processing techniques.
  • Faculty :: Prof. Anil Kumar Vuppala
  • Mentor :: Mounika
  • Students :: 2-3
  • Resources to be read by summer school students :: Research papers