Prakhar Agrawal Prakhar0409


Frustratingly Easy Domain Adaptation

Link to the paper

The paper can be found here

What is this about?

In simple words: the paper presents a method for training a model when you have only a small amount of labelled data in the domain you are working on, but plenty of labelled data from some other domain. The title reflects the author's point that it can be frustrating to discover how hard it is to beat methods as simple as the one presented, which nonetheless perform reasonably well.
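The summary does not spell out the method itself. As a minimal sketch, assuming the feature-augmentation idea the paper is known for (every feature vector is copied into a shared block plus a block reserved for its own domain), the preprocessing step could look like this. The helper name `augment` and the block order are illustrative choices, not taken from the gist.

```javascript
// Hypothetical helper: augment a feature vector for a given domain.
// Assumed layout: [shared copy, source-only copy, target-only copy].
function augment(features, domain) {
  const zeros = new Array(features.length).fill(0);
  if (domain === 'source') {
    return [...features, ...features, ...zeros];
  }
  if (domain === 'target') {
    return [...features, ...zeros, ...features];
  }
  throw new Error('unknown domain: ' + domain);
}

// Toy two-feature examples, one per domain.
const sourceExample = augment([1, 2], 'source');
const targetExample = augment([1, 2], 'target');
console.log(sourceExample, targetExample);
```

Any standard supervised learner can then be trained on the augmented vectors from both domains together; the shared block lets it reuse what transfers, while the domain-specific blocks absorb what does not.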

@Prakhar0409
Prakhar0409 / Neural-Question-Generation.md
Last active May 11, 2023 10:12
SOTA automatic question generation system as of March, 2018

Neural Question Generation for Reading Comprehension

Link to the paper

The paper can be found here

Goal of the paper

Automatic question generation for sentences from passages in reading comprehension

Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

Link to the paper

The paper can be found here

What is this about?

In simple words: the paper presents a way to classify text without any annotated data (i.e. unsupervised), using only minimal domain knowledge. It works in the domain of reviews, where the domain knowledge is that "excellent" signals positive sentiment while "poor" signals negative sentiment.
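To make the idea concrete, here is a minimal sketch of a semantic-orientation score, assuming it is computed as the difference between a phrase's association with "excellent" and its association with "poor", estimated from co-occurrence counts. The function names, the shape of the `counts` object, the smoothing constant and all the numbers are assumptions for illustration only.

```javascript
// Semantic orientation of one phrase from (hypothetical) co-occurrence counts.
// counts.nearExcellent / counts.nearPoor: hits for the phrase near each anchor word.
// counts.excellent / counts.poor: hits for the anchor words on their own.
function semanticOrientation(counts, smoothing = 0.01) {
  const { nearExcellent, nearPoor, excellent, poor } = counts;
  // Smoothing (an assumed value) avoids division by zero for unseen pairs.
  const numerator = (nearExcellent + smoothing) * poor;
  const denominator = (nearPoor + smoothing) * excellent;
  return Math.log2(numerator / denominator);
}

// A review is labelled by the average orientation of its phrases.
function classifyReview(phraseCounts) {
  const total = phraseCounts.reduce((sum, c) => sum + semanticOrientation(c), 0);
  const average = total / phraseCounts.length;
  return average > 0 ? 'recommended' : 'not recommended';
}

// Made-up counts for two phrases extracted from one review.
const verdict = classifyReview([
  { nearExcellent: 80, nearPoor: 20, excellent: 1000, poor: 1000 },
  { nearExcellent: 30, nearPoor: 25, excellent: 1000, poor: 1000 },
]);
console.log(verdict);
```

No labelled reviews are needed: the only supervision is the choice of the two anchor words.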

Prakhar0409 / GFS.md
Created October 29, 2017 15:44
It presents Google File System, a scalable distributed file system for large distributed data-intensive applications, which provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.

The Google Filesystem

Link to the paper

The paper can be found here.

Introduction

GFS is a scalable distributed file system for large, distributed, data-intensive applications. Fault tolerance and high streaming performance are built in, even though it runs on commodity-grade hardware.

var watson = require('watson-developer-cloud');
var NaturalLanguageUnderstandingV1 = require('watson-developer-cloud/natural-language-understanding/v1.js');
var nlu = new NaturalLanguageUnderstandingV1(<API SECRET>);
function getConcepts(text, callback) {
  nlu.analyze({
    'html': text, // Search concepts related to the text in the buffer/String
    'features': { // Search concepts related to these keywords/concepts.
      'concepts': {}, 'keywords': {},}
Prakhar0409 / Wikipedia-HTML-Search-in-NodeJs
Created September 30, 2017 12:31
A quick description of how Wikipedia can be searched for a query and the result retrieved as HTML using Node.js
var wikipedia = require("wikipedia-js"); // using wikipedia-js library for search

// Wrapper name is illustrative; query and callback are supplied by the caller.
function searchWikipedia(query, callback) {
  var options = {query: query, format: "html", summaryOnly: false, lang: "en"};
  wikipedia.searchArticle(options, function(err, htmlWikiText) { // searching wikipedia with options
    if (err) {
      console.log("An error occurred[query=%s, error=%s]", query, err);
      return;
    }
    // Wrap the returned HTML in a justified <div> before handing it back
    callback('<div style="text-align: justify !important;">' + htmlWikiText + '</div>');
  });
}
Prakhar0409 / epidemic.md
Last active March 14, 2020 18:34
This gist describes two basic epidemic-like algorithms (Anti-entropy and Rumor Mongering) which are highly popular and used for maintaining consistency across replicated databases

Epidemic Algorithms for Replicated Database Management

Why use?

Maintaining mutual consistency across sites under updates, insertions and deletions is a non-trivial and significant problem when a database is replicated. Although it sounds reasonable to maintain a list of all replication servers and send an update directly to each of them whenever one occurs at a site, this can place a large network load on the link of the node where the update originates. Also, when nodes are constantly joining and leaving, keeping a consistent list of a few hundred thousand or a million nodes at every site is itself difficult. The algorithms described in the paper come in handy for these problems.

The described algorithms have been used in the clearinghouse servers of the Xerox Corporate Internet and have proven to be very useful.
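As a rough illustration of the anti-entropy idea named in the gist description, the sketch below has each site periodically pick a partner and reconcile its whole database with it, keeping the newer entry per key. It is a toy under stated assumptions: replicas are plain objects mapping key to `{ value, timestamp }`, the names `reconcile`, `antiEntropyRound` and `pickPartner` are hypothetical, and deletions are not handled.

```javascript
// Bring two replicas into agreement: for every key, both keep the newer entry.
function reconcile(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  for (const key of keys) {
    const aIsNewest = !b[key] || (a[key] && a[key].timestamp >= b[key].timestamp);
    const newest = aIsNewest ? a[key] : b[key];
    a[key] = newest;
    b[key] = newest;
  }
}

// Default partner choice: uniformly at random among all sites.
function randomPartner(i, n) {
  return Math.floor(Math.random() * n);
}

// One round: every site reconciles with one partner (skipped if it picks itself).
function antiEntropyRound(replicas, pickPartner = randomPartner) {
  replicas.forEach((replica, i) => {
    const j = pickPartner(i, replicas.length);
    if (j !== i) reconcile(replica, replicas[j]);
  });
}

// Three sites; only the first one has seen the update to key "x".
const sites = [{ x: { value: 'new', timestamp: 2 } }, {}, {}];
// A deterministic ring is used here only so the example is repeatable.
antiEntropyRound(sites, (i, n) => (i + 1) % n);
console.log(sites.map((s) => s.x && s.x.value));
```

No site needs a complete, up-to-date list of all other sites or has to contact all of them at once; repeated rounds spread the update through pairwise exchanges.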

Formal introduction to the Problem

Prakhar0409 / MapReduce.md
Last active December 11, 2016 20:13
A summary of the introductory paper to the MapReduce programming model that automatically parallelises a simple computation program on a large dataset.

MapReduce: Simplified Data Processing on Large Clusters

Link to the paper

The paper can be found here.

Gist in short

The paper is written in a very elegant and easy-to-understand way and is divided into 6 parts, covering topics from implementation to usage at Google. It exposes the reader to the immense power of functional languages and explains how the MapReduce programming model is inspired by them.
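The functional flavour of the model can be illustrated with a single-process word count. This is only a sketch: `runMapReduce`, `mapFn` and `reduceFn` are hypothetical names, and nothing here is distributed or fault tolerant, which is precisely the part a real MapReduce runtime takes care of.

```javascript
// User-supplied map: emit a [word, 1] pair for every word in a document.
function mapFn(docName, contents) {
  return contents.split(/\s+/).filter(Boolean).map((word) => [word, 1]);
}

// User-supplied reduce: sum all counts emitted for one word.
function reduceFn(word, counts) {
  return counts.reduce((a, b) => a + b, 0);
}

// Toy driver: run map over every input, group values by key, then reduce each group.
function runMapReduce(inputs, mapFn, reduceFn) {
  const groups = new Map();
  for (const [name, contents] of Object.entries(inputs)) {
    for (const [key, value] of mapFn(name, contents)) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  const output = {};
  for (const [key, values] of groups) {
    output[key] = reduceFn(key, values);
  }
  return output;
}

const wordCounts = runMapReduce(
  { 'a.txt': 'to be or not to be', 'b.txt': 'to do' },
  mapFn,
  reduceFn
);
console.log(wordCounts);
```

The user writes only the two small pure functions; the grouping step in the middle is the part a real system parallelises across machines.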

Prakhar0409 / Never-Ending Learning.md
Last active May 11, 2023 10:10
The gist is a text summary of a paper on Never-Ending Learning. This is part of my new initiative, a-paper-a-week, inspired by Shagun Sodhani.

Never-Ending Learning

Link to the paper

The paper can be found here.

Introduction and Intuition

This paper explores an alternative paradigm for machine learning that more closely models the diversity, competence and cumulative nature of human learning (called never-ending learning). It contrasts present-day machine learning systems, which have a very narrow scope and learn only a single function from very specific and limited training examples in a particular format, with the broad learning that humans undergo. It also presents a case study of NELL (Never-Ending Language Learner) from CMU and discusses the achievements and shortcomings of the system. The paper has little formalism and nearly no mathematical concreteness, but it explores a powerful machine learning paradigm, backed by intuitive reasoning, that may see some light in the future. For some concreteness we start by defining any general purpose agent (supervised learning) in machine learning c

Prakhar0409 / ubuntu-iitd.md
Last active May 11, 2017 09:18 — forked from rishirdua/ubuntu-iitd.md
Configuring Ubuntu for using IITD internet