ಮತ್ತು
ಈ
ಒಂದು
ರಲ್ಲಿ
ಹಾಗೂ
ಎಂದು
ಅಥವಾ
ಇದು
ರ
ಅವರು
<!DOCTYPE html>
<html>
<body>
<h1>Extracting Skills from Personal Communication Data using StackExchange Dataset</h1>
<p>In this blog, we will see how to use the publicly available Stack Exchange data dump to extract skills from communication data.
First, download the Stack Exchange dataset, which is available <a href="https://archive.org/details/stackexchange">here</a>. There are many Stack Exchange sites, such as stackoverflow, cs, datascience, physics, history and so on. You can download only the compressed files you need, or download the entire dump using torrents. Since we were using Linux on the OpenStack framework, we had to download the torrent files from the terminal; more information about downloading torrents from the command line is <a href="https://www.learn2crack.com/2013/10/download-torrent-using-terminal.html">here</a>. After downloading, extract the 7z files (this can be done in one script). Each 7z file corresponds to a stackexchange
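import csv
import random
import math
import operator


def loadDataset(filename, split, trainingSet=[], testSet=[]):
    # Read every row of the CSV file into memory.
    # Text mode with newline='' is what Python 3's csv module expects
    # (the original opened the file with 'rb', which only works on Python 2).
    with open(filename, 'r', newline='') as csvfile:
        lines = csv.reader(csvfile)
        dataset = list(lines)
        for x in range(len(dataset)-1):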