@tomardern
Created March 27, 2015 11:01
Weka PowerShell Script
cls
$path = "C:\Weka-3-6\data\"
$output = "C:\Weka-3-6\dwm_output\"
#Only pick up the .arff datasets
$files = Get-ChildItem $path -Filter *.arff
Add-Content "${output}console_log.txt" "STARTING NEW OUTPUT =-------------------------------- "
#Loop through all the files in the directory
foreach ($file in $files) {
    #Full file path
    $fpath = $path + $file.Name
    #Get the file name, but with no extension
    $name_no_ext = [System.IO.Path]::GetFileNameWithoutExtension($file.Name)
    #Stopwatch object; it is reset before every run because Start() after Stop() keeps accumulating
    $stopwatch = New-Object System.Diagnostics.Stopwatch
    echo "==========================================================="
    echo "$name_no_ext $fpath"
    echo "==========================================================="
    #Let's do the J48 decision tree - this is the easiest to complete
    $stopwatch.Reset()
    $stopwatch.Start()
    java -classpath "C:\Weka-3-6\weka.jar" weka.classifiers.trees.J48 -C 0.25 -M 2 -t $fpath -i > "${output}j48_${name_no_ext}.txt"
    $stopwatch.Stop()
    $timeTaken = $stopwatch.ElapsedMilliseconds
    echo "J48: $timeTaken Milliseconds to complete"
    Add-Content "${output}console_log.txt" "j48_$name_no_ext.txt $timeTaken"
    #Let's do the Naive Bayes
    $stopwatch.Reset()
    $stopwatch.Start()
    java -classpath "C:\Weka-3-6\weka.jar" weka.classifiers.bayes.NaiveBayes -t $fpath -i > "${output}nb_${name_no_ext}.txt"
    $stopwatch.Stop()
    $timeTaken = $stopwatch.ElapsedMilliseconds
    echo "Naive Bayes: $timeTaken Milliseconds to complete"
    Add-Content "${output}console_log.txt" "nb_$name_no_ext.txt $timeTaken"
    #Let's do the K-NN with K = 1 to get our initial data
    $stopwatch.Reset()
    $stopwatch.Start()
    java -classpath "C:\Weka-3-6\weka.jar" weka.classifiers.lazy.IBk -K 1 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \`"weka.core.EuclideanDistance -R first-last\`"" -t $fpath -i > "${output}knn_1_${name_no_ext}.txt"
    $stopwatch.Stop()
    $timeTaken = $stopwatch.ElapsedMilliseconds
    echo "K-NN (K=1): $timeTaken Milliseconds to complete"
    Add-Content "${output}console_log.txt" "knn_1_$name_no_ext.txt $timeTaken"
    #Now we need to get the number of class labels in the dataset
    $knnname = "${output}knn_1_${name_no_ext}.txt"
    #Look for the confusion matrix header, e.g. "a b c <-- classified as"
    $header = @((Get-Content $knnname) -match "<-- classified as")[0]
    #Count the whitespace-separated column labels in front of the marker
    $labels = @(($header -replace "<-- classified as.*$", "").Trim() -split "\s+")
    $classCount = $labels.Count
    echo "$classCount class labels have been found"
    for ($i = 1; $i -le $classCount; $i++) {
        #Run K-NN for each K from 1 up to the number of class labels (K = 1 is run again here)
        $stopwatch.Reset()
        $stopwatch.Start()
        java -classpath "C:\Weka-3-6\weka.jar" weka.classifiers.lazy.IBk -K $i -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \`"weka.core.EuclideanDistance -R first-last\`"" -t $fpath -i > "${output}knn_${i}_${name_no_ext}.txt"
        $stopwatch.Stop()
        $timeTaken = $stopwatch.ElapsedMilliseconds
        echo " K-NN (K=$i): $timeTaken Milliseconds to complete"
        $logname = "knn_${i}_${name_no_ext}.txt"
        echo $logname
        Add-Content "${output}console_log.txt" "$logname $timeTaken"
    }
    echo ""
}
@tomardern
Author

Data mining classification techniques can be executed using the Command Line Interface (CLI) within WEKA (University of Waikato, 2012). This enables WEKA to be scripted to carry out a large series of tasks automatically. The following commands run the three required classification techniques, Decision Tree (J48), Naïve Bayes (NB) and K-Nearest Neighbour (K-NN):
java -classpath "C:\Weka-3-6\weka.jar" weka.classifiers.trees.J48 -C 0.25 -M 2 -t $fpath -i > "${output}j48_${name_no_ext}.txt"
java -classpath "C:\Weka-3-6\weka.jar" weka.classifiers.bayes.NaiveBayes -t $fpath -i > "${output}nb_${name_no_ext}.txt"
java -classpath "C:\Weka-3-6\weka.jar" weka.classifiers.lazy.IBk -K 1 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \`"weka.core.EuclideanDistance -R first-last\`"" -t $fpath -i > "${output}knn_1_${name_no_ext}.txt"
Figure 1 - Running WEKA from Command Line
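The timing around each command could be wrapped in a small helper. The sketch below is an illustration only: the function name `Measure-Step` is hypothetical, and it starts a fresh `Stopwatch` for every call so that each measurement begins at zero. The `Start-Sleep` call is a stand-in for one of the java commands above.

```powershell
# Hypothetical helper (not part of the original script): runs a script block
# and returns an object holding the step name and the elapsed milliseconds.
function Measure-Step {
    param(
        [string]$Name,
        [scriptblock]$Action
    )
    # A new stopwatch per call, so the timings never accumulate across steps
    $stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
    # Discard any pipeline output of the action so only the result object is returned
    & $Action | Out-Null
    $stopwatch.Stop()
    New-Object PSObject -Property @{
        Name         = $Name
        Milliseconds = $stopwatch.ElapsedMilliseconds
    }
}

# Stand-in workload; in the script this would be one of the java commands
$result = Measure-Step -Name "demo" -Action { Start-Sleep -Milliseconds 50 }
"{0}: {1} ms" -f $result.Name, $result.Milliseconds
```

Returning an object rather than echoing text keeps the name and the timing together, which makes it straightforward to append them to a log afterwards.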

Using the three commands above, a simple shell/batch script can scan through a directory of .arff files and write the output of each run to a .txt file. The required data (such as the number of instances, the number of attributes and the time taken) is then extracted from the output .txt files using any programming language (in this case, PHP was used). The data is organised and converted into a .csv file, which can be opened in Microsoft Excel or imported into WEKA to carry out data mining techniques. The .csv is shown in appendix 1.
As WEKA has been automated to execute the required techniques for all .arff files within a directory, the number of datasets used is limited only by computational time and the time required to analyse the output. For this comparison, a total of 58 datasets have been used. All required techniques can be applied to every chosen dataset without WEKA raising any exceptions or errors.
The script was executed using the default Java settings on an Intel Core i5-2500 CPU at 3.3 GHz with 16 GB RAM, with the datasets stored on an SSD.
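As a rough illustration of the extraction step, the timing lines that the script appends to console_log.txt can be turned into CSV rows. The original extraction was done in PHP; this PowerShell sketch is a stand-in, not the author's code. The function name `ConvertFrom-TimingLog`, the column names and the sample values are all hypothetical; only the line format (`<output file> <milliseconds>`) comes from the script's Add-Content calls.

```powershell
# Hypothetical helper: turns timing lines such as "j48_iris.txt 412"
# into objects with Classifier, Dataset and Milliseconds properties.
function ConvertFrom-TimingLog {
    param([string[]]$Lines)

    foreach ($line in $Lines) {
        # Lines that do not match (e.g. the "STARTING NEW OUTPUT" banner) are skipped
        if ($line -match '^(j48|nb|knn_\d+)_(.+)\.txt\s+(\d+)$') {
            New-Object PSObject -Property @{
                Classifier   = $Matches[1]
                Dataset      = $Matches[2]
                Milliseconds = [int]$Matches[3]
            }
        }
    }
}

# Made-up sample lines; real input would come from Get-Content on console_log.txt
$sample = @(
    "STARTING NEW OUTPUT =-------------------------------- ",
    "j48_iris.txt 412",
    "nb_iris.txt 388",
    "knn_2_iris.txt 455"
)
$rows = @(ConvertFrom-TimingLog -Lines $sample)
# Select-Object fixes the column order, which a hashtable does not guarantee
$csv = $rows | Select-Object Classifier, Dataset, Milliseconds | ConvertTo-Csv -NoTypeInformation
$csv
```

To write a file instead of printing the rows, the same pipeline can end in `Export-Csv -NoTypeInformation -Path <file>`.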
