@kenwebb
Last active January 26, 2018 17:12
Data Science Learning Project
<?xml version="1.0" encoding="UTF-8"?>
<!--Xholon Workbook http://www.primordion.com/Xholon/gwt/ MIT License, Copyright (C) Ken Webb, Fri Jan 26 2018 12:11:50 GMT-0500 (EST)-->
<XholonWorkbook>
<Notes><![CDATA[
Xholon
------
Title: Data Science Learning Project
Description:
Url: http://www.primordion.com/Xholon/gwt/
InternalName: 159b55875112402890d926211468eb91
Keywords:
My Notes
--------
January 19, 2018
I'll suggest a set of 5 or so projects for a hackathon.
I would prefer to use R.
- people already familiar with Python or other tools could use that
The data should be ASCII data or text, rather than images or other binary data.
I don't know what should be involved in a hackathon
- maybe there are no specific rules
- work through "The Roles of the Data Scientist" in Jen's presentation
- data collection would typically just involve downloading some files
- could spend some time on data storage, as a useful exercise
- possibly, each person could work on a separate problem
- at intervals, each person or small group could report to the whole group on what they are doing
- could solicit help at those or any other times
- people could wander around between groups
- maybe one person would be responsible for a given competition
- but would be free to offer help to others, and accept help from others
- everyone will have their own expertise, and their own areas they want to learn more about
- anyone could give a short tutorial for anyone interested (or for everyone)
- for example, I could provide a quick example of moving a CSV dataset into a relational database
- and maybe compare R dplyr with SQL
- most of the Kaggle competitions on this list have either expired or are just there for fun/learning
- the hackathon will not be actually entering any of the kaggle competitions
- the only structure of the event would be from
(1) the process in Jen's presentation,
(2) the list of competitions, and
(3) regular short breaks where people can quickly talk about what they have learned, problems they're having, requests to move to another project, etc.
- as a minimum, a participant might just work through some existing kaggle "kernels" and discussions
- ideally try them out in R (or Python), and report back to the group
- what data science/machine learning algorithm was used
- for exploratory data analysis
- for final analysis
- whether data needed to be cleaned
- how the results were visualized
- we should focus just on R (although individuals could use whatever they want)
- ask everyone to bring a laptop with R and RStudio installed
- use R notebooks? optional
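
A minimal sketch of the tutorial idea above (moving a CSV dataset into a relational database, then querying it with SQL), written in Python with only the standard-library csv and sqlite3 modules. The table name, column names, and sample rows are made up for illustration and are not taken from any Kaggle dataset. The dplyr line in the comment is an approximate equivalent, for comparison only.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a downloaded CSV file.
CSV_TEXT = """passenger_id,pclass,survived
1,3,0
2,1,1
3,3,1
4,1,1
5,2,0
"""


def load_csv_into_sqlite(csv_text, conn):
    """Create a 'passengers' table and insert every CSV row into it."""
    conn.execute(
        "CREATE TABLE passengers ("
        "passenger_id INTEGER PRIMARY KEY, pclass INTEGER, survived INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (int(r["passenger_id"]), int(r["pclass"]), int(r["survived"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO passengers VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def survival_rate_by_class(conn):
    """Mean of 'survived' per class, using SQL GROUP BY.

    Roughly the same query in R dplyr:
    passengers %>% group_by(pclass) %>% summarise(rate = mean(survived))
    """
    cur = conn.execute(
        "SELECT pclass, AVG(survived) FROM passengers "
        "GROUP BY pclass ORDER BY pclass"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
n = load_csv_into_sqlite(CSV_TEXT, conn)
rates = survival_rate_by_class(conn)
print(n, rates)
```

With a real file, the io.StringIO wrapper would be replaced by an open file handle, and the column types would need to match the actual dataset.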
References
----------
(1) https://www.kaggle.com/competitions
(2) https://www.kaggle.com/c/forest-cover-type-prediction
Forest Cover Type Prediction
Use cartographic variables to classify forest categories
1,694 teams, 3 years ago
(3) https://www.kaggle.com/c/titanic
Titanic: Machine Learning from Disaster
Start here! Predict survival on the Titanic and get familiar with ML basics
(4) https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
Toxic Comment Classification Challenge
Identify and classify toxic online comments
You are provided with a large number of Wikipedia comments which have been labeled by human raters for toxic behavior. The types of toxicity are:
toxic
severe_toxic
obscene
threat
insult
identity_hate
You must create a model which predicts a probability of each type of toxicity for each comment.
(5) https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting
Recruit Restaurant Visitor Forecasting
Predict how many future visitors a restaurant will receive
(6) http://www.cbc.ca/news2/interactives/sh/wex94ODaUs/trump-robert-mercer-billionaire/
Money man
Reclusive U.S. billionaire Robert Mercer helped Donald Trump win the presidency. But what is his ultimate goal?
(7) https://www.kaggle.com/c/text-normalization-challenge-english-language
Text Normalization Challenge - English Language
Convert English text from written expressions into spoken forms
(8) https://www.kaggle.com/learn/overview
on-line courses
KSW after a quick look, I think they are too basic for me
(9) https://www.kaggle.com/c/web-traffic-time-series-forecasting
Web Traffic Time Series Forecasting
Forecast future traffic to Wikipedia pages
KSW
- my immediate thought is that this would depend on external factors such as what's in the news, whether schools are in session, etc
- it might be worthwhile to compare traffic for the same topic in different human languages
(10) https://www.kaggle.com/c/march-machine-learning-mania-2017
March Machine Learning Mania 2017
Predict the 2017 NCAA Basketball Tournament
(11) https://www.kaggle.com/c/integer-sequence-learning
https://www.kaggle.com/c/integer-sequence-learning/kernels
Integer Sequence Learning
Kaggle is hosting this competition for the data science community to use for fun and education.
(12) https://www.kaggle.com/c/shelter-animal-outcomes
Shelter Animal Outcomes
Help improve outcomes for shelter animals
(13) http://www.image-net.org/
ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns),
in which each node of the hierarchy is depicted by hundreds and thousands of images.
Currently we have an average of over five hundred images per node.
We hope ImageNet will become a useful resource for researchers, educators, students and all of you who share our passion for pictures.
KSW they offer challenges that are also on Kaggle
]]></Notes>
<_-.XholonClass>
<!-- domain objects -->
<PhysicalSystem/>
<Block/>
<Brick/>
<!-- quantities -->
<Height superClass="Quantity"/>
</_-.XholonClass>
<xholonClassDetails>
<Block>
<port name="height" connector="Height"/>
</Block>
</xholonClassDetails>
<PhysicalSystem>
<Block>
<Height>0.1 m</Height>
</Block>
<Brick multiplicity="2"/>
</PhysicalSystem>
<Blockbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[
var a = 123;
var b = 456;
var c = a * b;
// Guard with typeof so the check cannot throw if console is not defined.
if (typeof console !== "undefined") {
  console.log(c);
}
]]></Blockbehavior>
<Heightbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[
var myHeight, testing;
var beh = {
  postConfigure: function() {
    testing = Math.floor(Math.random() * 10);
    myHeight = this.cnode.parent();
  },
  act: function() {
    myHeight.println(this.toString());
  },
  toString: function() {
    return "testing:" + testing;
  }
}
]]></Heightbehavior>
<Brickbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[
$wnd.xh.Brickbehavior = function Brickbehavior() {};
$wnd.xh.Brickbehavior.prototype.postConfigure = function() {
  this.brick = this.cnode.parent();
  this.iam = " red brick";
};
$wnd.xh.Brickbehavior.prototype.act = function() {
  this.brick.println("I am a" + this.iam);
};
]]></Brickbehavior>
<Brickbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[
console.log("I'm another brick behavior");
]]></Brickbehavior>
<SvgClient><Attribute_String roleName="svgUri"><![CDATA[data:image/svg+xml,
<svg width="100" height="50" xmlns="http://www.w3.org/2000/svg">
<g>
<title>Block</title>
<rect id="PhysicalSystem/Block" fill="#98FB98" height="50" width="50" x="25" y="0"/>
<g>
<title>Height</title>
<rect id="PhysicalSystem/Block/Height" fill="#6AB06A" height="50" width="10" x="80" y="0"/>
</g>
</g>
</svg>
]]></Attribute_String><Attribute_String roleName="setup">${MODELNAME_DEFAULT},${SVGURI_DEFAULT}</Attribute_String></SvgClient>
</XholonWorkbook>