Last active
January 26, 2018 17:12
Data Science Learning Project
<?xml version="1.0" encoding="UTF-8"?>
<!--Xholon Workbook http://www.primordion.com/Xholon/gwt/ MIT License, Copyright (C) Ken Webb, Fri Jan 26 2018 12:11:50 GMT-0500 (EST)-->
<XholonWorkbook>
<Notes><![CDATA[
Xholon
------
Title: Data Science Learning Project
Description:
Url: http://www.primordion.com/Xholon/gwt/
InternalName: 159b55875112402890d926211468eb91
Keywords:
My Notes
--------
January 19, 2018
I'll suggest a set of 5 or so projects for a hackathon.
I would prefer to use R.
- people already familiar with Python or other tools could use those
The data should be ASCII data or text, rather than images or other binary data.
I don't know what should be involved in a hackathon
- maybe there are no specific rules
- work through "The Roles of the Data Scientist" in Jen's presentation
- data collection would typically just involve downloading some files
- could spend some time on data storage, as a useful exercise
- possibly, each person could work on a separate problem
- at intervals, each person or small group could report to the whole group on what they are doing
- could solicit help at those or any other times
- people could wander around between groups
- maybe one person would be responsible for a given competition
- but would be free to offer help to others, and accept help from others
- everyone will have their own expertise, and their own areas they want to learn more about
- anyone could give a short tutorial for anyone interested (or for everyone)
- for example, I could provide a quick example of moving a CSV dataset into a relational database
- and maybe compare R dplyr with SQL
- most of the Kaggle competitions on this list have either expired or are just there for fun/learning
- the hackathon will not actually be entering any of the Kaggle competitions
- the only structure of the event would come from
(1) the process in Jen's presentation,
(2) the list of competitions, and
(3) regular short breaks where people can quickly talk about what they have learned, problems they're having, requests to move to another project, etc.
- as a minimum, a participant might just work through some existing Kaggle "kernels" and discussions
- ideally try them out in R (or Python), and report back to the group
- what data science/machine learning algorithm was used
- for exploratory data analysis
- for final analysis
- whether data needed to be cleaned
- how the results were visualized
- we should focus just on R (although individuals could use whatever they want)
- ask everyone to bring a laptop with R and RStudio installed
- use R notebooks? optional
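A possible starting point for the CSV-to-database tutorial mentioned above, as a minimal sketch rather than a finished example. It is written in Python (standard library only) instead of R, and it assumes an in-memory SQLite database. The table name, column names and the five data rows are all made up for illustration; they are not taken from any Kaggle dataset. The dplyr pipeline in the comment is only a rough equivalent of the SQL query.

```python
import csv
import io
import sqlite3

# Hypothetical toy data standing in for a downloaded CSV file.
CSV_TEXT = """passenger_id,pclass,survived
1,3,0
2,1,1
3,3,1
4,1,1
5,3,0
"""


def load_csv(conn, csv_file):
    """Create the (hypothetical) passengers table and load every CSV row into it."""
    conn.execute(
        "CREATE TABLE passengers ("
        "passenger_id INTEGER, pclass INTEGER, survived INTEGER)"
    )
    reader = csv.DictReader(csv_file)
    rows = [
        (int(r["passenger_id"]), int(r["pclass"]), int(r["survived"]))
        for r in reader
    ]
    # Parameterized insert: one "?" placeholder per column.
    conn.executemany("INSERT INTO passengers VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def survival_rate_by_class(conn):
    """Mean of 'survived' per class, computed in SQL.

    Roughly what dplyr would express as:
      passengers %>% group_by(pclass) %>% summarise(rate = mean(survived))
    """
    cur = conn.execute(
        "SELECT pclass, AVG(survived) FROM passengers "
        "GROUP BY pclass ORDER BY pclass"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
n_loaded = load_csv(conn, io.StringIO(CSV_TEXT))
rates = survival_rate_by_class(conn)
print(n_loaded, rates)
```

To try it on a real file, pass open("some_file.csv", newline="") in place of the io.StringIO object and adjust the column list to match; the file name here is only a placeholder.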
References
----------
(1) https://www.kaggle.com/competitions
(2) https://www.kaggle.com/c/forest-cover-type-prediction
Forest Cover Type Prediction
Use cartographic variables to classify forest categories
1,694 teams, 3 years ago
(3) https://www.kaggle.com/c/titanic
Titanic: Machine Learning from Disaster
Start here! Predict survival on the Titanic and get familiar with ML basics
(4) https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
Toxic Comment Classification Challenge
Identify and classify toxic online comments
You are provided with a large number of Wikipedia comments which have been labeled by human raters for toxic behavior. The types of toxicity are:
toxic
severe_toxic
obscene
threat
insult
identity_hate
You must create a model which predicts a probability of each type of toxicity for each comment.
(5) https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting
Recruit Restaurant Visitor Forecasting
Predict how many future visitors a restaurant will receive
(6) http://www.cbc.ca/news2/interactives/sh/wex94ODaUs/trump-robert-mercer-billionaire/
Money man
Reclusive U.S. billionaire Robert Mercer helped Donald Trump win the presidency. But what is his ultimate goal?
(7) https://www.kaggle.com/c/text-normalization-challenge-english-language
Text Normalization Challenge - English Language
Convert English text from written expressions into spoken forms
(8) https://www.kaggle.com/learn/overview
on-line courses
KSW after a quick look, I think they are too basic for me
(9) https://www.kaggle.com/c/web-traffic-time-series-forecasting
Web Traffic Time Series Forecasting
Forecast future traffic to Wikipedia pages
KSW
- my immediate thought is that this would depend on external factors such as what's in the news, whether schools are in session, etc.
- it might be worthwhile to compare traffic for the same topic in different human languages
(10) https://www.kaggle.com/c/march-machine-learning-mania-2017
March Machine Learning Mania 2017
Predict the 2017 NCAA Basketball Tournament
(11) https://www.kaggle.com/c/integer-sequence-learning
) https://www.kaggle.com/c/integer-sequence-learning/kernels
Integer Sequence Learning
Kaggle is hosting this competition for the data science community to use for fun and education.
(12) https://www.kaggle.com/c/shelter-animal-outcomes
Shelter Animal Outcomes
Help improve outcomes for shelter animals
(13) http://www.image-net.org/
ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns),
in which each node of the hierarchy is depicted by hundreds and thousands of images.
Currently we have an average of over five hundred images per node.
We hope ImageNet will become a useful resource for researchers, educators, students and all of you who share our passion for pictures.
KSW they offer challenges that are also on Kaggle
]]></Notes>
<_-.XholonClass>
  <!-- domain objects -->
  <PhysicalSystem/>
  <Block/>
  <Brick/>
  <!-- quantities -->
  <Height superClass="Quantity"/>
</_-.XholonClass>
<xholonClassDetails>
  <Block>
    <port name="height" connector="Height"/>
  </Block>
</xholonClassDetails>
<PhysicalSystem>
  <Block>
    <Height>0.1 m</Height>
  </Block>
  <Brick multiplicity="2"/>
</PhysicalSystem>
<Blockbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[
// Multiply two numbers and log the product if a console is available.
var a = 123;
var b = 456;
var c = a * b;
if (console) {
  console.log(c);
}
]]></Blockbehavior>
<Heightbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[
var myHeight, testing;
var beh = {
  postConfigure: function() {
    // Random integer from 0 to 9.
    testing = Math.floor(Math.random() * 10);
    myHeight = this.cnode.parent();
  },
  act: function() {
    myHeight.println(this.toString());
  },
  toString: function() {
    return "testing:" + testing;
  }
}
]]></Heightbehavior>
<Brickbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[
$wnd.xh.Brickbehavior = function Brickbehavior() {}
$wnd.xh.Brickbehavior.prototype.postConfigure = function() {
  this.brick = this.cnode.parent();
  this.iam = " red brick";
};
$wnd.xh.Brickbehavior.prototype.act = function() {
  this.brick.println("I am a" + this.iam);
};
]]></Brickbehavior>
<Brickbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[
console.log("I'm another brick behavior");
]]></Brickbehavior>
<SvgClient><Attribute_String roleName="svgUri"><![CDATA[data:image/svg+xml,
<svg width="100" height="50" xmlns="http://www.w3.org/2000/svg">
  <g>
    <title>Block</title>
    <rect id="PhysicalSystem/Block" fill="#98FB98" height="50" width="50" x="25" y="0"/>
    <g>
      <title>Height</title>
      <rect id="PhysicalSystem/Block/Height" fill="#6AB06A" height="50" width="10" x="80" y="0"/>
    </g>
  </g>
</svg>
]]></Attribute_String><Attribute_String roleName="setup">${MODELNAME_DEFAULT},${SVGURI_DEFAULT}</Attribute_String></SvgClient>
</XholonWorkbook>