Previously...
Municipalities in Switzerland - a first overview of the data - Questions: Did the import work as expected? Are the data coded the way you expect? Are the data types of the variables correct? Are there missing values, and how are they coded?
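The questions above can be turned into a few quick checks. The following is a minimal sketch in Python using only the standard library; the sample rows, the column names and the set of missing-value codes are invented for illustration and are not taken from the actual municipality dataset.

```python
import csv
import io

# Invented sample rows standing in for a municipality table (not real figures).
SAMPLE = """municipality,canton,population
Musterdorf,ZH,1200
Beispielwil,BE,NA
Testingen,AG,
"""

# Assumed codes for missing values; real files may use others.
MISSING_CODES = {"", "NA", "-"}


def overview(text):
    """Count rows, list columns, and tally missing values per column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row[col]
            if value is None or value.strip() in MISSING_CODES:
                missing[col] += 1
    return {"rows": len(rows), "columns": columns, "missing": missing}


def is_numeric(text, column):
    """Check that every non-missing value in a column parses as an integer."""
    for row in csv.DictReader(io.StringIO(text)):
        value = (row[column] or "").strip()
        if value in MISSING_CODES:
            continue
        try:
            int(value)
        except ValueError:
            return False
    return True


report = overview(SAMPLE)
print(report)
print(is_numeric(SAMPLE, "population"))
```

The same questions apply in any tool: how many rows arrived, which columns exist, which values count as missing, and whether a column that should be numeric really is.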
define: loleg
- Introductory slides
- Open Source: http://loleg.github.io
- Open Data: http://opendata.ch/organisation/
- School of Data: http://schoolofdata.ch/
define: data
Data as concept
- da•ta (dāˈtə, dătˈə, däˈtə)
- n. Factual information, especially information organized for analysis or used to reason or make decisions.
- n. Computer Science Numerical or other information represented in a form suitable for processing by computer.
- n. Values derived from scientific experiments.
Everyday data manipulation and analysis
- We want to see facts (decisive statements) backed up with data (empirical evidence)
- Data as raw material of information - the building blocks of an Information Society
- The Web of Data helps to put relevant information at the center of the lives of billions.
- Data literacy is a critical skill that all people have - to some extent.
- "Good data" helps us to make informed, consistent decisions - from everyday lives to work and politics.
- Data literacy is, in part, the ability to make the most of the available resources to find, retrieve and republish data sources, combined with the ability to critically judge their accuracy, applicability and quality.
The following is an excerpt from the Open Data Handbook, which defines open data according to the Open Definition:
Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.
The full Open Definition gives precise details as to what this means. To summarize the most important:
- Availability and Access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.
- Re-use and Redistribution: the data must be provided under terms that permit re-use and redistribution including the intermixing with other datasets.
- Universal Participation: everyone must be able to use, re-use and redistribute - there should be no discrimination against fields of endeavour or against persons or groups. For example, 'non-commercial' restrictions that would prevent 'commercial' use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
If you're wondering why it is so important to be clear about what open means and why this definition is used, there's a simple answer: interoperability. Continue reading in the Open Data Handbook - What is Open Data? for more details on interoperability.
Open data licenses have parallels - compare:
- the Terms of Use of Opendata.swiss with
- Creative Commons licenses (CC0, CC BY, CC BY-SA, ...)
Open data is integral to open technologies
- Open standards and formats, e.g. HTML5, CSV, JSON, ...
- Open source tools, e.g. R...
- Open data portals, e.g. DataHub, Opendata.swiss
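Because CSV and JSON are open formats, any tool can move data between them without proprietary software. A minimal sketch in Python, assuming a small inline table with hypothetical column names:

```python
import csv
import io
import json

# Tiny illustrative table; the column names are assumptions for this sketch.
CSV_TEXT = "station,canton\nBern,BE\nChur,GR\n"


def csv_to_json(text):
    """Read CSV text into a list of dicts and serialise it as JSON."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows, ensure_ascii=False, indent=2)


as_json = csv_to_json(CSV_TEXT)
print(as_json)
```

Note that CSV carries no type information, so every value arrives as a string; deciding which columns are numbers or dates is left to the reader of the file.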
Further links
- The Open Definition
- Swiss Open Source Initiative
- Open Access Initiatives, e.g. O.A. Button
- Open Knowledge International
"The topic of open data engages a great variety of actors in government agencies, media, companies, and the growing Swiss community of individual developers, designers and activists. The momentum is there, the political will is emerging, exchange is taking place." http://make.opendata.ch
"Uncertainties remain on all sides, much as at the beginning of other major developments between society and technology, but so does great enthusiasm, along with the chance for a genuine impulse for innovation and a lasting cultural change." http://theodi.org/culture
"Open source is describing the free exchange of ideas in any atmosphere." (Forbes 2014)
Open data is the raw material of a new industrial revolution - “Open data has the potential to radically change the way organisations value data” (The Guardian 2014)
"Who's downloading pirated papers? Everyone." (Elbakyan A, Bohannon J 2016)
"This data set is messy. There are many known problems, both with the format and the contents. Here's what you should know." (Reclaim the Records NYC)
"The intelligent use of open data [can] generate great added value for one's own work. These examples therefore set the right incentives to win advocates for open data within the administration." - Niels Reinhard, idalab
Species of open data
- Government data/statistics: opendata.swiss
- Geographic data: GeoAdmin, OpenStreetMap
- Health data: OpenTrials, Mastodon C
- Science results: OpenResearchData.ch
- Data about companies: OpenCorporates
- Social network data, e.g. Twitter
- etc.
Finance
Environment
Public transport
- fahrplan.py
- Transport API (transport.opendata.ch)
- transport apps (make.opendata.ch)
- data.sbb.ch
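As a rough sketch of how the Transport API can be queried from Python: the base URL, the `/connections` endpoint and the `from`, `to` and `limit` parameters below reflect my understanding of the public documentation and should be checked against transport.opendata.ch before use. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL; verify against the API documentation.
BASE_URL = "http://transport.opendata.ch/v1"


def connections_url(origin, destination, limit=3):
    """Build the query URL for connections between two stations."""
    query = urlencode({"from": origin, "to": destination, "limit": limit})
    return f"{BASE_URL}/connections?{query}"


def fetch_connections(origin, destination, limit=3):
    """Call the API (needs network access) and return the decoded JSON."""
    url = connections_url(origin, destination, limit)
    with urlopen(url, timeout=10) as response:
        return json.load(response)


# Building the URL needs no network access.
url = connections_url("Bern", "Zürich HB")
print(url)
```

Calling `fetch_connections("Bern", "Zürich HB")` would perform the actual request; the structure of the response is not shown here because it depends on the live service.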
Further reading
What are hackathons for?
- Focusing on topics of interest
- Working in teams, mixing roles
- Finding and creating data sources
What tools do we use? What is a good tool for open data?
- Open source, multiplatform, community supported
- Source control (attribution, change history)
- Supports open standards, attention to metadata
- Spreadsheets and beyond...
Example: OpenRefine
- As used by and taught at School of Data
- As used in journalism (via Beobachter)
What tools support data analysis?
- Machine learning in R
- scikit-learn and Project Jupyter (Python)
- Explore in Google Docs
Go over our group brainstorm (snapshot here), choose a topic, and share any further thoughts, ideas or datasets on January 26. Topics preview:
- Working with R datasets and External data
- Why and how of the Data Packages specification
- Visualising (R notebooks, D3.js, ..)
- Publishing datasets online
- Open data portals
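As a preview of the Data Packages topic, here is a minimal sketch of a `datapackage.json` descriptor built in Python. The package name, file path and field list are hypothetical; the keys used (`name`, `resources`, `path`, `schema`, `fields`) follow the Data Package specification as I understand it, so treat this as an illustration rather than a validated descriptor.

```python
import json


def make_descriptor(name, csv_path, fields):
    """Build a minimal Data Package descriptor for one CSV resource."""
    return {
        "name": name,
        "resources": [
            {
                "name": name,
                "path": csv_path,
                "format": "csv",
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


# Hypothetical package describing a municipality table.
descriptor = make_descriptor(
    "municipalities-sample",
    "data/municipalities.csv",
    [("municipality", "string"), ("population", "integer")],
)
print(json.dumps(descriptor, indent=2))
```

The point of the descriptor is that metadata such as column names and types travels together with the data file, which addresses the "attention to metadata" criterion for good open data tools.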