% Created October 9, 2014 14:27
% Trusty URI presentation sources
\documentclass{beamer}
\usepackage{graphicx}
\usepackage{multicol}
\usepackage{bchart}
\usepackage{wasysym}
\usepackage{amssymb}
\usepackage{multirow}
\usepackage{rotating}
\usepackage{tikz}
\frenchspacing
\beamertemplatenavigationsymbolsempty
\usefonttheme{structurebold}
\useoutertheme{infolines}
\useinnertheme{circles}
\setbeamersize{text margin left=8mm, text margin right=8mm}
\setbeamertemplate{frametitle}{\begin{centering}\insertframetitle\par\end{centering}}
\definecolor{emphcolor}{rgb}{0.19,0.22,0.68}
\definecolor{backcolor}{rgb}{0.84,0.85,0.95}
\newcommand{\coloremph}[1]{{\color{emphcolor}#1}}
\newcommand{\colorbold}[1]{{\color{emphcolor}\bfseries#1}}
\newcommand{\arrowitem}{\item[\boldmath$\Rightarrow$]}
\newcommand{\plainfootnote}[1]{\let\thefootnote\relax\footnote{\vspace{-3.5mm}~\\\hspace{1.5mm}\tiny #1\vspace{0.5mm}}}
\setbeamercolor{block title}{bg=emphcolor,fg=white}
\setbeamercolor{block body}{bg=backcolor,fg=black}
\setbeamertemplate{blocks}[rounded]
\begin{document}
\setbeamertemplate{headline}{}
\setbeamertemplate{footline}{}
\title[Trusty URIs]{Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data\\\medskip\texttt{\#eswc2014Kuhn}}
\author[Tobias Kuhn]{\colorbold{Tobias Kuhn} and Michel Dumontier\\\medskip\small\url{http://www.tkuhn.ch} / \url{http://dumontierlab.com}\\\texttt{@txkuhn} / \texttt{@micheldumontier}}
\institute[ETH Zurich]{ETH Zurich / Stanford University}
\date{ESWC\\27 May 2014}
\begin{frame}
\titlepage
\end{frame}
\setbeamertemplate{footline}{%
\color{gray}~~%
\insertshortauthor, \insertshortinstitute \hfill %
\insertshorttitle \hfill %
%\insertshortdate \hfill %
\insertframenumber\,/\,\inserttotalframenumber%
~~\smallskip\newline%
}
\begin{frame}
\frametitle{Motivation 1}
\colorbold{The Semantic Web:} Web content becomes machine-interpretable.
\par\bigskip
\coloremph{Machines} (i.e. algorithms) can then perform --- on large amounts of linked data --- tasks such as:
\emph{automated aggregation, complex searches, problem solving, recommendations, and much more ...}
\begin{center}
\includegraphics[width=2cm]{computer.pdf}
\end{center}
\colorbold{But wait...} even human users are often \coloremph{easily tricked by spam and fraudulent content} found on the web. \colorbold{We should be even more concerned in the case of machines!}
\end{frame}
\begin{frame}
\frametitle{Motivation 2}
Sue publishes a script that allows everybody to \coloremph{replicate} her scientific analysis:\bigskip\\
\begin{minipage}{1.6cm}
\includegraphics[width=1.2cm]{researcher.pdf}
\end{minipage}%
\begin{minipage}{7cm}
\texttt{\coloremph{\small%
\# Download data:\\
wget http://some-third-party.org/dataset/1.4\\
\# Analyze data\\
...
}}
\end{minipage}\bigskip\\
But what if the third party \coloremph{silently changes} that version of the dataset? What if the resource becomes \coloremph{unavailable} at this location? What if the web site later gets hacked and the \coloremph{data manipulated}?
\end{frame}
\begin{frame}
\frametitle{Motivation 3}
\colorbold{Nanopublications:} Atomic pieces of scientific results together with their provenance, all represented in RDF.
\begin{itemize}
\item \coloremph{Citation networks:} nanopubs can cite or refer to other nanopubs
\item Nanopubs are supposed to be \coloremph{immutable}
\end{itemize}
\vspace{-3mm}
\begin{center}
\includegraphics[width=8cm]{nanopubs.pdf}
\end{center}
\vspace{-3mm}
\colorbold{Problem:}
\begin{itemize}
\item A scientist citing something wants to be sure that it is \coloremph{not silently changed afterwards}
\item The current web has \coloremph{no mechanism} to enforce immutability
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Problem}
\begin{center}
{\texttt{http://some-third-party.org/dataset/1.4}}
\medskip\\
\color{emphcolor}{\scalebox{3}{\phantom{?} $\Updownarrow$ \textbf{?}}}
\medskip\\
\includegraphics[width=2cm]{file.png}
\end{center}
Given a URI for a digital artifact, there is no reliable standard procedure for checking whether a retrieved file really represents the \coloremph{correct and original state} of that artifact.
\end{frame}
\begin{frame}
\frametitle{We Need URIs We Can Trust!}
\begin{center}
\includegraphics[width=40mm]{trustyuri.png}\\
\LARGE\textbf{Trusty URIs}
\end{center}
\end{frame}
\begin{frame}
\frametitle{Trusty URIs}
\colorbold{Basic idea:} Use of \coloremph{cryptographic hash values} calculated on digital artifacts.
\par\bigskip
\colorbold{Requirements:}
\begin{itemize}
\item To allow for the verification of entire reference trees, \coloremph{the hash should be part of the reference} (i.e. the URI)
\item To allow for meta-data, digital artifacts should be allowed to contain \coloremph{self-references} (i.e. their own URI)
\item \coloremph{Format-independent} hash for \coloremph{different kinds of content}
\item The complete approach should be \coloremph{decentralized and open}
\item We want to use them \coloremph{right away}
\end{itemize}
\par\bigskip
\colorbold{Example:}
{\footnotesize
\coloremph{\texttt{http://example.org/r1.RA5AbXdpz5DcaYXCh9l3eI9ruBosiL5XDU3rxBbBaUO70}}
}
\end{frame}
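\begin{frame}
\frametitle{Trusty URIs: Length of the Hash Part}
A rough sketch of why the example trusty URI on the previous slide ends in 45 characters, \coloremph{assuming} a 256-bit hash (such as SHA-256) encoded with a 64-character alphabet, i.e. 6 bits per character. These details are assumptions for illustration and are not stated on the slides:
\[
\left\lceil \frac{256}{6} \right\rceil = 43, \qquad 43 \times 6 = 258 \geq 256, \qquad \underbrace{2}_{\mathrm{module}} + \underbrace{43}_{\mathrm{hash}} = 45 .
\]
Under these assumptions, the two-character module identifier (here \texttt{RA}) is followed by 43 characters that carry the hash value.
\end{frame}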
\begin{frame}
\frametitle{Trusty URIs: Range of Verifiability}
\vfill
With the hash as a part of the URI, the \coloremph{``range of verifiability''} extends to referenced artifacts (if they also use trusty URIs):
\vfill\par
\makebox[\textwidth][c]{\includegraphics[width=1.1\textwidth]{verifiability.pdf}}
\vfill
\end{frame}
\begin{frame}
\frametitle{Trusty URI Modules}
Currently, there are two trusty URI modules:
\begin{itemize}
\item \colorbold{FA:} Plain files (i.e. byte sequences)
\item \colorbold{RA:} \coloremph{Sets of RDF graphs}
\item More to come in the future...
\end{itemize}
\par\bigskip
The first character (\colorbold{F} or \colorbold{R}) represents the \coloremph{type} of the module; the second character (\colorbold{A}) its \coloremph{version}.
\end{frame}
\begin{frame}
\frametitle{Example: Nanobrowser}
\begin{center}
\makebox[\textwidth][c]{%
\scalebox{0.9}{%
\setlength{\fboxsep}{0mm}%
\begin{tikzpicture}
\node[inner sep=0mm] (img) at (0,0) {\fbox{\includegraphics[width=13.5cm]{nanobrowser.png}}};
\node[inner sep=0mm] (i) at (-1.5,-2) {\setlength{\fboxrule}{0.5mm}\fbox{\includegraphics[scale=0.4]{nanobrowser-valid.png}}};
\node[circle,fill=white,draw,very thick] at (i.north east) {\textbf{1}};
\draw[->,very thick] (i) -- (-6.1,2.7);
\node[inner sep=0mm] (l) at (1.5,0.5) {\setlength{\fboxrule}{0.5mm}\fbox{\includegraphics[scale=0.4]{nanobrowser-buttons.png}}};
\node[circle,fill=white,draw,very thick] at (l.north east) {\textbf{2}};
\draw[->,very thick] (l) -- (-1.2,2.7);
\end{tikzpicture}%
}}%
\smallskip\par
\url{http://nanobrowser.inn.ac}
\end{center}
\end{frame}
\begin{frame}
\frametitle{Verifiable {\normalfont --- Immutable --- Permanent}}
\begin{center}
\color{green}{\scalebox{7.0}{\checkmark}}
\end{center}
\vfill
Whether or not a given resource is the one a given trusty URI is supposed to represent can be \colorbold{verified with perfect confidence}.
\par\bigskip\footnotesize
(assuming that the trusty URI for the required artifact is known, e.g. because another artifact contains it as a link)
\end{frame}
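\begin{frame}
\frametitle{Verifiable: A Back-of-the-Envelope Estimate}
As a hedged illustration of how strong such a check can be, \coloremph{assume} an ideal hash function with 256-bit output (an assumption; the slides do not name the hash function). The chance that one specific modified file yields the same hash value as the original is then
\[
p = 2^{-256} \approx 8.6 \times 10^{-78} .
\]
Even after $k$ independent attempts, the chance of at least one match is at most $k \cdot 2^{-256}$, which stays negligible for any feasible $k$. This is an idealized bound, not a statement about a particular hash function.
\end{frame}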
\begin{frame}
\frametitle{{\normalfont Verifiable ---} Immutable {\normalfont --- Permanent}}
\begin{center}
\scalebox{5.0}{\includegraphics[width=0.5cm]{pen.pdf}}
\end{center}
\vfill
Trusty URI artifacts are \colorbold{immutable}, as any change in the content also changes its URI, thereby making it a \coloremph{new} artifact.
\par\bigskip\footnotesize
(as soon as your trusty URI has been picked up by third parties, e.g. cached or linked from other resources, every change will be noticed)
\end{frame}
\begin{frame}
\frametitle{{\normalfont Verifiable --- Immutable ---} Permanent}
\begin{center}
\color{brown}{\scalebox{9.0}{\clock}}
\end{center}
\vfill
Trusty URI artifacts are \colorbold{permanent}, as they can be retrieved from the cache of third-party websites if otherwise no longer available.
\par\bigskip\footnotesize
(if there are search engines and web archives regularly crawling and caching the artifacts on the web)
\end{frame}
\begin{frame}
\frametitle{Permanent Digital Artifacts}
\colorbold{Ideally,} a (trusty) artifact should be \coloremph{retrievable via its URI}:
\begin{itemize}
\item[\LARGE $\Rightarrow$] {\footnotesize\texttt{http://my-organization.org/datasets/RA5AbX...}}
\end{itemize}
\medskip\par
\colorbold{But if not,} we can also retrieve it \coloremph{from third-party sources}:
\begin{itemize}
\item[\color{red}{\LARGE $\nRightarrow$}] {\footnotesize\texttt{http://my-organization.org/datasets/RA5AbX...}}
\item[\LARGE $\Rightarrow$] {\footnotesize\texttt{http://hashcache.org/object/RA5AbX...}}
\item[\LARGE $\Rightarrow$] {\footnotesize\texttt{http://artifact-archive.com/artifacts/RA5AbX...}}
\item[\LARGE $\Rightarrow$] {\footnotesize\texttt{http://nasty-server.com/no-need-to-trust-me/RA5AbX...}}
\end{itemize}
\vspace{-3mm}
\begin{center}
\scalebox{5.0}{\includegraphics[width=15mm]{servers.pdf}}
\end{center}
\vspace{-7mm}
\end{frame}
\begin{frame}
\frametitle{Implementations}
\colorbold{(Partial) Implementations in:}
\begin{itemize}
\item \coloremph{Java} {\footnotesize(\url{https://github.com/trustyuri/trustyuri-java})}
\item \coloremph{Python} {\footnotesize(\url{https://github.com/trustyuri/trustyuri-python})}
\item \coloremph{Perl} {\footnotesize(\url{https://github.com/trustyuri/trustyuri-perl})}
\item \emph{more to come...}
\end{itemize}
\vfill
\colorbold{Functions:}
\begin{itemize}
\item \coloremph{General:} CheckFile, RunBatch
\item \coloremph{Module FA only:} ProcessFile
\item \coloremph{Module RA only:} TransformRdf, TransformLargeRdf, TransformNanopub, CheckLargeRdf, CheckSortedRdf, CheckNanopubViaSparql
\item \emph{more to come...}
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Evaluation 1: Nanopubs}
\vspace{-5mm}
\begin{center}
\includegraphics[width=3mm]{file.png}
\includegraphics[width=4mm]{file.png} \includegraphics[width=3mm]{file.png}
\includegraphics[width=4mm]{file.png} \includegraphics[width=3mm]{file.png}
\includegraphics[width=4mm]{file.png} \includegraphics[width=3mm]{file.png}
\includegraphics[width=4mm]{file.png} \includegraphics[width=3mm]{file.png}
\includegraphics[width=4mm]{file.png} \includegraphics[width=3mm]{file.png}
\includegraphics[width=4mm]{file.png} \includegraphics[width=3mm]{file.png}
\includegraphics[width=4mm]{file.png} \includegraphics[width=3mm]{file.png}
\includegraphics[width=4mm]{file.png} \includegraphics[width=3mm]{file.png}
\includegraphics[width=4mm]{file.png} \includegraphics[width=3mm]{file.png}
\includegraphics[width=4mm]{file.png} \includegraphics[width=3mm]{file.png}
\includegraphics[width=4mm]{file.png} \includegraphics[width=3mm]{file.png}
\includegraphics[width=4mm]{file.png} \includegraphics[width=3mm]{file.png}
\includegraphics[width=4mm]{file.png} \includegraphics[width=3mm]{file.png}
\includegraphics[width=4mm]{file.png} \includegraphics[width=3mm]{file.png}
\includegraphics[width=4mm]{file.png} \includegraphics[width=3mm]{file.png}
\end{center}
\vspace{-3mm}
We took \coloremph{$\sim$150,000 nanopublications} from previous work, transformed them to \coloremph{different formats} (TriG, N-Quads, and TriX), and then generated trusty URIs for them.
\begin{itemize}
\item[$\Rightarrow$] \coloremph{For any given nanopub, the same trusty URI was generated for the different formats}
\end{itemize}
\par\bigskip
Then we \coloremph{checked these trusty URIs}, also for corrupted copies of the files (one random byte changed).
\begin{itemize}
\item[$\Rightarrow$] \coloremph{All non-corrupted files were successfully validated}
\item[$\Rightarrow$] \coloremph{All corrupted files either caused errors or failed validation} (except for $<$1\% harmless cases in TriX format where the changed byte is not part of the RDF content)
\item[$\Rightarrow$] \coloremph{Checking with Java in batch mode takes 0.001s per nanopub}
\end{itemize}
\end{frame}
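\begin{frame}
\frametitle{Evaluation 1: Total Checking Time (Estimate)}
Taking the rounded figures from the previous slide at face value, checking the whole collection once in batch mode would take roughly
\[
150{,}000 \times 0.001\,\mathrm{s} = 150\,\mathrm{s} = 2.5\,\mathrm{min} .
\]
This is only an extrapolation from the reported per-nanopub time, not a separately measured result.
\end{frame}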
\begin{frame}
\frametitle{Evaluation 2: Bio2RDF}
\vspace{-5mm}
\begin{center}
\includegraphics[width=12mm]{file.png}
\includegraphics[width=22mm]{file.png}
\includegraphics[width=8mm]{file.png}
\includegraphics[width=15mm]{file.png}
\end{center}
\vspace{-3mm}
To evaluate our approach on \coloremph{larger files}, we transformed and checked 858 RDF files from Bio2RDF.
\begin{itemize}
\item File sizes ranging from 1.4kB to 177GB
\item[$\Rightarrow$] \coloremph{Files smaller than 10MB require less than 3 seconds to be transformed or checked}
\item[$\Rightarrow$] \coloremph{Large files of 2GB require $\sim$5min to be transformed and $\sim$2min to be checked}
\item[$\Rightarrow$] \coloremph{Largest file of 177GB (much larger than memory) required 29h to be transformed and 3h to be checked}
\end{itemize}
\end{frame}
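\begin{frame}
\frametitle{Evaluation 2: Implied Throughput (Estimate)}
The reported times can be turned into rough throughput figures. This sketch \coloremph{assumes} 1\,GB = 1000\,MB and treats the rounded times as exact ($5\,\mathrm{min} = 300\,\mathrm{s}$, $2\,\mathrm{min} = 120\,\mathrm{s}$, $29\,\mathrm{h} = 104400\,\mathrm{s}$, $3\,\mathrm{h} = 10800\,\mathrm{s}$):
\[
\begin{array}{lrr}
 & \mathrm{transform\ (MB/s)} & \mathrm{check\ (MB/s)} \\
2\,\mathrm{GB} & 2000/300 \approx 6.7 & 2000/120 \approx 16.7 \\
177\,\mathrm{GB} & 177000/104400 \approx 1.7 & 177000/10800 \approx 16.4
\end{array}
\]
Under these assumptions, the checking throughput is about the same for both file sizes, whereas the transformation throughput is noticeably lower for the largest file.
\end{frame}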
\begin{frame}
\frametitle{Make This a Community Effort}
\begin{center}
\scalebox{5.0}{\includegraphics[width=12mm]{community.pdf}}
\end{center}
\vfill
\colorbold{Code on GitHub:} \url{https://github.com/trustyuri/}
\bigskip\par
\colorbold{Permissive Open Source License}
\bigskip\par
\colorbold{Open Development:} Let us know if you want to be involved!
\bigskip\par
\colorbold{Wiki} (including wish list):\\
\url{https://github.com/trustyuri/trustyuri/wiki}
\end{frame}
\begin{frame}
\frametitle{Conclusions and Future Work}
\colorbold{Contribution:}
\begin{itemize}
\item \coloremph{Unambiguous URI references} for verifiable, immutable, and permanent digital artifacts
\item Proposal of a central \coloremph{technical pillar} of the (semantic) web
\item In particular for scientific data, where \coloremph{provenance and verifiability} are crucial
\end{itemize}
\par\bigskip
\colorbold{Planned usage:}
\begin{itemize}
\item Next version of \coloremph{Bio2RDF}
\item Nanopublications for \coloremph{neXtProt} (currently $\sim$20 million nanopubs)
\item \coloremph{Nanopub server} (for publishing and archiving nanopubs)
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Thank you for your Attention!}
\vfill
\vfill
\begin{center}
\Large Twitter: \coloremph{\texttt{@txkuhn}} and \coloremph{\texttt{\#eswc2014Kuhn}}
\par\medskip
Web: \coloremph{\url{http://trustyuri.net}}
\end{center}
\vfill
\end{frame}
\appendix
\newcounter{finalframe}
\setcounter{finalframe}{\value{framenumber}}
\begin{frame}
\frametitle{Some Additional Slides Follow...}
\end{frame}
\begin{frame}
\frametitle{Related Approaches}
\colorbold{Git} inspired the design of trusty URIs: Git refers to \coloremph{commits} by hash values calculated in a recursive way.
\par\bigskip
\colorbold{Named Information (ni) URIs:}
\par\smallskip
{~~~~~\footnotesize\url{ni:///sha-256;UyaQV-Ev4rdLoHyJJWCi11OHfrYv9E1aGQAlMO2X_-Q}}
\par\smallskip
(Trusty URIs can be mapped to ni-URIs.)
\vfill
What is \colorbold{missing} in these approaches:
\begin{itemize}
\item Digital artifacts on a \coloremph{more abstract level than byte sequences}
\item Support for \coloremph{self-references}
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Skolemization of Blank Nodes}
The hash also helps us to \coloremph{solve the problem of blank nodes} for canonicalization of RDF content: We use the hash to \coloremph{skolemize blank nodes}:
\medskip\par
{\footnotesize
\texttt{http://foo.org/r3.RACjKTA5dl23ed7JIpgPmS0E0dcU-XmWIBnGn6Iyk8B-U\#\_1}\\
\texttt{http://foo.org/r3.RACjKTA5dl23ed7JIpgPmS0E0dcU-XmWIBnGn6Iyk8B-U\#\_2}\\
\texttt{...}
}
\medskip\par
These URIs are guaranteed to \coloremph{have never been used before} (except possibly for exactly the same content).
\end{frame}
\begin{frame}
\frametitle{Performance for Nanopubs in Batch Mode}
\begin{center}
\includegraphics[width=\textwidth]{performance.png}
\end{center}
\end{frame}
\begin{frame}
\frametitle{Performance for Large Files (Bio2RDF)}
\begin{center}
\begin{minipage}{10cm}
\includegraphics[trim=15mm 11.5mm 15mm 3mm, clip=true, width=\textwidth]{bio2rdf-transform.pdf}\\
\includegraphics[trim=15mm 0mm 15mm 6mm, clip=true, width=\textwidth]{bio2rdf-check.pdf}
\end{minipage}
\end{center}
\end{frame}
\setcounter{framenumber}{\value{finalframe}}
\end{document}