heartpunk/papers.md

## papers.md

      
    Raw
  

              papers.md
            
          
    Database Cracking (link)

This paper explores the possibility of optimizing the accessibility of data as it is read, rather than when it is written. It's a pretty great concept, and at least in their benchmarks it performs well. There are some serious caveats that are skimmed over in the paper, but it's an inspiringly original take, nonetheless.
Dremel: Interactive Analysis of Web-Scale Datasets (link)

This is what it says on the tin. The novel things in this paper are a deterministic, fast algorithm for mapping sparse, deeply nested, arbitrarily repeated, data to a seemingly tabular layout. Additionally, the overall architecture of the system seems to be a novel way of combining existing concepts from the search and database fields. Particularly important is the ability to consider a query finished when some user defined percentage of the data has been received, to limit the effect of straggling nodes in a highly distributed system.
A Relational Model for Large Shared Data Banks (link)

This is the original paper on SQL (or as close to it as I could find). It explores why pre-relational databases were insufficient, and how the relational model fixes these issues. In particular, the nested or side-by-side tree paradigm combined with a lack of query language means that every "query" must be implemented procedurally in a way that fully couples query to data layout. As well, the concept of joins and normalization are introduced in this paper. Originally, the idea was to have "complex domains", or relations between relations and other things. Codd shows how you can avoid the potential problems this may imply for storage by introducing joins.
An Overview of Query Optimization in Relational Systems (link)

This paper was particularly fun to read. It's about how the problems query planners face, and what approaches have been taken over the years. The paper starts with describing a simplified version of the linear join tree centric approach of System R, then fleshes that out a bit more. Then, it goes into extensible query planners. There's more to this paper, but these are the areas I focused on.
The Transaction Concept (link)

This paper describes what transactions are, provides motivation as to why they matter, describes a few ways they can be implemented, and then shows how we can go further. What's remarkable is that this paper is from 1985, and the challenges laid out by Gray are still, to the best of my knowledge, open problems. In particular, he shows the real-world utility of support for nested and long lived transactions. It's reasonable to forgive Gray for not mentioning at all the issues of distributed systems in this context, as this has become more of a practical concern only in in relatively recent times.