Skip to content

Instantly share code, notes, and snippets.

@nikomatsakis
Created January 29, 2019 19:34
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save nikomatsakis/5b119a71465549b61743e8739a369b5e to your computer and use it in GitHub Desktop.
Save nikomatsakis/5b119a71465549b61743e8739a369b5e to your computer and use it in GitHub Desktop.

Salsa In More Depth (2019.01)

#[salsa::query_group(InputsStorage)]
pub trait MyQueryGroup {
  #[salsa::input] // `set_manifest` is auto-generated
  fn manifest(&self) -> Manifest;

  #[salsa::input] // `set_source_text` is auto-generated
  fn source_text(&self, name: String) -> String;

  fn ast(&self, name: String) -> Ast;

  fn whole_program_ast(&self) -> Ast;
}

fn ast(db: &dyn MyQueryGroup, name: String) -> Ast {
  let source_text: String = db.source_text(name);
  
  // do the actual parser on `source_text`
  
  return ast;
}

fn whole_program_ast(db: &dyn MyQueryGroup, name: String) -> Ast {
  let mut ast = Ast::default();
  for source_file in db.manifest() {
    let ast_source_file = db.ast(name);
    ast.extend(ast_source_file);
  }
  return ast;
}

Database storage

  • Database has internally a shared storage with:
    • revision (R0)
    • one struct per query group
      • one map per query
  • db.set_input_file(``"``a.rs``"``, "``fn main() { }``"``)
    • storing into the input_file map with the key a.rs and the value given
      • store the (new) revision R1
    • increment the revision (R1)
  • db.set_input_file("b.rs", "fn foo() { }")
    • go to revision R2
    • setters are &mut self
  • db.type_check() // &self

Query storage

  • Input query:

    • (Key) → (Value, Revision)

      db.source_text("a.rs"): // an input

      • record source_text("a.rs") as a dependency of the currently active query
      • look it in up the hash map and clone the result
        • and return the associated revision
  • Derived query:

    • (Key) → (Value, Dependencies, verified_at: Revision, changed_at: Revision)

      db.ast("a.rs"):

      • check for cycle -- panic
      • record ast("a.rs") as a dependency of the currently active query
      • we will check to see if we have a memoized value already
        • check if the verified_at field is equal to the current revision
          • if so, we will clone that value and return it
        • for all dependency d we had before, is changed(d) < verified_at
          • if so, we can update verified_at to the current revision
      • if there is no memoized value:
        • push a fresh record onto the "Currently active query" stack
        • invoke the function let v = ast(self, "a.rs".clone())
          • track the dependencies as ast executes
        • pop off the record, extract the dependencies from it
          • dependencies (Vec<DatabaseKey>) would be [source_text("a.rs")]
          • track max changed revision
        • store v into the map with the key a.rs
          • value: v
          • dependencies
          • changed_at: R2
          • verified_at: R2

      impl MyQueryGroup for T { fn ast(&self, a: str) { ... sometimes invoke the free fn ast(self, key) .. } }

      db.set_source_text("a.rs", "fn main() {}") // "a.rs" -> ("...", R1), now db is in R1 db.set_source_text("b.rs", ..) // "b.rs" -> ("...", R2), now db is in R2 db.ast("a.rs") // (verified_at: R2, changed_at: R1, deps: [source_text(a.rs)]) db.set_source_text("b.rs", ..) // "b.rs" -> ("...", R3), now db is in R3 db.ast("a.rs") // (verified_at: R3, changed_at: R1, deps: [source_text(a.rs)]) // on entry: // (verified_at: R2, changed_at: R1, deps: [source_text(a.rs)]) // iterate over the dependencies: // source_text(a.rs) -- the most recent version is R1 // nothing changed that affects us, so we just update verified_at to R3 // // (verified_at: R3, changed_at: R1, deps: [source_text(a.rs)]) db.set_source_text("a.rs", "fn main() { }") // "a.rs" -> ("...", R4), now db is in R4 db.ast("a.rs") // on entry: // (verified_at: R3, changed_at: R1, deps: [source_text(a.rs)]) // iterate over the dependencies: // source_text(a.rs) -- the most recent version is R4 <-- did change since we last verified // let old_value = /* the old ast */ // let new_value = re-execute the ast method
      // if old_value == new_value { // update verified_at but leave changed_at alone // (verified_at: R4, changed_at: R1, deps: [source_text(a.rs)]) // } else { // update both verified_at and changed_at to current revision // }

second generation derived queries and backdating

db.set_source_text("a.rs", "fn main() {}") // "a.rs" -> ("...", R1), now db is in R1
db.set_source_text("b.rs", ..) // "b.rs" -> ("...", R2), now db is in R2
db.whole_program_ast() // will execute (never invoked before):
  - db.manifest()
  - db.ast("a.rs")
  - db.ast("b.rs")
  - (verified_at: R2, changed_at: R2, deps: [manifest(), ast(a.rs), ast(b.rs)])

db.set_source_text("a.rs", "fn main() { }") // "a.rs" -> ("...", R4), now db is in R4
db.whole_program_ast() // can we reuse the result?
  - iterate over the deps list:
    - manifest(): this has not changed since R2, this is fine
    - ast(a.rs):
      - "have you changed since R2"?
      - look at its own input, determine they have changed
      - re-execute and produce a new ast
      - compare the old and new ast -- see that they are the same
      - "no, I have not changed" -- changed_at value is <= R2
    - ast(b.rs)
      - no input has changed since R2
      - trivially still valid
  - the value is still value, and we can just update `verified_at`

power of backdating

Strategies for re-use

enum Ast {
  Module(Vec<Ast>)
}

enum Ast {
  Module(Vec<AstId>)
}

// AstId becomes effectively a "path"

fn ast_for_id(AstId) -> Ast

mod foo { // value for "foo" `vec![foo::Bar]`
  struct Bar { ... } // has id `foo::Bar` // value for foo::Bar` would be the fields
}


fn main() {
  let x = 22;
  println();
  x.foo();
}

Questions for the future

  • How salsa works internally
  • What makes a good key/value type
    • why we use Seq and Text instead of Vec and String
    • interning — what is that?
      • indexmap — indexable hashmap
      • gives us an integer that represents the larger structure
  • Strategies for re-use
  • On-demand thinking
  • Parallel patterns
  • Cancellation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment