SPLASH Day 4 Notes

Notes at SPLASH 2022, 2022-12-05 to -10, Auckland, NZ

Thu 8 Dec: SPLASH 22 main tracks


Keynote: Mary Shaw "What does it mean to be a programming language"

Mary Shaw: of SWE fame, foundational work in modularity and SW architecture

  • (this talk: reprise of a HOPL 22 talk/paper)

Intro

  • myths as a narrative to make sense of the human condition

  • myths in software

    • software = composing programs from spec in sound languages and verifying them, then composing components in an ordered manner

    • Actually, programmers try to solve problems, and the "software" is just the code.

  • We should look at these phenomena with a clear mind and treat them just like our code

"Professional programmer" personas:

  • Frieda, iterating code-and-fix each year with a budget spreadsheet
  • Scientific Programmer: incrementally building an analytic program from stepwise application and inspection
  • Photoshop scripting: a "replay" is supported but is barely usable due to scope issues and environment effects

The code is the software:

  • coding is a very small part of programming

  • scripting and multi-language interfaces prevail

  • systems are "coalitions"

  • technology ecosystems exist (-> stackoverflow analysis screenshot) and are often dominated by non-rigorous centerpieces

  • Notation of SW system architecture not well-supported in languages

  • tools for "vernacular developers" are sparse and inadequate -> Screenshot about "low-code"

Programming Language Purity

  • "Mathematical tractability myth":

    • trade-off between generality and power (-> the power curve):
      • DSLs trade generality for power. Support for particular domain-specific tasks comes as restrictions on what can be done.
  • About UML (as an expression of "mathematical tractability"!)

    • collection of notations, each had a purpose and a formal basis

      • turned out to just be incomplete
      • did not handle interactions between several notations
  • about badly designed languages -> Fred Brooks: "The Design of Design" (new book from a few years back)

    • Brooks: "JCL [Job Control Language] was indeed a language, but not perceived as such by its designers"
  • Correctness myth:

    • "Software can be provably correct" and "specifications are sufficient, complete, static, purely functional"

    • In practice, correctness proofs are (usually) too expensive.

    • specifications are incomplete, "credentials" (what we know about the code) are incomplete and evolving. 'Non-functional properties' (should be called "extra-functional")

    • Correctness is replaced by "Fitness for a purpose"

      • real software is uncertain. Even verified modules execute in unverified environments.

      • software component versioning (in cloud), physical system environments, comprehension of requirements, ..

  • Specification myth

    • Full specification is possible, and should come first

    • contrast with the scientific programming activity

      • spot checks, barely guided iteration of versions
    • "exploratory programming" (-> Donald Schön)

The AI (or ML) myth

  • AI requires an entirely new programming paradigm (?)

    • in reality, AI experimentation has often provided impulses to SWE whose origins were later forgotten
  • collection and curation of large datasets

    • ML dependence on data curation: incorrectness, format errors, inconsistency

    • data provenance techniques and processes are necessary

  • Things we say about AI but which apply to other software, too

    • understandability

    • non-determinism

    • lack of specification

    • unpredictable effects (hidden dependencies, feature interaction)

    • criteria for correctness

Wrap-up: Myths should inspire us, not hold us captive

Assurance session

C to Checked C by 3C

(Aravind Machiry)

  • Checked C's restricted pointer types characterising the pointed-at structure (array, singleton, null-terminated); see the sketch after these notes

    • no pointer arithmetic allowed for singletons, automatic null check
    • array indexing not allowed unless bounds known, dynamic checks added when bound annotation present
    • null-terminated arrays have automatic bounds management, arithmetic is allowed (statically checked)
  • Other features: interface types, checked/unchecked scope, safe casts, ..

  • goal here: annotate automatically to avoid spending developer effort

  • inference of checked types and bounds, mostly automatic and asking for human help for hard cases

    • inference: interesting case of "wild" pointers, approach tries to localise their use
    • two constraint sets to do the two analyses (wild scope, checked annotation)
  • bounds inference: (-> correlation analysis)

    • associate all pointers with bounds from creation
    • propagate bounds to all use sites (transitively, pointer flow graph and scalar flow graph)
    • interesting case: context sensitivity (in structs)
  • Heuristics for indefinite cases

    • consistent upper bound (all uses check against a common variable)
    • name prefix
    • adjacent method parameter
  • for unsolved cases:

    • find root cause for wild pointer inference and ask human
    • tool provides an IDE integration to auto-apply fixes
  • Questions:

    • Perf. evaluation after conversion, overhead
      • no evaluation, but impact expected to be low (maybe 10% max)
    • what if heuristics yield wrong checks
      • a report is generated that can be inspected. a second run of the checker will detect wrong annotations.
    • runtime checks are inserted. does it lead to unwanted crashes?
      • yes. should probably be modified.
    • what if/when you cannot convert wild pointers?
      • major reason: macro expansion (considered wild).
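
A minimal sketch of the three checked pointer kinds described above, in Checked C syntax (my illustration, not code from the talk; it compiles only with the Checked C clang fork, and all function names are made up):

```c
#include <stddef.h>

// _Ptr<T>: singleton pointer. No pointer arithmetic is allowed;
// a non-null check is inserted automatically at each dereference.
int deref(_Ptr<int> p) {
  return *p;
}

// _Array_ptr<T>: indexing is only allowed when bounds are known;
// the count(n) annotation lets the compiler add dynamic checks.
int sum(_Array_ptr<int> a : count(n), size_t n) {
  int s = 0;
  for (size_t i = 0; i < n; i++)
    s += a[i];                  // checked against count(n)
  return s;
}

// _Nt_array_ptr<T>: null-terminated array. Bounds are managed
// automatically; moving past the known bound is permitted once
// the terminator test has been made (statically checked).
size_t length(_Nt_array_ptr<const char> s) {
  size_t n = 0;
  while (s[n] != '\0')
    n++;
  return n;
}
```

The `sum` signature also illustrates the "adjacent method parameter" heuristic from the bounds-inference list: 3C would guess from adjacency that `n` bounds `a`.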

SPLASH Onward! Essays

Essay about virtual vs "terrestrial" conferences

  • reporting about a "PLoP conference" (writing workshop), which is a particular kind of forum in a dedicated place (like Dagstuhl seminars). Few conferences are like that.
  • Fair point, and apparently backed by research: unfamiliar environments instigate creative impulses.

Ole Lehrmann Madsen: What OO programming was supposed to be

  • Intro:

    • dominance of OO programming vs. blog posts etc. (of varying competence) claiming that it causes us to suffer
    • more fundamental critique: original ideas got diluted and neglected
    • Original goal: modelling, not inheritance (code reuse)
      • Beta language setting out to deliver both
      • explained time and again, even at kindergarten level
      • yet many textbooks and tutorials do not make it central, maybe not even explicit.
      • focus on code reuse is misguided/misguiding (meaning inheritance)
  • Benefit of modeling is lost when reuse dominates

    • typical mistakes include false inheritance (stack < vector)
    • methods being "virtual" became default in a few languages,
      • going fundamentally against the idea of modelling the same behaviours for objects of a family
      • provides maximum flexibility for code reuse
      • but does not enforce the idea of "same behaviour"
    • Alan Kay: essence of OO is messages between objects (inheritance secondary)
    • Programming is modelling
      • missing aspect of programming curriculum (and textbooks) today
  • phenomena vs concepts

    • "extends" extends the intensional aspect, explicitly not the extension

Keynote 2: Robert O'Callahan, The state of Debugging in 2022

Problem space, Historical development

  • 1987: breakpoints, source view, stepping, variable inspection
  • 2022: printf debugging prevails
  • What are the problems, why can't we solve them?

The problem with classical debugging

  • pause, inspect, continue loop

    • some software just cannot be stopped, or does unrealistic things when stopped
    • requires iterative re-runs to trace the root cause of wrong data
      • non-deterministic bugs are hard to observe in this way
    • record execution to enable revisiting identical state sequence
    • drawback: overhead, impractical to apply to production sized data, cannot be always-on
  • single program state display

    • vs. the need to understand change over time of values in the state
    • "singlestepping is always a sign of inadequate debugger functionality"
    • solution: query-based and omniscient debugging
      • collect detailed information about (all or) interesting data over time, and a way to present and query it (DB backed IDE)
      • upfront cost, unfamiliar interface
  • Siloed debuggers

    • usually specific to a language, new interface must be learned
    • alleviated by IDEs presenting a unified debugger interface via plugins
    • multi-language projects are extremely hard to work with
  • debuggers are often an afterthought

    • tools lag behind the evolution of the language and ecosystem
    • examples: cloud-deployed apps, optimised code (and relying on optimisations), novel language features, special build requirements
  • Cultural problems (reputation of debuggers)

    • debuggers often don't work, people give up on them

Could interactive debugging be bad inherently?

  • can we just use logging?
    • seems to work, and avoids some technical barriers
    • requires iterations adding log statements, potential non-determinism, problem of data volume, change of program behaviour
    • cannot track dataflow (only captures single states)
  • debugging is hard, so ..
    • writing small tests (requires mocking etc.)
    • disabling flaky tests instead of fixing => loss of coverage
    • better debuggers would address this

Where from here?

  • better understanding of the debugging process
    • time to understand the root cause and what/how tools help with
  • pervasive always-on recording (need to solve technical issues)
    • does not solve inherent non-determinism (data races)
  • better or broader?
    • "the PII problem": cannot debug real production data

ICFP papers 2020/2021

Certifying the Synthesis of Heap-Manipulating Programs

(Ilya Sergey)

  • using separation logic to express disjoint segments of heap

  • example: copying a singly-linked list

    • using SuSLik (generates a program from a separation logic spec; such a spec is sketched after these notes)
  • this paper: providing proof certificate derived from insights during synthesis

  • how Suslik works

    • -> "Synthetic separation logic": gradual reduction of a goal, emitting program parts as a by-product.
    • problem: cannot rely on program structure to leave post-condition unchanged (the program is being composed on the side)
      • if post-condition is transformed, "delay their insight"
    • using one of several possible separation logic verifier libraries in Coq.
  • Contributions: implemented an abstract proof evaluation framework, 3 instances, evaluation on characteristic benchmarks

  • example walk-through -> screenshots

    • different kinds of steps
    • delayed post-condition modifications
    • carrying a proof context to store unifications with variables
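
A hedged sketch of the kind of specification such a synthesis starts from, for the list-copy example above (my reconstruction, not a slide from the talk; `lseg(x, S)` is the usual list-segment predicate saying the linked list from `x` holds contents `S`):

```latex
% Pre: r stores a pointer x to a linked list with contents S.
% Post: r stores some fresh pointer y; original and copy hold the
%       same contents S in disjoint heap regions (separating *).
\{\, r \mapsto x * \mathsf{lseg}(x, S) \,\}\;\; \mathit{listcopy}(r)\;\; \{\, \exists y.\; r \mapsto y * \mathsf{lseg}(x, S) * \mathsf{lseg}(y, S) \,\}
```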

"Kindly bent to free us"

(Peter Thiemann)

  • prototype language, resource management using kinds and constrained types

  • concept: distinguish kinds of use (linear, affine, unrestricted) for variables.

    • automatically inferred during type inference, no annotations required
  • paper goal: automate parts of Rust's analysis (which is fully manual by default)

  • (small) kind system with subkinding and polymorphism (the order is sketched after these notes)

  • functions and capture

    • captured linear variables -> function usage mode restricted to linear
  • "borrowing" concept, distinguishing reads and writes

    • borrows have a kind of their own. Shared borrows = unrestricted, exclusive borrows = affine
    • important distinction between consuming use and (exclusively) borrowed use
    • restrictions on simultaneous borrows (and direct use) and on scoping of borrows (which includes nesting, complicated)
    • uses sophisticated type simplifications to get rid of all subkinding constraints accumulated through region nesting and function usage restrictions
  • Questions:

    • Elaborate on similarities with and differences to Rust
      • all essential concepts from Rust are present
      • however, this work comes from a different background (FP)
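
A note on the kind order (my reconstruction of the paper's setup, not a slide from the talk): the kinds form a small subkinding chain

```latex
\mathsf{un} \;\le\; \mathsf{aff} \;\le\; \mathsf{lin}
```

so an unrestricted value may be used where an affine or linear one is expected, which is how "shared borrow = unrestricted, exclusive borrow = affine" slots borrows into the same discipline.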

"Stable relations and abstract interpretation of higher-order programs"

(Benoit Montagu)

  • "stable relations" as opposed to deriving outputs classically from given input

    • unnecessary inputs won't change anything (stability; a formal sketch follows these notes)
  • environment of a denotational semantic interpretation stores stable relations

    • compositional interpretation, result is stable
  • straightforward extensions for pairs and projections

  • interpretation of lambda terms

    • universally quantified over both all possible outcome pairs and all input subsumptions (to achieve stability)
  • implemented as a POC for stable relations
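
One plausible formalisation of the stability property above (my reconstruction, not notation from the talk): a relation R between environments and values is stable when enlarging the environment with unused inputs preserves membership:

```latex
R \text{ is stable} \iff \forall \rho,\, \rho',\, v.\;\; (\rho, v) \in R \;\wedge\; \rho \sqsubseteq \rho' \;\Longrightarrow\; (\rho', v) \in R
```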
