Skip to content

Instantly share code, notes, and snippets.

@rmoehn
Last active August 29, 2015 13:57
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save rmoehn/9627330 to your computer and use it in GitHub Desktop.
Save rmoehn/9627330 to your computer and use it in GitHub Desktop.
Manuscripts for oral exam in distributed systems
01 Remote Procedure Call + RMI
------------------------------
Motivation and Idea
- server offers some service, client wants to use it
- traditional: socket connection to server, use some text-oriented
protocol with requests and responses (fx HTTP) – PIC!
- idea: want to use the the server's service just as if it was another
library; function call whose body is executed remotely – PIC!
Mode of Operation
- client calls function (client stub), communicates with server stub,
executes body code, responds - PIC!
- arguments and return values have to be made available to the two
parties
- call by value – cannot modify original object
- call by reference – can modifiy original object
- call by copy/restore
- evtl. synch/asynch RPC, Marshalling with problems
Concrete Implementation: Java RMI
- Remote objects and stubs: PIC!
- call by value: Serializable
Finish: Just one implementation. Principles presented pefore also
implemented by many other programming languages. In modern times using
JSON-RPC, fx.
02 Naming
---------
Necessity
- have multiple entities, fx machines
- want to refer to them, fx because only one of them has mail server
- has to have a name – naming system to get from the name to the entity
- very important for all computer systems
Types
- flat naming:
- just bits, no location information
- 2620:0:862:ed1a::1
- structured naming:
- names composed of human-readable names, often with location
information
- de.wikipedia.org
- attributed-based naming
- giving properties an entity should have, name system decides
- /C=DK/O=Aarhus Universitet/OU=Comp. Sc./CN=Web server
Flat Naming
Structured naming
Attribute-based Naming
03 Clock Synchronisation
------------------------
Problematic Time
- need time information in many places: ordering of events, measurements
– timestamps, expiry of permissions, real-time guarantees
- easy in centralised systems (one machine): one time source
- central time source too slow in distributed systems ⇒ time source on
every machine
Time Sources
- official time: UTC, coming from atomar clocks
- obtainable by DCF77, GPS, person
- set time on computer, keeps it using a quartz and some electronics
- not absolutely precise (plus leap second-agnostic): increasing
difference with UTC, also between machines
- resynchronisation when difference becomes too big
Berkeley Algorithm
- PIC!
- only group agrees on time – no global source
Cristian's Algorithm and NTP
- ask some time server for the current time – Cristian's algorithm for
getting a good time value in spite of network and processing delays
- time servers organised in strata: if one from a lower stratum adjusts
its time to one of stratum n, becomes of stratum n-1 ⇒
estimate/maintain quality of time information
- evtl. present Cristian's algo with PIC!
Finish: Nice for giving machines approximately the same time, but
complete synchrony impossible. Necessary for ordering. ⇒ logical clocks
05 Mutual Exclusion and Election
--------------------------------
Purpose
- two rather different things
- mutual exclusion: when multiple machines want to access a resource -
in general only one at a time allowed
- election: when a network has a central coordinator who fails, ways to
find a new one
Approaches to Mutual Exclusion
- easy on one machine: obtain a lock from the OS
- application to distributed systems: ask central authority for
permission – gives it to only one process at a time – PIC!
- pro: very simple (BIG advantage)
- con: coordinator is single point of failure, performance bottleneck,
delayed response ambiguous
- decentralised algorithm: n-times replicated resource, each with
coordinator. Ask them for their votes. Access when more than half vote
OK, no access when NOTOK.
- pro: coordinators are allowed to fail
- con: "only" probabilistically correct, bad resource utilisation when
many requests
- distributed algorithm: Send request to all others. Response depending
on: not interested, interested, too, holds resource
- pro: distributed (!), good for you in an abstract way like eating
spinach and learning Latin
- con: needs clock, multiple points of failure (can be repaired),
needs group administration, multiple bottlenecks (can be repaired)
- token ring: Token send around logical ring of processes. Only who has
token is allowed to access resource.
- pro: quite simple
- con: detection of token loss difficult, regenerating token difficult
Election
- find a new coordinator when the old one has died, fx for generating
new token in token ring
- bully, ring, wireless
- evtl. explain that stuff
Finish: Presented two things, mutual exclusion and election. No really
good solutions for the former, good solutions for the latter, but a bit
boring.
06 Data-centric Consistency
---------------------------
Motivation
- replication: want data in different places for reliability and
performance, concurrent access: can lead to inconsistencies – PIC!
- having the same data at all places at all time (total consistency) is
too demanding, undermines purpose of replication
- loosen consistency in well-defined ways: consistency models
Consistency Models Overview
- continuous consistency: metrics for (in)consistency, take action when
degree of inconsistency too high
- consistent ordering: extra restrictions on the order in which
tentative updates updates are commited at the replicas
- sequential consistency: all replicas see the same order of writes
and reads
- causal consistency: all replicas see the causally related writes in
the same order
- entry consisitency: bracket access by locks, bring data up to date
when someone acquires lock on it
Implementation – Example
- quorum-based protocols: Old, but applies principle often used in
distributed systems: voting. Interesting, because not probabilistic,
but always correct and terminating.
- Gifford/Thomas: When writing, ask N/2 + 1 servers for permission and
increment version number. When reading, read file from N/2 + 1
servers. – Ok when all version numbers the same. – PIC!
- Gifford: use other numbers than N/2 + 1, ensure overlap – PIC!
Finish: Many other protocols. (Remember primary-based.)
07 Client-centric Consistency
-----------------------------
- replication: want data in different places for reliability and
performance, clients access same data at different replicas – PIC!
- having the same data at all places at all time (total consistency) is
too demanding
- loosen consistency in well-defined ways: consistency models
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment