- Which numbers are good / To trust or not / Take a closer look / Graphs and charts
YouTube: https://www.youtube.com/watch?v=OSGv2VnC0go
Note: This was 2013, so exercise discretion
- Use reversed(list) to iterate backwards over a list
- Use enumerate(list) to get (index, item) pairs
- Use zip(list1, list2) to iterate over lists in parallel as (item1, item2) pairs. In Python 2, zip builds the whole result in memory, so use izip there; in Python 3, zip is already lazy
- iter(func, bk) calls func repeatedly until it returns the sentinel value bk
- A for loop can have an else clause (remember it as "nobreak"), which runs only if the loop finished without hitting a break
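A short sketch of the idioms above, written for Python 3 (where zip is lazy and izip no longer exists). The sample lists and the `find` helper are made up for illustration.

```python
import io
from functools import partial

colors = ["red", "green", "blue"]
names = ["ann", "bob", "cy"]

# reversed(): walk the list back to front without index arithmetic
backwards = list(reversed(colors))

# enumerate(): get (index, item) pairs
indexed = list(enumerate(colors))

# zip(): walk two lists in parallel
pairs = list(zip(names, colors))

# iter(callable, sentinel): keep calling read(3) until it returns ""
stream = io.StringIO("abcdefgh")
blocks = list(iter(partial(stream.read, 3), ""))


def find(seq, target):
    """Return the index of target in seq, or -1 if it is absent."""
    for i, value in enumerate(seq):
        if value == target:
            break
    else:
        # "nobreak": reached only when the loop ran to completion
        return -1
    return i


print(backwards, indexed, pairs, blocks, find(colors, "blue"), find(colors, "pink"))
```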
This is a long week
- Mismatch: Large and unstructured data; lots of reads and writes; sometimes write heavy; foreign keys rarely needed; joins infrequent
- Needs: Speed, no single point of failure (SPoF), minimal total cost of operation, fewer sysadmins, incremental scalability, scale out (not up)
- Allows for easy sharing and discovery of files (famously music files) in a P2P manner
- Large number of client systems (peers) and few servers. Peers store the files themselves while servers store directory information
- On upload, a client connects to a server and uploads the list of files it wants to share. The server stores information such as file metadata and the client's IP. On search, the client queries the server, which returns the matching entries it has. The requesting client then pings each host in the list to measure transfer rates and presents them to the user. On download, the client fetches the file directly from the hosting node
- TCP for communication
- Servers use ternary tree for storage/search
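The upload/search/download flow above can be sketched roughly as follows. This is an in-memory illustration only: `DirectoryServer`, `pick_fastest`, the IPs, and the transfer rates are all hypothetical, a plain dict stands in for the ternary tree, and no real networking happens.

```python
from collections import defaultdict


class DirectoryServer:
    """Hypothetical stand-in for a directory server: it stores who has which file, never the files."""

    def __init__(self):
        # filename -> set of peer IPs sharing it (a dict here, not a ternary tree)
        self._index = defaultdict(set)

    def upload_list(self, peer_ip, filenames):
        """A peer announces the files it is willing to share."""
        for name in filenames:
            self._index[name].add(peer_ip)

    def search(self, keyword):
        """Return {filename: [peer IPs]} for every filename containing keyword."""
        return {
            name: sorted(peers)
            for name, peers in self._index.items()
            if keyword in name
        }


def pick_fastest(hosts, measure_rate):
    """Client side: 'ping' each host via measure_rate and choose the best one."""
    return max(hosts, key=measure_rate)


server = DirectoryServer()
server.upload_list("10.0.0.1", ["song_a.mp3", "song_b.mp3"])
server.upload_list("10.0.0.2", ["song_a.mp3"])

hits = server.search("song_a")
# Made-up transfer rates standing in for real ping measurements
rates = {"10.0.0.1": 40, "10.0.0.2": 95}
best = pick_fastest(hits["song_a.mp3"], rates.get)
print(hits, best)  # the download would then go directly to `best`
```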
Sending messages to multiple nodes in a network
Requirements: Fault tolerance + Scalability
Issues: Nodes may crash, packets may be dropped, network delays, many nodes
Approaches: Centralized, Tree based, or...
Periodically, a node sends out the message to b random other nodes. Each of the b nodes does the same.
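A minimal simulation of this push-style gossip, assuming synchronous rounds, no crashes, and no dropped packets. The function name `gossip`, its parameters, and the choice of node 0 as the initial sender are assumptions made for the sketch.

```python
import random


def gossip(n, b, max_rounds, seed=0):
    """Simulate push gossip among n nodes; return the infected count after each round."""
    rng = random.Random(seed)
    infected = {0}          # node 0 starts with the message
    history = [len(infected)]
    for _ in range(max_rounds):
        if len(infected) == n:
            break           # everyone has the message
        newly_reached = set()
        for node in infected:
            # each node holding the message picks b random *other* nodes
            others = [p for p in range(n) if p != node]
            newly_reached.update(rng.sample(others, b))
        infected |= newly_reached
        history.append(len(infected))
    return history


history = gossip(n=50, b=2, max_rounds=100)
print(history)
```

The count grows quickly at first and slows near the end, since late rounds mostly hit nodes that already have the message.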
Working definition: Cloud = lots of storage + compute cycles nearby
My favorite, from Wikipedia:
Cloud computing is an information technology (IT) paradigm that enables ubiquitous access to shared pools of configurable system resources and higher-level services that can be rapidly provisioned with minimal management effort, often over the Internet
A single-site cloud (datacenter) consists of:
- Compute nodes, grouped into racks
A distributed streaming platform
Horizontally scalable, fault-tolerant message system. Allows multiple publishers to write to it and multiple subscribers to read from it. Each server in a Kafka cluster is a broker.
Publishers write to topics, which consumers subscribe to. Topics are split into partitions, which are distributed across the cluster. Ordering of events is maintained within each partition, not across partitions. Messages are retained in the system for a configurable amount of time and are persisted to disk.
Consumer groups are sets of servers that perform the same function (more servers for horizontal scale). Within a subscribed consumer group, each message in a topic goes to exactly one server, so the load is balanced just like a message queue. Across groups, each message is delivered to every subscribed consumer group, just like pub/sub.
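To make the queue-versus-pub/sub behaviour concrete, here is a toy in-memory model. It is not the Kafka client API: `Topic`, `deliver`, the CRC32-modulo partitioning, and the round-robin partition assignment are all simplifying assumptions for illustration.

```python
from zlib import crc32


class Topic:
    """Toy topic: a fixed number of append-only partition logs."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Assumption: hash the key to pick a partition, so one key always
        # lands in one partition and keeps its relative order there.
        p = crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)
        return p


def deliver(topic, groups):
    """groups maps group name -> list of consumer names.

    Every group sees every message (pub/sub); inside a group each partition
    is owned by exactly one consumer (queue-style load balancing).
    """
    result = {}
    for group, consumers in groups.items():
        received = {c: [] for c in consumers}
        for p, log in enumerate(topic.partitions):
            owner = consumers[p % len(consumers)]
            received[owner].extend(log)
        result[group] = received
    return result


topic = Topic(num_partitions=4)
messages = [f"event-{i}" for i in range(10)]
for i, msg in enumerate(messages):
    topic.publish(f"k{i}", msg)

out = deliver(topic, {"billing": ["b1", "b2"], "audit": ["a1"]})
print(out)
```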
Documentation here
Predictable and repeatable deployments
AWS CloudFormation gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion.
A template is a JSON-formatted file that specifies the resources and how they fit together.
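As a rough illustration, the snippet below builds a very small template as a Python dict and serializes it to JSON. The logical name `NotesBucket` and the description text are made up; the template declares a single S3 bucket with default settings and is a sketch, not a recommended production stack.

```python
import json

# Hypothetical minimal template: one resource, no parameters or outputs
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Illustrative stack with a single S3 bucket",
    "Resources": {
        "NotesBucket": {              # logical name, chosen for this example
            "Type": "AWS::S3::Bucket"
        }
    },
}

template_body = json.dumps(template, indent=2)
print(template_body)
```

The resulting JSON text is what would be handed to CloudFormation when creating or updating a stack.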