9618 Notes

Data representation

File organization

Serial file organization

The serial file organization method physically stores records of data in a file, one after another, in the order they were added to the file. New records are appended to the end of the file. Serial file organization is often used for temporary files storing transactions to be made to more permanent files.

For example, storing customer meter readings for gas or electricity before they are used to send the bills to all customers. As each transaction is added to the file in the order of arrival, these records will be in chronological order.
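
A minimal Python sketch of the idea, assuming a simple comma-separated record layout and a file name chosen purely for illustration: new records are appended to the end of the file, so the file stays in chronological order of arrival.

```python
# Append a new meter reading to a serial (transaction) file.
# Records are stored one after another in order of arrival.
def append_reading(filename, customer_id, reading):
    with open(filename, "a") as f:          # "a" = append to the end of the file
        f.write(f"{customer_id},{reading}\n")

append_reading("readings.txt", "C1001", 5342)
append_reading("readings.txt", "C0042", 1207)   # stored after C1001, regardless of key order
```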

Sequential file organization

The sequential file organization method physically stores records of data in a file, one after another, in a given order. The order is usually based on the key field of the records as this is a unique identifier.

For example, a file could be used by a supplier to store customer records for gas or electricity in order to send regular bills to each customer. All records are stored in ascending customer number order, where the customer number is the key field that uniquely identifies each record. New records must be added to the file in the correct place.
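
A minimal sketch of keeping a sequential file in ascending key order, with an invented record layout; a real system would more likely merge a sorted transaction file into a new master file, but rewriting the file shows the principle that each new record must go into its correct place.

```python
# Insert a record into a sequential file kept in ascending customer-number order.
def insert_record(filename, customer_id, name):
    try:
        with open(filename) as f:
            records = [line.rstrip("\n").split(",") for line in f]
    except FileNotFoundError:
        records = []
    records.append([customer_id, name])
    records.sort(key=lambda rec: rec[0])        # order the file on the key field
    with open(filename, "w") as f:              # rewrite the file in key order
        for rec in records:
            f.write(",".join(rec) + "\n")

insert_record("customers.txt", "C0042", "Ada")
insert_record("customers.txt", "C0007", "Alan")   # stored before C0042, not at the end
```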

Random file organization

The random file organization method physically stores records of data in a file in any available position. The location of any record in the file is found by using a hashing algorithm on the key field of a record.
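
A minimal sketch of the hashing idea with fixed-length record slots in a binary file; the hash function, slot count and record size are all assumptions, and a real implementation would also need a strategy for handling collisions (e.g. probing to the next free slot).

```python
import os

RECORD_SIZE = 32      # fixed-length records, so a slot number maps to a byte offset
NUM_SLOTS = 100

def slot_for(key):
    # The hashing algorithm applied to the key field gives the record's position.
    return sum(ord(ch) for ch in key) % NUM_SLOTS

def write_record(filename, key, data):
    mode = "r+b" if os.path.exists(filename) else "w+b"
    with open(filename, mode) as f:
        f.seek(slot_for(key) * RECORD_SIZE)             # jump straight to the slot
        f.write(f"{key},{data}".ljust(RECORD_SIZE).encode())

def read_record(filename, key):
    with open(filename, "rb") as f:
        f.seek(slot_for(key) * RECORD_SIZE)             # no searching through the file
        return f.read(RECORD_SIZE).decode().strip()

write_record("accounts.dat", "C0042", "Ada")
print(read_record("accounts.dat", "C0042"))             # "C0042,Ada"
```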

Network

Packet switching vs. circuit switching

Packet switching

Description

Packet switching is a method of transmission in which a message is broken up into a number of packets that can be sent independently of each other from start point to end point. The data packets will need to be reassembled into their correct order at the destination.

  • Each packet follows its own path.
  • Routing selection depends on the number of datagram packets waiting to be processed at each node (router).
  • The shortest path available is selected.
  • Packets can reach the destination in a different order to that in which they are sent.
Pros
  • No need to tie up a communication line.
  • It is possible to overcome failed or faulty lines by simply re-routing packets.
  • Circuit switching charges the user on the distance and duration of a connection, but packet switching charges users only for the duration of the connectivity.
  • High data transmission is possible with packet switching.
  • Packet switching always uses digital networks which means digital data is transmitted directly to the destination.
  • Packets can be rerouted if there are problems.
  • More secure, as it is harder to intercept a complete message.
Cons
  • The protocols for packet switching can be more complex than those for circuit switching.
  • If a packet is lost, the sender must re-send the packet (which wastes time).
  • Does not work well with real-time data streams.
  • The circuit/channel has to share its bandwidth with other packets.
  • There is a delay at the destination while packets are reassembled.
  • Needs large amounts of RAM to handle the large amounts of data.

Circuit switching

Description

Circuit switching uses a dedicated circuit which lasts throughout the connection: the communication line is effectively 'tied up'. When sending data across a network, there are three stages:

  1. First, a circuit/channel between sender and receiver must be established.
  2. Data transfer then takes place (which can be analogue or digital); transmission is usually bidirectional.
  3. After the data transfer is complete, the connection is terminated.
Pros
  • The circuit used is dedicated to the single transmission only.
  • The whole of the bandwidth is available.
  • The data transfer rate is faster than with packet switching.
  • The packets of data (frames) arrive at the destination in the same order as they were sent. There is no need to reassemble packets.
  • A packet of data cannot get lost since all packets follow on in sequence along the same single route.
  • It works better than packet switching in real-time applications.
Cons
  • Less secure, as only one route is used.
  • It is not very flexible (e.g., it will send empty frames and it has to use a single, dedicated line).
  • Nobody else can use the circuit/channel even when it is idle.
  • The circuit is always there whether or not it is used.
  • If there is a failure/fault on the dedicated line, there is no alternative routing available.
  • Dedicated channels require a greater bandwidth.
  • Prior to actual transmission, the time required to establish a link can be long.
Application
  • Public/private telephone networks
  • Private data networks

Routing

Routing tables contain the information necessary to forward a packet along the shortest/best route to allow it to reach its destination. As soon as a packet reaches a router, the packet header is examined and compared with the routing table. The table supplies the router with instructions to send the packet (hop) to the next available router. Routing tables include (a toy lookup over such a table is sketched after the list):

  • number of hops
  • MAC address of the next router to which the packet is to be forwarded (the next hop)
  • metrics (a cost is assigned to each available route so that the most efficient route/path is found)
  • network destination (network ID) or pathway
  • gateway (the same information as the next hop; it points to the gateway through which the target network can be reached)
  • netmask (used to generate network ID)
  • interface (indicates which locally available interface is responsible for reaching the gateway)
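
A toy lookup over such a table using Python's ipaddress module; the entries, metrics and interface names are invented for illustration, and a real router would of course hold far more state.

```python
import ipaddress

# Simplified routing table: network destination, gateway (next hop), metric, interface.
routing_table = [
    {"network": "192.168.1.0/24", "gateway": "192.168.1.1", "metric": 1,  "interface": "eth0"},
    {"network": "10.0.0.0/8",     "gateway": "10.0.0.254",  "metric": 5,  "interface": "eth1"},
    {"network": "0.0.0.0/0",      "gateway": "203.0.113.1", "metric": 10, "interface": "eth1"},  # default route
]

def next_hop(destination_ip):
    dest = ipaddress.ip_address(destination_ip)
    # Keep every route whose network (network ID + netmask) contains the destination...
    matches = [r for r in routing_table if dest in ipaddress.ip_network(r["network"])]
    # ...then pick the most specific one, using the metric as a tie-breaker.
    best = max(matches, key=lambda r: (ipaddress.ip_network(r["network"]).prefixlen, -r["metric"]))
    return best["gateway"], best["interface"]

print(next_hop("10.1.2.3"))       # ('10.0.0.254', 'eth1')
print(next_hop("198.51.100.7"))   # falls through to the default route
```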

Protocols

The need for protocols

When communicating over networks, it is essential that some form of protocol is used by the sender and receiver of the data. Both parties need to agree the protocol being used to ensure successful communication takes place.

TCP/IP protocol suite

Application layer

The application layer contains all the programs that exchange data, such as web browsers or server software; it sends files to the transport layer. This layer allows applications to access the services used in other layers and also defines the protocols that any app uses to allow the exchange of data.

Transport layer

The transport layer regulates the network connections; this is where data is broken up into packets which are then sent to the internet/network layer (IP protocol). The transport layer ensures that packets arrive in sequence, without errors, by swapping acknowledgements and retransmitting packets if they become lost or corrupted. The main protocols associated with the transport layer are transmission control protocol (TCP) and user datagram protocol (UDP).
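
A minimal sketch of using the two protocols via Python's socket module, assuming some server is already listening on the placeholder host/port below: with TCP the protocol itself provides the sequencing, acknowledgements and retransmission described above, whereas UDP just sends independent datagrams with no such guarantees.

```python
import socket

HOST, PORT = "127.0.0.1", 9000   # placeholder address for illustration

# TCP: connection-oriented; the protocol handles sequencing, acknowledgements
# and retransmission of lost or corrupted segments for us.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp_sock:
    tcp_sock.connect((HOST, PORT))
    tcp_sock.sendall(b"hello over TCP")
    reply = tcp_sock.recv(1024)

# UDP: connectionless; each datagram is sent independently, with no guarantee
# of delivery or ordering (suits streaming and other real-time traffic).
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp_sock:
    udp_sock.sendto(b"hello over UDP", (HOST, PORT))
```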

Internet layer

The internet layer identifies the intended network and host. The common protocol is IP (internet protocol).

Link layer

The link layer identifies and moves traffic across local segments, encapsulates IP packets into frames for transmission, maps IP addresses to MAC (physical) addresses and ensures correct protocols are followed. The physical network layer specifies requirements of the hardware to be used for the network. The link layer identifies network protocols in the packet header (TCP/IP in the case here) and delivers packets to the network.

Application layer protocols

HTTP: hypertext transfer protocol

The HTTP protocol underpins the world wide web. It defines the rules to be followed when transferring data over the internet and is used when, for example, a hyperlink is followed to fetch an HTML document from a web server.

HTTP is a client/server protocol: request messages are sent out to the web servers which then respond.

HTTP protocols define the format of the messages sent and received. The web browser (which is part of the application layer) initiates the web page request and also converts HTML into a format which can be displayed on the user's screen or can be played through their media player.
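
A minimal sketch of the request/response exchange using Python's standard http.client module (example.com is a placeholder host).

```python
import http.client

# The client (e.g. a web browser) sends a request message to the web server...
conn = http.client.HTTPConnection("example.com", 80)
conn.request("GET", "/index.html", headers={"Host": "example.com"})

# ...and the server responds with a status line, headers and the HTML body.
response = conn.getresponse()
print(response.status, response.reason)     # e.g. 200 OK, or 404 Not Found
html = response.read()                      # the browser would now render this HTML
conn.close()
```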

FTP: file transfer protocol

File transfer protocol (FTP) is a network protocol used when transferring files from one computer/device to another via the internet or other networks. It is similar to HTTP and SMTP, but FTP is concerned solely with the transfer of files over a network. Web browsers can be used to connect to an FTP address in a way similar to HTTP, for example, ftp://username@ftp.example.gov/. A short session sketch follows the list below.

  • anonymous FTP – this allows a user to access files without the need to identify who they are to the ftp server; for example, ‘331 Anonymous access allowed’ would be a message received to confirm anonymous access
  • FTP commands – a user is able to carry out actions that can change files stored on the ftp server; for example, delete, close, rename, cd (change directory on a remote machine), lcd (change directory on a local machine)
  • FTP server – this is where the files, which can be downloaded as required by a user, are stored.
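
A minimal sketch of an anonymous FTP session using Python's ftplib; the server address and file names are placeholders.

```python
from ftplib import FTP

ftp = FTP("ftp.example.gov")      # connect to the FTP server
ftp.login()                       # no username/password supplied: anonymous FTP
print(ftp.getwelcome())           # the server's welcome banner

ftp.cwd("/pub/reports")           # cd: change directory on the remote machine
ftp.retrlines("LIST")             # list the files stored on the FTP server

with open("report.txt", "wb") as f:
    ftp.retrbinary("RETR report.txt", f.write)   # download a file from the server

ftp.quit()                        # close the connection
```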

SMTP: simple mail transfer protocol

Simple mail transfer protocol (SMTP) is a text-based (and connection-based) protocol used when sending emails. It is sometimes referred to as a push protocol (in other words, a client opens a connection to a server and keeps the connection active all the time; the client then uploads a new email to the server).
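
A minimal sketch of the push model using Python's smtplib; the server name, addresses and credentials are placeholders.

```python
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Hello"
msg.set_content("SMTP is a push protocol: the client uploads this message to the server.")

# The client opens a connection to the SMTP server and pushes the email to it;
# the server is then responsible for forwarding it towards the recipient's mail server.
with smtplib.SMTP("smtp.example.com", 587) as server:
    server.starttls()                                    # upgrade to an encrypted connection
    server.login("alice@example.com", "app-password")    # placeholder credentials
    server.send_message(msg)
```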

POP3 and IMAP

Post office protocol (POP3/4) and internet message access protocol (IMAP) are protocols used when receiving emails from the email server. These are known as pull protocols (the client periodically connects to a server, checks for and downloads new emails from the server, and the connection is then closed; this process is repeated to ensure the client is kept up to date). IMAP is a more recent protocol than POP3/4, but both have largely been superseded by the increasing use of HTTP protocols (webmail). However, SMTP is still used when transferring emails between email servers.
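
A minimal sketch of the pull model using Python's poplib (IMAP would use the imaplib module in much the same spirit); the server name and credentials are placeholders.

```python
import poplib

# Connect to the mail server, pull down any waiting messages, then disconnect.
mailbox = poplib.POP3_SSL("pop.example.com")
mailbox.user("alice@example.com")          # placeholder credentials
mailbox.pass_("app-password")

num_messages = len(mailbox.list()[1])      # how many emails are waiting on the server
for i in range(1, num_messages + 1):
    response, lines, octets = mailbox.retr(i)    # download message number i
    print(b"\n".join(lines).decode("utf-8", errors="replace")[:200])

mailbox.quit()                             # connection closed until the next poll
```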

BitTorrent

BitTorrent is a protocol based on the peer-to-peer networking concept. This allows for very fast sharing of files between computers (known as peers). While peer-to-peer networks only work well with very small numbers of computers, the concept of sharing files using BitTorrent can be used by thousands of users who connect together over the internet. Because user computers share files directly with each other (rather than using a web server), they are sharing files in a way similar to that used in a peer-to-peer network; the main difference is that the BitTorrent protocol allows many computers (acting as peers) to share files.

  • Swarm – a group of peers connected together is known as a swarm; one of the most important facts when considering whether or not a swarm can continue to allow peers to complete a torrent is its availability; availability refers to the number of complete copies of torrent contents that are distributed amongst a swarm. Note: a torrent is simply the name given to a file being shared on the peer-to-peer network.
  • Seed – a peer that has downloaded a file (or pieces of a file) and has then made it available to other peers in the swarm.
  • Tracker – this is a central server that stores details about other computers that make up the swarm; it will store details about all the peers downloading or uploading the file, allowing the peers to locate each other using the stored IP addresses.
  • Leech – a peer that has a negative impact on the swarm by having a poor share ratio, that is, they are downloading much more data than they are uploading to the others; the ratio is determined using the formula: $$ \text{ratio} = \dfrac{ \text{amount of data the peer has uploaded} }{ \text{amount of data the peer has downloaded} } $$ If the ratio is greater than one then the peer has a positive impact on the swarm; otherwise, the peer has a negative effect on the swarm.

Processor

CISC processors

CISC processor architecture makes use of more internal instruction formats than RISC. The design philosophy is to carry out a given task with as few lines of assembly code as possible. Processor hardware must therefore be capable of handling more complex assembly code instructions. Essentially, CISC architecture is based on single complex instructions which need to be converted by the processor into a number of sub-instructions to carry out the required operation.

RISC processors

RISC processors have fewer built-in instruction formats than CISC. This can lead to higher processor performance. The RISC design philosophy is built on the use of less complex instructions, which is done by breaking up the assembly code instructions into a number of simpler single-cycle instructions. Ultimately, this means there is a smaller, but more optimized set of instructions than CISC.

| CISC features | RISC features |
| --- | --- |
| Many instruction formats are possible | Uses fewer instruction formats/sets |
| There are more addressing modes | Uses fewer addressing modes |
| Makes use of multi-cycle instructions | Makes use of single-cycle instructions |
| Instructions can be of a variable length | Instructions are of a fixed length |
| Longer execution time for instructions | Faster execution time for instructions |
| Decoding of instructions is more complex | Makes use of general multi-purpose registers |
| It is more difficult to make pipelining work | Easier to make pipelining function correctly |
| The design emphasis is on the hardware | The design emphasis is on the software |
| Uses the memory unit to allow complex instructions to be carried out | Processor chips require fewer transistors |

Pipelining

Pipelining allows several instructions to be processed simultaneously without having to wait for previous instructions to be completed. The execution of a given instruction is split into five stages (a small simulation is sketched after the list):

  • IF instruction fetch cycle
  • ID instruction decode cycle
  • OF operand fetch cycle
  • IE instruction execution cycle
  • WB writeback result process
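
A small sketch that prints which of the five stages each instruction occupies in every clock cycle, assuming an ideal pipeline with no stalls, hazards or interrupts.

```python
STAGES = ["IF", "ID", "OF", "IE", "WB"]
instructions = ["I1", "I2", "I3", "I4"]

# In cycle c, instruction i (0-based) is in stage (c - i), if that stage exists.
total_cycles = len(instructions) + len(STAGES) - 1
print("cycle  " + "  ".join(instructions))
for cycle in range(total_cycles):
    row = []
    for i, _ in enumerate(instructions):
        stage = cycle - i
        row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "--")
    print(f"{cycle + 1:5}  " + "  ".join(row))
```

With four instructions the pipeline finishes in 4 + 5 - 1 = 8 cycles, compared with 4 x 5 = 20 cycles if each instruction had to complete all five stages before the next one started.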

Interrupt

With pipelining, as the interrupt is received, there could be a number of instructions still in the pipeline. The usual way to deal with this is to discard all instructions in the pipeline except for the last instruction in the write-back (WB) stage. The interrupt handler routine can then be applied to this remaining instruction and, once serviced, the processor can restart with the next instruction in the sequence.

Parallel computing

SISD, SIMD, MISD, MIMD

SISD: single instruction single data

SISD (single instruction single data) uses a single processor that can handle a single instruction and which also uses one data source at a time. Each task is processed in a sequential order. Since there is a single processor, this architecture does not allow for parallel processing. It is most commonly found in applications such as early personal computers.

SIMD: single instruction multiple data

SIMD (single instruction multiple data) uses many processors. Each processor executes the same instruction but uses different data inputs – they are all doing the same calculations but on different data at the same time.

For example, suppose the brightness of an image made up of 4000 pixels needs to be increased. Since SIMD can work on many data items at the same time, 4000 small processors (one per pixel) can each alter the brightness of each pixel by the same amount at the same time. This means the whole of the image will have its brightness increased consistently.

Other applications include sound sampling – or any application where a large number of items need to be altered by the same amount (since each processor is doing the same calculation on each data item).
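
A small illustration of the principle using NumPy, whose vectorised array operations are typically executed with SIMD instructions on the underlying hardware: one "increase brightness" operation is applied to every pixel value at once.

```python
import numpy as np

# 4000 pixel brightness values (0-255), one per "processor" in the SIMD picture.
pixels = np.random.randint(0, 200, size=4000, dtype=np.uint16)

# One instruction, many data items: every pixel is increased by the same amount,
# and the result is clipped so it stays within the valid 0-255 range.
brighter = np.clip(pixels + 30, 0, 255)
```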

MISD: multiple instruction single data

MISD (multiple instruction single data) uses several processors. Each processor uses different instructions but uses the same shared data source. MISD is not a commonly used architecture (MIMD tends to be used instead). However, the American Space Shuttle flight control system did make use of MISD processors.

MIMD: multiple instruction multiple data

MIMD (multiple instruction multiple data) uses multiple processors. Each one fetches and executes its own instructions independently, and each processor can use data from a separate data source (the data source may be a single memory unit which has been suitably partitioned). The MIMD architecture is used in multi-core systems (for example, in supercomputers or in the architecture of multi-core chips).

Parallel computer systems

Large number of computers/processors

  • working collaboratively on the same program
  • working together simultaneously on the same program
  • communicating via a messaging interface using network connections (a toy sketch of this message-passing pattern follows below)
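
A toy single-machine analogy of this pattern using Python's multiprocessing module; a real parallel computer system would use a message-passing interface (such as MPI) over network connections, but the shape of the program is similar: independent workers each process part of the data and exchange results as messages.

```python
from multiprocessing import Process, Queue

def worker(rank, numbers, results):
    # Each processor works on its own slice of the data at the same time...
    partial_sum = sum(numbers)
    # ...and communicates its result back through a messaging interface (here, a queue).
    results.put((rank, partial_sum))

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]           # split the work across 4 workers
    results = Queue()
    workers = [Process(target=worker, args=(r, chunks[r], results)) for r in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    total = sum(results.get()[1] for _ in range(4))   # combine the partial results
    print(total)
```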

Virtual machine

Guest operating system

  • This is the OS running in a virtual machine.
  • It controls the virtual hardware during the emulation.
  • This OS is being emulated within another OS (the host OS).
  • The guest OS is running under the control of the host OS software.

Host operating system

  • This is the OS that is controlling the actual physical hardware.
  • It is the normal OS for the host/physical computer.
  • The OS runs/monitors the virtual machine software.

Benefits

  • The guest OS hosted on a virtual machine can be used without impacting anything outside the virtual machine; any other virtual machines and host computer are protected by the virtual machine software.
  • It is possible to run apps which are not compatible with the host computer/OS by using a guest OS which is compatible with the app.
  • Virtual machines are useful if you have old/legacy software which is not compatible with a new computer system/hardware. It is possible to emulate the old software on the new system by running a compatible guest OS as a virtual machine. For example, the software controlling a nuclear power station could be transferred to new hardware in a control room – the old software would run as an emulation on the new hardware (justifying the cost and complexity issues – see Limitations).
  • Virtual machines are useful for testing a new OS or new app since they will not crash the host computer if something goes wrong (the host computer is protected by the virtual machine software).

Limitations

  • You do not get the same performance running as a guest OS as you do when running the original system.
  • Building an in-house virtual machine can be quite expensive for a large company. They can also be complex to manage and maintain.

Functions of virtual machine software

  • Create/delete/manage virtual machine
  • Translate instructions used by guest operating system to that required by host operating system
  • Hardware emulation
  • Protecting each virtual machine