\documentclass[10pt,draftclsnofoot,onecolumn]{IEEEtran}
\usepackage{graphicx}
\graphicspath{ {/} }
\title{Random Network Coding}
\author{Jared Wasinger, Chris Kennard, Hampus Hammarlund}
\begin{document}
\maketitle
\begin{abstract}
The purpose of this document is to discuss the progress made by the group during Winter 2016. The state of the four main aspects of this project is explored in depth: encoding/decoding, the encoding server, the data server, and the client. For each of the four parts, an explanation of where we are in development is given along with a view of what still needs to be done. Additionally, this document highlights crucial fragments of code from the encoding and decoding algorithms.
\end{abstract}
% Turn off page numbering.
\pagenumbering{gobble}
\newpage
% Turn page numbering back on.
\pagenumbering{arabic}
\section{Introduction}
A content delivery network is a "system of distributed servers that deliver web-pages and other Web content to a user based on the geographic locations of the user, the origin of the webpage and a content delivery server." Modern content delivery networks use a variety of methods to serve files, including storing multiple full copies of the files they serve and using parallel drive systems such as Hadoop or RAID. In the first case, where multiple copies of the same file are stored on multiple servers, the drawbacks are easy to see. First and foremost, much more disk space is required to store duplicates of files than to store random-network-encoded chunks. There is also the risk of files becoming corrupted or of a service interruption if a single server becomes unresponsive.
We are designing a content delivery network, or CDN, that will distribute files among servers using random network coding, or RNC. The implementation of random network coding is an attempt to improve file storage efficiency and optimize service uptime. When the user wants to access a piece of content, rather than a single system sending a stream of data, multiple servers send specific chunks to the user, whose device then decodes and reassembles the content. Similar to peer-to-peer systems, this allows an array of servers to deliver content even if multiple servers go offline. However, unlike peer-to-peer systems, random-network-coded chunks take up about one third of the space of the whole file. Encoded chunks can be distributed to any number of servers, and as long as the end user can connect to at least three of them, the file will be served successfully. The only drawback is that the transfer is rate limited by the slowest server.
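As a sketch of the underlying idea (this is the general form of random linear coding over three pieces; the exact finite-field arithmetic is handled by the library our sponsor provided), the file is split into three equal pieces $x_0$, $x_1$, $x_2$, and each encoded chunk $y_j$ is a random linear combination
\[
y_j = c_{j0}x_0 + c_{j1}x_1 + c_{j2}x_2,
\]
where the coefficients $c_{ji}$ are randomly chosen non-zero values stored alongside the chunk. Any three chunks whose $3 \times 3$ coefficient matrix $C$ is invertible can then be decoded by solving the linear system, recovering $(x_0, x_1, x_2)$ from $C^{-1}(y_0, y_1, y_2)$.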
\includegraphics[scale=0.4]{structure}
This provides remarkable reliability and scalability compared to traditional content delivery networks. Whereas traditionally a front-end load balancer is needed to gain the advantage of multiple delivery servers, now the chunk servers can connect directly to the user, eliminating a point of congestion and failure as well as reducing the complexity of implementation.
The current state of the project at this beta stage is that all the key parts are present and working together. Encoding and decoding algorithms have been implemented and successfully encode and decode files according to the RNC algorithm. The client-to-data-server and data-server-to-encoding-server code works and communicates via TCP over port 3000. More detail about the current state of these pieces is discussed in depth below.
\newpage
\section{Encoding and Decoding}
The encoding and decoding algorithms are the main engine of the random network coding project. The encoding process takes some data, performs its algorithm, and returns unique chunks that can be distributed to various servers. The decoding algorithm can then take any three unique chunks generated by the encoding algorithm and use them to reconstruct the original file.
At the end of the fall term the group had a program performing encoding and decoding correctly on a hard-coded piece of data. Over winter break the group started work on getting the encoding and decoding programs to read and output files. By the end of the break, the programs were reading and outputting files, but slightly incorrectly. First we will cover specifics about the encoding algorithm and then move on to the decoding algorithm.
The first thing the encoding algorithm needs to do is make decisions based on the input data's length. These decisions include how many blank bytes to add to the end of the file in order to use the algorithm. Blank bytes need to be added because the algorithm needs three 16-bit elements to create one element of a chunk. Overall this was straightforward to implement, but one aspect caused trouble while coding. When reading in the file we store it in an array of 16-bit elements, which can cause some trouble when the original file contains an odd number of bytes. Initially we couldn't figure out why the program wasn't working for certain files. Once the problem was found, implementing the fix took some delicate work because we needed to constantly juggle the use of bytes and 16-bit elements.
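As a rough illustration of this bookkeeping, a minimal sketch of the padding calculation (with hypothetical names, not the exact code in our encoder) looks like the following:
\begin{verbatim}
/* Minimal sketch of the padding described above; names are illustrative.
   Each chunk element consumes three 16-bit input elements (6 bytes), so
   the file is padded with blank bytes up to a multiple of 6. */
#include <stdint.h>
#include <string.h>

size_t padded_length(size_t file_bytes)
{
    size_t rem = file_bytes % 6;
    return rem ? file_bytes + (6 - rem) : file_bytes;
}

/* Copy the raw bytes into the 16-bit working array, zero-filling the tail.
   The zero fill also covers files with an odd number of bytes, where the
   last 16-bit element is only half filled.  The caller allocates
   padded_length(file_bytes) bytes for out. */
void load_padded(const uint8_t *data, size_t file_bytes, uint16_t *out)
{
    memset(out, 0, padded_length(file_bytes));
    memcpy(out, data, file_bytes);
}
\end{verbatim}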
For the midterm report the encoding algorithm was implemented in a program that read a whole file, performed the algorithm and then wrote each of the chunks to separate files. One major change that needed to be made was to make the encoding and decoding algorithm more memory efficient. This took some time to implement because it involved restructuring the code.
Moving on to the decoding algorithm, this was a much more complex problem. For one, the math for the actual decoding is much more involved, and we also needed to read and store three chunks correctly throughout the program.
The biggest problem faced while developing the decoding program was getting the final file size to line up with the size of the original file. This took some time to fix because the origin of the problem wasn't in the decode algorithm. After searching, it turned out the problem was due to an incorrect output in the encoding algorithm. After fixing that, along with correcting counters, the final file and original file started lining up. The lessons learned during development of the encoding program made writing the decoding program a lot easier. For example, juggling the difference between bytes and 16-bit elements was not a problem in the decoding program.
Just like the encoding algorithm, decoding also needed to be updated to store only part of the chunks and the output at a time. This followed a similar pattern to changing the encoding algorithm, so the work went smoothly. While the actual change wasn't too difficult to implement, it took a good amount of time because variable use needed to be changed in order to get things lined up for the algorithm to work.
Currently the encoding and decoding programs work. Correct chunks are created, and the decoding program takes them and outputs a final file that can be viewed just like the original. The main thing the two algorithms need moving forward is more safety checks. Rather than just having the program die when it encounters something it doesn't expect, it needs to output appropriate error messages.
Since the alpha update the group and our sponsor decided upon a slight restructuring of the project’s servers. This change has caused the encoding and decoding algorithms to rely more heavily on sockets rather than almost exclusively using file input and output. More information about this implementation is given in the encoding server section and the client section.
\newpage
\section{Encoding Server}
The content delivery network the group is developing consists of three main parts: the encoding server, the data server, and the client. This setup represents the flow of data, starting with a file, encoding it, and passing it through servers until eventually it gets to the user as the reconstructed file. The encoding server is the start of the flow. Taking a given file, the encoding server uses the encoding algorithm to generate several chunks and sends one over the network to each of the data servers.
\includegraphics[scale=0.7]{EncodingServer}
Earlier in the year and for the alpha, this part of the content delivery network didn't exist, not because we were unable to develop it but because the group and our sponsor didn't feel it was necessary for the server setup we had. However, shortly after the alpha the group decided that in order to improve the scalability of the project, the addition of a central server was necessary; for us this took the form of the encoding server. Rather than each data server being responsible for generating its own chunks, a central encoding server now takes up the responsibility of generating the encoded content. This change keeps the project more in line with the spirit of an RNC-enabled content delivery network.
Implementing the network code for the encoding server wasn’t extremely difficult because a lot of the code the data server used to send chunks to the client could be repurposed to send the chunks to the different data servers. The main change that needed to happen when repurposing the code was that the encoding server needed to initiate the communication as well as push the chunk file data. To do this the group created a messaging header system for communication between the different parts of the system. Additionally the chunk header needed to be overhauled to work more effectively with network transfers.
Once the network code was set up, it needed to be integrated with the encoding algorithm. Currently the encoding server writes the chunks to the local disk and then sends them to the correct data servers over a socket. This kind of integration allows the algorithm and the network code to stay isolated from each other, which made combining the two much simpler. Because this part of the system is very new, it has not been heavily tested and will most likely need further adjustment.
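A minimal sketch of that send path, assuming an already-connected socket and using the message header described in the Data Server section below, might look like this (helper names are illustrative, and reusing the respond-chunk message type for the push is an assumption):
\begin{verbatim}
/* Hedged sketch: read an encoded chunk file from disk and push it to one
   data server over a connected TCP socket.  The header matches the
   MessageHeader shown in the Data Server section. */
#include <stdio.h>
#include <sys/socket.h>

#define TYPE_RESPOND_CHUNK 3            /* assumption: reuse of type 3 */
struct MessageHeader { int Type; int Length; };

void send_chunk_file(int sock, const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (!fp) { perror("fopen"); return; }

    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);
    fseek(fp, 0, SEEK_SET);

    struct MessageHeader hdr = { TYPE_RESPOND_CHUNK, (int)size };
    send(sock, &hdr, sizeof(hdr), 0);   /* header first ...            */

    char buf[1024];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        send(sock, buf, n, 0);          /* ... then the chunk payload  */

    fclose(fp);
}
\end{verbatim}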
\newpage
\section{Data Server}
The Data Server helps to link the Encoding server and client programs together. It is composed of two processes that communicate with the other programs over the network using TCP. In addition, the Data server implements (and exposes) a custom protocol that has been created to facilitate the storage and exchange of encoded data.
This protocol is very simple and works by attaching a common header to every TCP transmission. This header provides a description of the purpose of the message and also has the ability to specify that a piece of data, or payload, will be sent after the message. Here is the rough format of the message header in C:
\begin{verbatim}
#define TYPE_ACK 1
#define TYPE_ERR 2
#define TYPE_RESPOND_CHUNK 3
#define TYPE_REQUEST_CHUNK 4
struct MessageHeader {
    int Type;    /* one of the TYPE_* values above          */
    int Length;  /* length of the payload that follows, if any */
};
\end{verbatim}
One of the more important purposes of this protocol is to define how file chunks (a key portion of the RNC algorithm) are transmitted. To do this, it implements a request/response procedure. Of note is the payload data for the message with a type of 3 (respond chunk):
\begin{verbatim}
struct ChunkHeader {
    char file_name[30];   /* name of the original file          */
    float *coef[3];       /* 3x3 array of encoding coefficients */
    int chunk_size;       /* size of the chunk data             */
};
\end{verbatim}
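A minimal sketch of that exchange from the client's side, using the two headers above, might look like the following (the exact request payload layout is an assumption; in our code the client sends the name of the file it wants):
\begin{verbatim}
/* Hedged sketch of the request/response procedure, client side.  The
   structs and TYPE_* constants are the ones defined above; the request
   payload layout (just the file name) is an assumption. */
#include <string.h>
#include <sys/socket.h>

void request_chunk(int sock, const char *file_name)
{
    struct MessageHeader req = { TYPE_REQUEST_CHUNK, (int)strlen(file_name) };
    send(sock, &req, sizeof(req), 0);          /* header ...              */
    send(sock, file_name, req.Length, 0);      /* ... then requested name */

    struct MessageHeader resp;
    recv(sock, &resp, sizeof(resp), MSG_WAITALL);
    if (resp.Type != TYPE_RESPOND_CHUNK)
        return;                                /* TYPE_ERR or unexpected  */

    struct ChunkHeader ch;
    recv(sock, &ch, sizeof(ch), MSG_WAITALL);  /* coefficients, chunk size */
    /* ch.chunk_size bytes of encoded chunk data follow on the socket.    */
}
\end{verbatim}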
By having an interface to receive new chunks, store them on disk, and transmit them, the Data server implements the full functionality necessary to mediate between the Client and Encoding servers. Currently, all of the necessary functionality is built in code. However, it needs to be tested against many different edge cases that could cause it to break (such as the receiving, storing, and transmission of very large files).
\newpage
\section{Client}
The client for this project is pretty simple in the abstract. You call the “client” executable with the name of the file that you would like to fetch. The client then negotiates a connection and downloads the file that corresponds with the filename that you specified. In our beta build, pictured below, we have support for downloading multiple files dynamically, since we need to be able to fetch each of the encoded chunks to reassemble the file. We have changed the server from being hard coded to one specific IP address to being able to communicate with three different data server IP addresses, specified before compile time. The client is also currently attempting to read the file size that it is receiving, but this is not working correctly yet.
\includegraphics[scale=0.8]{chris3}
We are doing all of our communication between the data servers and the client over port 3000 in our beta release, and using port 3001 for communication between the encoding server and the data servers. This is still subject to change, as these are commonly used development ports; it is an easy, one-line change to switch communication to a different port. The figure below shows our declarations, which illustrate how our initial variables are set up in this beta build. Message size is still hard coded right now because we are debugging the parsing of our custom file headers, which will enable us to dynamically determine file size. The base client code that takes a word from the command line, sends it to the server, and receives the server's response was easy to write. Between the alpha release and this beta release, we have combined the client and decoding applications, which allows us to both fetch and decode encoded file chunks in one executable. The functions that power this, fetchFile and decodeFile, are shown in the figure below.
\includegraphics[scale=0.8]{chris2}
\includegraphics[scale=0.8]{chris1}
One smart programming step that we took in building this client application is that it isn't memory bound, since it doesn't attempt to hold the whole received file in memory before printing it or writing it to disk. Every 32 bits that come in are stored in the buffer, printed out, appended to the file on disk, and then erased as the buffer is cleared after every iteration of the receiving loop. This might not be as fast as simply holding the data in memory until the file is done transferring, but it allows greater flexibility for receiving large files. Shown below is our code for the receiving loop. Once we have the code to receive multiple files written, we need to call the decode executable that was covered earlier with the names of the three chunks that we downloaded. The client is robust, and the main bug we need to fix going forward is parsing the file headers.
\includegraphics[scale=0.8]{chris4}
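For context, a minimal sketch of such a receive loop (our actual code is in the figure above; names here are illustrative) is:
\begin{verbatim}
/* Hedged sketch of the incremental receive loop described above; the
   real client code is shown in the figure. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

void receive_to_disk(int sock, const char *out_path, long bytes_expected)
{
    FILE *out = fopen(out_path, "ab");
    if (!out) { perror("fopen"); return; }

    char buf[4];                        /* 32 bits per iteration      */
    long received = 0;

    while (received < bytes_expected) {
        ssize_t n = recv(sock, buf, sizeof(buf), 0);
        if (n <= 0) break;              /* connection closed or error */
        fwrite(buf, 1, (size_t)n, out); /* append to the file on disk */
        memset(buf, 0, sizeof(buf));    /* clear the buffer each pass */
        received += n;
    }
    fclose(out);
}
\end{verbatim}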
\newpage
\section{Interesting Code}
The key piece of code for this project is the math that drives the encode and decode algorithms. While each piece of code is important in driving the project, the core of random network coding is the encoding and decoding engine itself.
The first piece of code is the process by which a file is split up into chunks. The code fragment shown below has two main parts, generating the coefficients and a for loop which performs the operation which encodes the data.
The coefficients are randomly generated non-zero 16-bit numbers. Three numbers are generated per chunk and they must be kept with the chunk in order to be able to recreate the original file from the chunks.
Moving on to the last bit of code: the for loop which does the actual encoding. The loop iterates through the 16-bit file array, three elements at a time. On each iteration, the library our sponsor provided is given a coefficient and a piece of the file as input. The three results are then merged and stored in an output array, which can then be written to a file or sent to the storage server.
\includegraphics[scale=0.4]{interesting2}
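A compressed sketch of that structure, using the hypothetical stand-ins gf\_mul and gf\_add for the field arithmetic in our sponsor's library, is shown below (the real function names and signatures differ):
\begin{verbatim}
/* Hedged sketch of the encoding loop; gf_mul() and gf_add() are
   hypothetical stand-ins for the sponsor library's field arithmetic. */
#include <stdint.h>
#include <stdlib.h>

extern uint16_t gf_mul(uint16_t coef, uint16_t element);  /* hypothetical */
extern uint16_t gf_add(uint16_t a, uint16_t b);           /* hypothetical */

void encode_chunk(const uint16_t *file, size_t n_elements,
                  uint16_t coef[3], uint16_t *chunk)
{
    /* Three random non-zero 16-bit coefficients per chunk. */
    for (int i = 0; i < 3; i++)
        while ((coef[i] = (uint16_t)rand()) == 0)
            ;

    /* Walk the file three 16-bit elements at a time; each pass yields
       one element of the chunk. */
    for (size_t i = 0; i + 2 < n_elements; i += 3) {
        uint16_t e = gf_mul(coef[0], file[i]);
        e = gf_add(e, gf_mul(coef[1], file[i + 1]));
        e = gf_add(e, gf_mul(coef[2], file[i + 2]));
        chunk[i / 3] = e;
    }
}
\end{verbatim}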
\newpage
The second piece of code is reversing the process performed in the above code. The code fragment below is the process by which any three of the generated chunks can be used to recreate the original file.
The first thing done is to create the variables c0, c1, c2, and c4. These variables store different combinations of the coefficients of the given chunks. They will later be used with the chunks' data to create the final file.
The next step is to create t0 and t1. This is the first stage of combining chunk data with the coefficients. t0 and t1 are essentially two additional chunks; like the chunks, they are a third the size of the original file. The difference is that t0 and t1 are formed from the chunks themselves rather than from the original file.
The final two steps are to create the three parts of the file and assemble them. This is done using all the variables we made earlier. x0, x1, and x2 constitute the whole file; all we need to do to merge them is to interweave them. The first elements of x0, x1, and x2 become the 1st, 2nd, and 3rd elements of the final file, respectively. Once everything is set, we have the final file stored in the array called final. This can then be written out to a file, where it can be used just like the original file.
\includegraphics[scale=0.4]{interesting1}
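The interleaving step itself is simple; a minimal sketch (variable names mirror the description above, not the real code) is:
\begin{verbatim}
/* Hedged sketch of the interleaving step: the i-th elements of x0, x1,
   and x2 become consecutive elements of the final array. */
#include <stdint.h>
#include <stddef.h>

void interleave(const uint16_t *x0, const uint16_t *x1, const uint16_t *x2,
                size_t part_len, uint16_t *final)
{
    for (size_t i = 0; i < part_len; i++) {
        final[3 * i]     = x0[i];   /* 1st, 4th, 7th ... elements */
        final[3 * i + 1] = x1[i];   /* 2nd, 5th, 8th ... elements */
        final[3 * i + 2] = x2[i];   /* 3rd, 6th, 9th ... elements */
    }
}
\end{verbatim}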
\newpage
\section{Conclusion}
Overall, our project has made great progress over the last several weeks. We have built all the code necessary to meet the definition of MVP as per our Design Document. We have built the three major features necessary to implement a CDN: an interface to put files into the system, servers to store the CDN data on, and a client interface/program to receive data from the CDN. Furthermore, we have adapted these requirements to the use case of a Randomly Coded CDN.
We have several major objectives for the remainder of the project. First, we need to test the framework we have against important edge-cases. Any flaws that appear will need to be fixed. Secondly, we need to build a UI for the client-side. This last feature is one of our stretch goals that we believe is feasible to create in the time we have remaining.
Moving forward, we should have no problem continuing an excellent rate of project completion over the Spring term. Hopefully, we will complete our stretch goals in addition to having a fully functioning RNC CDN by the end of Spring term.
\end{document}