chunkyguy/theory.cpp

## theory.cpp
#include<iostream>
#include<fstream>
#include<cstdio>    // printf
using namespace std;

struct test
{
    short sss;
    char ccc;
    int iii;
};

//print the stream, each byte.
void print_stream(char *stream, int count) {
    for ( int i = 0; i < count; ++i) {
        printf("%d:",stream[i]);
    }
    printf("\n");
}


int main()
{
    test t_wr;
    t_wr.sss = 10;
    t_wr.ccc = 65;
    t_wr.iii = 25;

    print_stream((char*)&t_wr, sizeof(test));

    ofstream obj("abc.txt",ios::binary);
    obj.write((char *)&t_wr,sizeof(test));
    obj.close();

    ifstream obj2("abc.txt",ios::binary);

    cout << "sizoeof(test) = " << sizeof(test) << endl; // Total size of an object.
    obj2.seekg(0, ios::beg);    // rewind: offset 0 from the beginning

    int count = sizeof(test);   //Num of bytes to read at each iteration.
    while(!obj2.eof())  {
        test t_rd;
        print_stream((char*)&t_rd, sizeof(test));
        cout<<"reading: ["<< obj2.tellg() << "-" << count+obj2.tellg() << "]" << endl;
        obj2.read((char *)&t_rd,count);
        print_stream((char*)&t_rd, sizeof(test));
        cout << "{sss:" << t_rd.sss << " ccc:" << t_rd.ccc << " iii: " << t_rd.iii << " }" << endl;
        cout << "{goodbit: " << obj2.good()  << " eofbit: " << obj2.eof()
        << " badbit: " << obj2.bad() << " failbit: " << obj2.fail() << "}" << endl;
    }
    obj2.close();
}


/*
    OUTPUT

10:0:65:81:25:0:0:0:
sizoeof(test) = 8
16:115:-14:81:-1:127:0:0:
reading: [0-8]
10:0:65:81:25:0:0:0:
{sss:10 ccc:A iii: 25 }
{goodbit: 1 eofbit: 0 badbit: 0 failbit: 0}
10:0:65:81:25:0:0:0:
reading: [8-16]
10:0:65:81:25:0:0:0:
{sss:10 ccc:A iii: 25 }
{goodbit: 0 eofbit: 1 badbit: 0 failbit: 1}
*/

/*
    Theory

[n] means refer to the nth line number in this text.
[n-m] means refer to line numbers n to m, both inclusive.

The test struct has three members [8-10]: a short, a char and an int.

We use the print_stream function to print the byte-wise memory layout [14-19]
On this machine the layout is
A:B:C:D:E:F:G:H
Each letter represents one byte:
sss = A:B = 2 bytes
ccc = C:D = 1 byte + 1 byte padding
iii = E:F:G:H = 4 bytes.

As the diagram shows, byte D carries no data. It is only padding that aligns iii
to a 4-byte boundary.

We create our first object, to be written to a file [24-27]
This is how it is stored in memory [58]
10:0:65:81:25:0:0:0:
Here byte D = 81 is garbage: padding is never assigned, so it holds whatever was
in that memory before.

While reading, the memory layout is printed twice: once when the new object is created [43]
and again after the data is read [46]
This lets us compare the garbage the object holds when it is created
with the actual values after the read operation.

On the first read, for bytes 0 to 7 (the whole file; the program prints the range as
[0-8], end exclusive), the memory layout before the read is [60]
16:115:-14:81:-1:127:0:0:
These values are pure garbage.
After the read it becomes [62]
10:0:65:81:25:0:0:0:
Which is exactly the data we expected. To confirm, we print the member values
{sss:10 ccc:A iii: 25 }
And also the state of the stream. More on this later
{goodbit: 1 eofbit: 0 badbit: 0 failbit: 0}

In the next iteration, a new, uninitialized object is created [65]
10:0:65:81:25:0:0:0:
It is supposed to hold garbage, but it does not. Why? More on this later.

Next, we attempt to read bytes 8 - 16 [66]
This read cannot succeed: the file has no bytes left. read() extracts nothing,
sets eofbit and failbit, and leaves t_rd untouched.
So when we print t_rd, the same leftover values show up again, and it looks as if
the record had been read successfully twice. It is an illusion.
Printing the stream state confirms that the stream is in fact no longer in a good state
{goodbit: 0 eofbit: 1 badbit: 0 failbit: 1}

Because eofbit is now set, the loop condition fails and the loop exits.

Q1: What are these stream states?

A1: C++ reads data from any source (keyboard, file, ...) sequentially, through a buffer.
Unbuffered access is also possible, using the low-level read functions of the OS.

By default a stream only moves forward as you read from it, but a seekable stream
such as a file can be repositioned with seekg().

So, the stream maintains its own state. Here is what each value printed above means:

goodbit (good()): 1 means no error flag is set, 0 means at least one of the flags
below is set.

eofbit (eof()): 1 means a read hit the end of the input. 0 means no EOF seen yet.

badbit (bad()): 1 means the stream itself is broken, for example after an I/O error
on the underlying device. 0 means the stream is healthy.

failbit (fail()): 1 means an operation failed, for example a read that could not
deliver what was asked for. The stream itself is still healthy; the request just
could not be satisfied. Note that fail() also returns 1 when badbit is set.

The flags are sticky: once the stream has seen EOF and set eofbit, the bit stays set,
and the same goes for the other bits. While failbit or badbit is set, further reads do nothing.
It is your responsibility to reset the bits with the clear() method.

One more thing: the code above uses the accessors good(), eof(), bad() and fail() to
print the flags. They are convenient, but they are not a one-to-one view of the bits,
since fail() also reports badbit. To get the raw bit flags, use rdstate().


Q2: Why was the garbage data so accurate the second time?
First of all, there is no such thing as a 'garbage value'. We call any unexpected value
garbage. Computers are lazy: nothing gets zeroed out unless someone asks for it. Memory
keeps whatever it held until someone overwrites it. (The language makes no promise
about the contents of an uninitialized object; what follows describes what a typical
compiler does.)

When we created the test object for the first time, 8 bytes of stack were set aside for it.
When the object was destroyed at the end of the iteration, those bytes were released
with their contents left as they were.

In the next iteration the new test object got the same 8 bytes, and voilà, the data
was already filled in for us.

If something else uses that memory between the two allocations, the old data
gets overwritten.

For example, try replacing the loop [41-50] with this:

    int i = 0;
    while(!obj2.eof())  {
        test t_rd[2];
        print_stream((char*)&t_rd, sizeof(test));
        cout<<"reading: ["<< obj2.tellg() << "-" << count+obj2.tellg() << "]" << endl;
        obj2.read((char *)&t_rd[i],count);
        print_stream((char*)&t_rd[i], sizeof(test));
        cout << "{sss:" << t_rd[i].sss << " ccc:" << t_rd[i].ccc << " iii: " << t_rd[i].iii << " }" << endl;
        cout << "{goodbit: " << obj2.good()  << " eofbit: " << obj2.eof() << " badbit: " << obj2.bad() << " failbit: " << obj2.fail() << "}" << endl;
        i++;
    }

You will see output similar to:
10:0:65:0:25:0:0:0:
sizoeof(test) = 8
5:0:0:0:0:0:0:0:
reading: [0-8]
10:0:65:0:25:0:0:0:
{sss:10 ccc:A iii: 25 }
{goodbit: 1 eofbit: 0 badbit: 0 failbit: 0}
10:0:65:0:25:0:0:0:
reading: [8-16]
-120:26:-37:99:-1:127:0:0:
{sss:6792 ccc:? iii: 32767 }
{goodbit: 0 eofbit: 1 badbit: 0 failbit: 1}

You can see that the second read now shows real garbage (the line after
"reading: [8-16]"), because the stack now holds two test objects. The first iteration
reads into the object at index=0 and leaves the object at index=1 untouched. The next
iteration tries to read into the object at index=1, and since that read fails,
its bytes are still whatever was on the stack.
If you print the object at index=0 during the second iteration, you will most likely
still see the original data.
*/