@Steve132
Forked from anonymous/gist:4515990
Last active December 11, 2015 01:28
#include<iostream>
#include<fstream>
#include<iomanip>
#include<string>
#include<cmath> //You used the correct C++ headers (<cmath>, not <math.h>). Some teachers teach the C versions, so kudos to you for getting the right ones.
using namespace std; //using a namespace isn't harmful per se, but never put it in a header. Conventionally, it comes after the includes, as it does here.
//a data set is conceptually a single thing that can be read from an I/O stream, operated on, and passed around.
//make it a class.
class data_set
{
public:
/*Your problem specification seems to imply that ALL data sets
* are exactly 11x5 (one x column plus 4 y columns). In a real application most data sets would be dynamic,
* so you'd want to use something like std::vector<> to store the data.
* However, I didn't want the added complexity to confuse you, and as long
* as the problem size is part of the spec, plain arrays have a few advantages.
*/
static const unsigned int COLUMN_SIZE=11;
static const unsigned int NUM_LINES=4;
//it's better to use constants for these sizes, because then you can ensure they
//are consistent and you don't have magic numbers all over the code.
float xdata[COLUMN_SIZE];
float ydata[NUM_LINES][COLUMN_SIZE]; //space for 4 data inputs. I switched the order (column first, then row)...you'll see why later.
};
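//A minimal sketch of the std::vector alternative mentioned above, assuming the number of rows is only
//known at runtime. dynamic_data_set and add_row are hypothetical names used only for illustration;
//the rest of this program keeps using the fixed-size data_set.
#include <vector>
#include <cstddef>
struct dynamic_data_set
{
	std::vector<float> xdata;               //one x value per row
	std::vector<std::vector<float> > ydata; //ydata[column][row], same layout as data_set
	explicit dynamic_data_set(std::size_t num_columns) : ydata(num_columns) {}
	//append one row: an x value plus one y value per column.
	//returns false (and changes nothing) if the row has the wrong number of y values.
	bool add_row(float x, const std::vector<float>& ys)
	{
		if(ys.size() != ydata.size())
		{
			return false;
		}
		xdata.push_back(x);
		for(std::size_t column=0;column<ys.size();column++)
		{
			ydata[column].push_back(ys[column]);
		}
		return true;
	}
	std::size_t num_rows() const { return xdata.size(); }
};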
//Don't name functions with "it"..."readit", "writeit"...that's vague; you wouldn't use English that way.
//Read WHAT? Always name things as descriptively as is reasonable.
//Functions are verbs. The arguments are HOW you want to perform the action, and the return value
//is the expected result.
//When you design functions, keep that in mind if possible.
//In this case, the verb is "read the data" (read_data). The how is "read the data
//from an istream reference called 'in'" (istream& in), and it returns the resulting data (data_set).
//Of course, there are other ways to design this...
data_set read_data(std::istream& in);
//The action to be performed is "write the data". The how is "use this data, and print to a certain output stream".
//No results are expected to be returned.
void write_data(const data_set& dat,std::ostream& out);
/*This problem of vagueness is doubly compounded by your 'calcit' function. Calculate WHAT? Calculate HOW? What are the inputs? What are the outputs? "Calculate" could mean ANYTHING. However, I'm handling that a little differently, so it's not declared here.
*/
//similarly to data_set, another thing that you conceptually operate on and pass around is the RESULT of your
//computation: the results of the regression.
//You use a ton of variables, passing them all around together. In programming, a bunch of variables that always
//show up together should clue you in that they are really one concept...one kind of thing, and you
//should make a class out of them...that class is below.
//If you were to describe your program's purpose in English, you might say it
//takes in a data_set and outputs the result. STRUCTURE YOUR PROGRAM LIKE YOU CONCEPTUALLY DESCRIBE IT.
//We use another concept: lines. Line data also seem to travel around together...
//that is, a line is a concept that groups information too!
struct line
{
//lines have a slope, an intercept, and a goodness of fit associated with each one.
float slope;
float intercept;
//speaking of names, put underscores or some other kind of separator between words
float goodness_of_fit;
};
//the regression_result class mentioned earlier.
class regression_result
{
public:
//Your regression result has some information associated with it,
//primarily...
//a copy of the original data
data_set original_data;
//4 lines, one for each column of the dataset
line fitlines[data_set::NUM_LINES];
//and which of the 4 lines is the best
int bestfit;
//Your regression_result type CANNOT exist unless it is created by processing
//a data_set object. Does it make sense to say, in English, "The results existed before I
//collected the data"? Of course not. Instead, you would say "results can only be created from some
//existing data." I represented this relationship by making the only regression_result
//constructor require a reference to an unchanging (const) data_set to build from.
regression_result(const data_set& d);
};
//The next verb we define writes a regression result to an output stream. It doesn't need to
//give anything back, so it's void again.
void write_result(const regression_result&,ostream& out);
//it's a personal preference of mine to define functions before main, not after, but I did it your way here.
int main(int argc,char** argv)
{
//it's best not to hardcode things. Here, I set default values, but
//check the command-line arguments to the program so that users can change the input and output files
//at runtime
std::string input_filename="C:\\egr111\\temperaturedata.txt";
std::string output_filename="C:\\egr111\\results.txt";
if(argc > 1) //if there is at least one argument
{
input_filename=argv[1]; //get the first (meaningful) argument. argv[0] is normally the program's own name, e.g. "regression.exe"
}
if(argc > 2) //if there are at least two arguments
{
output_filename=argv[2];
}
//because we did such descriptive work in naming, our main function reads almost like
//an English description of what our program does.
//First, we open the input file, and if it failed to open, we report an error and quit
std::ifstream input_file(input_filename.c_str());
if(!input_file)
{
cerr << "Failed to open " << input_filename << "\n";
return -1;
}
//Next, we read the data from the input file.
data_set data=read_data(input_file);
//Next, we create a regression_result from the data
regression_result results(data);
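//Finally, we open the output file (and make sure it actually opened)
std::ofstream output_file(output_filename.c_str());
if(!output_file)
{
cerr << "Failed to open " << output_filename << "\n";
return -1;
}
//and print the results to the output file
write_result(results,output_file);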
//now we are done!
return 0;
//See how easy it is to read and understand what is going on in main()? Because the names are descriptive, the code is clear.
}
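//The argc/argv checks in main() follow one pattern: use argv[index] if it was supplied, otherwise keep
//the default. A minimal sketch of that pattern as a reusable function; argument_or_default is a
//hypothetical helper that main() above does not call.
#include <string>
inline std::string argument_or_default(int argc, char** argv, int index, const std::string& fallback)
{
	//argv[0] is the program name, so user arguments start at index 1
	if(index > 0 && index < argc)
	{
		return argv[index];
	}
	return fallback;
}
//possible usage: std::string input_filename=argument_or_default(argc,argv,1,"C:\\egr111\\temperaturedata.txt");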
//now, let's go back and define our functions.
data_set read_data(std::istream& in)
{
//this function reads the data, so we need a data_set object to read into...
data_set temp;
//skip the header line by reading it into a string
//(named 'header' rather than 'line' so it doesn't hide our line struct)
std::string header;
getline(in,header);
//read the data.
for(int row=0;row<data_set::COLUMN_SIZE;row++)
{ //always use curly braces on loops...it can prevent strange errors from occurring.
in >> temp.xdata[row];
for(int column=0;column<data_set::NUM_LINES;column++)
{
in >> temp.ydata[column][row];
}
}
//the output result should be the data we just read.
return temp;
}
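//read_data above trusts the file completely: if a number is missing or malformed, the stream silently
//enters a failed state and the remaining values are never read. A minimal sketch of the same
//"skip the header line, then read rows of numbers" pattern that reports failure instead.
//read_rows_checked is a hypothetical helper; it assumes each row holds one x value followed by
//num_columns y values, stored as ys[column][row] like data_set.
#include <istream>
#include <string>
#include <vector>
#include <cstddef>
inline bool read_rows_checked(std::istream& in, std::size_t num_rows, std::size_t num_columns,
	std::vector<float>& xs, std::vector<std::vector<float> >& ys)
{
	std::string header;
	if(!std::getline(in,header)) //skip the header line
	{
		return false;
	}
	xs.assign(num_rows,0.0f);
	ys.assign(num_columns,std::vector<float>(num_rows,0.0f));
	for(std::size_t row=0;row<num_rows;row++)
	{
		if(!(in >> xs[row]))
		{
			return false; //missing or malformed x value
		}
		for(std::size_t column=0;column<num_columns;column++)
		{
			if(!(in >> ys[column][row]))
			{
				return false; //missing or malformed y value
			}
		}
	}
	return true;
}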
//writing a data set is separate from writing a result.
void write_data(const data_set& data,ostream& out)
{
out << "T(F)\tR1\tR2\tR3\tR4\n"; //I prefer to use tab characters to create columns.
//I tried to figure out what you were doing with your fancy formatting, but honestly it looked like you were
//kludging it so that the output lined up for one particular result you were getting.
//That's a very bad idea. Pick one formatting convention and stick to it.
for(int row=0;row<data_set::COLUMN_SIZE;row++)
{
out << setw(5) << data.xdata[row] << "\t";
for(int c=0;c<data_set::NUM_LINES;c++)
{
out << data.ydata[c][row] << "\t";
}
out << "\n"; //end the row; otherwise the whole table ends up on one line
}
}
//We want to do a simple line fit. What does the line fit verb do? It reads in a column of xdata
//and a column of ydata, and outputs a line.
line compute_line_fit(const float* xdata,const float* ydata)
{
line l;
//to many programmers, an underscore reads as 'of': sum_x means "the sum of x"
float sum_x=0.0f,
sum_y=0.0f,
sum_xy=0.0f,
sum_xx=0.0f,
sum_yy=0.0f;
//for each row
for(int row=0;row<data_set::COLUMN_SIZE;row++) //why break convention with 'counter'? You used row/column everywhere else.
{
//declare variables in the scope they are needed
float x=xdata[row];
float y=ydata[row];
//"a+=b" is a more concise way of writing "a=a+b"
sum_x += x;
sum_y += y;
sum_xy += x*y;
sum_xx += x*x; //don't use the general pow() function for a simple square; x*x is typically much cheaper.
sum_yy += y*y;
}
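//use the named constant instead of a magic 11, and make it a float so all the math below stays in float
const float n=static_cast<float>(data_set::COLUMN_SIZE);
float avex = sum_x/n; //and declare variables at the last possible moment they are needed.
float avey = sum_y/n;
//do the math...
//when using floating point equations, it's generally a good idea to use floating point constants (a bare 11.0 is a double, not a float).
l.slope=(n*sum_xy-sum_x*sum_y)/(n*sum_xx-sum_x*sum_x);
l.intercept=(sum_y - l.slope*sum_x)/n;
//only call sqrt once; it is much more expensive than a multiply. This is valid because sqrt(a)*sqrt(b)=sqrt(a*b) for non-negative a and b.
//Also, match the sqrt call to the data type: std::sqrt has a float overload.
//this is the correlation coefficient r, which ranges from -1 to 1...
float r = (sum_xy - n*avex*avey)/std::sqrt((sum_xx-n*avex*avex)*(sum_yy-n*avey*avey));
//...but the output reports r squared ("rsqrd"), so store r*r. Unlike r, it is never negative,
//so the "biggest value wins" comparison in the constructor also works for lines with a negative slope.
l.goodness_of_fit = r*r;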
//return the line
return l;
}
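//compute_line_fit above is tied to data_set::COLUMN_SIZE rows. A minimal sketch of the same least-squares
//formulas for any number of points n, accumulating in double to reduce rounding error and returning
//r squared directly. fit_summary and least_squares_fit are hypothetical names used only here; the
//sketch assumes n >= 2 and that the x values are not all equal (otherwise sxx is zero).
#include <cstddef>
struct fit_summary
{
	double slope;
	double intercept;
	double r_squared;
};
inline fit_summary least_squares_fit(const float* xdata,const float* ydata,std::size_t n)
{
	double sum_x=0.0,sum_y=0.0,sum_xy=0.0,sum_xx=0.0,sum_yy=0.0;
	for(std::size_t row=0;row<n;row++)
	{
		double x=xdata[row];
		double y=ydata[row];
		sum_x+=x;
		sum_y+=y;
		sum_xy+=x*y;
		sum_xx+=x*x;
		sum_yy+=y*y;
	}
	double count=static_cast<double>(n);
	double sxy=sum_xy-sum_x*sum_y/count; //same as sum_xy - n*avex*avey
	double sxx=sum_xx-sum_x*sum_x/count; //same as sum_xx - n*avex*avex
	double syy=sum_yy-sum_y*sum_y/count; //same as sum_yy - n*avey*avey
	fit_summary result;
	result.slope=sxy/sxx; //equals (n*sum_xy - sum_x*sum_y)/(n*sum_xx - sum_x*sum_x)
	result.intercept=(sum_y-result.slope*sum_x)/count;
	//r = sxy/sqrt(sxx*syy), so r*r = sxy*sxy/(sxx*syy): no sqrt needed at all for r squared
	result.r_squared=(sxy*sxy)/(sxx*syy);
	return result;
}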
//This is a constructor: a special function that gets called when you create a
//regression_result variable. This means that simply declaring a regression_result variable with a data_set
//performs the calculations immediately.
regression_result::regression_result(const data_set& dat):
original_data(dat) //copy the data to original_data using an initializer list.
{
//to create our regression result, we perform the calculations
//first, assume the best fit is 0
bestfit=0;
float biggest=0.0f;
//for each data input, compute the line and see if it's the best fit so far.
for(int i=0;i<data_set::NUM_LINES;i++)
{
//because I switched the order earlier, dat.ydata[i] means 'the entire block of 11 elements' for line i. Nice!
fitlines[i]=compute_line_fit(dat.xdata,dat.ydata[i]); //one line of code to perform the fit.
if(fitlines[i].goodness_of_fit > biggest)
{
biggest=fitlines[i].goodness_of_fit;
bestfit=i;
}
}
}
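//The constructor above tracks the best fit by hand. The same "index of the largest value" search can
//also be written with std::max_element from <algorithm>. index_of_best_fit is a hypothetical helper,
//and it assumes the vector of r squared values is non-empty.
#include <algorithm>
#include <vector>
#include <cstddef>
inline std::size_t index_of_best_fit(const std::vector<float>& r_squared)
{
	//max_element returns an iterator to the first largest element; its distance from begin() is the index
	return static_cast<std::size_t>(std::max_element(r_squared.begin(),r_squared.end())-r_squared.begin());
}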
//finally, print the results to the file and screen.
void write_result(const regression_result& result,ostream& out)
{
//printing the original data is literally one line now. Very clear.
write_data(result.original_data,out);
//Again, your version tweaked the formatting to fit one specific result...don't do that.
out << "\nslope:";
for(int i=0;i<data_set::NUM_LINES;i++)
{
out << "\t" << result.fitlines[i].slope;
}
out << "\nintcpt:";
for(int i=0;i<data_set::NUM_LINES;i++)
{
out << "\t" << result.fitlines[i].intercept;
}
out << "\nrsqrd:";
for(int i=0;i<data_set::NUM_LINES;i++)
{
out << "\t" << result.fitlines[i].goodness_of_fit;
}
out << "\n"; //finish the rsqrd row before the summary sentence
//if you MUST duplicate code (you could avoid it by formatting the message once into a std::ostringstream),
//put the copies right next to each other so you can compare them visually.
cout << "The most linear data set is R" << result.bestfit + 1 << " with an rsquared value of " << result.fitlines[result.bestfit].goodness_of_fit << ".\n\n";
out << "The most linear data set is R" << result.bestfit + 1 << " with an rsquared value of " << result.fitlines[result.bestfit].goodness_of_fit << ".\n\n";
}
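//A minimal sketch of the stringstream approach mentioned in the comment in write_result: build the
//summary sentence once, then send the same string to both the console and the file.
//format_best_fit_summary is a hypothetical helper; bestfit is the zero-based index, as in regression_result.
#include <sstream>
#include <string>
inline std::string format_best_fit_summary(int bestfit,float r_squared)
{
	std::ostringstream summary;
	summary << "The most linear data set is R" << bestfit + 1 << " with an rsquared value of " << r_squared << ".\n\n";
	return summary.str();
}
//possible usage inside write_result:
//	std::string summary=format_best_fit_summary(result.bestfit,result.fitlines[result.bestfit].goodness_of_fit);
//	cout << summary;
//	out << summary;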
#include<iostream>
#include<fstream>
#include<iomanip>
#include<string>
#include<cmath>
using namespace std;
class data_set
{
public:
static const unsigned int COLUMN_SIZE=11;
static const unsigned int NUM_LINES=4;
float xdata[COLUMN_SIZE];
float ydata[NUM_LINES][COLUMN_SIZE];
};
data_set read_data(std::istream& in);
void write_data(const data_set& dat,std::ostream& out);
struct line
{
float slope;
float intercept;
float goodness_of_fit;
};
class regression_result
{
public:
data_set original_data;
line fitlines[data_set::NUM_LINES];
int bestfit;
regression_result(const data_set& d);
};
void write_result(const regression_result&,ostream& out);
int main(int argc,char** argv)
{
std::string input_filename="C:\\egr111\\temperaturedata.txt";
std::string output_filename="C:\\egr111\\results.txt";
if(argc > 1)
{
input_filename=argv[1];
}
if(argc > 2)
{
output_filename=argv[2];
}
std::ifstream input_file(input_filename.c_str());
if(!input_file)
{
cerr << "Failed to open " << input_filename << "\n";
return -1;
}
data_set data=read_data(input_file);
regression_result results(data);
std::ofstream output_file(output_filename.c_str());
if(!output_file)
{
cerr << "Failed to open " << output_filename << "\n";
return -1;
}
write_result(results,output_file);
return 0;
}
data_set read_data(std::istream& in)
{
data_set temp;
std::string header;
getline(in,header);
for(int row=0;row<data_set::COLUMN_SIZE;row++)
{
in >> temp.xdata[row];
for(int column=0;column<data_set::NUM_LINES;column++)
{
in >> temp.ydata[column][row];
}
}
return temp;
}
void write_data(const data_set& data,ostream& out)
{
out << "T(F)\tR1\tR2\tR3\tR4\n";
for(int row=0;row<data_set::COLUMN_SIZE;row++)
{
out << setw(5) << data.xdata[row] << "\t";
for(int c=0;c<data_set::NUM_LINES;c++)
{
out << data.ydata[c][row] << "\t";
}
out << "\n";
}
}
line compute_line_fit(const float* xdata,const float* ydata)
{
line l;
float sum_x=0.0f,
sum_y=0.0f,
sum_xy=0.0f,
sum_xx=0.0f,
sum_yy=0.0f;
for(int row=0;row<data_set::COLUMN_SIZE;row++)
{
float x=xdata[row];
float y=ydata[row];
sum_x += x;
sum_y += y;
sum_xy += x*y;
sum_xx += x*x;
sum_yy += y*y;
}
const float n=static_cast<float>(data_set::COLUMN_SIZE);
float avex = sum_x/n;
float avey = sum_y/n;
l.slope=(n*sum_xy-sum_x*sum_y)/(n*sum_xx-sum_x*sum_x);
l.intercept=(sum_y - l.slope*sum_x)/n;
float r = (sum_xy - n*avex*avey)/std::sqrt((sum_xx-n*avex*avex)*(sum_yy-n*avey*avey));
l.goodness_of_fit = r*r;
return l;
}
regression_result::regression_result(const data_set& dat):
original_data(dat)
{
bestfit=0;
float biggest=0.0f;
for(int i=0;i<data_set::NUM_LINES;i++)
{
fitlines[i]=compute_line_fit(dat.xdata,dat.ydata[i]); //one line of code to perform the fit.
if(fitlines[i].goodness_of_fit > biggest)
{
biggest=fitlines[i].goodness_of_fit;
bestfit=i;
}
}
}
void write_result(const regression_result& result,ostream& out)
{
write_data(result.original_data,out);
out << "\nslope:";
for(int i=0;i<data_set::NUM_LINES;i++)
{
out << "\t" << result.fitlines[i].slope;
}
out << "\nintcpt:";
for(int i=0;i<data_set::NUM_LINES;i++)
{
out << "\t" << result.fitlines[i].intercept;
}
out << "\nrsqrd:";
for(int i=0;i<data_set::NUM_LINES;i++)
{
out << "\t" << result.fitlines[i].goodness_of_fit;
}
out << "\n";
cout << "The most linear data set is R" << result.bestfit + 1 << " with an rsquared value of " << result.fitlines[result.bestfit].goodness_of_fit << ".\n\n";
out << "The most linear data set is R" << result.bestfit + 1 << " with an rsquared value of " << result.fitlines[result.bestfit].goodness_of_fit << ".\n\n";
}