@tbuckl
Created May 16, 2018 15:41
Title Post Link Id Score ViewCount Body
What is the difference between discrete data and continuous data? { "id": 206, "title": "What is the difference between discrete data and continuous data?" } 206 47 635826 <p>What is the difference between discrete data and continuous data?</p>
How to normalize data to 0-1 range? { "id": 70801, "title": "How to normalize data to 0-1 range?" } 70801 184 633404 <p>I am lost in normalizing, could anyone guide me please.</p> <p>I have a minimum and maximum values, say -23.89 and 7.54990767, respectively.</p> <p>If I get a value of 5.6878 how can I scale this value on a scale of 0 to 1.</p>
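The scaling this question asks about is min-max normalization: (x - min) / (max - min). A minimal Python sketch (the question itself is language-agnostic), using the asker's own numbers:

```python
def min_max_scale(x, lo, hi):
    """Scale x from the range [lo, hi] onto [0, 1]."""
    return (x - lo) / (hi - lo)

# the question's own values: min = -23.89, max = 7.54990767
scaled = min_max_scale(5.6878, -23.89, 7.54990767)   # ≈ 0.9408
```

The endpoints map to exactly 0 and 1, and everything in between scales linearly.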
What's the difference between variance and standard deviation? { "id": 35123, "title": "What's the difference between variance and standard deviation?" } 35123 83 500669 <p>I was wondering what the difference between the variance and the standard deviation is. </p> <p>If you calculate the two values, it is clear that you get the standard deviation out of the variance, but what does that mean in terms of the distribution you are observing?</p> <p>Furthermore, why do you really need a standard deviation?</p>
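Numerically the relationship is simply sd = sqrt(variance): variance is in squared units, while the standard deviation is back in the data's own units and so directly comparable to it. A quick stdlib check (illustrative data):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
var = statistics.pvariance(data)   # population variance, in squared units
sd = statistics.pstdev(data)       # population standard deviation, in the data's units
# sd is exactly the square root of var
```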
How do I get the number of rows of a data.frame in R? { "id": 5253, "title": "How do I get the number of rows of a data.frame in R?" } 5253 126 487003 <p>After reading a dataset:</p> <pre><code>dataset &lt;- read.csv("forR.csv") </code></pre> <ul> <li>How can I get R to give me the number of cases it contains?</li> <li>Also, will the returned value include or exclude cases omitted with <code>na.omit(dataset)</code>?</li> </ul>
How to summarize data by group in R? { "id": 8225, "title": "How to summarize data by group in R?" } 8225 163 473087 <p>I have R data frame like this:</p> <pre><code> age group 1 23.0883 1 2 25.8344 1 3 29.4648 1 4 32.7858 2 5 33.6372 1 6 34.9350 1 7 35.2115 2 8 35.2115 2 9 35.2115 2 10 36.7803 1 ... </code></pre> <p>I need to get data frame in the following form:</p> <pre><code>group mean sd 1 34.5 5.6 2 32.3 4.2 ... </code></pre> <p>Group number may vary, but their names and quantity could be obtained by calling <code>levels(factor(data$group))</code></p> <p><strong>What manipulations should be done with the data to get the result?</strong></p>
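In R this split-apply-combine pattern is typically done with aggregate() or dplyr's group_by/summarise. A stdlib Python sketch of the same reshaping, using the question's data:

```python
import statistics
from collections import defaultdict

rows = [(23.0883, 1), (25.8344, 1), (29.4648, 1), (32.7858, 2),
        (33.6372, 1), (34.9350, 1), (35.2115, 2), (35.2115, 2),
        (35.2115, 2), (36.7803, 1)]

# collect the ages for each group
by_group = defaultdict(list)
for age, group in rows:
    by_group[group].append(age)

# one (mean, sd) summary per group, like the data frame the asker wants
summary = {g: (statistics.mean(v), statistics.stdev(v)) for g, v in by_group.items()}
```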
How to interpret F- and p-value in ANOVA? { "id": 12398, "title": "How to interpret F- and p-value in ANOVA?" } 12398 34 409259 <p>I am new to statistics and I currently deal with ANOVA. I carry out an ANOVA test in R using</p> <pre><code>aov(dependendVar ~ IndependendVar) </code></pre> <p>I get – among others – an F-value and a p-value. </p> <p>My null hypothesis ($H_0$) is that all group means are equal. </p> <p>There is a lot of information available on <a href="http://onlinestatbook.com/2/analysis_of_variance/one-way.html">how F is calculated</a>, but I don't know how to read an F-statistic and how F and p are connected. </p> <p>So, my questions are:</p> <ol> <li>How do I determine the critical F-value for rejecting $H_0$?</li> <li>Does each F have a corresponding p-value, so they both mean basically the same? (e.g., if $p&lt;0.05$, then $H_0$ is rejected) </li> </ol>
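The F statistic is the ratio of between-group to within-group mean squares, and the p-value is the probability of observing an F at least that large when the null hypothesis of equal means is true. A from-scratch sketch on made-up toy data (not from the question):

```python
def one_way_anova_F(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_vals = [x for g in groups for x in g]
    grand_mean = sum(all_vals) / len(all_vals)
    k, n = len(groups), len(all_vals)
    # variation of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # variation of the observations around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

F = one_way_anova_F([[1, 2, 3], [2, 3, 4], [5, 6, 7]])  # large F: means differ
```

For fixed degrees of freedom, F and p do carry the same information: p < 0.05 exactly when F exceeds the 5% critical value, which in R is qf(0.95, df1, df2).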
What is the meaning of p values and t values in statistical tests? { "id": 31, "title": "What is the meaning of p values and t values in statistical tests?" } 31 203 407378 <p>After taking a statistics course and then trying to help fellow students, I noticed one subject that inspires much head-desk banging is interpreting the results of statistical hypothesis tests. It seems that students easily learn how to perform the calculations required by a given test but get hung up on interpreting the results. Many computerized tools report test results in terms of "p values" or "t values".</p> <p>How would you explain the following points to college students taking their first course in statistics:</p> <ul> <li><p>What does a "p-value" mean in relation to the hypothesis being tested? Are there cases when one should be looking for a high p-value or a low p-value?</p></li> <li><p>What is the relationship between a p-value and a t-value?</p></li> </ul>
How to choose between Pearson and Spearman correlation? { "id": 8071, "title": "How to choose between Pearson and Spearman correlation?" } 8071 90 401329 <p>How do I know when to choose between Spearman's $\rho$ and Pearson's $r$? My variable includes satisfaction and the scores were interpreted using the sum of the scores. However, these scores could also be ranked. </p>
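Spearman's rho is simply Pearson's r computed on the ranks of the data, which is why it only assumes a monotone (not linear) relationship. A stdlib sketch with illustrative data and no tie handling:

```python
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def ranks(v):
    # rank positions 1..n; ties are not averaged (enough for this illustration)
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # monotone but nonlinear: Spearman = 1, Pearson < 1
```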
Making sense of principal component analysis, eigenvectors & eigenvalues { "id": 2691, "title": "Making sense of principal component analysis, eigenvectors & eigenvalues" } 2691 738 375560 <p>In today's pattern recognition class my professor talked about PCA, eigenvectors &amp; eigenvalues. </p> <p>I got the mathematics of it. If I'm asked to find eigenvalues etc. I'll do it correctly like a machine. But I didn't <strong>understand</strong> it. I didn't get the purpose of it. I didn't get the feel of it. I strongly believe in </p> <blockquote> <p>you do not really understand something unless you can explain it to your grandmother -- Albert Einstein</p> </blockquote> <p>Well, I can't explain these concepts to a layman or grandma.</p> <ol> <li>Why PCA, eigenvectors &amp; eigenvalues? What was the <em>need</em> for these concepts?</li> <li>How would you explain these to a layman?</li> </ol>
How to choose the number of hidden layers and nodes in a feedforward neural network? { "id": 181, "title": "How to choose the number of hidden layers and nodes in a feedforward neural network?" } 181 356 358121 <p>Is there a standard and accepted method for selecting the number of layers, and the number of nodes in each layer, in a feed-forward neural network? I'm interested in automated ways of building neural networks.</p>
What is the difference between test set and validation set? { "id": 19048, "title": "What is the difference between test set and validation set?" } 19048 330 353824 <p>I found this confusing when I use the neural network toolbox in Matlab.<br> It divided the raw data set into three parts:</p> <ol> <li>training set</li> <li>validation set</li> <li>test set</li> </ol> <p>I notice in many training or learning algorithm, the data is often divided into 2 parts, the training set and the test set.</p> <p>My questions are:</p> <ol> <li>what is the difference between validation set and test set? </li> <li>Is the validation set really specific to neural network? Or it is optional.</li> <li>To go further, is there a difference between validation and testing in context of machine learning?</li> </ol>
What is the difference between fixed effect, random effect and mixed effect models? { "id": 4700, "title": "What is the difference between fixed effect, random effect and mixed effect models?" } 4700 205 339817 <p>In simple terms, how would you explain (perhaps with simple examples) the difference between fixed effect, random effect and mixed effect models? </p>
Removing duplicated rows data frame in R { "id": 6759, "title": "Removing duplicated rows data frame in R" } 6759 71 306335 <p>How can I remove duplicate rows from this example data frame?</p> <pre><code>A 1 A 1 A 2 B 4 B 1 B 1 C 2 C 2 </code></pre> <p>I would like to remove the duplicates based on both the columns:</p> <pre><code>A 1 A 2 B 4 B 1 C 2 </code></pre> <p>Order is not important.</p>
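In R this is unique(df) or df[!duplicated(df), ]. An order-preserving Python equivalent on the question's rows:

```python
rows = [("A", 1), ("A", 1), ("A", 2), ("B", 4),
        ("B", 1), ("B", 1), ("C", 2), ("C", 2)]

# dict keys are unique and preserve insertion order (Python 3.7+),
# so this keeps the first occurrence of each (col1, col2) pair
deduped = list(dict.fromkeys(rows))
```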
Examples for teaching: Correlation does not mean causation { "id": 36, "title": "Examples for teaching: Correlation does not mean causation" } 36 68 294041 <p>There is an old saying: "Correlation does not mean causation". When I teach, I tend to use the following standard examples to illustrate this point:</p> <ol> <li>number of storks and birth rate in Denmark;</li> <li>number of priests in America and alcoholism;</li> <li>in the start of the 20th century it was noted that there was a strong correlation between 'Number of radios' and 'Number of people in Insane Asylums'</li> <li>and my favorite: <a href="http://en.wikipedia.org/wiki/File%3aPiratesVsTemp%28en%29.svg">pirates cause global warming</a>.</li> </ol> <p>However, I do not have any references for these examples and whilst amusing, they are obviously false.</p> <p>Does anyone have any other good examples?</p>
What is the difference between a population and a sample? { "id": 269, "title": "What is the difference between a population and a sample?" } 269 31 293737 <p>What is the difference between a population and a sample? What common variables and statistics are used for each one, and how do those relate to each other? </p>
How to interpret and report eta squared / partial eta squared in statistically significant and non-significant analyses? { "id": 15958, "title": "How to interpret and report eta squared / partial eta squared in statistically significant and non-significant analyses?" } 15958 37 285687 <p>I have data that has eta squared values and partial eta squared values calculated as a measure of effect size for group mean differences.</p> <ul> <li><p>What is the difference between eta squared and partial eta squared? Can they both be interpreted using the same Cohen's guidelines (1988 I think: 0.01 = small, 0.06 = medium, 0.13 = large)?</p></li> <li><p>Also, is there use in reporting effect size if the comparison test (ie t-test or one-way ANOVA) is non-significant? In my head, this is like saying "the mean difference did not reach statistical significance but is still of particular note because the effect size indicated from the eta squared is medium". Or, is effect size a replacement value for significance testing, rather than complementary? </p></li> </ul>
In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values? { "id": 298, "title": "In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values?" } 298 141 259132 <p>Am I looking for a better behaved distribution for the independent variable in question, or to reduce the effect of outliers, or something else?</p>
Difference between logit and probit models { "id": 20523, "title": "Difference between logit and probit models" } 20523 252 255511 <p>What is the difference between <a href="https://en.wikipedia.org/wiki/Logistic_regression">Logit</a> and <a href="https://en.wikipedia.org/wiki/Probit_model">Probit model</a>?</p> <p>I'm more interested here in knowing when to use logistic regression, and when to use Probit.</p> <p>If there is any literature which defines it using <a href="http://en.wikipedia.org/wiki/R_%28programming_language%29">R</a>, that would be helpful as well.</p>
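The two models differ only in the link function mapping the linear predictor to a probability: logit uses the logistic CDF, probit the standard normal CDF. A stdlib sketch of just the two response curves:

```python
import math

def inv_logit(z):
    """Logistic CDF: the logit model's response curve."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    """Standard normal CDF: the probit model's response curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# both links give P = 0.5 at z = 0; they differ mainly in the tails
```

After rescaling the two curves are very close (a common approximation is that the normal CDF at z roughly matches the logistic at 1.7 z), which is why logit and probit fits usually give similar predictions.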
What's the difference between correlation and simple linear regression? { "id": 2125, "title": "What's the difference between correlation and simple linear regression?" } 2125 84 226389 <p>In particular, I am referring to the Pearson product-moment correlation coefficient.</p>
What is the difference between linear regression and logistic regression? { "id": 29325, "title": "What is the difference between linear regression and logistic regression?" } 29325 100 224744 <p>What is the difference between linear regression and logistic regression?</p> <p>When would you use each?</p>
Is there a minimum sample size required for the t-test to be valid? { "id": 37993, "title": "Is there a minimum sample size required for the t-test to be valid?" } 37993 55 220684 <p>I'm currently working on a quasi-experimental research paper. I only have a sample size of 15 due to low population within the chosen area and that only 15 fit my criteria. Is 15 the minimum sample size to compute for t-test and F-test? If so, where can I get an article or book to support this small sample size?</p> <p>This paper was already defended last Monday and one of the panel asked to have a supporting reference because my sample size is too low. He said it should've been at least 40 respondents. </p>
How to check for normal distribution using Excel for performing a t-test? { "id": 72418, "title": "How to check for normal distribution using Excel for performing a t-test?" } 72418 17 202954 <p>I want to know <strong>how to check a data set for normality in Excel, just to verify that the requirements for using a t-test are being met</strong>. </p> <p>For the right tail, is it appropriate to just calculate a mean and standard deviation, add 1, 2 &amp; 3 standard deviations from the mean to create a range then compare that to the normal 68/95/99.7 for the standard normal distribution after using the norm.dist function in excel to test each standard deviation value.</p> <p>Or is there a better way to test for normality?</p>
What does AUC stand for and what is it? { "id": 132777, "title": "What does AUC stand for and what is it?" } 132777 144 202098 <p>Searched high and low and have not been able to find out what AUC, as in related to prediction, stands for or means.</p>
What is the difference between "likelihood" and "probability"? { "id": 2641, "title": "What is the difference between \"likelihood\" and \"probability\"?" } 2641 367 199176 <p>The <a href="http://en.wikipedia.org/wiki/Likelihood_function">wikipedia page</a> claims that likelihood and probability are distinct concepts.</p> <blockquote> <p>In non-technical parlance,"likelihood" is usually a synonym for "probability," but in statistical usage there is a clear distinction in perspective: the number that is the probability of some observed outcomes given a set of parameter values is regarded as the likelihood of the set of parameter values given the observed outcomes. </p> </blockquote> <p>Can someone give a more down-to-earth description of what this means? In addition, some examples of how "probability" and "likelihood" disagree would be nice.</p>
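A binomial example makes the distinction concrete: probability fixes the parameter and varies the data, while likelihood fixes the observed data (say, 7 heads in 10 flips) and varies the parameter. Same formula, different variable held fixed:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k successes in n trials, for a FIXED parameter p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# likelihood: the same expression read as a function of p,
# with the observed data k=7, n=10 held fixed
def likelihood(p, k=7, n=10):
    return binom_pmf(k, n, p)

# the likelihood is maximized at p = k/n = 0.7 (the maximum likelihood estimate)
```

Note the asymmetry: binom_pmf sums to 1 over k (it is a probability distribution over data), but likelihood(p) does not integrate to 1 over p, which is one reason a likelihood is not a probability.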
Narrow confidence interval -- higher accuracy? { "id": 16164, "title": "Narrow confidence interval -- higher accuracy?" } 16164 16 198205 <p>I have two questions about confidence intervals:</p> <p>Apparently a narrow confidence interval implies that there is a smaller chance of obtaining an observation within that interval, therefore, our accuracy is higher.</p> <p>Also a 95% confidence interval is narrower than a 99% confidence interval which is wider.</p> <p>The 99% confidence interval is more accurate than the 95%.</p> <p>Can someone give a simple explanation that could help me understand this difference between accuracy and narrowness?</p>
When conducting multiple regression, when should you center your predictor variables & when should you standardize them? { "id": 29781, "title": "When conducting multiple regression, when should you center your predictor variables & when should you standardize them?" } 29781 219 196461 <p>In some literature, I have read that a regression with multiple explanatory variables, if in different units, needed to be standardized. (Standardizing consists in subtracting the mean and dividing by the standard deviation.) In which other cases do I need to standardize my data? Are there cases in which I should only center my data (i.e., without dividing by standard deviation)? </p>
R - QQPlot: how to see whether data are normally distributed { "id": 52293, "title": "R - QQPlot: how to see whether data are normally distributed" } 52293 39 189120 <p>I have plotted this after I did a Shapiro-Wilk normality test. The test showed that it is likely that the population is normally distributed. However, how to see this "behaviour" on this plot? <img src="https://i.stack.imgur.com/NpI0O.png" alt="enter image description here"></p> <p><strong>UPDATE</strong></p> <p>A simple histogram of the data:</p> <p><img src="https://i.stack.imgur.com/3nOTu.png" alt="enter image description here"></p> <p><strong>UPDATE</strong></p> <p>The Shapiro-Wilk test says:</p> <p><img src="https://i.stack.imgur.com/39ABw.png" alt="enter image description here"></p>
Correlations with unordered categorical variables { "id": 108007, "title": "Correlations with unordered categorical variables" } 108007 108 188621 <p>I have a dataframe with many observations and many variables. Some of them are categorical (unordered) and the others are numerical.</p> <p>I'm looking for associations between these variables. I've been able to compute correlation for numerical variables (Spearman's correlation) but :</p> <ul> <li>I don't know how to measure correlation between unordered categorical variables.</li> <li>I don't know how to measure correlation between unordered categorical variables and numerical variables.</li> </ul> <p>Does anyone know how this could be done? If so, are there R functions implementing these methods?</p>
How do you put values over a simple bar chart in Excel? { "id": 16816, "title": "How do you put values over a simple bar chart in Excel?" } 16816 4 188136 <p>I'd like to put values over a simple bar/column chart in excel. </p> <p>A <a href="https://stats.stackexchange.com/questions/3879/how-to-put-values-over-bars-in-barplot-in-r">similar question was asked for R</a>, and I know how to get my data into R, but not how to make the charts. What I'm doing is very simple seems easier to do in Excel than learning how to do it in R. </p>
How to interpret a QQ plot { "id": 101274, "title": "How to interpret a QQ plot" } 101274 138 183937 <p>I am working with a small dataset (21 observations) and have the following normal QQ plot in R: </p> <p><img src="https://i.stack.imgur.com/kLcP7.jpg" alt="enter image description here"></p> <p>Seeing that the plot does not support normality, what could I infer about the underlying distribution? It seems to me that a distribution more skewed to the right would be a better fit, is that right? Also, what other conclusions can we draw from the data?</p>
How would you explain the difference between correlation and covariance? { "id": 18082, "title": "How would you explain the difference between correlation and covariance?" } 18082 95 183329 <p>Following up on this question, <a href="https://stats.stackexchange.com/questions/18058/how-would-you-explain-covariance-to-someone-who-understands-only-the-mean">How would you explain covariance to someone who understands only the mean?</a>, which addresses the issue of explaining covariance to a lay person, brought up a similar question in my mind.</p> <p>How would one explain to a statistics neophyte the difference between <em>covariance</em> and <em>correlation</em>? It seems that both refer to the change in one variable linked back to another variable.</p> <p>Similar to the referred-to question, a lack of formulae would be preferable.</p>
Statistics Jokes { "id": 1337, "title": "Statistics Jokes" } 1337 143 179187 <p>Well, we've got favourite statistics quotes. What about statistics jokes?</p>
How to 'sum' a standard deviation? { "id": 25848, "title": "How to 'sum' a standard deviation?" } 25848 56 178953 <p>I have a monthly average for a value and a standard deviation corresponding to that average. I am now computing the annual average as the sum of monthly averages, how can I represent the standard deviation for the summed average ?</p> <p>For example considering output from a wind farm:</p> <pre><code>Month MWh StdDev January 927 333 February 1234 250 March 1032 301 April 876 204 May 865 165 June 750 263 July 780 280 August 690 98 September 730 76 October 821 240 November 803 178 December 850 250 </code></pre> <p>We can say that in the average year the wind farm produces 10,358 MWh, but what is the standard deviation corresponding to this figure ?</p>
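If the monthly totals are treated as independent, variances add, so the standard deviation of the annual total is the square root of the sum of the squared monthly standard deviations. That independence assumption would need justifying here (wind output in adjacent months is plausibly correlated), but the arithmetic on the question's table looks like this:

```python
import math

mwh = [927, 1234, 1032, 876, 865, 750, 780, 690, 730, 821, 803, 850]
sd  = [333,  250,  301, 204, 165, 263, 280,  98,  76, 240, 178, 250]

annual_mwh = sum(mwh)                          # 10358, as stated in the question
# variances (sd squared) add under independence; take the root at the end
annual_sd = math.sqrt(sum(s**2 for s in sd))   # ≈ 805
```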
Rules of thumb for minimum sample size for multiple regression { "id": 10079, "title": "Rules of thumb for minimum sample size for multiple regression" } 10079 65 178538 <p>Within the context of a research proposal in the social sciences, I was asked the following question:</p> <blockquote> <p>I have always gone by 100 + m (where m is the number of predictors) when determining minimum sample size for multiple regression. Is this appropriate?</p> </blockquote> <p>I get similar questions a lot, often with different rules of thumb. I've also read such rules of thumb quite a lot in various textbooks. I sometimes wonder whether popularity of a rule in terms of citations is based on how low the standard is set. However, I'm also aware of the value of good heuristics in simplifying decision making.</p> <h3>Questions:</h3> <ul> <li>What is the utility of simple rules of thumb for minimum sample sizes within the context of applied researchers designing research studies?</li> <li>Would you suggest an alternative rule of thumb for minimum sample size for multiple regression?</li> <li>Alternatively, what alternative strategies would you suggest for determining minimum sample size for multiple regression? In particular, it would be good if value is assigned to the degree to which any strategy can readily be applied by a non-statistician.</li> </ul>
What are the differences between Factor Analysis and Principal Component Analysis? { "id": 1576, "title": "What are the differences between Factor Analysis and Principal Component Analysis?" } 1576 176 175460 <p>It seems that a number of the statistical packages that I use wrap these two concepts together. However, I'm wondering if there are different assumptions or data 'formalities' that must be true to use one over the other. A real example would be incredibly useful. </p>
What is the difference between data mining, statistics, machine learning and AI? { "id": 5026, "title": "What is the difference between data mining, statistics, machine learning and AI?" } 5026 189 170272 <p>What is the difference between data mining, statistics, machine learning and AI?</p> <p>Would it be accurate to say that they are 4 fields attempting to solve very similar problems but with different approaches? What exactly do they have in common and where do they differ? If there is some kind of hierarchy between them, what would it be?</p> <p>Similar questions have been asked previously but I still don't get it:</p> <ul> <li><a href="https://stats.stackexchange.com/questions/1521/data-mining-and-statistical-analysis">Data Mining and Statistical Analysis</a></li> <li><a href="https://stats.stackexchange.com/questions/6/the-two-cultures-statistics-vs-machine-learning">The Two Cultures: statistics vs. machine learning?</a></li> </ul>
When (and why) should you take the log of a distribution (of numbers)? { "id": 18844, "title": "When (and why) should you take the log of a distribution (of numbers)?" } 18844 124 165390 <p>Say I have some historical data e.g., past stock prices, airline ticket price fluctuations, past financial data of the company...</p> <p>Now someone (or some formula) comes along and says "let's take/use the log of the distribution" and here's where I go <em>WHY</em>?</p> <p>Questions:</p> <ol> <li>WHY should one take the log of the distribution in the first place?</li> <li>WHAT does the log of the distribution 'give/simplify' that the original distribution couldn't/didn't?</li> <li>Is the log transformation 'lossless'? I.e., when transforming to log-space and analyzing the data, do the same conclusions hold for the original distribution? How come?</li> <li>And lastly WHEN to take the log of the distribution? Under what conditions does one decide to do this?</li> </ol> <p>I've really wanted to understand log-based distributions (for example lognormal) but I never understood the when/why aspects - i.e., the log of the distribution is a normal distribution, so what? What does that even tell and me and why bother? Hence the question!</p> <p><strong>UPDATE</strong>: As per @whuber's comment I looked at the posts and for some reason I do understand the use of log transforms and their application in linear regression, since you can draw a relation between the independent variable and the log of the dependent variable. However, my question is generic in the sense of analyzing the distribution itself - there is no relation per se that I can conclude to help understand the reason of taking logs to analyze a distribution. I hope I'm making sense :-/</p> <p>In regression analysis you do have constraints on the type/fit/distribution of the data and you can transform it and define a relation between the independent and (not transformed) dependent variable. 
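One concrete payoff of the log transform: it turns multiplicative structure into additive structure, so data spanning orders of magnitude (like prices) becomes far less skewed, and the mean of the logs recovers the geometric mean rather than the outlier-dominated arithmetic mean. It is also lossless for positive data, since exp() inverts it exactly. A small sketch with illustrative numbers:

```python
import math

prices = [1.0, 10.0, 100.0, 1000.0]        # multiplicative spread
logs = [math.log(x) for x in prices]       # evenly spaced after the transform

arith_mean = sum(prices) / len(prices)             # 277.75, dragged up by the tail
geo_mean = math.exp(sum(logs) / len(logs))         # ≈ 31.6, the "typical" value
# round trip: exp(log(x)) recovers each original value exactly
```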
But when/why would one do that for a distribution in isolation where constraints of type/fit/distribution are not necessarily applicable in a framework (like regression). I hope the clarification makes things more clear than confusing :)</p> <p>This question deserves a clear answer as to "WHY and WHEN"</p>
How to perform a test using R to see if data follows normal distribution { "id": 3136, "title": "How to perform a test using R to see if data follows normal distribution" } 3136 37 163756 <p>I have a data set with following structure:</p> <pre><code>a word | number of occurrence of a word in a document | a document id </code></pre> <p>How can I perform a test for normal distribution in R? Probably it is an easy question but I am a R newbie.</p>
Why square the difference instead of taking the absolute value in standard deviation? { "id": 118, "title": "Why square the difference instead of taking the absolute value in standard deviation?" } 118 339 162320 <p>In the definition of standard deviation, why do we have to <strong>square</strong> the difference from the mean to get the mean (E) and take the <strong>square root back</strong> at the end? Can't we just simply take <strong>the absolute value</strong> of the difference instead and get the expected value (mean) of those, and wouldn't that also show the variation of the data? The number is going to be different from square method (the absolute-value method will be smaller), but it should still show the spread of data. Anybody know why we take this square approach as a standard?</p> <p>The definition of standard deviation:</p> <p>$\sigma = \sqrt{E\left[\left(X - \mu\right)^2\right]}.$ </p> <p>Can't we just take the absolute value instead and still be a good measurement?</p> <p>$\sigma = E\left[|X - \mu|\right]$ </p>
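The absolute-value alternative the question proposes is a real spread measure, the mean absolute deviation (MAD). By the Cauchy-Schwarz inequality it is never larger than the standard deviation, because squaring weights large deviations more heavily. A side-by-side computation on illustrative data:

```python
import math

def sd_and_mad(data):
    mu = sum(data) / len(data)
    sd = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))   # squares, then root
    mad = sum(abs(x - mu) for x in data) / len(data)               # absolute values
    return sd, mad

sd1, mad1 = sd_and_mad([4, 5, 5, 6])    # tight data
sd2, mad2 = sd_and_mad([4, 5, 5, 50])   # one large outlier
# in both cases mad <= sd; the two disagree most when deviations are uneven
```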
How exactly does one “control for other variables”? { "id": 17336, "title": "How exactly does one “control for other variables”?" } 17336 113 160630 <p>Here is the article that motivated this question: <a href="http://www.washingtonpost.com/blogs/ezra-klein/post/does-impatience-make-us-fat/2011/10/10/gIQA1eMnaL_blog.html">Does impatience make us fat?</a></p> <p>I liked this article, and it nicely demonstrates the concept of “controlling for other variables” (IQ, career, income, age, etc) in order to best isolate the true relationship between just the 2 variables in question. </p> <p>Can you explain to me <strong><em>how</em></strong> you actually control for variables on a typical data set? </p> <p>E.g., if you have 2 people with the same impatience level and BMI, but different incomes, how do you treat these data? Do you categorize them into different subgroups that do have similar income, patience, and BMI? But, eventually there are dozens of variables to control for (IQ, career, income, age, etc) How do you then aggregate these (potentially) 100’s of subgroups? In fact, I have a feeling this approach is barking up the wrong tree, now that I’ve verbalized it.</p> <p>Thanks for shedding any light on something I've meant to get to the bottom of for a few years now...!</p>
Pearson's or Spearman's correlation with non-normal data { "id": 3730, "title": "Pearson's or Spearman's correlation with non-normal data" } 3730 93 160338 <p>I get this question frequently enough in my statistics consulting work, that I thought I'd post it here. I have an answer, which is posted below, but I was keen to hear what others have to say.</p> <p><strong>Question:</strong> If you have two variables that are not normally distributed, should you use Spearman's rho for the correlation?</p>
How to interpret the output of the summary method for an lm object in R? { "id": 59250, "title": "How to interpret the output of the summary method for an lm object in R?" } 59250 33 156910 <p>I am using sample algae data to understand data mining a bit more. I have used the following commands:</p> <pre><code>data(algae) algae &lt;- algae[-manyNAs(algae),] clean.algae &lt;-knnImputation(algae, k = 10) lm.a1 &lt;- lm(a1 ~ ., data = clean.algae[, 1:12]) summary(lm.a1) </code></pre> <p>Subsequently I received the results below. However I can not find any good documentation which explains what most of this means, especially Std. Error,t value and Pr. </p> <p>Can someone please be kind enough to shed some light please? Most importantly, which variables should I look at to ascertain on whether a model is giving me good prediction data?</p> <pre><code>Call: lm(formula = a1 ~ ., data = clean.algae[, 1:12]) Residuals: Min 1Q Median 3Q Max -37.679 -11.893 -2.567 7.410 62.190 Coefficients: Estimate Std. Error t value Pr(&gt;|t|) (Intercept) 42.942055 24.010879 1.788 0.07537 . seasonspring 3.726978 4.137741 0.901 0.36892 seasonsummer 0.747597 4.020711 0.186 0.85270 seasonwinter 3.692955 3.865391 0.955 0.34065 sizemedium 3.263728 3.802051 0.858 0.39179 sizesmall 9.682140 4.179971 2.316 0.02166 * speedlow 3.922084 4.706315 0.833 0.40573 speedmedium 0.246764 3.241874 0.076 0.93941 mxPH -3.589118 2.703528 -1.328 0.18598 mnO2 1.052636 0.705018 1.493 0.13715 Cl -0.040172 0.033661 -1.193 0.23426 NO3 -1.511235 0.551339 -2.741 0.00674 ** NH4 0.001634 0.001003 1.628 0.10516 oPO4 -0.005435 0.039884 -0.136 0.89177 PO4 -0.052241 0.030755 -1.699 0.09109 . Chla -0.088022 0.079998 -1.100 0.27265 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 17.65 on 182 degrees of freedom Multiple R-squared: 0.3731, Adjusted R-squared: 0.3215 F-statistic: 7.223 on 15 and 182 DF, p-value: 2.444e-12 </code></pre>
How do I group a list of numeric values into ranges? { "id": 4341, "title": "How do I group a list of numeric values into ranges?" } 4341 7 156354 <p>I do have a big list of numeric values (including duplicates) and I do want to group them into ranges in order to see if how do they distribute.</p> <p>Let's say there are 1000 values ranging from 0 to 2.000.000 and I do want to group them. </p> <p>How can I achieve this, preferably in Excel or SQL.</p>
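A common SQL trick for this is grouping on FLOOR(value / width) * width, which maps every value to the lower edge of its bin; Excel's FREQUENCY function does the same job. The identical binning in Python, on made-up values (not the poster's data):

```python
from collections import Counter

values = [5, 12, 19, 23, 23, 47, 51, 88, 95, 95]   # illustrative
width = 25

# map each value to the lower edge of its bin: 0-24, 25-49, 50-74, ...
bins = Counter((v // width) * width for v in values)
```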
How do I evaluate standard deviation? { "id": 23519, "title": "How do I evaluate standard deviation?" } 23519 14 156067 <p>I have collected responses from 85 people on their ability to undertake certain tasks. </p> <p>The responses are on a five point Likert scale:</p> <p>5 = Very Good, 4 = Good, 3 = Average, 2 = Poor, 1 = Very Poor,</p> <p>The mean score is 2.8 and the standard deviation is 0.54.</p> <p>I understand what the mean and standard deviation stand for. </p> <p>My question is: how good (or bad) is this standard deviation? </p> <p>In other words, are there any guidelines that can assist in the evaluation of standard deviation.</p>
What are principal component scores? { "id": 222, "title": "What are principal component scores?" } 222 59 155479 <p>What are principal component scores (PC scores, PCA scores)?</p>
How does the correlation coefficient differ from regression slope? { "id": 32464, "title": "How does the correlation coefficient differ from regression slope?" } 32464 57 155275 <p>I would have expected the correlation coefficient to be the same as a regression slope (beta), however having just compared the two, they are different. How do they differ - what different information do they give?</p>
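The two are linked by slope = r * (sd_y / sd_x): the correlation is what the regression slope becomes after both variables are standardized, so they always share a sign but differ in scale. A from-scratch check on illustrative data:

```python
import math

def mean_sd(v):
    n = len(v)
    mu = sum(v) / n
    return mu, math.sqrt(sum((x - mu) ** 2 for x in v) / n)

def slope_and_r(x, y):
    mx, sx = mean_sd(x)
    my, sy = mean_sd(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    r = cov / (sx * sy)       # unitless, always in [-1, 1]
    slope = cov / sx ** 2     # OLS slope of y on x, in units of y per unit of x
    return slope, r

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, r = slope_and_r(x, y)
# slope == r * (sd_y / sd_x): same sign, different scale
```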
How to stop excel from changing a range when you drag a formula down? { "id": 7457, "title": "How to stop excel from changing a range when you drag a formula down?" } 7457 8 150576 <p>I'm trying to normalize a set of columns of data in an excel spreadsheet.</p> <p>I need to get the values so that the highest value in a column is = 1 and lowest is = to 0, so I've come up with the formula:</p> <p><code>=(A1-MIN(A1:A30))/(MAX(A1:A30)-MIN(A1:A30))</code></p> <p>This seems to work fine, but when I drag down the formula to populate the cells below it, now only does <code>A1</code> increase, but <code>A1:A30</code> does too.</p> <p>Is there a way to lock the range while updating just the number I'm interested in?</p> <p>I've tried putting the Max and min in a different cell and referencing that but it just references the cell under the one that the Max and min are in and I get divide by zero errors because there is nothing there.</p>
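The standard fix is absolute references: a $ in front of a column letter or row number stops that part of the reference from shifting when the formula is filled down (pressing F4 while editing toggles it). Applied to the question's formula:

```
=(A1-MIN($A$1:$A$30))/(MAX($A$1:$A$30)-MIN($A$1:$A$30))
```

A1 stays relative so it advances row by row, while the $-anchored range stays locked to A1:A30.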
What is the difference between descriptive and inferential statistics? { "id": 71962, "title": "What is the difference between descriptive and inferential statistics?" } 71962 20 148290 <p>My understanding was that descriptive statistics quantitatively described features of a data sample, while inferential statistics made inferences about the populations from which samples were drawn.</p> <p>However, the <a href="http://en.wikipedia.org/wiki/Statistical_inference">wikipedia page for statistical inference</a> states:</p> <blockquote> <p>For the most part, statistical inference makes propositions about populations, using data drawn from the population of interest via some form of random sampling.</p> </blockquote> <p>The "for the most part" has made me think I perhaps don't properly understand these concepts. Are there examples of inferential statistics that don't make propositions about populations?</p>
Bayesian and frequentist reasoning in plain English { "id": 22, "title": "Bayesian and frequentist reasoning in plain English" } 22 282 146152 <p>How would you describe in plain English the characteristics that distinguish Bayesian from Frequentist reasoning?</p>
What is your favorite "data analysis" cartoon? { "id": 423, "title": "What is your favorite \"data analysis\" cartoon?" } 423 310 144082 <p>This is one of my favorites:</p> <p><img src="https://imgs.xkcd.com/comics/correlation.png" alt="alt text"></p> <p>One entry per answer. (This is in the vein of the Stack Overflow question <em><a href="https://stackoverflow.com/questions/84556/whats-your-favorite-programmer-cartoon">What’s your favorite “programmer” cartoon?</a></em>.)</p> <p>P.S. Do not hotlink the cartoon without the site's permission please.</p>