@tbuckl
Created May 16, 2018 15:41
Title Post Link Id Score ViewCount Body
What is the difference between discrete data and continuous data? { "id": 206, "title": "What is the difference between discrete data and continuous data?" } 206 47 635826 <p>What is the difference between discrete data and continuous data?</p>
How to normalize data to 0-1 range? { "id": 70801, "title": "How to normalize data to 0-1 range?" } 70801 184 633404 <p>I am lost in normalizing, could anyone guide me please.</p> <p>I have a minimum and maximum values, say -23.89 and 7.54990767, respectively.</p> <p>If I get a value of 5.6878 how can I scale this value on a scale of 0 to 1.</p>
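The scaling this question asks about is min-max normalization: (x - min) / (max - min). A minimal Python sketch (the question itself is language-agnostic), using the asker's own numbers:

```python
def min_max_scale(x, lo, hi):
    """Scale x from the range [lo, hi] onto [0, 1]."""
    return (x - lo) / (hi - lo)

# the question's own values: min = -23.89, max = 7.54990767
scaled = min_max_scale(5.6878, -23.89, 7.54990767)   # ≈ 0.9408
```

The endpoints map to exactly 0 and 1, and everything in between scales linearly.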
What's the difference between variance and standard deviation? { "id": 35123, "title": "What's the difference between variance and standard deviation?" } 35123 83 500669 <p>I was wondering what the difference between the variance and the standard deviation is. </p> <p>If you calculate the two values, it is clear that you get the standard deviation out of the variance, but what does that mean in terms of the distribution you are observing?</p> <p>Furthermore, why do you really need a standard deviation?</p>
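Numerically the relationship is simply sd = sqrt(variance): variance is in squared units, while the standard deviation is back in the data's own units and so directly comparable to it. A quick stdlib check (illustrative data):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
var = statistics.pvariance(data)   # population variance, in squared units
sd = statistics.pstdev(data)       # population standard deviation, in the data's units
# sd is exactly the square root of var
```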
How do I get the number of rows of a data.frame in R? { "id": 5253, "title": "How do I get the number of rows of a data.frame in R?" } 5253 126 487003 <p>After reading a dataset:</p> <pre><code>dataset &lt;- read.csv("forR.csv") </code></pre> <ul> <li>How can I get R to give me the number of cases it contains?</li> <li>Also, will the returned value include or exclude cases omitted with <code>na.omit(dataset)</code>?</li> </ul>
How to summarize data by group in R? { "id": 8225, "title": "How to summarize data by group in R?" } 8225 163 473087 <p>I have R data frame like this:</p> <pre><code> age group 1 23.0883 1 2 25.8344 1 3 29.4648 1 4 32.7858 2 5 33.6372 1 6 34.9350 1 7 35.2115 2 8 35.2115 2 9 35.2115 2 10 36.7803 1 ... </code></pre> <p>I need to get data frame in the following form:</p> <pre><code>group mean sd 1 34.5 5.6 2 32.3 4.2 ... </code></pre> <p>Group number may vary, but their names and quantity could be obtained by calling <code>levels(factor(data$group))</code></p> <p><strong>What manipulations should be done with the data to get the result?</strong></p>
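In R this split-apply-combine pattern is typically done with aggregate() or dplyr's group_by/summarise. A stdlib Python sketch of the same reshaping, using the question's data:

```python
import statistics
from collections import defaultdict

rows = [(23.0883, 1), (25.8344, 1), (29.4648, 1), (32.7858, 2),
        (33.6372, 1), (34.9350, 1), (35.2115, 2), (35.2115, 2),
        (35.2115, 2), (36.7803, 1)]

# collect the ages for each group
by_group = defaultdict(list)
for age, group in rows:
    by_group[group].append(age)

# one (mean, sd) summary per group, like the data frame the asker wants
summary = {g: (statistics.mean(v), statistics.stdev(v)) for g, v in by_group.items()}
```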
How to interpret F- and p-value in ANOVA? { "id": 12398, "title": "How to interpret F- and p-value in ANOVA?" } 12398 34 409259 <p>I am new to statistics and I currently deal with ANOVA. I carry out an ANOVA test in R using</p> <pre><code>aov(dependendVar ~ IndependendVar) </code></pre> <p>I get – among others – an F-value and a p-value. </p> <p>My null hypothesis ($H_0$) is that all group means are equal. </p> <p>There is a lot of information available on <a href="http://onlinestatbook.com/2/analysis_of_variance/one-way.html">how F is calculated</a>, but I don't know how to read an F-statistic and how F and p are connected. </p> <p>So, my questions are:</p> <ol> <li>How do I determine the critical F-value for rejecting $H_0$?</li> <li>Does each F have a corresponding p-value, so they both mean basically the same? (e.g., if $p&lt;0.05$, then $H_0$ is rejected) </li> </ol>
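The F statistic is the ratio of between-group to within-group mean squares, and the p-value is the probability of observing an F at least that large when the null hypothesis of equal means is true. A from-scratch sketch on made-up toy data (not from the question):

```python
def one_way_anova_F(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_vals = [x for g in groups for x in g]
    grand_mean = sum(all_vals) / len(all_vals)
    k, n = len(groups), len(all_vals)
    # variation of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # variation of the observations around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

F = one_way_anova_F([[1, 2, 3], [2, 3, 4], [5, 6, 7]])  # large F: means differ
```

For fixed degrees of freedom, F and p do carry the same information: p < 0.05 exactly when F exceeds the 5% critical value, which in R is qf(0.95, df1, df2).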
What is the meaning of p values and t values in statistical tests? { "id": 31, "title": "What is the meaning of p values and t values in statistical tests?" } 31 203 407378 <p>After taking a statistics course and then trying to help fellow students, I noticed one subject that inspires much head-desk banging is interpreting the results of statistical hypothesis tests. It seems that students easily learn how to perform the calculations required by a given test but get hung up on interpreting the results. Many computerized tools report test results in terms of "p values" or "t values".</p> <p>How would you explain the following points to college students taking their first course in statistics:</p> <ul> <li><p>What does a "p-value" mean in relation to the hypothesis being tested? Are there cases when one should be looking for a high p-value or a low p-value?</p></li> <li><p>What is the relationship between a p-value and a t-value?</p></li> </ul>
How to choose between Pearson and Spearman correlation? { "id": 8071, "title": "How to choose between Pearson and Spearman correlation?" } 8071 90 401329 <p>How do I know when to choose between Spearman's $\rho$ and Pearson's $r$? My variable includes satisfaction and the scores were interpreted using the sum of the scores. However, these scores could also be ranked. </p>
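Spearman's rho is simply Pearson's r computed on the ranks of the data, which is why it only assumes a monotone (not linear) relationship. A stdlib sketch with illustrative data and no tie handling:

```python
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def ranks(v):
    # rank positions 1..n; ties are not averaged (enough for this illustration)
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # monotone but nonlinear: Spearman = 1, Pearson < 1
```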
Making sense of principal component analysis, eigenvectors & eigenvalues { "id": 2691, "title": "Making sense of principal component analysis, eigenvectors & eigenvalues" } 2691 738 375560 <p>In today's pattern recognition class my professor talked about PCA, eigenvectors &amp; eigenvalues. </p> <p>I got the mathematics of it. If I'm asked to find eigenvalues etc. I'll do it correctly like a machine. But I didn't <strong>understand</strong> it. I didn't get the purpose of it. I didn't get the feel of it. I strongly believe in </p> <blockquote> <p>you do not really understand something unless you can explain it to your grandmother -- Albert Einstein</p> </blockquote> <p>Well, I can't explain these concepts to a layman or grandma.</p> <ol> <li>Why PCA, eigenvectors &amp; eigenvalues? What was the <em>need</em> for these concepts?</li> <li>How would you explain these to a layman?</li> </ol>
How to choose the number of hidden layers and nodes in a feedforward neural network? { "id": 181, "title": "How to choose the number of hidden layers and nodes in a feedforward neural network?" } 181 356 358121 <p>Is there a standard and accepted method for selecting the number of layers, and the number of nodes in each layer, in a feed-forward neural network? I'm interested in automated ways of building neural networks.</p>
What is the difference between test set and validation set? { "id": 19048, "title": "What is the difference between test set and validation set?" } 19048 330 353824 <p>I found this confusing when I use the neural network toolbox in Matlab.<br> It divided the raw data set into three parts:</p> <ol> <li>training set</li> <li>validation set</li> <li>test set</li> </ol> <p>I notice in many training or learning algorithm, the data is often divided into 2 parts, the training set and the test set.</p> <p>My questions are:</p> <ol> <li>what is the difference between validation set and test set? </li> <li>Is the validation set really specific to neural network? Or it is optional.</li> <li>To go further, is there a difference between validation and testing in context of machine learning?</li> </ol>
What is the difference between fixed effect, random effect and mixed effect models? { "id": 4700, "title": "What is the difference between fixed effect, random effect and mixed effect models?" } 4700 205 339817 <p>In simple terms, how would you explain (perhaps with simple examples) the difference between fixed effect, random effect and mixed effect models? </p>
Removing duplicated rows data frame in R { "id": 6759, "title": "Removing duplicated rows data frame in R" } 6759 71 306335 <p>How can I remove duplicate rows from this example data frame?</p> <pre><code>A 1 A 1 A 2 B 4 B 1 B 1 C 2 C 2 </code></pre> <p>I would like to remove the duplicates based on both the columns:</p> <pre><code>A 1 A 2 B 4 B 1 C 2 </code></pre> <p>Order is not important.</p>
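In R this is unique(df) or df[!duplicated(df), ]. An order-preserving Python equivalent on the question's rows:

```python
rows = [("A", 1), ("A", 1), ("A", 2), ("B", 4),
        ("B", 1), ("B", 1), ("C", 2), ("C", 2)]

# dict keys are unique and preserve insertion order (Python 3.7+),
# so this keeps the first occurrence of each (col1, col2) pair
deduped = list(dict.fromkeys(rows))
```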
Examples for teaching: Correlation does not mean causation { "id": 36, "title": "Examples for teaching: Correlation does not mean causation" } 36 68 294041 <p>There is an old saying: "Correlation does not mean causation". When I teach, I tend to use the following standard examples to illustrate this point:</p> <ol> <li>number of storks and birth rate in Denmark;</li> <li>number of priests in America and alcoholism;</li> <li>in the start of the 20th century it was noted that there was a strong correlation between 'Number of radios' and 'Number of people in Insane Asylums'</li> <li>and my favorite: <a href="http://en.wikipedia.org/wiki/File%3aPiratesVsTemp%28en%29.svg">pirates cause global warming</a>.</li> </ol> <p>However, I do not have any references for these examples and whilst amusing, they are obviously false.</p> <p>Does anyone have any other good examples?</p>
What is the difference between a population and a sample? { "id": 269, "title": "What is the difference between a population and a sample?" } 269 31 293737 <p>What is the difference between a population and a sample? What common variables and statistics are used for each one, and how do those relate to each other? </p>
How to interpret and report eta squared / partial eta squared in statistically significant and non-significant analyses? { "id": 15958, "title": "How to interpret and report eta squared / partial eta squared in statistically significant and non-significant analyses?" } 15958 37 285687 <p>I have data that has eta squared values and partial eta squared values calculated as a measure of effect size for group mean differences.</p> <ul> <li><p>What is the difference between eta squared and partial eta squared? Can they both be interpreted using the same Cohen's guidelines (1988 I think: 0.01 = small, 0.06 = medium, 0.13 = large)?</p></li> <li><p>Also, is there use in reporting effect size if the comparison test (ie t-test or one-way ANOVA) is non-significant? In my head, this is like saying "the mean difference did not reach statistical significance but is still of particular note because the effect size indicated from the eta squared is medium". Or, is effect size a replacement value for significance testing, rather than complementary? </p></li> </ul>
In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values? { "id": 298, "title": "In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values?" } 298 141 259132 <p>Am I looking for a better behaved distribution for the independent variable in question, or to reduce the effect of outliers, or something else?</p>
Difference between logit and probit models { "id": 20523, "title": "Difference between logit and probit models" } 20523 252 255511 <p>What is the difference between <a href="https://en.wikipedia.org/wiki/Logistic_regression">Logit</a> and <a href="https://en.wikipedia.org/wiki/Probit_model">Probit model</a>?</p> <p>I'm more interested here in knowing when to use logistic regression, and when to use Probit.</p> <p>If there is any literature which defines it using <a href="http://en.wikipedia.org/wiki/R_%28programming_language%29">R</a>, that would be helpful as well.</p>
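The two models differ only in the link function mapping the linear predictor to a probability: logit uses the logistic CDF, probit the standard normal CDF. A stdlib sketch of just the two response curves:

```python
import math

def inv_logit(z):
    """Logistic CDF: the logit model's response curve."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    """Standard normal CDF: the probit model's response curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# both links give P = 0.5 at z = 0; they differ mainly in the tails
```

After rescaling the two curves are very close (a common approximation is that the normal CDF at z roughly matches the logistic at 1.7 z), which is why logit and probit fits usually give similar predictions.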
What's the difference between correlation and simple linear regression? { "id": 2125, "title": "What's the difference between correlation and simple linear regression?" } 2125 84 226389 <p>In particular, I am referring to the Pearson product-moment correlation coefficient.</p>
What is the difference between linear regression and logistic regression? { "id": 29325, "title": "What is the difference between linear regression and logistic regression?" } 29325 100 224744 <p>What is the difference between linear regression and logistic regression?</p> <p>When would you use each?</p>
Is there a minimum sample size required for the t-test to be valid? { "id": 37993, "title": "Is there a minimum sample size required for the t-test to be valid?" } 37993 55 220684 <p>I'm currently working on a quasi-experimental research paper. I only have a sample size of 15 due to low population within the chosen area and that only 15 fit my criteria. Is 15 the minimum sample size to compute for t-test and F-test? If so, where can I get an article or book to support this small sample size?</p> <p>This paper was already defended last Monday and one of the panel asked to have a supporting reference because my sample size is too low. He said it should've been at least 40 respondents. </p>
How to check for normal distribution using Excel for performing a t-test? { "id": 72418, "title": "How to check for normal distribution using Excel for performing a t-test?" } 72418 17 202954 <p>I want to know <strong>how to check a data set for normality in Excel, just to verify that the requirements for using a t-test are being met</strong>. </p> <p>For the right tail, is it appropriate to just calculate a mean and standard deviation, add 1, 2 &amp; 3 standard deviations from the mean to create a range then compare that to the normal 68/95/99.7 for the standard normal distribution after using the norm.dist function in excel to test each standard deviation value.</p> <p>Or is there a better way to test for normality?</p>
What does AUC stand for and what is it? { "id": 132777, "title": "What does AUC stand for and what is it?" } 132777 144 202098 <p>Searched high and low and have not been able to find out what AUC, as in related to prediction, stands for or means.</p>
What is the difference between "likelihood" and "probability"? { "id": 2641, "title": "What is the difference between \"likelihood\" and \"probability\"?" } 2641 367 199176 <p>The <a href="http://en.wikipedia.org/wiki/Likelihood_function">wikipedia page</a> claims that likelihood and probability are distinct concepts.</p> <blockquote> <p>In non-technical parlance,"likelihood" is usually a synonym for "probability," but in statistical usage there is a clear distinction in perspective: the number that is the probability of some observed outcomes given a set of parameter values is regarded as the likelihood of the set of parameter values given the observed outcomes. </p> </blockquote> <p>Can someone give a more down-to-earth description of what this means? In addition, some examples of how "probability" and "likelihood" disagree would be nice.</p>
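A binomial example makes the distinction concrete: probability fixes the parameter and varies the data, while likelihood fixes the observed data (say, 7 heads in 10 flips) and varies the parameter. Same formula, different variable held fixed:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k successes in n trials, for a FIXED parameter p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# likelihood: the same expression read as a function of p,
# with the observed data k=7, n=10 held fixed
def likelihood(p, k=7, n=10):
    return binom_pmf(k, n, p)

# the likelihood is maximized at p = k/n = 0.7 (the maximum likelihood estimate)
```

Note the asymmetry: binom_pmf sums to 1 over k (it is a probability distribution over data), but likelihood(p) does not integrate to 1 over p, which is one reason a likelihood is not a probability.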
Narrow confidence interval -- higher accuracy? { "id": 16164, "title": "Narrow confidence interval -- higher accuracy?" } 16164 16 198205 <p>I have two questions about confidence intervals:</p> <p>Apparently a narrow confidence interval implies that there is a smaller chance of obtaining an observation within that interval, therefore, our accuracy is higher.</p> <p>Also a 95% confidence interval is narrower than a 99% confidence interval which is wider.</p> <p>The 99% confidence interval is more accurate than the 95%.</p> <p>Can someone give a simple explanation that could help me understand this difference between accuracy and narrowness?</p>
When conducting multiple regression, when should you center your predictor variables & when should you standardize them? { "id": 29781, "title": "When conducting multiple regression, when should you center your predictor variables & when should you standardize them?" } 29781 219 196461 <p>In some literature, I have read that a regression with multiple explanatory variables, if in different units, needed to be standardized. (Standardizing consists in subtracting the mean and dividing by the standard deviation.) In which other cases do I need to standardize my data? Are there cases in which I should only center my data (i.e., without dividing by standard deviation)? </p>
R - QQPlot: how to see whether data are normally distributed { "id": 52293, "title": "R - QQPlot: how to see whether data are normally distributed" } 52293 39 189120 <p>I have plotted this after I did a Shapiro-Wilk normality test. The test showed that it is likely that the population is normally distributed. However, how to see this "behaviour" on this plot? <img src="https://i.stack.imgur.com/NpI0O.png" alt="enter image description here"></p> <p><strong>UPDATE</strong></p> <p>A simple histogram of the data:</p> <p><img src="https://i.stack.imgur.com/3nOTu.png" alt="enter image description here"></p> <p><strong>UPDATE</strong></p> <p>The Shapiro-Wilk test says:</p> <p><img src="https://i.stack.imgur.com/39ABw.png" alt="enter image description here"></p>
Correlations with unordered categorical variables { "id": 108007, "title": "Correlations with unordered categorical variables" } 108007 108 188621 <p>I have a dataframe with many observations and many variables. Some of them are categorical (unordered) and the others are numerical.</p> <p>I'm looking for associations between these variables. I've been able to compute correlation for numerical variables (Spearman's correlation) but :</p> <ul> <li>I don't know how to measure correlation between unordered categorical variables.</li> <li>I don't know how to measure correlation between unordered categorical variables and numerical variables.</li> </ul> <p>Does anyone know how this could be done? If so, are there R functions implementing these methods?</p>
How do you put values over a simple bar chart in Excel? { "id": 16816, "title": "How do you put values over a simple bar chart in Excel?" } 16816 4 188136 <p>I'd like to put values over a simple bar/column chart in excel. </p> <p>A <a href="https://stats.stackexchange.com/questions/3879/how-to-put-values-over-bars-in-barplot-in-r">similar question was asked for R</a>, and I know how to get my data into R, but not how to make the charts. What I'm doing is very simple seems easier to do in Excel than learning how to do it in R. </p>
How to interpret a QQ plot { "id": 101274, "title": "How to interpret a QQ plot" } 101274 138 183937 <p>I am working with a small dataset (21 observations) and have the following normal QQ plot in R: </p> <p><img src="https://i.stack.imgur.com/kLcP7.jpg" alt="enter image description here"></p> <p>Seeing that the plot does not support normality, what could I infer about the underlying distribution? It seems to me that a distribution more skewed to the right would be a better fit, is that right? Also, what other conclusions can we draw from the data?</p>
How would you explain the difference between correlation and covariance? { "id": 18082, "title": "How would you explain the difference between correlation and covariance?" } 18082 95 183329 <p>Following up on this question, <a href="https://stats.stackexchange.com/questions/18058/how-would-you-explain-covariance-to-someone-who-understands-only-the-mean">How would you explain covariance to someone who understands only the mean?</a>, which addresses the issue of explaining covariance to a lay person, brought up a similar question in my mind.</p> <p>How would one explain to a statistics neophyte the difference between <em>covariance</em> and <em>correlation</em>? It seems that both refer to the change in one variable linked back to another variable.</p> <p>Similar to the referred-to question, a lack of formulae would be preferable.</p>
Statistics Jokes { "id": 1337, "title": "Statistics Jokes" } 1337 143 179187 <p>Well, we've got favourite statistics quotes. What about statistics jokes?</p>
How to 'sum' a standard deviation? { "id": 25848, "title": "How to 'sum' a standard deviation?" } 25848 56 178953 <p>I have a monthly average for a value and a standard deviation corresponding to that average. I am now computing the annual average as the sum of monthly averages, how can I represent the standard deviation for the summed average ?</p> <p>For example considering output from a wind farm:</p> <pre><code>Month MWh StdDev January 927 333 February 1234 250 March 1032 301 April 876 204 May 865 165 June 750 263 July 780 280 August 690 98 September 730 76 October 821 240 November 803 178 December 850 250 </code></pre> <p>We can say that in the average year the wind farm produces 10,358 MWh, but what is the standard deviation corresponding to this figure ?</p>
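If the monthly totals are treated as independent, variances add, so the standard deviation of the annual total is the square root of the sum of the squared monthly standard deviations. That independence assumption would need justifying here (wind output in adjacent months is plausibly correlated), but the arithmetic on the question's table looks like this:

```python
import math

mwh = [927, 1234, 1032, 876, 865, 750, 780, 690, 730, 821, 803, 850]
sd  = [333,  250,  301, 204, 165, 263, 280,  98,  76, 240, 178, 250]

annual_mwh = sum(mwh)                          # 10358, as stated in the question
# variances (sd squared) add under independence; take the root at the end
annual_sd = math.sqrt(sum(s**2 for s in sd))   # ≈ 805
```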
Rules of thumb for minimum sample size for multiple regression { "id": 10079, "title": "Rules of thumb for minimum sample size for multiple regression" } 10079 65 178538 <p>Within the context of a research proposal in the social sciences, I was asked the following question:</p> <blockquote> <p>I have always gone by 100 + m (where m is the number of predictors) when determining minimum sample size for multiple regression. Is this appropriate?</p> </blockquote> <p>I get similar questions a lot, often with different rules of thumb. I've also read such rules of thumb quite a lot in various textbooks. I sometimes wonder whether popularity of a rule in terms of citations is based on how low the standard is set. However, I'm also aware of the value of good heuristics in simplifying decision making.</p> <h3>Questions:</h3> <ul> <li>What is the utility of simple rules of thumb for minimum sample sizes within the context of applied researchers designing research studies?</li> <li>Would you suggest an alternative rule of thumb for minimum sample size for multiple regression?</li> <li>Alternatively, what alternative strategies would you suggest for determining minimum sample size for multiple regression? In particular, it would be good if value is assigned to the degree to which any strategy can readily be applied by a non-statistician.</li> </ul>
What are the differences between Factor Analysis and Principal Component Analysis? { "id": 1576, "title": "What are the differences between Factor Analysis and Principal Component Analysis?" } 1576 176 175460 <p>It seems that a number of the statistical packages that I use wrap these two concepts together. However, I'm wondering if there are different assumptions or data 'formalities' that must be true to use one over the other. A real example would be incredibly useful. </p>
What is the difference between data mining, statistics, machine learning and AI? { "id": 5026, "title": "What is the difference between data mining, statistics, machine learning and AI?" } 5026 189 170272 <p>What is the difference between data mining, statistics, machine learning and AI?</p> <p>Would it be accurate to say that they are 4 fields attempting to solve very similar problems but with different approaches? What exactly do they have in common and where do they differ? If there is some kind of hierarchy between them, what would it be?</p> <p>Similar questions have been asked previously but I still don't get it:</p> <ul> <li><a href="https://stats.stackexchange.com/questions/1521/data-mining-and-statistical-analysis">Data Mining and Statistical Analysis</a></li> <li><a href="https://stats.stackexchange.com/questions/6/the-two-cultures-statistics-vs-machine-learning">The Two Cultures: statistics vs. machine learning?</a></li> </ul>
When (and why) should you take the log of a distribution (of numbers)? { "id": 18844, "title": "When (and why) should you take the log of a distribution (of numbers)?" } 18844 124 165390 <p>Say I have some historical data e.g., past stock prices, airline ticket price fluctuations, past financial data of the company...</p> <p>Now someone (or some formula) comes along and says "let's take/use the log of the distribution" and here's where I go <em>WHY</em>?</p> <p>Questions:</p> <ol> <li>WHY should one take the log of the distribution in the first place?</li> <li>WHAT does the log of the distribution 'give/simplify' that the original distribution couldn't/didn't?</li> <li>Is the log transformation 'lossless'? I.e., when transforming to log-space and analyzing the data, do the same conclusions hold for the original distribution? How come?</li> <li>And lastly WHEN to take the log of the distribution? Under what conditions does one decide to do this?</li> </ol> <p>I've really wanted to understand log-based distributions (for example lognormal) but I never understood the when/why aspects - i.e., the log of the distribution is a normal distribution, so what? What does that even tell and me and why bother? Hence the question!</p> <p><strong>UPDATE</strong>: As per @whuber's comment I looked at the posts and for some reason I do understand the use of log transforms and their application in linear regression, since you can draw a relation between the independent variable and the log of the dependent variable. However, my question is generic in the sense of analyzing the distribution itself - there is no relation per se that I can conclude to help understand the reason of taking logs to analyze a distribution. I hope I'm making sense :-/</p> <p>In regression analysis you do have constraints on the type/fit/distribution of the data and you can transform it and define a relation between the independent and (not transformed) dependent variable. 
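One concrete payoff of the log transform: it turns multiplicative structure into additive structure, so data spanning orders of magnitude (like prices) becomes far less skewed, and the mean of the logs recovers the geometric mean rather than the outlier-dominated arithmetic mean. It is also lossless for positive data, since exp() inverts it exactly. A small sketch with illustrative numbers:

```python
import math

prices = [1.0, 10.0, 100.0, 1000.0]        # multiplicative spread
logs = [math.log(x) for x in prices]       # evenly spaced after the transform

arith_mean = sum(prices) / len(prices)             # 277.75, dragged up by the tail
geo_mean = math.exp(sum(logs) / len(logs))         # ≈ 31.6, the "typical" value
# round trip: exp(log(x)) recovers each original value exactly
```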
But when/why would one do that for a distribution in isolation where constraints of type/fit/distribution are not necessarily applicable in a framework (like regression). I hope the clarification makes things more clear than confusing :)</p> <p>This question deserves a clear answer as to "WHY and WHEN"</p>
How to perform a test using R to see if data follows normal distribution { "id": 3136, "title": "How to perform a test using R to see if data follows normal distribution" } 3136 37 163756 <p>I have a data set with following structure:</p> <pre><code>a word | number of occurrence of a word in a document | a document id </code></pre> <p>How can I perform a test for normal distribution in R? Probably it is an easy question but I am a R newbie.</p>
Why square the difference instead of taking the absolute value in standard deviation? { "id": 118, "title": "Why square the difference instead of taking the absolute value in standard deviation?" } 118 339 162320 <p>In the definition of standard deviation, why do we have to <strong>square</strong> the difference from the mean to get the mean (E) and take the <strong>square root back</strong> at the end? Can't we just simply take <strong>the absolute value</strong> of the difference instead and get the expected value (mean) of those, and wouldn't that also show the variation of the data? The number is going to be different from square method (the absolute-value method will be smaller), but it should still show the spread of data. Anybody know why we take this square approach as a standard?</p> <p>The definition of standard deviation:</p> <p>$\sigma = \sqrt{E\left[\left(X - \mu\right)^2\right]}.$ </p> <p>Can't we just take the absolute value instead and still be a good measurement?</p> <p>$\sigma = E\left[|X - \mu|\right]$ </p>
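The absolute-value alternative the question proposes is a real spread measure, the mean absolute deviation (MAD). By the Cauchy-Schwarz inequality it is never larger than the standard deviation, because squaring weights large deviations more heavily. A side-by-side computation on illustrative data:

```python
import math

def sd_and_mad(data):
    mu = sum(data) / len(data)
    sd = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))   # squares, then root
    mad = sum(abs(x - mu) for x in data) / len(data)               # absolute values
    return sd, mad

sd1, mad1 = sd_and_mad([4, 5, 5, 6])    # tight data
sd2, mad2 = sd_and_mad([4, 5, 5, 50])   # one large outlier
# in both cases mad <= sd; the two disagree most when deviations are uneven
```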
How exactly does one “control for other variables”? { "id": 17336, "title": "How exactly does one “control for other variables”?" } 17336 113 160630 <p>Here is the article that motivated this question: <a href="http://www.washingtonpost.com/blogs/ezra-klein/post/does-impatience-make-us-fat/2011/10/10/gIQA1eMnaL_blog.html">Does impatience make us fat?</a></p> <p>I liked this article, and it nicely demonstrates the concept of “controlling for other variables” (IQ, career, income, age, etc) in order to best isolate the true relationship between just the 2 variables in question. </p> <p>Can you explain to me <strong><em>how</em></strong> you actually control for variables on a typical data set? </p> <p>E.g., if you have 2 people with the same impatience level and BMI, but different incomes, how do you treat these data? Do you categorize them into different subgroups that do have similar income, patience, and BMI? But, eventually there are dozens of variables to control for (IQ, career, income, age, etc) How do you then aggregate these (potentially) 100’s of subgroups? In fact, I have a feeling this approach is barking up the wrong tree, now that I’ve verbalized it.</p> <p>Thanks for shedding any light on something I've meant to get to the bottom of for a few years now...!</p>
Pearson's or Spearman's correlation with non-normal data { "id": 3730, "title": "Pearson's or Spearman's correlation with non-normal data" } 3730 93 160338 <p>I get this question frequently enough in my statistics consulting work, that I thought I'd post it here. I have an answer, which is posted below, but I was keen to hear what others have to say.</p> <p><strong>Question:</strong> If you have two variables that are not normally distributed, should you use Spearman's rho for the correlation?</p>
How to interpret the output of the summary method for an lm object in R? { "id": 59250, "title": "How to interpret the output of the summary method for an lm object in R?" } 59250 33 156910 <p>I am using sample algae data to understand data mining a bit more. I have used the following commands:</p> <pre><code>data(algae) algae &lt;- algae[-manyNAs(algae),] clean.algae &lt;-knnImputation(algae, k = 10) lm.a1 &lt;- lm(a1 ~ ., data = clean.algae[, 1:12]) summary(lm.a1) </code></pre> <p>Subsequently I received the results below. However I can not find any good documentation which explains what most of this means, especially Std. Error,t value and Pr. </p> <p>Can someone please be kind enough to shed some light please? Most importantly, which variables should I look at to ascertain on whether a model is giving me good prediction data?</p> <pre><code>Call: lm(formula = a1 ~ ., data = clean.algae[, 1:12]) Residuals: Min 1Q Median 3Q Max -37.679 -11.893 -2.567 7.410 62.190 Coefficients: Estimate Std. Error t value Pr(&gt;|t|) (Intercept) 42.942055 24.010879 1.788 0.07537 . seasonspring 3.726978 4.137741 0.901 0.36892 seasonsummer 0.747597 4.020711 0.186 0.85270 seasonwinter 3.692955 3.865391 0.955 0.34065 sizemedium 3.263728 3.802051 0.858 0.39179 sizesmall 9.682140 4.179971 2.316 0.02166 * speedlow 3.922084 4.706315 0.833 0.40573 speedmedium 0.246764 3.241874 0.076 0.93941 mxPH -3.589118 2.703528 -1.328 0.18598 mnO2 1.052636 0.705018 1.493 0.13715 Cl -0.040172 0.033661 -1.193 0.23426 NO3 -1.511235 0.551339 -2.741 0.00674 ** NH4 0.001634 0.001003 1.628 0.10516 oPO4 -0.005435 0.039884 -0.136 0.89177 PO4 -0.052241 0.030755 -1.699 0.09109 . Chla -0.088022 0.079998 -1.100 0.27265 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 17.65 on 182 degrees of freedom Multiple R-squared: 0.3731, Adjusted R-squared: 0.3215 F-statistic: 7.223 on 15 and 182 DF, p-value: 2.444e-12 </code></pre>
How do I group a list of numeric values into ranges? { "id": 4341, "title": "How do I group a list of numeric values into ranges?" } 4341 7 156354 <p>I do have a big list of numeric values (including duplicates) and I do want to group them into ranges in order to see if how do they distribute.</p> <p>Let's say there are 1000 values ranging from 0 to 2.000.000 and I do want to group them. </p> <p>How can I achieve this, preferably in Excel or SQL.</p>
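A common SQL trick for this is grouping on FLOOR(value / width) * width, which maps every value to the lower edge of its bin; Excel's FREQUENCY function does the same job. The identical binning in Python, on made-up values (not the poster's data):

```python
from collections import Counter

values = [5, 12, 19, 23, 23, 47, 51, 88, 95, 95]   # illustrative
width = 25

# map each value to the lower edge of its bin: 0-24, 25-49, 50-74, ...
bins = Counter((v // width) * width for v in values)
```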
How do I evaluate standard deviation? { "id": 23519, "title": "How do I evaluate standard deviation?" } 23519 14 156067 <p>I have collected responses from 85 people on their ability to undertake certain tasks. </p> <p>The responses are on a five point Likert scale:</p> <p>5 = Very Good, 4 = Good, 3 = Average, 2 = Poor, 1 = Very Poor,</p> <p>The mean score is 2.8 and the standard deviation is 0.54.</p> <p>I understand what the mean and standard deviation stand for. </p> <p>My question is: how good (or bad) is this standard deviation? </p> <p>In other words, are there any guidelines that can assist in the evaluation of standard deviation.</p>
What are principal component scores? { "id": 222, "title": "What are principal component scores?" } 222 59 155479 <p>What are principal component scores (PC scores, PCA scores)?</p>
How does the correlation coefficient differ from regression slope? { "id": 32464, "title": "How does the correlation coefficient differ from regression slope?" } 32464 57 155275 <p>I would have expected the correlation coefficient to be the same as a regression slope (beta), however having just compared the two, they are different. How do they differ - what different information do they give?</p>
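The two are linked by slope = r * (sd_y / sd_x): the correlation is what the regression slope becomes after both variables are standardized, so they always share a sign but differ in scale. A from-scratch check on illustrative data:

```python
import math

def mean_sd(v):
    n = len(v)
    mu = sum(v) / n
    return mu, math.sqrt(sum((x - mu) ** 2 for x in v) / n)

def slope_and_r(x, y):
    mx, sx = mean_sd(x)
    my, sy = mean_sd(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    r = cov / (sx * sy)       # unitless, always in [-1, 1]
    slope = cov / sx ** 2     # OLS slope of y on x, in units of y per unit of x
    return slope, r

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, r = slope_and_r(x, y)
# slope == r * (sd_y / sd_x): same sign, different scale
```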
How to stop excel from changing a range when you drag a formula down? { "id": 7457, "title": "How to stop excel from changing a range when you drag a formula down?" } 7457 8 150576 <p>I'm trying to normalize a set of columns of data in an excel spreadsheet.</p> <p>I need to get the values so that the highest value in a column is = 1 and lowest is = to 0, so I've come up with the formula:</p> <p><code>=(A1-MIN(A1:A30))/(MAX(A1:A30)-MIN(A1:A30))</code></p> <p>This seems to work fine, but when I drag down the formula to populate the cells below it, now only does <code>A1</code> increase, but <code>A1:A30</code> does too.</p> <p>Is there a way to lock the range while updating just the number I'm interested in?</p> <p>I've tried putting the Max and min in a different cell and referencing that but it just references the cell under the one that the Max and min are in and I get divide by zero errors because there is nothing there.</p>
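The standard fix is absolute references: a $ in front of a column letter or row number stops that part of the reference from shifting when the formula is filled down (pressing F4 while editing toggles it). Applied to the question's formula:

```
=(A1-MIN($A$1:$A$30))/(MAX($A$1:$A$30)-MIN($A$1:$A$30))
```

A1 stays relative so it advances row by row, while the $-anchored range stays locked to A1:A30.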
What is the difference between descriptive and inferential statistics? { "id": 71962, "title": "What is the difference between descriptive and inferential statistics?" } 71962 20 148290 <p>My understanding was that descriptive statistics quantitatively described features of a data sample, while inferential statistics made inferences about the populations from which samples were drawn.</p> <p>However, the <a href="http://en.wikipedia.org/wiki/Statistical_inference">wikipedia page for statistical inference</a> states:</p> <blockquote> <p>For the most part, statistical inference makes propositions about populations, using data drawn from the population of interest via some form of random sampling.</p> </blockquote> <p>The "for the most part" has made me think I perhaps don't properly understand these concepts. Are there examples of inferential statistics that don't make propositions about populations?</p>
Bayesian and frequentist reasoning in plain English { "id": 22, "title": "Bayesian and frequentist reasoning in plain English" } 22 282 146152 <p>How would you describe in plain English the characteristics that distinguish Bayesian from Frequentist reasoning?</p>
What is your favorite "data analysis" cartoon? { "id": 423, "title": "What is your favorite \"data analysis\" cartoon?" } 423 310 144082 <p>This is one of my favorites:</p> <p><img src="https://imgs.xkcd.com/comics/correlation.png" alt="alt text"></p> <p>One entry per answer. (This is in the vein of the Stack Overflow question <em><a href="https://stackoverflow.com/questions/84556/whats-your-favorite-programmer-cartoon">What’s your favorite “programmer” cartoon?</a></em>.)</p> <p>P.S. Do not hotlink the cartoon without the site's permission please.</p>