This gist is designed to provide a brief overview of how Stata is configured, as well as some basic commands.
A concise way to describe Stata would be that Stata is to R as a multitool is to a team of contractors. Stata was designed with a different purpose in mind than Python or R and their libraries.
Its strengths include:
- Users can either type commands in a command-line interface (CLI) or point and click in a graphical user interface (GUI) to perform the same command
- Highly customizable charts and graphs can be generated with a single mouse click
- Although admittedly a bit quirky at first glance, Stata is fairly easy to pick up
- Documentation is surprisingly informative and readily available via Google searches and within the App
Its weaknesses include:
- It is possible to design pipelines and workflows within Stata, but this process is neither straightforward nor simple
- By default you can only view/load/manipulate one dataset at a time (Stata 16 and later add frames for holding several), but you can create subsets from the initial dataset
- Exporting analyzed data (but not created charts and graphs) can be tricky, and occasionally relies on copy-paste
To put it another way, Stata is a perfect tool at the end of a pipeline or workflow, but decidedly less ideal for the initial or intermediate steps of that workflow.
The general template for work in Stata is:
1. Load a dataset into memory
2. Get statistics on the variables (columns) or observations (rows) within the dataset
3. Create groupings for the dataset
4. Perform statistical analyses on the dataset
5. Visualize the dataset (or the analyses from the data) with charts and figures
6. (Optional) Write a subset of the dataset to disk using defined parameters
7. Return to Step 1 for a different dataset or for the dataset that you created in Step 6, or return to Step 2 for additional analyses on the current dataset
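The template above can be sketched as a short do-file. This is only an illustration under stated assumptions: it uses the `auto` example dataset that ships with Stata rather than any dataset from this gist, and the variable name `origin_repair` and the file name `foreign_cars.dta` are made up for the example.

```stata
* Load a dataset into memory (auto ships with Stata)
sysuse auto, clear

* Summary statistics for selected variables
summarize price mpg weight

* Create a grouping variable from the combinations of foreign and rep78
* (origin_repair is a hypothetical name)
egen origin_repair = group(foreign rep78), label

* One-way ANOVA of price across the groups
anova price origin_repair

* Box plot of price for each group
graph box price, over(origin_repair)

* Optional: write a subset to disk, then restore the full dataset
preserve
keep if foreign == 1
save "foreign_cars.dta", replace
restore
```

`preserve` and `restore` are one way to carve out a subset without losing the dataset currently in memory.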
Sources: https://youtu.be/c5btifh3EPE
https://www.stata.com/support/ssc-installation/
https://www.stata.com/help.cgi?summarize
You usually find out that a package is missing when a command fails with an error such as 'command mdesc is unrecognized'. Packages can be downloaded and installed manually with caution, or via the internet with a similar level of caution.
- Use the Boston College Statistical Software Components (SSC) repository:
    . ssc install PACKAGE
- Install from the internet (`net install` needs a location, given with `from()` or set beforehand with `net from`):
    net install PACKAGE, from(URL)
- Install via the GUI: search for the package name under Help, and then click to install the package.
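A common idiom is to install a package only when Stata cannot already find it. The sketch below assumes the package is hosted on SSC and uses `mdesc` (used later in this gist) as the example.

```stata
* Check whether mdesc is already installed; capture suppresses the error
capture which mdesc

* _rc is nonzero when the previous command failed, i.e. mdesc was not found
if _rc != 0 {
    ssc install mdesc
}

* List the user-written packages that are currently installed
ado dir
```

Putting this at the top of a do-file lets the file run on a fresh Stata installation without stopping at the first unrecognized command.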
Select 'Save' under the File menu to save your current dataset, or the subset of the dataset that you've created.
The commands below are meant to be run in order; all of them reference the hsb1 dataset hosted at stats.idre.ucla.edu.
- Load the dataset with the `use` command, and clear the current contents of memory with its `clear` option:
    use https://stats.idre.ucla.edu:/stat/stata/notes/hsb1, clear
- Get summary statistics for the named columns (read, write, math, science, socst in this example) with the `summarize` command:
    summarize read write math science socst
- Get n, mean, SD, and quartiles for the same columns:
    univar read write math science socst
- Show statistics for missing values for the dataset or for specified columns using `mdesc` or `tabmiss`. In either case, if no arguments are provided, statistics for the entire dataset will be provided:
    mdesc read write
    tabmiss read write
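`univar`, `mdesc`, and `tabmiss` are user-written commands that have to be installed first. As a rough sketch, similar information is available from commands that ship with Stata; the dataset URL is copied from the step above.

```stata
* Load the example dataset (URL as given in this gist)
use https://stats.idre.ucla.edu:/stat/stata/notes/hsb1, clear

* detail adds percentiles (including quartiles), variance, skewness, and kurtosis
summarize read write math science socst, detail

* Report missing-value counts for the listed variables
misstable summarize read write

* Count observations where read is missing
count if missing(read)
```

Running `misstable summarize` with no variable list covers every variable in the dataset.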
- Load the dataset:
    use https://stats.idre.ucla.edu:/stat/stata/notes/hsb1, clear
- Create a group from the read, write, and math columns:
    egen important = group(read write math), label
- (Optional) Edit the group important:
    edit important
- For each value of important, get a summary (n, mean, SD, min, max) of math, science, and read. The `sort` option sorts the data by important first:
    by important, sort: summarize math science read
- Run a one-way ANOVA of the science score across the groups in important:
    anova science important
- Generate a boxplot, using values from the 'id' column to identify outliers:
    graph box science, over(important) mark(1,mlabel(id))
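Because `group(read write math)` creates one group per unique combination of the three scores, the groups can end up very small. As a hedged alternative sketch, the same steps can be run on a coarser grouping; the variable name `read_quartile` is hypothetical and the choice of four bins is an assumption made for illustration.

```stata
* Load the example dataset (URL as given in this gist)
use https://stats.idre.ucla.edu:/stat/stata/notes/hsb1, clear

* Split read into 4 groups of roughly equal size
egen read_quartile = cut(read), group(4)

* n, mean, SD, min, and max of science within each group
tabstat science, by(read_quartile) statistics(n mean sd min max)

* One-way ANOVA of science across the groups, with a table of group means
oneway science read_quartile, tabulate

* Box plot of science for each group
graph box science, over(read_quartile)
```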
Exporting results is a bit wonky. First, the syntax:
. estout using "PATH/TO/OUTPUT/FILE", replace cells(list,of,cells,you,want,in,CSV,file)
- The directories in the path to the output file must exist, and because of the `replace` option the file itself will be overwritten.
- If you provide no arguments for `cells()`, then it will export an average of all values.
- Note that Stata creates new columns for values that you generate, so you could create a summary statistics column called 'sum_stats' and then list that as the cell to export to the CSV file.
- Concrete example that will create the file 'examplecsv.csv' with column names math, read, write:
    . estout using "C:\Users\jrcaskey\Documents\examplecsv.csv", replace cells(math,read,write)
- When in doubt, create a table, select the table with the mouse, right-click, select 'copy as table', then paste it into an Excel spreadsheet.
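If copy-paste becomes tedious, two other routes may be worth trying. The sketch below is not the method described above: it uses the built-in `export delimited` command to write selected columns to a CSV file, and `estpost` plus `esttab` (both part of the estout package) to write a summary-statistics table. The output file names are hypothetical and are written to the current working directory.

```stata
* Load the example dataset (URL as given in this gist)
use https://stats.idre.ucla.edu:/stat/stata/notes/hsb1, clear

* Write the math, read, and write columns to a CSV file
export delimited math read write using "examplecsv.csv", replace

* Store summary statistics as estimation results (requires: ssc install estout)
estpost summarize math read write

* Write n, mean, SD, min, and max for each variable to a CSV file
esttab using "sumstats.csv", cells("count mean sd min max") noobs replace
```

`export delimited` writes the data itself, while the `estpost`/`esttab` pair writes statistics computed from the data, so which one fits depends on what needs to leave Stata.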