
Lab #1

Part 0 - How to read this markdown

Things that are bold are important and you should pay attention to them. Things in bold and italics are very important and you should 100% remember them.

Things that look like this are code snippets that you can either put into your program or into the command line.

Things that look like this are side notes that you should pay attention to, but aren't directly relevant to the lab.

Part 1 - Starting up SQLite and creating a database

Remember to check the CLI documentation if you run into issues with the SQLite command line tools

Open a terminal and launch sqlite3 from the command line with the following command: sqlite3

The application is installed on the system path, so no special path or syntax is required to launch it.

At this point, we'll be launched into the SQLite shell. The shell will enable us to create and work with databases using a mix of sqlite3 commands (all prefaced by a period) and standard SQL. Most database engines provide a similar shell (and there are many graphical options we can use to manage a database too ... we'll get there soon).

SQLite is largely intended to be embedded within an application. Thus, the shell pulls double duty: it runs the database engine itself rather than connecting to a separate server. When we get to PostgreSQL, you'll see that its shell is a fairly basic client that connects to a database engine service running on the server. With SQLite, we wouldn't normally work this way (except for administrative or developer tasks). Generally, the SQLite engine will be pulled into the application by way of a library include. We'll demonstrate this in a later section so that you can get a good sense of how the architecture differs from PostgreSQL.

The default database instance is stored in memory and will be wiped when we close out of the shell. We can create a persistent database called lab1, stored as a file in the current directory, by calling .open lab1

Next, call .databases to see a list of known databases and verify that lab1 has been added to the list.
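Putting the two shell commands for this step together (remember, dot-commands are SQLite shell built-ins, not SQL, so no trailing semicolon is needed):

.open lab1
.databases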

Part 2 - Creating, editing, and viewing tables

Now we need to add a table. The syntax to create a table is defined by the SQL data definition language (DDL) (see the assigned Vertabelo tutorials for some practice with this command). We'll start with a simple table called t1 with two columns: an integer id and an arbitrary-length text message.

CREATE TABLE t1 (id int, message text);

This command will create a table in our database, name it t1, and create the columns id (an integer) and message (text). Our table now looks like this:

| id (int) | message (text) |
| -------- | -------------- |

Note the caps used for the SQL keywords. While SQL is not case sensitive, the primary style convention is to use all caps for keywords. A semicolon is required at the end of each statement (though some database engines are more forgiving).
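As a quick illustration of the case insensitivity, the lowercase statement below behaves exactly like its uppercase equivalent; the t2 table is just a throwaway for this example, so we drop it immediately.

create table t2 (id int, message text); -- works the same as the ALL CAPS version
DROP TABLE t2; -- clean up the throwaway table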

Call sqlite's .tables command to verify that t1 was created.

Now, let's insert a few rows into this table using SQL data manipulation language (DML).

INSERT INTO t1 VALUES (0, 'Hello, world!'), (1, 'Hello, class!'), (2, 'Hello, database!');

This will insert into table t1 the three rows listed after the VALUES keyword. Our table will now look like the following:

| id (int) | message (text)   |
| -------- | ---------------- |
| 0        | Hello, world!    |
| 1        | Hello, class!    |
| 2        | Hello, database! |

Finally, we'll use the SQL data query language (DQL) to see what we have so far.

SELECT * FROM t1;

This selects everything (*) from the table t1, and it should return the rows we inserted in the previous step.

Take a few minutes to experiment and add some additional rows. Can you add a row with a repeat ID? What happens if you include extra fields in your row definition (the part within the parens)?
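Here's a sketch of those two experiments, assuming the t1 table exactly as defined above. Because we declared no PRIMARY KEY or UNIQUE constraint on id, the duplicate insert should succeed; the extra-field insert should be rejected with an error complaining that more values were supplied than the table has columns.

INSERT INTO t1 VALUES (2, 'Hello, again!'); -- duplicate id: allowed, since nothing enforces uniqueness
INSERT INTO t1 VALUES (3, 'Hello!', 'one too many'); -- error: t1 has only two columns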

Before we move on, let's look at a couple more built-in (not SQL standard) commands. Go ahead and give .help a try and see if you can sort out the command to view the schema for the table we've created.
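Give .help an honest try first. If you want to check your answer afterwards, the dot-command below is one that prints the DDL used to define a table.

.help
.schema t1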

Also interesting is the .save command, which would have given us an option to save our work if we hadn't started out by opening a new database. We'll go ahead and double check the persistence claim by exiting with .exit and running sqlite3 again from the command line (this time passing the lab1 database as the argument ... sqlite3 lab1).
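Step by step, the persistence check looks like this: leave the shell, relaunch it from the terminal against the lab1 file, and list the tables again (t1 should still be there).

.exit
sqlite3 lab1
.tables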

Alright, let's do something a bit more interesting. We'll dip into SQL DDL once more to rid ourselves of this rather useless table.

DROP TABLE t1;

This command will drop (permanently delete!) the table t1. This action is not reversible!

View the tables again to verify using the appropriate sqlite command, and then exit the shell.
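For reference, those last two steps are just:

.tables
.exit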

Part 3 - Generating and importing sample data

Let's download a more realistic data set. To do this, we'll make use of the web-based Mockaroo tool. Mockaroo provides tools to generate realistic-looking sample data with user-defined parameters in a variety of output formats. We're going to modify the default fields slightly to generate a comma-separated values (CSV) file.

Poke around the interface a bit to see what you come up with. I'm going to remove the ip_address field and replace it with a field called birthdate. Change the type of birthdate to Date and set an appropriate range. For the example, I'll use 1/1/1951 to 12/31/1999, and I'll set the format to yyyy-mm-dd (a format that SQLite can understand and interpret correctly for date arithmetic).

For this example, I'd also like to ensure that we have some missing data, so I'll set email address to blank for a percentage of entries (let's say 12%). Now we are prepared to generate and download the data. Open Finder (or your file manager) and rename the MOCK_DATA.csv file to lab1.csv.

Returning to SQLite, let's import this data. First, we need to set the input/output mode to recognize the CSV format. In the shell, this is a simple command: .mode csv. Now, all we need to do is call the built-in import command with the name of our file and the name of the table to create: .import Downloads/lab1.csv Person.
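Putting the two import steps together at the SQLite prompt (the relative path assumes you launched sqlite3 from your home directory and that the CSV is still in Downloads; adjust the path if yours lives elsewhere):

.mode csv
.import Downloads/lab1.csv Person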

Check that the Person table was created correctly using .schema Person. If the schema shows all data pulled into a single column, drop the table and try the import again, ensuring that you first set the mode to csv. You should see that SQLite was smart enough to select appropriate column names from the header row in the CSV file. If our CSV didn't have column names, we would need to create the table and add all attributes using the SQL CREATE TABLE syntax that we learned earlier.

Also noteworthy is that SQLite automatically designated all the columns as type TEXT. We'll learn more about data types in the coming weeks. For now, I want to point out that database engines differ somewhat in their support for data types and that SQLite is more limited in this respect, with only five named type affinities: TEXT, NUMERIC, INTEGER, REAL, and NONE (now called BLOB). As you'll see in a moment, SQLite can do quite a lot with these types, e.g., correctly interpreting TEXT dates and INTEGER boolean values for comparison and arithmetic operations.
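One way to see this in action is SQLite's built-in typeof() function, which reports the storage class of a value. A quick sketch against our imported table (the id and email column names assume the default Mockaroo fields, plus the birthdate field we added):

SELECT typeof(id), typeof(birthdate), typeof(email) FROM Person LIMIT 3;

Every column should report 'text', since .import created them all with TEXT affinity.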

Now, let's experiment a bit with this sample data and learn a bit more about SQL in the process. First, I'd like to highlight a quirk of our data generation and import by trying to search for every person with a missing email address. Normally, we'd leverage the IS NULL operator to find rows with missing fields, but that won't work here. Try it out: SELECT * FROM Person WHERE email IS NULL;. The problem is that our empty fields were actually imported as "" empty strings. NULL represents the case of no value, whereas the empty string is evaluated as a value. Let's test this theory: SELECT * FROM Person WHERE email = "";. If you entered the command correctly, SQLite should output a bunch of rows with an empty string placeholder in the email field.
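To make the difference concrete, you can count the rows each predicate matches (COUNT(*) is a standard SQL aggregate we'll look at properly later). Before the fix, the IS NULL count should be 0 and the empty-string count should be roughly the 12% we configured in Mockaroo. As an aside, single quotes are the standard SQL way to write a string literal; SQLite just happens to accept double quotes as well.

SELECT COUNT(*) FROM Person WHERE email IS NULL; -- 0 before the fix
SELECT COUNT(*) FROM Person WHERE email = ''; -- roughly 12% of the rows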

Before we move on to the final task, let's fix the data to work as we expect. SQL provides the UPDATE command for this scenario. Though I won't fully explain the syntax at this point, I think you'll catch the gist of what we're doing. Now, go ahead and execute the update command: UPDATE Person SET email=NULL WHERE email="";. Run the IS NULL query one more time to check the results.
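The fix and the re-check, back to back:

UPDATE Person SET email = NULL WHERE email = '';
SELECT COUNT(*) FROM Person WHERE email IS NULL; -- should now match the empty-string count from before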

On to our last task. Let's play with dates to find all users in a range, let's say younger than 30 years old. For the moment, we can get by with a statically encoded date. Soon we'll learn some more powerful mechanisms for manipulating dates. As I mentioned earlier, SQLite does understand dates at a semantic level if the textual representation follows a known pattern. So, we need to express our date as a string in the "yyyy-mm-dd" format, and we'll gain the ability to use standard comparison operators. Give it a shot: SELECT * FROM Person WHERE birthdate >= "1986-03-29";. Double check the results to make sure they meet your expectations. If you forgot to quote the date with single or double quotes, you may see unpredictable results.
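For reference, here is the static query from above, along with a hedged preview of a dynamic version that uses SQLite's built-in date() function (one of the date mechanisms we'll cover soon) to compute "30 years before today" at query time instead of hard-coding it:

SELECT * FROM Person WHERE birthdate >= '1986-03-29';
SELECT * FROM Person WHERE birthdate >= date('now', '-30 years');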
