@misho-kr
Last active February 2, 2020 05:25
Summary of the "Introduction to Importing Data in Python" course on DataCamp

As a data scientist, you will need to clean data, wrangle and munge it, visualize it, build predictive models, and interpret these models. Before you can do so, however, you will need to know how to get data into Python. In this course, you'll learn the many ways to import data into Python: from flat files such as .txt and .csv; from files native to other software such as Excel spreadsheets, Stata, SAS, and MATLAB files; and from relational databases such as SQLite and PostgreSQL.

Led by Hugo Bowne-Anderson, Data Scientist at DataCamp

Introduction and flat files

In this chapter, you'll learn how to import data into Python from all types of flat files, which are a simple and prevalent form of data storage. You've previously learned how to use NumPy and pandas; here you will learn how to use these packages to import flat files and customize your imports.

  • read() and readline()
  • numpy.loadtxt()
  • np.genfromtxt() and np.recfromcsv()
  • pandas.read_csv()
  • Customizing pandas import
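
The functions listed above can be tried together on one small file. The following is a minimal sketch, not course material: the file name sample.csv, its contents, and all variable names are hypothetical and exist only for illustration. np.recfromcsv() is left out because recent NumPy releases deprecate it in favor of np.genfromtxt().

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical sample data, written to a temporary file for illustration.
csv_text = "x,y\n1,2.5\n3,4.5\n5,6.5\n"
csv_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(csv_path, "w") as f:
    f.write(csv_text)

# Plain Python: readline() returns one line, read() returns everything
# from the current position to the end of the file.
with open(csv_path) as f:
    header = f.readline().strip()
    remainder = f.read()

# NumPy: loadtxt() suits purely numeric data; skip the header row.
array = np.loadtxt(csv_path, delimiter=",", skiprows=1)

# NumPy: genfromtxt() can take field names from the header row and
# infer a type per column when dtype=None.
records = np.genfromtxt(csv_path, delimiter=",", names=True, dtype=None)

# pandas: read_csv() with two common customizations, the header row
# position and a limit on the number of data rows read.
df = pd.read_csv(csv_path, header=0, nrows=2)
```

Here array is a plain two-dimensional float array, records is a structured array whose columns are addressed by name (for example records["y"]), and df is a DataFrame limited to the first two data rows.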

Importing data from other file types

You've learned how to import flat files, but as a data scientist you will likely have to work with many other file types. In this chapter, you'll learn how to import data into Python from a wide array of important file types: pickled files, Excel spreadsheets, SAS and Stata files, HDF5 files (a file type for storing large quantities of numerical data), and MATLAB files.

  • pickle.load()
  • pandas.ExcelFile() and xls.parse()
  • Customizing spreadsheet import
  • SAS, Stata, HDF5 and MATLAB files
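
As a small illustration of the first item, the sketch below round-trips a Python object through a pickled file using only the standard library. The dictionary contents and the file name data.pkl are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical object to store; any picklable Python object would do.
data = {"course": "Importing Data in Python", "chapters": 3}

pkl_path = os.path.join(tempfile.mkdtemp(), "data.pkl")

# Pickled files are binary, so open them with "wb" and "rb".
with open(pkl_path, "wb") as f:
    pickle.dump(data, f)

with open(pkl_path, "rb") as f:
    loaded = pickle.load(f)
```

The loaded object is an equal but separate copy of the original. Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code. The Excel item follows a similar open-then-read pattern: xls = pandas.ExcelFile(path) followed by xls.parse(sheet_name), which requires an Excel engine such as openpyxl to be installed.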

Working with relational databases in Python

In this chapter, you'll learn how to extract meaningful data from relational databases, an essential skill for any data scientist. You will learn about relational models, how to create SQL queries, how to filter and order your SQL records, and how to perform advanced queries by joining database tables.

  • SQLAlchemy create_engine(), engine.table_names()
  • engine.connect(), con.execute(), rs.keys()
  • WHERE, ORDER BY
  • pandas.read_sql_query('SELECT * FROM table', engine)
  • INNER JOIN
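
The query patterns above can be sketched against an in-memory SQLite database. To keep the example free of extra installs, it swaps SQLAlchemy's create_engine() for the standard-library sqlite3 module, whose connection objects pandas.read_sql_query() also accepts. The Artist and Album tables and their rows are hypothetical.

```python
import sqlite3

import pandas as pd

# In-memory database with two hypothetical tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
INSERT INTO Artist VALUES (1, 'Artist A');
INSERT INTO Artist VALUES (2, 'Artist B');
INSERT INTO Album VALUES (1, 'Zebra', 1);
INSERT INTO Album VALUES (2, 'Mango', 2);
INSERT INTO Album VALUES (3, 'Apple', 1);
""")

# WHERE filters rows; ORDER BY sorts the result.
cursor = con.execute(
    "SELECT Title FROM Album WHERE ArtistId = 1 ORDER BY Title"
)
# Column names come from cursor.description (the sqlite3 counterpart
# of rs.keys() on a SQLAlchemy result).
columns = [col[0] for col in cursor.description]
filtered = cursor.fetchall()

# INNER JOIN combines matching rows; pandas returns a DataFrame directly.
joined = pd.read_sql_query(
    "SELECT Title, Name FROM Album "
    "INNER JOIN Artist ON Album.ArtistId = Artist.ArtistId "
    "ORDER BY Album.AlbumId",
    con,
)

con.close()
```

With SQLAlchemy installed, the same queries should work by passing an engine from create_engine('sqlite:///...') to pandas.read_sql_query() in place of the sqlite3 connection.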