@misho-kr
Last active November 2, 2019 05:36
Summary of "Python Data Science Toolbox (Part 2)" course on Datacamp

In this second Python Data Science Toolbox course, you'll continue to build your Python data science skills. First, you'll learn about iterators, objects you have already encountered in the context of for loops. You'll then learn about list comprehensions, which are extremely handy tools for all data scientists working in Python. You'll end the course by working through a case study in which you'll apply all the techniques you learned in both parts of this course.

Using iterators in PythonLand

You'll learn all about iterators and iterables, which you have already worked with when writing for loops.

  • Iterators and iterables, iter() and next()
  • enumerate(), zip(), and unzipping with zip(*...)
  • Using iterators to load large data in chunks
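list1, list2 = [1, 2, 3], ['a', 'b', 'c']  # example inputs for illustration
z1 = zip(list1, list2)
result1, result2 = zip(*z1)  # unzipping yields tuples, not lists

# Convert the tuples back to lists before comparing with the originals
assert list(result1) == list1 and list(result2) == list2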
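
The first two bullets can be illustrated with a minimal sketch of iter(), next(), and enumerate(). The list `flash` and its contents are made up for this example and do not come from the summary itself.

```python
# Hypothetical example data, chosen only for illustration.
flash = ['jay', 'barry', 'wally']

# iter() turns an iterable into an iterator; next() pulls one item at a time.
it = iter(flash)
first = next(it)
second = next(it)

# enumerate() yields (index, item) pairs; start sets the first index.
indexed = list(enumerate(flash, start=1))

# Looping over the iterator consumes whatever items are left in it.
remaining = [name for name in it]
```

Note that `it` remembers its position, so the final comprehension only sees the items that next() has not already returned.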

List comprehensions and generators

You'll be introduced to list comprehensions, which let you build complicated lists, including lists of lists, in one line of code. You'll also learn about generators, which are helpful when working with large sequences of data that you would rather generate on the fly than store in memory.

  • List comprehensions, nested list comprehensions
  • Using conditionals in list comprehensions
  • Dictionary comprehensions
  • Generator expressions
  • Generator functions
result = (num for num in range(0,31))
for value in result:
    print(value)
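
The other topics in this list can be sketched in a few lines each. The names below (`evens`, `matrix`, `lengths`, `num_sequence`) are illustrative choices, not taken from the course material.

```python
# Conditional in the predicate: keep only even numbers.
evens = [n for n in range(10) if n % 2 == 0]

# Conditional in the output expression: replace odd numbers with 0.
evens_or_zero = [n if n % 2 == 0 else 0 for n in range(5)]

# Nested list comprehension: a 2 x 3 matrix of column indices.
matrix = [[col for col in range(3)] for row in range(2)]

# Dictionary comprehension: map each word to its length.
lengths = {word: len(word) for word in ['iter', 'next']}


def num_sequence(n):
    """Generator function: yield 0, 1, ..., n - 1 lazily."""
    i = 0
    while i < n:
        yield i
        i += 1


# A generator function can feed a comprehension like any other iterable.
squares = [x ** 2 for x in num_sequence(4)]
```

A generator expression uses the same syntax as a list comprehension but with parentheses, and it produces values one at a time instead of building the whole list in memory.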

Bringing it all together!

You'll write your own functions and list comprehensions as you work with iterators and generators to solidify your Python data science chops.

  • Turning a list of dictionaries into a pandas DataFrame
import pandas as pd

list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]
df = pd.DataFrame(list_of_dicts)
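
The snippet above relies on `lists2dict`, `feature_names`, and `row_lists`, none of which are defined in this summary. A plausible sketch, assuming `lists2dict` simply pairs each feature name with the value at the same position in a row, might look like this; the sample data is hypothetical.

```python
def lists2dict(keys, values):
    """Pair each key with the value at the same position (assumed behavior)."""
    return dict(zip(keys, values))


# Hypothetical sample data standing in for feature_names and row_lists.
feature_names = ['CountryCode', 'Year']
row_lists = [['ARB', '1960'], ['CEB', '1961']]

# One dictionary per row, ready to be passed to pd.DataFrame.
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]
```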
  • Processing data in chunks with loops
  • Writing a generator to process data in chunks
  • Using pandas' read_csv iterator for streaming data
import matplotlib.pyplot as plt
import pandas as pd

# Read the file lazily, 1000 rows at a time, and take the first chunk
urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)
df_urb_pop = next(urb_pop_reader)
# Copy the filtered rows so that adding a column does not touch a view
df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB'].copy()

pops = zip(df_pop_ceb['Total Population'],
           df_pop_ceb['Urban population (% of total)'])
pops_list = list(pops)

# Use list comprehension to create new DataFrame column 'Total Urban Population'
df_pop_ceb['Total Urban Population'] = [int(total * urban_pct / 100) for total, urban_pct in pops_list]

# Plot urban population data
df_pop_ceb.plot(kind='scatter', x='Year', y='Total Urban Population')
plt.show()
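
The "generator to process data in chunks" idea can also be sketched without pandas. The example below is a minimal illustration, assuming a generator named `read_large_file` that yields one line at a time; the in-memory `io.StringIO` object and its contents are stand-ins for a real file and are not from the course data.

```python
import io


def read_large_file(file_object):
    """Yield one line at a time until the file is exhausted."""
    while True:
        line = file_object.readline()
        if not line:
            break
        yield line


# io.StringIO stands in for a real file opened with open(...).
fake_file = io.StringIO("CountryCode,Year\nARB,1960\nCEB,1960\nCEB,1961\n")

# Count how many rows each country code has, one line at a time.
counts = {}
lines = read_large_file(fake_file)
next(lines)  # skip the header row
for line in lines:
    code = line.split(',')[0]
    counts[code] = counts.get(code, 0) + 1
```

Because only one line is held in memory at a time, the same loop works for files that are too large to load at once.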