@misho-kr
Last active February 20, 2023 06:48
Summary of "Introduction to NumPy" from DataCamp

NumPy is an essential Python library. TensorFlow and scikit-learn use NumPy arrays as inputs, and pandas and Matplotlib are built on top of NumPy. In this Introduction to NumPy course, you'll become a master wrangler of NumPy's core object: arrays! You'll discover why NumPy is so efficient and use broadcasting and vectorization to make your NumPy code even faster. By the end of the course, you'll be using 3D arrays to alter a Claude Monet painting.

By Izzy Weber, Curriculum Developer @ DataCamp

1. Understanding NumPy Arrays

Create and change array shapes to suit your needs. Discover NumPy's many data types and how they contribute to speedy array operations.

  • Creating arrays from lists
import numpy as np

python_list = [3, 2, 5, 8, 4, 9, 7, 6, 1]
array = np.array(python_list)

python_list_of_lists = [[3, 2, 5],
                        [9, 7, 1],
                        [4, 3, 6]]
np.array(python_list_of_lists)
  • Creating arrays from scratch
np.zeros((5, 3))
np.random.random((2, 4))
np.arange(-3, 4)
np.arange(4)
np.arange(-3, 4, 3)
array([-3, 0, 3])
  • Array dimensionality
    • Vector arrays
    • Matrix and tensor arrays
array_1_2D = np.array([[1, 2], [5, 7]])
array_2_2D = np.array([[8, 9], [5, 7]])
array_3_2D = np.array([[1, 2], [5, 7]])
array_3D = np.array([array_1_2D, array_2_2D, array_3_2D])
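To see how vector, matrix, and tensor arrays differ, it can help to inspect `.ndim` and `.shape`. The following is a minimal sketch; the names `vector`, `matrix`, and `tensor` and their values are illustrative assumptions, not taken from the course.

```python
import numpy as np

# Illustrative arrays (names and values are assumptions for this sketch)
vector = np.array([3, 2, 5])                     # 1D: a vector
matrix = np.array([[1, 2], [5, 7]])              # 2D: a matrix
tensor = np.array([matrix, matrix + 7, matrix])  # 3D: three stacked 2x2 matrices

for name, arr in [("vector", vector), ("matrix", matrix), ("tensor", tensor)]:
    # .ndim is the number of dimensions, .shape the length of each one
    print(name, arr.ndim, arr.shape)
```

Stacking three 2x2 matrices adds a new leading dimension, so the 3D array has shape (3, 2, 2).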
  • Shapeshifting

    • Rows and columns
    • .shape, .flatten(), .reshape()
  • NumPy vs. Python data types

    • .dtype attribute
    • Default data types
    • dtype as an argument
    • Type conversion with .astype()
    • Type coercion
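The shapeshifting and data type bullets above have no code in these notes, so here is a small sketch of how they might look in practice. The array values and variable names are made up for illustration.

```python
import numpy as np

array = np.array([[1, 2], [5, 7], [6, 6]])
print(array.shape)                 # three rows, two columns

flat = array.flatten()             # copy collapsed into one dimension
reshaped = array.reshape((2, 3))   # same six elements, new shape

# The default integer dtype depends on the platform, so it is only printed here
print(array.dtype)

# dtype as an argument, and conversion with .astype()
floats_32 = np.array([1.32, 5.78, 175.55], dtype=np.float32)
as_float = array.astype(np.float64)

# Type coercion: mixed input is converted to one dtype that can hold every value
coerced = np.array([True, 42, 42.42])
print(coerced.dtype)               # bool and int are coerced to float
```

Note that `.reshape()` only works when the new shape holds exactly the same number of elements as the original array.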

2. Selecting and Updating Data

Slice, filter, and sort New York City’s tree census data. Create new arrays by pulling data based on conditional statements, and add and remove data along any dimension. Learn the shape and dimension compatibility principles that prepare you for super-fast array math.

  • Indexing and slicing arrays
    • Selecting elements, rows and columns
    • Slicing 1D, 2D and with steps
  • Sorting arrays
    • Axis order
sudoku_game[2, 4]
sudoku_game[0]
sudoku_game[:, 2]

array = np.array([2, 4, 6, 8, 10])
array[2:4]

sudoku_game[3:6, 3:6]
sudoku_game[3:6:2, 3:6:2]

np.sort(sudoku_game)
np.sort(sudoku_game, axis=0)
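The snippets above rely on a `sudoku_game` array that these notes never define. The sketch below repeats the same indexing, slicing, and sorting patterns on a hypothetical 9x9 stand-in called `grid`, so the results can be checked by hand.

```python
import numpy as np

# Hypothetical stand-in for sudoku_game: the values 0..80 laid out in a 9x9 grid
grid = np.arange(81).reshape((9, 9))

element = grid[2, 4]           # single element: row 2, column 4
first_row = grid[0]            # whole first row
third_column = grid[:, 2]      # whole third column
block = grid[3:6, 3:6]         # 3x3 block in the middle
stepped = grid[3:6:2, 3:6:2]   # every second row and column inside that block

# Sorting: the default sorts along the last axis (within each row);
# axis=0 sorts within each column instead
unsorted = np.array([[9, 1, 8],
                     [3, 7, 2]])
sorted_within_rows = np.sort(unsorted)
sorted_within_columns = np.sort(unsorted, axis=0)
```

`np.sort()` returns a sorted copy and leaves the original array unchanged.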
  • Filtering arrays
    • Masks and fancy indexing
      • Fancy indexing returns array of elements
    • np.where()
      • Returns array of indices
      • Can create an array based on whether elements do or don't meet condition
      • Find and replace
> one_to_five = np.arange(1, 6)
> mask = one_to_five % 2 == 0
> mask
array([False, True, False, True, False])

> one_to_five[mask]
array([2, 4])

> classroom_ids_and_sizes = np.array([[1, 22], [2, 21], [3, 27], [4, 26]])
> classroom_ids_and_sizes
array([[ 1, 22],
       [ 2, 21],
       [ 3, 27],
       [ 4, 26]])

> classroom_ids_and_sizes[:, 1] % 2 == 0
array([ True, False, False,  True])

> classroom_ids_and_sizes[:, 0][classroom_ids_and_sizes[:, 1] % 2 == 0]
array([1, 4])

> np.where(classroom_ids_and_sizes[:, 1] % 2 == 0)
(array([0, 3]),)

> row_ind, column_ind = np.where(sudoku_game == 0)
> np.where(sudoku_game == 0, "", sudoku_game)
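The two uses of `np.where()` on a 2D array can be sketched as follows. The `grid` array is a hypothetical 3x3 stand-in for `sudoku_game`, with 0 marking an empty cell, and the replacement value -1 is chosen only for illustration.

```python
import numpy as np

# Hypothetical 3x3 stand-in for sudoku_game; 0 marks an empty cell
grid = np.array([[0, 4, 0],
                 [7, 0, 2],
                 [0, 0, 9]])

# With only a condition, np.where() returns one index array per dimension
row_ind, column_ind = np.where(grid == 0)
empty_cells = list(zip(row_ind.tolist(), column_ind.tolist()))

# With three arguments it works as find and replace:
# take -1 where the condition holds, otherwise keep the original value
filled = np.where(grid == 0, -1, grid)
```

When the replacement is a string, as in the `""` example above, the whole result is coerced to a string array.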
  • Adding and removing data
    • Concatenating rows
      • np.concatenate() concatenates along the first axis by default
    • Concatenating columns
    • Input array dimensions must match exactly, except along the concatenating axis
    • Input arrays must have the same number of dimensions
    • Creating compatibility with reshape()
    • Deleting with np.delete()
      • Deleting rows and columns
      • Deleting without axis
> np.concatenate((classroom_ids_and_sizes, new_classrooms))
> np.concatenate((classroom_ids_and_sizes, grade_levels_and_teachers), axis=1)

> np.delete(classroom_data, 1, axis=0)
> np.delete(classroom_data, 1, axis=1)
> np.delete(classroom_data, 1)
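The arrays `new_classrooms`, `grade_levels_and_teachers`, and `classroom_data` are not defined in these notes. The sketch below uses assumed values to show the compatibility rules in action, including reshaping a 1D array (here called `grade_levels`, a hypothetical name) before concatenating it as a column.

```python
import numpy as np

classroom_ids_and_sizes = np.array([[1, 22], [2, 21], [3, 27], [4, 26]])
new_classrooms = np.array([[5, 30], [5, 17]])   # assumed values, shape (2, 2)
grade_levels = np.array([1, 1, 3, 3])           # assumed values, shape (4,)

# Rows: the default axis=0 only needs the column counts to match
rows_added = np.concatenate((classroom_ids_and_sizes, new_classrooms))

# Columns: a 1D array has too few dimensions, so reshape it to (4, 1) first
grade_column = grade_levels.reshape((4, 1))
classroom_data = np.concatenate((classroom_ids_and_sizes, grade_column), axis=1)

without_second_row = np.delete(classroom_data, 1, axis=0)
without_second_column = np.delete(classroom_data, 1, axis=1)
# Without an axis, np.delete() flattens the array and removes one element
flattened_minus_one = np.delete(classroom_data, 1)
```

Like `np.sort()`, both `np.concatenate()` and `np.delete()` return new arrays rather than modifying their inputs.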

3. Array Mathematics!

Leverage NumPy’s speedy vectorized operations to gather summary insights on sales data for American liquor stores, restaurants, and department stores. Vectorize Python functions for use in your NumPy code. Use broadcasting logic to perform mathematical operations between arrays of different sizes.

  • Aggregating methods - sum(), min(), max(), mean(), cumsum()
    • The keepdims argument
> security_breaches.sum()
> security_breaches.sum(axis=0)
> security_breaches.sum(axis=1, keepdims=True)
> security_breaches.cumsum(axis=0)

> import matplotlib.pyplot as plt
> cum_sums_by_client = security_breaches.cumsum(axis=0)
> plt.plot(np.arange(1, 6), cum_sums_by_client[:, 0], label="Client 1")
> plt.plot(np.arange(1, 6), cum_sums_by_client.mean(axis=1), label="Average")
> plt.legend()
> plt.show()
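To make the effect of `axis` and `keepdims` concrete, here is a sketch with a made-up `security_breaches` array. The real course data is not included in these notes; the assumption here is five rows (years) by three columns (clients).

```python
import numpy as np

# Made-up data: rows are years, columns are clients (an assumption for this sketch)
security_breaches = np.array([[0, 5, 1],
                              [0, 2, 0],
                              [1, 1, 2],
                              [2, 2, 1],
                              [0, 0, 0]])

total = security_breaches.sum()                 # one number for the whole array
per_client = security_breaches.sum(axis=0)      # collapses the rows: one total per column
per_year = security_breaches.sum(axis=1, keepdims=True)  # stays 2D with shape (5, 1)
running_totals = security_breaches.cumsum(axis=0)        # running total down each column

print(per_client.shape, per_year.shape)
```

Keeping the collapsed dimension with `keepdims=True` leaves the result in a shape that lines up with the original array, which is convenient for later concatenation or broadcasting.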
  • Vectorized operations
    • With a little help from C: much faster than looping in pure Python
    • Addition, multiplication, logical operations
    • Vectorize Python code
> array = np.array(["NumPy", "is", "awesome"])
> len(array) > 2

> vectorized_len = np.vectorize(len)
> vectorized_len(array) > 2
array([ True, False, True])
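The addition, multiplication, and logical operations listed above can be sketched on a small array. The values are illustrative, and the nested list comprehension is included only to show the pure-Python loop that the vectorized expression replaces.

```python
import numpy as np

array = np.array([[1, 2, 3],
                  [4, 5, 6]])

plus_three = array + 3      # applied to every element, no explicit loop
doubled = array * 2
summed = array + array      # element-by-element for arrays of the same shape
is_even = array % 2 == 0    # logical operations return a Boolean array

# The equivalent pure-Python version needs nested loops
looped = [[value + 3 for value in row] for row in array.tolist()]
```

According to the NumPy documentation, `np.vectorize()` is provided mainly for convenience: it lets a Python function accept arrays, but it is essentially a loop and does not bring the speed of NumPy's built-in operations.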
  • Broadcasting
  • Broadcasting a scalar
    • Compatibility rules
    • NumPy compares sets of array dimensions from right to left
    • Two dimensions are compatible when...
      • One of them has a length of one or
      • They are of equal lengths
    • All dimension sets must be compatible
    • Broadcastable shapes - (10, 5) and (10, 1)
    • Shapes which are not broadcastable - (10, 5) and (5, 10)
  • Broadcasting rows
  • Broadcasting columns
> array = np.arange(10).reshape((2, 5))
> array + np.array([0, 1, 2, 3, 4])
array([[ 0, 2, 4, 6, 8],
       [ 5, 7, 9, 11, 13]])

> array = np.arange(10).reshape((2, 5))
> array + np.array([0, 1])
ValueError: operands could not be broadcast together with shapes (2,5) (2,)
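The failing example above can be made to work by reshaping the 1D array into a column, so that its shape (2, 1) is compatible with (2, 5). This is a minimal sketch; the name `column_offsets` is an assumption, and `np.broadcast_shapes()` (available in NumPy 1.20 and later) is used only to check the compatibility rules listed earlier.

```python
import numpy as np

array = np.arange(10).reshape((2, 5))

# Shape (2,) is compared with the last dimension (5) and fails;
# shape (2, 1) is compatible: 1 stretches to 5, and 2 matches 2
column_offsets = np.array([0, 1]).reshape((2, 1))
result = array + column_offsets
print(result)

# Checking shapes without doing any arithmetic
compatible = np.broadcast_shapes((10, 5), (10, 1))

try:
    np.broadcast_shapes((10, 5), (5, 10))
    incompatible_raised = False
except ValueError:
    incompatible_raised = True
```

The first row is unchanged and the second row is increased by one, which is how a column is broadcast across every column of the larger array.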

4. Array Transformations

  • Saving and loading arrays
  • Examining and updating RGB data
with open("logo.npy", "rb") as f:
  logo_rgb_array = np.load(f)
plt.imshow(logo_rgb_array)
plt.show()

red_array = logo_rgb_array[:, :, 0]
green_array = logo_rgb_array[:, :, 1]
blue_array = logo_rgb_array[:, :, 2]

dark_logo_array = np.where(logo_rgb_array == 255, 50, logo_rgb_array)
plt.imshow(dark_logo_array)
plt.show()

with open("dark_logo.npy", "wb") as f:
  np.save(f, dark_logo_array)
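Since `logo.npy` is not available outside the course, the save-and-load round trip can be tried with a tiny hypothetical image instead. The sketch below assumes a 2x2 RGB array and writes to a temporary directory; the file name is arbitrary.

```python
import os
import tempfile

import numpy as np

# Hypothetical 2x2 RGB image: three white pixels and one red pixel
image = np.full((2, 2, 3), 255, dtype=np.uint8)
image[0, 0] = [255, 0, 0]

# Replace every channel value of 255 with 50, as in the logo example
darkened = np.where(image == 255, 50, image)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "dark_image.npy")
    np.save(path, darkened)    # np.save() also accepts a file path directly
    reloaded = np.load(path)
```

Because the condition is evaluated per channel value rather than per pixel, the red pixel's red channel is darkened along with the white pixels.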
  • Data augmentation
    • Flipping an array
    • Transposing an array
    • Setting transposed axis order
flipped_logo = np.flip(logo_rgb_array)
flipped_rows_logo = np.flip(logo_rgb_array, axis=0)
flipped_colors_logo = np.flip(logo_rgb_array, axis=2)
flipped_except_colors_logo = np.flip(logo_rgb_array, axis=(0, 1))

array = np.array([[1.1, 1.2, 1.3],
                  [2.1, 2.2, 2.3],
                  [3.1, 3.2, 3.3],
                  [4.1, 4.2, 4.3]])
np.flip(array)
np.transpose(array)

transposed_logo = np.transpose(logo_rgb_array, axes=(1, 0, 2))
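To check what flipping and transposing do to shapes and element positions, here is a sketch using the 4x3 array above together with a hypothetical 4x6 RGB image called `image`, which stands in for `logo_rgb_array`.

```python
import numpy as np

array = np.array([[1.1, 1.2, 1.3],
                  [2.1, 2.2, 2.3],
                  [3.1, 3.2, 3.3],
                  [4.1, 4.2, 4.3]])

flipped = np.flip(array)            # reverses every axis: last element comes first
transposed = np.transpose(array)    # rows become columns: shape (4, 3) -> (3, 4)

# Hypothetical image: 4 rows, 6 columns, 3 color channels
image = np.arange(72).reshape((4, 6, 3))

upside_down = np.flip(image, axis=0)            # reverse the rows only
mirrored_keep_colors = np.flip(image, axis=(0, 1))  # reverse rows and columns, keep RGB order
rotated = np.transpose(image, axes=(1, 0, 2))   # swap rows and columns, keep channels last
```

Passing `axes=(1, 0, 2)` swaps the first two axes and leaves the color axis in place, so each pixel keeps its RGB order.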
  • Slicing dimensions
  • Splitting arrays
  • Trailing dimensions
  • Stacking arrays
rgb = np.array([[[255, 0, 0], [255, 255, 0], [255, 255, 255]],
                [[255, 0, 255], [0, 255, 0], [0, 255, 255]],
                [[0, 0, 0], [0, 255, 255], [0, 0, 255]]])
red_array = rgb[:, :, 0]
green_array = rgb[:, :, 1]
blue_array = rgb[:, :, 2]

red_array, green_array, blue_array = np.split(rgb, 3, axis=2)
red_array.shape
(3, 3, 1)

red_array_2D = red_array.reshape((3, 3))
red_array_2D.shape
(3, 3)

# np.stack() adds a new axis, so it expects the 2D (3, 3) channel arrays here
stacked_rgb = np.stack([red_array, green_array, blue_array], axis=2)
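Splitting and stacking are inverse operations once the trailing dimension is dropped. The sketch below runs the full round trip on the same `rgb` values; the `_3d` and `_2d` suffixes are naming assumptions added to keep the two forms apart.

```python
import numpy as np

rgb = np.array([[[255, 0, 0], [255, 255, 0], [255, 255, 255]],
                [[255, 0, 255], [0, 255, 0], [0, 255, 255]],
                [[0, 0, 0], [0, 255, 255], [0, 0, 255]]])

# np.split() keeps the number of dimensions: each piece has shape (3, 3, 1)
red_3d, green_3d, blue_3d = np.split(rgb, 3, axis=2)

# Drop the trailing dimension of length one to get plain 2D channels
red_2d = red_3d.reshape((3, 3))
green_2d = green_3d.reshape((3, 3))
blue_2d = blue_3d.reshape((3, 3))

# np.stack() adds a new axis, so 2D inputs give back the original 3D image
restacked = np.stack([red_2d, green_2d, blue_2d], axis=2)

# Stacking the (3, 3, 1) pieces instead produces a 4D array
stacked_4d = np.stack([red_3d, green_3d, blue_3d], axis=2)
print(restacked.shape, stacked_4d.shape)
```

If the pieces still carry the trailing dimension, `np.concatenate(..., axis=2)` is the operation that joins them back into a (3, 3, 3) array.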