misho-kr/Introduction to NumPy.md

## Introduction to NumPy.md

      
Introduction to NumPy

NumPy is an essential Python library. TensorFlow and scikit-learn use NumPy arrays as inputs, and pandas and Matplotlib are built on top of NumPy. In this Introduction to NumPy course, you'll become a master wrangler of NumPy's core object: arrays! You'll discover why NumPy is so efficient and use broadcasting and vectorization to make your NumPy code even faster. By the end of the course, you'll be using 3D arrays to alter a Claude Monet painting.
By Izzy Weber, Curriculum Developer @ DataCamp
1. Understanding NumPy Arrays

Create and change array shapes to suit your needs. Discover NumPy's many data types and how they contribute to speedy array operations.

Creating arrays from lists

import numpy as np

python_list = [3, 2, 5, 8, 4, 9, 7, 6, 1]
array = np.array(python_list)

python_list_of_lists = [[3, 2, 5],
                        [9, 7, 1],
                        [4, 3, 6]]
np.array(python_list_of_lists)

Creating arrays from scratch

np.zeros((5, 3))
np.random.random((2, 4))
np.arange(-3, 4)
np.arange(4)
np.arange(-3, 4, 3)
array([-3, 0, 3])

Array dimensionality

Vector arrays
Matrix and tensor arrays


array_1_2D = np.array([[1, 2], [5, 7]])
array_2_2D = np.array([[8, 9], [5, 7]])
array_3_2D = np.array([[1, 2], [5, 7]])
array_3D = np.array([array_1_2D, array_2_2D, array_3_2D])
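
A minimal sketch of how dimensionality can be checked with `.ndim` and `.shape`. The names `vector`, `matrix`, and `tensor` and their values are illustrative assumptions, not taken from the course.

```python
import numpy as np

vector = np.array([3, 2, 5])                        # 1D: a single row of values
matrix = np.array([[1, 2], [5, 7]])                 # 2D: rows and columns
tensor = np.array([matrix, matrix + 10, matrix])    # 3D: three 2x2 matrices

dims = (vector.ndim, matrix.ndim, tensor.ndim)      # number of dimensions of each
tensor_shape = tensor.shape                         # length of each dimension
print(dims, tensor_shape)
```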


Shapeshifting

Rows and columns
.shape, .flatten(), .reshape()
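
The three shape tools listed above could be used roughly as follows. The array values here are made up for illustration.

```python
import numpy as np

# Hypothetical 2D array with 3 rows and 2 columns
array = np.array([[1, 2], [5, 7], [6, 6]])
shape = array.shape                 # rows first, then columns

flat = array.flatten()              # 1D copy of all six elements
reshaped = array.reshape((2, 3))    # same six elements as 2 rows x 3 columns
print(shape, flat, reshaped, sep="\n")
```

`.reshape()` only works when the new shape holds exactly the same number of elements as the original.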


NumPy vs. Python data types

.dtype attribute
Default data types
dtype as an argument
Type conversion with .astype()
Type coercion
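
A short sketch of the data type topics above, using made-up values. The default integer type depends on the platform, so only float defaults are shown.

```python
import numpy as np

floats = np.array([1.32, 5.78, 175.55])
floats_dtype = floats.dtype                 # float64 is the default for Python floats

# dtype as an argument: request a smaller float type at creation
float32_array = np.array([1.32, 5.78, 175.55], dtype=np.float32)

# Type conversion with .astype(): nonzero values become True, zero becomes False
boolean_array = np.array([1, 0, 3]).astype(np.bool_)

# Type coercion: mixing a bool, an int, and a float upcasts everything to float64
coerced = np.array([True, 42, 42.42])
print(floats_dtype, float32_array.dtype, boolean_array, coerced)
```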


2. Selecting and Updating Data

Slice, filter, and sort New York City’s tree census data. Create new arrays by pulling data based on conditional statements, and add and remove data along any dimension. Learn the shape and dimension compatibility principles that prepare you for super-fast array math.

Indexing and slicing arrays

Selecting elements, rows and columns
Slicing 1D, 2D and with steps


Sorting arrays

Axis order


sudoku_game[2, 4]
sudoku_game[0]
sudoku_game[:, 2]

array = np.array([2, 4, 6, 8, 10])
array[2:4]

sudoku_game[3:6, 3:6]
sudoku_game[3:6:2, 3:6:2]

np.sort(sudoku_game)
np.sort(sudoku_game, axis=0)
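
Since `sudoku_game` is not defined in these notes, the sketch below repeats the same indexing, slicing, and sorting patterns on a small hypothetical array named `grid`.

```python
import numpy as np

# Hypothetical 3x4 grid standing in for sudoku_game
grid = np.array([[9, 2, 7, 4],
                 [1, 8, 3, 6],
                 [5, 0, 4, 2]])

element = grid[1, 2]            # row 1, column 2
first_row = grid[0]             # a whole row
third_column = grid[:, 2]       # a whole column
block = grid[0:2, 1:3]          # rows 0-1, columns 1-2
stepped = grid[0:3:2, 0:4:2]    # every other row and every other column

sorted_within_rows = np.sort(grid)          # default sorts along the last axis
sorted_within_cols = np.sort(grid, axis=0)  # sorts down each column
print(sorted_within_rows, sorted_within_cols, sep="\n")
```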

Filtering arrays

Masks and fancy indexing

Fancy indexing returns an array of elements


np.where()

Returns a tuple of index arrays, one per dimension
Can create an array based on whether elements do or don't meet a condition
Find and replace


> one_to_five = np.arange(1, 6)
> mask = one_to_five % 2 == 0
> mask
array([False, True, False, True, False])

> one_to_five[mask]
array([2, 4])

> classroom_ids_and_sizes = np.array([[1, 22], [2, 21], [3, 27], [4, 26]])
> classroom_ids_and_sizes
array([[ 1, 22],
       [ 2, 21],
       [ 3, 27],
       [ 4, 26]])

> classroom_ids_and_sizes[:, 1] % 2 == 0
array([ True, False, False,  True])

> classroom_ids_and_sizes[:, 0][classroom_ids_and_sizes[:, 1] % 2 == 0]
array([1, 4])

> np.where(classroom_ids_and_sizes[:, 1] % 2 == 0)
(array([0, 3]),)

> row_ind, column_ind = np.where(sudoku_game == 0)
> np.where(sudoku_game == 0, "", sudoku_game)
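
The two uses of `np.where()` can be sketched on a small hypothetical `grid`. This version replaces with -1 rather than an empty string so the result stays an integer array.

```python
import numpy as np

# Hypothetical 2x3 grid where 0 marks an empty cell
grid = np.array([[0, 4, 0],
                 [7, 0, 2]])

# One argument: a tuple of index arrays, one per dimension
row_ind, column_ind = np.where(grid == 0)

# Three arguments: take -1 where the condition holds, else the original value
replaced = np.where(grid == 0, -1, grid)
print(row_ind, column_ind, replaced, sep="\n")
```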

Adding and removing data

Concatenating rows

np.concatenate() concatenates along the first axis by default


Concatenating columns
Input array dimensions must match exactly on every axis except the concatenation axis
Input arrays must have the same number of dimensions
Creating compatibility with reshape()
Deleting with np.delete()

Deleting rows and columns
Deleting without axis


> np.concatenate((classroom_ids_and_sizes, new_classrooms))
> np.concatenate((classroom_ids_and_sizes, grade_levels_and_teachers), axis=1)

> np.delete(classroom_data, 1, axis=0)
> np.delete(classroom_data, 1, axis=1)
> np.delete(classroom_data, 1)
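
A runnable sketch of the concatenate and delete patterns above. The values of `new_classrooms` and the 1D `grade_levels` array are assumptions made up for this example.

```python
import numpy as np

classroom_ids_and_sizes = np.array([[1, 22], [2, 21], [3, 27], [4, 26]])

# Hypothetical new rows: the column count (2) must match
new_classrooms = np.array([[5, 30], [5, 17]])
more_rows = np.concatenate((classroom_ids_and_sizes, new_classrooms))

# Hypothetical 1D column: reshape to (4, 1) so both inputs are 2D
grade_levels = np.array([1, 1, 3, 4])
with_grades = np.concatenate(
    (classroom_ids_and_sizes, grade_levels.reshape((4, 1))), axis=1)

without_second_row = np.delete(with_grades, 1, axis=0)
without_second_col = np.delete(with_grades, 1, axis=1)
# Without axis, the array is flattened first and a single element is removed
flattened_minus_one = np.delete(with_grades, 1)
print(more_rows.shape, with_grades.shape, flattened_minus_one.shape)
```
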
3. Array Mathematics!

Leverage NumPy’s speedy vectorized operations to gather summary insights on sales data for American liquor stores, restaurants, and department stores. Vectorize Python functions for use in your NumPy code. Use broadcasting logic to perform mathematical operations between arrays of different sizes.

Aggregating methods - sum(), min(), max(), mean(), cumsum()

The keepdims argument


> security_breaches.sum()
> security_breaches.sum(axis=0)
> security_breaches.sum(axis=1, keepdims=True)
> security_breaches.cumsum(axis=0)

> cum_sums_by_client = security_breaches.cumsum(axis=0)
> plt.plot(np.arange(1, 6), cum_sums_by_client[:, 0], label="Client 1")
> plt.plot(np.arange(1, 6), cum_sums_by_client.mean(axis=1), label="Average")
> plt.legend()
> plt.show()
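
The `security_breaches` data is not included in these notes, so this sketch uses a small made-up array (rows as years, columns as clients) to show what each aggregation returns.

```python
import numpy as np

# Hypothetical data: rows are years, columns are clients
security_breaches = np.array([[0, 5, 1],
                              [0, 2, 0],
                              [1, 1, 2]])

total = security_breaches.sum()                           # one number for the whole array
per_client = security_breaches.sum(axis=0)                # collapses rows, shape (3,)
per_year = security_breaches.sum(axis=1, keepdims=True)   # keeps 2D shape (3, 1)
running = security_breaches.cumsum(axis=0)                # running total down each column
print(total, per_client, per_year, running, sep="\n")
```

With `keepdims=True` the result keeps the same number of dimensions as the input, which makes it easier to combine with the original array later.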

Vectorized operations

With a little help from C, vectorized operations run much faster than equivalent Python loops
Addition, multiplication, logical operations
Vectorize Python code


> array = np.array(["NumPy", "is", "awesome"])
> len(array) > 2
True

> vectorized_len = np.vectorize(len)
> vectorized_len(array) > 2
array([ True, False, True])
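
As a rough illustration of the vectorized arithmetic and logical operations mentioned above, using made-up values:

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6]])

plus_three = array + 3          # scalar addition applied to every element
doubled = array * 2             # scalar multiplication applied to every element
summed = array + array          # element-wise addition of two same-shape arrays
greater_than_two = array > 2    # element-wise comparison yields a boolean array

# The same addition written as a plain Python loop, for comparison
looped = [[value + 3 for value in row] for row in array.tolist()]
print(plus_three, greater_than_two, sep="\n")
```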

Broadcasting
Broadcasting a scalar

Compatibility rules
NumPy compares sets of array dimensions from right to left
Two dimensions are compatible when...

One of them has a length of one or
They are of equal lengths


All dimension sets must be compatible
Broadcastable shapes - (10, 5) and (10, 1)
Shapes which are not broadcastable - (10, 5) and (5, 10)


Broadcasting rows
Broadcasting columns

> array = np.arange(10).reshape((2, 5))
> array + np.array([0, 1, 2, 3, 4])
array([[ 0, 2, 4, 6, 8],
       [ 5, 7, 9, 11, 13]])

> array = np.arange(10).reshape((2, 5))
> array + np.array([0, 1])
ValueError: operands could not be broadcast together with shapes (2,5) (2,)
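
One way to make the failing case above work is to reshape the two-element array into a column of shape (2, 1). The sketch also checks the compatibility rules with `np.broadcast_shapes`, which assumes NumPy 1.20 or later.

```python
import numpy as np

array = np.arange(10).reshape((2, 5))

# (2, 5) + (2, 1): right to left, 5 vs 1 is compatible, then 2 vs 2 is compatible
column = np.array([0, 1]).reshape((2, 1))
broadcast_sum = array + column

# Shape check without building arrays (assumes NumPy >= 1.20)
compatible = np.broadcast_shapes((10, 5), (10, 1))
try:
    np.broadcast_shapes((10, 5), (5, 10))
    incompatible_raised = False
except ValueError:
    incompatible_raised = True
print(broadcast_sum, compatible, incompatible_raised, sep="\n")
```
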
4. Array Transformations


Saving and loading arrays
Examining and updating RGB data

with open("logo.npy", "rb") as f:
  logo_rgb_array = np.load(f)
plt.imshow(logo_rgb_array)
plt.show()

red_array = logo_rgb_array[:, :, 0]
green_array = logo_rgb_array[:, :, 1]
blue_array = logo_rgb_array[:, :, 2]

dark_logo_array = np.where(logo_rgb_array == 255, 50, logo_rgb_array)
plt.imshow(dark_logo_array)
plt.show()

with open("dark_logo.npy", "wb") as f:
  np.save(f, dark_logo_array)
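
The `logo.npy` file is not part of these notes, so the sketch below runs the same update, save, and load steps on a tiny hypothetical image written to a temporary directory.

```python
import os
import tempfile

import numpy as np

# Hypothetical 2x2 RGB image standing in for logo_rgb_array
image = np.array([[[255, 0, 0], [255, 255, 255]],
                  [[0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

# Replace every 255 with 50, leaving other values untouched
dark_image = np.where(image == 255, 50, image)

# Save to a temporary .npy file, then load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dark_image.npy")  # hypothetical file name
    with open(path, "wb") as f:
        np.save(f, dark_image)
    with open(path, "rb") as f:
        reloaded = np.load(f)

print(np.array_equal(reloaded, dark_image))
```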

Data augmentation

Flipping an array
Transposing an array
Setting transposed axis order


flipped_logo = np.flip(logo_rgb_array)
flipped_rows_logo = np.flip(logo_rgb_array, axis=0)
flipped_colors_logo = np.flip(logo_rgb_array, axis=2)
flipped_except_colors_logo = np.flip(logo_rgb_array, axis=(0, 1))

array = np.array([[1.1, 1.2, 1.3],
                  [2.1, 2.2, 2.3],
                  [3.1, 3.2, 3.3],
                  [4.1, 4.2, 4.3]])
np.flip(array)
np.transpose(array)

transposed_logo = np.transpose(logo_rgb_array, axes=(1, 0, 2))
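
To see what flipping and transposing actually do, the sketch below inspects the results on the 4x3 array from above and on a small hypothetical image-like array named `image`.

```python
import numpy as np

array = np.array([[1.1, 1.2, 1.3],
                  [2.1, 2.2, 2.3],
                  [3.1, 3.2, 3.3],
                  [4.1, 4.2, 4.3]])

flipped = np.flip(array)                # reverses both rows and columns
flipped_rows = np.flip(array, axis=0)   # reverses row order only
transposed = np.transpose(array)        # shape (4, 3) becomes (3, 4)

# Hypothetical image-like array: height 2, width 3, 3 color channels
image = np.arange(18).reshape((2, 3, 3))
swapped = np.transpose(image, axes=(1, 0, 2))   # swap height and width, keep colors
print(flipped, transposed.shape, swapped.shape, sep="\n")
```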

Slicing dimensions
Splitting arrays
Trailing dimensions
Stacking arrays

rgb = np.array([[[255, 0, 0], [255, 255, 0], [255, 255, 255]],
                [[255, 0, 255], [0, 255, 0], [0, 255, 255]],
                [[0, 0, 0], [0, 255, 255], [0, 0, 255]]])
red_array = rgb[:, :, 0]
green_array = rgb[:, :, 1]
blue_array = rgb[:, :, 2]

red_array, green_array, blue_array = np.split(rgb, 3, axis=2)
red_array.shape
(3, 3, 1)

red_array_2D = red_array.reshape((3, 3))
red_array_2D.shape
(3, 3)

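# np.stack expects the 2D channel arrays (shape (3, 3)) sliced from rgb above;
# the (3, 3, 1) outputs of np.split must be reshaped to 2D first
stacked_rgb = np.stack([red_array, green_array, blue_array], axis=2)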
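
A minimal round trip through splitting and stacking, assuming the same `rgb` array as above. The names `channels`, `red`, `green`, and `blue` are illustrative.

```python
import numpy as np

rgb = np.array([[[255, 0, 0], [255, 255, 0], [255, 255, 255]],
                [[255, 0, 255], [0, 255, 0], [0, 255, 255]],
                [[0, 0, 0], [0, 255, 255], [0, 0, 255]]])

# np.split keeps a trailing dimension of length 1 on each piece
channels = np.split(rgb, 3, axis=2)

# Drop the trailing dimension so each channel is 2D
red, green, blue = [channel.reshape((3, 3)) for channel in channels]

# Stacking the 2D channels along a new last axis rebuilds the image
stacked_rgb = np.stack([red, green, blue], axis=2)
print(channels[0].shape, red.shape, stacked_rgb.shape)
```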