NumPy is an essential Python library. TensorFlow and scikit-learn use NumPy arrays as inputs, and pandas and Matplotlib are built on top of NumPy. In this Introduction to NumPy course, you'll become a master wrangler of NumPy's core object: arrays! You'll discover why NumPy is so efficient and use broadcasting and vectorization to make your NumPy code even faster. By the end of the course, you'll be using 3D arrays to alter a Claude Monet painting.
By Izzy Weber, Curriculum Developer @ DataCamp
Create and change array shapes to suit your needs. Discover NumPy's many data types and how they contribute to speedy array operations.
- Creating arrays from lists
import numpy as np

python_list = [3, 2, 5, 8, 4, 9, 7, 6, 1]
array = np.array(python_list)
python_list_of_lists = [[3, 2, 5],
                        [9, 7, 1],
                        [4, 3, 6]]
np.array(python_list_of_lists)
- Creating arrays from scratch
np.zeros((5, 3))
np.random.random((2, 4))
np.arange(-3, 4)
np.arange(4)
np.arange(-3, 4, 3)
array([-3, 0, 3])
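Each creation function above takes either a shape or a range. The sketch below shows what they return. Random values change from run to run, so for those only the shape and bounds can be predicted.

```python
import numpy as np

# np.zeros takes a shape tuple: 5 rows, 3 columns, all 0.0
zeros = np.zeros((5, 3))

# np.random.random fills the given shape with floats in [0.0, 1.0)
random_values = np.random.random((2, 4))

# np.arange(start, stop, step) excludes stop; start defaults to 0, step to 1
from_minus_three = np.arange(-3, 4)  # array([-3, -2, -1,  0,  1,  2,  3])
from_zero = np.arange(4)             # array([0, 1, 2, 3])
every_third = np.arange(-3, 4, 3)    # array([-3,  0,  3])

print(zeros.shape, random_values.shape)  # (5, 3) (2, 4)
```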
- Array dimensionality
- Vector arrays
- Matrix and tensor arrays
array_1_2D = np.array([[1, 2], [5, 7]])
array_2_2D = np.array([[8, 9], [5, 7]])
array_3_2D = np.array([[1, 2], [5, 7]])
array_3D = np.array([array_1_2D, array_2_2D, array_3_2D])
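The `.ndim` and `.shape` attributes make the vector / matrix / tensor distinction concrete. This sketch reuses the three 2D arrays above and adds a hypothetical 1D `vector` for comparison.

```python
import numpy as np

vector = np.array([1, 2, 3])
array_1_2D = np.array([[1, 2], [5, 7]])
array_2_2D = np.array([[8, 9], [5, 7]])
array_3_2D = np.array([[1, 2], [5, 7]])
# Wrapping three 2x2 matrices in a list adds a new leading axis
array_3D = np.array([array_1_2D, array_2_2D, array_3_2D])

print(vector.ndim, vector.shape)          # 1 (3,)
print(array_1_2D.ndim, array_1_2D.shape)  # 2 (2, 2)
print(array_3D.ndim, array_3D.shape)      # 3 (3, 2, 2)
```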
- Shapeshifting
- Rows and columns
- .shape, .flatten(), .reshape()
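Here is a small sketch of the three shapeshifting tools on a hypothetical 3x2 array. `.reshape()` only works when the new shape holds exactly as many elements as the old one.

```python
import numpy as np

array = np.array([[1, 2], [5, 7], [6, 6]])
rows, columns = array.shape  # (3, 2): three rows, two columns

# .flatten() returns a 1D copy, reading row by row
flat = array.flatten()  # array([1, 2, 5, 7, 6, 6])

# .reshape() keeps the same six elements in row order but changes the shape
reshaped = array.reshape((2, 3))  # array([[1, 2, 5], [7, 6, 6]])

# Nine slots cannot hold six elements, so this raises ValueError
try:
    array.reshape((3, 3))
except ValueError as err:
    print("Cannot reshape:", err)
```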
- NumPy vs. Python data types
- The .dtype attribute
- Default data types
- dtype as an argument
- Type conversion with .astype()
- Type coercion
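The data type topics above can be sketched in a few lines. The arrays are hypothetical. NumPy's default integer dtype depends on the platform (commonly int64), so the sketch only checks float and explicitly requested dtypes.

```python
import numpy as np

# Default dtype for Python floats is float64
floats = np.array([1.5, 2.5])
print(floats.dtype)  # float64

# dtype as an argument: request a smaller type up front
small_ints = np.array([1, 2, 3], dtype=np.int8)

# .astype() returns a converted copy; small_ints itself is unchanged
as_floats = small_ints.astype(np.float32)

# Type coercion: an array holds one dtype, so mixed inputs are converted
ints_and_floats = np.array([1, 2.5])   # integers coerced to float64
ints_and_strings = np.array([1, "a"])  # everything coerced to a string dtype
print(ints_and_strings)  # ['1' 'a']
```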
Slice, filter, and sort New York City’s tree census data. Create new arrays by pulling data based on conditional statements, and add and remove data along any dimension. Learn the shape and dimension compatibility principles that prepare you for super-fast array math.
- Indexing and slicing arrays
- Selecting elements, rows and columns
- Slicing 1D, 2D and with steps
- Sorting arrays
- Axis order
sudoku_game[2, 4]
sudoku_game[0]
sudoku_game[:, 2]
array = np.array([2, 4, 6, 8, 10])
array[2:4]
sudoku_game[3:6, 3:6]
sudoku_game[3:6:2, 3:6:2]
np.sort(sudoku_game)
np.sort(sudoku_game, axis=0)
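The notes never define `sudoku_game`, so this sketch uses a hypothetical stand-in: a 9x9 grid numbered 0 to 80 row by row, which makes every indexing result easy to predict. A small hypothetical 3x3 array shows the two sort directions.

```python
import numpy as np

# Hypothetical stand-in for sudoku_game: values 0..80 laid out row by row
grid = np.arange(81).reshape((9, 9))

cell = grid[2, 4]                 # row 2, column 4 -> 22
first_row = grid[0]               # array([0, 1, 2, 3, 4, 5, 6, 7, 8])
third_column = grid[:, 2]         # array([ 2, 11, 20, 29, 38, 47, 56, 65, 74])
center_block = grid[3:6, 3:6]     # the central 3x3 block
stepped = grid[3:6:2, 3:6:2]      # rows 3 and 5, columns 3 and 5

# np.sort sorts along the last axis by default (within each row)
small = np.array([[3, 1, 2],
                  [9, 7, 8],
                  [6, 5, 4]])
by_row = np.sort(small)             # [[1, 2, 3], [7, 8, 9], [4, 5, 6]]
by_column = np.sort(small, axis=0)  # [[3, 1, 2], [6, 5, 4], [9, 7, 8]]
```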
- Filtering arrays
- Masks and fancy indexing
- Fancy indexing returns an array of elements
- np.where()
- Returns an array of indices
- Can create an array based on whether elements do or don't meet a condition
- Find and replace
> one_to_five = np.arange(1, 6)
> mask = one_to_five % 2 == 0
> mask
array([False, True, False, True, False])
> one_to_five[mask]
array([2, 4])
> classroom_ids_and_sizes = np.array([[1, 22], [2, 21], [3, 27], [4, 26]])
> classroom_ids_and_sizes
array([[ 1, 22],
       [ 2, 21],
       [ 3, 27],
       [ 4, 26]])
> classroom_ids_and_sizes[:, 1] % 2 == 0
array([ True, False, False, True])
> classroom_ids_and_sizes[:, 0][classroom_ids_and_sizes[:, 1] % 2 == 0]
array([1, 4])
> np.where(classroom_ids_and_sizes[:, 1] % 2 == 0)
(array([0, 3]),)
> row_ind, column_ind = np.where(sudoku_game == 0)
> np.where(sudoku_game == 0, "", sudoku_game)
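`np.where()` has two modes: with one argument it returns indices, and with three it works as find and replace. This sketch shows both on a hypothetical, partially filled 3x3 grid where 0 marks an empty cell. It uses -1 as the replacement so the result stays an integer array. Replacing with `""` as above produces a string array instead.

```python
import numpy as np

# Hypothetical partially filled grid; 0 marks an empty cell
grid = np.array([[5, 0, 1],
                 [0, 3, 0],
                 [7, 0, 9]])

# One argument: a tuple with one index array per axis, in row-major order
row_ind, column_ind = np.where(grid == 0)
# row_ind -> array([0, 1, 1, 2]); column_ind -> array([1, 0, 2, 1])

# Three arguments: take -1 where the condition holds, else keep the grid value
filled = np.where(grid == 0, -1, grid)
```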
- Adding and removing data
- Concatenating rows
- np.concatenate() concatenates along the first axis by default
- Concatenating columns
- Input array dimensions must match exactly on every axis except the concatenation axis
- Input arrays must have the same number of dimensions
- Creating compatibility with .reshape()
- Deleting with np.delete()
- Deleting rows and columns
- Deleting without an axis (the array is flattened first)
> np.concatenate((classroom_ids_and_sizes, new_classrooms))
> np.concatenate((classroom_ids_and_sizes, grade_levels_and_teachers), axis=1)
> np.delete(classroom_data, 1, axis=0)
> np.delete(classroom_data, 1, axis=1)
> np.delete(classroom_data, 1)
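`new_classrooms` and `classroom_data` are not defined in these notes. The sketch below builds hypothetical versions so the concatenation rules and the three `np.delete()` calls can run end to end. The grade-level column is invented for illustration.

```python
import numpy as np

classroom_ids_and_sizes = np.array([[1, 22], [2, 21], [3, 27], [4, 26]])

# Hypothetical new rows: 2 columns, matching every axis except axis 0
new_classrooms = np.array([[5, 30], [6, 17]])
more_rows = np.concatenate((classroom_ids_and_sizes, new_classrooms))  # (6, 2)

# Hypothetical 1D grade levels: a 1D and a 2D array cannot be concatenated
grade_levels = np.array([1, 3, 2, 4])
try:
    np.concatenate((classroom_ids_and_sizes, grade_levels), axis=1)
except ValueError:
    print("Different numbers of dimensions")

# Reshaping to a (4, 1) column makes both inputs 2D with 4 rows
classroom_data = np.concatenate(
    (classroom_ids_and_sizes, grade_levels.reshape((4, 1))), axis=1
)  # (4, 3)

without_row = np.delete(classroom_data, 1, axis=0)     # second row gone: (3, 3)
without_column = np.delete(classroom_data, 1, axis=1)  # second column gone: (4, 2)
flat_deleted = np.delete(classroom_data, 1)            # flattened, index 1 gone: (11,)
```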
Leverage NumPy’s speedy vectorized operations to gather summary insights on sales data for American liquor stores, restaurants, and department stores. Vectorize Python functions for use in your NumPy code. Use broadcasting logic to perform mathematical operations between arrays of different sizes.
- Aggregating methods: .sum(), .min(), .max(), .mean(), .cumsum()
- The keepdims argument
> security_breaches.sum()
> security_breaches.sum(axis=0)
> security_breaches.sum(axis=1, keepdims=True)
> security_breaches.cumsum(axis=0)
> cum_sums_by_client = security_breaches.cumsum(axis=0)
> plt.plot(np.arange(1, 6), cum_sums_by_client[:, 0], label="Client 1")
> plt.plot(np.arange(1, 6), cum_sums_by_client.mean(axis=1), label="Average")
> plt.legend()
> plt.show()
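The `security_breaches` data is not shown here. The sketch below uses a hypothetical 5x3 array (5 months as rows, 3 clients as columns) to show what each aggregation returns, and how `keepdims` affects the result shape. Plotting is left out.

```python
import numpy as np

# Hypothetical data: 5 months (rows) x 3 clients (columns)
security_breaches = np.array([[0, 5, 1],
                              [0, 2, 0],
                              [1, 1, 2],
                              [2, 2, 1],
                              [0, 0, 0]])

total = security_breaches.sum()                         # 17
per_client = security_breaches.sum(axis=0)              # array([ 3, 10,  4])
worst_month_per_client = security_breaches.max(axis=0)  # array([2, 5, 2])

# keepdims=True keeps the collapsed axis with length 1: shape (5, 1) instead
# of (5,), so the result still lines up with the rows of the original array
per_month_flat = security_breaches.sum(axis=1)               # shape (5,)
per_month = security_breaches.sum(axis=1, keepdims=True)     # shape (5, 1)

# Running totals down each column; the last row equals per_client
cum_sums_by_client = security_breaches.cumsum(axis=0)
```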
- Vectorized operations
- With a little help from C, vectorized operations run much faster than equivalent Python loops
- Addition, multiplication, logical operations
- Vectorize Python code
> array = np.array(["NumPy", "is", "awesome"])
> len(array) > 2
> vectorized_len = np.vectorize(len)
> vectorized_len(array) > 2
array([ True, False, True])
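The sketch below compares vectorized arithmetic and comparisons with an equivalent Python loop, using hypothetical arrays `a` and `b`. NumPy's own documentation describes `np.vectorize` as mainly a convenience: it still calls the Python function once per element, so it does not give the C-level speedup that built-in array operations do.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# Vectorized: each expression is one call into compiled code
sums = a + b       # array([11, 22, 33, 44])
products = a * 3   # array([ 3,  6,  9, 12])
big = a > 2        # array([False, False,  True,  True])

# Equivalent Python loop: same result, one interpreted step per element
loop_sums = np.array([x + y for x, y in zip(a, b)])

# np.vectorize maps a Python function over an array (convenience, not speed)
vectorized_len = np.vectorize(len)
word_lengths = vectorized_len(np.array(["NumPy", "is", "awesome"]))  # array([5, 2, 7])
```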
- Broadcasting
- Broadcasting a scalar
- Compatibility rules
- NumPy compares sets of array dimensions from right to left
- Two dimensions are compatible when...
- One of them has a length of one or
- They are of equal lengths
- All dimension sets must be compatible
- Broadcastable shapes: (10, 5) and (10, 1)
- Shapes that are not broadcastable: (10, 5) and (5, 10)
- Broadcasting rows
- Broadcasting columns
> array = np.arange(10).reshape((2, 5))
> array + np.array([0, 1, 2, 3, 4])
array([[ 0,  2,  4,  6,  8],
       [ 5,  7,  9, 11, 13]])
> array = np.arange(10).reshape((2, 5))
> array + np.array([0, 1])
ValueError: operands could not be broadcast together with shapes (2,5) (2,)
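The failing example above needs the `(2,)` array turned into a column before it can broadcast. This sketch shows that fix alongside scalar broadcasting. It also uses `np.broadcast_shapes` (assumes NumPy 1.20 or later) to check the compatibility rules without building any arrays.

```python
import numpy as np

array = np.arange(10).reshape((2, 5))

# Broadcasting a scalar: the single value applies to every element
doubled = array * 2

# Broadcasting columns: reshape to (2, 1) so there is one value per row.
# Right to left: 5 vs 1 (one of them is 1), then 2 vs 2 (equal) -> compatible
column = np.array([0, 1]).reshape((2, 1))
by_column = array + column  # [[0, 1, 2, 3, 4], [6, 7, 8, 9, 10]]

# Checking shape compatibility directly (NumPy 1.20+)
print(np.broadcast_shapes((10, 5), (10, 1)))  # (10, 5)
try:
    np.broadcast_shapes((10, 5), (5, 10))
except ValueError:
    print("(10, 5) and (5, 10) are not broadcastable")
```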
- Saving and loading arrays
- Examining and updating RGB data
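import numpy as np
import matplotlib.pyplot as plt

# Load an array from a .npy file opened in binary read mode
with open("logo.npy", "rb") as f:
    logo_rgb_array = np.load(f)
plt.imshow(logo_rgb_array)
plt.show()

# The last axis holds the color channels in RGB order
red_array = logo_rgb_array[:, :, 0]
green_array = logo_rgb_array[:, :, 1]
blue_array = logo_rgb_array[:, :, 2]

# Replace every channel value of 255 with 50, darkening the brightest areas
dark_logo_array = np.where(logo_rgb_array == 255, 50, logo_rgb_array)
plt.imshow(dark_logo_array)
plt.show()

# Save the result to a .npy file opened in binary write mode
with open("dark_logo.npy", "wb") as f:
    np.save(f, dark_logo_array)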
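`np.save` and `np.load` accept any binary file object, not only files opened with `open()`. This sketch runs the same channel slicing and save/load round trip on a hypothetical 2x2 RGB image, using an in-memory `io.BytesIO` buffer in place of `logo.npy` so it works without files on disk.

```python
import io

import numpy as np

# Hypothetical 2x2 image: (height, width, 3 color channels)
logo = np.array([[[255, 255, 255], [255, 0, 0]],
                 [[0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

# Channel order along the last axis is red, green, blue
red, green, blue = logo[:, :, 0], logo[:, :, 1], logo[:, :, 2]

# White (255, 255, 255) becomes dark gray (50, 50, 50)
dark = np.where(logo == 255, 50, logo)

buffer = io.BytesIO()  # stands in for open("dark_logo.npy", "wb")
np.save(buffer, dark)
buffer.seek(0)         # rewind before reading back
loaded = np.load(buffer)
```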
- Data augmentation
- Flipping an array
- Transposing an array
- Setting transposed axis order
flipped_logo = np.flip(logo_rgb_array)
flipped_rows_logo = np.flip(logo_rgb_array, axis=0)
flipped_colors_logo = np.flip(logo_rgb_array, axis=2)
flipped_except_colors_logo = np.flip(logo_rgb_array, axis=(0, 1))
array = np.array([[1.1, 1.2, 1.3],
                  [2.1, 2.2, 2.3],
                  [3.1, 3.2, 3.3],
                  [4.1, 4.2, 4.3]])
np.flip(array)
np.transpose(array)
transposed_logo = np.transpose(logo_rgb_array, axes=(1, 0, 2))
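To see what each flip and transpose call does, it helps to look at where one known element ends up. This sketch uses the 4x3 array above plus a hypothetical 2x3 "image" with 3 channels standing in for `logo_rgb_array`.

```python
import numpy as np

array = np.array([[1.1, 1.2, 1.3],
                  [2.1, 2.2, 2.3],
                  [3.1, 3.2, 3.3],
                  [4.1, 4.2, 4.3]])

flipped_all = np.flip(array)           # every axis reversed: first row [4.3, 4.2, 4.1]
flipped_rows = np.flip(array, axis=0)  # row order reversed: first row [4.1, 4.2, 4.3]
transposed = np.transpose(array)       # (4, 3) -> (3, 4): first row [1.1, 2.1, 3.1, 4.1]

# Hypothetical image: (height=2, width=3, channels=3)
image = np.arange(18).reshape((2, 3, 3))
# Mirror pixels vertically and horizontally, keeping each pixel's channel order
mirrored = np.flip(image, axis=(0, 1))
# Swap height and width; channels stay on the last axis
swapped = np.transpose(image, axes=(1, 0, 2))  # shape (3, 2, 3)
```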
- Slicing dimensions
- Splitting arrays
- Trailing dimensions
- Stacking arrays
rgb = np.array([[[255, 0, 0], [255, 255, 0], [255, 255, 255]],
                [[255, 0, 255], [0, 255, 0], [0, 255, 255]],
                [[0, 0, 0], [0, 255, 255], [0, 0, 255]]])
red_array = rgb[:, :, 0]
green_array = rgb[:, :, 1]
blue_array = rgb[:, :, 2]
red_array, green_array, blue_array = np.split(rgb, 3, axis=2)
red_array.shape
(3, 3, 1)
red_array_2D = red_array.reshape((3, 3))
red_array_2D.shape
(3, 3)
green_array_2D = green_array.reshape((3, 3))
blue_array_2D = blue_array.reshape((3, 3))
stacked_rgb = np.stack([red_array_2D, green_array_2D, blue_array_2D], axis=2)
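`np.split` keeps a trailing dimension of length 1, while `np.stack` always adds a new axis. As a result, only 2D channel arrays stack back into the original image. This sketch checks the round trip on a hypothetical 3x3x3 array. It also shows `np.concatenate` as an alternative that joins the 3D pieces along their existing last axis.

```python
import numpy as np

rgb = np.arange(27).reshape((3, 3, 3))  # hypothetical 3x3 image, 3 channels

# Splitting along axis 2 keeps that axis: each piece is (3, 3, 1)
red, green, blue = np.split(rgb, 3, axis=2)

# Drop the trailing length-1 dimension, then stack along a new last axis
red_2d, green_2d, blue_2d = (a.reshape((3, 3)) for a in (red, green, blue))
restacked = np.stack([red_2d, green_2d, blue_2d], axis=2)  # (3, 3, 3)

# Stacking the 3D pieces instead adds a fourth dimension
too_many_dims = np.stack([red, green, blue], axis=2)  # (3, 3, 3, 1)

# np.concatenate joins along an existing axis, so the 3D pieces work directly
rejoined = np.concatenate([red, green, blue], axis=2)  # (3, 3, 3)
```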