NumPy
This page provides an introduction to NumPy. These are my notes from a YouTube video.
Overview
NumPy is a powerful Python library for numerical computing, widely used for handling large, multi-dimensional arrays and matrices. To start using NumPy, install it and import it as shown below:
# install numpy
pip3 install numpy
# install matplotlib - required later
pip3 install matplotlib
# start the Python interpreter, then import numpy & matplotlib
python3
import numpy as np
import matplotlib.pyplot as plt
Array
# int array
a = np.array([1, 2, 3, 4])
type(a)
# o/p: <class 'numpy.ndarray'>
a.dtype
# o/p: dtype('int64')
# float array
b = np.array([1.2, 3.4, 5.6, 7.8])
b.dtype
# o/p: dtype('float64')
# access a value
a[0]
# o/p: np.int64(1)
# override a value
a[0] = 10
# o/p: array([10, 2, 3, 4])
# assigning float value in integer array
a[0] = 11.5
# o/p: array([11, 2, 3, 4])
# dimension
a.ndim
# o/p: 1
# shape - shows number of elements along each dimension
a.shape
# o/p: (4,)
# size = number of elements in the array
a.size
# o/p: 4
All the elements of a NumPy array must be of the same type. If you overwrite a value in an integer array with a float, the fractional part is truncated.
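The truncation described above can be avoided by converting the array to a float type first. A minimal sketch, assuming illustrative variable names (`ints`, `floats`, `mixed`) that are not part of the original notes:

```python
import numpy as np

ints = np.array([1, 2, 3, 4])
ints[0] = 11.5                      # stored as 11: the array keeps its integer dtype

floats = ints.astype(np.float64)    # astype returns a new array with a float dtype
floats[0] = 11.5                    # the fractional part is kept now

mixed = np.array([1, 2.5, 3])       # mixing ints and floats upcasts everything to float64

print(ints[0], floats[0], mixed.dtype)
```

`astype` always makes a copy here, so `ints` itself still holds integers afterwards.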
Vectorized Operation
# add
a + b
# o/p: array([12.2, 5.4, 8.6, 11.8])
# div
a / b
# o/p: array([9.16666667, 0.58823529, 0.53571429, 0.51282051])
# multiply
a * b
# o/p: array([13.2, 6.8, 16.8, 31.2])
# pow
a ** b
# o/p: array([1.77693369e+01, 1.05560633e+01, 4.69763237e+02, 4.96670005e+04])
# adding constant
a + 10
# o/p: array([21, 12, 13, 14])
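For comparison, the vectorized addition above gives the same result as an explicit Python loop over the elements. A small sketch, where the name `looped` is only illustrative:

```python
import numpy as np

a = np.array([11, 2, 3, 4])
b = np.array([1.2, 3.4, 5.6, 7.8])

# vectorized: one expression, the loop runs inside NumPy
vectorized = a + b

# equivalent element-by-element loop written in plain Python
looped = np.array([x + y for x, y in zip(a, b)])

print(np.allclose(vectorized, looped))
```

The vectorized form is shorter and is usually much faster on large arrays, since the per-element work happens in compiled code.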
Universal Functions
np.sin(a)
# o/p: array([-0.99999021, 0.90929743, 0.14112001, -0.7568025 ])
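Universal functions (ufuncs) such as `np.sin` are applied to every element of the array. Two more ufuncs as a quick sketch, with example values chosen only for illustration:

```python
import numpy as np

values = np.array([0.0, 1.0, 4.0, 9.0])

roots = np.sqrt(values)            # element-wise square root
at_least_two = np.maximum(values, 2.0)   # element-wise max against a constant

print(roots)
print(at_least_two)
```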
2D Array
# defining a 2d array
a_2d_small = np.array([
[1, 2, 3, 4],
[5, 6, 7, 8]
])
# o/p:
# array([[1, 2, 3, 4],
# [5, 6, 7, 8]])
# create an array with the values 0 to 24 and then reshape it into a 5x5 array
a_2d = np.arange(25).reshape(5, 5)
# o/p:
# array([[ 0, 1, 2, 3, 4],
# [ 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14],
# [15, 16, 17, 18, 19],
# [20, 21, 22, 23, 24]])
# shape
a_2d.shape
# o/p: (5, 5)
# size
a_2d.size
# o/p: 25
# ndim
a_2d.ndim
# o/p: 2
# set
a_2d[1, 3] = -1
# o/p:
# array([[ 0, 1, 2, 3, 4],
# [ 5, 6, 7, -1, 9],
# [10, 11, 12, 13, 14],
# [15, 16, 17, 18, 19],
# [20, 21, 22, 23, 24]])
a_2d[1, 3] = 8
# o/p:
# array([[ 0, 1, 2, 3, 4],
# [ 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14],
# [15, 16, 17, 18, 19],
# [20, 21, 22, 23, 24]])
# get
print(a_2d[1, 3])
# o/p: 8
Slicing
A slice extracts the portion between a lower bound (inclusive) and an upper bound (exclusive), advancing by step elements each time.
a_2d[1]
# o/p: array([5, 6, 7, 8, 9])
# [lower: upper: step]
a_2d[0:2:1]
# o/p:
# array([[0, 1, 2, 3, 4],
# [5, 6, 7, 8, 9]])
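The same lower/upper/step rule is easier to see on a 1D array. A short sketch, assuming an illustrative array named `row`:

```python
import numpy as np

row = np.arange(10)            # values 0 to 9

first_three = row[:3]          # lower bound omitted: start from the beginning
every_other = row[1:8:2]       # from index 1 up to (not including) 8, step 2
last_three = row[-3:]          # negative lower bound counts from the end
reversed_row = row[::-1]       # negative step walks backwards

print(first_three, every_other, last_three, reversed_row)
```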
Indexing and Slicing
Indexing and slicing can be combined: give an index for one dimension and a slice for another.
# select row 0 and then extract the slice 1:3 from its columns
a_2d[0, 1:3]
# o/p: array([1, 2])
# select a slice from each dimension:
# from the row dimension, select from row 0 to the end, and
# from the column dimension, select from column 1 to the end
a_2d[0:, 1:]
# o/p:
# array([[ 1, 2, 3, 4],
# [ 6, 7, 8, 9],
# [11, 12, 13, 14],
# [16, 17, 18, 19],
# [21, 22, 23, 24]])
# from the row dimension, select from row 0 to the end, and
# from the column dimension, select only column 2
a_2d[0:, 2]
# o/p: array([ 2, 7, 12, 17, 22])
# Use a negative bound to throw away elements from the end.
# For example, a_2d[0:-1, ] selects all the rows except the last row.
a_2d[0:-1, ]
# o/p:
# array([[ 0, 1, 2, 3, 4],
# [ 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14],
# [15, 16, 17, 18, 19]])
A slice is a view of the original array, not a copy. Assigning new values to a slice therefore updates the original array, because both refer to the same memory.
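A minimal sketch of this behaviour, and of using `.copy()` when an independent array is wanted. The names `grid`, `view` and `independent` are illustrative:

```python
import numpy as np

grid = np.arange(25).reshape(5, 5)

view = grid[0, 1:3]            # basic slicing returns a view
view[:] = -1                   # writes through to grid

independent = grid[1, 1:3].copy()   # explicit copy owns its own memory
independent[:] = 99                 # grid is not affected

print(grid[0])                 # row 0 shows the -1 values
print(grid[1])                 # row 1 is unchanged
print(np.shares_memory(grid, view), np.shares_memory(grid, independent))
```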
Blurring an Image
Let's say we have an image as below where each square represents a pixel in an image.
To blur an image, we essentially merge neighboring pixels to create a lower-resolution image. This process involves taking the average color of adjacent pixels to form a single pixel. For instance, all pixels marked with a green dot are combined by averaging their colors to create one pixel.
In the example below, we apply this technique to downscale a 4x4 image to a 2x2 image, reducing the pixel count while maintaining an averaged representation of the original image.
First, we group the top pixels of all four pixel groups (green, pink, blue and yellow).
Similarly, we group the left, right, bottom and center pixels from all four pixel groups and calculate the average. Below are the slicing formulas for each pixel group:
Slicing formula for top pixels:
- Rows: start from 0, and throw away the last 2 rows.
- Columns: start from 1, and throw away the last column.
Slicing formula for left pixels:
- Rows: start from 1, and throw away the last row.
- Columns: start from 0, and throw away the last 2 columns.
Slicing formula for right pixels:
- Rows: start from 1, and throw away the last row.
- Columns: start from 2, and select all columns.
Slicing formula for bottom pixels:
- Rows: start from 2, and select everything.
- Columns: start from 1, and throw away the last column.
Slicing formula for center pixels:
- Rows: start from 1, and throw away the last row.
- Columns: start from 1, and throw away the last column.
img = np.arange(25).reshape(5, 5)
# o/p:
# array([[ 0, 1, 2, 3, 4],
# [ 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14],
# [15, 16, 17, 18, 19],
# [20, 21, 22, 23, 24]])
blurred_img = (
    img[:-2, 1:-1] +   # top
    img[1:-1, :-2] +   # left
    img[1:-1, 2:] +    # right
    img[2:, 1:-1] +    # bottom
    img[1:-1, 1:-1]    # center
) / 5.0
blurred_img
# o/p:
# array([[ 6., 7., 8.],
# [11., 12., 13.],
# [16., 17., 18.]])
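As a sanity check, one output pixel can be recomputed by hand from its five neighbours. This sketch assumes the same 5x5 `img` as above; `manual` and the indices `i`, `j` are illustrative names:

```python
import numpy as np

img = np.arange(25).reshape(5, 5)

blurred = (
    img[:-2, 1:-1] +   # top
    img[1:-1, :-2] +   # left
    img[1:-1, 2:] +    # right
    img[2:, 1:-1] +    # bottom
    img[1:-1, 1:-1]    # center
) / 5.0

# interior pixel (1, 1) of img corresponds to blurred[0, 0]
i, j = 1, 1
manual = (img[i - 1, j] + img[i, j - 1] + img[i, j + 1] + img[i + 1, j] + img[i, j]) / 5.0

print(manual, blurred[0, 0])
```

Here the blurred values equal the original interior values because `np.arange` produces a linear ramp, so the average of a pixel and its four neighbours is the pixel itself. A real image would change.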
Let's blur an actual image. The original image looks like this:
# import the nature.png image using matplotlib
nature_img = plt.imread("static/img/nature.png")
# blur logic
def blur_img(nature_img):
    return (
        nature_img[:-2, 1:-1] +   # top
        nature_img[1:-1, :-2] +   # left
        nature_img[1:-1, 2:] +    # right
        nature_img[2:, 1:-1] +    # bottom
        nature_img[1:-1, 1:-1]    # center
    ) / 5.0
# blur once
nature_img = blur_img(nature_img)
# save logic
plt.imsave("static/img/nature_blur_1.png", nature_img)
After blurring it once, the image looks like this:
Let's blur it more times.
# 49 more passes, 50 in total
for _ in range(1, 50):
    nature_img = blur_img(nature_img)
plt.imsave("static/img/nature_blur_50.png", nature_img)
After blurring the image 50 times, it looks like this:
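One side effect of this approach is that every pass trims a one-pixel border, so the image shrinks by 2 pixels in height and width each time. A sketch that shows this, using a random array as a stand-in for the image (an assumption, since the original file is not available here):

```python
import numpy as np

def blur_img(img):
    # average each interior pixel with its top, left, right and bottom neighbours
    return (
        img[:-2, 1:-1] +
        img[1:-1, :-2] +
        img[1:-1, 2:] +
        img[2:, 1:-1] +
        img[1:-1, 1:-1]
    ) / 5.0

# hypothetical 120x160 RGB image with values in [0, 1)
fake_img = np.random.default_rng(0).random((120, 160, 3))

result = fake_img
for _ in range(50):
    result = blur_img(result)

print(fake_img.shape, result.shape)
```

The slices only touch the first two axes, so the colour channels in the last axis pass through unchanged.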
Fancy Indexing
a_fancy = np.arange(0, 80, 10)
# indexing by position
indices = [1, 2, 5]
y_indices = a_fancy[indices]
print(y_indices)
# o/p: [10 20 50]
# indexing with booleans
mask = np.array([0, 1, 1, 0, 0, 1, 0, 0], dtype=bool)
y_bool = a_fancy[mask]
print(y_bool)
# o/p: [10 20 50]
# replacing all the negative numbers in an array with 0
a_fancy_neg = np.array([1, 31, -1, 341, -11, 90, -7])
mask_neg = a_fancy_neg < 0
# o/p: array([False, False, True, False, True, False, True])
a_fancy_neg[mask_neg] = 0
a_fancy_neg
# o/p: array([ 1, 31, 0, 341, 0, 90, 0])
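Masks can also be combined, and `np.where` offers a way to get the replaced values without modifying the original array. A short sketch with illustrative names (`values`, `clipped`, `in_range`):

```python
import numpy as np

values = np.array([1, 31, -1, 341, -11, 90, -7])

# np.where(condition, x, y) picks x where the condition holds, else y;
# it returns a new array and leaves values untouched
clipped = np.where(values < 0, 0, values)

# combine two conditions with & (each condition needs its own parentheses)
in_range = values[(values > 0) & (values < 100)]

print(clipped)
print(in_range)
```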
Fancy Indexing in 2D
a_fancy_2d = np.arange(25).reshape(5, 5)
# o/p:
# array([[ 0, 1, 2, 3, 4],
# [ 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14],
# [15, 16, 17, 18, 19],
# [20, 21, 22, 23, 24]])
# retrieve diagonal elements using indexing by position
y_fancy_indices_2d = a_fancy_2d[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
print(y_fancy_indices_2d)
# o/p: [ 0 6 12 18 24]
# selecting all rows and columns 0, 1, 4.
# this can't be done with slicing,
# as the step increases by 1 and then by 3, so we use fancy indexing.
a_fancy_2d[0:, [0, 1, 4]]
# o/p: array([[ 0, 1, 4],
# [ 5, 6, 9],
# [10, 11, 14],
# [15, 16, 19],
# [20, 21, 24]])
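Unlike a slice, fancy indexing returns a copy, so changing the result does not touch the original array. A small sketch, with `grid`, `diag` and `picked` as illustrative names:

```python
import numpy as np

grid = np.arange(25).reshape(5, 5)

# the two index lists are paired up: (0, 0), (1, 1), ..., (4, 4)
idx = np.arange(5)
diag = grid[idx, idx]

# all rows, columns 0, 1 and 4
picked = grid[:, [0, 1, 4]]
picked[0, 0] = 99              # modifies the copy only

print(diag)
print(grid[0, 0], picked[0, 0])
```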
Array Broadcasting Rules
p = np.array([
[0, 0, 0],
[10, 10, 10],
[20, 20, 20],
[30, 30, 30]
])
# o/p:
# array([[ 0, 0, 0],
# [10, 10, 10],
# [20, 20, 20],
# [30, 30, 30]])
q = np.array([
[0, 1, 2],
[0, 1, 2],
[0, 1, 2],
[0, 1, 2]
])
# o/p:
# array([[0, 1, 2],
# [0, 1, 2],
# [0, 1, 2],
# [0, 1, 2]])
r = np.array([[0], [10], [20], [30]])
# o/p:
# array([[ 0],
# [10],
# [20],
# [30]])
s = np.array([0, 1, 2])
# o/p: array([0, 1, 2])
# p + q == q + p == p + s == s + r
# here s is repeated down the rows and r is repeated across the columns to match the other shape.
a_1 = p + q
# o/p:
# array([[ 0, 1, 2],
# [10, 11, 12],
# [20, 21, 22],
# [30, 31, 32]])
a_2 = q + p
# o/p:
# array([[ 0, 1, 2],
# [10, 11, 12],
# [20, 21, 22],
# [30, 31, 32]])
a_3 = p + s
# o/p:
# array([[ 0, 1, 2],
# [10, 11, 12],
# [20, 21, 22],
# [30, 31, 32]])
a_4 = s + r
# o/p:
# array([[ 0, 1, 2],
# [10, 11, 12],
# [20, 21, 22],
# [30, 31, 32]])
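The rule behind these results: shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. A sketch that inspects the shapes involved, assuming NumPy 1.20 or newer for `np.broadcast_shapes`:

```python
import numpy as np

r = np.array([[0], [10], [20], [30]])   # shape (4, 1)
s = np.array([0, 1, 2])                 # shape (3,)

# (4, 1) and (3,) broadcast to (4, 3)
result_shape = np.broadcast_shapes(r.shape, s.shape)

# see the stretched operands explicitly
r_expanded, s_expanded = np.broadcast_arrays(r, s)

# shapes (4, 3) and (4,) are incompatible: trailing dimensions are 3 and 4
mismatch_raised = False
try:
    np.ones((4, 3)) + np.ones(4)
except ValueError:
    mismatch_raised = True

print(result_shape, r_expanded.shape, s_expanded.shape, mismatch_raised)
```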
Array Calculation Methods
a_method = np.arange(9).reshape(3, 3)
# o/p:
# array([[0, 1, 2],
# [3, 4, 5],
# [6, 7, 8]])
# sum all elements in an array
a_method.sum()
# o/p: np.int64(36)
# sum along axis 0 (down the rows): one total per column
a_method.sum(axis = 0)
# o/p: array([ 9, 12, 15])
# sum along axis 1 (across the columns): one total per row
a_method.sum(axis = 1)
# o/p: array([ 3, 12, 21])
# min from an array
a_method.min()
# o/p: np.int64(0)
a_method.min(axis = 0)
# o/p: array([0, 1, 2])
# max from an array
a_method.max()
# o/p: np.int64(8)
a_method.max(axis = 0)
# o/p: array([6, 7, 8])
# index of min from an array
a_method.argmin()
# o/p: np.int64(0)
a_method.argmin(axis = 0)
# o/p: array([0, 0, 0])
# index of max from an array
a_method.argmax()
# o/p: np.int64(8)
a_method.argmax(axis = 0)
# o/p: array([2, 2, 2])
# un-flatten 1D locations
# instead of 8, which is the flattened 1D location of the max element in the array,
# the function below gives you (2, 2)
np.unravel_index(
a_method.argmax(), a_method.shape
)
# o/p: (np.int64(2), np.int64(2))
# where: row and column indices of every element equal to the max
np.where(a_method == a_method.max())
# o/p: (array([2]), array([2]))
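The `axis` argument names the dimension that gets collapsed, which is easy to mix up. A sketch that compares the axis sums with explicit loops and uses `keepdims=True` so the result can be divided back into the array (names such as `row_totals_kept` are illustrative):

```python
import numpy as np

m = np.arange(9).reshape(3, 3)

col_totals = m.sum(axis=0)     # collapses the rows: one total per column
row_totals = m.sum(axis=1)     # collapses the columns: one total per row

# the same totals computed with plain Python loops
manual_cols = [int(sum(m[:, j])) for j in range(3)]
manual_rows = [int(sum(m[i, :])) for i in range(3)]

# keepdims=True keeps the collapsed axis with length 1, giving shape (3, 1)
row_totals_kept = m.sum(axis=1, keepdims=True)
normalized = m / row_totals_kept   # each row now sums to 1

print(col_totals, row_totals)
print(normalized.sum(axis=1))
```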