NumPy

This page provides an introduction to NumPy. These are my notes from a YouTube video.

Overview

NumPy is a powerful Python library for numerical computing, widely used for handling large, multi-dimensional arrays and matrices. To start using NumPy, install it and import it as shown below:

# install numpy 
pip3 install numpy

# install matplotlib - required later
pip3 install matplotlib

# start the Python interpreter
python3

# importing numpy & matplotlib (inside the interpreter)
import numpy as np
import matplotlib.pyplot as plt

Array

# int array
a = np.array([1, 2, 3, 4])

type(a)
# o/p: <class 'numpy.ndarray'>

a.dtype
# o/p: dtype('int64')



# float array
b = np.array([1.2, 3.4, 5.6, 7.8])
b.dtype
# o/p: dtype('float64')



# access a value
a[0]
# o/p: np.int64(1)



# overwrite a value
a[0] = 10
a
# o/p: array([10, 2, 3, 4])



# assigning a float value in an integer array
a[0] = 11.5
a
# o/p: array([11, 2, 3, 4])



# dimension
a.ndim
# o/p: 1



# shape - shows number of elements along each dimension
a.shape
# o/p: (4,)



# size = number of elements in the array
a.size
# o/p: 4

info

All the data in a NumPy array must be of the same type. If you overwrite a value in an integer array with a float, the decimal part is truncated.
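If the fractional part matters, one option is to convert the array to a float dtype before assigning. The following is a minimal sketch, with `ints` and `floats` as illustrative names that do not appear in the notes above:

```python
import numpy as np

ints = np.array([1, 2, 3, 4])
ints[0] = 11.5                     # stored as 11: the fractional part is dropped

floats = ints.astype(np.float64)   # astype returns a new array with the requested dtype
floats[0] = 11.5                   # kept as 11.5 because the array now holds floats

print(ints.tolist())    # [11, 2, 3, 4]
print(floats.tolist())  # [11.5, 2.0, 3.0, 4.0]
```

Note that `astype` returns a copy, so `ints` keeps its integer values.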

Vectorized Operation

# add 
a + b
# o/p: array([12.2, 5.4, 8.6, 11.8])


# div
a / b
# o/p: array([9.16666667, 0.58823529, 0.53571429, 0.51282051])


# multiply
a * b
# o/p: array([13.2, 6.8, 16.8, 31.2])


# pow
a ** b
# o/p: array([1.77693369e+01, 1.05560633e+01, 4.69763237e+02, 4.96670005e+04])


# adding constant
a + 10
# o/p: array([21, 12, 13, 14])
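
Each of these operators works element by element. As a rough illustration (this is not how NumPy is implemented internally), `a + b` produces the same values as an explicit Python loop over the pairs. The names `vectorized` and `looped` are only for this sketch:

```python
import numpy as np

a = np.array([11, 2, 3, 4])
b = np.array([1.2, 3.4, 5.6, 7.8])

vectorized = a + b                                  # one call, no explicit loop
looped = np.array([x + y for x, y in zip(a, b)])    # same arithmetic, element by element

print(np.allclose(vectorized, looped))  # True
```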

Universal Functions

np.sin(a)
# o/p: array([-0.99999021, 0.90929743, 0.14112001, -0.7568025 ])

2D Array

# defining a 2d array
a_2d_small = np.array([
[1, 2, 3, 4],
[5, 6, 7, 8]
])
# o/p:
# array([[1, 2, 3, 4],
# [5, 6, 7, 8]])



# creates an array with the values 0 to 24 and then reshapes it into a 5x5 array
a_2d = np.arange(25).reshape(5, 5)
# o/p:
# array([[ 0, 1, 2, 3, 4],
# [ 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14],
# [15, 16, 17, 18, 19],
# [20, 21, 22, 23, 24]])



# shape
a_2d.shape
# o/p: (5, 5)



# size
a_2d.size
# o/p: 25



# ndim
a_2d.ndim
# o/p: 2



# set
a_2d[1, 3] = -1
# o/p:
# array([[ 0, 1, 2, 3, 4],
# [ 5, 6, 7, -1, 9],
# [10, 11, 12, 13, 14],
# [15, 16, 17, 18, 19],
# [20, 21, 22, 23, 24]])



a_2d[1, 3] = 8
# o/p:
# array([[ 0, 1, 2, 3, 4],
# [ 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14],
# [15, 16, 17, 18, 19],
# [20, 21, 22, 23, 24]])



# get
print(a_2d[1, 3])
# o/p: 8

Slicing

Slicing extracts the portion between a lower (inclusive) and an upper (exclusive) bound, moving in increments of step.

a_2d[1]
# o/p: array([5, 6, 7, 8, 9])


# [lower: upper: step]
a_2d[0:2:1]
# o/p:
# array([[0, 1, 2, 3, 4],
# [5, 6, 7, 8, 9]])
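
The step does not have to be 1. Here is a small sketch on a 1D array (the array `v` is made up for illustration) that uses a step of 2 and a negative step, which walks the array backwards:

```python
import numpy as np

v = np.arange(10)          # 0, 1, ..., 9

every_other = v[1:8:2]     # indices 1, 3, 5, 7 (upper bound 8 is excluded)
reversed_v = v[::-1]       # omitted bounds plus step -1: whole array back to front

print(every_other.tolist())  # [1, 3, 5, 7]
print(reversed_v.tolist())   # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```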

Indexing and Slicing

Indexing and slicing can be combined: each dimension gets either an index or a slice.

# select row 0 and then take the slice 1:3 of its columns
a_2d[0, 1:3]
# o/p: array([1, 2])



# select a slice from each dimension:
# from the row dimension, select from 0 to the end (all rows) and
# from the column dimension, select from 1 to the end
a_2d[0:, 1:]
# o/p:
# array([[ 1, 2, 3, 4],
# [ 6, 7, 8, 9],
# [11, 12, 13, 14],
# [16, 17, 18, 19],
# [21, 22, 23, 24]])




# from the row dimension, select from 0 to the end (all rows) and
# from the column dimension, select only column 2
a_2d[0:, 2]
# o/p: array([ 2, 7, 12, 17, 22])



# A negative upper bound counts from the end, so it drops trailing elements.
# For example, a_2d[0:-1, ] selects all the rows except the last row.
a_2d[0:-1, ]
# o/p:
# array([[ 0, 1, 2, 3, 4],
# [ 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14],
# [15, 16, 17, 18, 19]])

info

Changing a slice of an array by assigning new values updates the original array, because a slice is a view of the same memory rather than a copy.
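
This behavior can be checked with a short sketch. The names `base`, `view` and `copied` are illustrative; `np.shares_memory` reports whether two arrays overlap in memory, and `.copy()` is one way to get an independent array:

```python
import numpy as np

base = np.arange(25).reshape(5, 5)

view = base[0:2, 0:2]          # basic slicing returns a view onto base
view[:] = -1                   # the write goes through to base

copied = base[3:, 3:].copy()   # .copy() allocates independent memory
copied[:] = 99                 # base is not affected by this write

print(base[0, 0], base[4, 4])          # -1 24
print(np.shares_memory(base, view))    # True
print(np.shares_memory(base, copied))  # False
```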

Blurring an Image

Let's say we have an image like the one below, where each square represents a pixel.

numpy-basics-1.svg

To blur an image, we essentially merge neighboring pixels to create a lower-resolution image. This process involves taking the average color of adjacent pixels to form a single pixel. For instance, all pixels marked with a green dot are combined by averaging their colors to create one pixel.

In the example below, we apply this technique to downscale a 4x4 image to a 2x2 image, reducing the pixel count while maintaining an averaged representation of the original image.

numpy-basics-2.svg

First, we group the top pixels of all four pixel groups (green, pink, blue and yellow).

numpy-basics-3.svg

Similarly, group the left, right, bottom and center pixels from all four pixel groups and calculate the average. Below are the slicing formulas for each pixel group:

Slicing formula for top pixels: $\text{img}[:-2, 1:-1]$

  • Start from 0, and throw away the last 2 rows.
  • Start from 1, and throw away the last 1 column.

$$\begin{matrix} [1 & 2 & 3], \\ [6 & 7 & 8], \\ [11 & 12 & 13] \end{matrix}$$

Slicing formula for left pixels: $\text{img}[1:-1, :-2]$

  • Start from 1, and throw away the last 1 row.
  • Start from 0, and throw away the last 2 columns.

$$\begin{matrix} [5 & 6 & 7], \\ [10 & 11 & 12], \\ [15 & 16 & 17] \end{matrix}$$

Slicing formula for right pixels: $\text{img}[1:-1, 2:]$

  • Start from 1, and throw away the last 1 row.
  • Start from 2, and select all columns.

$$\begin{matrix} [7 & 8 & 9], \\ [12 & 13 & 14], \\ [17 & 18 & 19] \end{matrix}$$

Slicing formula for bottom pixels: $\text{img}[2:, 1:-1]$

  • Start from 2, and select everything.
  • Start from 1, and throw away the last 1 column.

$$\begin{matrix} [11 & 12 & 13], \\ [16 & 17 & 18], \\ [21 & 22 & 23] \end{matrix}$$

Slicing formula for center pixels: $\text{img}[1:-1, 1:-1]$

  • Start from 1, and throw away the last 1 row.
  • Start from 1, and throw away the last 1 column.

$$\begin{matrix} [6 & 7 & 8], \\ [11 & 12 & 13], \\ [16 & 17 & 18] \end{matrix}$$

img = np.arange(25).reshape(5, 5)
# o/p:
# array([[ 0, 1, 2, 3, 4],
# [ 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14],
# [15, 16, 17, 18, 19],
# [20, 21, 22, 23, 24]])



blurred_img = (
img[:-2, 1:-1] + # top
img[1:-1, :-2] + # left
img[1:-1, 2:] + # right
img[2:, 1:-1] + # bottom
img[1:-1, 1:-1] # center
) / 5.0


blurred_img
# o/p:
# array([[ 6., 7., 8.],
# [11., 12., 13.],
# [16., 17., 18.]])
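
To see that the five slices line up as intended, the same average can be computed with an explicit loop over the interior pixels and compared with the sliced version. This is only a cross-check sketch; the names `sliced` and `looped` are illustrative:

```python
import numpy as np

img = np.arange(25).reshape(5, 5)

# Sliced version: five shifted 3x3 windows added together.
sliced = (img[:-2, 1:-1] + img[1:-1, :-2] + img[1:-1, 2:]
          + img[2:, 1:-1] + img[1:-1, 1:-1]) / 5.0

# Loop version: visit each interior pixel (i, j) and average it
# with its top, left, right and bottom neighbors.
looped = np.empty((3, 3))
for i in range(1, 4):
    for j in range(1, 4):
        looped[i - 1, j - 1] = (img[i - 1, j] + img[i, j - 1] + img[i, j + 1]
                                + img[i + 1, j] + img[i, j]) / 5.0

print(np.allclose(sliced, looped))  # True
```

The border pixels have no complete set of neighbors, which is why the result is 3x3 rather than 5x5.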

Let's blur an actual image. The original image looks like this:

nature.png

# use matplotlib to load the nature.png image
nature_img = plt.imread("static/img/nature.png")


# blur logic: average each interior pixel with its four neighbors
def blur_img(nature_img):
    return (
        nature_img[:-2, 1:-1] +  # top
        nature_img[1:-1, :-2] +  # left
        nature_img[1:-1, 2:] +   # right
        nature_img[2:, 1:-1] +   # bottom
        nature_img[1:-1, 1:-1]   # center
    ) / 5.0


# blur once
nature_img = blur_img(nature_img)

# save logic
plt.imsave("static/img/nature_blur_1.png", nature_img)

After blurring it once, the image looks like this:

nature_blur_1.png

Let's blur it 49 more times.

for _ in range(49):
    nature_img = blur_img(nature_img)

plt.imsave("static/img/nature_blur_50.png", nature_img)

After blurring the image 49 more times, it looks like this:

nature_blur_50.png
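
One side effect worth noting: because only interior pixels are kept, every pass trims a one-pixel border, so the image loses 2 rows and 2 columns per pass. The sketch below uses a random array as a hypothetical stand-in for the PNG (the 120x160 size and the name `fake_img` are assumptions, not the real file):

```python
import numpy as np

def blur_img(img):
    # Same five-slice average as above; works on (H, W) and (H, W, channels) arrays.
    return (img[:-2, 1:-1] + img[1:-1, :-2] + img[1:-1, 2:]
            + img[2:, 1:-1] + img[1:-1, 1:-1]) / 5.0

# Hypothetical stand-in for an RGB image: 120 rows, 160 columns, 3 channels.
fake_img = np.random.default_rng(0).random((120, 160, 3))

for _ in range(50):
    fake_img = blur_img(fake_img)

print(fake_img.shape)  # (20, 60, 3)
```

An image with fewer than 101 rows or columns would be reduced to nothing before 50 passes finish.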

Fancy Indexing

a_fancy = np.arange(0, 80, 10)

# indexing by position
indices = [1, 2, 5]
y_indices = a_fancy[indices]
print(y_indices)
# o/p: [10 20 50]


# indexing with booleans
mask = np.array([0, 1, 1, 0, 0, 1, 0, 0], dtype=bool)
y_bool = a_fancy[mask]
print(y_bool)
# o/p: [10 20 50]


# replacing all the negative numbers in an array with 0
a_fancy_neg = np.array([1, 31, -1, 341, -11, 90, -7])
mask_neg = a_fancy_neg < 0
# o/p: array([False, False, True, False, True, False, True])
a_fancy_neg[mask_neg] = 0
a_fancy_neg
# o/p: array([ 1, 31, 0, 341, 0, 90, 0])
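
Assigning through a mask changes the array in place. When the original values should be kept, `np.where` or `np.maximum` can build a new array instead. This is an alternative sketch, not part of the original notes, and the names `values`, `with_where` and `with_maximum` are illustrative:

```python
import numpy as np

values = np.array([1, 31, -1, 341, -11, 90, -7])

# np.where(condition, x, y): take x where the condition holds, otherwise y.
with_where = np.where(values < 0, 0, values)

# np.maximum compares element-wise against 0, which has the same effect here.
with_maximum = np.maximum(values, 0)

print(with_where.tolist())    # [1, 31, 0, 341, 0, 90, 0]
print(with_maximum.tolist())  # [1, 31, 0, 341, 0, 90, 0]
print(values.tolist())        # [1, 31, -1, 341, -11, 90, -7]
```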

Fancy Indexing in 2D

a_fancy_2d = np.arange(25).reshape(5, 5)
# o/p:
# array([[ 0, 1, 2, 3, 4],
# [ 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14],
# [15, 16, 17, 18, 19],
# [20, 21, 22, 23, 24]])


# retrieve diagonal elements using indexing by position
y_fancy_indices_2d = a_fancy_2d[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
print(y_fancy_indices_2d)
# o/p: [ 0 6 12 18 24]


# selecting all rows and columns 0, 1, 4.
# this can't be done with slicing,
# as the step increases by 1 and then by 3, so we use fancy indexing.
a_fancy_2d[0:, [0, 1, 4]]
# o/p: array([[ 0, 1, 4],
# [ 5, 6, 9],
# [10, 11, 14],
# [15, 16, 19],
# [20, 21, 24]])
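
In the diagonal example, the two index lists are paired up position by position, so two lists of length n select n elements. To select every combination of the listed rows and columns instead, `np.ix_` can be used. A small sketch, with `grid`, `paired` and `block` as illustrative names:

```python
import numpy as np

grid = np.arange(25).reshape(5, 5)

# Paired: picks the elements at (0, 1) and (2, 4).
paired = grid[[0, 2], [1, 4]]

# Cross product: rows 0 and 2 combined with columns 1 and 4.
block = grid[np.ix_([0, 2], [1, 4])]

print(paired.tolist())  # [1, 14]
print(block.tolist())   # [[1, 4], [11, 14]]
```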

Array Broadcasting Rules

p = np.array([
[0, 0, 0],
[10, 10, 10],
[20, 20, 20],
[30, 30, 30]
])
# o/p:
# array([[ 0, 0, 0],
# [10, 10, 10],
# [20, 20, 20],
# [30, 30, 30]])


q = np.array([
[0, 1, 2],
[0, 1, 2],
[0, 1, 2],
[0, 1, 2]
])
# o/p:
# array([[0, 1, 2],
# [0, 1, 2],
# [0, 1, 2],
# [0, 1, 2]])


r = np.array([[0], [10], [20], [30]])
# o/p:
# array([[ 0],
# [10],
# [20],
# [30]])


s = np.array([0, 1, 2])
# o/p: array([0, 1, 2])


# p + q == q + p == p + s == s + r
# here s is repeated across rows and r is repeated across columns to match the shape.
a_1 = p + q
# o/p:
# array([[ 0, 1, 2],
# [10, 11, 12],
# [20, 21, 22],
# [30, 31, 32]])


a_2 = q + p
# o/p:
# array([[ 0, 1, 2],
# [10, 11, 12],
# [20, 21, 22],
# [30, 31, 32]])


a_3 = p + s
# o/p:
# array([[ 0, 1, 2],
# [10, 11, 12],
# [20, 21, 22],
# [30, 31, 32]])


a_4 = s + r
# o/p:
# array([[ 0, 1, 2],
# [10, 11, 12],
# [20, 21, 22],
# [30, 31, 32]])
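
The stretching can be made explicit. In the sketch below, `np.broadcast_shapes` (available in NumPy 1.20 and later) reports the combined shape, and `np.broadcast_to` shows what each operand looks like once expanded. Shapes are compared from the last dimension backwards, and two sizes are compatible when they are equal or one of them is 1. The names `r_full` and `s_full` are illustrative:

```python
import numpy as np

r = np.array([[0], [10], [20], [30]])   # shape (4, 1)
s = np.array([0, 1, 2])                 # shape (3,)

print(np.broadcast_shapes(r.shape, s.shape))  # (4, 3)

r_full = np.broadcast_to(r, (4, 3))     # each value of r repeated across 3 columns
s_full = np.broadcast_to(s, (4, 3))     # the row s repeated for 4 rows

print(np.array_equal(r_full + s_full, r + s))  # True

# Shapes (3,) and (4,) do not match and neither size is 1, so this fails.
try:
    np.array([0, 1, 2]) + np.array([0, 1, 2, 3])
    failed = False
except ValueError:
    failed = True
print(failed)  # True
```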

Array Calculation Methods

a_method = np.arange(9).reshape(3, 3)
# o/p:
# array([[0, 1, 2],
# [3, 4, 5],
# [6, 7, 8]])


# sum all elements in an array
a_method.sum()
# o/p: np.int64(36)


# sum along axis 0 (down the rows): one total per column
a_method.sum(axis = 0)
# o/p: array([ 9, 12, 15])


# sum along axis 1 (across the columns): one total per row
a_method.sum(axis = 1)
# o/p: array([ 3, 12, 21])
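
A way to remember the `axis` argument: it names the dimension that gets collapsed. With `axis=0` the rows disappear and one value per column remains; with `axis=1` the columns disappear and one value per row remains. The sketch below also uses `keepdims=True`, which keeps the collapsed dimension with size 1 so the result broadcasts against the original array. The names `m`, `row_totals_2d` and `share` are illustrative:

```python
import numpy as np

m = np.arange(9).reshape(3, 3)

col_totals = m.sum(axis=0)                     # shape (3,): one total per column
row_totals = m.sum(axis=1)                     # shape (3,): one total per row
row_totals_2d = m.sum(axis=1, keepdims=True)   # shape (3, 1)

# Divide every element by its row total; the (3, 1) array broadcasts across columns.
share = m / row_totals_2d

print(col_totals.tolist())        # [9, 12, 15]
print(row_totals.tolist())        # [3, 12, 21]
print(row_totals_2d.shape)        # (3, 1)
print(np.allclose(share.sum(axis=1), 1.0))  # True
```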




# min from an array
a_method.min()
# o/p: np.int64(0)
a_method.min(axis = 0)
# o/p: array([0, 1, 2])



# max from an array
a_method.max()
# o/p: np.int64(8)
a_method.max(axis = 0)
# o/p: array([6, 7, 8])



# index of min from an array
a_method.argmin()
# o/p: np.int64(0)
a_method.argmin(axis = 0)
# o/p: array([0, 0, 0])



# index of max from an array
a_method.argmax()
# o/p: np.int64(8)
a_method.argmax(axis = 0)
# o/p: array([2, 2, 2])



# un-flatten 1D locations
# instead of 8, which is the flattened 1D location of the max element in the array,
# the function below gives you (2, 2)
np.unravel_index(
a_method.argmax(), a_method.shape
)
# o/p: (np.int64(2), np.int64(2))



# where: returns the row and column indices of every element equal to the max
np.where(a_method == a_method.max())
# o/p: (array([2]), array([2]))
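
The difference between `argmax` and `np.where` shows up when the maximum occurs more than once: `argmax` reports only the first occurrence in flattened order, while `np.where` reports all of them. A small sketch with a made-up array `ties`:

```python
import numpy as np

ties = np.array([[1, 9, 3],
                 [9, 2, 9]])

first = ties.argmax()                        # flattened index of the first 9
rows, cols = np.where(ties == ties.max())    # indices of every 9
positions = list(zip(rows.tolist(), cols.tolist()))

print(first)      # 1
print(positions)  # [(0, 1), (1, 0), (1, 2)]
```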