NumPy for Data Science: Essential Arrays and Operations
Master NumPy, the fundamental package for scientific computing in Python, and boost your data science skills.

NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
Getting Started with NumPy
First, you'll need to install NumPy:
```bash
pip install numpy
```
Now, let's create some basic NumPy arrays:
```python
import numpy as np

# Create arrays
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.zeros((3, 3))
arr3 = np.ones((2, 4))
arr4 = np.random.random((2, 2))

print("Array 1:", arr1)
print("Array 2:\n", arr2)
print("Array 3:\n", arr3)
print("Array 4:\n", arr4)
```
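Every array also carries metadata that describes it. A short sketch inspecting the shape, number of dimensions, element type, and size of arrays like the ones above (note that the default integer dtype inferred from a Python list is platform dependent, so it is printed rather than assumed):

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.zeros((3, 3))

# shape: size along each dimension; ndim: number of dimensions
print(arr1.shape, arr1.ndim)  # (5,) 1
print(arr2.shape, arr2.ndim)  # (3, 3) 2

# dtype: element type; np.zeros and np.ones default to 64-bit floats
print(arr2.dtype)  # float64

# The integer dtype inferred from a Python list varies by platform
print(arr1.dtype)

# size: total number of elements
print(arr2.size)  # 9
```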
Array Creation Functions
NumPy provides many functions to create arrays:
```python
import numpy as np

# Create an array with a range of values
arr5 = np.arange(10)  # [0, 1, 2, ..., 9]

# Create an array with evenly spaced values
arr6 = np.linspace(0, 1, 5)  # [0, 0.25, 0.5, 0.75, 1]

# Create an identity matrix
arr7 = np.eye(3)

# Create an array with random integers from 1 to 9 (upper bound excluded)
arr8 = np.random.randint(1, 10, size=(3, 3))

print("Array 5:", arr5)
print("Array 6:", arr6)
print("Array 7:\n", arr7)
print("Array 8:\n", arr8)
```
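The `np.random.random` and `np.random.randint` calls in these examples use NumPy's legacy global random state. Since NumPy 1.17 the documentation recommends creating a `Generator` with `np.random.default_rng` instead. A minimal sketch, where the seed value 42 is an arbitrary choice made only for reproducibility:

```python
import numpy as np

# A seeded Generator makes the random draws reproducible across runs
rng = np.random.default_rng(42)

# Floats in [0.0, 1.0), comparable to np.random.random((2, 2))
floats = rng.random((2, 2))

# Integers in [1, 10), comparable to np.random.randint(1, 10, size=(3, 3));
# Generator.integers excludes the upper bound by default
ints = rng.integers(1, 10, size=(3, 3))

print(floats)
print(ints)

# Re-creating the generator with the same seed reproduces the same values
rng_again = np.random.default_rng(42)
print(np.array_equal(rng_again.random((2, 2)), floats))  # True
```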
Array Indexing and Slicing
NumPy arrays can be indexed and sliced much like Python lists, but they also support multi-dimensional, boolean, and fancy (integer-array) indexing:
```python
import numpy as np

# Create a 2D array
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Indexing
print(arr[0, 0])  # 1 (first element)
print(arr[2, 3])  # 12 (last element)

# Slicing
print(arr[0:2, 1:3])  # [[2, 3], [6, 7]]

# Boolean indexing
print(arr[arr > 5])  # [6, 7, 8, 9, 10, 11, 12]

# Fancy indexing: picks elements (0, 1) and (2, 3)
print(arr[[0, 2], [1, 3]])  # [2, 12]
```
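One practical difference between these indexing styles is memory sharing: basic slicing returns a view that shares data with the original array, while boolean and fancy indexing return copies. A short sketch:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Basic slicing returns a view: writing to it modifies arr
view = arr[0:2, 1:3]
view[0, 0] = 99
print(arr[0, 1])  # 99

# Fancy indexing returns a copy: writing to it leaves arr unchanged
fancy = arr[[0, 2], [1, 3]]
fancy[0] = -1
print(arr[0, 1])  # 99

# Call .copy() when an independent slice is needed
independent = arr[0:2, 1:3].copy()
independent[0, 0] = 0
print(arr[0, 1])  # 99
```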
Array Operations
NumPy provides a wide range of operations for arrays:
Element-wise Operations
```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Addition
print(a + b)  # [5, 7, 9]

# Subtraction
print(a - b)  # [-3, -3, -3]

# Multiplication
print(a * b)  # [4, 10, 18]

# Division
print(a / b)  # [0.25, 0.4, 0.5]

# Exponentiation
print(a ** 2)  # [1, 4, 9]
```
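Each arithmetic operator is shorthand for a NumPy universal function, and the operator also determines the result's dtype: true division `/` always produces floats, while floor division `//` and modulo `%` keep integer inputs as integers. A quick check:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Operators dispatch to ufuncs: a + b is equivalent to np.add(a, b)
print(np.array_equal(a + b, np.add(a, b)))       # True
print(np.array_equal(a * b, np.multiply(a, b)))  # True

# True division returns floats even for integer inputs
print((a / b).dtype)  # float64

# Floor division and modulo keep an integer dtype
print(b // a)  # [4 2 2]
print(b % a)   # [0 1 0]
```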
Universal Functions (ufuncs)
```python
import numpy as np

a = np.array([1, 2, 3])

# Mathematical functions
print(np.sqrt(a))  # Square root: [1., 1.41421356, 1.73205081]
print(np.exp(a))   # Exponential: [2.71828183, 7.3890561, 20.08553692]
print(np.log(a))   # Natural logarithm: [0., 0.69314718, 1.09861229]

# Trigonometric functions
angles = np.array([0, np.pi/2, np.pi])
print(np.sin(angles))  # approximately [0., 1., 0.]; sin(pi) prints as ~1.2e-16
print(np.cos(angles))  # approximately [1., 0., -1.]; cos(pi/2) prints as ~6.1e-17
```
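Because π has no exact floating-point representation, trigonometric ufuncs return tiny nonzero values where the exact answer is 0. When checking ufunc results, tolerance-based comparisons such as `np.allclose` and `np.isclose` are safer than exact equality:

```python
import numpy as np

angles = np.array([0, np.pi / 2, np.pi])
sines = np.sin(angles)

# Exact comparison fails because sin(pi) is not exactly 0
print(np.array_equal(sines, [0.0, 1.0, 0.0]))  # False

# Tolerance-based comparison treats the rounding error as equal
print(np.allclose(sines, [0.0, 1.0, 0.0]))  # True

# Element-wise version returns a boolean array
print(np.isclose(np.cos(angles), [1.0, 0.0, -1.0]))  # [ True  True  True]
```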
Array Aggregation
NumPy provides functions to aggregate array values:
```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Sum
print(np.sum(arr))          # 45 (sum of all elements)
print(np.sum(arr, axis=0))  # [12, 15, 18] (sum of each column)
print(np.sum(arr, axis=1))  # [6, 15, 24] (sum of each row)

# Mean, min, max
print(np.mean(arr))  # 5.0
print(np.min(arr))   # 1
print(np.max(arr))   # 9

# Standard deviation and variance
print(np.std(arr))  # 2.581...
print(np.var(arr))  # 6.666...
```
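By default, `np.std` and `np.var` compute population statistics (dividing by n). For a sample estimate, which divides by n − 1, pass `ddof=1`. The other aggregations accept `axis` just like `np.sum`. For the 3x3 array above, the squared deviations from the mean 5 sum to 60:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Population variance (ddof=0, the default): 60 / 9
print(np.var(arr))          # 6.666...
# Sample variance (ddof=1): 60 / 8
print(np.var(arr, ddof=1))  # 7.5

# Sample standard deviation: sqrt(7.5)
print(np.std(arr, ddof=1))  # 2.738...

# Column means (axis=0) and row maxima (axis=1)
print(np.mean(arr, axis=0))  # [4. 5. 6.]
print(np.max(arr, axis=1))   # [3 6 9]
```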
Broadcasting
NumPy's broadcasting allows arithmetic between arrays of different shapes by virtually stretching the smaller array across the larger one, without copying data:
```python
import numpy as np

# Add a scalar to an array
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr + 10)  # [[11, 12, 13], [14, 15, 16]]

# Add a row vector to an array
row = np.array([10, 20, 30])
print(arr + row)  # [[11, 22, 33], [14, 25, 36]]

# Add a column vector to an array
col = np.array([[100], [200]])
print(arr + col)  # [[101, 102, 103], [204, 205, 206]]
```
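Broadcasting compares shapes starting from the trailing dimension: two dimensions are compatible when they are equal or when one of them is 1. A sketch that checks result shapes with `np.broadcast_shapes` (available in NumPy 1.20 and later) and shows an incompatible pair:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

# (2, 3) with (3,): trailing dims match, so the row repeats down the rows
print(np.broadcast_shapes(arr.shape, (3,)))    # (2, 3)
# (2, 3) with (2, 1): the size-1 column is stretched across the columns
print(np.broadcast_shapes(arr.shape, (2, 1)))  # (2, 3)

# A 1D array of length 2 does not match the trailing dimension 3
try:
    arr + np.array([100, 200])
except ValueError as err:
    print("Incompatible shapes:", err)

# Adding an axis turns it into a (2, 1) column that broadcasts correctly
col = np.array([100, 200])[:, np.newaxis]
print(arr + col)  # [[101 102 103]
                  #  [204 205 206]]
```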
Array Reshaping
You can change the shape of arrays without changing their data:
```python
import numpy as np

arr = np.arange(12)

# Reshape to 3x4 array
reshaped = arr.reshape(3, 4)
print(reshaped)

# Flatten an array (returns a 1D copy)
flattened = reshaped.flatten()
print(flattened)

# Transpose an array (rows become columns)
transposed = reshaped.T
print(transposed)
```
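Two details are useful when reshaping: passing `-1` for one dimension lets NumPy infer it from the total size, and `reshape`, `ravel`, and `.T` return views when the memory layout allows, whereas `flatten` always returns a copy. A sketch using `np.shares_memory` to confirm this:

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension: 12 elements / 4 columns = 3 rows
reshaped = arr.reshape(-1, 4)
print(reshaped.shape)  # (3, 4)

# Reshaping a contiguous array returns a view of the same data
print(np.shares_memory(arr, reshaped))  # True

# flatten always copies; ravel returns a view when possible
print(np.shares_memory(reshaped, reshaped.flatten()))  # False
print(np.shares_memory(reshaped, reshaped.ravel()))    # True

# The transpose is also a view, so writes show up in the original
reshaped.T[0, 1] = 100
print(arr[4])  # 100
```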
Linear Algebra with NumPy
NumPy provides functions for linear algebra operations:
```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Matrix multiplication
print(np.matmul(a, b))  # [[19, 22], [43, 50]]; or use a @ b in Python 3.5+

# Determinant
print(np.linalg.det(a))  # approximately -2.0

# Inverse
print(np.linalg.inv(a))  # approximately [[-2., 1.], [1.5, -0.5]]

# Eigenvalues and eigenvectors (each eigenvector is a column)
eigenvalues, eigenvectors = np.linalg.eig(a)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)
```
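To solve a linear system Ax = b, `np.linalg.solve` is generally preferred over computing `np.linalg.inv(a)` and multiplying, because it is faster and numerically more stable. This sketch reuses the matrix `a` with an illustrative right-hand side `rhs` (an assumed example value, not from the original) and also verifies the eigen decomposition:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
rhs = np.array([5, 6])  # illustrative right-hand side

# Solve a @ x = rhs directly instead of forming the inverse
x = np.linalg.solve(a, rhs)
print(x)  # approximately [-4.   4.5]
print(np.allclose(a @ x, rhs))  # True

# Each eigenpair satisfies a @ v = lambda * v (eigenvectors are the columns)
eigenvalues, eigenvectors = np.linalg.eig(a)
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(np.allclose(a @ v, eigenvalues[i] * v))  # True
```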
Practical Example: Image Processing
NumPy is often used for image processing, where images are represented as arrays:
```python
import numpy as np

# Create a simple 5x5 image (grayscale)
img = np.zeros((5, 5))
img[1:4, 1:4] = 1  # Create a white square in the middle
print("Image:\n", img)

# Apply a filter (simple blur)
kernel = np.ones((3, 3)) / 9  # 3x3 averaging filter
result = np.zeros_like(img)

# Visit interior pixels only; border pixels stay 0
for i in range(1, img.shape[0]-1):
    for j in range(1, img.shape[1]-1):
        # Weighted sum of the 3x3 neighborhood centered on (i, j)
        result[i, j] = np.sum(img[i-1:i+2, j-1:j+2] * kernel)

print("Filtered image:\n", result)
```
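The nested loops keep the filtering logic explicit, but they run in Python one pixel at a time. A vectorized sketch of the same 3x3 averaging filter uses `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 and later) to build every 3x3 neighborhood at once; like the loop version, it leaves border pixels at 0:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

img = np.zeros((5, 5))
img[1:4, 1:4] = 1  # white square in the middle

# windows has shape (3, 3, 3, 3): one 3x3 neighborhood per interior pixel
windows = sliding_window_view(img, (3, 3))

# Averaging each neighborhood is the same as applying the 1/9 kernel
blurred = np.zeros_like(img)
blurred[1:-1, 1:-1] = windows.mean(axis=(-2, -1))
print("Vectorized blur:\n", blurred)

# Cross-check against the explicit loop version
kernel = np.ones((3, 3)) / 9
loop_result = np.zeros_like(img)
for i in range(1, img.shape[0] - 1):
    for j in range(1, img.shape[1] - 1):
        loop_result[i, j] = np.sum(img[i-1:i+2, j-1:j+2] * kernel)
print(np.allclose(blurred, loop_result))  # True
```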
Performance Tips
- Vectorize operations: Use NumPy's vectorized operations instead of Python loops
- Use appropriate data types: Choose the smallest data type that fits your data
- Pre-allocate arrays: Create arrays with the right size upfront instead of growing them
- Use NumPy's built-in functions: They're optimized for performance
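The tips above can be illustrated with a small, self-contained sketch. The array size and the sum-of-squares task are arbitrary choices, and the measured timings will vary from machine to machine:

```python
import time
import numpy as np

n = 100_000
values = np.arange(n, dtype=np.float64)

# Vectorize: a Python loop vs a single built-in NumPy call
start = time.perf_counter()
loop_total = 0.0
for v in values:
    loop_total += v * v
loop_time = time.perf_counter() - start

start = time.perf_counter()
vector_total = np.dot(values, values)
vector_time = time.perf_counter() - start
print(f"loop: {loop_time:.4f}s, vectorized: {vector_time:.6f}s")

# Smaller dtypes use less memory: 8 vs 4 bytes per element
print(np.zeros(n, dtype=np.float64).nbytes)  # 800000
print(np.zeros(n, dtype=np.float32).nbytes)  # 400000

# Pre-allocate and fill, rather than growing with np.append,
# which copies the whole array on every call
squares = np.empty(10)
for i in range(10):
    squares[i] = i ** 2
print(squares)
```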
Conclusion
NumPy is an essential library for data science in Python. Its efficient array operations and mathematical functions make it the foundation for many other data science libraries like Pandas, SciPy, and scikit-learn.
By mastering NumPy, you'll be well-equipped to tackle more complex data science tasks and work with other libraries in the Python data science ecosystem.