01 — NumPy

Estimated time: 5 hours
Goal: NumPy is the foundation of scientific computing in Python. Pandas and scikit-learn are built on it, and PyTorch and TensorFlow follow its array model and interoperate with it.

Why Does This Material Matter?

NumPy is not just another library: it is the base layer underneath most of the data science and AI ecosystem in Python. Pandas is built on top of NumPy. Scikit-learn processes data as NumPy arrays. A PyTorch tensor behaves much like a NumPy array with GPU support and automatic differentiation added. If you don't understand how arrays, broadcasting, and vectorized operations work, you will keep wondering why your ML code errors out or runs slowly. Mastering NumPy now gives you a "mother tongue" for all the numerical computing you will meet in the bootcamp.

Think of NumPy as a very tidy multi-dimensional bookshelf: every "book" (number) has a fixed address, all books are the same size, and you can grab many books at once instead of walking to them one by one. If a Python list is a mixed shopping bag (it can hold anything, but it is awkward to compute with), a NumPy array is a warehouse rack that is already categorized, labeled, and ready for the forklift.

In the LLM/GenAI classes later on, almost every operation you perform on embeddings, attention weights, or token logits will be a NumPy/tensor operation. Understanding array shapes, broadcasting, and vectorization now will save you hours of debugging when you hit a shape mismatch error such as (768,) versus (1, 768).
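To make the (768,) versus (1, 768) situation concrete, here is a minimal sketch. The random vector is only a stand-in for a real model embedding, and the weight matrix `W` is hypothetical; the point is how the two shapes differ and how to convert between them.

```python
import numpy as np

rng = np.random.default_rng(0)
embedding = rng.standard_normal(768)   # shape (768,): a single 1D vector
batch = embedding.reshape(1, -1)       # shape (1, 768): a batch holding one vector

print(embedding.shape, batch.shape)

# A (768, 768) weight matrix accepts both inputs, but the result shapes differ.
W = rng.standard_normal((768, 768))
out_vec = embedding @ W                # shape (768,)
out_batch = batch @ W                  # shape (1, 768)

# Dropping the batch axis again recovers the 1D result.
same = np.allclose(out_batch.squeeze(axis=0), out_vec)
print(same)
```

Printing `.shape` on both operands is usually the fastest way to locate this kind of mismatch.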

NumPy Topic Map

How to Read the Diagram:

Three nodes show the progression of shapes: 1D vector, 2D matrix, 3D tensor.
The edges between nodes mark the addition of a dimension.

Step-by-Step Walkthrough:

1D, shape (4,) = a list of numbers labeled by position.
Add one axis → 2D, shape (2, 3) = a table.
Add one more axis → 3D, shape (32, 28, 28) = a batch of images.

Everyday Analogy: bookshelves in a library. 1D = one row of books, 2D = one rack, 3D = one cabinet holding many racks.

Static Mermaid diagram as a fallback:

flowchart TD
    A["Python List<br/>slow & memory-hungry"] --> B["NumPy Array<br/>ndarray"]
    B --> C["Properties<br/>shape, dtype, ndim"]
    B --> D["Indexing & Slicing<br/>1D, 2D, boolean"]
    B --> E["Operations<br/>elementwise, matmul"]
    E --> F["Broadcasting<br/>shapes aligned automatically"]
    E --> G["Vectorization<br/>no Python loop"]
    F --> H["ML applications<br/>embeddings, weights, gradients"]
    G --> H

Part 1: Why NumPy?

Python List vs NumPy Array

Analogy: a Python loop is a bicycle carrying passengers one at a time. NumPy vectorization is a bus that carries 1000 passengers at once. The destination is the same, but the travel time is very different.

How to Read the Diagram:

Left column = the Python pipeline: numbers are processed one at a time.
Right column = the NumPy pipeline: the array is processed all at once.
The "60x faster" edge = a summary of a typical benchmark.

Step-by-Step Walkthrough:

Python list with a loop: each element goes through one by one, paying interpreter overhead every time.
NumPy array: the per-element work runs in compiled C code (often using SIMD), with no Python-level loop.
Use vectorized operators (arr * 2, arr ** 2) to get that speed.

Everyday Analogy: queuing at a single checkout vs a supermarket with many checkouts running in parallel. The result is the same, but the total time is much shorter in parallel.

Static Mermaid diagram as a fallback:

flowchart LR
    subgraph Python["Python List + Loop"]
        P1["x1"] --> P2["process"]
        P2 --> P3["x2"]
        P3 --> P4["process"]
        P4 --> P5["...one at a time"]
    end
    subgraph NumPy["NumPy Vectorized"]
        N1["x1, x2, x3, ..., xN"] --> N2["process<br/>ALL at once"]
    end

# Python list
lst = [1, 2, 3, 4, 5]
result = [x * 2 for x in lst]    # explicit loop

# NumPy
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
result = arr * 2                  # vectorized, no Python loop
# Output: array([ 2,  4,  6,  8, 10])

Speed Comparison

import time
import numpy as np

# Python list
lst = list(range(10_000_000))

start = time.perf_counter()
result = [x ** 2 for x in lst]
print(f"Python: {time.perf_counter() - start:.2f}s")    # about 3 s (machine-dependent)

# NumPy
arr = np.arange(10_000_000, dtype=np.int64)   # int64 so the squares do not overflow
start = time.perf_counter()
result = arr ** 2
print(f"NumPy: {time.perf_counter() - start:.2f}s")     # about 0.05 s (machine-dependent)

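NumPy is roughly 60x faster in this benchmark (the exact factor depends on hardware, Python version, and NumPy version) because:

Implemented in compiled C/Fortran
Contiguous memory layout
SIMD instructions
No per-element Python interpreter overhead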
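A single `time` measurement can be noisy. As a sketch of a slightly more robust comparison, the block below uses the standard-library `timeit` module and keeps the best of several runs. The array size and repeat counts are arbitrary choices, and the speedup you see will differ from machine to machine.

```python
import timeit
import numpy as np

n = 100_000
lst = list(range(n))
arr = np.arange(n, dtype=np.int64)   # explicit int64 so the squares cannot overflow

def square_list():
    return [x ** 2 for x in lst]

def square_array():
    return arr ** 2

# Run each version several times and keep the best time to reduce noise.
t_list = min(timeit.repeat(square_list, number=10, repeat=3))
t_arr = min(timeit.repeat(square_array, number=10, repeat=3))
print(f"list: {t_list:.4f}s  numpy: {t_arr:.4f}s  speedup: {t_list / t_arr:.1f}x")

# Both approaches compute exactly the same values.
same_result = square_list() == square_array().tolist()
print(same_result)
```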

Visualizing Array Shapes (1D, 2D, 3D)

flowchart TB
    subgraph D1["1D: vector"]
        V["[a, b, c, d]<br/>shape: (4,)"]
    end
    subgraph D2["2D: matrix / table"]
        M["[[a, b, c],<br/>&nbsp;[d, e, f]]<br/>shape: (2, 3)"]
    end
    subgraph D3["3D: tensor / batch of images"]
        T["shape: (batch, height, width)<br/>e.g. (32, 28, 28)"]
    end
    D1 --> D2 --> D3

Rule for reading a shape: from the outside in. Shape (2, 3, 4) means there are 2 blocks, each block has 3 rows, and each row has 4 columns. In ML, the first dimension is usually the batch size.
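The outside-in reading of a shape can be checked directly by indexing one level at a time. A small sketch using `np.arange(24)` as filler data:

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)   # 2 blocks, 3 rows each, 4 columns each

print(x.shape)        # (2, 3, 4)
print(x[0].shape)     # (3, 4): the first block
print(x[0, 1].shape)  # (4,): row 1 of the first block
print(x[0, 1, 2])     # 6: block 0, row 1, column 2
```

Each index you supply removes the outermost remaining dimension, which is why the shape gets shorter from the left.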

Part 2: Creating Arrays

import numpy as np

# From a list
np.array([1, 2, 3])
np.array([[1, 2], [3, 4]])    # 2D

# Special
np.zeros(5)                    # [0. 0. 0. 0. 0.]
np.zeros((2, 3))              # 2x3 zeros
np.ones((3, 3))
np.eye(3)                     # identity matrix
np.full((2, 2), 7)            # filled with 7

# Ranges
np.arange(10)                  # [0, 1, ..., 9]
np.arange(0, 1, 0.1)          # [0, 0.1, 0.2, ..., 0.9]
np.linspace(0, 1, 5)          # [0, 0.25, 0.5, 0.75, 1.0]

# Random
np.random.seed(42)
np.random.rand(3)              # uniform [0, 1)
np.random.randn(3)             # standard normal
np.random.randint(0, 10, size=5)  # 5 ints 0-9
np.random.choice([1, 2, 3], size=10, replace=True)

Part 3: Properties

arr = np.array([[1, 2, 3], [4, 5, 6]])

arr.shape       # (2, 3): size along each dimension
arr.ndim        # 2: number of dimensions
arr.size        # 6: total number of elements
arr.dtype       # dtype('int64') on most platforms (the default integer is platform-dependent)

# Convert dtype
arr.astype(np.float32)
arr.astype('int32')
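The dtype also determines how much memory an array uses, and `astype` returns a new array instead of changing the original. A short sketch, with the dtype fixed explicitly so the numbers are the same on every platform:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)
as_f32 = arr.astype(np.float32)      # new array; arr itself is unchanged

print(arr.dtype, arr.itemsize, arr.nbytes)           # int64 8 48
print(as_f32.dtype, as_f32.itemsize, as_f32.nbytes)  # float32 4 24
print(arr.dtype)                                     # int64
```

Halving the item size halves the memory, which is one reason ML code often stores weights as float32 instead of float64.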

Part 4: Indexing & Slicing

1D

arr = np.arange(10)
arr[0]          # 0
arr[-1]         # 9
arr[2:5]        # [2, 3, 4]
arr[::2]        # every 2nd: [0, 2, 4, 6, 8]
arr[::-1]       # reverse

2D

M = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# [row, col]
M[0, 0]         # 1
M[1, 2]         # 6
M[0]            # row 0: [1, 2, 3]
M[:, 0]         # col 0: [1, 4, 7]
M[0:2, 1:3]     # submatrix:
                # [[2, 3],
                #  [5, 6]]
M[1, :]         # row 1
M[:, 1]         # col 1

Fancy Indexing

arr = np.array([10, 20, 30, 40, 50])

# Index with a list
arr[[0, 2, 4]]      # [10, 30, 50]

# Boolean mask (VERY IMPORTANT)
mask = arr > 25
arr[mask]            # [30, 40, 50]
arr[arr > 25]        # shortcut

# Multi condition
arr[(arr > 15) & (arr < 45)]    # [20, 30, 40]

Use & and |, NOT and and or, for arrays.
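The parentheses around each comparison are not optional, because `&` binds more tightly than `>` and `<`. The sketch below shows the correct form, what goes wrong without parentheses, and how `~` negates a mask.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Correct: each comparison is parenthesised before combining.
ok = arr[(arr > 15) & (arr < 45)]
print(ok)        # [20 30 40]

# Without parentheses, & is evaluated first: arr > (15 & arr) < 45.
# That chained comparison needs a single True/False, so NumPy raises.
try:
    arr[arr > 15 & arr < 45]
    raised = False
except ValueError:
    raised = True
print(raised)    # True

# ~ negates a mask; | would combine two masks with "or".
outside = arr[~((arr > 15) & (arr < 45))]
print(outside)   # [10 50]
```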

Part 5: Arithmetic Operations

Element-wise

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

a + b           # [5, 7, 9]
a - b           # [-3, -3, -3]
a * b           # [4, 10, 18]   element-wise!
a / b           # [0.25, 0.4, 0.5]
a ** 2          # [1, 4, 9]
np.sqrt(a)      # [1, 1.41, 1.73]

With a Scalar

a + 10          # [11, 12, 13]
a * 2           # [2, 4, 6]

Matrix Multiplication

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

A @ B           # matrix multiplication
A.dot(B)        # equivalent
A * B           # ELEMENT-WISE (Hadamard), not the same!
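Matrix multiplication also has a shape rule that element-wise multiplication does not: the inner dimensions must match, and the result takes the outer ones. A minimal sketch with hypothetical names `X` (samples by features) and `W` (features by outputs):

```python
import numpy as np

X = np.ones((4, 3))      # 4 samples, 3 features
W = np.ones((3, 2))      # maps 3 features to 2 outputs

Y = X @ W                # inner dimensions match (3 and 3)
print(Y.shape)           # (4, 2)

# Swapping the operands breaks the rule: (3, 2) @ (4, 3) has inner dims 2 and 4.
try:
    W @ X
    failed = False
except ValueError:
    failed = True
print(failed)            # True
```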

Part 6: Universal Functions (ufuncs)

Functions that apply element-wise:

arr = np.array([1, 2, 3, 4])

np.sin(arr)
np.exp(arr)
np.log(arr)
np.log2(arr)
np.abs(arr)
np.maximum(arr, 2)    # element-wise max
np.minimum(arr, 3)

Reduction

arr = np.array([1, 2, 3, 4, 5])

arr.sum()              # 15
arr.mean()             # 3.0
arr.std()              # 1.41
arr.min(), arr.max()
arr.argmin(), arr.argmax()    # index of the min/max

# 2D
M = np.array([[1, 2, 3], [4, 5, 6]])
M.sum()                 # 21 (everything)
M.sum(axis=0)           # [5, 7, 9] (sum per column)
M.sum(axis=1)           # [6, 15]   (sum per row)

Axis rule: axis=0 collapses the rows (one result per column), axis=1 collapses the columns (one result per row).
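A reduction normally drops the axis it collapses. When the result has to be combined with the original array again, `keepdims=True` keeps that axis with size 1. A small sketch that turns each row into shares of its own row total:

```python
import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6]])

row_sum = M.sum(axis=1)                      # shape (2,)
row_sum_kept = M.sum(axis=1, keepdims=True)  # shape (2, 1)
print(row_sum.shape, row_sum_kept.shape)

# The (2, 1) result lines up with the rows of M, so each row is divided by its own sum.
row_share = M / row_sum_kept
print(row_share.sum(axis=1))                 # each row now sums to 1 (up to rounding)
```

Dividing by `row_sum` directly would fail here, because shapes (2, 3) and (2,) are not compatible.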

Part 7: Reshape & Transpose

arr = np.arange(12)
arr.reshape(3, 4)      # 3 rows, 4 columns
arr.reshape(4, 3)
arr.reshape(-1, 3)     # -1 = inferred automatically, here (4, 3)

M = np.arange(12).reshape(3, 4)
M.T                    # transpose
M.flatten()            # 1D (always a copy)
M.ravel()              # 1D (a view when possible, otherwise a copy)

# Squeeze (drop dimensions of size 1)
arr3d = np.array([[[1, 2, 3]]])    # shape (1, 1, 3)
arr3d.squeeze()                      # shape (3,)

# Add axis
arr = np.array([1, 2, 3])
arr[:, None]           # shape (3, 1)
arr[None, :]           # shape (1, 3)
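Whether `flatten` and `ravel` share memory with the original can be checked with `np.shares_memory`. A brief sketch:

```python
import numpy as np

M = np.arange(12).reshape(3, 4)

flat_copy = M.flatten()   # always a copy
flat_view = M.ravel()     # a view here, because M is contiguous in memory

print(np.shares_memory(M, flat_copy))    # False
print(np.shares_memory(M, flat_view))    # True

# Raveling the transpose cannot be done without rearranging data, so it copies.
print(np.shares_memory(M, M.T.ravel()))  # False
```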

Part 8: Broadcasting

Broadcasting is how NumPy performs operations between arrays of different shapes.

Analogy: broadcasting automatically "stretches" the smaller array so it fits the larger one, without actually copying its data. Imagine you have 1 stamp and 1 sheet with 100 rows, and you want the stamp to appear on every row. Broadcasting "pretends" there are 100 stamps, without you photocopying the stamp 100 times.

Visualizing the Broadcasting Rules

How to Read the Diagram:

The two nodes on the left = shapes A and B that will be combined.
The middle node = the compare-from-the-right rule, followed by the decision "equal, or one of them = 1?".
The two nodes on the right = success (broadcast OK) or failure (ValueError).

Step-by-Step Walkthrough:

Align the shapes from the right, innermost dimension first.
For each dimension, check: identical, or one of them == 1? If yes, continue.
If any dimension pair does not match and neither is 1 → ValueError.

Everyday Analogy: stamping a sheet of paper that has many rows. A small stamp can be reused across every row as long as its size fits.

Static Mermaid diagram as a fallback:

flowchart TD
    A["Shape A: (2, 3)"] --> C["Compare<br/>from the RIGHT"]
    B["Shape B: (3,)"] --> C
    C --> D{"Each dim:<br/>equal, or one of them = 1?"}
    D -->|"YES"| E["Broadcast OK<br/>(2, 3)"]
    D -->|"NO"| F["ValueError:<br/>operands could not be broadcast together"]

Rules

Compare the shapes from the right
Dimensions match if they are equal, or one of them = 1
The dimension of size 1 is broadcast (virtually repeated); a missing leading dimension is treated as 1

Example

M = np.array([[1, 2, 3], [4, 5, 6]])    # (2, 3)
v = np.array([10, 20, 30])               # (3,)

M + v
# v is broadcast to [[10,20,30], [10,20,30]]
# Result: [[11, 22, 33], [14, 25, 36]]

# Common pattern: subtract column mean
data = np.random.randn(100, 5)
col_mean = data.mean(axis=0)            # (5,)
centered = data - col_mean              # broadcasting

# 2D + column vector
A = np.array([[1, 2], [3, 4]])      # (2, 2)
v = np.array([[10], [20]])          # (2, 1)

A + v
# v is broadcast to [[10, 10], [20, 20]]
# Result: [[11, 12], [23, 24]]

Quick Table: Broadcasting Compatible?

Shape A	Shape B	Result	Notes
(3,)	(3,)	(3,)	identical, OK
(2, 3)	(3,)	(2, 3)	B is stretched across each row
(2, 3)	(2, 1)	(2, 3)	B is stretched across each column
(2, 3)	(1, 3)	(2, 3)	B is stretched across each row
(2, 3)	(2,)	ERROR	last dims are 3 vs 2, no match
(4, 1, 3)	(5, 3)	(4, 5, 3)	multi-dimensional broadcast
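The rows of the table can be reproduced with a few lines of plain Python. The helper `broadcast_shape` below is a hypothetical name written for this illustration, not a NumPy function; it is cross-checked against `np.broadcast_shapes`, which assumes NumPy 1.20 or newer.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError."""
    result = []
    # Walk both shapes from the right; a missing leading dimension counts as 1.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        a = shape_a[-i] if i <= len(shape_a) else 1
        b = shape_b[-i] if i <= len(shape_b) else 1
        if a == b or a == 1 or b == 1:
            result.append(b if a == 1 else a)
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(reversed(result))

print(broadcast_shape((2, 3), (3,)))       # (2, 3)
print(broadcast_shape((4, 1, 3), (5, 3)))  # (4, 5, 3)

# Cross-check one row against NumPy itself.
assert broadcast_shape((2, 3), (2, 1)) == np.broadcast_shapes((2, 3), (2, 1))

# The incompatible row from the table raises.
try:
    broadcast_shape((2, 3), (2,))
    incompatible = False
except ValueError:
    incompatible = True
print(incompatible)                        # True
```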

Full Example (Runnable)

import numpy as np

# Standardize each feature (column): a VERY common pattern in ML
np.random.seed(0)
X = np.random.randn(5, 3) * 10 + 50    # 5 samples, 3 features
print("Before:")
print(X.mean(axis=0))    # mean per column

mean = X.mean(axis=0)    # shape (3,)
std = X.std(axis=0)      # shape (3,)
X_std = (X - mean) / std # broadcasting: (5,3) - (3,) → OK

print("After:")
print(X_std.mean(axis=0).round(4))   # ≈ [0., 0., 0.]
print(X_std.std(axis=0).round(4))    # ≈ [1., 1., 1.]

Part 9: Stack & Split

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Concat
np.concatenate([a, b])         # [1, 2, 3, 4, 5, 6]
np.concatenate([a, b], axis=0) # same result for 1D; for 2D inputs axis=0 stacks vertically

# Stack
np.vstack([a, b])              # [[1,2,3], [4,5,6]]
np.hstack([a, b])              # [1, 2, 3, 4, 5, 6]
np.stack([a, b], axis=0)       # 2D, shape (2, 3)
np.stack([a, b], axis=1)       # 2D, shape (3, 2)

# Split
np.split(np.arange(10), 2)     # 2 equal chunks
np.split(np.arange(10), [3, 7]) # split at indices 3 and 7 → 3 chunks

Part 10: Linear Algebra

A = np.array([[1, 2], [3, 4]])

# Determinant
np.linalg.det(A)

# Inverse
np.linalg.inv(A)

# Eigenvalue & eigenvector
eigenvalues, eigenvectors = np.linalg.eig(A)

# Norm
np.linalg.norm([3, 4])         # 5.0

# Solve Ax = b
b = np.array([5, 11])
x = np.linalg.solve(A, b)

# SVD (the third return value is V transposed, so A ≈ U @ np.diag(S) @ Vt)
U, S, Vt = np.linalg.svd(A)
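A quick way to build trust in these routines is to plug the results back in. The sketch below checks the solution of Ax = b and rebuilds A from its SVD; note that `np.linalg.svd` returns the transposed right singular vectors, named `Vt` here.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 11.0])

# Solve Ax = b and verify by substitution.
x = np.linalg.solve(A, b)
print(x)                                    # approximately [1, 2]
print(np.allclose(A @ x, b))                # True

# SVD: A equals U @ diag(S) @ Vt up to floating-point error.
U, S, Vt = np.linalg.svd(A)
print(np.allclose(U @ np.diag(S) @ Vt, A))  # True
```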

Part 11: Random

np.random.seed(42)             # reproducibility

# Sampling
np.random.rand(3)              # uniform [0, 1)
np.random.uniform(low=-1, high=1, size=5)
np.random.randn(3, 4)          # standard normal
np.random.normal(loc=10, scale=2, size=100)

np.random.randint(0, 100, size=10)
np.random.choice([1, 2, 3], size=5, p=[0.5, 0.3, 0.2])

# Shuffle
arr = np.arange(10)
np.random.shuffle(arr)         # in-place
np.random.permutation(arr)     # return new
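The functions above use NumPy's global random state. Current NumPy documentation recommends the `Generator` API for new code, created with `np.random.default_rng`. The sketch below shows rough equivalents of the calls above; the mapping is offered as a guide, and the generated numbers differ from those of the legacy functions even with the same seed.

```python
import numpy as np

rng = np.random.default_rng(42)     # independent generator with its own seed

u = rng.random(3)                   # uniform [0, 1)
z = rng.standard_normal((3, 4))     # standard normal
k = rng.integers(0, 100, size=10)   # ints in [0, 100)
c = rng.choice([1, 2, 3], size=5, p=[0.5, 0.3, 0.2])
perm = rng.permutation(10)          # shuffled copy of 0..9

# Re-creating the generator with the same seed reproduces the same draws.
rng2 = np.random.default_rng(42)
print(np.array_equal(u, rng2.random(3)))   # True
```

Because each generator carries its own state, seeding one part of a program does not silently affect another.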

Part 12: File I/O

arr = np.arange(10)
arr1, arr2 = np.zeros(3), np.ones(3)   # example arrays for savez

# Save
np.save("data.npy", arr)              # binary
np.savetxt("data.csv", arr, delimiter=",")
np.savez("data.npz", a=arr1, b=arr2)  # multiple arrays

# Load
arr = np.load("data.npy")
arr = np.loadtxt("data.csv", delimiter=",")   # text round-trip returns float64
data = np.load("data.npz")
arr1, arr2 = data["a"], data["b"]
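For a version that can be run without leaving files behind, here is a self-contained round trip that writes into a temporary directory. The file names and sample arrays are arbitrary.

```python
import os
import tempfile
import numpy as np

arr1 = np.arange(6).reshape(2, 3)
arr2 = np.linspace(0.0, 1.0, 5)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "data.npy")
    npz_path = os.path.join(tmp, "data.npz")
    csv_path = os.path.join(tmp, "data.csv")

    np.save(npy_path, arr1)
    np.savez(npz_path, a=arr1, b=arr2)
    np.savetxt(csv_path, arr1, delimiter=",")

    loaded_npy = np.load(npy_path)
    with np.load(npz_path) as data:       # the archive is closed on exit
        loaded_a, loaded_b = data["a"], data["b"]
    loaded_csv = np.loadtxt(csv_path, delimiter=",")

print(np.array_equal(loaded_npy, arr1))   # True
print(loaded_npy.dtype, loaded_csv.dtype) # binary keeps the integer dtype; text comes back as float64
```

The binary formats preserve dtype and shape exactly, while the text format is readable by other tools but loses the dtype.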

Part 13: Performance Tips

Vectorize, Don't Loop

arr = np.arange(1000)

# BAD
result = np.zeros(1000)
for i in range(1000):
    result[i] = arr[i] ** 2

# GOOD
result = arr ** 2

Use Built-in Aggregation

# BAD
total = 0
for x in arr:
    total += x

# GOOD
total = arr.sum()

View vs Copy

arr = np.arange(10)
view = arr[2:5]           # view (not copy!)
view[0] = 99              # also modifies arr!
print(arr)                # [0, 1, 99, 3, ..., 9]

# Force copy
copy = arr[2:5].copy()
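Not every indexing operation returns a view. Basic slices do, while fancy indexing and boolean masks return copies. A short check using `np.shares_memory`:

```python
import numpy as np

arr = np.arange(10)

sliced = arr[2:5]          # basic slice -> view
picked = arr[[2, 3, 4]]    # fancy indexing -> copy
masked = arr[arr > 6]      # boolean mask -> copy

print(np.shares_memory(arr, sliced))  # True
print(np.shares_memory(arr, picked))  # False
print(np.shares_memory(arr, masked))  # False

picked[0] = 99             # does not touch arr
print(arr[2])              # 2
```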

Avoid Mixed Types

# BAD (everything is coerced to strings, dtype '<U32'; math no longer works)
arr = np.array([1, "two", 3.0])

# GOOD
arr = np.array([1, 2, 3], dtype=np.int32)

Common Mistakes / FAQ

1. Using `and`/`or` on Arrays → ValueError

arr = np.array([1, 2, 3])
# arr > 1 and arr < 3   # ValueError!
arr[(arr > 1) & (arr < 3)]   # correct: use &

Python's and/or operators expect a single boolean, not an array. Use &, |, ~ and wrap each comparison in parentheses.

2. Forgetting the Difference Between `*` and `@`

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
A * B    # element-wise (Hadamard): [[5,12],[21,32]]
A @ B    # matrix multiplication:   [[19,22],[43,50]]

In ML, you almost always need @ for the forward pass / matmul.

3. View vs Copy: Modifying "By Accident"

arr = np.arange(10)
sub = arr[2:5]    # this is a VIEW, not a copy
sub[0] = 999
print(arr)        # arr changes too!

Use .copy() when you need an independent array.

4. Confused by `-1` in Reshape

arr = np.arange(12)
arr.reshape(3, -1)   # inferred as (3, 4)
arr.reshape(-1, 6)   # inferred as (2, 6)

-1 means "NumPy, please work out this dimension for me". Only one -1 is allowed.

5. Axis Confusion

Operation	Result
`M.sum()`	scalar (everything)
`M.sum(axis=0)`	"collapse rows", one result per column
`M.sum(axis=1)`	"collapse cols", one result per row

Mnemonic: axis=k means dimension k disappears from the output.
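The mnemonic is easiest to see on a 3D array, where each choice of axis removes a different entry from the shape. A small sketch:

```python
import numpy as np

T = np.ones((2, 3, 4))

print(T.sum(axis=0).shape)       # (3, 4): dimension 0 is gone
print(T.sum(axis=1).shape)       # (2, 4): dimension 1 is gone
print(T.sum(axis=2).shape)       # (2, 3): dimension 2 is gone
print(T.sum(axis=(0, 2)).shape)  # (3,): a tuple removes several dimensions at once
```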

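6. `np.array([1, "two", 3.0])` Becomes Slow

NumPy does not keep these values as numbers: it converts every element to a string (dtype <U32), and if you force dtype=object it stores generic Python objects instead. Either way, arithmetic stops working or loses NumPy's speed advantage. Make sure the dtype is homogeneous and numeric.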
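Inspecting the dtype shows what NumPy actually did with the mixed input. In the sketch below, a list mixing numbers and a string becomes a Unicode string array, and `object` dtype appears only when requested explicitly.

```python
import numpy as np

mixed = np.array([1, "two", 3.0])
print(mixed.dtype.kind)        # U: every element became a Unicode string
print(mixed)                   # ['1' 'two' '3.0']

forced = np.array([1, "two", 3.0], dtype=object)
print(forced.dtype)            # object: generic Python objects, slow to process

clean = np.array([1, 2, 3], dtype=np.float64)
print((clean * 2).tolist())    # [2.0, 4.0, 6.0]
```

Checking `arr.dtype` right after loading or building data is a cheap way to catch this before it turns into a slow or failing computation.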

7. Random Without a Seed → Not Reproducible

np.random.seed(42)   # always set a seed before an experiment

Check Your Understanding

Can you create arrays from lists, ranges, and random generators?
Do you know shape, ndim, and dtype?
Can you index a 2D array with [row, col]?
Do you know how boolean masks work?
Do you know the difference between * (element-wise) and @ (matmul)?
Do you understand the broadcasting rules?
Can you reshape and transpose?

Challenge 4.1

Challenge 1 — Basic Operations

import numpy as np

# 1. Create an array of 1-100
# 2. Keep only the even numbers
# 3. Compute sum, mean, std
# 4. Reshape the original 1-100 array to 10x10
# 5. Transpose
# 6. Flatten back to 1D

Challenge 2 — Random Data Manipulation

Generate a 50×5 random normal matrix (seed 42)
Compute the mean per column
Compute the std per column
Standardize (subtract the mean, divide by the std)
Verify: mean ≈ 0, std ≈ 1 after standardization

Challenge 3 — Boolean Mask

np.random.seed(0)
data = np.random.randint(0, 100, size=20)

Filter values > 50
Filter the primes (check manually or with something fancier)
Count how many values are > mean
Replace values < 30 with 0

Challenge 4 — Matrix Operations

Create 2 random 3×3 matrices
Compute A @ B
Compute A * B (element-wise)
Compute det(A) and inv(A)
Verify: A @ inv(A) ≈ identity

Challenge 5 — Broadcasting Practice

Create a random 5×3 matrix
Subtract the row mean from each row
Subtract the column mean from each column
Normalize: each column ends up with mean=0, std=1

Challenge 6 — Image as Array

# Image grayscale = 2D matrix
img = np.random.randint(0, 256, size=(100, 100), dtype=np.uint8)

Plot it with plt.imshow(img, cmap="gray")
Apply a blur: convolve with a 3×3 kernel filled with 1/9 (manually)
Apply edge detection (Sobel); look up the kernel
Rotate by 90 degrees: use np.rot90

Challenge 7 — Speed Test

Compare a loop with the vectorized version:

import time
import numpy as np

# Sum of squares
n = 1_000_000

# Pure Python
start = time.time()
total = 0
for i in range(n):
    total += i ** 2
print(f"Python: {time.time() - start:.4f}s")

# NumPy
arr = np.arange(n)
start = time.time()
total = (arr ** 2).sum()
print(f"NumPy:  {time.time() - start:.4f}s")

Compute the speedup.

Challenge 8 — Mini Linear Regression

Implement the closed-form (normal equation) solution for linear regression:

w = (X.T @ X)⁻¹ @ X.T @ y

import numpy as np
np.random.seed(42)
X = np.random.randn(100, 2)
y = X @ np.array([2, -1]) + np.random.randn(100) * 0.1

# Add bias column
X_b = np.hstack([np.ones((100, 1)), X])

# Solve
w = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
print(w)    # [bias, w1, w2] — should be ~[0, 2, -1]

Next: 02-pandas-basics.md
