10 — Standard Library Tour
Estimated time: 4 hours
Goal: Survey the Python standard library modules you will use most often in AI/data science. No need to memorize them; just know they exist and when to use them.
Why Does This Material Matter?
Python is called "batteries included" because its standard library is very rich. Many tasks that need an external dependency in other languages are already covered in Python: parsing and manipulating dates, generating random numbers, regex, hashing, compression, threading. Knowing what the stdlib already offers means you avoid installing unnecessary packages, which keeps projects lighter and more portable.
In an AI context, most of your preprocessing and utility scripts will lean on the stdlib: pathlib for files, datetime for experiment timestamps, random for reproducible seeds, collections.Counter for word frequencies, argparse for training-script CLIs. The more fluent you are with the stdlib, the less you reinvent the wheel.
The big analogy: the stdlib is the toolbox that comes with a new house. Before calling a handyman (a third-party library), check the toolbox first; a lot is already there and good enough.
Concept Map
flowchart TD
A[📚 Standard Library] --> B[📁 File System]
A --> C[⏰ Time]
A --> D[🎲 Random]
A --> E[📊 Collections]
A --> F[🔁 Iteration]
A --> G[⚙️ Functional]
A --> H[🌐 Network]
A --> I[🧪 Testing]
B --> B1[os / pathlib]
C --> C1[datetime / time]
D --> D1[random]
E --> E1[Counter / defaultdict / deque]
F --> F1[itertools]
G --> G1[functools]
H --> H1[urllib / requests]
I --> I1[unittest / pytest]
Part 1: os & pathlib
Already covered in file 07. A quick cheatsheet:
import os
from pathlib import Path
# Environment
os.environ.get("API_KEY", "default")
# Path operations (use pathlib)
p = Path.home() / "Documents" / "file.txt"
p.exists(), p.is_file(), p.is_dir()
p.suffix, p.stem, p.name, p.parent
p.read_text(encoding="utf-8")
p.write_text("content", encoding="utf-8")
# Listing
list(Path(".").iterdir())
list(Path(".").glob("*.csv"))
list(Path(".").rglob("*.py")) # recursive (rglob already implies "**/")
# Folder
Path("output").mkdir(exist_ok=True)
Path("a/b/c").mkdir(parents=True, exist_ok=True)
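The difference between glob and rglob is easiest to see on a small directory tree. The sketch below builds one in a temporary folder (the file and folder names are made up for illustration) and lists Python files both ways:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: one file at the top, one nested two levels down
    (root / "nested" / "deep").mkdir(parents=True)
    (root / "top.py").write_text("print('top')\n", encoding="utf-8")
    (root / "nested" / "deep" / "inner.py").write_text("print('inner')\n", encoding="utf-8")

    # glob only looks at the given level; rglob walks all subfolders
    top_level = sorted(p.name for p in root.glob("*.py"))
    recursive = sorted(p.name for p in root.rglob("*.py"))

print(top_level)  # ['top.py']
print(recursive)  # ['inner.py', 'top.py']
```

Using a temporary directory keeps the example from touching real files; in a script you would point root at your data folder instead.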
Part 2: datetime
from datetime import datetime, date, time, timedelta
# Now
now = datetime.now()
today = date.today()
# Specific
dt = datetime(2026, 5, 13, 14, 30)
# Format
dt.strftime("%Y-%m-%d") # "2026-05-13"
dt.strftime("%d/%m/%Y %H:%M") # "13/05/2026 14:30"
dt.isoformat() # "2026-05-13T14:30:00"
# Parse
datetime.strptime("2026-05-13", "%Y-%m-%d")
datetime.fromisoformat("2026-05-13T14:30:00")
# Arithmetic
besok = now + timedelta(days=1)
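tiga_puluh_hari_lalu = now - timedelta(days=30) # an identifier cannot start with a digit
seminggu = timedelta(weeks=1)
# Time difference
selisih = besok - now
print(selisih.days, selisih.seconds) # .seconds is only the leftover part within the last day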
print(selisih.total_seconds())
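A common trap with timedelta is reading .seconds as the full duration. A small sketch with two fixed, made-up datetimes shows how .days, .seconds, and .total_seconds() relate:

```python
from datetime import datetime

start = datetime(2026, 5, 13, 14, 30)
end = datetime(2026, 5, 15, 16, 0)
delta = end - start  # 2 days, 1 hour, 30 minutes

print(delta.days)             # 2
print(delta.seconds)          # 5400 (only the 1h30m remainder)
print(delta.total_seconds())  # 178200.0 (the whole duration)
```

When you need the full elapsed time, total_seconds() is usually what you want.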
Important Format Codes
| Code | Meaning | Example |
|---|---|---|
| %Y | 4-digit year | 2026 |
| %m | Month 01-12 | 05 |
| %d | Day 01-31 | 13 |
| %H | Hour 00-23 | 14 |
| %M | Minute 00-59 | 30 |
| %S | Second 00-59 | 45 |
| %a | Abbreviated weekday | Wed |
| %A | Full weekday | Wednesday |
| %b | Abbreviated month | May |
| %B | Full month | May |
Part 3: random
import random
# Set the seed (essential in ML for reproducibility)
random.seed(42)
# Float
random.random() # 0.0 <= x < 1.0
random.uniform(1, 10) # float 1-10
# Integer
random.randint(1, 100) # 1-100 inclusive
random.randrange(0, 100, 5) # 0, 5, 10, ..., 95
# Choose
random.choice(["a", "b", "c"])
random.choices(["a", "b", "c"], k=10) # with replacement
random.choices(["a", "b"], weights=[1, 9], k=10) # weighted
random.sample(range(100), k=5) # without replacement
# Shuffle
data = [1, 2, 3, 4, 5]
random.shuffle(data) # in-place
ML reproducibility: every ML experiment should set a seed. Without one, the random results differ on every run, which makes debugging hard.
random.seed(42)
np.random.seed(42) # NumPy
torch.manual_seed(42) # PyTorch
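To see what seeding buys you, the sketch below draws numbers from independent random.Random instances instead of the global generator. The helper name draw is hypothetical; the point is only that the same seed yields the same sequence:

```python
import random

def draw(seed, n=5):
    """Return n pseudo-random integers from a generator with its own seed."""
    rng = random.Random(seed)  # independent of the global random state
    return [rng.randint(1, 100) for _ in range(n)]

run_a = draw(42)
run_b = draw(42)

print(run_a == run_b)  # True: same seed, same sequence
```

A dedicated random.Random instance is one way to keep a function reproducible even when other code also calls the global random functions.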
Part 4: collections
Diagram: Picking the Right Collection
flowchart TD
Q[What do you need?] --> Q1{Count frequencies?}
Q1 -->|Yes| C[🔢 Counter]
Q1 -->|No| Q2{Auto-create keys?}
Q2 -->|Yes| D[🗂️ defaultdict]
Q2 -->|No| Q3{Append/pop from both ends?}
Q3 -->|Yes| DQ[↔️ deque]
Q3 -->|No| Q4{Record with field names?}
Q4 -->|Yes| NT[🏷️ namedtuple]
Counter
from collections import Counter
text = "apel pisang apel mangga apel pisang jeruk"
words = text.split()
c = Counter(words)
print(c) # Counter({'apel': 3, 'pisang': 2, ...})
print(c.most_common(2)) # [('apel', 3), ('pisang', 2)]
# Operations
c1 = Counter("hello")
c2 = Counter("world")
c1 + c2 # add counts
c1 - c2 # subtract counts (keeps only positive results)
c1 & c2 # intersection (min of each count)
c1 | c2 # union (max of each count)
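The Counter operators are easy to mix up, so here is the same pair of counters with each result printed. This is just a worked instance of the operations listed above:

```python
from collections import Counter

c1 = Counter("hello")  # h:1 e:1 l:2 o:1
c2 = Counter("world")  # w:1 o:1 r:1 l:1 d:1

added = c1 + c2       # counts are summed
subtracted = c1 - c2  # counts are subtracted, zero/negative entries dropped
common = c1 & c2      # minimum of each count
union = c1 | c2       # maximum of each count

print(dict(added))       # {'h': 1, 'e': 1, 'l': 3, 'o': 2, 'w': 1, 'r': 1, 'd': 1}
print(dict(subtracted))  # {'h': 1, 'e': 1, 'l': 1}
print(dict(common))      # {'l': 1, 'o': 1}
print(dict(union))       # {'h': 1, 'e': 1, 'l': 2, 'o': 1, 'w': 1, 'r': 1, 'd': 1}
```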
defaultdict
from collections import defaultdict
# Group by
data = [("A", 1), ("B", 2), ("A", 3), ("B", 4)]
# Without defaultdict (verbose)
groups = {}
for k, v in data:
    if k not in groups:
        groups[k] = []
    groups[k].append(v)
# With defaultdict (clean)
groups = defaultdict(list)
for k, v in data:
    groups[k].append(v)
print(dict(groups)) # {'A': [1, 3], 'B': [2, 4]}
deque (Double-Ended Queue)
from collections import deque
q = deque([1, 2, 3])
q.appendleft(0) # [0, 1, 2, 3]
q.append(4) # [0, 1, 2, 3, 4]
q.popleft() # 0
q.pop() # 4
# Fixed size (sliding window)
last_5 = deque(maxlen=5)
for i in range(10):
    last_5.append(i)
print(list(last_5)) # [5, 6, 7, 8, 9]
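A typical use of a fixed-size deque is a moving average, for example to smooth a loss curve. The sketch below is one simple way to do it; the function name moving_average and the sample values are made up for illustration:

```python
from collections import deque

def moving_average(values, window=3):
    """Average of the last `window` values seen so far, for each position."""
    recent = deque(maxlen=window)  # old items fall off the left automatically
    averages = []
    for v in values:
        recent.append(v)
        averages.append(sum(recent) / len(recent))
    return averages

print(moving_average([10, 20, 30, 40, 50]))
# [10.0, 15.0, 20.0, 30.0, 40.0]
```

The first two results average fewer than three values because the window is not full yet.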
namedtuple
from collections import namedtuple
Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)
print(p.x, p.y) # 3 4
print(p[0], p[1]) # 3 4 (indexing also works)
# For new code on Python 3.7+, a dataclass is often the better choice
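For comparison, here is roughly what the same Point could look like as a dataclass. The default value and the distance_to_origin method are additions for illustration only, to show what a dataclass makes convenient:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen=True makes instances immutable, like a namedtuple
class Point:
    x: float
    y: float = 0.0  # default values are declared inline

    def distance_to_origin(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** 0.5

p = Point(3, 4)
print(p)                       # Point(x=3, y=4)
print(p.distance_to_origin())  # 5.0
```

Unlike a namedtuple, a dataclass instance does not support indexing such as p[0]; that is usually fine when you only access fields by name.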
Part 5: itertools
import itertools
# Permutations (order matters)
list(itertools.permutations([1, 2, 3], 2))
# [(1,2), (1,3), (2,1), (2,3), (3,1), (3,2)]
# Combinations (order does not matter)
list(itertools.combinations([1, 2, 3], 2))
# [(1,2), (1,3), (2,3)]
# Cartesian product
list(itertools.product([1, 2], ["a", "b"]))
# [(1,'a'), (1,'b'), (2,'a'), (2,'b')]
# Chain
list(itertools.chain([1, 2], [3, 4], [5]))
# [1, 2, 3, 4, 5]
# zip_longest (a zip that does not stop at the shortest input)
list(itertools.zip_longest([1, 2, 3], ["a", "b"], fillvalue="X"))
# [(1, 'a'), (2, 'b'), (3, 'X')]
# Group by (consecutive items only!)
data = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]
for k, group in itertools.groupby(data, key=lambda x: x[0]):
    print(k, list(group))
# a [('a', 1), ('a', 2)]
# b [('b', 3)]
# a [('a', 4)] <- groups are consecutive runs, not global
# Repeat / cycle / count
list(itertools.islice(itertools.cycle([1, 2, 3]), 7))
# [1, 2, 3, 1, 2, 3, 1]
list(itertools.islice(itertools.count(10, 2), 5))
# [10, 12, 14, 16, 18]
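Because groupby only merges consecutive items, the usual way to get one group per key is to sort by the same key first. A minimal sketch, reusing the data from the groupby example above:

```python
import itertools
from operator import itemgetter

data = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]
by_key = itemgetter(0)

# Sorting first puts equal keys next to each other, so groupby sees one run per key
grouped = {
    key: [value for _, value in group]
    for key, group in itertools.groupby(sorted(data, key=by_key), key=by_key)
}

print(grouped)  # {'a': [1, 2, 4], 'b': [3]}
```

If the order of the input does not matter and you just want buckets, a defaultdict(list) does the same job without sorting.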
Part 6: functools
from functools import lru_cache, reduce, partial
# lru_cache: memoization
@lru_cache(maxsize=128)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)
# reduce
reduce(lambda x, y: x + y, [1, 2, 3, 4]) # 10
reduce(lambda x, y: x * y, [1, 2, 3, 4]) # 24
# partial: pre-fill arguments
def power(base, exp):
    return base ** exp
square = partial(power, exp=2)
cube = partial(power, exp=3)
print(square(5)) # 25
print(cube(5)) # 125
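To check that lru_cache is actually doing work, you can inspect cache_info() after a call. The sketch below uses an unbounded cache (maxsize=None) so that no entries are evicted:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040

info = fib.cache_info()
# Each n from 0 to 30 is computed once (a miss); every repeated lookup is a hit
print(info.misses, info.hits)  # 31 28
```

Without the cache, the same call would recurse over a million times; with it, each value is computed only once.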
Part 7: subprocess
Run external commands from Python:
import subprocess
# Simple
result = subprocess.run(["ls", "-l"], capture_output=True, text=True)
print(result.stdout)
print(result.returncode)
# Cross-platform: use sys.executable
import sys
subprocess.run([sys.executable, "script.py"])
# Pipe
result = subprocess.run("echo hello | wc -w", shell=True, capture_output=True, text=True)
Caution: avoid shell=True with user input (security risk).
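By default subprocess.run does not raise when the child process fails; you either check returncode yourself or pass check=True. A small sketch, using the current interpreter as the child so that it does not depend on any shell command being installed:

```python
import subprocess
import sys

# Successful child: capture its stdout as text
result = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    capture_output=True,
    text=True,
    check=True,  # raise CalledProcessError on a non-zero exit code
)
print(result.stdout.strip())  # hello from child

# Failing child: check=True turns the exit code into an exception
try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"], check=True)
except subprocess.CalledProcessError as exc:
    failed_code = exc.returncode
    print("child failed with code", failed_code)  # child failed with code 3
```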
Part 8: argparse (CLI Tool)
import argparse
parser = argparse.ArgumentParser(description="Process data")
parser.add_argument("input", help="Input file")
parser.add_argument("--output", "-o", default="output.json", help="Output file")
parser.add_argument("--verbose", "-v", action="store_true", help="Verbose mode")
parser.add_argument("--limit", type=int, default=100)
args = parser.parse_args()
print(args.input, args.output, args.verbose, args.limit)
Run:
python script.py data.txt -o result.json -v --limit 50
Modern alternative: typer (pip install) is more modern and based on type hints. But argparse is in the stdlib, so there is nothing to install.
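parse_args() reads sys.argv by default, but it also accepts an explicit list, which is handy for trying a parser in a notebook or a test. A sketch with the same arguments as above, wrapped in a hypothetical build_parser helper:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Process data")
    parser.add_argument("input", help="Input file")
    parser.add_argument("--output", "-o", default="output.json", help="Output file")
    parser.add_argument("--verbose", "-v", action="store_true", help="Verbose mode")
    parser.add_argument("--limit", type=int, default=100)
    return parser

# Same as: python script.py data.txt -o result.json -v --limit 50
args = build_parser().parse_args(["data.txt", "-o", "result.json", "-v", "--limit", "50"])
print(args.input, args.output, args.verbose, args.limit)
# data.txt result.json True 50

# Defaults apply when the optional flags are omitted
defaults = build_parser().parse_args(["data.txt"])
print(defaults.output, defaults.verbose, defaults.limit)
# output.json False 100
```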
Part 9: time
import time
# Sleep
time.sleep(2) # seconds
# Timestamp
ts = time.time() # unix timestamp (float)
print(ts) # 1747158045.123
# Time profiling
start = time.time()
# ... slow code ...
elapsed = time.time() - start
print(f"{elapsed:.4f}s")
# Or use perf_counter (more accurate for timing)
start = time.perf_counter()
# ...
elapsed = time.perf_counter() - start
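If you time many code sections, the start/stop pattern can be wrapped in a small context manager. This is one possible sketch; the name timer is hypothetical and not something the stdlib provides:

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(label):
    """Print how long the body of the with-block took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.4f}s")

with timer("sum of squares"):
    total = sum(i * i for i in range(100_000))
```

The try/finally ensures the elapsed time is printed even if the timed code raises an exception.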
Part 10: re (Regex)
Already covered in file 05. Recap:
import re
re.search(r"\d+", text) # first match
re.findall(r"\d+", text) # all matches
re.sub(r"\d+", "X", text) # replace
re.split(r"[,;]", text) # split
# Compile (for repeated use)
pattern = re.compile(r"\b\w+@\w+\.\w+\b")
emails = pattern.findall(text)
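The recap above assumes a text variable already exists. Here is a self-contained version on a made-up log line, which also shows named groups as a readable way to pull fields out of a match:

```python
import re

text = "run-12 finished 2026-05-13, run-13 finished 2026-05-14"

# Named groups make the parts of a match accessible by name
date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
first = date_pattern.search(text)
print(first.group("year"), first.group("month"), first.group("day"))  # 2026 05 13

run_ids = re.findall(r"run-(\d+)", text)
print(run_ids)  # ['12', '13']

masked = date_pattern.sub("<date>", text)
print(masked)  # run-12 finished <date>, run-13 finished <date>

parts = re.split(r"[,;]\s*", "a, b;c")
print(parts)  # ['a', 'b', 'c']
```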
Part 11: urllib & requests
urllib (stdlib):
from urllib.request import urlopen
from urllib.parse import urlparse, urlencode
# GET
with urlopen("https://api.github.com") as response:
data = response.read().decode()
# Build URL
params = urlencode({"q": "python", "lang": "en"})
url = f"https://example.com/search?{params}"
requests (third-party, more popular):
pip install requests
import requests
# GET
r = requests.get("https://api.github.com/users/torvalds")
print(r.status_code)
print(r.json())
# POST
r = requests.post(
"https://api.example.com/data",
json={"key": "value"},
headers={"Authorization": "Bearer token"}
)
# Timeout (important!)
r = requests.get(url, timeout=10)
Use requests for HTTP; the stdlib urllib is more cumbersome. For async, use httpx or aiohttp.
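Even when requests handles the HTTP calls, urllib.parse remains useful for building and taking apart URLs. The sketch below runs offline; the example.com URL is only a placeholder:

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Build a query string with proper escaping (spaces become '+')
params = urlencode({"q": "python stdlib", "lang": "en"})
url = f"https://example.com/search?{params}"
print(url)  # https://example.com/search?q=python+stdlib&lang=en

# Split the URL back into its components
parts = urlparse(url)
print(parts.scheme, parts.netloc, parts.path)  # https example.com /search

# parse_qs returns a list per key because keys may repeat
query = parse_qs(parts.query)
print(query)  # {'q': ['python stdlib'], 'lang': ['en']}
```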
Part 12: json & pickle
json (already covered):
import json
data = json.loads('{"a": 1}')
json_str = json.dumps(data, indent=2)
pickle — Python-specific serialization:
import pickle
# Save (almost) any Python object
with open("model.pkl", "wb") as f:
pickle.dump(model, f)
# Load
with open("model.pkl", "rb") as f:
model = pickle.load(f)
Caution: pickle is NOT safe with untrusted sources. Loading can execute arbitrary code. Never unpickle a file from the internet.
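The trade-off between the two formats shows up as soon as a record contains something JSON has no type for. In this sketch (with a made-up record, pickled and unpickled in memory from data we created ourselves), pickle round-trips the object as-is, while JSON needs an explicit conversion first:

```python
import json
import pickle
from datetime import date

record = {"name": "run-1", "tags": ("baseline", "v1"), "day": date(2026, 5, 13)}

# pickle preserves Python types (tuple, date) exactly
restored = pickle.loads(pickle.dumps(record))
print(restored == record)  # True

# json rejects types it does not know, such as date
try:
    json.dumps(record)
except TypeError as exc:
    json_error = str(exc)
    print("json cannot serialize:", json_error)

# Convert to basic types first, then JSON works and stays portable
portable = {**record, "tags": list(record["tags"]), "day": record["day"].isoformat()}
round_trip = json.loads(json.dumps(portable))
print(round_trip["day"])  # 2026-05-13
```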
Part 13: csv
Already covered. Recap:
import csv
# newline="" is what the csv docs recommend when opening CSV files
with open("data.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row["nama"])
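One detail that often surprises people: csv returns every value as a string, so numbers need an explicit conversion. The sketch below reads from an in-memory string instead of a file; the umur column and the sample rows are invented for the example:

```python
import csv
import io

raw = "nama,umur\nBudi,30\nSari,25\n"

# io.StringIO lets DictReader treat a string like an open file
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[0]["nama"])       # Budi
print(type(rows[0]["umur"]))  # <class 'str'>

# Convert explicitly before doing arithmetic
ages = [int(row["umur"]) for row in rows]
print(sum(ages) / len(ages))  # 27.5
```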
Part 14: unittest (Testing)
import unittest
def add(a, b):
    return a + b
class TestAdd(unittest.TestCase):
    def test_positive(self):
        self.assertEqual(add(2, 3), 5)
    def test_negative(self):
        self.assertEqual(add(-1, -1), -2)
    def test_zero(self):
        self.assertEqual(add(0, 0), 0)
if __name__ == "__main__":
    unittest.main()
Run: python test_file.py
Modern alternative: pytest (third-party, simpler):
def test_add_positive():
    assert add(2, 3) == 5
Use pytest in real projects.
Part 15: Must-Have Libraries on the Roadmap
The stdlib is good, but many third-party libraries are essential:
| Library | Purpose | Phase |
|---|---|---|
| numpy | Array & math | Phase 4 |
| pandas | DataFrame | Phase 4 |
| matplotlib | Plotting | Phase 4 |
| seaborn | Statistical viz | Phase 4 |
| scikit-learn | Classical ML | Phase 5 |
| torch | Deep learning | Phase 6 |
| transformers (HF) | LLM | Phase 6 |
| langchain / llamaindex | LLM apps | Phase 7 |
| chromadb | Vector DB | Phase 7 |
| streamlit / gradio | UI demo | Phase 7 |
| requests | HTTP | all |
| pytest | Testing | all |
| python-dotenv | env vars | all |
Common Mistakes & FAQ
❌ Mistake 1: Using random without a seed in ML
# ❌ Experiment results are not reproducible
import random
random.shuffle(data)
# ✅ Set a seed for reproducibility
random.seed(42)
random.shuffle(data)
❌ Mistake 2: datetime.now() without a timezone
# ❌ Naive datetime, ambiguous across timezones
datetime.now()
# ✅ Use an aware datetime
from datetime import datetime, timezone
datetime.now(timezone.utc)
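An aware datetime can be converted between timezones without changing the instant it represents. The sketch below uses a fixed UTC+7 offset labeled "WIB" purely as an example; for real regional timezones with daylight-saving rules, the stdlib zoneinfo module (Python 3.9+) is the usual choice:

```python
from datetime import datetime, timedelta, timezone

utc_time = datetime(2026, 5, 13, 7, 30, tzinfo=timezone.utc)
wib = timezone(timedelta(hours=7), name="WIB")  # fixed offset, assumed for illustration
local_time = utc_time.astimezone(wib)

print(utc_time.isoformat())    # 2026-05-13T07:30:00+00:00
print(local_time.isoformat())  # 2026-05-13T14:30:00+07:00
print(utc_time == local_time)  # True: same instant, different wall clock

# Ordering a naive datetime against an aware one is an error, not a silent bug
try:
    datetime(2026, 5, 13, 7, 30) < utc_time
    mixing_fails = False
except TypeError:
    mixing_fails = True
print(mixing_fails)  # True
```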
❌ Mistake 3: Using os.path when pathlib would do
# ❌ Old style, verbose
import os
file = os.path.join(os.path.dirname(__file__), "data", "x.csv")
# ✅ pathlib
from pathlib import Path
file = Path(__file__).parent / "data" / "x.csv"
❌ Mistake 4: Using subprocess with shell=True on user input
# ❌ Command injection vulnerability!
subprocess.run(f"ls {user_input}", shell=True)
# ✅ Pass a list, no shell
subprocess.run(["ls", user_input])
❌ Mistake 5: Sorting a Counter by hand when a method exists
# ❌ Manual sort
items = sorted(c.items(), key=lambda x: -x[1])[:5]
# ✅ Use most_common
items = c.most_common(5)
❌ Mistake 6: Unpickling data from an untrusted source
# ❌ DANGEROUS: pickle can execute arbitrary code
with open("downloaded.pkl", "rb") as f:
    data = pickle.load(f)
# ✅ Use JSON when you do not need Python-specific objects
FAQ
Q: pathlib vs os.path? A: Use pathlib. It is cleaner, cross-platform, and method-based (chainable). os.path is mainly for legacy code.
Q: time.time() vs time.perf_counter()?
A:
- time.time() = wall clock (Unix timestamp), can be affected by system clock changes
- time.perf_counter() = monotonic, high-resolution clock meant for timing and benchmarks
For profiling, use perf_counter().
Q: argparse vs typer/click? A:
- argparse = stdlib, nothing to install
- typer/click = more modern and cleaner; typer is based on type hints, click on decorators
For personal scripts, argparse is enough. For a serious CLI tool, use typer.
Q: requests vs urllib?
A: requests is much cleaner. urllib is for cases where you must stay stdlib-only. For async, use httpx or aiohttp.
Q: pickle vs json?
A:
- pickle = Python-specific, supports most Python objects, unsafe with untrusted sources
- JSON = portable, language-agnostic, supports only basic types
Q: unittest vs pytest?
A: pytest is more modern, has simpler syntax (plain functions + assert), and a large plugin ecosystem. Use pytest in new projects.
Q: How many stdlib modules do I need to memorize?
A: Few. The ones used most often: pathlib, datetime, json, csv, re, collections, random. For the rest, it is enough to know that "library X exists for task Y" and look up the details when you need them.
Check Your Understanding
- Do you know that pathlib is recommended over os.path?
- Can you format & parse datetimes?
- Do you know when to use Counter vs defaultdict?
- Can you use itertools.combinations and permutations?
- Do you know the difference between lru_cache and partial from functools?
- Can you build a CLI with argparse?
- Do you know requests for HTTP?
Challenge 2.10
Challenge 1: Date Calculator
Write a script that:
- Takes 2 dates as input (format YYYY-MM-DD)
- Outputs the difference in days, weeks, months, years
- Prints the day of the week of the first date (Monday/Tuesday/etc.)
Challenge 2: Random Password Generator
def generate_password(length=12, use_special=True):
    # Mix of uppercase, lowercase, digits, special chars
    pass
generate_password(16) # e.g. "Kj#9pL@2mN!xR4vQ"
Challenge 3: File Organizer
Use pathlib + collections:
- Scan a folder
- Group files by extension (Counter)
- Print a report
- Optional: move files into subfolders by extension
Challenge 4: Anagram Finder
def find_anagrams(words):
    # Group words that are anagrams of each other
    pass
find_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"])
# [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]
Use defaultdict + a sorted key.
Challenge 5: CLI Note App
Use argparse:
python notes.py add "Belajar Python hari ini"
python notes.py list
python notes.py search "python"
python notes.py delete 3
Save to notes.json.
Challenge 6: Simple Web Scraper
Use requests (install it first) + re:
- Fetch the HTML from a URL
- Extract all links
- Save them to a file
import requests, re
r = requests.get("https://example.com", timeout=10)
links = re.findall(r'href="(http[^"]+)"', r.text)
Challenge 7: Cache Decorator
Build a decorator @disk_cache(folder) that caches function results to disk (pickle).
- Filename based on a hash of the args
- Skip the computation when a cache file exists
- Load the result from the file when it exists
Use pickle + hashlib.
Challenge 8: Mini Test Suite
Pick 3 functions from the previous challenges. Write a test file with pytest:
def test_clean_text():
    assert clean_text(" Hello!! ") == "hello"
def test_clean_text_empty():
    assert clean_text("") == ""
Run: pytest test_*.py
Next: 11-mini-projects.md, with 3 mini projects to consolidate all your Python skills.