
Standard Library Tour

4 hours · 13 min read
Objective

Survey the parts of the Python standard library that come up most often in AI and data science work. You do not need to memorize them; knowing that they exist and when to reach for each one is enough.

10 — Standard Library Tour



Why This Material Matters

Python is called "batteries included" because its standard library is so rich. Many tasks that need an external dependency in other languages are already covered in Python: parsing and manipulating dates, generating random numbers, regex, hashing, compression, threading. Knowing what the stdlib already offers means you avoid installing packages you do not need, which keeps projects lighter and more portable.

For AI work, most of your preprocessing and utility scripts will lean on the stdlib: pathlib for files, datetime for experiment timestamps, random for seeding and reproducibility, collections.Counter for word frequencies, argparse for the CLI of a training script. The more fluent you are with the stdlib, the less you reinvent the wheel.

The big analogy: the stdlib is the toolbox that comes with a new house. Before calling a contractor (a third-party library), check the toolbox first; much of what you need is already there and good enough.


Concept Map

flowchart TD
    A[📚 Standard Library] --> B[📁 File System]
    A --> C[⏰ Time]
    A --> D[🎲 Random]
    A --> E[📊 Collections]
    A --> F[🔁 Iteration]
    A --> G[⚙️ Functional]
    A --> H[🌐 Network]
    A --> I[🧪 Testing]

    B --> B1[os / pathlib]
    C --> C1[datetime / time]
    D --> D1[random]
    E --> E1[Counter / defaultdict / deque]
    F --> F1[itertools]
    G --> G1[functools]
    H --> H1[urllib / requests]
    I --> I1[unittest / pytest]

Part 1: os & pathlib

Already covered in file 07. A quick cheatsheet:

import os
from pathlib import Path

# Environment
os.environ.get("API_KEY", "default")

# Path operations (use pathlib)
p = Path.home() / "Documents" / "file.txt"
p.exists(), p.is_file(), p.is_dir()
p.suffix, p.stem, p.name, p.parent
p.read_text(encoding="utf-8")
p.write_text("content", encoding="utf-8")

# Listing
list(Path(".").iterdir())
list(Path(".").glob("*.csv"))
list(Path(".").rglob("*.py"))       # recursive; rglob already searches subfolders

# Folder
Path("output").mkdir(exist_ok=True)
Path("a/b/c").mkdir(parents=True, exist_ok=True)

Part 2: datetime

from datetime import datetime, date, time, timedelta

# Now
now = datetime.now()
today = date.today()

# Specific
dt = datetime(2026, 5, 13, 14, 30)

# Format
dt.strftime("%Y-%m-%d")           # "2026-05-13"
dt.strftime("%d/%m/%Y %H:%M")     # "13/05/2026 14:30"
dt.isoformat()                    # "2026-05-13T14:30:00"

# Parse
datetime.strptime("2026-05-13", "%Y-%m-%d")
datetime.fromisoformat("2026-05-13T14:30:00")

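# Arithmetic
besok = now + timedelta(days=1)                  # tomorrow
tiga_puluh_hari_lalu = now - timedelta(days=30)  # 30 days ago
seminggu = timedelta(weeks=1)                    # one week

# Time difference
selisih = besok - now
print(selisih.days, selisih.seconds)   # whole days, then the leftover seconds within the last day
print(selisih.total_seconds())         # the entire span in seconds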
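The difference example above prints both `.seconds` and `total_seconds()`, which are easy to confuse. A small sketch with made-up timestamps (the names `start`, `end`, and `delta` are illustrative) shows how they differ:

```python
from datetime import datetime

start = datetime(2026, 5, 13, 14, 30)
end = datetime(2026, 5, 15, 16, 0)    # 2 days, 1 hour, 30 minutes later

delta = end - start
print(delta.days)             # 2
print(delta.seconds)          # 5400 (only the 1h30m remainder)
print(delta.total_seconds())  # 178200.0 (2 * 86400 + 5400)
```

When you need "how long did this take", `total_seconds()` is usually the one you want.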

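Key Format Codes

| Code | Meaning | Example |
| --- | --- | --- |
| %Y | 4-digit year | 2026 |
| %m | Month, 01-12 | 05 |
| %d | Day of month, 01-31 | 13 |
| %H | Hour, 00-23 | 14 |
| %M | Minute, 00-59 | 30 |
| %S | Second, 00-59 | 45 |
| %a | Abbreviated weekday name | Wed |
| %A | Full weekday name | Wednesday |
| %b | Abbreviated month name | May |
| %B | Full month name | May |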
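As a rough illustration of how the codes in the table combine, the sketch below formats a timestamp, parses it back, and builds a sortable run name. The `exp_` prefix and the variable names are assumptions for the example, not a convention from this course.

```python
from datetime import datetime

dt = datetime(2026, 5, 13, 14, 30, 45)

# Format, then parse back with the same pattern
stamp = dt.strftime("%Y-%m-%d %H:%M:%S")
print(stamp)                     # 2026-05-13 14:30:45
parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
print(parsed == dt)              # True

# A filename-friendly timestamp for an experiment run
run_name = f"exp_{dt.strftime('%Y%m%d_%H%M%S')}"
print(run_name)                  # exp_20260513_143045
```

Putting the year first keeps the names in chronological order when sorted alphabetically.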

Part 3: random

import random

# Set the seed (essential in ML for reproducibility)
random.seed(42)

# Float
random.random()           # float in [0.0, 1.0)
random.uniform(1, 10)     # float between 1 and 10

# Integer
random.randint(1, 100)    # 1-100 inclusive
random.randrange(0, 100, 5)  # 0, 5, 10, ..., 95

# Choose
random.choice(["a", "b", "c"])
random.choices(["a", "b", "c"], k=10)         # with replacement
random.choices(["a", "b"], weights=[1, 9], k=10)  # weighted

random.sample(range(100), k=5)                # without replacement

# Shuffle
data = [1, 2, 3, 4, 5]
random.shuffle(data)      # in-place

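ML reproducibility: every ML experiment should set a seed. Without one, the random results change on every run, which makes debugging hard.

import random
import numpy as np        # third-party
import torch              # third-party
random.seed(42)
np.random.seed(42)        # NumPy
torch.manual_seed(42)     # PyTorch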
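One way to see what seeding buys you, sketched here with a private `random.Random` instance so the global generator is left alone. The helper name `sample_indices` is hypothetical.

```python
import random

def sample_indices(seed, n=5):
    """Draw n distinct indices from 0..99 using a generator seeded with `seed`."""
    rng = random.Random(seed)      # independent of the global random state
    return rng.sample(range(100), k=n)

first = sample_indices(42)
second = sample_indices(42)
print(first == second)   # True: same seed, same draw
```

A dedicated generator like this can be passed around explicitly, which tends to be easier to reason about than a global seed once a script grows.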

Part 4: collections

Diagram: Choosing the Right Collection

flowchart TD
    Q[What do you need?] --> Q1{Count frequencies?}
    Q1 -->|Yes| C[🔢 Counter]
    Q1 -->|No| Q2{Auto-create keys?}
    Q2 -->|Yes| D[🗂️ defaultdict]
    Q2 -->|No| Q3{Append/pop at both ends?}
    Q3 -->|Yes| DQ[↔️ deque]
    Q3 -->|No| Q4{Records with field names?}
    Q4 -->|Yes| NT[🏷️ namedtuple]

Counter

from collections import Counter

text = "apel pisang apel mangga apel pisang jeruk"
words = text.split()

c = Counter(words)
print(c)                  # Counter({'apel': 3, 'pisang': 2, ...})
print(c.most_common(2))   # [('apel', 3), ('pisang', 2)]

# Operations
c1 = Counter("hello")
c2 = Counter("world")
c1 + c2                   # add counts
c1 - c2                   # subtract counts (keeps only positive results)
c1 & c2                   # intersection (min of each count)
c1 | c2                   # union (max of each count)
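To make the Counter operators concrete, here is a short sketch using the same two words, with the results spelled out. The variable names are illustrative.

```python
from collections import Counter

c1 = Counter("hello")   # h:1 e:1 l:2 o:1
c2 = Counter("world")   # w:1 o:1 r:1 l:1 d:1

added = c1 + c2         # counts are summed
subtracted = c1 - c2    # counts are subtracted; zero or negative results are dropped
common = c1 & c2        # minimum of each count
union = c1 | c2         # maximum of each count

print(added["l"], added["o"])   # 3 2
print(dict(subtracted))         # {'h': 1, 'e': 1, 'l': 1}
print(dict(common))             # {'l': 1, 'o': 1}
print(union["l"])               # 2
```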

defaultdict

from collections import defaultdict

# Group by
data = [("A", 1), ("B", 2), ("A", 3), ("B", 4)]

# Without defaultdict (verbose)
groups = {}
for k, v in data:
    if k not in groups:
        groups[k] = []
    groups[k].append(v)

# With defaultdict (clean)
groups = defaultdict(list)
for k, v in data:
    groups[k].append(v)

print(dict(groups))    # {'A': [1, 3], 'B': [2, 4]}

deque (Double-Ended Queue)

from collections import deque

q = deque([1, 2, 3])
q.appendleft(0)        # [0, 1, 2, 3]
q.append(4)            # [0, 1, 2, 3, 4]
q.popleft()            # 0
q.pop()                # 4

# Fixed size (sliding window)
last_5 = deque(maxlen=5)
for i in range(10):
    last_5.append(i)
print(list(last_5))    # [5, 6, 7, 8, 9]
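A fixed-size deque is a natural fit for smoothing a noisy metric. The sketch below computes a moving average over the last few values; the function name `moving_average` and the sample losses are made up for illustration.

```python
from collections import deque

def moving_average(values, window=3):
    """Average of the most recent `window` values at each step."""
    recent = deque(maxlen=window)   # old items fall off the left automatically
    averages = []
    for v in values:
        recent.append(v)
        averages.append(sum(recent) / len(recent))
    return averages

losses = [4.0, 2.0, 3.0, 1.0, 5.0]
print(moving_average(losses))   # [4.0, 3.0, 3.0, 2.0, 3.0]
```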

namedtuple

from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)
print(p.x, p.y)        # 3 4
print(p[0], p[1])      # 3 4 (juga bisa index)

# For mutable records or default values, a dataclass (Python 3.7+) is often a better fit
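To make the dataclass remark concrete, here is a minimal sketch of the same `Point` written as a dataclass. The default value and the `norm` method are additions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float = 0.0        # fields can have defaults

    def norm(self):
        """Distance from the origin."""
        return (self.x ** 2 + self.y ** 2) ** 0.5

p = Point(3, 4)
print(p)          # Point(x=3, y=4)
print(p.norm())   # 5.0
p.x = 6           # fields are mutable, unlike a namedtuple
```

A namedtuple remains handy when you want an immutable, tuple-compatible record.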

Part 5: itertools

import itertools

# Permutations (order matters)
list(itertools.permutations([1, 2, 3], 2))
# [(1,2), (1,3), (2,1), (2,3), (3,1), (3,2)]

# Combinations (order does not matter)
list(itertools.combinations([1, 2, 3], 2))
# [(1,2), (1,3), (2,3)]

# Cartesian product
list(itertools.product([1, 2], ["a", "b"]))
# [(1,'a'), (1,'b'), (2,'a'), (2,'b')]

# Chain
list(itertools.chain([1, 2], [3, 4], [5]))
# [1, 2, 3, 4, 5]

# zip_longest (a zip that does not stop at the shortest input)
list(itertools.zip_longest([1, 2, 3], ["a", "b"], fillvalue="X"))
# [(1, 'a'), (2, 'b'), (3, 'X')]

# Group by (sequential!)
data = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]
for k, group in itertools.groupby(data, key=lambda x: x[0]):
    print(k, list(group))
# a [('a', 1), ('a', 2)]
# b [('b', 3)]
# a [('a', 4)]    ← groups consecutive items only, not globally

# Repeat / cycle / count
list(itertools.islice(itertools.cycle([1, 2, 3]), 7))
# [1, 2, 3, 1, 2, 3, 1]

list(itertools.islice(itertools.count(10, 2), 5))
# [10, 12, 14, 16, 18]
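Because `groupby` only merges consecutive items, a common workaround is to sort by the same key first. A minimal sketch using the data from the groupby example above:

```python
import itertools
from operator import itemgetter

data = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]
by_letter = itemgetter(0)

# Sorting first brings equal keys next to each other
grouped = {
    key: [value for _, value in group]
    for key, group in itertools.groupby(sorted(data, key=by_letter), key=by_letter)
}
print(grouped)   # {'a': [1, 2, 4], 'b': [3]}
```

If the input is large and order does not matter, a `defaultdict(list)` gives the same grouping without the sort.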

Part 6: functools

from functools import lru_cache, reduce, partial

# lru_cache — memoization
@lru_cache(maxsize=128)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

# reduce
reduce(lambda x, y: x + y, [1, 2, 3, 4])    # 10
reduce(lambda x, y: x * y, [1, 2, 3, 4])    # 24

# partial — pre-fill argument
def power(base, exp):
    return base ** exp

square = partial(power, exp=2)
cube = partial(power, exp=3)

print(square(5))    # 25
print(cube(5))      # 125
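To check that `lru_cache` is really saving work, you can inspect `cache_info()`. This is a sketch of the same `fib` function, here with an unbounded cache (`maxsize=None`) chosen for the example:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))            # 832040

info = fib.cache_info()
print(info.misses)        # 31: each n from 0 to 30 was computed once
print(info.hits)          # 28: repeated sub-calls answered from the cache
```

Without the cache, the same call would recompute the small cases over and over, which is why the uncached version slows down so quickly as n grows.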

Part 7: subprocess

Run shell commands from Python:

import subprocess

# Simple
result = subprocess.run(["ls", "-l"], capture_output=True, text=True)
print(result.stdout)
print(result.returncode)

# Cross-platform: use sys.executable
import sys
subprocess.run([sys.executable, "script.py"])

# Pipe
result = subprocess.run("echo hello | wc -w", shell=True, capture_output=True, text=True)

Be careful with shell=True when user input is involved (security risk).
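A pipe like the one above can usually be rebuilt without `shell=True` by running two processes and passing the first one's output as the second one's input. In this sketch, two small Python one-liners stand in for `echo` and `wc -w` so it also runs on Windows; they are stand-ins, not the original commands.

```python
import subprocess
import sys

# Step 1: produce some text (stands in for `echo hello world`)
producer = subprocess.run(
    [sys.executable, "-c", "print('hello world')"],
    capture_output=True, text=True, check=True,
)

# Step 2: feed that text to a second process via stdin (stands in for `wc -w`)
counter = subprocess.run(
    [sys.executable, "-c", "import sys; print(len(sys.stdin.read().split()))"],
    input=producer.stdout, capture_output=True, text=True, check=True,
)

print(counter.stdout.strip())   # 2
```

`check=True` raises `CalledProcessError` when a command exits with a non-zero status, so failures do not pass silently.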


Part 8: argparse (CLI Tool)

import argparse

parser = argparse.ArgumentParser(description="Process data")
parser.add_argument("input", help="Input file")
parser.add_argument("--output", "-o", default="output.json", help="Output file")
parser.add_argument("--verbose", "-v", action="store_true", help="Verbose mode")
parser.add_argument("--limit", type=int, default=100)

args = parser.parse_args()

print(args.input, args.output, args.verbose, args.limit)

Run:

python script.py data.txt -o result.json -v --limit 50

Modern alternative: typer (pip install), which is built on type hints. argparse, however, ships with the stdlib, so nothing needs installing.
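`parse_args()` also accepts an explicit list of strings, which makes a parser easy to try out in a notebook or a test without touching the real command line. A sketch that wraps the parser above in a hypothetical `build_parser` helper:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Process data")
    parser.add_argument("input", help="Input file")
    parser.add_argument("--output", "-o", default="output.json", help="Output file")
    parser.add_argument("--verbose", "-v", action="store_true", help="Verbose mode")
    parser.add_argument("--limit", type=int, default=100)
    return parser

# Same arguments as the command line shown above, passed as a list
args = build_parser().parse_args(["data.txt", "-o", "result.json", "-v", "--limit", "50"])
print(args.input, args.output, args.verbose, args.limit)
# data.txt result.json True 50

# Only the required argument: the defaults fill in the rest
defaults = build_parser().parse_args(["data.txt"])
print(defaults.output, defaults.verbose, defaults.limit)
# output.json False 100
```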


Part 9: time

import time

# Sleep
time.sleep(2)    # seconds

# Timestamp
ts = time.time()              # unix timestamp (float)
print(ts)                     # 1747158045.123

# Time profiling
start = time.time()
# ... slow code ...
elapsed = time.time() - start
print(f"{elapsed:.4f}s")

# Or use perf_counter (more accurate for timing)
start = time.perf_counter()
# ...
elapsed = time.perf_counter() - start
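If you time many code sections, the start/stop pattern above can be wrapped in a small context manager. This is one possible sketch; the name `timer` and the label text are hypothetical.

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(label):
    """Print how long the enclosed block took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.4f}s")

with timer("sum of squares"):
    total = sum(i * i for i in range(100_000))
```

The `finally` clause makes sure the elapsed time is printed even when the block raises an exception.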

Part 10: re (Regex)

Already covered in file 05. Recap:

import re

text = "id 12, id 345; contact: ani@example.com"   # sample input

re.search(r"\d+", text)       # first match
re.findall(r"\d+", text)      # all matches
re.sub(r"\d+", "X", text)     # replace
re.split(r"[,;]", text)       # split

# Compile (for repeated use)
pattern = re.compile(r"\b\w+@\w+\.\w+\b")
emails = pattern.findall(text)
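As a quick refresher on what these calls return, here is a sketch that pulls metrics out of a made-up training log line. The log format is an assumption for the example.

```python
import re

log = "epoch 3: loss=0.412, acc=0.87; epoch 4: loss=0.390, acc=0.88"

# search returns a match object (or None); group(1) is the first capture group
first = re.search(r"loss=(\d+\.\d+)", log)
print(first.group(1))                        # 0.412

# findall with one capture group returns just the captured strings
print(re.findall(r"acc=(\d+\.\d+)", log))    # ['0.87', '0.88']

print(re.sub(r"\d+\.\d+", "X", log))
# epoch 3: loss=X, acc=X; epoch 4: loss=X, acc=X

print(re.split(r";\s*", log))
# ['epoch 3: loss=0.412, acc=0.87', 'epoch 4: loss=0.390, acc=0.88']
```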

Part 11: urllib & requests

urllib (stdlib):

from urllib.request import urlopen
from urllib.parse import urlparse, urlencode

# GET
with urlopen("https://api.github.com") as response:
    data = response.read().decode()

# Build URL
params = urlencode({"q": "python", "lang": "en"})
url = f"https://example.com/search?{params}"
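`urllib.parse` works in the other direction too. The sketch below builds a URL and then takes it apart again with `urlparse` and `parse_qs`; no network access is needed, and the query values are invented for the example.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Build: urlencode escapes special characters (a space becomes "+")
params = urlencode({"q": "python ai", "lang": "en"})
url = f"https://example.com/search?{params}"
print(url)     # https://example.com/search?q=python+ai&lang=en

# Take apart
parts = urlparse(url)
print(parts.scheme, parts.netloc, parts.path)   # https example.com /search
print(parse_qs(parts.query))                    # {'q': ['python ai'], 'lang': ['en']}
```

`parse_qs` returns a list for every key because a query string may repeat the same parameter.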

requests (third-party, more popular):

pip install requests

import requests

# GET
r = requests.get("https://api.github.com/users/torvalds")
print(r.status_code)
print(r.json())

# POST
r = requests.post(
    "https://api.example.com/data",
    json={"key": "value"},
    headers={"Authorization": "Bearer token"}
)

# Timeout (important!)
r = requests.get(url, timeout=10)

For everyday HTTP work, requests is the usual choice; the stdlib urllib is more cumbersome. For async, use httpx or aiohttp.


Part 12: json & pickle

json (already covered):

import json
data = json.loads('{"a": 1}')
json_str = json.dumps(data, indent=2)

pickle — Python-specific serialization:

import pickle

# Save almost any Python object
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Load
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

Caution: pickle is NOT safe with untrusted sources. Loading a pickle can execute arbitrary code, so do not unpickle files from the internet.


Part 13: csv

Already covered. Recap:

import csv

with open("data.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row["nama"])
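For the writing side, `csv.DictWriter` mirrors `DictReader`. The sketch below does a round trip through an in-memory buffer so it runs without a file on disk; the `skor` column and the sample rows are made up.

```python
import csv
import io

rows = [{"nama": "Ani", "skor": 90}, {"nama": "Budi", "skor": 85}]

# Write to an in-memory buffer (use open(..., "w", newline="") for a real file)
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["nama", "skor"])
writer.writeheader()
writer.writerows(rows)

# Read it back
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
print(loaded)
# [{'nama': 'Ani', 'skor': '90'}, {'nama': 'Budi', 'skor': '85'}]
```

Note that every value comes back as a string; convert numbers yourself, for example with `int(row["skor"])`.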

Part 14: unittest (Testing)

import unittest

def add(a, b):
    return a + b

class TestAdd(unittest.TestCase):
    def test_positive(self):
        self.assertEqual(add(2, 3), 5)
    
    def test_negative(self):
        self.assertEqual(add(-1, -1), -2)
    
    def test_zero(self):
        self.assertEqual(add(0, 0), 0)

if __name__ == "__main__":
    unittest.main()

Run: python test_file.py

Modern alternative: pytest (third-party, simpler):

def test_add_positive():
    assert add(2, 3) == 5

Use pytest in real projects.


Part 15: Libraries You Will Need on the Roadmap

The stdlib is good, but many third-party libraries are essential:

| Library | Purpose | Phase |
| --- | --- | --- |
| numpy | Arrays & math | Phase 4 |
| pandas | DataFrame | Phase 4 |
| matplotlib | Plotting | Phase 4 |
| seaborn | Statistical visualization | Phase 4 |
| scikit-learn | Classical ML | Phase 5 |
| torch | Deep learning | Phase 6 |
| transformers (HF) | LLM | Phase 6 |
| langchain / llamaindex | LLM apps | Phase 7 |
| chromadb | Vector DB | Phase 7 |
| streamlit / gradio | UI demo | Phase 7 |
| requests | HTTP | all |
| pytest | Testing | all |
| python-dotenv | Env vars | all |

Common Mistakes & FAQ

❌ Mistake 1: Using random without a seed in ML

# ❌ Experiment results are not reproducible
import random
random.shuffle(data)

# ✅ Set a seed for reproducibility
random.seed(42)
random.shuffle(data)

❌ Mistake 2: datetime.now() without a timezone

# ❌ Naive datetime, ambiguous across time zones
datetime.now()

# ✅ Use an aware datetime
from datetime import datetime, timezone
datetime.now(timezone.utc)
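An aware datetime can be converted between zones without changing the instant it represents. A small sketch using a fixed UTC+7 offset (labelled "WIB" here as an assumption; real time zone rules are better handled by `zoneinfo`):

```python
from datetime import datetime, timedelta, timezone

utc_time = datetime(2026, 5, 13, 7, 30, tzinfo=timezone.utc)

wib = timezone(timedelta(hours=7), name="WIB")   # fixed offset, no DST rules
local_time = utc_time.astimezone(wib)

print(utc_time.isoformat())     # 2026-05-13T07:30:00+00:00
print(local_time.isoformat())   # 2026-05-13T14:30:00+07:00
print(utc_time == local_time)   # True: same instant, different wall-clock reading
```

Storing timestamps in UTC and converting only for display keeps logs from different machines comparable.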

❌ Mistake 3: Using os.path where pathlib would do

# ❌ Old style, verbose
import os
file = os.path.join(os.path.dirname(__file__), "data", "x.csv")

# ✅ pathlib
from pathlib import Path
file = Path(__file__).parent / "data" / "x.csv"

❌ Mistake 4: Using subprocess with shell=True on user input

# ❌ command injection vulnerability!
subprocess.run(f"ls {user_input}", shell=True)

# ✅ Pass a list, without a shell
subprocess.run(["ls", user_input])

❌ Mistake 5: Sorting a Counter by hand when a method already exists

# ❌ Manual sort
items = sorted(c.items(), key=lambda x: -x[1])[:5]

# ✅ Use most_common
items = c.most_common(5)

❌ Mistake 6: Unpickling data from an untrusted source

# ❌ DANGEROUS: pickle can execute arbitrary code
with open("downloaded.pkl", "rb") as f:
    data = pickle.load(f)

# ✅ Use JSON when you do not need Python-specific objects
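For plain data such as settings or metrics, a JSON round trip is often enough and carries no code-execution risk. A minimal sketch; the `config` contents are invented for the example.

```python
import json

config = {"model": "baseline", "lr": 0.001, "layers": [64, 32]}

text = json.dumps(config)        # serialize to a string
restored = json.loads(text)      # parse it back
print(restored == config)        # True

# JSON only knows basic types: a tuple comes back as a list
print(json.loads(json.dumps({"shape": (2, 3)})))   # {'shape': [2, 3]}
```

The last line shows the trade-off: JSON is safe and portable, but Python-specific types are not preserved.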

FAQ

Q: pathlib vs os.path? A: Use pathlib. It is cleaner, cross-platform, and method-based (chainable). os.path is mainly for legacy code.

Q: time.time() vs time.perf_counter()? A:

  • time.time() = wall clock (Unix timestamp); can be affected by system clock changes
  • time.perf_counter() = monotonic and high-resolution; meant for timing and benchmarks

For profiling, use perf_counter().

Q: argparse vs typer/click? A:

  • argparse = stdlib, no install needed
  • typer/click = third-party and more ergonomic; typer is built on type hints, click on decorators

For personal scripts, argparse is enough. For a serious CLI tool, consider typer.

Q: requests vs urllib? A: requests is much cleaner. urllib is for special cases where you must stay stdlib-only. For async, use httpx or aiohttp.

Q: pickle vs json? A:

  • pickle = Python-specific, supports most Python objects, unsafe with untrusted sources
  • JSON = portable, language-agnostic, supports only basic types

Q: unittest vs pytest? A: pytest is more modern, has simpler syntax (plain functions + assert), and a large plugin ecosystem. Use pytest in new projects.

Q: How many stdlib modules do I need to memorize? A: Few. The ones used most often: pathlib, datetime, json, csv, re, collections, random. For the rest, it is enough to know "there is a library X for task Y" and look up the details when you need them.


Check Your Understanding

  • Do you know that pathlib is recommended over os.path?
  • Can you format and parse a datetime?
  • Do you know when to use Counter vs defaultdict?
  • Can you use itertools.combinations and permutations?
  • Do you know the difference between lru_cache and partial from functools?
  • Can you build a CLI with argparse?
  • Do you know to reach for requests for HTTP?

Challenge 2.10

Challenge 1 — Date Calculator

A script that:

  • Takes 2 dates as input (format YYYY-MM-DD)
  • Outputs the difference in days, weeks, months, and years
  • Reports the day of the week of the first date (Monday/Tuesday/etc.)

Challenge 2 — Random Password Generator

def generate_password(length=12, use_special=True):
    # Mix of uppercase, lowercase, digits, and special characters
    pass

generate_password(16)    # e.g. "Kj#9pL@2mN!xR4vQ"

Challenge 3 — File Organizer

Use pathlib + collections:

  1. Scan a folder
  2. Group files by extension (Counter)
  3. Print a report
  4. Optional: move files into subfolders by extension

Challenge 4 — Anagram Finder

def find_anagrams(words):
    # Group words that are anagrams of each other
    pass

find_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"])
# [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]

Use defaultdict + a sorted key.

Challenge 5 — CLI Note App

Use argparse:

python notes.py add "Belajar Python hari ini"
python notes.py list
python notes.py search "python"
python notes.py delete 3

Save to notes.json.

Challenge 6 — Web Scraper Sederhana

Use requests (install it first) + re:

  • Fetch the HTML from a URL
  • Extract all links
  • Save them to a file

import requests, re
r = requests.get("https://example.com", timeout=10)
links = re.findall(r'href="(http[^"]+)"', r.text)

Challenge 7 — Cache Decorator

Build a decorator @disk_cache(folder) that caches a function's results to disk (pickle).

  • Filename based on a hash of the args
  • Skip the computation when a cache file exists
  • Load from the file when it exists

Use pickle + hashlib.

Challenge 8 — Mini Test Suite

Pick 3 functions from the earlier challenges. Write a test file with pytest:

def test_clean_text():
    assert clean_text("  Hello!! ") == "hello"

def test_clean_text_empty():
    assert clean_text("") == ""

Run: pytest test_*.py


Next: 11-mini-projects.md (3 mini projects to consolidate all your Python skills).