07 - File I/O & Modules
Estimated time: 6 hours
Goal: Read and write files, parse JSON/CSV, and organize code into modules and packages.
Why Does This Material Matter?
ML/AI = data. Data lives in files: CSV datasets, JSON configs, text corpora, parquet tables, pickled models. Without file read/write skills you cannot load training data, save model checkpoints, or export experiment results. File I/O is the bridge between your code and the real world.
Modules and packages, in turn, are how you organize code once a project starts to grow. The capstone project at Dicoding will not be a single 1000-line file; it will be a folder of dozens of files that import each other. Being able to split code into clean modules is what separates a junior from a mid-level developer.
The big analogy:
- File I/O = the building's entrance and exit. Data comes in through read and goes out through write.
- Context manager (with) = a doorkeeper who automatically closes the door after you leave.
- Module = one toolbox.
- Package = a cabinet holding many toolboxes, with a label on every shelf.
Concept Map
flowchart TD
A[File I/O & Modules] --> B[File Read/Write]
A --> C[pathlib]
A --> D[JSON]
A --> E[CSV]
A --> F[Modules]
A --> G[Packages]
B --> B1[open + with]
B --> B2[modes: r/w/a]
F --> F1[import]
F --> F2[__name__ == __main__]
G --> G1[__init__.py]
G --> G2[relative import]
Diagram: File I/O as a Pipeline
How to Read the Diagram
The pipeline runs left to right: disk -> open() -> read -> Python object -> write -> new file on disk. The bottom row shows the important behavior modifiers (the with statement for auto-close, encoding to avoid mojibake). Each stage has a mode or setting you have to choose.
Step-by-Step Walkthrough
- File on disk: persistent data (text or binary).
- open(path, mode): opens the file with a mode (r = read, w = write, a = append, b = binary).
- Read: .read() (everything), .readlines() (a list of lines), or iterate line by line.
- Python object: a string or list in RAM, ready to be manipulated.
- Write: .write(text) or .writelines([...]).
- New file: saved to disk. Mode "w" overwrites, "a" appends.
- with wrapper: auto-closes even when an error occurs. Always use it.
- encoding="utf-8": always specify it to avoid Unicode problems on Windows.
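The walkthrough above can be sketched as one small function. This is only an illustration: the function name uppercase_copy and the file names are made up for the example, and everything runs inside a temporary folder so nothing on your disk is touched.

```python
import tempfile
from pathlib import Path


def uppercase_copy(src: Path, dst: Path) -> int:
    """Read src line by line, write an upper-cased copy to dst, return the line count."""
    count = 0
    # Both files are closed automatically when the with block ends.
    with open(src, "r", encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:              # read: one line at a time
            fout.write(line.upper())  # transform in RAM, then write
            count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"     # hypothetical file names
    dst = Path(tmp) / "output.txt"
    src.write_text("halo\ndunia\n", encoding="utf-8")
    n_lines = uppercase_copy(src, dst)
    result = dst.read_text(encoding="utf-8")

print(n_lines, repr(result))
```

Iterating over the file object keeps only one line in memory at a time, which is why this shape also works for files larger than RAM.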
Everyday Analogy
File I/O = a post office. The file on disk = a letter in the mailbox. open() = taking the letter out of the box. Read = reading its contents. Python object = the notes in your notebook. Write = writing a new letter. Output file = sending the letter to the destination mailbox. The with statement = a postal worker who automatically closes the box once you are done, whether you forget or something interrupts you.
Static Mermaid diagram as a fallback:
flowchart LR
F[File on Disk] -->|open r| R[Read]
R --> P[Python Object]
P --> W[Write]
W -->|open w| F2[New file]
Part 1 - Basic File I/O
Reading a File
# Classic way
f = open("data.txt", "r")
content = f.read()
f.close()
# Modern way (with statement): ALWAYS USE THIS
with open("data.txt", "r", encoding="utf-8") as f:
    content = f.read()
# The file is closed automatically
with is a context manager: it handles cleanup automatically. Even if an exception is raised, the file will be closed.
Diagram: Context Manager Lifecycle
How to Read the Diagram
The flow runs left to right: with -> enter -> block -> error check -> exit. The top edge (green) is the success path, the bottom edge (pink) is the error path. The key point: both paths still go through exit, and that is the essence of a context manager.
Step-by-Step Walkthrough
- with open(...) as f: Python calls __enter__ on the context manager.
- __enter__: opens the resource (file, DB connection, GPU memory) and returns the object that gets bound to f.
- Block inside with: the code that uses the resource. It may raise an error.
- Error check: Python tracks whether the block ended normally or with an exception.
- Success: __exit__(None, None, None) -> close the resource -> continue the program.
- Error: __exit__(exc_type, exc_val, tb) -> close the resource -> re-raise the exception (unless __exit__ returns a true value).
- Guarantee: whatever happens, __exit__ is called. The resource will not leak.
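To watch that lifecycle happen, here is a toy context manager that only records events in a list. The class name Door and the events list are invented for this sketch; a real file object does the same thing, with "close the file" in place of the log entries.

```python
class Door:
    """Toy context manager that records its lifecycle in a list."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("enter")
        return self  # this is what gets bound by "as"

    def __exit__(self, exc_type, exc_val, tb):
        # Called on both the success path and the error path.
        if exc_type is None:
            self.log.append("exit")
        else:
            self.log.append(f"exit after {exc_type.__name__}")
        return False  # False = do not swallow the exception


events = []

# Success path
with Door(events):
    events.append("work")

# Error path: __exit__ still runs, then the exception is re-raised
try:
    with Door(events):
        raise ValueError("earthquake")
except ValueError:
    events.append("re-raised")

print(events)
```

Both runs go through __exit__, and the ValueError still reaches the surrounding try/except because __exit__ returns False.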
Everyday Analogy
Context manager = an automatic security guard in a building. You enter the building (__enter__): the guard logs you in and opens the door. You work inside (the block). Leaving normally? The guard closes the door (__exit__). An earthquake hits while you are working? The guard still closes the door (__exit__) before you make an emergency exit. File handles, GPU memory, DB connections: all of them need a "guard" like this so they do not stay open forever when something goes wrong.
Static Mermaid diagram as a fallback:
flowchart TD
A[with open path as f] --> B[__enter__: open file]
B --> C[Run the block inside with]
C --> D{Error?}
D -->|No| E[__exit__: close file]
D -->|Yes| F[__exit__: close file]
F --> G[Re-raise exception]
E --> H[Continue program]
Analogy: with = an automatic doorkeeper (opens and closes for you). Even if you forget, even if an earthquake hits midway, the door is guaranteed to be closed. The file handle will not leak.
File Open Modes
| Mode | Meaning |
|---|---|
| "r" | read (default), error if the file does not exist |
| "w" | write, OVERWRITES the file if it exists |
| "a" | append |
| "x" | exclusive create, error if the file already exists |
| "b" | binary (combine: "rb", "wb") |
| "+" | read + write ("r+", "w+") |
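A quick way to get a feel for the modes in the table is to apply "w", "a", and "x" to the same file and look at what is left. The file name notes.txt is just an example, and the whole experiment runs in a temporary folder.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:  # creates the file
        f.write("first\n")
    with open(path, "a", encoding="utf-8") as f:  # append keeps the old content
        f.write("second\n")
    after_append = path.read_text(encoding="utf-8")

    with open(path, "w", encoding="utf-8") as f:  # "w" truncates silently
        f.write("third\n")
    after_overwrite = path.read_text(encoding="utf-8")

    try:
        open(path, "x", encoding="utf-8")         # "x" refuses to overwrite
        x_error = None
    except FileExistsError as exc:
        x_error = type(exc).__name__

print(repr(after_append), repr(after_overwrite), x_error)
```

If losing an existing file would hurt, "x" is the safer choice: it fails loudly instead of overwriting.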
Careful: mode "w" ERASES the file's contents if it already exists. There is no confirmation.
Ways to Read
# Read all
with open("data.txt", "r", encoding="utf-8") as f:
    content = f.read()  # the whole file as one string
# Read line by line (memory-efficient for large files)
with open("data.txt", "r", encoding="utf-8") as f:
    for line in f:
        print(line.strip())  # each line ends with \n, so strip it
# Read all lines as list
with open("data.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()  # list of strings
Writing a File
# Write (overwrite)
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("Hello world\n")
    f.write("Line 2\n")
# Write multiple lines
lines = ["Line 1", "Line 2", "Line 3"]
with open("output.txt", "w", encoding="utf-8") as f:
    f.writelines(line + "\n" for line in lines)
# Append
with open("log.txt", "a", encoding="utf-8") as f:
    f.write("New log entry\n")
Always use encoding="utf-8" for text files. When it is omitted, Python falls back to the locale's encoding, which on Windows is often cp1252 and causes problems when reading files created on Linux/Mac.
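Here is a small sketch of what goes wrong when the reader and the writer disagree about the encoding. The file name menu.txt is made up, and passing encoding="cp1252" explicitly is only a way to simulate a Windows default on any machine.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "menu.txt"  # hypothetical file name
    path.write_text("café", encoding="utf-8")

    right = path.read_text(encoding="utf-8")   # same encoding as the writer
    wrong = path.read_text(encoding="cp1252")  # simulates a mismatched default

print(right, wrong)
```

The second read does not crash here; it silently returns garbled text (mojibake), which is often harder to notice than a UnicodeDecodeError.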
Part 2 - Path Handling with pathlib (Modern)
pathlib is nicer to work with than os.path. Use it.
from pathlib import Path
# Build paths
p = Path("data/users.txt")
home_file = Path.home() / "Documents" / "file.txt"
csv_path = Path.cwd() / "subfolder" / "data.csv"
# Operations (shown for p = Path("data/users.txt"))
p.exists()  # True/False
p.is_file()  # True if it is a file
p.is_dir()  # True if it is a folder
p.suffix  # ".txt"
p.stem  # "users" (without the extension)
p.name  # "users.txt"
p.parent  # parent directory
# Create folders
Path("output").mkdir(exist_ok=True)
Path("a/b/c").mkdir(parents=True, exist_ok=True)
# List files
for f in Path("data").iterdir():
    print(f)
# Glob pattern
for f in Path("data").glob("*.csv"):
    print(f)
for f in Path(".").rglob("*.py"):  # recursive
    print(f)
# Read/write convenience
content = Path("data.txt").read_text(encoding="utf-8")
Path("output.txt").write_text("Hello", encoding="utf-8")
# Bytes
data = Path("img.png").read_bytes()
Path("copy.png").write_bytes(data)
Rule: if you can use pathlib, do not reach for the older os.path. It is cleaner and cross-platform.
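To try these calls without touching your real folders, you can run them in a temporary directory. The folder layout (data/raw) and the file names below are only an example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    raw_dir = root / "data" / "raw"  # hypothetical folder layout
    raw_dir.mkdir(parents=True, exist_ok=True)

    (raw_dir / "a.csv").write_text("x\n1\n", encoding="utf-8")
    (raw_dir / "b.csv").write_text("x\n2\n", encoding="utf-8")
    (root / "data" / "notes.txt").write_text("todo", encoding="utf-8")

    # glob only looks in one folder; rglob walks the whole tree.
    direct = sorted(p.name for p in (root / "data").glob("*.csv"))
    recursive = sorted(p.name for p in root.rglob("*.csv"))
    stems = sorted(p.stem for p in raw_dir.iterdir())

print(direct, recursive, stems)
```

Note that glob("*.csv") on data/ finds nothing because the CSV files sit one level deeper; rglob is the one that descends into subfolders.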
Part 3 - JSON
JSON = the standard format for structured data. You need to know it.
Parse a JSON String
import json
# String -> dict/list
json_str = '{"nama": "Budi", "umur": 25, "hobi": ["AI", "Coding"]}'
data = json.loads(json_str)
print(data["nama"])  # Budi
print(type(data))  # <class 'dict'>
Dict/List -> JSON String
data = {"nama": "Budi", "umur": 25}
s = json.dumps(data)  # compact
s = json.dumps(data, indent=2)  # pretty
s = json.dumps(data, ensure_ascii=False)  # keep non-ASCII characters (e.g. ë, é) instead of \u escapes
Read/Write JSON File
import json
from pathlib import Path
# Read
with open("data.json", "r", encoding="utf-8") as f:
    data = json.load(f)  # not loads (load = file, loads = string)
# Or, more concisely, with pathlib
data = json.loads(Path("data.json").read_text(encoding="utf-8"))
# Write
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)
Type Conversion
| Python | JSON |
|---|---|
| dict | object |
| list, tuple | array |
| str | string |
| int, float | number |
| True | true |
| False | false |
| None | null |
Note: JSON has no tuple, set, or datetime. A tuple is written as an array and comes back as a list; set and datetime raise TypeError, so convert them before dumping.
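A short round trip makes the table concrete. The record below is invented sample data; the point is to see which Python types survive the trip and which do not.

```python
import json

record = {"point": (1, 2), "ok": True, "missing": None}
text = json.dumps(record)
back = json.loads(text)

# A set has no JSON equivalent, so dumps refuses it.
try:
    json.dumps({"tags": {"ml", "ai"}})
    set_error = None
except TypeError as exc:
    set_error = str(exc)

# Converting to a (sorted) list first makes it serializable.
fixed = json.dumps({"tags": sorted({"ml", "ai"})})

print(text)
print(back)
print(set_error)
print(fixed)
```

The tuple went in as (1, 2) but came back as [1, 2]: JSON only knows arrays, so the tuple-ness is lost on the way.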
Common Pitfall
# Custom objects cannot be serialized by default
import datetime
data = {"now": datetime.datetime.now()}
json.dumps(data)  # ❌ raises TypeError
# Use a default function or convert manually
data = {"now": datetime.datetime.now().isoformat()}
json.dumps(data)  # ✅ works
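The snippet above shows the manual conversion. The other option it mentions, a default function, could look roughly like this. The helper name to_jsonable and the sample data are made up; default= itself is a real parameter of json.dumps.

```python
import json
from datetime import datetime


def to_jsonable(obj):
    """Fallback that json.dumps calls for objects it cannot serialize itself."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, set):
        return sorted(obj)
    # Anything else is still an error, so bugs do not pass silently.
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


data = {"created": datetime(2026, 5, 13, 10, 30), "tags": {"ml", "ai"}}
encoded = json.dumps(data, default=to_jsonable)

print(encoded)
```

The advantage over manual conversion is that the rule lives in one place; every datetime anywhere in the structure gets the same treatment.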
Part 4 - CSV
CSV (Comma-Separated Values) = a tabular text format.
Read CSV
import csv
with open("users.csv", "r", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)  # first row = header, e.g. ['nama', 'umur', 'kota']
    for row in reader:
        print(row)  # row = list of strings
# Output (the header was already consumed by next, so it is not printed):
# ['Budi', '25', 'Bandung']
# ...
DictReader (More Pythonic)
with open("users.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row["nama"], row["umur"])  # access by key
Write CSV
data = [
    ["Budi", 25, "Bandung"],
    ["Ani", 30, "Jakarta"],
]
with open("output.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["nama", "umur", "kota"])  # header
    writer.writerows(data)
DictWriter
data = [
    {"nama": "Budi", "umur": 25, "kota": "Bandung"},
    {"nama": "Ani", "umur": 30, "kota": "Jakarta"},
]
with open("output.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["nama", "umur", "kota"])
    writer.writeheader()
    writer.writerows(data)
Always pass newline="" when writing CSV; without it, files written on Windows get a blank line between rows.
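One thing the snippets above do not show is that CSV has no types: whatever you write comes back as a string. The sketch below does a write-then-read round trip using an in-memory io.StringIO as a stand-in for a real file, which is an assumption made only to keep the example self-contained.

```python
import csv
import io

rows = [
    {"nama": "Budi", "umur": 25, "kota": "Bandung"},
    {"nama": "Ani", "umur": 30, "kota": "Jakarta"},
]

buffer = io.StringIO(newline="")  # in-memory stand-in for a file
writer = csv.DictWriter(buffer, fieldnames=["nama", "umur", "kota"])
writer.writeheader()
writer.writerows(rows)

buffer.seek(0)                    # rewind before reading
loaded = list(csv.DictReader(buffer))
ages = [int(row["umur"]) for row in loaded]  # convert explicitly

print(loaded[0], ages)
```

The integer 25 was written, but "25" was read back, so numeric columns need an explicit int() or float() after loading.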
Pandas - More Powerful (Covered in Phase 4)
import pandas as pd
df = pd.read_csv("users.csv")
df.to_csv("output.csv", index=False)
For ML you will usually use pandas. But understand the csv module first.
Part 5 - Modules
Module = a single .py file.
Create Your Own Module
math_helpers.py:
PI = 3.14159
def luas_lingkaran(r):  # area of a circle
    return PI * r ** 2
def keliling_lingkaran(r):  # circumference of a circle
    return 2 * PI * r
main.py:
import math_helpers
print(math_helpers.luas_lingkaran(5))
print(math_helpers.PI)
Ways to Import
# Import full module
import math_helpers
math_helpers.luas_lingkaran(5)
# Import specific
from math_helpers import luas_lingkaran, PI
luas_lingkaran(5)
# Alias
import math_helpers as mh
mh.luas_lingkaran(5)
import numpy as np  # standard convention
# Import everything (AVOID: namespace pollution)
from math_helpers import *
__name__ == "__main__"
An important pattern for files that can be both run and imported:
script.py:
def main():
    print("Running as script")
if __name__ == "__main__":
    main()
- When you run python script.py -> __name__ = "__main__" -> main() runs
- When another file does import script -> __name__ = "script" -> main() does not run
Always use this pattern for files that act as scripts.
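You can observe both cases from a single file. The sketch below writes a tiny module to a temporary folder, then uses runpy.run_path as a stand-in for "python demo_script.py" and importlib.import_module as a stand-in for "import demo_script". The module name demo_script and its calls list are invented for the demo.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

SOURCE = '''
calls = []

def main():
    calls.append("main ran")

if __name__ == "__main__":
    main()
'''

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "demo_script.py"  # hypothetical module name
    script.write_text(SOURCE, encoding="utf-8")

    # 1) Run it the way "python demo_script.py" would.
    as_script = runpy.run_path(str(script), run_name="__main__")

    # 2) Import it the way another file would.
    sys.path.insert(0, tmp)
    try:
        as_module = importlib.import_module("demo_script")
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_script", None)

print(as_script["calls"], as_module.calls, as_module.__name__)
```

Run as a script, main() fires and the list has one entry; imported, the list stays empty and __name__ is the module's own name.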
Part 6 - Packages
Package Structure Diagram
flowchart TD
P[my_project/] --> M[main.py]
P --> U[utils/]
U --> I[__init__.py]
U --> MH[math_helpers.py]
U --> SH[string_helpers.py]
U --> SUB[io/]
SUB --> I2[__init__.py]
SUB --> JH[json_helpers.py]
Analogy: package = a cabinet with drawers. Each drawer (sub-package) holds tools (modules). __init__.py = the label on the drawer that tells you what is inside.
Package = a folder containing modules + an __init__.py file.
Package Structure
my_project/
├── main.py
└── utils/
    ├── __init__.py
    ├── math_helpers.py
    └── string_helpers.py
utils/__init__.py:
# Can be empty, or:
from .math_helpers import luas_lingkaran
from .string_helpers import slugify
utils/math_helpers.py:
def luas_lingkaran(r):
    return 3.14 * r ** 2
main.py:
from utils.math_helpers import luas_lingkaran
# or (if __init__.py already exposes it)
from utils import luas_lingkaran
Relative Import (Inside a Package)
utils/string_helpers.py:
from .math_helpers import luas_lingkaran  # . = current package
from ..other_pkg import x  # .. = parent package (only valid when utils is itself nested inside another package)
Best practice: when in doubt, use absolute imports (from utils.math_helpers import ...).
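If you want to see a package work end to end without creating a project folder, the sketch below builds one in a temporary directory and imports it. The package name demo_utils and the report module are made up for the demo; adding the folder to sys.path is only needed because the package lives in a temp directory rather than next to your script.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_utils"  # hypothetical package name
    pkg.mkdir()
    (pkg / "math_helpers.py").write_text(
        "def luas_lingkaran(r):\n    return 3.14 * r ** 2\n", encoding="utf-8")
    (pkg / "report.py").write_text(
        "from .math_helpers import luas_lingkaran  # relative import\n"
        "def describe(r):\n    return f'area={luas_lingkaran(r):.2f}'\n",
        encoding="utf-8")
    # __init__.py exposes the public API of the package.
    (pkg / "__init__.py").write_text(
        "from .math_helpers import luas_lingkaran\nfrom .report import describe\n",
        encoding="utf-8")

    sys.path.insert(0, tmp)
    try:
        demo_utils = importlib.import_module("demo_utils")
        area = demo_utils.luas_lingkaran(2)
        text = demo_utils.describe(1)
    finally:
        sys.path.remove(tmp)
        for name in [n for n in sys.modules if n.split(".")[0] == "demo_utils"]:
            del sys.modules[name]

print(area, text)
```

Because __init__.py re-exports both functions, the caller never has to know which file inside the package they live in.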
Part 7 - Standard Library Highlights
Python has a very complete standard library. The parts used most often in AI work:
os
import os
os.environ["API_KEY"]  # read an environment variable (KeyError if it is not set)
os.environ.get("API_KEY", default="abc")  # read with a fallback value
os.cpu_count()  # number of CPUs
os.makedirs("a/b/c", exist_ok=True)
sys
import sys
sys.argv  # command line args
sys.exit(0)  # exit the program
sys.path  # import paths
datetime
from datetime import datetime, timedelta, date
now = datetime.now()
today = date.today()
# Format
now.strftime("%Y-%m-%d %H:%M:%S") # "2026-05-13 14:30:45"
# Parse
datetime.strptime("2026-05-13", "%Y-%m-%d")
# Arithmetic
besok = now + timedelta(days=1)
seminggu_lalu = now - timedelta(weeks=1)
random
import random
random.random()  # float in [0.0, 1.0)
random.uniform(1, 10)  # float between 1 and 10
random.randint(1, 100)  # int 1-100 (inclusive)
random.choice([1, 2, 3])  # pick one at random
random.sample(range(10), 3)  # 3 unique values from 0-9
my_list = [1, 2, 3, 4]
random.shuffle(my_list)  # in-place shuffle
random.seed(42)  # reproducibility (important in ML!); call it before drawing numbers
math
import math
math.pi
math.e
math.sqrt(16)
math.log(100, 10)
math.exp(1)
math.floor(3.7), math.ceil(3.2)
collections
from collections import Counter, defaultdict, deque, OrderedDict
# Counter
words = ["a", "b", "a", "c", "a", "b"]
c = Counter(words)
print(c) # Counter({'a': 3, 'b': 2, 'c': 1})
print(c.most_common(2)) # [('a', 3), ('b', 2)]
# defaultdict: auto-creates missing keys
d = defaultdict(list)
d["a"].append(1)  # no need to check the key first
d["a"].append(2)
print(d["a"])  # [1, 2]
print(d["b"])  # [] (auto-created)
# deque: fast append/pop at both ends
q = deque([1, 2, 3])
q.appendleft(0)  # deque([0, 1, 2, 3])
q.popleft()  # 0
itertools
import itertools
# Permutations & combinations
list(itertools.permutations([1, 2, 3], 2))
# [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
list(itertools.combinations([1, 2, 3], 2))
# [(1, 2), (1, 3), (2, 3)]
list(itertools.product([1, 2], ["a", "b"]))
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
# Chain: join iterables
list(itertools.chain([1, 2], [3, 4]))  # [1, 2, 3, 4]
# enumerate with start (a built-in, not part of itertools)
list(enumerate(["a", "b"], start=1))  # [(1, 'a'), (2, 'b')]
functools
from functools import lru_cache, reduce
# lru_cache: memoizes function results (for pure functions)
@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)
fib(100)  # fast because results are cached
# reduce
reduce(lambda x, y: x + y, [1, 2, 3, 4])  # 10
Part 8 - Pip & Package Management
Install Package
pip install numpy
pip install numpy==1.24.0  # specific version
pip install "numpy>=1.20,<2.0"  # range
pip install -r requirements.txt  # from a file
requirements.txt
File listing dependencies:
numpy==1.24.0
pandas>=2.0
matplotlib
scikit-learn>=1.3
pip freeze > requirements.txt # export current environment
pip install -r requirements.txt # restore
Virtual Environment (Already Set Up in Phase 1)
conda activate ai-prep  # always activate it first
pip install package-name
Common Mistakes & FAQ
❌ Mistake 1: Not using with (forgetting to close)
# ❌ leaks the file handle if an exception occurs
f = open("data.txt")
data = f.read()
# if the program crashes here, f is never closed
f.close()
# ✅ use with
with open("data.txt", encoding="utf-8") as f:
    data = f.read()
❌ Mistake 2: Mode "w" overwrites without warning
# ❌ if output.txt already exists, its contents are GONE
with open("output.txt", "w") as f:
    f.write("baru")
# To append, use "a"
with open("output.txt", "a") as f:
    f.write("tambah")
❌ Mistake 3: Forgetting the encoding
# ❌ Random encoding errors on Windows
with open("data.txt") as f:
    text = f.read()  # may raise UnicodeDecodeError
# ✅ Always specify it
with open("data.txt", encoding="utf-8") as f:
    text = f.read()
❌ Mistake 4: Using os.path when pathlib is available
# Old way
import os
path = os.path.join(os.path.expanduser("~"), "Documents", "file.txt")
if os.path.exists(path):
    with open(path) as f:
        ...
# Modern way
from pathlib import Path
path = Path.home() / "Documents" / "file.txt"
if path.exists():
    text = path.read_text(encoding="utf-8")
❌ Mistake 5: Forgetting newline="" when writing CSV on Windows
# ❌ there will be a blank line between every row on Windows
with open("out.csv", "w") as f:
    csv.writer(f).writerows(data)
# ✅
with open("out.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(data)
❌ Mistake 6: Circular import
# a.py
from b import func_b
def func_a(): ...
# b.py
from a import func_a  # ❌ circular!
def func_b(): ...
Fix: restructure (move the shared code into a third file) or import inside the function.
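The second fix, importing inside the function, can be tried out like this. The sketch writes two module pairs to a temporary folder: a broken pair that imports each other at the top level, and a fixed pair where one side delays its import. All module names (bad_a, bad_b, good_a, good_b) are invented for the demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path

FILES = {
    # Broken pair: each module imports the other at the top level.
    "bad_a.py": "from bad_b import func_b\ndef func_a():\n    return 'a'\n",
    "bad_b.py": "from bad_a import func_a\ndef func_b():\n    return 'b'\n",
    # Fixed pair: good_b delays its import until the function is called.
    "good_a.py": "from good_b import func_b\ndef func_a():\n    return 'a'\n",
    "good_b.py": (
        "def func_b():\n"
        "    from good_a import func_a  # deferred import\n"
        "    return 'b+' + func_a()\n"
    ),
}

with tempfile.TemporaryDirectory() as tmp:
    for name, source in FILES.items():
        (Path(tmp) / name).write_text(source, encoding="utf-8")
    sys.path.insert(0, tmp)
    try:
        try:
            importlib.import_module("bad_a")
            bad_result = "imported"
        except ImportError as exc:
            bad_result = type(exc).__name__
        good_a = importlib.import_module("good_a")
        good_result = good_a.func_b()
    finally:
        sys.path.remove(tmp)
        for name in ("bad_a", "bad_b", "good_a", "good_b"):
            sys.modules.pop(name, None)

print(bad_result, good_result)
```

The deferred import works because by the time func_b is called, good_a has already finished loading. Moving shared code into a third module is usually the cleaner long-term fix; the deferred import is the quick one.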
FAQ
Q: Which types does JSON support?
A: dict, list, str, int, float, bool, and None (null). A tuple is written as an array and comes back as a list. Not supported: set, datetime, custom objects. Convert them before dumping.
Q: CSV vs JSON, when to use which?
A:
- CSV = flat tabular data (rows × columns), heavy data analysis
- JSON = nested/structured data, configs, API responses
Q: Is from x import * allowed?
A: Avoid it. It pollutes the namespace, makes it hard to trace where a name came from, and can override built-ins. Use it only in very special cases.
Q: What is the difference between importing at the top of a file and importing inside a function?
A: Importing at the top is the standard (Python loads a module once and caches it in sys.modules, so repeated imports are cheap). Import inside a function only to avoid a circular import or to handle an optional dependency.
Q: pathlib vs os.path?
A: pathlib is generally cleaner and cross-platform. Use pathlib; keep os.path for legacy code.
Q: Is __init__.py required?
A: Since Python 3.3 there are namespace packages (no __init__.py). It is still recommended to include __init__.py (it is explicit and lets you expose the package API).
Check Your Understanding
- Can you read/write files with with open()?
- Do you use pathlib to handle paths?
- Can you parse a JSON file into a dict?
- Can you read a CSV with DictReader?
- Do you know the difference between a module and a package?
- Do you know what __name__ == "__main__" is for?
- Do you know the commonly used stdlib functions (datetime, random, collections)?
Challenge 2.7
Challenge 1 - Notes App
Build a CLI notes app that can:
- Add a note (saved to notes.json)
- List all notes
- Search notes by keyword
- Delete a note by ID
- Quit (save and exit)
Note format:
{
  "id": 1,
  "title": "Learn Python",
  "content": "Today I studied OOP...",
  "created": "2026-05-13T10:30:00"
}
Challenge 2 - CSV to JSON Converter
Function csv_to_json(csv_path, json_path):
- Read the CSV
- Convert it to a list of dicts
- Save it as JSON with pretty printing
Bonus: handle field types (auto-detect int/float).
Challenge 3 - Log Analyzer
Given a log file:
2026-05-13 10:30:45 [INFO] App started
2026-05-13 10:31:02 [ERROR] DB connection failed
...
Write a script that:
- Counts the number of log entries per level
- Lists the 10 most recent ERROR entries
- Finds log entries between hour X and hour Y
- Saves the report to JSON
Use regex for parsing.
Challenge 4 - Word Count CLI
A script that reads a text file and outputs:
- Total words
- Total sentences (split by .!?)
- Top 10 most frequent words
- Saves the report to report.json
Run: python wordcount.py myfile.txt
Use sys.argv to get the filename.
Challenge 5 - Mini Database
The file db.json stores a list of users.
Create a module database.py with these functions:
- load_db()
- save_db(data)
- add_user(user)
- find_user(id)
- update_user(id, data)
- delete_user(id)
Test it from main.py by importing it.
Challenge 6 - Folder Organizer
A script that:
- Scans the Downloads folder
- Moves files based on their extension:
  - .jpg, .png -> Images/ folder
  - .pdf -> Documents/
  - .mp3, .wav -> Music/
  - etc.
Use pathlib and shutil.
Challenge 7 - Your Own Package
Create a package my_utils/:
- __init__.py
- text.py: functions clean_text, slugify
- math.py: functions luas, keliling
- file.py: functions load_json, save_json
Create tests.py outside the package, then import and test.
Challenge 8 - Mini Translator (Using an LLM API)
If you have a Gemini API key (free tier):
- Read text from a file
- Translate it to the target language using the LLM
- Save the result
# pip install google-generativeai
import google.generativeai as genai
genai.configure(api_key="YOUR_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")
def translate(text, target_lang):
    response = model.generate_content(
        f"Translate to {target_lang}: {text}"
    )
    return response.text
This is your first time trying an LLM API. It is not the main focus yet, but get a taste of it first.
Next: 08-error-handling.md: error handling and debugging, which is 50% of an ML practitioner's work.