2. Python for AI

File I/O & Modules

6 hours, 16 min read
Goal

Read and write files, parse JSON/CSV, and organize code into modules and packages.

07: File I/O & Modules



Why Does This Material Matter?

ML/AI runs on data, and data lives in files: CSV datasets, JSON configs, text corpora, parquet tables, pickled models. Without file read/write skills you cannot load training data, save a model checkpoint, or export experiment results. File I/O is the bridge between your code and the real world.

Modules and packages, in turn, are how you organize code once a project starts to grow. The capstone project at Dicoding will not be a single 1000-line file; it will be a folder of dozens of files that import one another. Splitting code into clean modules is a skill that separates junior from mid-level developers.

The big analogy:

  • File I/O = the building's entrance and exit. Data comes in through read and goes out through write.
  • Context manager (with) = a doorman who automatically closes the door after you leave.
  • Module = one toolbox.
  • Package = a cabinet holding many toolboxes, with a label on each shelf.

Concept Map

flowchart TD
    A[📁 File I/O & Modules] --> B[📖 File Read/Write]
    A --> C[🛣️ pathlib]
    A --> D[📋 JSON]
    A --> E[📊 CSV]
    A --> F[📦 Modules]
    A --> G[📚 Packages]

    B --> B1[open + with]
    B --> B2[modes: r/w/a]

    F --> F1[import]
    F --> F2[__name__ == __main__]

    G --> G1[__init__.py]
    G --> G2[relative import]

Diagram: File I/O as a Pipeline

How to Read the Diagram

The pipeline runs left to right: disk → open() → read → Python object → write → new file on disk. The bottom row shows two settings that change behavior: the with statement (auto-close) and encoding (avoids mojibake). Each stage has a mode or option you have to choose.

Walkthrough Step-by-Step

  1. File on disk: persistent data (text or binary).
  2. open(path, mode): open the file with a mode (r=read, w=write, a=append, b=binary).
  3. Read: .read() (everything), .readlines() (list of lines), or iterate line by line.
  4. Python object: a string or list in RAM, ready to manipulate.
  5. Write: .write(text) or .writelines([...]).
  6. New file: saved to disk. Mode "w" overwrites, "a" appends.
  7. with wrapper: closes the file automatically even when an error occurs. Always use it.
  8. encoding="utf-8": always state it explicitly to avoid Unicode problems on Windows.
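
The walkthrough above can be sketched as one small read, transform, write round trip. This is only an illustration: the function name uppercase_copy and the file names are made up for the example, and a temporary directory is used so nothing on your disk is touched.

```python
import tempfile
from pathlib import Path


def uppercase_copy(src: Path, dst: Path) -> int:
    """Read src line by line, write an upper-cased copy to dst, return the line count."""
    count = 0
    # Both files are opened in one with statement; both are closed on exit.
    with open(src, "r", encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:                 # iterate lazily, one line at a time
            fout.write(line.upper())     # the "Python object" stage: transform the string
            count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"        # hypothetical file names
    dst = Path(tmp) / "output.txt"
    src.write_text("halo\ndunia\n", encoding="utf-8")
    n_lines = uppercase_copy(src, dst)
    result = dst.read_text(encoding="utf-8")

print(n_lines, repr(result))
```

Iterating over the file object instead of calling .read() keeps memory use flat, which matters once the input is a multi-gigabyte corpus.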

Everyday Analogy

File I/O = a post office. The file on disk = a letter in the mailbox. open() = taking the letter out of the box. Read = reading its contents. Python object = the notes in your notebook. Write = writing a new letter. Output file = dropping the letter into the destination mailbox. The with statement = a postal clerk who automatically closes the box when you are done, whether you forget or something goes wrong.

Static Mermaid diagram as a fallback:

flowchart LR
    F[💾 File on Disk] -->|open r| R[📖 Read]
    R --> P[🐍 Python Object]
    P --> W[✍️ Write]
    W -->|open w| F2[💾 New File]

Part 1: Basic File I/O

Reading Files

# Classic way
f = open("data.txt", "r")
content = f.read()
f.close()

# Modern way (with statement): ALWAYS USE THIS
with open("data.txt", "r", encoding="utf-8") as f:
    content = f.read()
# The file is closed automatically

with is a context manager: it cleans up automatically. Even if an exception is raised, the file is still closed.

Diagram: Context Manager Lifecycle

How to Read the Diagram

The flow runs left to right: with → enter → block → error check → exit. The top edge (green) is the success path, the bottom edge (pink) is the error path. The key point: both paths still go through exit. That is the whole idea of a context manager.

Walkthrough Step-by-Step

  1. with open(...) as f: Python calls __enter__ on the context manager.
  2. __enter__: opens the resource (file, DB connection, GPU memory) and returns the object that gets bound to f.
  3. Block inside with: the code that uses the resource. It may raise an error.
  4. Error check: Python tracks whether the block ended normally or with an exception.
  5. Success → __exit__(None, None, None) → close the resource → the program continues.
  6. Error → __exit__(exc_type, exc_val, tb) → close the resource → the exception is re-raised (as long as __exit__ returns a falsy value).
  7. Guarantee: once __enter__ has succeeded, __exit__ is called no matter what happens. The resource will not leak.
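
To watch those steps happen, here is a toy context manager that records each lifecycle event in a list. TrackedResource is a hypothetical class written for this illustration; it does not open any real resource.

```python
class TrackedResource:
    """Toy context manager that logs its lifecycle into a list."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("enter")
        return self                       # this object is what `as` would bind

    def __exit__(self, exc_type, exc_val, tb):
        # exc_type is None on the success path, an exception class on the error path
        self.log.append("exit (error)" if exc_type else "exit (ok)")
        return False                      # falsy: let any exception propagate


events = []

# Success path
with TrackedResource(events):
    events.append("block")

# Error path: __exit__ still runs before the exception reaches the except clause
try:
    with TrackedResource(events):
        raise ValueError("boom")
except ValueError:
    events.append("re-raised")

print(events)
# ['enter', 'block', 'exit (ok)', 'enter', 'exit (error)', 're-raised']
```

Returning a truthy value from __exit__ would swallow the exception instead, which is rarely what you want for resource cleanup.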

Everyday Analogy

Context manager = an automatic security guard in a building. You enter the building (__enter__): the guard logs you in and opens the door. You work inside (the block). Leaving normally? The guard closes the door (__exit__). An earthquake hits while you are working? The guard still closes the door (__exit__) before you make your emergency exit. File handles, GPU memory, DB connections: all of them need a "guard" like this so they do not stay open forever when something goes wrong.

Static Mermaid diagram as a fallback:

flowchart TD
    A[with open path as f] --> B[__enter__: open file]
    B --> C[Run the block inside with]
    C --> D{Error?}
    D -->|No| E[__exit__: close file]
    D -->|Yes| F[__exit__: close file]
    F --> G[Re-raise exception]
    E --> H[Continue program]

Analogy: with = an automatic doorman (opens and closes for you). Even if you forget, even if an earthquake hits halfway through, the door is always closed. The file handle will not leak.

File Open Modes

Mode   Meaning
"r"    read (default); error if the file does not exist
"w"    write; OVERWRITES the file if it exists
"a"    append
"x"    exclusive create; error if the file already exists
"b"    binary (combine: "rb", "wb")
"+"    read + write ("r+", "w+")

Warning: mode "w" ERASES the file's contents if it already exists. There is no confirmation.
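
A quick way to convince yourself of the difference between "w", "a", and "x" is to run them against the same file. The sketch below works in a temporary directory; the file name demo.txt is arbitrary.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"

    with open(path, "w", encoding="utf-8") as f:
        f.write("first\n")
    with open(path, "w", encoding="utf-8") as f:    # "w" truncates: "first" is gone
        f.write("second\n")
    after_w = path.read_text(encoding="utf-8")

    with open(path, "a", encoding="utf-8") as f:    # "a" keeps existing content
        f.write("third\n")
    after_a = path.read_text(encoding="utf-8")

    try:
        with open(path, "x", encoding="utf-8") as f:    # "x" refuses to touch an existing file
            f.write("never written\n")
        x_failed = False
    except FileExistsError:
        x_failed = True

print(repr(after_w))   # 'second\n'
print(repr(after_a))   # 'second\nthird\n'
print(x_failed)        # True
```

Mode "x" is a cheap safety net when a script must never clobber earlier results, for example experiment outputs.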

Ways to Read

# Read everything
with open("data.txt", "r", encoding="utf-8") as f:
    content = f.read()           # the whole file as one string

# Read line by line (memory-friendly for large files)
with open("data.txt", "r", encoding="utf-8") as f:
    for line in f:
        print(line.rstrip("\n")) # each line ends with \n, so strip it

# Read all lines into a list
with open("data.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()        # list of strings

Writing Files

# Write (overwrite)
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("Halo dunia\n")
    f.write("Line 2\n")

# Write multiple lines
lines = ["Line 1", "Line 2", "Line 3"]
with open("output.txt", "w", encoding="utf-8") as f:
    f.writelines(line + "\n" for line in lines)

# Append
with open("log.txt", "a", encoding="utf-8") as f:
    f.write("Log entry baru\n")

Always pass encoding="utf-8" for text files. Without it, Python falls back to a platform-dependent default; on Windows that is often cp1252, which causes trouble when reading files created on Linux/Mac.
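
The effect of a wrong encoding is easy to reproduce without any file at all: encode a string as UTF-8, then decode the same bytes as cp1252. The sample word is arbitrary.

```python
text = "café"

raw = text.encode("utf-8")       # what actually sits on disk: b'caf\xc3\xa9'
wrong = raw.decode("cp1252")     # what a cp1252 reader sees
right = raw.decode("utf-8")      # what a UTF-8 reader sees

print(raw)      # b'caf\xc3\xa9'
print(wrong)    # cafÃ©
print(right)    # café
```

The two-byte UTF-8 sequence for "é" is read as two separate cp1252 characters. That is mojibake, and it happens silently; other byte sequences raise UnicodeDecodeError instead.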


Part 2: Path Handling with pathlib (Modern)

pathlib is nicer to work with than os.path. Use it.

from pathlib import Path

# Build paths
relative = Path("data/users.txt")
in_cwd = Path.cwd() / "subfolder" / "data.csv"
p = Path.home() / "Documents" / "file.txt"

# Operations (all on p from above)
p.exists()              # True/False
p.is_file()             # True if it is a file
p.is_dir()              # True if it is a folder
p.suffix                # ".txt"
p.stem                  # "file" (without the extension)
p.name                  # "file.txt"
p.parent                # parent directory

# Create folders
Path("output").mkdir(exist_ok=True)
Path("a/b/c").mkdir(parents=True, exist_ok=True)

# List file
for f in Path("data").iterdir():
    print(f)

# Glob pattern
for f in Path("data").glob("*.csv"):
    print(f)

for f in Path(".").rglob("*.py"):    # recursive
    print(f)

# Read/write convenience
content = Path("data.txt").read_text(encoding="utf-8")
Path("output.txt").write_text("Hello", encoding="utf-8")

# Bytes
data = Path("img.png").read_bytes()
Path("copy.png").write_bytes(data)

Rule: if pathlib can do it, skip the old os.path. It is cleaner and cross-platform.
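
As a small illustration of glob versus rglob, the sketch below builds a throwaway folder tree and collects the CSV files in it. The folder and file names (raw, 2024, a.csv, b.csv) are invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "raw" / "2024").mkdir(parents=True)
    (root / "raw" / "a.csv").write_text("x\n1\n", encoding="utf-8")
    (root / "raw" / "2024" / "b.csv").write_text("x\n2\n", encoding="utf-8")
    (root / "raw" / "notes.txt").write_text("skip me", encoding="utf-8")

    # glob only looks in the given folder; rglob also descends into subfolders
    top_level = sorted(p.name for p in (root / "raw").glob("*.csv"))
    recursive = sorted(p.name for p in root.rglob("*.csv"))
    stems = sorted(p.stem for p in root.rglob("*.csv"))

print(top_level)   # ['a.csv']
print(recursive)   # ['a.csv', 'b.csv']
print(stems)       # ['a', 'b']
```

The results are sorted because neither glob nor rglob promises any particular order.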


Part 3: JSON

JSON = the standard format for structured data. You need to know it.

Parse a JSON String

import json

# String → dict/list
json_str = '{"nama": "Budi", "umur": 25, "hobi": ["AI", "Coding"]}'
data = json.loads(json_str)
print(data["nama"])    # Budi
print(type(data))      # <class 'dict'>

Dict/List → JSON String

data = {"nama": "Budi", "umur": 25}

s = json.dumps(data)              # compact
s = json.dumps(data, indent=2)    # pretty
s = json.dumps(data, ensure_ascii=False)  # keep non-ASCII characters as-is (e.g. ë, é)

Read/Write JSON File

import json
from pathlib import Path

# Read
with open("data.json", "r", encoding="utf-8") as f:
    data = json.load(f)         # not loads (load = file, loads = string)

# Or more compactly with pathlib
data = json.loads(Path("data.json").read_text(encoding="utf-8"))

# Write
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

Type Conversion

Python         JSON
dict           object
list, tuple    array
str            string
int, float     number
True           true
False          false
None           null

Note: JSON has no tuple, set, or datetime type. A tuple is written as an array and comes back as a list; a set or datetime raises TypeError, so convert it before dumping.

Common Pitfall

import datetime
import json

# Custom objects cannot be serialized by default
data = {"now": datetime.datetime.now()}
json.dumps(data)    # ❌ TypeError

# Use a default function or convert manually
data = {"now": datetime.datetime.now().isoformat()}
json.dumps(data)    # ✅
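
The "default function" route mentioned in the snippet above could look like the sketch below. The helper name to_jsonable and the sample record are assumptions made for illustration; json.dumps calls the default hook only for objects it cannot serialize on its own.

```python
import json
from datetime import datetime


def to_jsonable(obj):
    """Fallback used by json.dumps for types it does not know."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, set):
        return sorted(obj)               # sets have no order; sort for stable output
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


record = {
    "created": datetime(2026, 5, 13, 10, 30),
    "tags": {"ml", "ai"},
    "shape": (2, 3),                     # tuples need no help: they become arrays
}

encoded = json.dumps(record, default=to_jsonable)
decoded = json.loads(encoded)

print(decoded["created"])   # 2026-05-13T10:30:00
print(decoded["tags"])      # ['ai', 'ml']
print(decoded["shape"])     # [2, 3]
```

Note that the round trip is lossy: the datetime comes back as a string and the tuple as a list, so convert them again on load if the types matter.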

Part 4: CSV

CSV (Comma-Separated Values) = a tabular text format.

Read CSV

import csv

with open("users.csv", "r", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)        # first row = header
    for row in reader:
        print(row)               # row = list of strings

# header holds:
# ['nama', 'umur', 'kota']
# Output of the loop:
# ['Budi', '25', 'Bandung']
# ...

DictReader (More Pythonic)

with open("users.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row["nama"], row["umur"])    # access by key

Write CSV

data = [
    ["Budi", 25, "Bandung"],
    ["Ani", 30, "Jakarta"],
]

with open("output.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["nama", "umur", "kota"])    # header
    writer.writerows(data)

DictWriter

data = [
    {"nama": "Budi", "umur": 25, "kota": "Bandung"},
    {"nama": "Ani", "umur": 30, "kota": "Jakarta"},
]

with open("output.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["nama", "umur", "kota"])
    writer.writeheader()
    writer.writerows(data)

Always pass newline="" when opening a CSV file for the csv module. Without it, writing on Windows produces a blank line after every row.
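
One detail that tends to surprise people is that csv gives every value back as a string. The round trip below, a minimal sketch in a temporary directory, writes two rows with DictWriter, reads them with DictReader, and converts the umur column back to int by hand.

```python
import csv
import tempfile
from pathlib import Path

rows = [{"nama": "Budi", "umur": 25}, {"nama": "Ani", "umur": 30}]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "users.csv"

    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["nama", "umur"])
        writer.writeheader()
        writer.writerows(rows)

    with open(path, "r", encoding="utf-8", newline="") as f:
        loaded = list(csv.DictReader(f))

# CSV has no types: 25 was written as text and comes back as "25"
typed = [{"nama": r["nama"], "umur": int(r["umur"])} for r in loaded]

print(loaded[0])   # {'nama': 'Budi', 'umur': '25'}
print(typed[0])    # {'nama': 'Budi', 'umur': 25}
```

Forgetting this conversion is a common source of bugs such as "25" + 1 raising TypeError, or numbers being sorted alphabetically.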

Pandas: More Powerful (Covered in Phase 4)

import pandas as pd

df = pd.read_csv("users.csv")
df.to_csv("output.csv", index=False)

For ML you will usually use pandas. But understand the csv module first.


Part 5: Modules

Module = a single .py file.

Creating Your Own Module

math_helpers.py:

PI = 3.14159

def luas_lingkaran(r):
    return PI * r ** 2

def keliling_lingkaran(r):
    return 2 * PI * r

main.py:

import math_helpers

print(math_helpers.luas_lingkaran(5))
print(math_helpers.PI)

Ways to Import

# Import the full module
import math_helpers
math_helpers.luas_lingkaran(5)

# Import specific names
from math_helpers import luas_lingkaran, PI
luas_lingkaran(5)

# Alias
import math_helpers as mh
mh.luas_lingkaran(5)

import numpy as np    # standard convention

# Import everything (AVOID: namespace pollution)
from math_helpers import *

__name__ == "__main__"

An important pattern for files that can be both run and imported:

script.py:

def main():
    print("Running as script")

if __name__ == "__main__":
    main()

  • Run python script.py → __name__ is "__main__" → main() runs
  • Run import script from another file → __name__ is "script" → main() does not run

Always use this pattern for files that act as scripts.
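
A slightly fuller variant of the pattern lets main() accept its arguments as a parameter, which makes the script easy to call from tests or from another module. This is one possible layout, not the only one; the argv parameter and the return value are choices made for this sketch.

```python
import sys


def main(argv=None):
    """Entry point; argv defaults to the real command-line arguments."""
    args = sys.argv[1:] if argv is None else list(argv)
    print(f"Received {len(args)} argument(s): {args}")
    return len(args)


if __name__ == "__main__":
    # Only runs when the file is executed directly, not when it is imported
    main()
```

From another file you can now call main(["data.csv", "--verbose"]) directly, without touching sys.argv.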


Part 6: Packages

Package Structure Diagram

flowchart TD
    P[📚 my_project/] --> M[main.py]
    P --> U[📦 utils/]
    U --> I[__init__.py]
    U --> MH[math_helpers.py]
    U --> SH[string_helpers.py]
    U --> SUB[📦 io/]
    SUB --> I2[__init__.py]
    SUB --> JH[json_helpers.py]

Analogy: a package = a chest of drawers. Each drawer (sub-package) holds tools (modules). __init__.py = the label on the drawer that tells you what is inside.

Package = a folder containing modules plus an __init__.py file.

Package Structure

my_project/
├── main.py
└── utils/
    ├── __init__.py
    ├── math_helpers.py
    └── string_helpers.py

utils/__init__.py:

# Can be empty, or:
from .math_helpers import luas_lingkaran
from .string_helpers import slugify

utils/math_helpers.py:

def luas_lingkaran(r):
    return 3.14 * r ** 2

main.py:

from utils.math_helpers import luas_lingkaran
# or (if __init__.py already exposes it)
from utils import luas_lingkaran

Relative Import (Inside a Package)

utils/string_helpers.py:

from .math_helpers import luas_lingkaran    # . = current package (utils)
from ..other_pkg import x                    # .. = parent package; only valid when utils is nested inside another package

Best practice: when in doubt, use an absolute import (from utils.math_helpers import ...).
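
To see the package mechanics end to end in a single runnable file, the sketch below writes a tiny package into a temporary folder, puts that folder on sys.path, and imports it. The package name demo_utils is made up (chosen to avoid clashing with any real utils package); in a real project you would simply create the folders by hand.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_utils"       # hypothetical package name
    pkg.mkdir()
    (pkg / "math_helpers.py").write_text(
        "def luas_lingkaran(r):\n    return 3.14 * r ** 2\n", encoding="utf-8"
    )
    # __init__.py re-exports the function with a relative import
    (pkg / "__init__.py").write_text(
        "from .math_helpers import luas_lingkaran\n", encoding="utf-8"
    )

    sys.path.insert(0, tmp)              # make the temp folder importable
    try:
        importlib.invalidate_caches()
        demo_utils = importlib.import_module("demo_utils")
        area = demo_utils.luas_lingkaran(2)
        origin = demo_utils.luas_lingkaran.__module__
    finally:
        sys.path.remove(tmp)

print(area)
print(origin)    # demo_utils.math_helpers
```

The function is reachable as demo_utils.luas_lingkaran because __init__.py exposes it, while its __module__ attribute still shows where it really lives.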


Part 7: Standard Library Highlights

Python ships with a very complete stdlib. The parts used most often in AI work:

os

import os

os.environ["API_KEY"]       # read an environment variable (KeyError if unset)
os.environ.get("API_KEY", default="abc")    # same, with a fallback value
os.cpu_count()              # number of CPUs
os.makedirs("a/b/c", exist_ok=True)
os.makedirs("a/b/c", exist_ok=True)

sys

import sys

sys.argv                    # command line args
sys.exit(0)                 # exit the program
sys.path                    # import paths

datetime

from datetime import datetime, timedelta, date

now = datetime.now()
today = date.today()

# Format
now.strftime("%Y-%m-%d %H:%M:%S")    # "2026-05-13 14:30:45"

# Parse
datetime.strptime("2026-05-13", "%Y-%m-%d")

# Arithmetic
besok = now + timedelta(days=1)
seminggu_lalu = now - timedelta(weeks=1)

random

import random

random.seed(42)                 # reproducibility (important in ML!); call it before drawing

random.random()                 # float in [0.0, 1.0)
random.uniform(1, 10)           # float between 1 and 10
random.randint(1, 100)          # int 1-100 (inclusive)
random.choice([1, 2, 3])        # pick one at random
random.sample(range(10), 3)     # 3 unique values from 0-9

my_list = [1, 2, 3, 4, 5]
random.shuffle(my_list)         # in-place shuffle
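
What "reproducibility" means in practice: seeding with the same value replays the same sequence. The sketch below checks that, and also shows a separate random.Random instance, which avoids touching the global generator. The seed values 42 and 7 are arbitrary.

```python
import random

# Same seed, same sequence
random.seed(42)
first = [random.randint(1, 100) for _ in range(5)]
random.seed(42)
second = [random.randint(1, 100) for _ in range(5)]
print(first == second)    # True

# A dedicated generator keeps your randomness isolated from other code
rng_a = random.Random(7)
rng_b = random.Random(7)
deck_a = list(range(10))
deck_b = list(range(10))
rng_a.shuffle(deck_a)
rng_b.shuffle(deck_b)
print(deck_a == deck_b)   # True
```

Libraries such as NumPy keep their own random state, so seeding the random module does not seed them.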

math

import math

math.pi
math.e
math.sqrt(16)
math.log(100, 10)
math.exp(1)
math.floor(3.7), math.ceil(3.2)

collections

from collections import Counter, defaultdict, deque

# Counter
words = ["a", "b", "a", "c", "a", "b"]
c = Counter(words)
print(c)               # Counter({'a': 3, 'b': 2, 'c': 1})
print(c.most_common(2)) # [('a', 3), ('b', 2)]

# defaultdict: creates missing keys automatically
d = defaultdict(list)
d["a"].append(1)       # no need to check for the key first
d["a"].append(2)
print(d["a"])          # [1, 2]
print(d["b"])          # [] (created on access)

# deque: fast append/pop at both ends
q = deque([1, 2, 3])
q.appendleft(0)        # deque([0, 1, 2, 3])
q.popleft()            # 0

itertools

import itertools

# Permutations & combinations
list(itertools.permutations([1, 2, 3], 2))
# [(1,2), (1,3), (2,1), (2,3), (3,1), (3,2)]

list(itertools.combinations([1, 2, 3], 2))
# [(1,2), (1,3), (2,3)]

list(itertools.product([1, 2], ["a", "b"]))
# [(1,'a'), (1,'b'), (2,'a'), (2,'b')]

# Chain: join iterables
list(itertools.chain([1, 2], [3, 4]))      # [1, 2, 3, 4]

# enumerate with start (a built-in, no import needed)
list(enumerate(["a", "b"], start=1))        # [(1, 'a'), (2, 'b')]

functools

from functools import lru_cache, reduce

# lru_cache: memoize a function's results (for pure functions)
@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

fib(100)    # fast because results are cached

# reduce
reduce(lambda x, y: x + y, [1, 2, 3, 4])    # 10
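
To check that the cache is really doing the work, lru_cache exposes cache_info(). The sketch below redefines fib so it is self-contained and inspects the counters after one call; n = 30 is an arbitrary choice.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


value = fib(30)
info = fib.cache_info()

print(value)          # 832040
print(info.misses)    # 31: each of fib(0)..fib(30) was computed exactly once
print(info.hits > 0)  # True: repeated sub-calls were answered from the cache
```

Without the decorator the same call makes over a million recursive calls. lru_cache only works when every argument is hashable, so it cannot memoize a function that takes a list or a dict.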

Part 8: Pip & Package Management

Install Package

pip install numpy
pip install numpy==1.24.0          # specific version
pip install "numpy>=1.20,<2.0"     # range
pip install -r requirements.txt    # from a file

requirements.txt

A file that lists dependencies:

numpy==1.24.0
pandas>=2.0
matplotlib
scikit-learn>=1.3

Commands to export and restore:

pip freeze > requirements.txt      # export current environment
pip install -r requirements.txt    # restore

Virtual Environment (Already Set Up in Phase 1)

conda activate ai-prep             # always activate first
pip install package-name

Common Mistakes & FAQ

❌ Mistake 1: Not using with (forgetting to close)

# ❌ leaks the file handle if an exception occurs
f = open("data.txt")
data = f.read()
# if the program crashes here, f is never closed
f.close()

# ✅ use with
with open("data.txt") as f:
    data = f.read()

❌ Mistake 2: Mode "w" overwrites without warning

# ❌ if output.txt already exists, its contents are completely LOST
with open("output.txt", "w") as f:
    f.write("baru")

# To append, use "a"
with open("output.txt", "a") as f:
    f.write("tambah")

❌ Mistake 3: Forgetting the encoding

# ❌ Seemingly random encoding errors on Windows
with open("data.txt") as f:
    text = f.read()    # may raise UnicodeDecodeError or silently produce mojibake

# ✅ Always specify it
with open("data.txt", encoding="utf-8") as f:
    text = f.read()

❌ Mistake 4: Using os.path when pathlib is available

# Old way
import os
path = os.path.join(os.path.expanduser("~"), "Documents", "file.txt")
if os.path.exists(path):
    with open(path) as f:
        ...

# Modern way
from pathlib import Path
path = Path.home() / "Documents" / "file.txt"
if path.exists():
    text = path.read_text(encoding="utf-8")

❌ Mistake 5: Forgetting newline="" when writing CSV on Windows

# ❌ on Windows there will be a blank line between every row
with open("out.csv", "w") as f:
    csv.writer(f).writerows(data)

# ✅
with open("out.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(data)

❌ Mistake 6: Circular import

# a.py
from b import func_b

def func_a(): ...

# b.py
from a import func_a    # ❌ circular!

def func_b(): ...

Fix: restructure (move the shared code into a third file) or do the import inside the function.

FAQ

Q: Which types does JSON support? A: dict, list, str, int, float, bool, and None (null). A tuple is written as an array and comes back as a list. Not supported: set, datetime, custom objects. Convert them before dumping.

Q: CSV vs JSON, when to use which? A:

  • CSV = flat tabular data (rows × columns), heavy data analysis
  • JSON = nested/structured data, config, API responses

Q: Is from x import * okay? A: Avoid it. It pollutes the namespace, makes it hard to trace where a name came from, and can shadow built-ins. Use it only in very special cases.

Q: What is the difference between importing at the top of the file and importing inside a function? A: Importing at the top is the standard. Python loads a module once and caches it in sys.modules, so repeated imports are cheap. Import inside a function only to avoid a circular import or to handle an optional dependency.

Q: pathlib vs os.path? A: pathlib is cleaner and cross-platform. Use pathlib; keep os.path for legacy code.

Q: Is __init__.py required? A: Since Python 3.3 there are namespace packages (no __init__.py). Using __init__.py is still recommended: it is explicit and lets you expose the package API.


Check Your Understanding

  • Can you read/write files with with open()?
  • Do you use pathlib to handle paths?
  • Can you parse a JSON file into a dict?
  • Can you read CSV with DictReader?
  • Do you know the difference between a module and a package?
  • Do you know what __name__ == "__main__" is for?
  • Do you know the main functions in the stdlib (datetime, random, collections)?

Challenge 2.7

Challenge 1: Notes App

Build a CLI notes app that can:

  • Add a note (saved to notes.json)
  • List all notes
  • Search notes by keyword
  • Delete a note by ID
  • Quit (save and exit)

Note format:

{
  "id": 1,
  "title": "Belajar Python",
  "content": "Hari ini belajar OOP...",
  "created": "2026-05-13T10:30:00"
}

Challenge 2: CSV to JSON Converter

Function csv_to_json(csv_path, json_path):

  • Read the CSV
  • Convert it to a list of dicts
  • Save to JSON with pretty printing

Bonus: handle field types (auto-detect int/float).

Challenge 3: Log Analyzer

Given a log file:

2026-05-13 10:30:45 [INFO] App started
2026-05-13 10:31:02 [ERROR] DB connection failed
...

Write a script that:

  1. Counts the number of log entries per level
  2. Lists the 10 most recent ERROR entries
  3. Finds log entries between time X and time Y
  4. Saves a report to JSON

Use regex for parsing.

Challenge 4: Word Count CLI

A script that reads a text file and outputs:

  • Total words
  • Total sentences (split on .!?)
  • Top 10 most frequent words
  • A report saved to report.json

Run: python wordcount.py myfile.txt

Use sys.argv to get the filename.

Challenge 5: Mini Database

The file db.json stores a list of users.

Create a module database.py with these functions:

  • load_db()
  • save_db(data)
  • add_user(user)
  • find_user(id)
  • update_user(id, data)
  • delete_user(id)

Test it from main.py with an import.

Challenge 6: Folder Organizer

A script that:

  • Scans the Downloads folder
  • Moves files based on their extension:
    • .jpg, .png → Images/ folder
    • .pdf → Documents/
    • .mp3, .wav → Music/
    • etc.

Use pathlib and shutil.

Challenge 7: Your Own Package

Create a package my_utils/:

  • __init__.py
  • text.py: functions clean_text, slugify
  • math.py: functions luas, keliling
  • file.py: functions load_json, save_json

Create tests.py outside the package, then import and test.

Challenge 8: Mini Translator (Using an LLM API)

If you have a Gemini API key (free tier):

  • Read text from a file
  • Translate it to the target language with the LLM
  • Save the result

# pip install google-generativeai
import google.generativeai as genai

genai.configure(api_key="YOUR_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

def translate(text, target_lang):
    response = model.generate_content(
        f"Translate to {target_lang}: {text}"
    )
    return response.text

Your first taste of an LLM API. Nothing fancy yet, just get a feel for it.


Next: 08-error-handling.md, error handling and debugging. 50% of an ML practitioner's work.