11 — Mini Projects

Estimated time: 8 hours (4 hours per project, or pick just 1)
Goal: Consolidate all your Python skills through real projects.
Output: 2-3 GitHub repos you can show off.

Pick at least 1 project. Ideally 2-3.

Why This Material Matters

This section is the transition from learning to practice. Theory and exercises matter, but Dicoding recruiters (and recruiters anywhere) do not care that you know list comprehensions; they want to see that you can solve a problem end-to-end. The projects in this file are designed to exercise all the skills from files 01-10 at once: file I/O, OOP, error handling, CLI parsing, modular code.

More strategically, the projects here can evolve in later phases. The Knowledge Base CLI in Project 3 can become a RAG system in Phase 7. The Gemini API wrapper in Project 4 becomes the foundation for LLM applications in Phase 6+. What you invest here compounds; none of it is wasted.

The big analogy: a project is the practical driving test. You have read the road-sign handbook and memorized the theory. Now it is time to drive on a real road, with traffic jams, rain, and other drivers. That is what recruiters evaluate.

Project Map

flowchart TD
    A[🎯 Mini Projects] --> P1[📝 Todo CLI]
    A --> P2[🌐 News Scraper]
    A --> P3[📚 Knowledge Base]
    A --> P4[🤖 Gemini Wrapper]

    P1 --> S1[OOP, JSON, argparse]
    P2 --> S2[requests, regex, error handling]
    P3 --> S3[OOP, JSON, regex, search]
    P4 --> S4[OOP, requests, env, type hints]

    P3 -.->|evolves to| RAG[🔮 RAG System Phase 7]
    P4 -.->|foundation for| LLM[🔮 LLM Apps Phase 6+]

Recommended order:

Project 1 (Todo CLI): the easiest, a complete workout of the basics

Project 3 (Knowledge Base): strategic, you will reuse it later

Project 4 (Gemini Wrapper): your first experience with an LLM API

Project 2 (Scraper): if you still have time

Project 1 — CLI Todo App

Skill: OOP, JSON, file I/O, argparse, datetime

Architecture Diagram

flowchart LR
    U[👤 User] -->|CLI command| CLI[argparse]
    CLI --> TM[⚙️ TodoManager]
    TM --> T[📦 Todo dataclass]
    TM <-->|load/save| F[💾 todos.json]
    TM --> O[📤 Output to stdout]

Spec

A CLI app for managing a todo list. Data is saved to todos.json.

python todo.py add "Belajar Python" --priority high --due 2026-05-20
python todo.py list
python todo.py list --status pending
python todo.py done 1
python todo.py delete 2
python todo.py search "python"

Data Format

[
  {
    "id": 1,
    "title": "Belajar Python",
    "priority": "high",
    "status": "pending",
    "created": "2026-05-13T10:30:00",
    "due": "2026-05-20",
    "completed": null
  }
]

Class Structure

from dataclasses import dataclass, field, asdict
from datetime import datetime
from pathlib import Path
from typing import Optional
import json
import argparse

@dataclass
class Todo:
    id: int
    title: str
    priority: str = "medium"
    status: str = "pending"
    created: str = field(default_factory=lambda: datetime.now().isoformat())
    due: Optional[str] = None
    completed: Optional[str] = None

class TodoManager:
    def __init__(self, path="todos.json"):
        self.path = Path(path)
        self.todos: list[Todo] = []
        self.load()
    
    def load(self): ...
    def save(self): ...
    def add(self, title: str, **kwargs) -> Todo: ...
    def done(self, id: int): ...
    def delete(self, id: int): ...
    def list_todos(self, status: Optional[str] = None) -> list[Todo]: ...
    def search(self, keyword: str) -> list[Todo]: ...

def main():
    parser = argparse.ArgumentParser()
    # required=True makes argparse print usage instead of silently doing nothing
    subparsers = parser.add_subparsers(dest="command", required=True)
    
    add_parser = subparsers.add_parser("add")
    add_parser.add_argument("title")
    add_parser.add_argument("--priority", choices=["low", "medium", "high"], default="medium")
    add_parser.add_argument("--due")
    
    # ... other subparsers
    
    args = parser.parse_args()
    mgr = TodoManager()
    
    if args.command == "add":
        todo = mgr.add(args.title, priority=args.priority, due=args.due)
        print(f"Added: {todo.title}")
    # ... handle the other commands

if __name__ == "__main__":
    main()
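
The skeleton above leaves the TodoManager methods as stubs. As a rough guide to what the persistence side could look like, here is a minimal sketch of load, save, add, done, and search. The "done" status value and the id-assignment rule are assumptions for illustration, since the spec only shows a "pending" record; list_todos and delete are left out on purpose.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Optional


@dataclass
class Todo:
    id: int
    title: str
    priority: str = "medium"
    status: str = "pending"
    created: str = field(default_factory=lambda: datetime.now().isoformat())
    due: Optional[str] = None
    completed: Optional[str] = None


class TodoManager:
    def __init__(self, path="todos.json"):
        self.path = Path(path)
        self.todos: list[Todo] = []
        self.load()

    def load(self):
        # A missing file simply means there are no todos yet.
        if not self.path.exists():
            return
        data = json.loads(self.path.read_text(encoding="utf-8"))
        self.todos = [Todo(**item) for item in data]

    def save(self):
        data = [asdict(t) for t in self.todos]
        self.path.write_text(
            json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8"
        )

    def add(self, title: str, **kwargs) -> Todo:
        # Assumption: ids are never reused; the next id is max + 1.
        next_id = max((t.id for t in self.todos), default=0) + 1
        todo = Todo(id=next_id, title=title, **kwargs)
        self.todos.append(todo)
        self.save()
        return todo

    def done(self, id: int):
        for todo in self.todos:
            if todo.id == id:
                todo.status = "done"  # assumed status value
                todo.completed = datetime.now().isoformat()
                self.save()
                return
        raise ValueError(f"Todo {id} not found")

    def search(self, keyword: str) -> list[Todo]:
        # Case-insensitive substring match on the title only.
        return [t for t in self.todos if keyword.lower() in t.title.lower()]


# Run against a throwaway file so nothing in the working directory is touched.
with tempfile.TemporaryDirectory() as tmp:
    mgr = TodoManager(Path(tmp) / "todos.json")
    mgr.add("Belajar Python", priority="high", due="2026-05-20")
    mgr.done(1)
    reloaded = TodoManager(Path(tmp) / "todos.json")
    print(reloaded.todos[0].status)  # done
```

Saving after every mutation keeps the file and memory in sync at the cost of extra writes, which is a reasonable trade-off for a list this small.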

Bonus Features

Color output (using colorama or ANSI codes)
Filter by priority + due date
Sort options
Export to CSV
Backup before save

Submission

Push to GitHub: dicoding-genai-prep/projects/01-todo-cli/

README.md must include:

Demo (screenshot or GIF)
How to install + run
Feature list
Code structure

Project 2 — News Web Scraper

Skill: requests, regex, BeautifulSoup, file I/O, error handling

Scraper Pipeline Diagram

flowchart LR
    U[🌐 URL] -->|GET| F[fetch]
    F -->|HTML| P[parse_article]
    P -->|dict| L[list articles]
    L --> S1[💾 save_json]
    L --> S2[💾 save_csv]
    F -->|error| E[🚨 logger.error]

Spec

A script that scrapes news from an Indonesian news portal and saves it to JSON and CSV.

Tech Stack

pip install requests beautifulsoup4 lxml

Implementation Outline

import requests
from bs4 import BeautifulSoup
from pathlib import Path
import json
import csv
from datetime import datetime
import time
import logging

logger = logging.getLogger(__name__)

class NewsScraper:
    def __init__(self, base_url: str, delay: float = 1.0):
        self.base_url = base_url
        self.delay = delay
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (educational scraping)"
        })
    
    def fetch(self, url: str) -> str:
        try:
            r = self.session.get(url, timeout=10)
            r.raise_for_status()
            return r.text
        except requests.RequestException as e:
            logger.error(f"Failed to fetch {url}: {e}")
            return ""
    
    def parse_article(self, html: str, url: str) -> dict:
        soup = BeautifulSoup(html, "lxml")
        h1 = soup.find("h1")  # None when the page has no <h1>
        return {
            "title": h1.get_text(strip=True) if h1 else "",
            "content": " ".join(p.get_text(strip=True) for p in soup.find_all("p")),
            "date": None,  # site-specific: fill in with the selector your target site uses
            "url": url,
        }
    
    def get_article_urls(self, list_page_html: str) -> list[str]:
        soup = BeautifulSoup(list_page_html, "lxml")
        # Only anchors that carry an href; relative links may still need
        # urllib.parse.urljoin(self.base_url, href) before fetching.
        return [a["href"] for a in soup.select("article a[href]")]
    
    def scrape(self, max_articles: int = 10) -> list[dict]:
        articles = []
        list_html = self.fetch(self.base_url)
        urls = self.get_article_urls(list_html)[:max_articles]
        
        for url in urls:
            time.sleep(self.delay)   # be polite
            html = self.fetch(url)
            if html:  # fetch() returns "" on failure
                article = self.parse_article(html, url)
                article["scraped_at"] = datetime.now().isoformat()
                articles.append(article)
        
        return articles
    
    def save_json(self, articles: list[dict], path: str):
        Path(path).write_text(
            json.dumps(articles, indent=2, ensure_ascii=False),
            encoding="utf-8"
        )
    
    def save_csv(self, articles: list[dict], path: str):
        if not articles:
            return
        with open(path, "w", encoding="utf-8", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=articles[0].keys())
            writer.writeheader()
            writer.writerows(articles)

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    scraper = NewsScraper("https://example-news.com")
    articles = scraper.scrape(max_articles=20)
    scraper.save_json(articles, "news.json")
    scraper.save_csv(articles, "news.csv")
    print(f"Scraped {len(articles)} articles")

Scraping Ethics Rules

Check robots.txt first (https://site.com/robots.txt)
Respect rate limits: add a delay between requests
Set a sensible User-Agent
Do not scrape restricted content (login required, paid content)
Do not overload small servers
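
The robots.txt check in the first rule can be automated with the standard library. The sketch below parses robots.txt text that is already in memory, so it runs offline; the helper name is_allowed, the sample rules, and the "my-bot" user agent are illustrative assumptions, not part of the project spec.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_ROBOTS, "my-bot", "https://example.com/news/1"))     # True
print(is_allowed(SAMPLE_ROBOTS, "my-bot", "https://example.com/private/x"))  # False
```

In a real scraper you would fetch the robots.txt text once per site (or call the parser's set_url and read methods) and consult it before each request.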

Bonus Features

Pagination (multi-page)
Duplicate article detection
Simple sentiment analysis (count positive/negative keywords)
Scheduling with the schedule library

Submission

dicoding-genai-prep/projects/02-news-scraper/

Project 3 — Personal Knowledge Base CLI

Skill: OOP, JSON, regex, search, advanced Python

This is the most strategic project because you can later evolve it into a RAG system in Phase 7.

Knowledge Base Diagram

classDiagram
    class Note {
        +int id
        +str title
        +str content
        +list~str~ tags
        +list~int~ links
        +str created
    }
    class KnowledgeBase {
        -Path path
        -list~Note~ notes
        +new(title, content, tags)
        +get(id)
        +search(query)
        +filter_by_tag(tag)
        +link(source, target)
        +graph()
    }
    KnowledgeBase "1" *-- "*" Note : composition

Evolution into a RAG System (Phase 7)

flowchart LR
    KB[📚 KB today<br/>keyword search] -->|add| EMB[🔢 Embedding per note]
    EMB --> VDB[🗄️ Vector DB]
    VDB --> SEM[🔎 Semantic search]
    SEM --> RAG[🤖 RAG Q&A]

Spec

A CLI knowledge base for storing personal notes, with tagging, search, and links between notes.

python kb.py new "Belajar AI" --tags "ai,learning,bootcamp"
python kb.py list --tag ai
python kb.py search "machine learning"
python kb.py view 5
python kb.py edit 5
python kb.py link 5 7      # link note 5 to note 7
python kb.py graph         # show all links
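
The commands above pass tags as one comma-separated string and note ids as positional arguments. A minimal sketch of how argparse could turn those into a list and integers follows; the helper names parse_tags and build_parser, and the choice to lowercase tags, are assumptions made for this example. Only the new and link subcommands are wired up.

```python
import argparse


def parse_tags(raw: str) -> list[str]:
    """Split 'ai,learning' into ['ai', 'learning'], dropping empty pieces."""
    return [tag.strip().lower() for tag in raw.split(",") if tag.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="kb.py")
    sub = parser.add_subparsers(dest="command", required=True)

    new = sub.add_parser("new")
    new.add_argument("title")
    # argparse calls parse_tags on the raw string, so args.tags is a list.
    new.add_argument("--tags", type=parse_tags, default=[])

    link = sub.add_parser("link")
    link.add_argument("source", type=int)
    link.add_argument("target", type=int)
    return parser


args = build_parser().parse_args(["new", "Belajar AI", "--tags", "ai,learning,bootcamp"])
print(args.title, args.tags)  # Belajar AI ['ai', 'learning', 'bootcamp']
```

Doing the conversion in the parser keeps KnowledgeBase.new free of string handling: it always receives a clean list.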

Note Format

{
  "id": 1,
  "title": "Belajar AI",
  "content": "...markdown content...",
  "tags": ["ai", "learning"],
  "links": [3, 5],
  "created": "2026-05-13T10:00:00",
  "updated": "2026-05-13T11:00:00"
}

Implementation Outline

from dataclasses import dataclass, field, asdict
from datetime import datetime
from pathlib import Path
from typing import Optional
import json
import re
import argparse

@dataclass
class Note:
    id: int
    title: str
    content: str = ""
    tags: list[str] = field(default_factory=list)
    links: list[int] = field(default_factory=list)
    created: str = field(default_factory=lambda: datetime.now().isoformat())
    updated: Optional[str] = None

class KnowledgeBase:
    def __init__(self, path: str = "kb.json"):
        self.path = Path(path)
        self.notes: list[Note] = []
        self.load()
    
    def _next_id(self) -> int:
        return max((n.id for n in self.notes), default=0) + 1
    
    def load(self):
        if not self.path.exists():
            return
        data = json.loads(self.path.read_text(encoding="utf-8"))
        self.notes = [Note(**n) for n in data]
    
    def save(self):
        data = [asdict(n) for n in self.notes]
        self.path.write_text(
            json.dumps(data, indent=2, ensure_ascii=False),
            encoding="utf-8"
        )
    
    def new(self, title: str, content: str = "", tags: Optional[list[str]] = None) -> Note:
        note = Note(
            id=self._next_id(),
            title=title,
            content=content,
            tags=tags or []
        )
        self.notes.append(note)
        self.save()
        return note
    
    def get(self, id: int) -> Optional[Note]:
        return next((n for n in self.notes if n.id == id), None)
    
    def search(self, query: str) -> list[Note]:
        """Full-text search in title + content."""
        pattern = re.compile(re.escape(query), re.IGNORECASE)
        return [
            n for n in self.notes
            if pattern.search(n.title) or pattern.search(n.content)
        ]
    
    def filter_by_tag(self, tag: str) -> list[Note]:
        return [n for n in self.notes if tag in n.tags]
    
    def link(self, source_id: int, target_id: int):
        source = self.get(source_id)
        if not source:
            raise ValueError(f"Note {source_id} not found")
        if not self.get(target_id):
            raise ValueError(f"Note {target_id} not found")
        if target_id not in source.links:
            source.links.append(target_id)
            source.updated = datetime.now().isoformat()
            self.save()
    
    def graph(self) -> dict[int, list[int]]:
        """Return adjacency list."""
        return {n.id: n.links for n in self.notes}

Bonus Features (Try at Least 2)

Markdown rendering in the terminal with the rich library
Backlinks: linking note A to note B automatically gives note B a backlink from A
Tag autocomplete from existing tags
Export to markdown: generate a notes_md/ folder with one file per note
Stats: total notes, top tags, orphan notes (no links)
TUI with the textual library (advanced)
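
Backlinks and orphan detection can both be derived from the adjacency list that KnowledgeBase.graph() returns, without storing anything extra. The sketch below works on a plain dict of that shape; the function names and the sample graph are illustrative assumptions, and "orphan" is taken here to mean a note with neither outgoing nor incoming links.

```python
def backlinks(graph: dict[int, list[int]]) -> dict[int, list[int]]:
    """Invert the adjacency list: for each note, which notes link to it."""
    result: dict[int, list[int]] = {note_id: [] for note_id in graph}
    for source, targets in graph.items():
        for target in targets:
            # setdefault tolerates links to ids that are missing from the graph.
            result.setdefault(target, []).append(source)
    return result


def orphans(graph: dict[int, list[int]]) -> list[int]:
    """Notes with no outgoing and no incoming links."""
    incoming = backlinks(graph)
    return [nid for nid, out in graph.items() if not out and not incoming[nid]]


# Sample graph: note 1 links to 2 and 3, note 2 links to 3, note 4 is isolated.
sample = {1: [2, 3], 2: [3], 3: [], 4: []}
print(backlinks(sample))  # {1: [], 2: [1], 3: [1, 2], 4: []}
print(orphans(sample))    # [4]
```

Computing backlinks on demand avoids keeping two lists in sync on every link or delete; storing them on each note is the alternative if the graph grows large.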

Why Is This Project Strategic?

In Phase 7 you will build a RAG system. This knowledge base can be upgraded:

Add an embedding to each note
Save to a vector DB
Search becomes semantic instead of keyword-based
AI question answering grounded in your own notes

This project becomes the dataset and starting point for your Dicoding capstone later.

Submission

dicoding-genai-prep/projects/03-knowledge-base/

Project 4 (Bonus) — API Wrapper Library

Skill: OOP, requests, env vars, error handling, type hints

Spec

Build a Python wrapper for the Gemini API (free tier). It becomes a small library you can reuse in other projects.

import os

from my_gemini import GeminiClient

client = GeminiClient(api_key=os.environ["GEMINI_API_KEY"])

response = client.generate("Tulis puisi tentang Bandung")
print(response.text)

# Streaming
for chunk in client.generate_stream("..."):
    print(chunk, end="", flush=True)

# Chat (multi-turn)
chat = client.start_chat()
chat.send("Halo")
chat.send("Siapa kamu?")
print(chat.history)

Implementation

import os
import requests
from typing import Optional, Iterator
from dataclasses import dataclass

@dataclass
class GenerateResponse:
    text: str
    model: str
    usage: dict

class GeminiClient:
    BASE_URL = "https://generativelanguage.googleapis.com/v1beta"
    
    def __init__(self, api_key: Optional[str] = None, model: str = "gemini-1.5-flash"):
        self.api_key = api_key or os.environ.get("GEMINI_API_KEY")
        if not self.api_key:
            raise ValueError("API key required")
        self.model = model
    
    def generate(self, prompt: str, **kwargs) -> GenerateResponse:
        url = f"{self.BASE_URL}/models/{self.model}:generateContent"
        params = {"key": self.api_key}
        body = {"contents": [{"parts": [{"text": prompt}]}]}
        
        r = requests.post(url, params=params, json=body, timeout=30)
        r.raise_for_status()
        data = r.json()
        
        text = data["candidates"][0]["content"]["parts"][0]["text"]
        return GenerateResponse(text=text, model=self.model, usage=data.get("usageMetadata", {}))

Key Lessons

Never hardcode the API key; use an environment variable
Use python-dotenv to load the .env file
Add .env to .gitignore. NEVER commit an API key
Handle API errors gracefully
Type hints for IDE autocomplete
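
On handling API errors: the generate method above indexes data["candidates"][0]["content"]["parts"][0]["text"] directly, which raises a bare KeyError or IndexError if the response has no candidates or no text. A minimal sketch of a more defensive extraction follows. The exception class GeminiResponseError and the helper extract_text are hypothetical names for this example, and the response shape is only the one shown in the implementation above.

```python
class GeminiResponseError(Exception):
    """Raised when a response contains no usable text (hypothetical helper)."""


def extract_text(data: dict) -> str:
    """Pull the generated text out of a generateContent-style response dict."""
    candidates = data.get("candidates") or []
    if not candidates:
        raise GeminiResponseError("Response contained no candidates")
    parts = candidates[0].get("content", {}).get("parts") or []
    # Join all text parts; parts without a "text" key contribute nothing.
    text = "".join(part.get("text", "") for part in parts)
    if not text:
        raise GeminiResponseError("First candidate contained no text")
    return text


sample = {"candidates": [{"content": {"parts": [{"text": "Halo"}, {"text": " dunia"}]}}]}
print(extract_text(sample))  # Halo dunia
```

Raising one dedicated exception type lets callers write a single except clause instead of guessing which of KeyError, IndexError, or TypeError a malformed response might produce.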

Submission

dicoding-genai-prep/projects/04-gemini-wrapper/

This is your first project using an LLM API. It is not the main focus yet, but get a feel for it before Phase 6+.

How to Submit & Showcase

For each project:

1. Repo Structure

projects/01-todo-cli/
├── README.md           ← explanation, demo, usage
├── requirements.txt    ← dependencies
├── src/
│   └── todo.py
├── tests/
│   └── test_todo.py
├── .gitignore
└── .env.example        ← env var template (not the real .env!)
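
The tests/test_todo.py file in the layout above could start as small as the sketch below. It uses pytest's built-in tmp_path fixture so each test writes to its own temporary directory. The add_todo function is a hypothetical stand-in defined inline to keep the example self-contained; in the real project you would import your own code from src/ instead.

```python
import json
import tempfile
from pathlib import Path


def add_todo(path: Path, title: str) -> dict:
    """Hypothetical stand-in for the real code under test."""
    todos = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    todo = {"id": len(todos) + 1, "title": title, "status": "pending"}
    todos.append(todo)
    path.write_text(json.dumps(todos, indent=2), encoding="utf-8")
    return todo


def test_add_creates_file(tmp_path: Path):
    add_todo(tmp_path / "todos.json", "Belajar Python")
    assert (tmp_path / "todos.json").exists()


def test_ids_increment(tmp_path: Path):
    path = tmp_path / "todos.json"
    first = add_todo(path, "one")
    second = add_todo(path, "two")
    assert (first["id"], second["id"]) == (1, 2)


def test_new_todo_is_pending(tmp_path: Path):
    todo = add_todo(tmp_path / "todos.json", "Belajar Python")
    assert todo["status"] == "pending"


if __name__ == "__main__":
    # Fallback runner so the file also works without pytest installed.
    for test in (test_add_creates_file, test_ids_increment, test_new_todo_is_pending):
        with tempfile.TemporaryDirectory() as tmp:
            test(Path(tmp))
    print("all tests passed")
```

Each test checks one behavior and never touches the real todos.json, which keeps the suite safe to run repeatedly.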

2. README Template

# Todo CLI App

Manage todos from the command line.

## Demo

![demo](demo.gif)

## Install

\`\`\`bash
git clone <repo>
cd 01-todo-cli
pip install -r requirements.txt
\`\`\`

## Usage

\`\`\`bash
python src/todo.py add "Belajar Python"
python src/todo.py list
\`\`\`

## Features

- Add, list, complete, delete todos
- Filter by status, priority
- Search by keyword
- Save to JSON

## Tech

- Python 3.11
- argparse, json, dataclasses

## Tests

\`\`\`bash
pytest tests/
\`\`\`

3. Add It to Your Profile README

Update your GitHub profile README with links to your projects. When HR takes a peek, the projects are visible right away.

Common Mistakes & FAQ

❌ Mistake 1: Committing an API key to Git

# ❌ FATAL: the key stays in the Git history even if you delete the file later
git add .env
git commit -m "config"

# ✅ Always use .gitignore
echo ".env" >> .gitignore
echo "*.key" >> .gitignore

# ✅ Use .env.example as a template
echo "GEMINI_API_KEY=your-key-here" > .env.example

If you have already committed it:

Rotate the API key immediately (assume the key has leaked)
Use git filter-repo to remove it from the history
Force push (if it is a personal project)
❌ Mistake 2: No error handling around network calls

# ❌ crashes the first time the API is down
response = requests.get(url)
data = response.json()

# ✅ Handle network errors + timeout
data = None  # stays None if the request fails
try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.Timeout:
    logger.warning("Request timed out")
except requests.exceptions.HTTPError as e:
    logger.error(f"HTTP error: {e}")
except requests.exceptions.RequestException as e:
    # Catch-all for connection errors and other request failures
    logger.error(f"Request failed: {e}")
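
The snippet above logs failures but never actually retries. If you want retries, a small helper with exponential backoff is one option. The sketch below is library-agnostic so it can run without network access: the name with_retry and its parameters are assumptions for this example, and with requests you would pass retry_on=(requests.exceptions.Timeout, requests.exceptions.ConnectionError).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    call: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on a retryable error wait base_delay * 2**attempt and try again."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return call()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Simulated flaky call: fails twice, then succeeds.
attempts = []

def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

delays = []  # record the delays instead of really sleeping
result = with_retry(flaky, sleep=delays.append)
print(result, len(attempts), delays)  # ok 3 [1.0, 2.0]
```

Only transient errors are retried; an HTTP 4xx error usually means the request itself is wrong, so retrying it just wastes quota.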

❌ Mistake 3: A project without a README

A recruiter opens the repo, sees an empty README, and closes the tab. The README is your project's storefront.

At minimum:

One paragraph on "what this is"
How to install + run
Screenshot/demo
3-5 main features

❌ Mistake 4: Messy folder structure

# ❌ everything in the root
todo.py
data.json
test.py
helpers.py
README.md

# ✅ organized
todo-cli/
├── README.md
├── requirements.txt
├── .gitignore
├── src/
│   └── todo.py
├── tests/
│   └── test_todo.py
└── data/           ← .gitignore-d

❌ Mistake 5: Committing "wip" 50 times

# ❌
git log
> wip
> wip2
> fix
> wip3

# ✅ descriptive commit messages
git log
> feat: add todo deletion by id
> fix: handle missing due date in list view
> refactor: extract TodoManager save logic

FAQ

Q: Which project should I pick first? A: Project 1 (Todo CLI) as a warm-up. Then Project 3 (KB) because it is strategic for the later phases.

Q: Do mini projects need tests? A: You should definitely try pytest with at least 3-5 tests. Recruiters really appreciate candidates who test their code.

Q: Which language for the README, Indonesian or English? A: For a public repo you want to showcase to international recruiters, English. For local learning, Indonesian is fine.

Q: Can I use an LLM to help with coding? A: Yes, but: (1) understand every line, (2) test it yourself, (3) refactor it to your own style. If you just copy-paste without understanding, you are the one who loses out at interview time.

Q: How many projects are ideal for a portfolio? A: 3-5 projects with depth beat 20 shallow ones. Quality > quantity. A project with a clean README, tests, and proper documentation will stand out.

Q: How do I showcase projects to Dicoding recruiters? A:

Pin your best projects on your GitHub profile
Highlight 2-3 main projects in your profile README
Post on LinkedIn when you finish a project (with a screenshot/demo)
Update your CV with your GitHub link

Check Your Understanding

After finishing at least 1 project, make sure you can:

Set up a Python project from scratch (folders, env, dependencies)
Write a class with methods and a dataclass
Read/write JSON and CSV
Use argparse for a CLI
Handle errors with try/except + logging
Test with pytest
Write a clear README
Push to GitHub with a tidy history

Extra Challenges

Review your own code: reread it 1 week later. Refactor the bad parts.
Ask an LLM for a review: send your code to Claude/ChatGPT and ask for feedback.
Live demo: record a 2-3 minute video demo of the project. Post it on LinkedIn.
Open an issue on another project: read other people's code on GitHub and make a small contribution (typo, documentation).

Next: challenges.md, the final consolidation challenge for Phase 2 before moving on to Phase 3.