04 — Deployment

Estimated time: 4 hours
Goal: Deploy your capstone to the web for free, with a link you can share with anyone.

Why This Lesson Matters

A project that only lives on your laptop effectively does not exist. Recruiters and interviewers won't clone a repo and set up a Python environment just to look at your capstone. They click a link, and if there is no link, they move on to the next candidate. Junior LLM developers are evaluated not only on the code in GitHub but on evidence that the project is alive in production: it has a URL, a usable UX, and error handling. This lesson trains the most important habit in the AI industry: ship, don't just build. Once you can deploy a single RAG app to Streamlit Cloud or HF Spaces, you are already a step ahead of most entry-level applicants.

Analogy: Deploying = Moving Your Restaurant from a Home Kitchen to the Mall

Local dev (streamlit run app.py) = cooking in your home kitchen; only you get to taste the food
Streamlit Cloud / HF Spaces = renting a stall in a mall food court; foot traffic comes from mall visitors, and the mall covers electricity and air conditioning (the free tier)
Render / Railway / VPS = renting your own shophouse; more control, but more to manage yourself
AWS/GCP custom = building your own restaurant chain; complex, but maximum scale

For the capstone you want the food court: fast, free, and with a link you can share with recruiters.

Diagram: Deployment Pipeline

How to Read the Diagram:

Purple (left) = local stage (code + test).
Cyan = GitHub as the remote source of truth.
Amber = automatic build step (install deps, build container).
Pink = deploy to the platform (Streamlit Cloud / HF Spaces / Render).
Emerald = live URL that is publicly accessible.
Dashed line back to code = feedback loop from monitoring.

Step-by-Step Walkthrough:

Write the code on your laptop and run the tests locally.
git push to GitHub.
A webhook triggers an automatic build on the platform.
A new container or environment is created from requirements.txt.
The app goes live at a public URL.
Monitor logs and metrics; if the app is buggy or slow, go back to step 1.

Everyday Analogy: Think of it as running a restaurant continuously. The recipe is tested in the kitchen (local), sent to the branch (GitHub), the branch staff prepare the ingredients (build), the restaurant opens (deploy), customers leave reviews (monitor), and the chef tweaks the recipe (feedback).

Static Mermaid diagram as a fallback:

flowchart LR
    subgraph Dev["💻 Local Dev"]
        C["📝 Code"]
        T["🧪 Test"]
        C --> T
    end
    subgraph Repo["🐙 GitHub"]
        G["main branch"]
    end
    subgraph Build["🔨 Build (auto)"]
        I["📦 Install requirements"]
        B["🛠️ Build container"]
        I --> B
    end
    subgraph Deploy["🚀 Deploy"]
        S["🌐 Streamlit Cloud<br/>HF Spaces<br/>Render"]
    end
    subgraph Monitor["📊 Monitor"]
        L["📜 Logs"]
        M["📈 Metrics"]
    end
    Dev -->|git push| Repo
    Repo -->|webhook| Build
    Build --> Deploy
    Deploy --> Monitor
    Monitor -.feedback.-> Dev

Diagram: Deployment Platform Options

How to Read the Diagram:

Amber (left) = the starting question: what do you want to deploy?
Cyan (middle) = three main branches by stack: Streamlit, AI/ML demo, FastAPI/Flask.
Emerald (right) = the platform that fits each branch.

Step-by-Step Walkthrough:

Using Streamlit? Yes → Streamlit Cloud (the fastest option, 1-click).
AI/ML demo with Gradio? Yes → HF Spaces (high credibility).
FastAPI/Flask backend? → Render or Railway for full-stack.
Modern Next.js frontend? → Vercel.
Multi-platform deployment is fine: deploy to 2-3 platforms at once for different audiences.

Everyday Analogy: Where do you want to sell? A food court bazaar (Streamlit Cloud: fast, free, built-in traffic). A booth at an AI conference (HF Spaces: a targeted audience). Your own rented shophouse (Render/Railway: full control). A premium mall (Vercel: a polished frontend).

Static Mermaid diagram as a fallback:

flowchart TB
    Q["🤔 Mau deploy apa?"] --> ST{"Streamlit app?"}
    ST -->|Yes| S1["🥇 Streamlit Cloud<br/>(1-click, 1GB RAM)"]
    ST -->|Yes| S2["🤗 HF Spaces<br/>(Streamlit/Gradio)"]
    ST -->|No, FastAPI/Flask| F1["🚂 Railway / Render<br/>(full-stack)"]
    ST -->|No, Next.js frontend| F2["▲ Vercel<br/>(serverless)"]
    Q --> AI{"AI/ML demo?"}
    AI -->|Yes| S2
    Q --> P{"Production scale?"}
    P -->|Yes| F1
    P -->|Yes, custom| F3["☁️ AWS / GCP / Fly.io"]

Free Platform Options

1. Streamlit Community Cloud (Recommended)

1-click deploy dari GitHub
Free tier: 1GB RAM, sleeps after inactivity
URL: username-app.streamlit.app
Best for: Streamlit apps

2. Hugging Face Spaces

Generous free tier
Support Streamlit, Gradio, Docker
URL: huggingface.co/spaces/username/app
Best for: AI/ML demos

3. Render

750 free hours/month
Sleep after inactivity
Best for: full-stack apps (FastAPI + frontend)

4. Railway

$5 of free credit/month
Good for full-stack
Best for: production-ready

For a Streamlit capstone, Streamlit Cloud is the simplest choice.

Deployment Platform Comparison Table

Platform	Free Tier	Setup Effort	Best For	Cold Start	Custom Domain
Streamlit Cloud	1GB RAM, 1 active app, sleeps when idle	⭐ 1-click from GitHub	Streamlit demos, capstone	~30-60 seconds	No (on free tier)
Hugging Face Spaces	CPU basic, generous quota, sleeps when idle	⭐⭐ git push to HF repo	AI/ML demos, Gradio/Streamlit	~30-60 seconds	No (on free tier)
Render	750 hours/month, sleeps when idle	⭐⭐ Connect repo + Dockerfile	FastAPI, Flask, full-stack	~30 seconds	✅ Yes
Railway	$5 credit/month	⭐⭐ Connect repo	Production-ready, multi-service	Fast (while active)	✅ Yes
Vercel	Generous, serverless	⭐⭐ git push	Next.js, React, API routes	Fast	✅ Yes
Fly.io	3 small VMs free	⭐⭐⭐ CLI + config	Container apps, edge	Fast	✅ Yes
PythonAnywhere	1 free web app	⭐⭐ Manual upload	Simple Flask/Django apps	Always on	No (on free tier)
VPS (Hetzner, Contabo)	From $4-5/month	⭐⭐⭐⭐ SSH + Nginx + system setup	Full control, multi-app	Always on	✅ Yes

Recommended order for a GenAI capstone:

Streamlit Cloud (if the app is pure Streamlit)

HF Spaces (if you want the credibility of "deployed on Hugging Face")

Render/Railway (if there is a FastAPI backend)

Deploy to Streamlit Cloud

Step 1: Push to GitHub

cd personal-kb-assistant
git init
git add .
git commit -m "Initial: RAG chatbot"
gh repo create --public --source=. --push

Make sure .env is listed in .gitignore before you run git add, otherwise the API key ends up in the commit!

Step 2: Create `requirements.txt`

pip freeze > requirements.txt

Or use a curated minimal list:

streamlit
llama-index
llama-index-llms-gemini
llama-index-embeddings-huggingface
sentence-transformers
pypdf
python-docx
python-dotenv

Step 3: Deploy

Open share.streamlit.io
Sign in with GitHub
Click "New app"
Select the repo, branch, and file (app.py)
Add Secrets (in settings):
```
GEMINI_API_KEY = "your_key"
```
Deploy

The app goes live in 2-3 minutes at https://yourapp.streamlit.app.

Step 4: Update Secrets

Streamlit secrets are accessed with:

import streamlit as st

api_key = st.secrets["GEMINI_API_KEY"]

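Update app.py to read the secret first and fall back to the environment variable from your local .env. Accessing st.secrets when no secrets file exists raises an error, so the lookup is wrapped in try/except:

import os
import streamlit as st

try:
    api_key = st.secrets["GEMINI_API_KEY"]  # set in the Streamlit Cloud settings
except Exception:  # no secrets file locally; exact exception varies by version
    api_key = os.environ.get("GEMINI_API_KEY")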

Deploy to Hugging Face Spaces

Step 1: Create a Space

huggingface.co/new-space
Owner: your username
Space name: app-name
SDK: Streamlit
Hardware: CPU basic (free)
Visibility: public

Step 2: Push

# HF Spaces = git repo
git remote add hf https://huggingface.co/spaces/username/app-name
git push hf main

Step 3: Add Secrets

Go to space settings
"Variables and secrets"
Add GEMINI_API_KEY

Step 4: Done

The Space auto-deploys within about 5 minutes. URL: huggingface.co/spaces/username/app-name.

HF Spaces Configuration File (README.md in the root)

HF Spaces needs metadata in README.md (YAML frontmatter):

---
title: Personal KB Assistant
emoji: 📚
colorFrom: indigo
colorTo: pink
sdk: streamlit
sdk_version: 1.32.0
app_file: app.py
pinned: false
license: mit
---

# Personal Knowledge Assistant

RAG chatbot for personal documents. See [GitHub](https://github.com/...) for details.

Once this file is committed, HF Spaces knows how to build and launch your app.
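
Before pushing, it can help to confirm that the frontmatter is present and names the right entry file. The sketch below is a minimal check using only the standard library. `read_frontmatter`, `missing_keys`, and the `EXPECTED_KEYS` set are hypothetical helpers for this lesson, not part of any Hugging Face tooling, and the parser only understands flat `key: value` lines like the example above.

```python
# Assumption: these are the keys this lesson's example relies on,
# not an official list of required fields.
EXPECTED_KEYS = {"title", "sdk", "app_file"}


def read_frontmatter(text: str) -> dict:
    """Parse flat `key: value` pairs between the first two '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # the closing '---' was never found


def missing_keys(text: str) -> set:
    """Return the expected keys that the frontmatter does not define."""
    return EXPECTED_KEYS - set(read_frontmatter(text))
```

Running `missing_keys(open("README.md", encoding="utf-8").read())` before `git push hf main` should return an empty set when the file looks like the example above.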

Deploy Issues & Solutions

Issue 1: Memory Limit

Streamlit Cloud's free tier has 1GB of RAM. If you hit out-of-memory (OOM) errors:

Use a smaller embedding model: all-MiniLM-L6-v2 (90MB) instead of BERT (400MB)
Lazy-load the model
Limit the upload size in the UI
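
The last two tips can be sketched as follows. This is a minimal illustration, not the capstone's actual code: `get_embedder`, `upload_allowed`, and the 5 MB cap are assumptions, and it presumes the sentence-transformers package from requirements.txt is installed.

```python
from functools import lru_cache

MAX_UPLOAD_MB = 5  # assumption: choose a cap that suits your documents


@lru_cache(maxsize=1)
def get_embedder():
    """Load the embedding model on first use, then reuse the same instance."""
    # Importing inside the function keeps startup light until a query needs it.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer("all-MiniLM-L6-v2")


def upload_allowed(size_bytes: int, max_mb: int = MAX_UPLOAD_MB) -> bool:
    """Return True if an uploaded file fits under the size cap."""
    return size_bytes <= max_mb * 1024 * 1024
```

In a Streamlit app, the file object returned by `st.file_uploader` exposes its size in bytes as `.size`, so the check could run before any parsing or indexing starts.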

Issue 2: Cold Start (Slow First Request)

Free tiers sleep when inactive. The first request after sleep takes 30-60 seconds.

Mitigation: show a clear loading message.

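Issue 3: API Cost in Production

A public app means many people will try it. Set a rate limit and add monitoring:

import streamlit as st
from src.rag import RAGEngine

@st.cache_resource(ttl=3600)  # cached engine expires after one hour
def get_rag():
    return RAGEngine()

# Track usage per session
if "queries_count" not in st.session_state:
    st.session_state.queries_count = 0

if st.session_state.queries_count >= 20:
    st.warning("Query limit reached for this session. Please try again later.")
    st.stop()

# After each answered query, increment the counter:
#     st.session_state.queries_count += 1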
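
The counter above is tied to a browser session and has no notion of time. If you want the limit to reset after an hour, one option is a sliding-window check. This is a minimal sketch: `allow_query` is a hypothetical helper, and the limit of 20 and the one-hour window simply mirror the numbers used in this section.

```python
import time
from typing import List, Optional


def allow_query(
    timestamps: List[float],
    now: Optional[float] = None,
    limit: int = 20,
    window_s: float = 3600.0,
) -> bool:
    """Sliding-window limiter: allow at most `limit` queries per `window_s` seconds.

    `timestamps` is mutated in place: expired entries are dropped, and the
    current time is appended when the query is allowed.
    """
    if now is None:
        now = time.time()
    # Keep only the queries that are still inside the window.
    timestamps[:] = [t for t in timestamps if now - t < window_s]
    if len(timestamps) >= limit:
        return False
    timestamps.append(now)
    return True
```

Storing the `timestamps` list in `st.session_state` keeps it per session. A visitor who opens a fresh session starts with an empty list, so this slows down casual overuse rather than determined abuse.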

Issue 4: Leaked Secrets

Check .gitignore:

.env
.streamlit/secrets.toml
*.pkl
*.npy
storage/

Check the git history to make sure .env was never committed:

git log --all --full-history -- .env

If it was ever committed, rotate the API key immediately and clean the history.
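
The .gitignore check can also be automated before each push. The sketch below is a rough helper, not an official tool: `missing_ignore_patterns` is a hypothetical name, and it only does exact string matching, so it does not understand glob rules the way git does.

```python
REQUIRED_PATTERNS = (".env", ".streamlit/secrets.toml")


def missing_ignore_patterns(gitignore_text: str, required=REQUIRED_PATTERNS) -> list:
    """Return the required patterns that are not listed verbatim in .gitignore."""
    listed = {
        line.strip()
        for line in gitignore_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    }
    return [pattern for pattern in required if pattern not in listed]
```

For an authoritative answer on a specific file, `git check-ignore -v .env` reports which rule, if any, causes git to ignore it.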

FastAPI Deployment (Bonus)

If you are building a REST API instead of a UI:

# api.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from src.rag import RAGEngine

app = FastAPI(title="KB Assistant API")
rag = RAGEngine()
rag.load()

class QueryRequest(BaseModel):
    question: str
    top_k: int = 3

class QueryResponse(BaseModel):
    answer: str
    sources: list

@app.post("/query", response_model=QueryResponse)
def query(req: QueryRequest):
    if not rag.index:
        raise HTTPException(503, "Index not ready")
    return rag.query(req.question, top_k=req.top_k)

@app.get("/health")
def health():
    return {"status": "ok"}

Deploy to Render / Railway / Fly.io.
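
Once the API is deployed, a small client is handy for checking it end to end. The sketch below uses only the standard library and targets the /query endpoint defined above. `build_query_request` and `ask` are hypothetical helper names, and the base URL you pass in is whatever your platform assigns.

```python
import json
import urllib.request


def build_query_request(base_url: str, question: str, top_k: int = 3) -> urllib.request.Request:
    """Build a POST request for the /query endpoint."""
    payload = json.dumps({"question": question, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, question: str, top_k: int = 3, timeout: float = 90.0) -> dict:
    """Send the request; the generous timeout leaves room for a free-tier cold start."""
    request = build_query_request(base_url, question, top_k)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes the payload easy to inspect without touching the network.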

Monitoring & Observability

A production app needs tracking:

import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(message)s")
logger = logging.getLogger(__name__)

def query_with_logging(question: str):
    start = time.time()
    try:
        result = rag.query(question)
        elapsed = time.time() - start
        
        logger.info(f"Query: {question[:50]} | Time: {elapsed:.2f}s | Sources: {len(result['sources'])}")
        return result
    except Exception as e:
        logger.error(f"Query failed: {e}")
        raise
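
If several functions need the same treatment, the timing and error logging can be factored into a decorator. This is a minimal sketch of that idea; `log_timing` is a hypothetical name and not part of the capstone code.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


def log_timing(func):
    """Log how long each call takes, and log failures before re-raising."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback at ERROR level.
            logger.exception("%s failed after %.2fs", func.__name__, time.perf_counter() - start)
            raise
        logger.info("%s took %.2fs", func.__name__, time.perf_counter() - start)
        return result

    return wrapper
```

Applying `@log_timing` to a query function gives one log line per call, which is usually enough to spot slow requests in the platform's log viewer.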

Free Monitoring

Streamlit Cloud: built-in logs
HF Spaces: logs in settings
External: Helicone (LLM call tracking)

Performance Tips

1. Cache Index

@st.cache_resource
def load_rag():
    rag = RAGEngine()
    rag.load()
    return rag

rag = load_rag()    # cached across sessions

2. Pre-build Index

Don't make users upload and index documents live. Pre-build the index:

python scripts/build_index.py
git add storage/
git commit -m "Add prebuild index"
git push

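Deploy with the index already in place. If storage/ is listed in .gitignore, remove that entry first (or use git add -f storage/), otherwise the index will not be committed.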

3. Async untuk Multiple Query

If you ever extend this into an API:

import asyncio

async def batch_query(questions):
    return await asyncio.gather(*[rag.aquery(q) for q in questions])
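
`asyncio.gather` starts every query at once, which can overwhelm a small free-tier instance or an LLM API quota. One hedged way to cap concurrency is a semaphore, sketched below. `fake_aquery` is a stand-in for the engine's async call and `max_concurrent` is an illustrative parameter; neither comes from the original code.

```python
import asyncio


async def fake_aquery(question: str) -> str:
    """Stand-in for an async RAG call (hypothetical; replace with your engine)."""
    await asyncio.sleep(0.01)
    return f"answer to: {question}"


async def batch_query(questions, query_fn=fake_aquery, max_concurrent: int = 3):
    """Run queries concurrently, but never more than `max_concurrent` at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(question):
        async with semaphore:
            return await query_fn(question)

    # gather returns results in the same order as the input questions.
    return await asyncio.gather(*(run_one(q) for q in questions))
```

From synchronous code, `asyncio.run(batch_query(["q1", "q2"]))` runs the batch and returns the answers as a list.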

Common Mistakes & FAQ

Mistake 1: Hardcoding the API Key in Code

# ❌ Leaks when pushed to GitHub
api_key = "AIzaSy..."

# ✅ Use secrets / an env var
api_key = st.secrets.get("GEMINI_API_KEY") or os.environ.get("GEMINI_API_KEY")

If the key was ever committed, rotate it immediately. Deleting it from the latest commit is not enough, because it remains in the git history.

Mistake 2: Incomplete requirements.txt

The app runs locally but crashes in the cloud because a package was left out. Always test:

# Create a clean venv, install from requirements, run the app
python -m venv test-env
source test-env/bin/activate  # or: test-env\Scripts\activate on Windows
pip install -r requirements.txt
streamlit run app.py

Mistake 3: Forgetting to Pin Versions

# ❌ Builds differ each time
streamlit
llama-index

# ✅ Reproducible
streamlit==1.32.0
llama-index==0.10.20
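
A quick script can flag unpinned lines before you push. The sketch below is a rough check, not a full requirements parser: `find_unpinned` is a hypothetical helper that treats only an exact `==` specifier as pinned and skips pip option lines such as `-r other.txt`.

```python
import re

# A line counts as pinned when the package name is followed by an exact `==` version.
PINNED = re.compile(r"^[A-Za-z0-9_.\-\[\]]+\s*==\s*\S+")


def find_unpinned(requirements_text: str) -> list:
    """Return requirement lines that lack an exact `==` version pin."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Calling it on the contents of requirements.txt returns an empty list when every package is pinned.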

Mistake 4: Index/Storage Not Deployed

If storage/ is in .gitignore, the cloud app has no index. Fix: pre-build the index and commit it, OR rebuild it at startup, OR upload it manually.
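
The "rebuild at startup" option can be sketched as a guard that builds only when no index is present. This is a minimal sketch: `ensure_index` and the `build_fn` callback are hypothetical names, and the real build step would be whatever your indexing script does.

```python
from pathlib import Path
from typing import Callable


def ensure_index(storage_dir: str, build_fn: Callable[[Path], None]) -> bool:
    """Build the index only when `storage_dir` is missing or empty.

    Returns True if a build was triggered, False if an existing index was reused.
    """
    path = Path(storage_dir)
    if path.is_dir() and any(path.iterdir()):
        return False  # index files are already present
    path.mkdir(parents=True, exist_ok=True)
    build_fn(path)
    return True
```

Keep in mind that rebuilding at startup makes the cold start longer, so it suits small document sets better than large ones.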

Mistake 5: Running Out of Memory on Cold Start

A 400MB embedding model + LlamaIndex + Streamlit + documents can hit OOM on the 1GB free tier. Use a small embedding model (all-MiniLM-L6-v2, 90MB) or pre-compute the embeddings offline.

Mistake 6: Not Handling Cold Start UX

The user clicks the link, sees a blank white screen for 60 seconds, and bounces. Fix:

with st.spinner("Loading model (first request takes 30-60s)..."):
    rag = load_rag()

Mistake 7: Going Public Without a Rate Limit

A public URL can be spammed. Set a per-session limit and log unusual traffic.

FAQ

Q: Streamlit Cloud vs HF Spaces, which is better for a portfolio?
A: HF Spaces is slightly more impressive to AI recruiters because of the Hugging Face name. Streamlit Cloud is faster to set up. You can deploy to both and use different links for different audiences.

Q: How many concurrent users can the free tier handle?
A: On the Streamlit Cloud free tier, roughly 5-10 concurrent users work fine. Beyond that, requests start to queue. For a recruiter demo, that is more than enough.

Q: What if the app crashes in production?
A: Check the logs in the dashboard. In about 90% of cases the cause is a missing env var, incomplete requirements, or OOM.

Q: Can I use a local LLM (Ollama) on Streamlit Cloud?
A: No, because the CPU tier does not have enough RAM to load a 7B model. Use an API (Gemini/OpenAI/Claude) for the cloud demo.

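Q: Is it true that the app sleeps after being idle?
A: Yes. The free tiers of Streamlit Cloud, HF Spaces, and Render all put idle apps to sleep. The idle window differs by platform and changes over time, so check each platform's current documentation. The first request after sleep takes about 30-60 seconds to wake the app up.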

Q: How do I update an app that is already deployed?
A: Push a new commit to the branch the deployment tracks (main on GitHub for Streamlit Cloud, the Space repo for HF Spaces). The platform redeploys automatically in about 2-3 minutes.

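Q: Where can I get a free custom domain?
A: Domain names themselves are not free, and the free tiers of Streamlit Cloud and HF Spaces do not support custom domains. Buy a cheap domain ($5-10/year) from a registrar such as Namecheap and point it at Vercel, Render, or Railway, which all support custom domains at no extra charge.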

Challenge 7.4

Challenge 1 — Deploy Capstone

Deploy your chatbot to Streamlit Cloud. Make sure the public URL is accessible.

Challenge 2 — HF Spaces Mirror

Deploy to Hugging Face Spaces as well. Compare the experience.

Challenge 3 — REST API

Build a FastAPI version. Deploy it to Render. Test it with curl/Postman.

Challenge 4 — Custom Domain (Bonus)

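Buy a cheap domain ($5-10/year) and point it at your deployment. The free tiers of Streamlit Cloud and HF Spaces do not support custom domains (see the comparison table), so this usually means forwarding the domain to your app's URL or hosting on a platform that does, such as Render or Railway.

This gives you a professional URL like kb.yazid.com instead of kb-yazid-xxx.streamlit.app.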

Next: 05-portfolio-ready.md
