 # Agentic AI Coding Workshop **From prompts to autonomous coding agents** January 2026 --- # Today's Goal **Be artful with AI agents** — gain productivity while maintaining quality and security This is for everyone at Kapernikov, regardless of current AI experience. --- # Agenda (4 hours) | Block | Duration | What | |-------|----------|------| | **Principles & security** | ~45 min | Best practices, feedback loops, sandboxes | | **Tools & workflows** | ~30 min | Skills, sub-agents, worktrees, Spec-Kit | | *Break* | 15 min | | | **Tips & tricks** | ~15 min | Share what works, Q&A | | **Hands-on** | ~2 hours | Build a reusable workflow | We'll keep slides short. The real learning happens when you try it. ---
Agentic AI will make all of us the 10x coder, right?
--- --- # Discussion - Why do you think this happened? - What does this mean for us? --- --- # Why Most Don't Capture the Gains We're early. The difference isn't the tools — it's how you use them. --- # Part 1: Understanding Agentic AI ## From chatbots to autonomous agents --- # The Evolution of AI Assistants | Generation | Capability | Example | |------------|-----------|---------| | **Chat** | Single-turn Q&A | ChatGPT (basic) | | **Autocomplete** | Code completion with file context | GitHub Copilot | | **Agentic** | Autonomous task execution | Claude Code, Cursor | --- # How a chatbot talks to an LLM
User
Chatbot App
LLM
What's the capital of Belgium?
[system] You are a helpful assistant.
[user] What's the capital of Belgium?
[assistant] Brussels.
What is the best museum to visit there?
[system] You are a helpful assistant.
[user] What's the capital of Belgium?
[assistant] Brussels.
[user] What is the best museum to visit there?
[assistant] The Magritte Museum.
--- # The Agentic Loop
$
run agent
>
User
CLI Harness
LLM
Tool (bash)
list my Python files
[system] You are Claude Code...
[user] list my Python files
tool_call: bash
find . -name '*.py'
$ find . -name '*.py'
./main.py ./test_api.py
tool_result + conversation
./main.py ./test_api.py
I found 2 Python files:
main.py and test_api.py
--- # AI agents: Strengths and weaknesses An AI agent writes code at a very high speed in any technology you want. But it has some weaknesses: * **Limited memory** — doesn't learn from past failures, limited context window * **Notoriously bad at judging** — is this good enough? Are we optimizing what should not exist? * **Doesn't identify structural issues** — an agent is always happy and never says "I can't continue working this way" * **Doesn't see patterns over time** — an agent is really a code factory > Key to successful agent usage is taking these limits into account --- # Part 2: Coming to best practices ## What does it take for us to increase productivity with AI agents ? --- # The problem We want to find a way of working that: * Produces high-quality code * Increases the output of one developer by factor X * Enables learning (as a group!) and continuous improvement. --- # Principle 0: Know your role | Human (long-term memory) | | Agent (fast executor) | |--------------------------|:-:|----------------------| | Remembers past failures | → | Implements safeguards | | Identifies friction | → | Builds tooling | | Updates instructions and feedback | → | Follows best practices | | Spots patterns | → | Writes tests to catch them | **The asymmetry:** - **Agent:** fast execution, cheap implementation, no memory across sessions - **Human:** slow execution, but remembers patterns and failures over time --- # Principle 0: Know your role What's "good enough"? What's the right pattern? What does "done" look like? The agent doesn't decide. It infers — from your code, your docs, your tooling, your error messages. The prompt is a small fraction of what shapes its output. > Human role: be the input architect. > Agent role: execute against it. --- # Principle 1: Tame the iteration loop

> If your feedback is just syntax errors or "it doesn't work", **improve** it. --- # Principle 1: The Feedback Signal The stronger your feedback signal, the longer the agent can run without you. - **The feedback signal** is the driver for agent iterations and replaces human intervention if properly designed. ## The feedback signal * Agents can write tests extremely quickly. But they suck at figuring out what to test. * Tests are not the only way to make a feedback signal. Mandatory checklists, adversary reviews, ... are other tools. --- # Principle 1: The Feedback Hierarchy **Ranking feedback signals**: | Signal | Why | |--------|-----| | **E2E tests** | Capture intent, not implementation details | | **Checklists** | Cheap to write, force completeness | | **Runtime signals** | Logs, traces, metrics — what's actually happening as it runs | | **Adversarial reviews** | Agent reviews its own work with a critical eye | | **Human feedback** | Fewer but higher-value interventions | The agent can write tests — but "write tests" is not enough. **You** define what to test, the agent implements the check. --- # Principle 2: Keep agent context clear Smart decisions require clear context. Three problems to avoid: **Context window rot** — performance degrades as context fills; worse with long back-and-forth. **Tunnel vision** — pre-seeding biases the agent toward one solution; it may miss better paths. **Dilution** — human language is full of inconsistency. Pile every rule into one big instruction file and the model can't tell which ones matter. Everything important = nothing important. > **Solutions:** start clean for every task · keep knowledge modular and narrow · only add a rule when the agent broke without it · **progressive disclosure** — layer detail (name → skill body → linked references). --- # Principle 3: Engineer for the lack of memory Don't just prompt - make knowledge **persistent and discoverable**: - **What the agent can't see doesn't exist** — Slack threads, Confluence pages, things in your head are invisible. Move them into the repo (skills, AGENTS.md, design docs). - **Tests + CI** - Encode expected behavior in code, not conversation - **Linter rules** - Enforce patterns automatically, run in CI - **Type definitions** - Make constraints machine-readable > What you tell the agent once, it forgets. What you write in code, it follows forever. --- # Principle 4: Iterate on the process Principle 1 was about the agent's loop. This is about **yours**. You are a **process engineer**. Every failure is an opportunity: 1. Fix the immediate bug 2. Ask: *why didn't the agent catch this?* 3. Improve feedback (add test, linter rule, type check) 4. Update AGENTS.md/skill/... with the lesson learned > The goal isn't just working code - it's a system that produces working code reliably. --- # Harness Engineering What you've been learning has a name. OpenAI articulated it best: > "Building software still demands discipline, but the discipline shows up more in the **scaffolding** rather than the code." Your **harness** is the feedback loops, skills, tooling, runtime signals, **agent eyes** (Chrome DevTools MCP), and review surface around the agent. Every primitive in this deck is a piece of one. > Stop optimizing prompts. Start designing harnesses.
Source: openai.com/index/harness-engineering · Feb 2026
--- # Part 3: Security ## The uncomfortable truth about agent sandboxes --- # Security: The Uncomfortable Truth **Current options:** | Approach | Problem | |----------|---------| | Permission prompts | Slow, annoying — you click "yes" anyway | | Command sandboxes | Not actually secure, easy to escape | | Disable all guardrails | Fast, but... | None of these actually work. Let's look at why. --- # The Sandbox is Leaky

Would you say "yes"?
If so, because you **trust the agent**, not because you understood the command.
The sandbox only works if humans actually review. They don't.
--- # Time of Check ≠ Time of Use Approvals are per-action, but security is about the **combination**: | Step | You approve... | Seems safe? | |------|---------------|-------------| | 1 | Run `make` | Yes — just builds | | 2 | Edit `Makefile` | Yes — just a text file | | **Combined** | **Agent edits Makefile, then runs make** | **Full arbitrary code execution** | The checkbox model gives a **false sense of control**. Each approval looks harmless. The sequence grants full access. --- # What Are We Actually Worried About? The risk is **not** "the agent goes rogue." It's: good faith + too much access + mistakes. | Risk | Example | |------|---------| | **Credential exposure** | Agent commits `.env` with customer API keys | | **Supply chain** | Agent installs a typosquatted package | | **Destructive mistakes** | Agent runs `rm -rf` on the wrong directory | | **Prompt injection** | Malicious content in a cloned repo influences agent behavior | These are mundane operational risks, not science fiction. > Treat the agent like a junior dev: not malicious, but don't give it root access --- # What To Do About It | | Recommendation | Status | |---|---|---| | **Minimum** | Don't point agents at repos with production secrets | Doable right now | | **Minimum** | Don't experiment with new tools on your work laptop | Doable right now | | **Better** | Use devcontainers / separate user accounts | Works, ergonomics are rough | | **Best** | Fully containerized agent runtime, no host access | Tooling is coming, not here yet | | **Always** | Review the **diff** before merging, not the bash prompts | Your real security boundary | **Key insight:** `git diff` is your security review, not the permission checkbox. > Make the boundary about the **artifacts** the agent produces, not the side effects --- # For non-coders: the in-app embedded agent
For non-developers, an AI agent **embedded inside the app** is the safer option. The sandbox is enforced by the application's own permissions and data boundaries — not by trust in a CLI prompt.
--- # It's Already Happening

--- # Teaching Your Agent ## Slash commands, skills, and how to write them --- # Slash commands Slash commands are **user-invoked** prompts:

--- # Skills Skills are reusable workflows that agents load **on demand**:

--- # Writing Good Skills: Principles 1. **Skills encode the delta** — only what the model doesn't already know about *your* project 2. **Start with nothing** — add skills after observed failures, not upfront 3. **Every skill should have a clear and narrow scope** — one job, done well 4. **Define what "done" looks like** — give the agent criteria to self-check, not just instructions to follow > Don't use `/init` or ask the AI to write skills for you — it will produce 200 lines of generic advice it already knew without the skill --- # Writing Good Skills: Recommendations - **Describe *when* to trigger**, not just what it does — this is how the agent picks the right skill. If your agent doesn't find your skill when applicable, improve this! - **Keep it (as) short as possible** Challenge every line: does the model really need this? *Note: this can change over time as models evolve* - **Include one concrete example** of the desired outcome - **Test by observing** actual agent behavior, then refine. --- # From Skills to Workflows Some skills are **knowledge** — the delta about *your* project, a few lines the model reads. Others are **workflows**: an ordered procedure the agent *executes* to produce a result — specswarm, superpowers, a porting skill. > Write the procedure as prose in the skill body and the agent treats it as **advice, not a contract** — it skips the "obvious" steps, reorders, eyeballs. Output goes high-variance: sometimes great, sometimes bad. Workflows need their own techniques to stay on the rails. --- # Writing Good Workflows 1. **Forcing functions — artifacts over memory.** A step is done only when its named artifact exists on disk. *No artifact = not done.* Don't trust the agent to self-report. 2. **Hard gates.** Checkpoints that need *evidence* to pass — tests green + a written verify note *before* the commit is allowed. 3. **Make the steps the task list.** Convert the procedure into tasks at the *start* — tracked state, not prose skimmed once and forgotten. 4. **Push determinism into scripts, not prose.** Ship helper scripts with the skill; the agent *runs* the recipe instead of re-deriving it each run. Re-derivation from prose *is* the variance. 5. **Compose, don't inline.** A heavy step invokes a sub-workflow or sub-agent instead of bloating one giant skill. --- # Organizing the Work ## Sub-agents, worktrees, and structured workflows --- # Sub-agents Sub-agents provide **context isolation**:

--- # Git Worktrees **Problem:** Agent working on feature X blocks you from working on feature Y **Solution:** Git worktrees = multiple working directories, one repo ```bash git worktree add ../my-project-feature-x feature-x # /my-project ← your main work # /my-project-feature-x ← agent's sandbox ``` **Benefits:** - Agent can run tests, break things, iterate — without blocking you - Easy to review: just diff the worktree - Clean disposal: `git worktree remove` when done **Watch out:** Devcontainers with hardcoded ports won't run in parallel! --- # Git Platform CLI Tools **`gh` (GitHub) / `glab` (GitLab)** — Give your agent repo superpowers ```bash gh issue view 42 # read issues gh pr create --title "Fix auth bug" --body "Closes #42" # open PRs gh pr checks # CI status gh issue comment 42 --body "Fixed in PR #43" # comment back ``` **Why this matters:** - Agent can work end-to-end: issue → branch → code → PR - No copy-pasting between terminal and browser - CI feedback becomes part of the agent loop --- # Spec-Kit / SpecSwarm (claude) **Spec-Driven Development**: specifications are the source of truth

--- # Structure the Collaboration One level of abstraction at a time. Don't spec and implement in the same breath. | Phase | Focus | Artifact | |-------|-------|----------| | **Specify** | What do we need? | spec.md | | **Research & plan** | How do we build it? | plan.md | | **Task breakdown** | What are the steps? | tasks.md | | **Implement** | Write the code | code + tests | Each phase has its own conversation. Mixing them is how you get half-baked specs and wrong implementations. --- # Spec-Kit: Why It Works | Problem | How Spec-Kit Helps | |---------|-------------------| | **No persistent memory** | Artifacts are the memory | | **Context window rot** | Each phase has focused context | | **Spec-code drift** | Specs are source of truth, regenerate code | | **Weak feedback signal** | Clear tasks + success criteria → strong feedback | | **Chaotic iteration** | Structured workflow with clear phases | | **Hard to restart** | Can regenerate from any artifact | | **Solving the wrong problem** | Forces you to define intent before writing code | --- # Code review when an agent writes the code When humans aren't writing the code, they're reviewing it. Make that time count. | Role | Norm | |------|------| | **Author** | Read your own diff before sending. Minimize unrelated changes. | | **Author** | Disclose intent (prototype? production?), not tool use. | | **Author** | Every commit to main must work — feature-flag if cross-stack. | | **Reviewer** | Prioritize reviews over writing — your queue blocks the team. | | **Reviewer** | Focus on APIs and tests, not personal style. | | **Reviewer** | An agent can help you digest a PR — but **you** must understand it. | | **Both** | PRs are not for architecture debates. Design docs are. | | **Both** | Style debates → linter rule or skill, not PR comments. | --- # Your tips and tricks ? What have **you** discovered? - Prompting patterns that work - Tools or configurations worth sharing - Pitfalls to avoid - Workflows that save time --- # Part 5: Hands-on Session ## Teach Claude something it couldn't do --- # Rescue a Slop Article Pick one of these AI-written articles. Ask Claude *"rewrite this, make it better"* — it just gives you **smoother slop**. Your job: build a **reusable skill** that *reviews* (catches the tells, the empty claims, the outright lies), **then** rewrites. Present the **structure** you built — not the prose.
① Arch "Rust init"
② The AI Revolution
③ What is AI?
④ ML in the Modern World
--- # Some Tips - **Know the goal first.** Who's it for, and what do you want from it? *"I'm a teacher, my readers are students"* and *"I'm a marketer, I want clicks"* produce completely different rewrites. - **Prove the gap.** Run baseline *"make it better"* and capture where it stays slop *against that goal*. No gap → nothing to teach. - **Review, then improve — with adversaries.** Two phases. For the review, spin up several subagents, each with specific instructions on a clean context, each adversarially attacking *one* thing: tells · false claims · phantom references · structure. - **Steer with your taste.** Give it the phrasings you ban *and* a style distilled from documents you wrote or admire — what to avoid *and* what to aim for. --- # For Coders: Dockerize Anything Point an agent at any GitHub project and it'll hand you a **runnable Docker image**. That part it can do on its own. **But can it make sure the image:** - **starts ergonomically** — one command, sane defaults, no fiddling? - **runs many instances side by side** — no clashing ports, names, or volumes? - **runs as non-root**? - **has no known vulnerabilities** — actually scanned, not assumed? Each "but can it" is a review lens — one adversarial subagent each. The deliverable is a reusable **`/dockerize`** skill that bakes the standards in. --- # More Ideas — Same Pattern - **Tech Debt Hunter** — `/techdebt` finds dead/duplicated code, proposes fixes - **CSV Health Inspector** — any CSV → a data-quality report - **Documentation Generator** — point at a repo → structured docs --- # Call to Action ## Modular context is a team sport --- # Build It Together Modular agent context isn't just better for the agent — **it's better for collaboration**. A skill, a guardrail, a linter rule — **shareable building blocks**. When you write one, everyone benefits. **What I want to see:** - **Create skills** — even small ones. A commit workflow, a review checklist - **Share best practices** — what worked, what didn't. Write a skill, not a wall of text - **Build on each other's work** — extend a skill, improve guardrails, add tests > The goal isn't 10 people building 10 workflows. It's 10 people building on 1 workflow, making it great. ---  # Questions? **Thank you for attending!**  Slides: agentic-coding-workshop-3d0579.pages.gitlab.pp.kapernikov.com