AI-native observability

Kortex

The central nervous system for DevOps. Kortex senses every event in your stack, reasons about it with real AI, takes safe corrective action on its own, and remembers every incident forever.

[Product mockup: Kortex Control Room — saude-publica. Incident feed: latency spike (2m ago · p99), worker drift (14m ago · slot-3), OOM kill (resolved · 1h ago). p99 latency chart, last 30m: +240% (p99 1.4s · err/s 0.8 · rps 312). Chat panel:
You: Why is saude-publica slow?
Kortex: p99 latency rose +240% in the last 12m. Correlated with deploy at 17:43. Likely cause: missing index on prontuarios.paciente_id. Run the migration? [Confirm] [Investigate]]

A nervous system, not a dashboard

Kortex is organized around five capabilities that mirror how a biological brain handles its body.

01 — Senses

Universal ingest

Logs, metrics, traces, health checks, crashes, deploys — from any source via HTTP, gRPC, OTLP, or Unix socket. No vendor lock-in.

02 — Brain

AI-native analysis

Statistical baselines catch the obvious. Stack-trace clustering catches the recurring. LLM reasoning catches the hidden root cause.

03 — Reflexes

Autonomous remediation

Restart workers, drain slots, autoscale, roll back deploys, isolate sick backends — automatically, with explicit opt-in and a kill switch.

04 — Memory

Searchable history

Every incident is stored, indexed, and learned from. Auto-generated playbooks. MTTR tracked per category.

05 — Coordination

Cross-system workflows

Compose actions across multiple Koder products and external systems. Drain, migrate, reload, verify, mark deploy successful — one workflow.

What Kortex actually does

Concrete capabilities, not buzzwords.

📡

OTLP-compatible ingest

Accept OpenTelemetry logs, metrics, and traces out of the box. Bring your existing instrumentation.
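As an illustration of what "bring your existing instrumentation" means in practice, here is a minimal OTLP/JSON log record posted over HTTP using only the standard library. This is a sketch, not Kortex's documented API: `localhost:4318` and `/v1/logs` are the OTLP/HTTP defaults from the OpenTelemetry spec, standing in for a real Kortex endpoint.

```python
import json
import time
import urllib.request

# One log record in OTLP/JSON shape (resourceLogs -> scopeLogs -> logRecords).
payload = {
    "resourceLogs": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "saude-publica"}}
        ]},
        "scopeLogs": [{
            "scope": {"name": "manual"},
            "logRecords": [{
                "timeUnixNano": str(time.time_ns()),
                "severityText": "ERROR",
                "body": {"stringValue": "db timeout on prontuarios"},
            }],
        }],
    }]
}

req = urllib.request.Request(
    "http://localhost:4318/v1/logs",  # standard OTLP/HTTP logs path
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment against a live OTLP endpoint
print(payload["resourceLogs"][0]["scopeLogs"][0]["logRecords"][0]["body"])
```

Any stack already emitting OTLP (via the OpenTelemetry SDKs or a collector) produces this shape for you; the point is that no custom exporter is needed.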

📈

Statistical anomaly detection

Z-score over moving windows on p50/p95/p99 latency, error rates, throughput, saturation. Cheap, fast, no LLM required.
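The z-score approach can be sketched in a few lines — a minimal illustration of the technique, not Kortex's implementation; the window size, warm-up length, and 3-sigma threshold are assumptions:

```python
from collections import deque
from statistics import mean, stdev

class ZScoreDetector:
    """Flag a sample as anomalous when it deviates more than
    `threshold` standard deviations from a moving-window baseline."""

    def __init__(self, window: int = 60, threshold: float = 3.0):
        self.samples = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        baseline = list(self.samples)
        self.samples.append(value)
        if len(baseline) < 10:       # not enough history for a baseline yet
            return False
        sigma = stdev(baseline)
        if sigma == 0:               # flat baseline: any change is anomalous
            return value != baseline[0]
        return abs(value - mean(baseline)) / sigma > self.threshold

# Steady p99 around 200ms, then a spike:
det = ZScoreDetector(window=30)
for v in [200, 205, 198, 202, 199, 201, 203, 197, 200, 204, 202]:
    det.observe(v)
normal = det.observe(203)  # within the baseline -> False
spike = det.observe(900)   # far outside the baseline -> True
print(normal, spike)
```

This is why no LLM is required at this stage: a moving mean and standard deviation per metric stream is O(window) memory and microseconds per sample.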

🧬

Error fingerprinting

Stack traces normalized and grouped. First occurrences flagged. Never debug the same error twice.
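A common way to implement this kind of fingerprinting (the exact normalization Kortex applies is not specified here) is to strip volatile tokens — memory addresses, line numbers, numeric ids — before hashing, so recurrences of the same error collapse into one group:

```python
import hashlib
import re

def fingerprint(trace: str) -> str:
    """Hash a stack trace after removing details that vary between
    occurrences of the same underlying error."""
    norm = re.sub(r"0x[0-9a-fA-F]+", "0xADDR", trace)  # memory addresses
    norm = re.sub(r"line \d+", "line N", norm)          # source line numbers
    norm = re.sub(r"\b\d+\b", "N", norm)                # other numerics (ids, ports)
    return hashlib.sha256(norm.encode()).hexdigest()[:16]

a = fingerprint('File "app.py", line 42, in handler\nKeyError: <object at 0x7f3a2c>')
b = fingerprint('File "app.py", line 42, in handler\nKeyError: <object at 0x7f9b11>')
print(a == b)  # same error, different addresses -> True
```

A first occurrence is then simply a fingerprint not yet present in the index, which is what makes "never debug the same error twice" cheap to enforce.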

🧠

Pluggable LLM providers

Anthropic Claude, OpenAI, Ollama (local), or koder-ai. Switch via config. Tool use unified across providers.

💬

Conversational debugging

Ask "why is service X slow?" and watch the AI fetch logs, compare baselines, check deploys, and respond with citations.

Declarative reflex rules

Simple TOML rules: when X then do Y. Cooldowns, dry-run mode, audit log, kill switch.
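A rule of that shape might look like the following — the field names and condition syntax are hypothetical, since the actual schema is not shown here:

```toml
# Illustrative reflex rule; field names are assumptions, not a published schema.
[[reflex]]
name     = "restart-oom-worker"
when     = "event.type == 'oom_kill' && event.service == 'saude-publica'"
then     = "restart_worker"
cooldown = "10m"     # don't fire again within 10 minutes
dry_run  = false     # set true to log the action without executing it
```

The cooldown and dry-run fields are what make such rules safe to enable incrementally: a new rule can run in dry-run mode against live traffic, writing to the audit log, before it is allowed to act.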

📚

Auto-generated playbooks

Resolved incidents become reusable runbooks. Same problem next time? Kortex already knows the fix.

🔗

Native koder-jet integration

Direct consumer of the koder-jet event bus. No exporters, no scrapers, no intermediate format.

🔐

Koder ID single sign-on

OIDC login, role-based access (viewer, operator, admin), full audit trail of every action.

How it works

From raw event to autonomous fix in three stages.

1

Sense

Point your services at Kortex over OTLP, gRPC, HTTP, or a local socket. Events stream into ClickHouse continuously.

2

Reason

Statistical detectors spot anomalies in real time. The LLM brain investigates by querying logs, metrics, deploys, and history — then proposes a root cause with evidence.

3

Act

Approve a fix in chat, or let a reflex rule fire it automatically. Kortex executes the action, verifies success, and writes the incident to memory.

How it compares

Kortex is the first observability product designed around autonomy from day one.

Capability                                Kortex   Datadog   New Relic   PagerDuty
Universal event ingest (OTLP)             ✓
Statistical anomaly detection             ✓
AI-native root cause analysis             ✓        add-on    add-on
Conversational debugging with tool use    ✓
Autonomous remediation built in           ✓                              runbooks
Cross-system workflow coordination        ✓                              limited
Pluggable LLM providers (incl. local)     ✓
Self-hosted, single binary                ✓
Auto-generated playbooks from incidents   ✓

Give your infrastructure a brain.

Kortex is in early development. Sign in with Koder ID to follow progress and join the early access list.

Sign In to Get Early Access