Framework
Metrics
The pkg/metrics package provides a column-oriented time-series database for recording, querying, and automatically pruning application metrics. It replaces external monitoring stacks (Prometheus, Grafana) and analytics platforms (PostHog, Google Analytics) with a single, zero-dependency engine that runs inside your Stanza process.
import "github.com/stanza-go/framework/pkg/metrics"
Creating a store
Create a metrics store with a directory path and optional configuration:
store := metrics.New("/path/to/metrics",
metrics.WithSystemMetrics(),
metrics.WithLogger(logger),
)
All metric data — partitions, series registry, column files — is persisted under the given directory. The directory is created automatically on Start.
Options
| Option | Default | Description |
|---|---|---|
WithRetention(d) | 30 * 24h | Data retention period; partitions older than this are pruned hourly |
WithFlushInterval(d) | 5s | How often the in-memory buffer is flushed to disk |
WithFlushSize(n) | 1024 | Buffer size that triggers an immediate flush |
WithSystemMetrics() | disabled | Enables Go runtime metrics collection every 10 seconds |
WithLogger(l) | — | Logger for start/stop, flush errors, and prune operations |
Lifecycle integration
The store must be started before recording and stopped on shutdown. Integrate with the lifecycle system:
lc.Append(lifecycle.Hook{
OnStart: store.Start,
OnStop: store.Stop,
})
Start creates the data directory, loads the series registry, and launches background goroutines for periodic flushing, pruning, and optional system metrics collection.
Stop flushes remaining buffered data, closes all partitions, saves the series registry, and waits for all background goroutines to finish.
Recording metrics
Record writes a metric sample with a name, value, and optional labels:
store.Record("http_requests", 1, "method", "GET", "status", "200")
store.Record("order_total", 49.99, "currency", "USD")
store.Record("cpu_usage", 0.85)
Labels are alternating key-value string pairs. An odd trailing label is ignored. Samples are buffered in memory and flushed to disk periodically (every 5 seconds by default) or when the buffer reaches flushSize.
Each unique combination of metric name + labels creates a series. The series registry maps these combinations to numeric IDs for compact storage:
http_requests{method=GET,status=200} → series ID 1
http_requests{method=POST,status=201} → series ID 2
order_total{currency=USD} → series ID 3
Label cardinality
Keep label values bounded. Labels like user_id or request_id create a new series per unique value, bloating the registry. Use labels for dimensions you'll filter and aggregate by — method, status, path, tier — not unique identifiers.
Querying metrics
Query retrieves time-series data with optional aggregation:
result, err := store.Query(metrics.Query{
Name: "http_requests",
Start: time.Now().Add(-1 * time.Hour),
End: time.Now(),
Step: 1 * time.Minute,
Fn: metrics.Sum,
Labels: map[string]string{
"method": "GET",
},
})
Query fields
| Field | Type | Required | Description |
|---|---|---|---|
Name | string | yes | Metric name to query |
Start | time.Time | yes | Range start (inclusive) |
End | time.Time | yes | Range end (inclusive) |
Step | time.Duration | no | Aggregation interval; 0 returns raw samples |
Fn | AggFn | no | Aggregation function; default Avg |
Labels | map[string]string | no | Label filters — all specified keys must match exactly |
Aggregation functions
| Function | Description |
|---|---|
Sum | Sum of values in each step |
Avg | Average of values in each step |
Min | Minimum value in each step |
Max | Maximum value in each step |
Count | Number of samples in each step |
Last | Last value in each step |
Result structure
The result contains one SeriesData entry per matching series:
for _, series := range result.Series {
fmt.Println(series.Name, series.Labels)
for _, pt := range series.Points {
fmt.Printf(" %v: %.2f\n", time.UnixMilli(pt.T), pt.V)
}
}
Each Point has T (Unix milliseconds) and V (float64 value). When Step > 0, points are evenly spaced at the step interval with aggregated values. When Step == 0, raw samples are returned.
Listing metrics and labels
Discover what metrics exist and what label values are available:
// All registered metric names, sorted alphabetically.
names := store.Names()
// All unique values for a label key on a metric, sorted.
methods := store.LabelValues("http_requests", "method")
// → ["DELETE", "GET", "PATCH", "POST"]
These are useful for building metric pickers and label filters in admin UIs.
Store statistics
Stats returns a snapshot of storage state:
s := store.Stats()
fmt.Println(s.SeriesCount, s.PartitionCount, s.DiskBytes)
| Field | Type | Description |
|---|---|---|
SeriesCount | int | Number of unique time series (name + label combinations) |
PartitionCount | int | Number of daily partitions on disk |
DiskBytes | int64 | Total bytes used by column files |
OldestDate | string | Oldest partition date (YYYY-MM-DD) |
NewestDate | string | Newest partition date (YYYY-MM-DD) |
System metrics
When WithSystemMetrics() is enabled, the store automatically records Go runtime metrics every 10 seconds:
| Metric | Description |
|---|---|
go_goroutines | Current goroutine count |
go_heap_alloc_bytes | Heap bytes allocated and still in use |
go_heap_inuse_bytes | Heap bytes in spans with at least one object |
go_heap_objects | Number of allocated heap objects |
go_stack_inuse_bytes | Stack bytes in use by goroutines |
go_gc_pause_ns | Duration of most recent GC pause (nanoseconds) |
go_gc_runs | Total completed GC cycles |
go_alloc_bytes_total | Cumulative bytes allocated (even if freed) |
go_sys_bytes | Total bytes obtained from OS |
go_mallocs_total | Cumulative heap allocations |
go_frees_total | Cumulative heap frees |
These metrics have no labels — one series each. They provide a baseline of application health without any manual instrumentation.
Storage design
The store uses a column-oriented layout with daily partitioning:
metrics/
├── series.txt ← series registry (name + labels → ID)
├── 2026-03-23/ ← daily partition
│ ├── ts.col ← timestamps ([]int64, 8 bytes each)
│ ├── sid.col ← series IDs ([]uint64, 8 bytes each)
│ └── val.col ← values ([]float64, 8 bytes each)
├── 2026-03-22/
│ └── ...
Row N across all three column files represents the same sample. This layout enables:
- Efficient time-range queries — only partition directories overlapping the query range are scanned.
- Simple append writes — each column file is append-only within a partition.
- Easy pruning — deleting a partition is just removing a directory.
- Compact storage — 24 bytes per sample (8 bytes × 3 columns), no encoding overhead.
The series registry (series.txt) is a tab-separated text file mapping name\tlabel1=val1,label2=val2 to numeric series IDs. It's append-only on disk with an in-memory lookup protected by RWMutex.
Write path
Record()resolves the series ID (creating a new one if needed)- Sample is appended to an in-memory buffer
- Buffer is flushed to disk when it reaches
flushSizeor everyflushInterval - On flush, samples are sorted by timestamp and grouped by date partition
Query path
- Find matching series IDs from the registry (by name and label filters)
- Scan only partition directories overlapping the time range
- Read column files with
ReadAt(doesn't affect write position) - Filter by series ID and time range
- Check the in-memory buffer for unflushed samples
- Aggregate into time buckets if
Step > 0
Auto-pruning
A background goroutine runs every hour and deletes partition directories older than the retention period (default 30 days). This is simple directory deletion — no compaction, no tombstones. The retention window slides forward automatically.
To customize retention:
store := metrics.New(dir,
metrics.WithRetention(7 * 24 * time.Hour), // keep 7 days
)
Thread safety
All methods are safe for concurrent use. Multiple goroutines can call Record simultaneously, and queries can run concurrently with writes. Internal locks protect the buffer, partitions, and series registry. Queries use ReadAt on column files, which doesn't interfere with append writes.
API reference
Store
| Method | Signature | Description |
|---|---|---|
New | New(dir string, opts ...Option) *Store | Create a new store |
Start | (ctx context.Context) error | Initialize and start background goroutines |
Stop | (ctx context.Context) error | Flush, close partitions, stop goroutines |
Record | (name string, value float64, labels ...string) | Record a metric sample |
Query | (q Query) (*Result, error) | Query time-series data |
Names | () []string | List all metric names |
LabelValues | (name, labelKey string) []string | List label values for a metric |
Stats | () StoreStats | Storage statistics snapshot |
Types
| Type | Description |
|---|---|
Point | {T int64, V float64} — timestamp (Unix ms) and value |
SeriesData | {Name string, Labels map[string]string, Points []Point} — one series from a query |
Result | {Series []SeriesData} — complete query result |
Query | Query parameters (name, labels, start, end, step, fn) |
StoreStats | Storage statistics (series count, partitions, disk bytes, date range) |
AggFn | Aggregation function enum (Sum, Avg, Min, Max, Count, Last) |
Option | Functional option for store configuration |
Tips
- One store per application. All metrics — system, HTTP, business, client — go into the same store. Series are separated by name and labels, not by store instance.
- Use
WithSystemMetrics()in production. It adds 11 series with negligible overhead (oneruntime.ReadMemStatscall every 10 seconds) and gives you instant visibility into goroutine leaks, memory growth, and GC behavior. - Record at the point of interest. Don't batch or delay metrics recording —
Recordis cheap (in-memory append). The flush goroutine handles disk I/O in the background. - Choose Step wisely for queries. A 1-minute step over 24 hours produces 1,440 points per series. For longer ranges (7d, 30d), increase the step (5m, 1h) to keep responses manageable.
- Label filters are exact match. There's no regex or wildcard matching. Design label values to be filterable — use
status=200notstatus=200 OK.
See the Application Metrics recipe for HTTP and business metric patterns, and the Client-Side Analytics recipe for frontend event tracking.