Benchmarking
vibeSpatial ships a unified benchmarking CLI called vsbench. Use it to
measure individual operations, run regression suites, compare against
GeoPandas, and generate HTML reports. For repo-local GPU and strict-native
workflows, invoke it through uv run vsbench ... so the project-managed
environment and GPU runtime detection stay intact.
Quick start
# List available operations
uv run vsbench list operations
# Run a single operation benchmark
uv run vsbench run buffer --scale 100k
# Run the smoke suite (fast, good for local dev)
uv run vsbench suite smoke
# Compare vibeSpatial vs GeoPandas on your own script
uv run vsbench shootout my_workflow.py
Commands
vsbench run <operation>
Run a single operation benchmark at a given scale.
vsbench run intersects --scale 10k --repeat 5
vsbench run buffer --scale 1m --precision fp32 --gpu-sparkline
vsbench run dissolve --compare geopandas --json
vsbench run bounds-pairs --rows 20000 --arg dataset=uniform --arg tile_size=256
vsbench run clip-rect --arg kind=polygon --arg rect=100,100,700,700
Use vsbench list operations --json to discover typed operation-specific
parameters. Operation arguments are passed with repeatable --arg key=value
flags and validated against the operation schema before execution.
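The repeatable key=value pattern can be sketched in a few lines of Python. This is an illustration only, not vsbench's own parser: the `EXAMPLE_SCHEMA` mapping and the `parse_operation_args` helper are hypothetical names, and the real parameter schemas are whatever `vsbench list operations --json` reports.

```python
import argparse

# Hypothetical schema for illustration: parameter name -> converter.
# It is NOT taken from vsbench; real schemas come from the CLI itself.
EXAMPLE_SCHEMA = {"dataset": str, "tile_size": int}


def parse_operation_args(pairs, schema):
    """Turn ["key=value", ...] into a typed dict, rejecting unknown keys."""
    parsed = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        if key not in schema:
            raise ValueError(f"unknown parameter {key!r}")
        try:
            parsed[key] = schema[key](raw)
        except ValueError as exc:
            raise ValueError(f"bad value for {key!r}: {raw!r}") from exc
    return parsed


parser = argparse.ArgumentParser()
# action="append" is what makes the flag repeatable.
parser.add_argument("--arg", action="append", default=[], metavar="KEY=VALUE")

ns = parser.parse_args(["--arg", "dataset=uniform", "--arg", "tile_size=256"])
print(parse_operation_args(ns.arg, EXAMPLE_SCHEMA))
# -> {'dataset': 'uniform', 'tile_size': 256}
```

Validating before execution means a typo such as tile_sze=256 fails immediately instead of after fixtures have been loaded.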
Default operation listings and suites measure public GeoPandas-compatible APIs;
private owned-array and kernel diagnostics require --include-internal or
vsbench kernel.
Common flags (shared by run, pipeline, suite, kernel):
| Flag | Default | Description |
|---|---|---|
| --scale | operation default | Input size: |
| --rows | none | Exact input row count override for |
| --repeat | 3 | Number of timed iterations |
| --precision | | Force |
| --compare | none | Compare against |
| | | Fixture format: |
| --gpu-sparkline | off | Show per-stage GPU utilization sparkline |
| | off | Emit NVTX ranges for Nsight profiling |
| | off | Enable execution trace warnings |
| --json | off | Output results as JSON |
| | off | Minimal output |
| --output | none | Write results to file |
vsbench pipeline <name>
Run a named multi-stage pipeline benchmark (e.g. spatial-join, overlay, nearby-buildings).
vsbench pipeline spatial-join --suite-level ci
vsbench pipeline overlay --gpu-sparkline
The --suite-level flag (smoke, ci, full) controls which scale
points the pipeline runs at.
vsbench suite {smoke,ci,full}
Run a predefined suite of benchmarks.
vsbench suite smoke # Fast check (~30s)
vsbench suite ci --json --output ci.json # CI gate
vsbench suite full --gpu-sparkline # Full profiling run
Suites run serially and, by default, isolate each benchmark item in its own
subprocess so that CUDA allocator state, failed kernels, or OOMs do not
contaminate later items. Use --in-process only for local debugging. Use
--item-timeout N to override the default 600-second per-item timeout.
Use --pipeline <name> (repeatable) to limit to specific pipelines.
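The isolation behaviour described above can be approximated with the standard library. The sketch below is not vsbench's implementation: `run_isolated` is a hypothetical helper, and the "items" are tiny stand-in Python child processes rather than real benchmarks. It shows why a crash or hang in one item leaves the next item unaffected.

```python
import subprocess
import sys


def run_isolated(argv, timeout_s=600):
    """Run one item in its own process and classify the outcome."""
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return {"status": "timeout", "stdout": ""}
    status = "ok" if proc.returncode == 0 else "failed"
    return {"status": status, "stdout": proc.stdout}


# Stand-in items (assumptions for the demo, not vsbench commands).
items = {
    "healthy": [sys.executable, "-c", "print('done')"],
    "crashing": [sys.executable, "-c", "raise SystemExit(3)"],
    "hanging": [sys.executable, "-c", "import time; time.sleep(30)"],
}

# Items run serially; a short timeout keeps the demo quick.
results = {name: run_isolated(argv, timeout_s=3) for name, argv in items.items()}
for name, result in results.items():
    print(name, result["status"])
```

Because every item starts from a fresh process, whatever state a failed item leaves behind is discarded with that process.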
vsbench kernel <name>
Run a Tier-2 NVBench kernel microbenchmark (requires cuda-bench).
vsbench kernel point_in_polygon --scale 100k --bandwidth
The --bandwidth flag reports GB/s and percent-of-peak memory bandwidth.
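As a rough guide to reading those two numbers, the arithmetic is bytes moved divided by elapsed time, then that rate divided by the device's peak. The sketch below assumes 1 GB = 1e9 bytes and uses made-up inputs; `bandwidth_report` is a hypothetical helper, and how vsbench counts bytes or determines peak bandwidth is not specified here.

```python
def bandwidth_report(bytes_moved, elapsed_s, peak_gb_per_s):
    """Return (achieved GB/s, percent of peak) for one kernel run."""
    achieved = bytes_moved / elapsed_s / 1e9  # assumes 1 GB = 1e9 bytes
    return achieved, 100.0 * achieved / peak_gb_per_s


# Illustrative numbers only: 1.6 GB read and written in 4 ms on a device
# whose peak memory bandwidth is taken to be 1000 GB/s.
gbps, pct = bandwidth_report(1_600_000_000, 0.004, 1000.0)
print(f"{gbps:.1f} GB/s ({pct:.1f}% of peak)")
# -> 400.0 GB/s (40.0% of peak)
```

A kernel well below peak is not necessarily slow; compute-bound kernels are limited by arithmetic rather than memory traffic.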
vsbench compare <baseline> <current>
Detect performance regressions between two JSON result files.
vsbench suite ci --json --output baseline.json
# ... make changes ...
vsbench suite ci --json --output current.json
vsbench compare baseline.json current.json
Returns exit code 1 if regressions are detected.
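The gating logic can be pictured as follows. This is a minimal sketch under stated assumptions: the flat name-to-seconds JSON layout, the 10% threshold, and the `find_regressions` helper are all hypothetical and do not describe the actual vsbench result schema or its regression rule.

```python
import json


def find_regressions(baseline, current, threshold=0.10):
    """Return names whose time grew by more than `threshold` (a fraction)."""
    regressions = []
    for name, base_s in baseline.items():
        cur_s = current.get(name)
        if cur_s is None:
            continue  # missing from the current run; skipped in this sketch
        if cur_s > base_s * (1.0 + threshold):
            regressions.append(name)
    return sorted(regressions)


# Hypothetical result payloads: benchmark name -> elapsed seconds.
baseline = json.loads('{"buffer": 0.120, "dissolve": 0.800}')
current = json.loads('{"buffer": 0.125, "dissolve": 1.000}')

regressed = find_regressions(baseline, current)
exit_code = 1 if regressed else 0  # nonzero exit fails a CI step
print(regressed, exit_code)
# -> ['dissolve'] 1
```

In a CI job, the nonzero exit code is what turns a detected regression into a failed step.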
vsbench report <results>
Generate an HTML report from a JSON result file.
vsbench report ci.json -o report.html
vsbench list {operations,pipelines,fixtures,kernels}
Discover what benchmarks are available.
vsbench list operations # Public API operations
vsbench list operations --json # Includes operation parameter schemas
vsbench list operations --include-internal # Include private diagnostics
vsbench list operations --category io # Filter by category
vsbench list pipelines # Available pipelines
vsbench list fixtures # Fixture specs and scales
vsbench list kernels # NVBench kernel benches (Tier 2)
vsbench fixtures generate
Pre-generate benchmark fixture files (synthetic datasets in various formats and scales).
vsbench fixtures generate # All scales, all formats
vsbench fixtures generate --scale 100k --format parquet
vsbench fixtures generate --force # Regenerate even if cached
vsbench shootout <script.py>
Head-to-head comparison of a Python script running under real GeoPandas vs
vibeSpatial. The script should import geopandas as gpd – vsbench
handles the import swap transparently.
uv run vsbench shootout benchmarks/shootout/nearby_buildings.py --scale 10k --repeat 3
uv run vsbench shootout benchmarks/shootout/accessibility_redevelopment.py --scale 10k --repeat 3
uv run vsbench shootout examples/nearby_buildings.py --repeat 5
uv run vsbench shootout my_etl.py --with pyogrio --json
The script argument accepts a single Python file or a directory of scripts.
For GPU and VIBESPATIAL_STRICT_NATIVE=1 runs, uv run vsbench shootout ...
is the supported launch mode.
| Flag | Default | Description |
|---|---|---|
| --repeat | 3 | Timed runs per engine |
| | none | Passed as |
| --no-warmup | off | Skip the untimed script warmup run; the vibespatial leg still prewarms registered GPU pipelines before timing |
| | auto | Python interpreter with real geopandas |
| --with | none | Extra pip deps for the geopandas env (repeatable) |
| | 300 | Per-run timeout in seconds |
For workflow parity checks, treat --no-warmup --repeat 1 as a cold-start
probe, not a steady-state benchmark. The default --repeat 3 is the right
floor for judging parity on the top-level workflow shootouts.
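The difference between a cold-start probe and a steady-state measurement is easy to see with a toy timing harness. Everything below is an illustrative assumption: `time_workflow` is a hypothetical helper, and the `workflow` function simulates a one-off setup cost with sleeps rather than running any real spatial code.

```python
import statistics
import time


def time_workflow(fn, repeat=3, warmup=True):
    """Return per-run wall times, optionally after one untimed warmup run."""
    if warmup:
        fn()  # untimed: absorbs one-off costs such as imports or compilation
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


_state = {}


def workflow():
    # The first call pays a simulated one-off setup cost (assumed 50 ms).
    if "ready" not in _state:
        time.sleep(0.05)
        _state["ready"] = True
    time.sleep(0.005)  # simulated steady-state work


cold = time_workflow(workflow, repeat=1, warmup=False)
_state.clear()  # reset so the warmup run pays the setup cost again
steady = time_workflow(workflow, repeat=3, warmup=True)
print(f"cold start: {cold[0]:.3f}s, steady median: {statistics.median(steady):.3f}s")
```

The single cold run is dominated by setup, which is why it says little about sustained throughput.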
Fingerprint correctness checking
Scripts can print a deterministic summary line to stdout:
SHOOTOUT_FINGERPRINT: rows=998 bounds=(-9.55, 71.1, 1004.26, 1010.0) convex_hull_area=105251.17
When both engines emit a fingerprint, vsbench compares them with numeric
tolerance (rtol=1e-3) to catch correctness regressions while allowing
expected floating-point divergence between GPU and CPU paths. A mismatch
is reported as a test failure.
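A tolerance-based comparison of two such lines might look like the sketch below. It is not the vsbench comparator: `parse_fingerprint` and `fingerprints_match` are hypothetical helpers, the key=value grammar is inferred from the single example line above, and `math.isclose` is used as a stand-in whose relative-tolerance definition may differ from the one vsbench applies.

```python
import math
import re

PREFIX = "SHOOTOUT_FINGERPRINT:"
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def parse_fingerprint(line):
    """Split a fingerprint line into {key: [numbers, ...]}."""
    line = line.strip()
    if not line.startswith(PREFIX):
        raise ValueError("not a fingerprint line")
    body = line[len(PREFIX):]
    fields = {}
    # A value is either a parenthesised tuple or a single bare token.
    for key, value in re.findall(r"(\w+)=(\([^)]*\)|\S+)", body):
        fields[key] = [float(tok) for tok in NUMBER.findall(value)]
    return fields


def fingerprints_match(a, b, rtol=1e-3):
    """True when both lines have the same keys and numerically close values."""
    fa, fb = parse_fingerprint(a), parse_fingerprint(b)
    if fa.keys() != fb.keys():
        return False
    for key in fa:
        if len(fa[key]) != len(fb[key]):
            return False
        if not all(
            math.isclose(x, y, rel_tol=rtol) for x, y in zip(fa[key], fb[key])
        ):
            return False
    return True


cpu = "SHOOTOUT_FINGERPRINT: rows=998 bounds=(-9.55, 71.1, 1004.26, 1010.0) convex_hull_area=105251.17"
gpu = "SHOOTOUT_FINGERPRINT: rows=998 bounds=(-9.55, 71.1, 1004.26, 1010.0) convex_hull_area=105251.20"
bad = "SHOOTOUT_FINGERPRINT: rows=990 bounds=(-9.55, 71.1, 1004.26, 1010.0) convex_hull_area=105251.17"

print(fingerprints_match(cpu, gpu))  # small float drift is tolerated
print(fingerprints_match(cpu, bad))  # a changed row count is not
```

When writing a fingerprint, prefer aggregate values such as row counts, bounds, and areas that are stable across row ordering, so that the only differences between engines are genuine numeric drift.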
Fixture scales
| Name | Rows |
|---|---|
| 1k | 1,000 |
| 10k | 10,000 |
| 100k | 100,000 |
| 1m | 1,000,000 |
Fixtures are cached in .benchmark_fixtures/ and auto-generated on first
use. Use vsbench fixtures generate to pre-populate.