Benchmarking¶

vibeSpatial ships a unified benchmarking CLI called vsbench. Use it to measure individual operations, run regression suites, compare against GeoPandas, and generate HTML reports. For repo-local GPU and strict-native workflows, invoke it through uv run vsbench ... so the project-managed environment and GPU runtime detection stay intact.

Quick start¶

# List available operations
uv run vsbench list operations

# Run a single operation benchmark
uv run vsbench run buffer --scale 100k

# Run the smoke suite (fast, good for local dev)
uv run vsbench suite smoke

# Compare vibeSpatial vs GeoPandas on your own script
uv run vsbench shootout my_workflow.py

Commands¶

`vsbench run <operation>`¶

Run a single operation benchmark at a given scale.

vsbench run intersects --scale 10k --repeat 5
vsbench run buffer --scale 1m --precision fp32 --gpu-sparkline
vsbench run dissolve --compare geopandas --json
vsbench run bounds-pairs --rows 20000 --arg dataset=uniform --arg tile_size=256
vsbench run clip-rect --arg kind=polygon --arg rect=100,100,700,700

Use vsbench list operations --json to discover typed operation-specific parameters. Operation arguments are passed with repeatable --arg key=value flags and validated against the operation schema before execution. Default operation listings and suites measure public GeoPandas-compatible APIs; private owned-array and kernel diagnostics require --include-internal or vsbench kernel.

Common flags (shared by run, pipeline, suite, kernel):

Flag	Default	Description
`--scale`	operation default	Input size: `1k`, `10k`, `100k`, `1m`
`--rows`	none	Exact input row count override for `vsbench run`
`--repeat`	3	Number of timed iterations
`--precision`	`auto`	Force `fp32`, `fp64`, or `auto`
`--compare`	none	Compare against `shapely` or `geopandas`
`--input-format`	`parquet`	Fixture format: `parquet`, `geojson`, `shapefile`, `gpkg`
`--gpu-sparkline`	off	Show per-stage GPU utilization sparkline
`--nvtx`	off	Emit NVTX ranges for Nsight profiling
`--trace`	off	Enable execution trace warnings
`--json`	off	Output results as JSON
`--quiet`	off	Minimal output
`--output PATH`	none	Write results to file

`vsbench pipeline <name>`¶

Run a named multi-stage pipeline benchmark (e.g. spatial-join, overlay, nearby-buildings).

vsbench pipeline spatial-join --suite-level ci
vsbench pipeline overlay --gpu-sparkline

The --suite-level flag (smoke, ci, full) controls which scale points the pipeline runs at.

`vsbench suite {smoke,ci,full}`¶

Run a predefined suite of benchmarks.

vsbench suite smoke                       # Fast check (~30s)
vsbench suite ci --json --output ci.json  # CI gate
vsbench suite full --gpu-sparkline        # Full profiling run

Suites run serially and isolate each benchmark item in a subprocess by default so CUDA allocator state, failed kernels, or OOMs do not contaminate later items. Use --in-process only for local debugging. Use --item-timeout N to override the default 600 second per-item timeout.

Use --pipeline <name> (repeatable) to limit to specific pipelines.

`vsbench kernel <name>`¶

Run a Tier-2 NVBench kernel microbenchmark (requires cuda-bench).

vsbench kernel point_in_polygon --scale 100k --bandwidth

The --bandwidth flag reports GB/s and percent-of-peak memory bandwidth.

`vsbench compare <baseline> <current>`¶

Detect performance regressions between two JSON result files.

vsbench suite ci --json --output baseline.json
# ... make changes ...
vsbench suite ci --json --output current.json

vsbench compare baseline.json current.json

Returns exit code 1 if regressions are detected.

`vsbench report <results>`¶

Generate an HTML report from a JSON result file.

vsbench report ci.json -o report.html

`vsbench list {operations,pipelines,fixtures,kernels}`¶

Discover what benchmarks are available.

vsbench list operations                # Public API operations
vsbench list operations --json         # Includes operation parameter schemas
vsbench list operations --include-internal  # Include private diagnostics
vsbench list operations --category io  # Filter by category
vsbench list pipelines                 # Available pipelines
vsbench list fixtures                  # Fixture specs and scales
vsbench list kernels                   # NVBench kernel benches (Tier 2)

`vsbench fixtures generate`¶

Pre-generate benchmark fixture files (synthetic datasets in various formats and scales).

vsbench fixtures generate                           # All scales, all formats
vsbench fixtures generate --scale 100k --format parquet
vsbench fixtures generate --force                   # Regenerate even if cached

`vsbench shootout <script.py>`¶

Head-to-head comparison of a Python script running under real GeoPandas vs vibeSpatial. The script should import geopandas as gpd – vsbench handles the import swap transparently.

uv run vsbench shootout benchmarks/shootout/nearby_buildings.py --scale 10k --repeat 3
uv run vsbench shootout benchmarks/shootout/accessibility_redevelopment.py --scale 10k --repeat 3
uv run vsbench shootout examples/nearby_buildings.py --repeat 5
uv run vsbench shootout my_etl.py --with pyogrio --json

The script argument accepts a single Python file or a directory of scripts. For GPU and VIBESPATIAL_STRICT_NATIVE=1 runs, uv run vsbench shootout ... is the supported launch mode.

Flag	Default	Description
`--repeat`	3	Timed runs per engine
`--scale`	none	Passed as `VSBENCH_SCALE` env var (e.g. `10K`, `100K`)
`--no-warmup`	off	Skip the untimed script warmup run; the vibespatial leg still prewarms registered GPU pipelines before timing
`--baseline-python`	auto	Python interpreter with real geopandas
`--with DEP`	none	Extra pip deps for the geopandas env (repeatable)
`--timeout`	300	Per-run timeout in seconds

For workflow parity checks, treat --no-warmup --repeat 1 as a cold-start probe, not a steady-state benchmark. The default --repeat 3 is the right floor for judging parity on the top-level workflow shootouts.

Fingerprint correctness checking¶

Scripts can print a deterministic summary line to stdout:

SHOOTOUT_FINGERPRINT: rows=998 bounds=(-9.55, 71.1, 1004.26, 1010.0) convex_hull_area=105251.17

When both engines emit a fingerprint, vsbench compares them with numeric tolerance (rtol=1e-3) to catch correctness regressions while allowing expected floating-point divergence between GPU and CPU paths. A mismatch is reported as a test failure.

Fixture scales¶

Name	Rows
`1k`	1,000
`10k`	10,000
`100k`	100,000
`1m`	1,000,000

Fixtures are cached in .benchmark_fixtures/ and auto-generated on first use. Use vsbench fixtures generate to pre-populate.

Benchmarking¶

Quick start¶

Commands¶

vsbench run <operation>¶

vsbench pipeline <name>¶

vsbench suite {smoke,ci,full}¶

vsbench kernel <name>¶

vsbench compare <baseline> <current>¶

vsbench report <results>¶

vsbench list {operations,pipelines,fixtures,kernels}¶

vsbench fixtures generate¶

vsbench shootout <script.py>¶