vibespatial.io.gpu_parse.properties

GPU property extraction primitives for GeoJSON Feature objects.

Extracts numeric, boolean, and null property values directly on GPU, avoiding the per-feature orjson.loads() CPU bottleneck. String properties are intentionally NOT handled – the caller falls back to CPU for those columns.

Pipeline overview:

# Inputs: d_bytes, d_quote_parity, d_depth, feature boundaries
#
# 1. Locate property key positions via pattern_match + depth filter
# 2. Classify value types (string/bool/null/number/complex)
# 3. For numeric columns: extract floats via number_boundaries + parse
# 4. For boolean columns: NVRTC kernel reads "true"/"false"
# 5. For null columns: mark in validity mask
# 6. Schema inference on first N features (only D->H transfer)

The only host round-trip is schema inference (reading unique key names and their types from a small sample). All column extraction runs entirely on GPU.
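A minimal sketch of the pipeline inputs and of step 1. NumPy on the host stands in for CuPy so the sketch runs without a GPU. The helper names `structural_masks` and `locate_property_keys` are hypothetical, and the quote-parity and depth conventions are assumptions: parity is 1 for bytes inside a string literal, depth counts unmatched opening brackets up to and including each byte, and the input contains no escaped quotes. Under these conventions the keys of a GeoJSON `properties` object sit at depth 4, which matches the `property_depth` default. Keys inside `geometry` share that depth, so the sketch also checks that the enclosing object is the value of `"properties"`. How the real pipeline excludes geometry keys is not described here.

```python
import numpy as np


def structural_masks(buf: bytes):
    """Host stand-in for d_quote_parity and d_depth (assumes no escaped quotes)."""
    b = np.frombuffer(buf, dtype=np.uint8)
    # Parity is 1 for bytes inside a string literal (opening quote included).
    parity = (np.cumsum(b == ord('"')) % 2).astype(np.uint8)
    outside = parity == 0
    opens = outside & ((b == ord("{")) | (b == ord("[")))
    closes = outside & ((b == ord("}")) | (b == ord("]")))
    # Depth = unmatched opening brackets up to and including each byte.
    depth = (np.cumsum(opens) - np.cumsum(closes)).astype(np.int32)
    return b, parity, depth, opens


def locate_property_keys(buf: bytes, property_depth: int = 4):
    """Return (key_name, colon_offset) pairs for keys of "properties" objects."""
    b, parity, depth, opens = structural_masks(buf)
    # Pattern match + depth filter: unquoted ':' at the property depth.
    colons = np.flatnonzero((b == ord(":")) & (parity == 0) & (depth == property_depth))
    open_pos = np.flatnonzero(opens & (depth == property_depth))
    keys = []
    for c in colons.tolist():
        # The enclosing object is the last bracket opened at this depth before c.
        p = int(open_pos[open_pos < c][-1])
        prefix = buf[:p].rstrip()
        # Geometry members share this depth, so require the "properties" key.
        if not (prefix.endswith(b":") and prefix[:-1].rstrip().endswith(b'"properties"')):
            continue
        close = buf.rfind(b'"', 0, c)      # closing quote of the key
        start = buf.rfind(b'"', 0, close)  # opening quote of the key
        keys.append((buf[start + 1:close].decode(), c))
    return keys
```

The colon inside a string value such as `"a:b"` is ignored because its quote parity is 1.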

Kernel classification (ADR-0033):
  • classify_property_values: Tier 1 (byte-level pattern at specific positions)

  • extract_booleans: Tier 1 (byte-level “true”/“false” recognition)

  • All other operations: Tier 2 (CuPy element-wise / gpu_parse reuse)
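Both Tier 1 kernels amount to reading single bytes at known positions. A minimal NumPy sketch, assuming each value is classified by its first non-whitespace byte after the key's colon. The functions `value_starts`, `classify_values_np` and `booleans_np` are hypothetical illustrations, not the module's kernels.

```python
import numpy as np

VTYPE_STRING, VTYPE_BOOLEAN, VTYPE_NULL, VTYPE_NUMBER, VTYPE_COMPLEX = 0, 1, 2, 3, 4
_WS = np.frombuffer(b" \t\r\n", dtype=np.uint8)


def value_starts(b: np.ndarray, colons: np.ndarray) -> np.ndarray:
    """Offset of the first non-whitespace byte after each key's ':'."""
    is_ws = np.isin(b, _WS)
    starts = np.asarray(colons, dtype=np.int64) + 1
    while True:
        step = is_ws[starts]  # every start still on whitespace moves forward
        if not step.any():
            return starts
        starts = starts + step


def classify_values_np(b: np.ndarray, starts: np.ndarray) -> np.ndarray:
    """Map each value's first byte to a VTYPE_* code."""
    first = b[starts]
    # In valid JSON, only '-' and digits remain after the cases below.
    vtype = np.full(first.shape, VTYPE_NUMBER, dtype=np.int8)
    vtype[first == ord('"')] = VTYPE_STRING
    vtype[(first == ord("t")) | (first == ord("f"))] = VTYPE_BOOLEAN
    vtype[first == ord("n")] = VTYPE_NULL
    vtype[(first == ord("{")) | (first == ord("["))] = VTYPE_COMPLEX
    return vtype


def booleans_np(b: np.ndarray, starts: np.ndarray) -> np.ndarray:
    """1 for "true", 0 for "false": once typed, the first byte decides."""
    return (b[starts] == ord("t")).astype(np.uint8)
```

Each output element depends on one input byte, which is what makes these operations a natural fit for a one-thread-per-value kernel.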

Precision (ADR-0002):
  • Integer-only byte classification kernels – no PrecisionPlan needed.

  • Numeric property values are parsed to fp64 via parse_ascii_floats (storage is always fp64 per ADR-0002).
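Step 3 can be sketched the same way: advance from each numeric value's first byte past characters that may appear in a JSON number, then parse each token to fp64. Here `number_ends` is a hypothetical stand-in for `number_boundaries`, and Python's `float()` stands in for `parse_ascii_floats`; neither reflects the module's actual code.

```python
import numpy as np

# Bytes that can appear inside a JSON number token.
_NUM_CHARS = np.frombuffer(b"0123456789+-.eE", dtype=np.uint8)


def number_ends(b: np.ndarray, starts: np.ndarray) -> np.ndarray:
    """Advance every start past number characters; returns exclusive end offsets."""
    is_num = np.isin(b, _NUM_CHARS)
    ends = starts.copy()
    while True:
        # Only tokens still on a number byte (and inside the buffer) advance.
        active = ends < len(b)
        active[active] = is_num[ends[active]]
        if not active.any():
            return ends
        ends = ends + active


def parse_numbers(buf: bytes, starts) -> np.ndarray:
    b = np.frombuffer(buf, dtype=np.uint8)
    starts = np.asarray(starts, dtype=np.int64)
    ends = number_ends(b, starts)
    # float() stands in for parse_ascii_floats; storage is fp64 per ADR-0002.
    return np.array(
        [float(buf[s:e].decode("ascii")) for s, e in zip(starts.tolist(), ends.tolist())],
        dtype=np.float64,
    )
```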

Attributes

Functions

infer_property_schema(→ dict[str, int])

Infer property names and their types from a sample of features.

extract_gpu_properties(→ dict[str, cupy.ndarray])

Extract numeric/boolean properties on GPU.

Module Contents

vibespatial.io.gpu_parse.properties.cp = None
vibespatial.io.gpu_parse.properties.KERNEL_PARAM_I64
vibespatial.io.gpu_parse.properties.VTYPE_STRING: int = 0
vibespatial.io.gpu_parse.properties.VTYPE_BOOLEAN: int = 1
vibespatial.io.gpu_parse.properties.VTYPE_NULL: int = 2
vibespatial.io.gpu_parse.properties.VTYPE_NUMBER: int = 3
vibespatial.io.gpu_parse.properties.VTYPE_COMPLEX: int = 4
vibespatial.io.gpu_parse.properties.infer_property_schema(d_bytes: cupy.ndarray, d_quote_parity: cupy.ndarray, d_depth: cupy.ndarray, d_feature_starts: cupy.ndarray, d_feature_ends: cupy.ndarray, *, sample_size: int = 100, property_depth: int = 4) → dict[str, int]

Infer property names and their types from a sample of features.

Scans the first sample_size features for property keys at the specified depth, classifies each key’s value type, and returns a schema mapping property name to value type code.

This is the ONLY acceptable device-to-host (D->H) transfer in the property extraction pipeline.
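Because the sample is small, the inference step can be illustrated on the host. This sketch assumes the sampled feature bytes have already been copied to the host and that feature end offsets are exclusive. It uses the standard-library `json` parser instead of the GPU classification kernels. The rule for merging a key's types across features (a non-null type replaces an earlier null; otherwise the first type seen wins) is hypothetical, as is the name `infer_schema_host`.

```python
import json

VTYPE_STRING, VTYPE_BOOLEAN, VTYPE_NULL, VTYPE_NUMBER, VTYPE_COMPLEX = 0, 1, 2, 3, 4


def _vtype(value) -> int:
    if value is None:
        return VTYPE_NULL
    if isinstance(value, bool):  # check before int: bool subclasses int
        return VTYPE_BOOLEAN
    if isinstance(value, (int, float)):
        return VTYPE_NUMBER
    if isinstance(value, str):
        return VTYPE_STRING
    return VTYPE_COMPLEX  # nested object or array


def infer_schema_host(buf: bytes, starts, ends, sample_size: int = 100) -> dict[str, int]:
    schema: dict[str, int] = {}
    for s, e in list(zip(starts, ends))[:sample_size]:
        props = json.loads(buf[s:e]).get("properties") or {}
        for key, value in props.items():
            # Hypothetical merge rule: a non-null type replaces an earlier null.
            if schema.get(key, VTYPE_NULL) == VTYPE_NULL:
                schema[key] = _vtype(value)
    return schema
```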

Parameters

d_bytes : cp.ndarray

Device-resident uint8 byte array.

d_quote_parity : cp.ndarray

Device-resident uint8 quote parity mask.

d_depth : cp.ndarray

Device-resident int32 bracket depth array.

d_feature_starts : cp.ndarray

Device-resident int64 array of feature start byte offsets.

d_feature_ends : cp.ndarray

Device-resident int64 array of feature end byte offsets.

sample_size : int

Number of features to sample for schema detection.

property_depth : int

Bracket depth of property keys (default 4 for GeoJSON).

Returns

dict[str, int]

Mapping of property name to value type code (VTYPE_* constants). String columns are included (VTYPE_STRING) so the caller knows which columns need CPU fallback.
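A caller might route columns from the returned schema as below. `split_columns` is a hypothetical helper; grouping null and complex columns together reflects only that `extract_gpu_properties` omits both from its output.

```python
VTYPE_STRING, VTYPE_BOOLEAN, VTYPE_NULL, VTYPE_NUMBER, VTYPE_COMPLEX = 0, 1, 2, 3, 4


def split_columns(schema: dict[str, int]):
    """Partition a schema into GPU, CPU-fallback, and not-extracted columns."""
    gpu = sorted(k for k, v in schema.items() if v in (VTYPE_NUMBER, VTYPE_BOOLEAN))
    cpu = sorted(k for k, v in schema.items() if v == VTYPE_STRING)
    other = sorted(k for k, v in schema.items() if v in (VTYPE_NULL, VTYPE_COMPLEX))
    return gpu, cpu, other
```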

vibespatial.io.gpu_parse.properties.extract_gpu_properties(d_bytes: cupy.ndarray, d_feature_starts: cupy.ndarray, d_feature_ends: cupy.ndarray, d_quote_parity: cupy.ndarray, d_depth: cupy.ndarray, *, property_depth: int = 4, sample_size: int = 100) → dict[str, cupy.ndarray]

Extract numeric/boolean properties on GPU.

Performs schema inference on a small sample (the only D->H transfer), then extracts each non-string property column entirely on GPU.

Parameters

d_bytes : cp.ndarray

Device-resident uint8 byte array of the full file.

d_feature_starts : cp.ndarray

Device-resident int64 feature start byte offsets.

d_feature_ends : cp.ndarray

Device-resident int64 feature end byte offsets.

d_quote_parity : cp.ndarray

Device-resident uint8 quote parity mask.

d_depth : cp.ndarray

Device-resident int32 bracket depth array.

property_depth : int

Bracket depth of property keys (default 4 for GeoJSON).

sample_size : int

Number of features to sample for schema inference.

Returns

dict[str, cp.ndarray]

Mapping of property name to device array of values.

  • Numeric properties: float64 array of shape (n_features,)

  • Boolean properties: uint8 array (1=true, 0=false) of shape (n_features,)

  • Null-only properties: excluded from output (all-null columns are dropped)

  • String properties: NOT included (caller falls back to CPU)

  • Complex properties: NOT included (nested objects/arrays skipped)

For columns with mixed types across features (e.g., some features have a numeric value and others have null), missing entries are filled with NaN in numeric columns and with 0 in boolean columns. A missing boolean is therefore indistinguishable from false in the returned array.
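A sketch of consuming the returned dict on the host. NumPy arrays stand in for the device arrays; with CuPy you would first copy each one with `cupy.asnumpy(arr)` or `arr.get()`. The helper `to_host_columns` is hypothetical. Converting the uint8 boolean columns to `bool` keeps the ambiguity noted above: missing values become `False`.

```python
import numpy as np


def to_host_columns(columns: dict) -> dict:
    """Convert extract_gpu_properties-style output to typed host columns."""
    out = {}
    for name, arr in columns.items():
        if arr.dtype == np.uint8:
            out[name] = arr.astype(bool)  # 1 -> True; 0 -> False (or missing)
        else:
            out[name] = arr               # float64; NaN marks a missing value
    return out
```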