vibespatial.io.gpu_parse.properties
GPU property extraction primitives for GeoJSON Feature objects.
Extracts numeric, boolean, and null property values directly on GPU,
avoiding the per-feature orjson.loads() CPU bottleneck. String
properties are intentionally NOT handled – the caller falls back to
CPU for those columns.
Pipeline overview:
# Inputs: d_bytes, d_quote_parity, d_depth, feature boundaries
#
# 1. Locate property key positions via pattern_match + depth filter
# 2. Classify value types (string/bool/null/number/complex)
# 3. For numeric columns: extract floats via number_boundaries + parse
# 4. For boolean columns: NVRTC kernel reads "true"/"false"
# 5. For null columns: mark in validity mask
# 6. Schema inference on first N features (only D->H transfer)
The only host round-trip is schema inference (reading unique key names and their types from a small sample). All column extraction runs entirely on GPU.
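The inputs named in step 1 can be illustrated on the CPU. The following is a minimal NumPy sketch, not the module's implementation. It assumes d_quote_parity is 1 for bytes inside a string literal and d_depth is a running count of unclosed { or [ outside strings, and it ignores escaped quotes. The helpers quote_parity, bracket_depth and property_keys are hypothetical. Under this convention a FeatureCollection puts property keys at depth 4 (collection, features array, Feature, properties), which matches the documented default. The geometry's own keys also sit at depth 4, which is why the sketch pairs a pattern match for "properties" with the depth filter.

```python
import numpy as np


def quote_parity(buf: np.ndarray) -> np.ndarray:
    """1 for bytes inside a string literal (opening quote included), else 0.

    Assumes no escaped quotes in the input.
    """
    return (np.cumsum(buf == ord('"')) % 2).astype(np.uint8)


def bracket_depth(buf: np.ndarray, parity: np.ndarray) -> np.ndarray:
    """Running count of unclosed '{' / '[' that appear outside strings."""
    outside = parity == 0
    opens = np.isin(buf, np.frombuffer(b"{[", dtype=np.uint8)) & outside
    closes = np.isin(buf, np.frombuffer(b"}]", dtype=np.uint8)) & outside
    delta = opens.astype(np.int32) - closes.astype(np.int32)
    return np.cumsum(delta).astype(np.int32)


def property_keys(buf, parity, depth, property_depth=4):
    """Key names inside the first "properties" object (hypothetical helper)."""
    text = buf.tobytes()
    key_pos = text.find(b'"properties"')  # pattern match for the member name
    if key_pos < 0:
        return []
    open_pos = text.index(b"{", key_pos)  # opening brace of the properties object
    # The object ends where depth first drops below property_depth.
    close_pos = open_pos + int(np.flatnonzero(depth[open_pos:] < property_depth)[0])
    idx = np.arange(buf.size)
    colons = np.flatnonzero(
        (buf == ord(":")) & (parity == 0) & (depth == property_depth)
        & (idx > open_pos) & (idx < close_pos)
    )
    names = []
    for c in colons:
        end_q = text.rfind(b'"', 0, int(c))   # closing quote of the key
        start_q = text.rfind(b'"', 0, end_q)  # opening quote of the key
        names.append(text[start_q + 1:end_q].decode("utf-8"))
    return names


doc = (b'{"type":"FeatureCollection","features":[{"type":"Feature",'
       b'"geometry":{"type":"Point","coordinates":[1,2]},'
       b'"properties":{"a":1.5,"b":true,"c":null,"d":"x"}}]}')
buf = np.frombuffer(doc, dtype=np.uint8)
parity = quote_parity(buf)
depth = bracket_depth(buf, parity)
print(property_keys(buf, parity, depth))  # ['a', 'b', 'c', 'd']
```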
- Kernel classification (ADR-0033):
  classify_property_values: Tier 1 (byte-level pattern at specific positions)
  extract_booleans: Tier 1 (byte-level "true"/"false" recognition)
  All other operations: Tier 2 (CuPy element-wise / gpu_parse reuse)
- Precision (ADR-0002):
  Integer-only byte classification kernels, so no PrecisionPlan is needed.
  Numeric property values are parsed to fp64 via parse_ascii_floats (storage is always fp64 per ADR-0002).
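Both Tier 1 kernels only inspect bytes at known positions. As a hedged CPU illustration (NumPy standing in for CuPy/NVRTC; value_starts, classify_values and extract_booleans are hypothetical names, not the module's kernels), a value's type can be read from the first non-whitespace byte after its key's colon, and a boolean's value from whether that byte is 't'. The sketch assumes well-formed JSON, so any byte that is not a quote, t, f, n, minus sign or digit must open a nested object or array.

```python
import numpy as np

# Same codes as the module's VTYPE_* attributes.
VTYPE_STRING, VTYPE_BOOLEAN, VTYPE_NULL, VTYPE_NUMBER, VTYPE_COMPLEX = 0, 1, 2, 3, 4
_WHITESPACE = np.frombuffer(b" \t\r\n", dtype=np.uint8)


def value_starts(buf: np.ndarray, colon_pos: np.ndarray) -> np.ndarray:
    """Index of the first non-whitespace byte after each key's colon."""
    is_ws = np.isin(buf, _WHITESPACE)
    pos = colon_pos.astype(np.int64) + 1
    while True:  # valid JSON always has a value after ':', so this terminates
        step = is_ws[pos]
        if not step.any():
            return pos
        pos[step] += 1


def classify_values(buf: np.ndarray, starts: np.ndarray) -> np.ndarray:
    """Tier 1 style classification: one byte per value decides its type."""
    first = buf[starts]
    is_digit = (first >= ord("0")) & (first <= ord("9"))
    return np.select(
        [
            first == ord('"'),
            (first == ord("t")) | (first == ord("f")),
            first == ord("n"),
            is_digit | (first == ord("-")),
        ],
        [VTYPE_STRING, VTYPE_BOOLEAN, VTYPE_NULL, VTYPE_NUMBER],
        default=VTYPE_COMPLEX,  # '{' or '[': nested object or array
    ).astype(np.int32)


def extract_booleans(buf: np.ndarray, starts: np.ndarray) -> np.ndarray:
    """1 for values starting with 't' (true), 0 for 'f' (false)."""
    return (buf[starts] == ord("t")).astype(np.uint8)


raw = b'{"a": 1.5, "b":true,"c": null,"d":"x","e":[1],"f":-2,"g": false}'
buf = np.frombuffer(raw, dtype=np.uint8)
starts = value_starts(buf, np.flatnonzero(buf == ord(":")))  # no ':' inside strings here
vtypes = classify_values(buf, starts)
print(vtypes.tolist())  # [3, 1, 2, 0, 4, 3, 1]
print(extract_booleans(buf, starts[vtypes == VTYPE_BOOLEAN]).tolist())  # [1, 0]
```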
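Attributes
cp
KERNEL_PARAM_I64
VTYPE_STRING
VTYPE_BOOLEAN
VTYPE_NULL
VTYPE_NUMBER
VTYPE_COMPLEX
Functions
| infer_property_schema(d_bytes, d_quote_parity, d_depth, ...) | Infer property names and their types from a sample of features. |
| extract_gpu_properties(d_bytes, d_feature_starts, d_feature_ends, ...) | Extract numeric/boolean properties on GPU. |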
Module Contents
- vibespatial.io.gpu_parse.properties.cp = None
- vibespatial.io.gpu_parse.properties.KERNEL_PARAM_I64
- vibespatial.io.gpu_parse.properties.VTYPE_STRING: int = 0
- vibespatial.io.gpu_parse.properties.VTYPE_BOOLEAN: int = 1
- vibespatial.io.gpu_parse.properties.VTYPE_NULL: int = 2
- vibespatial.io.gpu_parse.properties.VTYPE_NUMBER: int = 3
- vibespatial.io.gpu_parse.properties.VTYPE_COMPLEX: int = 4
- vibespatial.io.gpu_parse.properties.infer_property_schema(d_bytes: cupy.ndarray, d_quote_parity: cupy.ndarray, d_depth: cupy.ndarray, d_feature_starts: cupy.ndarray, d_feature_ends: cupy.ndarray, *, sample_size: int = 100, property_depth: int = 4) -> dict[str, int]
Infer property names and their types from a sample of features.
Scans the first sample_size features for property keys at the specified depth, classifies each key's value type, and returns a schema mapping each property name to a value type code. This is the ONLY acceptable D->H transfer in the property extraction pipeline.
Parameters
- d_bytes : cp.ndarray
  Device-resident uint8 byte array.
- d_quote_parity : cp.ndarray
  Device-resident uint8 quote parity mask.
- d_depth : cp.ndarray
  Device-resident int32 bracket depth array.
- d_feature_starts : cp.ndarray
  Device-resident int64 array of feature start byte offsets.
- d_feature_ends : cp.ndarray
  Device-resident int64 array of feature end byte offsets.
- sample_size : int
  Number of features to sample for schema detection.
- property_depth : int
  Bracket depth of property keys (default 4 for GeoJSON).
Returns
- dict[str, int]
  Mapping of property name to value type code (VTYPE_* constants). String columns are included (VTYPE_STRING) so the caller knows which columns need CPU fallback.
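The merge policy that turns per-feature classifications into a single dict[str, int] is not spelled out here. The sketch below assumes one plausible policy (hypothetical, pure Python): null defers to any concrete type, and conflicting concrete types demote the column to VTYPE_STRING so it takes the CPU fallback. It then shows how a caller might use the returned schema to separate GPU-extractable columns from CPU-fallback columns.

```python
# Same codes as the module's VTYPE_* attributes.
VTYPE_STRING, VTYPE_BOOLEAN, VTYPE_NULL, VTYPE_NUMBER, VTYPE_COMPLEX = 0, 1, 2, 3, 4


def merge_schema(sampled_features):
    """Fold per-feature (name, vtype) pairs into one name -> vtype schema.

    Assumed rules (hypothetical, not necessarily the module's):
    - null never overrides a concrete type, and a concrete type replaces null;
    - two different concrete types demote the column to VTYPE_STRING,
      which routes it to the CPU fallback.
    """
    schema = {}
    for pairs in sampled_features:
        for name, vtype in pairs:
            prev = schema.get(name)
            if prev is None or prev == VTYPE_NULL:
                schema[name] = vtype
            elif vtype != VTYPE_NULL and vtype != prev:
                schema[name] = VTYPE_STRING
    return schema


sample = [
    [("pop", VTYPE_NUMBER), ("name", VTYPE_STRING), ("flag", VTYPE_NULL)],
    [("pop", VTYPE_NULL), ("name", VTYPE_STRING), ("flag", VTYPE_BOOLEAN)],
    [("pop", VTYPE_NUMBER), ("mixed", VTYPE_NUMBER), ("empty", VTYPE_NULL)],
    [("mixed", VTYPE_BOOLEAN)],
]
schema = merge_schema(sample)
gpu_cols = sorted(k for k, v in schema.items() if v in (VTYPE_NUMBER, VTYPE_BOOLEAN))
cpu_cols = sorted(k for k, v in schema.items() if v == VTYPE_STRING)
print(gpu_cols)  # ['flag', 'pop']
print(cpu_cols)  # ['mixed', 'name']
```

A key seen only as null (empty above) stays VTYPE_NULL, which lines up with extract_gpu_properties dropping all-null columns.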
- vibespatial.io.gpu_parse.properties.extract_gpu_properties(d_bytes: cupy.ndarray, d_feature_starts: cupy.ndarray, d_feature_ends: cupy.ndarray, d_quote_parity: cupy.ndarray, d_depth: cupy.ndarray, *, property_depth: int = 4, sample_size: int = 100) -> dict[str, cupy.ndarray]
Extract numeric/boolean properties on GPU.
Performs schema inference on a small sample (the only D->H transfer), then extracts each numeric and boolean property column entirely on GPU.
Parameters
- d_bytes : cp.ndarray
  Device-resident uint8 byte array of the full file.
- d_feature_starts : cp.ndarray
  Device-resident int64 feature start byte offsets.
- d_feature_ends : cp.ndarray
  Device-resident int64 feature end byte offsets.
- d_quote_parity : cp.ndarray
  Device-resident uint8 quote parity mask.
- d_depth : cp.ndarray
  Device-resident int32 bracket depth array.
- property_depth : int
  Bracket depth of property keys (default 4 for GeoJSON).
- sample_size : int
  Number of features to sample for schema inference.
Returns
- dict[str, cp.ndarray]
  Mapping of property name to device array of values:
  - Numeric properties: float64 array of shape (n_features,)
  - Boolean properties: uint8 array (1=true, 0=false) of shape (n_features,)
  - Null-only properties: excluded from output (all-null columns are dropped)
  - String properties: NOT included (caller falls back to CPU)
  - Complex properties: NOT included (nested objects/arrays are skipped)
  For columns with mixed types across features (e.g., some features have a numeric value and others have null), the output array holds NaN for missing numeric values and 0 for missing boolean values.
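The Returns contract above can be sketched on the CPU. This is a minimal NumPy illustration, assuming the per-key occurrence arrays (feature index, value start, value type) have already been produced by the classification step. number_ends approximates a number-boundary scan using the set of bytes a JSON number may contain, and Python's float() stands in for parse_ascii_floats. The helper names and the occurrences layout are hypothetical.

```python
import numpy as np

# Same codes as the module's VTYPE_* attributes.
VTYPE_STRING, VTYPE_BOOLEAN, VTYPE_NULL, VTYPE_NUMBER, VTYPE_COMPLEX = 0, 1, 2, 3, 4
_NUMBER_BYTES = np.frombuffer(b"0123456789+-.eE", dtype=np.uint8)


def number_ends(buf: np.ndarray, starts: np.ndarray) -> np.ndarray:
    """One past the last byte of each numeric token (hypothetical stand-in)."""
    in_number = np.isin(buf, _NUMBER_BYTES)
    ends = starts.astype(np.int64)  # astype copies, so starts is not modified
    while True:
        active = ends < buf.size
        active[active] = in_number[ends[active]]
        if not active.any():
            return ends
        ends[active] += 1


def assemble_columns(buf, schema, occurrences, n_features):
    """occurrences: name -> (feature_index, value_start, vtype) arrays."""
    out = {}
    for name, col_type in schema.items():
        feat, starts, vtypes = occurrences[name]
        if col_type == VTYPE_NUMBER:
            col = np.full(n_features, np.nan, dtype=np.float64)  # missing -> NaN
            keep = vtypes == VTYPE_NUMBER
            s, e = starts[keep], number_ends(buf, starts[keep])
            # float() on each token stands in for parse_ascii_floats
            col[feat[keep]] = [float(buf[a:b].tobytes().decode("ascii")) for a, b in zip(s, e)]
            out[name] = col
        elif col_type == VTYPE_BOOLEAN:
            col = np.zeros(n_features, dtype=np.uint8)  # missing -> 0
            keep = vtypes == VTYPE_BOOLEAN
            col[feat[keep]] = buf[starts[keep]] == ord("t")
            out[name] = col
        # all-null, string, and complex columns are left out
    return out


raw = b'[{"pop":12,"ok":true},{"pop":null,"ok":false},{"pop":-3.5e2,"ok":null,"name":"x"}]'
buf = np.frombuffer(raw, dtype=np.uint8)


def starts_of(key: bytes) -> np.ndarray:
    """Byte index just after each '"key":' (this sample has no whitespace)."""
    pat = b'"' + key + b'":'
    found, pos = [], raw.find(pat)
    while pos >= 0:
        found.append(pos + len(pat))
        pos = raw.find(pat, pos + len(pat))
    return np.array(found, dtype=np.int64)


occurrences = {
    "pop": (np.array([0, 1, 2]), starts_of(b"pop"),
            np.array([VTYPE_NUMBER, VTYPE_NULL, VTYPE_NUMBER])),
    "ok": (np.array([0, 1, 2]), starts_of(b"ok"),
           np.array([VTYPE_BOOLEAN, VTYPE_BOOLEAN, VTYPE_NULL])),
    "name": (np.array([2]), starts_of(b"name"), np.array([VTYPE_STRING])),
}
schema = {"pop": VTYPE_NUMBER, "ok": VTYPE_BOOLEAN, "name": VTYPE_STRING}
cols = assemble_columns(buf, schema, occurrences, n_features=3)
print(cols["pop"].tolist())  # [12.0, nan, -350.0]
print(cols["ok"].tolist())   # [1, 0, 0]
print(sorted(cols))          # ['ok', 'pop']
```

One consequence of the documented 0 fill is that a missing boolean is indistinguishable from false in the returned uint8 array.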