vibespatial.io.dbf_gpu

GPU DBF (dBASE III) reader – fixed-width record parsing on GPU.

Parses the attribute table (.dbf) component of ESRI Shapefiles on the GPU. Numeric, date, and logical columns are decoded entirely on the device and kept as device-resident arrays that can feed directly into GPU analytics without a host round-trip.

DBF is a fixed-width binary format:

  1. 32-byte main header – record count, header length, record length.

  2. Field descriptors (32 bytes each) until 0x0D terminator – column names, types, widths, offsets.

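  3. Fixed-width records starting at header_length – each record has a 1-byte deletion flag (' ' = active, '*' = deleted) followed by the concatenated field values, each space-padded to its field width (character values are left-aligned with trailing spaces; numeric values are conventionally right-aligned).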
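The CPU-side header pass over steps 1 and 2 can be sketched with the standard-library struct module. This is a minimal sketch, not the module's implementation: FieldDesc, Header and parse_dbf_header are hypothetical stand-ins for DbfFieldDescriptor and DbfHeader, and the assumption that a field's offset counts the leading deletion-flag byte is ours.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class FieldDesc:
    """Hypothetical stand-in for DbfFieldDescriptor."""
    name: str
    field_type: str
    length: int
    decimal_count: int
    offset: int  # assumed: byte offset within a record, counting the deletion flag


@dataclass
class Header:
    """Hypothetical stand-in for DbfHeader."""
    version: int
    record_count: int
    header_length: int
    record_length: int
    fields: list = field(default_factory=list)


def parse_dbf_header(raw: bytes) -> Header:
    version = raw[0]
    # Bytes 4-7: record count (uint32 LE); 8-9: header length; 10-11: record length (uint16 LE).
    record_count, header_length, record_length = struct.unpack_from("<IHH", raw, 4)
    descs = []
    pos, offset = 32, 1  # descriptors start at byte 32; record byte 0 is the deletion flag
    while pos < header_length and raw[pos] != 0x0D:  # 0x0D terminates the descriptor array
        desc = raw[pos:pos + 32]
        name = desc[:11].split(b"\x00", 1)[0].decode("latin-1")  # 11-byte, NUL-padded name
        field_type = chr(desc[11])
        length, decimal_count = desc[16], desc[17]
        descs.append(FieldDesc(name, field_type, length, decimal_count, offset))
        offset += length  # next field starts right after this one
        pos += 32
    return Header(version, record_count, header_length, record_length, descs)
```

Because each descriptor is a fixed 32-byte slot, the loop is a simple stride rather than a search, which is part of why this pass stays on the CPU.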

Because records are fixed-width, field indexing is pure arithmetic: record_i_field_j = header_length + i * record_length + field_offset_j. No search/scan kernels are needed.
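The offset formula translates directly into slicing. In this sketch the helper names are hypothetical and field_offset is assumed to include the deletion-flag byte; each list element corresponds to the work a single GPU thread would do for one record.

```python
def field_bytes(raw: bytes, header_length: int, record_length: int,
                field_offset: int, field_length: int, i: int) -> bytes:
    """Return the raw bytes of one field in record i using only arithmetic."""
    start = header_length + i * record_length + field_offset
    return raw[start:start + field_length]


def column_bytes(raw: bytes, header_length: int, record_length: int,
                 field_offset: int, field_length: int, n_records: int) -> list:
    # Independent slices, no scanning: each iteration plays the role of one GPU thread.
    return [field_bytes(raw, header_length, record_length, field_offset, field_length, i)
            for i in range(n_records)]
```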

Column extraction strategy by DBF field type:

  • Numeric (‘N’, ‘F’): Tier 1 NVRTC kernel (dbf_extract_numeric) parses fixed-width ASCII at known byte offsets to float64 on GPU. Each thread handles one record for one field. Output stays device-resident.

  • Date (‘D’): Tier 1 NVRTC kernel (dbf_extract_date) parses YYYYMMDD to int32 on GPU. Output stays device-resident.

  • Logical (‘L’): Tier 2 CuPy element-wise byte comparison at known offsets. Output stays device-resident.

  • Character (‘C’): D->H copy of column bytes, with trailing spaces stripped on CPU. Returns a numpy string array (string data has no GPU use case).
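For reference, the per-record conversions described above can be written as plain CPU functions. This is a hedged sketch, not the NVRTC kernels themselves: the function names are hypothetical, exponent notation is not handled, the int32 date encoding is assumed to be days since 1970-01-01 (the module may use a different encoding), and the character codepage is assumed to be latin-1.

```python
from datetime import date

EPOCH = date(1970, 1, 1)  # assumed reference date for the int32 date encoding


def parse_numeric(raw: bytes):
    """One 'thread' of numeric parsing: fixed-width ASCII -> float, or None for null."""
    s = raw.strip(b" ")
    if not s:
        return None  # all-blank field is treated as null
    sign, i = 1.0, 0
    if s[0] in b"+-":
        sign = -1.0 if s[0] == ord("-") else 1.0
        i = 1
    value, scale, seen_digit, in_frac = 0.0, 1.0, False, False
    for c in s[i:]:
        if c == ord("."):
            if in_frac:
                return None  # second decimal point
            in_frac = True
        elif 48 <= c <= 57:  # ASCII '0'..'9'
            seen_digit = True
            value = value * 10.0 + (c - 48)
            if in_frac:
                scale *= 10.0
        else:
            return None  # any other byte (e.g. '*' overflow fill) -> null
    return sign * value / scale if seen_digit else None


def parse_date(raw: bytes):
    """YYYYMMDD -> days since 1970-01-01, or None if blank/invalid."""
    s = raw.strip(b" ")
    if len(s) != 8 or not s.isdigit():
        return None
    try:
        return (date(int(s[:4]), int(s[4:6]), int(s[6:8])) - EPOCH).days
    except ValueError:
        return None


def parse_logical(raw: bytes):
    """dBASE logical: T/t/Y/y -> True, F/f/N/n -> False, '?' or blank -> None."""
    c = raw[:1]
    if c in (b"T", b"t", b"Y", b"y"):
        return True
    if c in (b"F", b"f", b"N", b"n"):
        return False
    return None


def parse_character(raw: bytes, encoding: str = "latin-1") -> str:
    """Strip trailing pad spaces and decode (codepage is an assumption)."""
    return raw.rstrip(b" ").decode(encoding)
```

Each function depends only on its own field bytes, which is what makes a one-thread-per-record mapping possible on the GPU.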

All structural parsing (header, field descriptors) runs on the CPU because the header is typically < 4 KB. No PrecisionPlan is needed because this is I/O parsing: integer-only byte classification plus ASCII-to-number conversion into fixed storage types (fp64 for numeric fields, int32 for dates). Same rationale as csv_gpu.py and gpu_parse/numeric.py.

Tier classification (ADR-0033):
  • Header parsing: CPU (small data, one-time)

  • Numeric field extraction: Tier 1 (custom NVRTC – ASCII-to-float64)

  • Date field extraction: Tier 1 (custom NVRTC – ASCII-to-int32)

  • Logical field extraction: Tier 2 (CuPy element-wise)

  • Deletion flag: Tier 2 (CuPy element-wise)

  • Character field extraction: CPU (string data)

Attributes

cp

Classes

DbfFieldDescriptor

Metadata for one DBF field (column).

DbfHeader

Parsed DBF file header.

DbfColumn

One parsed column from a DBF file.

DbfGpuResult

Result of GPU DBF parsing.

Functions

read_dbf_gpu_from_bytes(→ DbfGpuResult)

Parse DBF from in-memory bytes on GPU. No temp file needed.

read_dbf_gpu(→ DbfGpuResult)

Parse a DBF file on GPU.

dbf_result_to_dataframe(result, *[, include_deleted])

Convert a DbfGpuResult to a pandas DataFrame.

Module Contents

vibespatial.io.dbf_gpu.cp = None
class vibespatial.io.dbf_gpu.DbfFieldDescriptor

Metadata for one DBF field (column).

name: str
field_type: str
length: int
decimal_count: int
offset: int
class vibespatial.io.dbf_gpu.DbfHeader

Parsed DBF file header.

version: int
record_count: int
header_length: int
record_length: int
fields: list[DbfFieldDescriptor]
class vibespatial.io.dbf_gpu.DbfColumn

One parsed column from a DBF file.

Attributes

name : str

Column name from the DBF field descriptor.

dtype : str

Logical type: ‘float64’, ‘int32’, ‘string’, ‘bool’.

data : cp.ndarray | np.ndarray

Parsed column data. Device-resident (CuPy) for numeric, date, and logical columns. Host-resident (numpy) for string columns.

null_mask : cp.ndarray | np.ndarray | None

Optional boolean mask where True = null/missing value. Device-resident for GPU columns, host-resident for string columns.

name: str
dtype: str
data: object
null_mask: object | None = None
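The pairing of data with null_mask can be illustrated on the host. In this sketch to_column is a hypothetical helper, and filling masked slots with NaN is an assumption; the module's actual fill value under the mask is not documented here.

```python
import math


def to_column(values):
    """Split parsed values (None = missing) into (data, null_mask).

    The NaN fill under the mask is an assumption of this sketch.
    """
    null_mask = [v is None for v in values]  # True = null/missing, as in DbfColumn
    data = [math.nan if v is None else float(v) for v in values]
    return data, null_mask
```

Keeping an explicit mask lets consumers tell a genuinely missing value apart from one that merely parses to NaN.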
class vibespatial.io.dbf_gpu.DbfGpuResult

Result of GPU DBF parsing.

Attributes

columns : dict[str, DbfColumn]

Column name -> parsed column mapping.

n_records : int

Number of records in the DBF file.

active_mask : cp.ndarray

Device-resident uint8 array, shape (n_records,). 1 = active record, 0 = deleted (marked with *).

header : DbfHeader

Parsed header metadata.

columns: dict[str, DbfColumn]
n_records: int
active_mask: object
header: DbfHeader
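The active_mask semantics can be sketched on the CPU as a single comparison against the first byte of each record, followed by the filtering that include_deleted=False implies. Both helper names are hypothetical; on the GPU the same comparison is one element-wise operation over all records.

```python
def active_mask(raw: bytes, header_length: int, record_length: int,
                n_records: int) -> bytearray:
    """CPU stand-in for the deletion-flag pass: 1 = active, 0 = deleted."""
    mask = bytearray(n_records)  # uint8 values, like the device-resident mask
    for i in range(n_records):
        flag = raw[header_length + i * record_length]  # first byte of record i
        mask[i] = 0 if flag == 0x2A else 1  # 0x2A is '*'
    return mask


def filter_active(values, mask):
    """Keep only values whose record is active (the include_deleted=False case)."""
    return [v for v, keep in zip(values, mask) if keep]
```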
vibespatial.io.dbf_gpu.read_dbf_gpu_from_bytes(raw: bytes, *, columns: list[str] | None = None) → DbfGpuResult

Parse DBF from in-memory bytes on GPU. No temp file needed.
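Because this entry point takes raw bytes, small synthetic inputs are easy to build in memory. The builder below is a hypothetical test helper, not part of the module. It writes a minimal dBASE III byte string (version byte 0x03, one 32-byte descriptor per field, the 0x0D terminator, space-padded records, and the 0x1A end-of-file marker); field values are passed pre-formatted as strings.

```python
import struct


def build_dbf(fields, rows, deleted=()):
    """Hypothetical helper: build a minimal dBASE III byte string.

    fields:  list of (name, type_char, length, decimal_count)
    rows:    list of tuples of pre-formatted str values, one per field
    deleted: indices of rows to mark with the '*' deletion flag
    """
    record_length = 1 + sum(f[2] for f in fields)  # +1 for the deletion flag
    header_length = 32 + 32 * len(fields) + 1  # main header + descriptors + 0x0D
    out = bytearray(32)
    out[0] = 0x03  # dBASE III, no memo
    struct.pack_into("<IHH", out, 4, len(rows), header_length, record_length)
    for name, ftype, length, dec in fields:
        desc = bytearray(32)
        enc = name.encode("ascii")[:10]  # name is NUL-padded within 11 bytes
        desc[:len(enc)] = enc
        desc[11] = ord(ftype)
        desc[16] = length
        desc[17] = dec
        out += desc
    out += b"\x0d"  # field descriptor terminator
    for i, row in enumerate(rows):
        out += b"*" if i in deleted else b" "
        for (_, ftype, length, _), value in zip(fields, row):
            text = value.encode("ascii")[:length]
            # Numeric values right-aligned, everything else left-aligned, space padded.
            out += text.rjust(length) if ftype in "NF" else text.ljust(length)
    out += b"\x1a"  # end-of-file marker
    return bytes(out)
```

On a CUDA-capable machine, a buffer built this way could be passed straight to read_dbf_gpu_from_bytes, avoiding a temporary file in tests.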

vibespatial.io.dbf_gpu.read_dbf_gpu(path: pathlib.Path | str, *, columns: list[str] | None = None) → DbfGpuResult

Parse a DBF file on GPU.

Numeric, date, and logical columns are parsed on the GPU and stay device-resident. Character columns are copied to the host and returned as numpy string arrays.

Parameters

path : Path or str

Path to the .dbf file.

columns : list of str, optional

Subset of column names to read. If None, all columns are read.

Returns

DbfGpuResult

Parsed result with column data, record count, active mask, and header.

Notes

The header (< 4 KB) is parsed on CPU. Record data is transferred to the GPU via kvikio (if available) or CuPy, and numeric fields are parsed in parallel by NVRTC kernels.

vibespatial.io.dbf_gpu.dbf_result_to_dataframe(result: DbfGpuResult, *, include_deleted: bool = False)

Convert a DbfGpuResult to a pandas DataFrame.

All device-to-host transfers are batched outside the column loop to avoid per-column sync overhead (ZCOPY002).

Parameters

result : DbfGpuResult

Output of read_dbf_gpu.

include_deleted : bool, default False

If False, filter out records marked as deleted in the DBF.

Returns

pandas.DataFrame