vibespatial.io.dbf_gpu¶
GPU DBF (dBASE III) reader – fixed-width record parsing on GPU.
Parses the attribute table component of ESRI Shapefiles entirely on the GPU for numeric columns, keeping device-resident arrays that can feed directly into GPU analytics without a host round-trip.
DBF is a fixed-width binary format:
32-byte main header – record count, header length, record length.
Field descriptors (32 bytes each) until 0x0D terminator – column names, types, widths, offsets.
Fixed-width records starting at header_length – each record has a 1-byte deletion flag followed by concatenated field values, each space-padded to its declared width.
Because records are fixed-width, field indexing is pure arithmetic:
record_i_field_j = header_length + i * record_length + field_offset_j.
No search/scan kernels are needed.
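The arithmetic above can be sketched in a few lines of plain Python. This is only an illustration: the function name `field_byte_offset` and the two-field layout are hypothetical, and the sketch assumes field offsets are measured from the start of the record, so the first field sits at offset 1, just after the deletion flag.

```python
def field_byte_offset(header_length: int, record_length: int,
                      record_index: int, field_offset: int) -> int:
    """Absolute byte position of one field value inside the DBF file."""
    return header_length + record_index * record_length + field_offset


# Hypothetical layout with two fields:
#   header = 32-byte main header + 2 * 32-byte descriptors + 1 terminator byte
#   record = 1-byte deletion flag + 10-byte field + 8-byte field
header_length = 32 + 2 * 32 + 1   # 97
record_length = 1 + 10 + 8        # 19

# Assumption: offsets count from the record start (deletion flag at 0).
field_offsets = {"POP": 1, "FOUNDED": 11}

first = field_byte_offset(header_length, record_length, 0, field_offsets["POP"])
later = field_byte_offset(header_length, record_length, 3, field_offsets["FOUNDED"])
print(first)  # 98
print(later)  # 165
```

Because every position is a closed-form expression in `i` and `j`, each GPU thread can compute its own read address independently.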
Column extraction strategy by DBF field type:
Numeric (‘N’, ‘F’): Tier 1 NVRTC kernel (dbf_extract_numeric) parses fixed-width ASCII at known byte offsets to float64 on GPU. Each thread handles one record for one field. Output stays device-resident.
Date (‘D’): Tier 1 NVRTC kernel (dbf_extract_date) parses YYYYMMDD to int32 on GPU. Output stays device-resident.
Logical (‘L’): Tier 2 CuPy element-wise byte comparison at known offsets. Output stays device-resident.
Character (‘C’): D->H copy of column bytes, strip trailing spaces on CPU. Returns numpy string array (string data has no GPU use case).
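As a rough CPU reference for what each field type yields, the per-type conversions might look like the pure-Python sketch below. It is not the NVRTC or CuPy code. The function names are hypothetical, and several conventions are assumptions: blank or unparseable numerics become NaN, dates are packed as the decimal integer YYYYMMDD, unknown logical bytes become None, and character data is decoded as ASCII.

```python
from __future__ import annotations

import math


def parse_numeric(field: bytes) -> float:
    """'N'/'F': ASCII decimal text -> float. Assumption: blank/invalid -> NaN."""
    text = field.decode("ascii", errors="replace").strip()
    try:
        return float(text)
    except ValueError:
        return math.nan


def parse_date(field: bytes) -> int | None:
    """'D': eight ASCII digits. Assumption: packed as the integer YYYYMMDD."""
    if len(field) != 8 or not field.isdigit():
        return None
    return int(field)


def parse_logical(field: bytes) -> bool | None:
    """'L': single byte compared against the usual true/false markers."""
    if field in (b"T", b"t", b"Y", b"y"):
        return True
    if field in (b"F", b"f", b"N", b"n"):
        return False
    return None  # e.g. b"?" or b" " (uninitialised)


def parse_character(field: bytes) -> str:
    """'C': strip trailing spaces. Assumption: ASCII-compatible encoding."""
    return field.rstrip(b" ").decode("ascii", errors="replace")


print(parse_numeric(b"   1234.50"))    # 1234.5
print(parse_date(b"20240131"))         # 20240131
print(parse_logical(b"T"))             # True
print(parse_character(b"Oslo      "))  # Oslo
```

A reference like this can be handy for checking GPU output on small files, since each function maps one fixed-width byte slice to one value, the same unit of work the kernels assign to a thread.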
All structural parsing (header, field descriptors) is CPU-side because
the header is typically < 4 KB. No PrecisionPlan is needed because
this is I/O parsing: integer-only byte classification and ASCII-to-number
conversion that always produces fp64 storage. Same rationale as
csv_gpu.py and gpu_parse/numeric.py.
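To illustrate why the structural parse is cheap enough to stay on the CPU, here is a minimal sketch using only the standard `struct` module. It follows the common dBASE III layout (record count at byte 4, header length at byte 8, record length at byte 10, descriptor name in bytes 0-10, type at byte 11, length at byte 16, decimal count at byte 17). `FieldSketch`, `parse_header`, and `descriptor` are hypothetical names, not this module's API, and the offset convention (first field at 1, after the deletion flag) is an assumption.

```python
import struct
from dataclasses import dataclass


@dataclass
class FieldSketch:  # hypothetical stand-in for DbfFieldDescriptor
    name: str
    field_type: str
    length: int
    decimal_count: int
    offset: int


def parse_header(raw: bytes):
    # Main header: version byte, 3 date bytes (skipped), uint32 record count,
    # uint16 header length, uint16 record length, all little-endian.
    version, record_count, header_length, record_length = struct.unpack_from(
        "<B3xIHH", raw, 0
    )
    fields = []
    pos = 32
    offset = 1  # assumption: offsets count from record start, after the flag
    while pos + 32 <= len(raw) and raw[pos] != 0x0D:
        name_bytes, ftype, length, decimals = struct.unpack_from(
            "<11sc4xBB", raw, pos
        )
        name = name_bytes.split(b"\x00", 1)[0].decode("ascii", errors="replace")
        fields.append(
            FieldSketch(name, ftype.decode("ascii"), length, decimals, offset)
        )
        offset += length
        pos += 32
    return version, record_count, header_length, record_length, fields


def descriptor(name: bytes, ftype: bytes, length: int, decimals: int) -> bytes:
    # Build one 32-byte field descriptor for the demo below.
    return struct.pack("<11sc4xBB14x", name, ftype, length, decimals)


# Tiny synthetic header: 5 records, two fields, 0x0D terminator.
header = (
    struct.pack("<B3xIHH20x", 0x03, 5, 32 + 2 * 32 + 1, 1 + 10 + 8)
    + descriptor(b"POP", b"N", 10, 2)
    + descriptor(b"FOUNDED", b"D", 8, 0)
    + b"\x0d"
)
parsed = parse_header(header)
print(parsed[:4])                                # (3, 5, 97, 19)
print([(f.name, f.offset) for f in parsed[4]])   # [('POP', 1), ('FOUNDED', 11)]
```

The loop touches 32 bytes per column, so even a wide table needs only a few kilobytes of sequential reads before the record region can be handed to the GPU.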
- Tier classification (ADR-0033):
Header parsing: CPU (small data, one-time)
Numeric field extraction: Tier 1 (custom NVRTC – ASCII-to-float64)
Date field extraction: Tier 1 (custom NVRTC – ASCII-to-int32)
Logical field extraction: Tier 2 (CuPy element-wise)
Deletion flag: Tier 2 (CuPy element-wise)
Character field extraction: CPU (string data)
Attributes¶
Classes¶
DbfFieldDescriptor | Metadata for one DBF field (column).
DbfHeader | Parsed DBF file header.
DbfColumn | One parsed column from a DBF file.
DbfGpuResult | Result of GPU DBF parsing.
Functions¶
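read_dbf_gpu_from_bytes(raw, *, columns=None) | Parse DBF from in-memory bytes on GPU. No temp file needed.
read_dbf_gpu(path, *, columns=None) | Parse a DBF file on GPU.
dbf_result_to_dataframe(result, *, include_deleted=False) | Convert a DbfGpuResult to a pandas DataFrame.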
Module Contents¶
- vibespatial.io.dbf_gpu.cp = None¶
- class vibespatial.io.dbf_gpu.DbfFieldDescriptor¶
Metadata for one DBF field (column).
- name: str¶
- field_type: str¶
- length: int¶
- decimal_count: int¶
- offset: int¶
- class vibespatial.io.dbf_gpu.DbfHeader¶
Parsed DBF file header.
- version: int¶
- record_count: int¶
- header_length: int¶
- record_length: int¶
- fields: list[DbfFieldDescriptor]¶
- class vibespatial.io.dbf_gpu.DbfColumn¶
One parsed column from a DBF file.
Attributes¶
- name : str
Column name from the DBF field descriptor.
- dtype : str
Logical type: ‘float64’, ‘int32’, ‘string’, ‘bool’.
- data : cp.ndarray | np.ndarray
Parsed column data. Device-resident (CuPy) for numeric, date, and logical columns. Host-resident (numpy) for string columns.
- null_mask : cp.ndarray | np.ndarray | None
Optional boolean mask where True = null/missing value. Device-resident for GPU columns, host-resident for string columns.
- name: str¶
- dtype: str¶
- data: object¶
- null_mask: object | None = None¶
- class vibespatial.io.dbf_gpu.DbfGpuResult¶
Result of GPU DBF parsing.
Attributes¶
- columns : dict[str, DbfColumn]
Column name -> parsed column mapping.
- n_records : int
Number of records in the DBF file.
- active_mask : cp.ndarray
Device-resident uint8 array, shape (n_records,). 1 = active record, 0 = deleted (marked with *).
- header : DbfHeader
Parsed header metadata.
- n_records: int¶
- active_mask: object¶
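The meaning of active_mask can be illustrated on the host with a strided slice over the record region. This is a plain-Python sketch, not the CuPy element-wise code the module uses. The function name `active_mask_from_bytes` and the four-byte fake header are hypothetical.

```python
from __future__ import annotations


def active_mask_from_bytes(raw: bytes, header_length: int,
                           record_length: int, n_records: int) -> list[int]:
    """Return 1 for active records and 0 for records flagged with '*'."""
    # The deletion flag is the first byte of every record, so a strided
    # slice picks out exactly one byte per record.
    end = header_length + n_records * record_length
    flags = raw[header_length:end:record_length]
    return [0 if flag == ord("*") else 1 for flag in flags]


# Fake 4-byte "header" followed by three 4-byte records; the second is deleted.
raw = b"HDR!" + b" AAA" + b"*BBB" + b" CCC"
mask = active_mask_from_bytes(raw, header_length=4, record_length=4, n_records=3)
print(mask)  # [1, 0, 1]
```

On the GPU the same idea reduces to one comparison per record against the byte value of `*`, which is why it fits the element-wise tier.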
- vibespatial.io.dbf_gpu.read_dbf_gpu_from_bytes(raw: bytes, *, columns: list[str] | None = None) DbfGpuResult¶
Parse DBF from in-memory bytes on GPU. No temp file needed.
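One way to try this entry point without a file on disk is to assemble a small DBF in memory. The sketch below does that with the standard `struct` module. `build_dbf` and `parse_on_gpu` are hypothetical helpers, the layout follows the common dBASE III convention, and the GPU call is wrapped in a function that is not executed here because it needs vibespatial and a CUDA-capable environment.

```python
import struct


def build_dbf(fields, rows) -> bytes:
    """fields: (name, type_char, length, decimals); rows: pre-formatted strings."""
    record_length = 1 + sum(length for _, _, length, _ in fields)
    header_length = 32 + 32 * len(fields) + 1
    # Version 0x03, last-update date 2024-01-31 stored as (year - 1900, month, day).
    out = bytearray(struct.pack("<BBBBIHH20x", 0x03, 124, 1, 31,
                                len(rows), header_length, record_length))
    for name, ftype, length, decimals in fields:
        out += struct.pack("<11sc4xBB14x", name.encode("ascii"),
                           ftype.encode("ascii"), length, decimals)
    out += b"\x0d"  # field descriptor terminator
    for row in rows:
        out += b" "  # deletion flag: space means the record is active
        for (_, ftype, length, _), value in zip(fields, row):
            # Numeric text is right-aligned, character text is left-aligned.
            text = value.rjust(length) if ftype in "NF" else value.ljust(length)
            out += text.encode("ascii")[:length]
    out += b"\x1a"  # end-of-file marker
    return bytes(out)


raw = build_dbf(
    [("POP", "N", 10, 0), ("NAME", "C", 8, 0)],
    [["634293", "Oslo"], ["975551", "Stockhol"]],
)
print(len(raw))  # 136


def parse_on_gpu(data: bytes):
    # Not called in this sketch: requires vibespatial and a GPU.
    from vibespatial.io.dbf_gpu import read_dbf_gpu_from_bytes
    return read_dbf_gpu_from_bytes(data, columns=["POP"])
```

Passing `columns=["POP"]` restricts parsing to the numeric column, so under the strategy described for this module no character data would need to be copied back to the host.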
- vibespatial.io.dbf_gpu.read_dbf_gpu(path: pathlib.Path | str, *, columns: list[str] | None = None) DbfGpuResult¶
Parse a DBF file on GPU.
Numeric, date, and logical columns are parsed on the GPU and stay device-resident. Character columns are copied to host and returned as numpy string arrays.
Parameters¶
- path : Path or str
Path to the .dbf file.
- columns : list of str, optional
Subset of column names to read. If None, all columns are read.
Returns¶
- DbfGpuResult
Parsed result with column data, record count, active mask, and header.
Notes¶
The header (< 4 KB) is parsed on CPU. Record data is transferred to the GPU via kvikio (if available) or CuPy, and numeric fields are parsed in parallel by NVRTC kernels.
- vibespatial.io.dbf_gpu.dbf_result_to_dataframe(result: DbfGpuResult, *, include_deleted: bool = False)¶
Convert a DbfGpuResult to a pandas DataFrame.
All device-to-host transfers are batched outside the column loop to avoid per-column sync overhead (ZCOPY002).
Parameters¶
- result : DbfGpuResult
Output of read_dbf_gpu.
- include_deleted : bool, default False
If False, filter out records marked as deleted in the DBF.
Returns¶
pandas.DataFrame