#til
## Vortex <> `cudf`
I've spent much of this week adding support for exporting the results of a Vortex file scan as a `cudf::column_view`. The goal is to let query engines and tools that rely on cudf consume Vortex directly.
PR: https://github.com/vortex-data/vortex/pull/6253
Test Harness: https://github.com/vortex-data/cudf-test-harness/
The integration is actually fairly straightforward: the [Arrow C Device data interface](https://arrow.apache.org/docs/format/CDeviceDataInterface.html) specification defines a struct that is populated on the Rust side and read on the cudf side using [`cudf::from_arrow_device()`](https://docs.rapids.ai/api/libcudf/legacy/group__interop__arrow#ga578115ecaaca1df84f8ed0506f36f03b).
The struct itself lives in host memory, but all of the buffers it holds point into device memory.
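To make the shape of that struct concrete, here is a minimal Rust sketch of the two C structs from the Arrow spec, plus a helper that fills them in for a fixed-width column. The field order follows the Arrow C Data and C Device data interface definitions; the names `export_primitive` and `release_primitive` are hypothetical and this is not the code from the PR. The sketch assumes the caller already has device pointers for the validity and value buffers and that something else owns the device allocations.

```rust
use std::ffi::c_void;
use std::ptr;

/// Arrow C Data Interface array (field order as in the Arrow spec).
#[repr(C)]
pub struct ArrowArray {
    pub length: i64,
    pub null_count: i64,
    pub offset: i64,
    pub n_buffers: i64,
    pub n_children: i64,
    pub buffers: *mut *const c_void,
    pub children: *mut *mut ArrowArray,
    pub dictionary: *mut ArrowArray,
    pub release: Option<unsafe extern "C" fn(*mut ArrowArray)>,
    pub private_data: *mut c_void,
}

/// `ARROW_DEVICE_CUDA` from the Arrow C Device data interface.
pub const ARROW_DEVICE_CUDA: i32 = 2;

/// Host-side wrapper that tells the consumer where the buffers live.
#[repr(C)]
pub struct ArrowDeviceArray {
    pub array: ArrowArray,
    pub device_id: i64,
    pub device_type: i32,
    pub sync_event: *mut c_void,
    pub reserved: [i64; 3],
}

/// Hypothetical release callback: frees only the host-side pointer table.
/// The device buffers are assumed to be owned elsewhere in this sketch.
unsafe extern "C" fn release_primitive(array: *mut ArrowArray) {
    unsafe {
        let array = &mut *array;
        if array.release.is_none() {
            return; // already released
        }
        drop(Box::from_raw(array.private_data as *mut [*const c_void; 2]));
        array.buffers = ptr::null_mut();
        array.private_data = ptr::null_mut();
        array.release = None; // the spec's marker for "released"
    }
}

/// Hypothetical helper: wrap two device pointers as a fixed-width column
/// with no nulls (validity may be null when null_count is 0).
pub fn export_primitive(
    len: i64,
    validity: *const c_void,
    values: *const c_void,
    device_id: i64,
) -> ArrowDeviceArray {
    // The table of buffer pointers lives on the host; its entries point to the device.
    let table = Box::into_raw(Box::new([validity, values]));
    ArrowDeviceArray {
        array: ArrowArray {
            length: len,
            null_count: 0,
            offset: 0,
            n_buffers: 2,
            n_children: 0,
            buffers: table as *mut *const c_void,
            children: ptr::null_mut(),
            dictionary: ptr::null_mut(),
            release: Some(release_primitive),
            private_data: table as *mut c_void,
        },
        device_id,
        device_type: ARROW_DEVICE_CUDA,
        sync_event: ptr::null_mut(), // no event to wait on in this sketch
        reserved: [0; 3],
    }
}
```

The consumer is expected to call `release` exactly once when it is done with the array, which is why the callback clears itself after freeing the pointer table.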
To build and test against this you need libcudf installed. The easiest way to do that is with Anaconda:
```
conda install -c rapidsai -c conda-forge -c nvidia rapidsai::libcudf
```
## Troubleshooting
I've been hitting some weird issues after converting to cudf's format.
The issue doesn't manifest for primitive arrays, which are serialized using the fixed-width strategy (i.e. one null buffer, one value buffer, no children):
```
$ ./cudf-test-harness check my-test-obj.so
Column Statistics(prims):
Type: UINT32
Size: 5 rows
Null count: 0
Has nulls: no
Num children: 0
values[0] = 0
values[0] = 1
values[0] = 2
values[0] = 3
values[0] = 4
```
However, doing the same thing with a `Utf8View` array fails spectacularly:
```
# update my-test-obj.so to export an array of string_view
$ ./cudf-test-harness check my-test-obj.so
Column Statistics(strings):
Type: STRING
Size: 5 rows
Null count: 0
Has nulls: no
Num children: 1
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaMemcpyAsync.
========= Saved host backtrace up to driver entry point at error
========= Host Frame: rmm::device_buffer::copy_async(void const*, unsigned long) [0xadf3] in librmm.so
========= Host Frame: rmm::device_buffer::device_buffer(void const*, unsigned long, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0xb0cd] in librmm.so
========= Host Frame: cudf::string_scalar::string_scalar(cudf::string_view const&, bool, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0x356b451] in libcudf.so
========= Host Frame: cudf::string_scalar::string_scalar(rmm::device_scalar<cudf::string_view>&, bool, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0x356b4c5] in libcudf.so
========= Host Frame: std::unique_ptr<cudf::scalar, std::default_delete<cudf::scalar> > cudf::detail::(anonymous namespace)::get_element_functor::operator()<cudf::string_view, (void*)0>(cudf::column_view const&, int, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [clone .isra.0] [0x9456e9] in libcudf.so
========= Host Frame: cudf::detail::get_element(cudf::column_view const&, int, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0x946bef] in libcudf.so
========= Host Frame: cudf::get_element(cudf::column_view const&, int, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0x946ddf] in libcudf.so
========= Host Frame: run_check(char const*) in main.cpp:177 [0x10b17] in cudf-test-harness
========= Host Frame: main in main.cpp:242 [0x11092] in cudf-test-harness
=========
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaGetLastError.
========= Saved host backtrace up to driver entry point at error
========= Host Frame: rmm::device_buffer::copy_async(void const*, unsigned long) [0xae28] in librmm.so
========= Host Frame: rmm::device_buffer::device_buffer(void const*, unsigned long, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0xb0cd] in librmm.so
========= Host Frame: cudf::string_scalar::string_scalar(cudf::string_view const&, bool, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0x356b451] in libcudf.so
========= Host Frame: cudf::string_scalar::string_scalar(rmm::device_scalar<cudf::string_view>&, bool, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0x356b4c5] in libcudf.so
========= Host Frame: std::unique_ptr<cudf::scalar, std::default_delete<cudf::scalar> > cudf::detail::(anonymous namespace)::get_element_functor::operator()<cudf::string_view, (void*)0>(cudf::column_view const&, int, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [clone .isra.0] [0x9456e9] in libcudf.so
========= Host Frame: cudf::detail::get_element(cudf::column_view const&, int, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0x946bef] in libcudf.so
========= Host Frame: cudf::get_element(cudf::column_view const&, int, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0x946ddf] in libcudf.so
========= Host Frame: run_check(char const*) in main.cpp:177 [0x10b17] in cudf-test-harness
========= Host Frame: main in main.cpp:242 [0x11092] in cudf-test-harness
=========
Error: Failed to convert Arrow array to cuDF column: CUDA error at: /tmp/conda-bld-output/bld/rattler-build_librmm/work/cpp/src/device_buffer.cpp:107: cudaErrorInvalidValue invalid argument
========= Target application returned an error
========= ERROR SUMMARY: 2 errors
```