#til

## Vortex <> `cudf`

I've spent a lot of my week working on exporting the results of a Vortex file scan as `cudf::column_view`. The goal is to let query engines and tools that rely on cudf consume Vortex directly.

PR: https://github.com/vortex-data/vortex/pull/6253

Test Harness: github.com/vortex-data/cudf-test-harness/

The integration is actually fairly straightforward: there's an [Arrow C Device data interface](https://arrow.apache.org/docs/format/CDeviceDataInterface.html) specification which defines a struct that must be populated on the Rust side, and is read on the cudf side using [`cudf::from_arrow_device()`](https://docs.rapids.ai/api/libcudf/legacy/group__interop__arrow#ga578115ecaaca1df84f8ed0506f36f03b). The struct lives in host memory, but all of the buffers that it holds point into device memory.

The easiest way to install libcudf is with Anaconda:

```
conda install -c rapidsai -c conda-forge -c nvidia rapidsai::libcudf
```

## Troubleshooting

I've been hitting some weird issues after converting into cudf format. The issue doesn't manifest for primitive arrays, which are serialized using the fixed-width strategy (i.e. one null buffer, one value buffer, no children):

```
$ ./cudf-test-harness check my-test-obj.so
Column Statistics(prims):
  Type: UINT32
  Size: 5 rows
  Null count: 0
  Has nulls: no
  Num children: 0
values[0] = 0
values[0] = 1
values[0] = 2
values[0] = 3
values[0] = 4
```

However, doing the same thing with a `Utf8View` array fails spectacularly:

```
# update the test-obj.so file to export an array of string_view
$ ./cudf-test-harness check my-test-obj.so
Column Statistics(strings):
  Type: STRING
  Size: 5 rows
  Null count: 0
  Has nulls: no
  Num children: 1
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaMemcpyAsync.
=========     Saved host backtrace up to driver entry point at error
=========         Host Frame: rmm::device_buffer::copy_async(void const*, unsigned long) [0xadf3] in librmm.so
=========         Host Frame: rmm::device_buffer::device_buffer(void const*, unsigned long, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0xb0cd] in librmm.so
=========         Host Frame: cudf::string_scalar::string_scalar(cudf::string_view const&, bool, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0x356b451] in libcudf.so
=========         Host Frame: cudf::string_scalar::string_scalar(rmm::device_scalar<cudf::string_view>&, bool, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0x356b4c5] in libcudf.so
=========         Host Frame: std::unique_ptr<cudf::scalar, std::default_delete<cudf::scalar> > cudf::detail::(anonymous namespace)::get_element_functor::operator()<cudf::string_view, (void*)0>(cudf::column_view const&, int, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [clone .isra.0] [0x9456e9] in libcudf.so
=========         Host Frame: cudf::detail::get_element(cudf::column_view const&, int, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0x946bef] in libcudf.so
=========         Host Frame: cudf::get_element(cudf::column_view const&, int, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0x946ddf] in libcudf.so
=========         Host Frame: run_check(char const*) in main.cpp:177 [0x10b17] in cudf-test-harness
=========         Host Frame: main in main.cpp:242 [0x11092] in cudf-test-harness
=========
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaGetLastError.
=========     Saved host backtrace up to driver entry point at error
=========         Host Frame: rmm::device_buffer::copy_async(void const*, unsigned long) [0xae28] in librmm.so
=========         Host Frame: rmm::device_buffer::device_buffer(void const*, unsigned long, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0xb0cd] in librmm.so
=========         Host Frame: cudf::string_scalar::string_scalar(cudf::string_view const&, bool, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0x356b451] in libcudf.so
=========         Host Frame: cudf::string_scalar::string_scalar(rmm::device_scalar<cudf::string_view>&, bool, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0x356b4c5] in libcudf.so
=========         Host Frame: std::unique_ptr<cudf::scalar, std::default_delete<cudf::scalar> > cudf::detail::(anonymous namespace)::get_element_functor::operator()<cudf::string_view, (void*)0>(cudf::column_view const&, int, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [clone .isra.0] [0x9456e9] in libcudf.so
=========         Host Frame: cudf::detail::get_element(cudf::column_view const&, int, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0x946bef] in libcudf.so
=========         Host Frame: cudf::get_element(cudf::column_view const&, int, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) [0x946ddf] in libcudf.so
=========         Host Frame: run_check(char const*) in main.cpp:177 [0x10b17] in cudf-test-harness
=========         Host Frame: main in main.cpp:242 [0x11092] in cudf-test-harness
=========
Error: Failed to convert Arrow array to cuDF column: CUDA error at: /tmp/conda-bld-output/bld/rattler-build_librmm/work/cpp/src/device_buffer.cpp:107: cudaErrorInvalidValue invalid argument
========= Target application returned an error
========= ERROR SUMMARY: 2 errors
```
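
For reference while debugging, here is a minimal sketch of the host-side struct the Arrow C Device data interface asks the producer to fill in, mirrored in Rust. The field layout follows the published spec; `fixed_width_device_array` and `noop_release` are hypothetical helpers made up for illustration, not the code in the Vortex PR, and the buffer pointers are null placeholders where real device pointers would go.

```rust
use std::ffi::c_void;
use std::ptr;

/// Mirror of `struct ArrowArray` from the Arrow C data interface.
#[repr(C)]
pub struct ArrowArray {
    pub length: i64,
    pub null_count: i64,
    pub offset: i64,
    pub n_buffers: i64,
    pub n_children: i64,
    pub buffers: *mut *const c_void,
    pub children: *mut *mut ArrowArray,
    pub dictionary: *mut ArrowArray,
    pub release: Option<unsafe extern "C" fn(*mut ArrowArray)>,
    pub private_data: *mut c_void,
}

/// `ARROW_DEVICE_CUDA` from the device interface spec.
pub const ARROW_DEVICE_CUDA: i32 = 2;

/// Mirror of `struct ArrowDeviceArray`: lives on the host, buffers point at the device.
#[repr(C)]
pub struct ArrowDeviceArray {
    pub array: ArrowArray,
    pub device_id: i64,
    pub device_type: i32,
    pub sync_event: *mut c_void,
    pub reserved: [i64; 3],
}

/// Hypothetical release callback: owns nothing, so it only marks the array released.
unsafe extern "C" fn noop_release(array: *mut ArrowArray) {
    unsafe { (*array).release = None };
}

/// Hypothetical helper: describe a fixed-width column with no nulls
/// (one validity buffer slot, one value buffer, no children).
pub fn fixed_width_device_array(
    len: i64,
    buffers: *mut *const c_void,
    device_id: i64,
) -> ArrowDeviceArray {
    ArrowDeviceArray {
        array: ArrowArray {
            length: len,
            null_count: 0,
            offset: 0,
            n_buffers: 2,
            n_children: 0,
            buffers,
            children: ptr::null_mut(),
            dictionary: ptr::null_mut(),
            release: Some(noop_release),
            private_data: ptr::null_mut(),
        },
        device_id,
        device_type: ARROW_DEVICE_CUDA,
        sync_event: ptr::null_mut(), // null: no event for the consumer to wait on
        reserved: [0; 3],
    }
}

fn main() {
    // Placeholders: slot 0 is the validity bitmap, slot 1 would be a device pointer.
    let mut buffers: [*const c_void; 2] = [ptr::null(), ptr::null()];
    let dev = fixed_width_device_array(5, buffers.as_mut_ptr(), 0);
    println!("rows = {}, n_buffers = {}", dev.array.length, dev.array.n_buffers);
}
```

A real producer would keep the buffer-pointer array and the device allocations alive through `private_data` and free them in `release`; the no-op callback here only keeps the sketch self-contained.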