kdb+cuda
I’ve been discussing a GPU implementation of q-style languages with some former colleagues recently. Dusting off some old examples..
    # Compile the code into a shared library; the compiler
    # produces code for the GPU to execute the kernel
    # and a regular shared library for the host so kdb+
    # can load the functions.
    amandla> nvcc -I. --compiler-options '-fPIC' -m64 cu.cu \
        --shared -o ../l64/cu.so

    # Now, in kdb+, load in the gpu_square function and
    # test drive it..
    q)square:`cu 2:(`gpu_square;1)
    q)numbers: "e"$til 10
    q)numbers
    0 1 2 3 4 5 6 7 8 9e
    q)square[numbers]
    q)numbers
    0 1 4 9 16 25 36 49 64 81e
    q)
Here’s the code.. all very simple.
    #include <cuda.h>
    #include "k.h"

    // Export the function we will load into kdb+.
    extern "C" K gpu_square(K x);

    // Define the "Kernel" that executes on the
    // CUDA device.
    __global__ void square_array(float *a, int N)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        // The last block may have more threads than remaining elements
        if (idx<N)
            a[idx] = a[idx] * a[idx];
    }

    // A function to use from kdb+ to square a vector of reals by
    // - allocating space on the GPU
    // - copying the data over from the K object
    // - doing the work
    // - copying back and overwriting the K object data
    K gpu_square(K x)
    {
        // Pointers to host & device arrays; kE is the k.h accessor
        // for a vector of reals (4-byte floats)
        float *host_memory = kE(x), *device_memory;

        // Allocate memory on the device for the
        // data and copy it to the GPU
        size_t size = xn * sizeof(float);
        cudaMalloc((void **)&device_memory, size);
        cudaMemcpy(device_memory, host_memory, size, cudaMemcpyHostToDevice);

        // Do the computation on the card
        int block_size = 4;
        int n_blocks = xn/block_size + (xn%block_size == 0 ? 0:1);
        square_array <<< n_blocks, block_size >>> (device_memory, xn);

        // Copy back the data, overwriting the input, free
        // the memory we allocated on the graphics card
        cudaMemcpy(host_memory, device_memory, size, cudaMemcpyDeviceToHost);
        cudaFree(device_memory);

        // Nothing to return; the input vector was updated in place
        return (K)0;
    }
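The block count and the `idx<N` guard in the kernel go together, and it can help to see the arithmetic on the CPU. The following is only a sketch in plain C, not part of the original code: `blocks_for` and `emulate_square` are hypothetical helper names, and the nested loop merely stands in for the grid of threads that the GPU would run in parallel.

```c
#include <assert.h>

/* Number of blocks needed to cover n elements with block_size threads
 * each (ceiling division). Gives the same result as
 * n/block_size + (n%block_size == 0 ? 0:1). Hypothetical helper. */
static int blocks_for(int n, int block_size)
{
    assert(block_size > 0 && n >= 0);
    return (n + block_size - 1) / block_size;
}

/* CPU emulation of the launch: visits every (block, thread) pair the
 * GPU would run and applies the same bounds check as square_array.
 * Returns how many launched threads were skipped by the idx < n guard. */
static int emulate_square(float *a, int n, int block_size)
{
    int n_blocks = blocks_for(n, block_size);
    int idle = 0;
    for (int block = 0; block < n_blocks; ++block) {
        for (int thread = 0; thread < block_size; ++thread) {
            /* blockIdx.x * blockDim.x + threadIdx.x */
            int idx = block * block_size + thread;
            if (idx < n)
                a[idx] = a[idx] * a[idx];
            else
                ++idle;  /* thread past the end of the data */
        }
    }
    return idle;
}
```

For the ten-element vector in the session above and a block size of 4, this gives three blocks, so twelve threads are launched and the last two do nothing. Without the guard those two threads would write past the end of the device buffer.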
Now to something more interesting.. like a BGM model of a Bermudan swaption. Monte Carlo tends to be pretty zippy on a high-end GPU..

shweta Said,
August 6, 2009 @ 7:45 am
Hi,
We are working on performing mathematical operations on a GPU using CUDA. The inputs are to be obtained from a kdb database. I need some pointers on how we can connect to the database using C. Appreciate your help.
Thanks,
Shweta
niall Said,
August 6, 2009 @ 8:01 pm
I’d start with the “Interfacing with C” cookbook I wrote on http://code.kx.com