CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`