# Backend System
Tensor Frame uses a pluggable backend system that allows tensors to run on different computational devices. This page documents the backend architecture and API.
## Backend Trait

All backends implement the `Backend` trait:
```rust
pub trait Backend: Debug + Send + Sync {
    fn backend_type(&self) -> BackendType;
    fn is_available(&self) -> bool;

    // Tensor creation
    fn zeros(&self, shape: &Shape, dtype: DType) -> Result<Storage>;
    fn ones(&self, shape: &Shape, dtype: DType) -> Result<Storage>;
    fn from_slice(&self, data: &[f32], shape: &Shape) -> Result<Storage>;

    // Arithmetic operations
    fn add(&self, lhs: &Storage, rhs: &Storage) -> Result<Storage>;
    fn sub(&self, lhs: &Storage, rhs: &Storage) -> Result<Storage>;
    fn mul(&self, lhs: &Storage, rhs: &Storage) -> Result<Storage>;
    fn div(&self, lhs: &Storage, rhs: &Storage) -> Result<Storage>;

    // Reduction operations
    fn sum(&self, storage: &Storage, axis: Option<usize>) -> Result<Storage>;
    fn mean(&self, storage: &Storage, axis: Option<usize>) -> Result<Storage>;

    // Data access
    fn to_vec_f32(&self, storage: &Storage) -> Result<Vec<f32>>;
}
```

## Storage Types
Each backend uses a different storage mechanism:
```rust
pub enum Storage {
    Cpu(Vec<f32>),     // CPU: simple Vec
    Wgpu(WgpuStorage), // WGPU: GPU buffer
    Cuda(CudaStorage), // CUDA: device pointer
}

pub struct WgpuStorage {
    pub buffer: Arc<wgpu::Buffer>, // WGPU buffer handle
}

pub struct CudaStorage {
    pub ptr: *mut f32, // Raw CUDA device pointer
    pub len: usize,    // Buffer length
}
```
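
Each backend only understands its own storage variant. As a simplified, illustrative sketch (not the crate's actual implementation; shape handling is omitted and the `TensorError::BackendError` payload is assumed to be a message string), an element-wise add on the CPU backend might look like this:

```rust
// Illustrative only: a CPU-style element-wise add over Storage.
fn cpu_add(lhs: &Storage, rhs: &Storage) -> Result<Storage> {
    match (lhs, rhs) {
        // The CPU backend can only operate on its own storage variant.
        (Storage::Cpu(a), Storage::Cpu(b)) => {
            let out = a.iter().zip(b.iter()).map(|(x, y)| x + y).collect();
            Ok(Storage::Cpu(out))
        }
        // Anything else is an error (payload type assumed here).
        _ => Err(TensorError::BackendError(
            "CPU backend received non-CPU storage".to_string(),
        )),
    }
}
```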

## Backend Selection

### Automatic Selection
By default, Tensor Frame automatically selects the best available backend:
1. CUDA (if available and the `cuda` feature is enabled)
2. WGPU (if available and the `wgpu` feature is enabled)
3. CPU (always available)
```rust
// Uses automatic backend selection
let tensor = Tensor::zeros(vec![1000, 1000])?;
println!("Selected backend: {:?}", tensor.backend_type());
```

### Manual Selection
You can also explicitly specify backend priority:
```rust
use tensor_frame::backend::{set_backend_priority, BackendType};

// Force CPU backend
let cpu_backend = set_backend_priority(vec![BackendType::Cpu]);

// Prefer WGPU over CUDA
let gpu_backend = set_backend_priority(vec![
    BackendType::Wgpu,
    BackendType::Cuda,
    BackendType::Cpu,
]);
```

## Backend Conversion
Convert tensors between backends:
```rust
let cpu_tensor = Tensor::ones(vec![100, 100])?;

// Convert to GPU backend (if available)
let gpu_tensor = cpu_tensor.to_backend(BackendType::Wgpu)?;

// Convert back to CPU
let back_to_cpu = gpu_tensor.to_backend(BackendType::Cpu)?;
```

## Performance Characteristics

### CPU Backend
- Pros: Always available, good for small tensors, excellent for development
- Cons: Limited parallelism, slower for large operations
- Best for: Tensors < 10K elements, prototyping, fallback option
- Implementation: Uses Rayon for parallel CPU operations (sketched below)
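
As a rough illustration of the Rayon approach (not the crate's actual kernel), an element-wise add can be parallelized across CPU cores like this:

```rust
use rayon::prelude::*;

// Illustrative only: data-parallel element-wise add over two slices.
fn parallel_add(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.par_iter()
        .zip(b.par_iter())
        .map(|(x, y)| x + y)
        .collect()
}
```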

### WGPU Backend
- Pros: Cross-platform GPU support, works on Metal/Vulkan/DX12/OpenGL
- Cons: Compute shader overhead, limited by GPU memory
- Best for: Large tensor operations, cross-platform deployment
- Implementation: Compute shaders with buffer storage

### CUDA Backend
- Pros: Highest performance on NVIDIA GPUs, mature ecosystem
- Cons: NVIDIA-only, requires CUDA toolkit installation
- Best for: Production workloads on NVIDIA hardware
- Implementation: cuBLAS and custom CUDA kernels

## Backend Availability
Check backend availability at runtime:
```rust
use tensor_frame::backend::{cpu, wgpu, cuda};

// CPU backend is always available
println!("CPU available: {}", cpu::CpuBackend::new().is_available());

// Check GPU backends
#[cfg(feature = "wgpu")]
if let Ok(wgpu_backend) = wgpu::WgpuBackend::new() {
    println!("WGPU available: {}", wgpu_backend.is_available());
}

#[cfg(feature = "cuda")]
println!("CUDA available: {}", cuda::is_available());
```

## Cross-Backend Operations
Operations between tensors on different backends automatically handle conversion:
```rust
let cpu_tensor = Tensor::ones(vec![100])?;
let gpu_tensor = Tensor::zeros(vec![100])?.to_backend(BackendType::Wgpu)?;

// Automatically converts gpu_tensor to the CPU backend for the operation
let result = cpu_tensor + gpu_tensor;
```

## Custom Backends

You can add your own backend by implementing the `Backend` trait:
```rust
#[derive(Debug)]
struct MyCustomBackend;

impl Backend for MyCustomBackend {
    fn backend_type(&self) -> BackendType {
        // Would need to extend the BackendType enum
        BackendType::Custom
    }

    fn is_available(&self) -> bool {
        true // Your availability logic
    }

    // Implement all required methods...
    fn zeros(&self, shape: &Shape, dtype: DType) -> Result<Storage> {
        todo!("your implementation")
    }

    // ... more methods
}
```

## Memory Management

### Reference Counting

- Tensors use `Arc<dyn Backend>` for backend sharing
- Storage is reference counted within each backend
- Cleanup happens automatically when the last reference is dropped (see the sketch below)
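
For example, cloning a tensor is cheap because only reference counts change and no data is copied (this assumes `Tensor` implements `Clone`, consistent with the reference-counted design described above):

```rust
let a = Tensor::ones(vec![1024, 1024])?;
let b = a.clone(); // shares the same backend handle and storage
drop(a);           // the storage is freed only once `b` is dropped as well
```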

### Cross-Backend Memory
- Converting between backends allocates new memory
- The original data remains valid until all references are dropped
- There is no automatic synchronization between backends (see the example below)
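
A short illustration of these rules, using only the conversion API shown earlier on this page:

```rust
let cpu = Tensor::ones(vec![64, 64])?;
let gpu = cpu.to_backend(BackendType::Wgpu)?; // allocates a separate GPU buffer

// Both tensors remain valid but independent: dropping one
// has no effect on the other, and they are never kept in sync.
drop(gpu);
let still_valid = cpu + Tensor::ones(vec![64, 64])?;
```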

### GPU Memory Management
- WGPU backend uses WGPU's automatic memory management
- CUDA backend manually manages device memory with proper cleanup
- Out-of-memory errors are propagated as `TensorError::BackendError` (see the sketch below)
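
A hedged sketch of handling such a failure, assuming `TensorError` is exported from the crate root and that `BackendError` carries a message string (the real payload may differ):

```rust
use tensor_frame::backend::{set_backend_priority, BackendType};
use tensor_frame::{Tensor, TensorError};

// Hypothetical fallback: if the preferred (GPU) backend fails to allocate,
// retry on the CPU backend instead of aborting.
fn zeros_with_fallback(shape: Vec<usize>) -> Result<Tensor, TensorError> {
    match Tensor::zeros(shape.clone()) {
        Ok(t) => Ok(t),
        Err(TensorError::BackendError(msg)) => {
            eprintln!("backend allocation failed ({msg}); falling back to CPU");
            let _ = set_backend_priority(vec![BackendType::Cpu]);
            Tensor::zeros(shape)
        }
        Err(e) => Err(e),
    }
}
```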