OpenCL Utils

cherab.iter.tools.opencl.opencl_utils.get_flops(device, verbose=True)

Returns the theoretical peak performance of the specified OpenCL GPU or accelerator. Currently, only Nvidia, AMD, Intel and Mali GPUs are supported.

Parameters:

device (pyopencl.Device) – OpenCL device.
verbose (bool) – Verbose output, by default True.

Returns:

Theoretical peak performance in GFLOPS.

Return type:

float
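The docstring does not state how get_flops arrives at its figure. A common estimate of theoretical FP32 peak is compute units × FP32 lanes per unit × 2 FLOPs per cycle (one fused multiply-add) × clock frequency. The sketch below assumes that formula; the helper names are hypothetical, and the lanes-per-unit value must be supplied by the caller because OpenCL does not report it (for example, an AMD GCN compute unit has 64 stream processors). It reads max_compute_units and max_clock_frequency (in MHz), which pyopencl.Device exposes as attributes; a stand-in object is used so the example runs without an OpenCL driver.

```python
from types import SimpleNamespace


def estimate_peak_gflops(compute_units, clock_mhz, fp32_lanes_per_unit):
    """Rough FP32 peak: units x lanes x 2 FLOPs (one FMA) x clock (hypothetical).

    fp32_lanes_per_unit is an assumption supplied by the caller: OpenCL does
    not report it, so it must come from the vendor's architecture documentation.
    """
    flops_per_cycle = compute_units * fp32_lanes_per_unit * 2
    return flops_per_cycle * clock_mhz / 1000.0  # MFLOPS -> GFLOPS


def estimate_from_device(device, fp32_lanes_per_unit):
    # pyopencl.Device exposes these attributes; the clock frequency is in MHz.
    return estimate_peak_gflops(device.max_compute_units,
                                device.max_clock_frequency,
                                fp32_lanes_per_unit)


# Stand-in with the same attribute names as a pyopencl.Device.
fake_gpu = SimpleNamespace(max_compute_units=20, max_clock_frequency=1500)
print(estimate_from_device(fake_gpu, fp32_lanes_per_unit=64))  # 3840.0
```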

cherab.iter.tools.opencl.opencl_utils.get_best_gpu(platforms=None, device_type=12, verbose=True)

Finds the fastest (in terms of theoretical peak performance) GPU and/or accelerator available in the specified OpenCL platforms.

Parameters:

platforms (Optional[list[pyopencl._cl.Platform]]) – List of pyopencl platform instances, by default None.
device_type (device_type) – OpenCL device type (GPU, ACCELERATOR, or both), by default pyopencl.device_type.GPU | pyopencl.device_type.ACCELERATOR.
verbose (bool) – Verbose output, by default True.

Returns:

The fastest GPU or accelerator found.

Return type:

pyopencl.Device
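The default device_type=12 in the signature is the bitwise OR of the OpenCL flags GPU (4) and ACCELERATOR (8). A minimal sketch of "pick the fastest matching device", assuming devices are filtered by their type bit mask and ranked by a caller-supplied peak-performance score; fastest_device and the stand-in devices (with toy scores) are hypothetical, not part of the library:

```python
from types import SimpleNamespace

# OpenCL device-type bit flags, as defined by the OpenCL specification and
# mirrored by pyopencl.device_type.
CPU, GPU, ACCELERATOR = 1 << 1, 1 << 2, 1 << 3
assert GPU | ACCELERATOR == 12  # the default device_type in the signature


def fastest_device(devices, device_type, score):
    """Return the highest-scoring device whose type matches the bit mask.

    `score` is a caller-supplied function (hypothetical), playing the role
    that a peak-performance estimate such as get_flops() would play.
    """
    candidates = [d for d in devices if d.type & device_type]
    if not candidates:
        raise RuntimeError("no OpenCL device of the requested type found")
    return max(candidates, key=score)


# Toy stand-ins: the CPU has the highest score but is excluded by the mask.
devices = [
    SimpleNamespace(name="cpu", type=CPU, gflops=20000.0),
    SimpleNamespace(name="igpu", type=GPU, gflops=800.0),
    SimpleNamespace(name="dgpu", type=GPU, gflops=9000.0),
]
best = fastest_device(devices, GPU | ACCELERATOR, lambda d: d.gflops)
print(best.name)  # dgpu
```

With a working OpenCL installation, the documented call get_best_gpu(verbose=False) performs this search over the real platforms.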

cherab.iter.tools.opencl.opencl_utils.get_first_device(platforms=None, device_type=12, verbose=True)

Returns the first OpenCL device of the specified type available in the specified OpenCL platforms.

Parameters:

platforms (Optional[list[pyopencl._cl.Platform]]) – List of pyopencl platform instances, by default None.
device_type (device_type) – OpenCL device type (CPU, GPU, ACCELERATOR, …), by default pyopencl.device_type.GPU | pyopencl.device_type.ACCELERATOR.
verbose (bool) – Verbose output, by default True.

Returns:

The first device of the specified type found.

Return type:

pyopencl.Device
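Unlike get_best_gpu, this function does not rank devices. A minimal sketch of "first matching device", assuming platforms are searched in the order given and each platform's devices in the order returned by get_devices(); first_device and FakePlatform are hypothetical stand-ins that mimic pyopencl.Platform.get_devices() so the example runs without OpenCL:

```python
from types import SimpleNamespace

CPU, GPU, ACCELERATOR = 1 << 1, 1 << 2, 1 << 3  # OpenCL device-type bits


class FakePlatform:
    """Stand-in for pyopencl.Platform exposing only get_devices()."""

    def __init__(self, devices):
        self._devices = list(devices)

    def get_devices(self):
        return self._devices


def first_device(platforms, device_type):
    """Return the first device, in platform order, whose type matches the mask."""
    for platform in platforms:
        for device in platform.get_devices():
            if device.type & device_type:
                return device
    raise RuntimeError("no OpenCL device of the requested type found")


platforms = [
    FakePlatform([SimpleNamespace(name="cpu", type=CPU)]),
    FakePlatform([SimpleNamespace(name="gpu0", type=GPU),
                  SimpleNamespace(name="gpu1", type=GPU)]),
]
print(first_device(platforms, GPU | ACCELERATOR).name)  # gpu0
print(first_device(platforms, CPU).name)                # cpu
```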

cherab.iter.tools.opencl.opencl_utils.device_select(platfrom_id=None, device_id=None, device_type=12, verbose=True)

OpenCL device selector. Returns the most powerful available OpenCL device if device_type is GPU and/or accelerator, or the first available OpenCL device if device_type is CPU.

Parameters:

platfrom_id (Optional[int]) – OpenCL platform ID, by default None.
device_id (Optional[int]) – OpenCL device ID (in the selected OpenCL platform), by default None.
device_type (device_type) – The type(s) of OpenCL device, by default pyopencl.device_type.GPU | pyopencl.device_type.ACCELERATOR.
verbose (bool) – Verbose output, by default True

Returns:

Selected OpenCL device.

Return type:

pyopencl.Device

SART OpenCL

class cherab.iter.tools.opencl.sart_opencl.SartOpencl(geometry_matrix, laplacian_matrix=None, device=None, block_size=256, copy_column_major=True, block_size_row_maj=64, use_atomic=True, steps_per_thread=64)

Bases: object

This is a GPU version of the invert_sart and invert_constrained_sart functions from CHERAB. The geometry matrix and Laplacian matrix are provided on initialisation because they must be copied to GPU memory, which takes time. Inversions can then be performed multiple times for different measurement vectors without copying the matrices each time. If required, the Laplacian matrix can be updated at any time by calling the update_laplacian_matrix(new_laplacian_matrix) method.
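The "copy once, invert many times" rationale can be illustrated on the CPU. The sketch below is a plain-Python version of the standard SART update (Andersen and Kak, 1984) with a non-negativity clamp: the per-ray (row) sums and per-voxel (column) sums of the geometry matrix are computed once in the constructor, analogous to the one-off copy to GPU memory, and are reused by every call. CachedSart is hypothetical; it is not the library's OpenCL implementation and omits the Laplacian constraint, the initial guess and convergence tracking.

```python
class CachedSart:
    """CPU sketch of the 'copy once, invert many times' pattern (hypothetical).

    Implements the standard SART update with a non-negativity clamp; it is
    not the library's OpenCL implementation.
    """

    def __init__(self, geometry_matrix):
        # geometry_matrix: Nd rows (detectors), each a list of Ns floats (voxels).
        self.a = [list(map(float, row)) for row in geometry_matrix]
        n_sources = len(self.a[0])
        # One-off work, analogous to copying the matrix to GPU memory:
        # per-ray sums (row sums) and per-voxel ray densities (column sums).
        self.row_sums = [sum(row) for row in self.a]
        self.col_sums = [sum(row[j] for row in self.a) for j in range(n_sources)]

    def __call__(self, measurement, max_iterations=100, relaxation=1.0):
        x = [0.0] * len(self.col_sums)
        for _ in range(max_iterations):
            # Forward projection: y_hat = A x.
            y_hat = [sum(aij * xj for aij, xj in zip(row, x)) for row in self.a]
            # Residual of each ray, normalised by that ray's row sum.
            residual = [(y - yh) / s if s > 0 else 0.0
                        for y, yh, s in zip(measurement, y_hat, self.row_sums)]
            # Simultaneous update of every voxel from the back-projected residual.
            for j, density in enumerate(self.col_sums):
                if density > 0:
                    back = sum(row[j] * r for row, r in zip(self.a, residual))
                    x[j] = max(0.0, x[j] + relaxation * back / density)
        return x


# Toy system: three detectors viewing two voxels; the true solution is [1, 2].
solver = CachedSart([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print([round(v, 6) for v in solver([1.0, 2.0, 3.0])])  # [1.0, 2.0]
# The cached sums are reused for a second measurement vector.
print([round(v, 6) for v in solver([2.0, 1.0, 3.0])])  # [2.0, 1.0]
```

In the library, the analogous flow is to construct SartOpencl(geometry_matrix) once and then call the instance for each measurement vector.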

Parameters:

geometry_matrix (ndarray) – The sensitivity matrix describing the coupling between the detectors and the voxels. Must be an array with shape (Nd, Ns).
laplacian_matrix (ndarray | None) – The Laplacian regularisation matrix of shape (Ns, Ns). If not provided, the unconstrained SART inversion will be performed, by default None.
device (pyopencl.Device) – OpenCL device which will be used for computations, by default None (autoselect).
block_size (int) – Number of GPU threads per block. Must be a power of 2. For best performance, try values from 256 to 1024 for Nvidia GPUs (use 1024 on high-end GPUs), from 64 to 256 for AMD GPUs and from 16 to 64 for Intel GPUs, by default 256.
copy_column_major (bool) – If True, two copies of the geometry matrix will be stored in GPU memory: one in row-major order and the other in column-major order. This gives much better inversion performance but requires twice as much GPU memory. Set this to False if there is not enough GPU memory or if the SART inversion will be called only once, by default True.
block_size_row_maj (int) – If copy_column_major is set to False, this parameter defines the number of GPU threads per block in the mat_vec_mult_row_maj() kernel used to calculate y_hat. Must be lower than block_size, by default 64 (optimal value for Nvidia GPUs).
use_atomic (bool) – If True, increases the number of thread blocks that can run in parallel by using atomic operations (a custom atomic add on floats). Set this to False if atomic operations are slow on your device (Nvidia GPUs before Kepler, some AMD APUs, some Intel GPUs), by default True.
steps_per_thread (int) – If use_atomic is set to True, this parameter defines the maximum number of loop steps performed by the parallel threads in a single thread block, by default 64 (optimal for Nvidia GPUs).
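The block_size constraints can be checked before constructing the solver. A minimal sketch, assuming kernels are launched with the global work size padded up to a multiple of the work-group size (OpenCL 1.x requires the global size to be divisible by the local size); check_block_size and padded_global_size are hypothetical helpers, not part of the library:

```python
def check_block_size(block_size, block_size_row_maj):
    """Validate the documented constraints (hypothetical helper)."""
    # A positive power of two has exactly one bit set.
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError(f"block_size must be a power of 2, got {block_size}")
    if not 0 < block_size_row_maj < block_size:
        raise ValueError("block_size_row_maj must be lower than block_size")


def padded_global_size(n_items, block_size):
    """Round n_items up to the next multiple of block_size (ceiling division)."""
    return -(-n_items // block_size) * block_size


check_block_size(256, 64)              # passes silently
print(padded_global_size(1000, 256))   # 1024
print(padded_global_size(1024, 256))   # 1024
```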

clean()

Releases GPU buffers.

update_laplacian_matrix(laplacian_matrix)

Updates the Laplacian matrix in GPU memory

Parameters:

laplacian_matrix (ndarray) – Laplacian matrix to update.

Return type:

None

__call__(measurement_vector, initial_guess=None, max_iterations=250, relaxation=1.0, beta_laplace=0.01, conv_tol=0.0001, time_limit=None)

Performs the inversion for a given measurement vector.

Parameters:

measurement_vector (ndarray) – The measured power/radiance vector with shape (Nd).
initial_guess (Optional[Union[ndarray, float]]) – An optional initial guess: either an array of shape (Ns) or a constant value used to seed the algorithm. When processing a time series of measurements, use the solution found for the previous time moment as the initial guess for the next one, by default None.
max_iterations (int) – The maximum number of iterations to run the SART algorithm before returning a result, by default 250.
relaxation (float) – The relaxation hyperparameter, by default 1. See A. Andersen and A. Kak, Ultrasonic Imaging 6, 81 (1984), for more information on this hyperparameter.
beta_laplace (float) – The regularisation hyperparameter in the range [0, 1]. Used only if the Laplacian matrix was provided on initialisation, by default 0.01.
conv_tol (float) – The convergence limit at which the algorithm will be terminated, unless the maximum number of iterations is reached first. The convergence is calculated as the normalised squared difference between the measurement vector and the forward projection of the solution vector (the geometry matrix multiplied by the solution), by default 1.e-4.
time_limit (Optional[float]) – If set, the iterations will stop once this time limit (in seconds) is reached, by default None.

Returns:

The inverted solution vector of shape (Ns) and the convergence achieved after each iteration.

Return type:

tuple[ndarray, list[float]]
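The stopping options and the warm-start advice for initial_guess can be sketched with a generic driver. This assumes that conv_tol is compared against the change in the convergence value between consecutive iterations; the documentation above does not say whether the value itself or its change is tested. iterate and make_step are hypothetical, and make_step is a toy iteration standing in for one SART step.

```python
import time


def iterate(step, x0, max_iterations=250, conv_tol=1e-4, time_limit=None):
    """Run step(x) -> (new_x, convergence) until a stopping criterion is met.

    Stops after max_iterations, when the change in convergence between two
    consecutive iterations falls below conv_tol (assumed criterion), or when
    time_limit seconds have elapsed.
    """
    start = time.perf_counter()
    x, history = x0, []
    for _ in range(max_iterations):
        x, convergence = step(x)
        history.append(convergence)
        if len(history) > 1 and abs(history[-2] - history[-1]) < conv_tol:
            break
        if time_limit is not None and time.perf_counter() - start >= time_limit:
            break
    return x, history


def make_step(target):
    """Toy iteration standing in for one SART step (hypothetical).

    Each call halves the distance to `target` and reports the remaining
    squared distance as the convergence value.
    """
    def step(x):
        new_x = [(xi + ti) / 2 for xi, ti in zip(x, target)]
        convergence = sum((ti - xi) ** 2 for xi, ti in zip(new_x, target))
        return new_x, convergence
    return step


# A slowly varying series of "true" solutions, one per time moment.
targets = [[1.0, 2.0], [1.1, 2.1], [1.2, 2.2]]
x = [0.0, 0.0]
iterations_used = []
for target in targets:
    # Warm start: the previous solution seeds the next inversion.
    x, history = iterate(make_step(target), x)
    iterations_used.append(len(history))
print(iterations_used)  # [9, 5, 5]
```

The first time moment starts from zero and needs the most iterations; later ones start close to their solution, which is the benefit the initial_guess description points to.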