This repository provides a serial implementation of the algorithm in C, as well as two parallel versions in CUDA: one that uses shared memory and one that does not. The project was developed for the "Parallel and Distributed Systems" course at AUTH.
A [Gaussian] kernel was used as the weighting function. The code was tested on several data sets, and its execution time and correctness were recorded. The two parallel versions were also tested and compared against each other.
## Dependencies
The serial implementation only requires a C compiler (e.g. gcc).
Compiling the parallel versions requires the CUDA Toolkit, which provides the `nvcc` compiler. Install it beforehand by following the official instructions for your platform, described [here].