Exercise 2 for the course "Parallel and Distributed Systems" of THMMY at AUTH (Aristotle University of Thessaloniki).

The project includes six versions of a k-NN (k-nearest neighbors) implementation:
Serial - space optimized
Serial - time optimized
MPI parallel - blocking communications - space optimized
MPI parallel - blocking communications - time optimized
MPI parallel - non-blocking communications - space optimized
MPI parallel - non-blocking communications - time optimized
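
For orientation, the computation that all six versions share can be sketched as a brute-force search. This is only a minimal illustration in Python, not the project's code: the function name knn_brute_force is hypothetical, and it assumes Euclidean distance and that every point is queried against the whole set, itself included.

```python
import heapq
import math


def knn_brute_force(points, k):
    """Return, for every point, the indices of its k nearest points.

    Assumptions (not taken from the project): Euclidean distance, and the
    query point itself counts as its own nearest neighbor.
    """
    result = []
    for query in points:
        # Distance from the query to every point in the set, itself included.
        distances = [
            (math.dist(query, other), index)
            for index, other in enumerate(points)
        ]
        # Select the k smallest (distance, index) pairs; ties break on index.
        nearest = heapq.nsmallest(k, distances)
        result.append([index for _, index in nearest])
    return result


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
neighbours = knn_brute_force(points, 2)
print(neighbours)
```

Each row of the result lists neighbor indices ordered from nearest to farthest, which is the kind of sorted ID list the IDX test files are described as storing.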

The project folder also includes some test files and execution results (in the stats folder).

The testFiles folder contains a dataset of 60000 points with 30 dimensions each, as well as three IDX files exported from Matlab that store the correctly sorted IDs. These files follow the naming convention numberOfPoints_k, after the number of points and the value of k that were used to run the Matlab script.

To run any version, first run make in that version's folder. Then copy testFiles/data.bin and one of the IDX test files into the same folder. Finally, run with: mpiexec -np numTasks ./prog.out numPoints numDimensions k data.bin idxFileName
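
The argument order of the run command is easy to get wrong, so here is a small sketch that assembles it from named values. The helper build_run_command is hypothetical and not part of the project; the task count 4 and k = 10 are illustrative values only, and idxFileName is kept as a placeholder for whichever IDX file was copied.

```python
import shlex


def build_run_command(num_tasks, num_points, num_dimensions, k, idx_file,
                      data_file="data.bin", program="./prog.out"):
    """Assemble the mpiexec argument list in the order the README gives."""
    return [
        "mpiexec", "-np", str(num_tasks),
        program,
        str(num_points), str(num_dimensions), str(k),
        data_file, idx_file,
    ]


# Illustrative values: 4 MPI tasks, the 60000 x 30 dataset, k = 10.
command = build_run_command(4, 60000, 30, 10, "idxFileName")
print(shlex.join(command))
```

The returned list can be passed directly to subprocess.run once make has produced prog.out and the data files are in place.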

To extract a new IDX file from Matlab, run knn with the dataset and then export the IDX variable to an ODS/Excel spreadsheet. Open the generated file and save it as CSV.
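
A CSV exported this way can be read back for inspection with a few lines of Python. This is a sketch under assumptions, not the project's loader: the function name load_idx_csv is hypothetical, the file is assumed to be comma-separated with one row of neighbor IDs per point, and the optional shift to 0-based indices reflects only that Matlab indices start at 1.

```python
import csv
import io


def load_idx_csv(stream, one_based=True):
    """Read rows of neighbor IDs from a CSV stream into lists of ints.

    Assumes comma-separated values and one row per point. With
    one_based=True, Matlab's 1-based indices are shifted to 0-based.
    """
    offset = 1 if one_based else 0
    rows = []
    for record in csv.reader(stream):
        if not record:
            continue  # skip blank lines, e.g. a trailing newline
        # int(float(...)) also accepts cells a spreadsheet wrote as "3.0".
        rows.append([int(float(cell)) - offset for cell in record])
    return rows


# Tiny in-memory stand-in for an exported IDX file.
sample = io.StringIO("1,3,2\n2,1,3\n3,2,1\n")
idx = load_idx_csv(sample)
print(idx)
```

For a real file, open it with open(path, newline="") and pass the file object in place of the in-memory sample.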