Device chosen is "GeForce GTX 480"
Device has 15 multi processors and compute capability 2.0
Max threads per block supported are 1024
Reading dataset and labels... Done.
Device memory allocation wall clock time = 0.039595
calculate_kernel_matrix_kernel called with:
dimBlock.x = 30, dimBlock.y = 30
dimGrid.x = 35, dimGrid.y = 35
calculate_denominator called with:
dimBlock.x = 1024, dimBlock.y = 1
dimGrid.x = 1, dimGrid.y = 1
shift_points_kernel called with:
dimBlock.x = 28, dimBlock.y = 32
dimGrid.x = 37, dimGrid.y = 1
Recursion n. 0, error 638.769335
Recursion n. 1, error 86.996834
Recursion n. 2, error 540.383480
Recursion n. 3, error 130.879803
Recursion n. 4, error 126.467953
Recursion n. 5, error 256.415922
Recursion n. 6, error 6.383913
Recursion n. 7, error 1.206431
Recursion n. 8, error 0.373697
Recursion n. 9, error 0.190936
Recursion n. 10, error 0.107748
Recursion n. 11, error 0.061548
Recursion n. 12, error 0.035299
Recursion n. 13, error 0.020304
Recursion n. 14, error 0.011708
Recursion n. 15, error 0.006766
Recursion n. 16, error 0.003918
Recursion n. 17, error 0.002273
Recursion n. 18, error 0.001321
Recursion n. 19, error 0.000769
Recursion n. 20, error 0.000448
Recursion n. 21, error 0.000262
Recursion n. 22, error 0.000153
Recursion n. 23, error 0.000090
Copying between device and host wall clock time = 0.111358
Total number of recursions = 23
Mean Shift wall clock time = 1.279128