The goal of this assignment is to learn to program GPUs using CUDA by implementing a parallel merge sort. Implement a CPU (host) function with the following signature:
int mergesort(int *list, int *sorted, int n)
mergesort must perform the entire sort using CUDA and then return the sorted array in the user-provided space pointed to by sorted.
Compile the function into a library called cuMergesort (that is, cuMergesort.so or libcuMergesort.a) so that the test code can call your function directly. You should also write your own application program to test the library. Scoring will be based on correctness, speed, and scalability on a single K40.
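As a starting point for the test application mentioned above, here is a minimal sketch of a correctness check in C. It compares the output of any function with the assignment's signature against the C library's qsort on pseudo-random input. The names sort_fn, check_sort, and standin_sort are hypothetical and not part of the assignment; standin_sort is a CPU placeholder that only lets the sketch run without a GPU, and the assignment does not define the meaning of the return value, so the check ignores it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Pointer type matching the assignment's signature:
   int mergesort(int *list, int *sorted, int n) */
typedef int (*sort_fn)(int *list, int *sorted, int n);

/* Three-way integer comparison for qsort (avoids overflow of x - y). */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* HYPOTHETICAL CPU stand-in so this file runs without a GPU.
   Replace it with the library's mergesort when linking for real.
   Returning 0 is an assumption; the assignment does not specify it. */
static int standin_sort(int *list, int *sorted, int n) {
    memcpy(sorted, list, (size_t)n * sizeof(int));
    qsort(sorted, (size_t)n, sizeof(int), cmp_int);
    return 0;
}

/* Returns 1 if sort() yields the same array as qsort on n pseudo-random
   integers generated from seed, and 0 otherwise (including on
   allocation failure). */
static int check_sort(sort_fn sort, int n, unsigned seed) {
    assert(n >= 0);
    size_t bytes = (size_t)n * sizeof(int);
    /* Allocate one extra element so that n == 0 still gets valid pointers. */
    int *input = calloc((size_t)n + 1, sizeof(int));
    int *expected = calloc((size_t)n + 1, sizeof(int));
    int *sorted = calloc((size_t)n + 1, sizeof(int));
    int ok = 0;
    if (input && expected && sorted) {
        srand(seed);
        for (int i = 0; i < n; i++) {
            input[i] = rand() % 1000 - 500;  /* small range forces duplicates */
        }
        memcpy(expected, input, bytes);
        qsort(expected, (size_t)n, sizeof(int), cmp_int);
        sort(input, sorted, n);              /* return value ignored here */
        ok = (memcmp(expected, sorted, bytes) == 0);
    }
    free(input);
    free(expected);
    free(sorted);
    return ok;
}
```

A test program could call check_sort(standin_sort, 1000, 42) and expect 1. To test the real library, declare int mergesort(int *list, int *sorted, int n); and pass mergesort instead of the stand-in. Sizes that are not powers of two, very small sizes, and inputs that are already sorted or reversed are worth checking separately.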
In addition to the library and the makefile, you must also submit a document, design.pdf, describing your design and listing the major design decisions you made. On what basis did you make these decisions? List the experiments, and their results, that led you to each design decision, and explain why. Does your code scale? Demonstrate. Besides correctness, you will also be graded on the speed and scalability of your implementation and on your design choices.
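One way to demonstrate scaling is to time the sort over doubling input sizes and report throughput. The following C code is a minimal sketch under stated assumptions: sort_fn, time_sort, print_scaling_table, and standin_sort are hypothetical names, standin_sort is a CPU placeholder used only so the sketch runs without a GPU, and wall-clock time is read with the C11 timespec_get function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Pointer type matching the assignment's signature. */
typedef int (*sort_fn)(int *list, int *sorted, int n);

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* HYPOTHETICAL CPU stand-in; replace with the library's mergesort. */
static int standin_sort(int *list, int *sorted, int n) {
    memcpy(sorted, list, (size_t)n * sizeof(int));
    qsort(sorted, (size_t)n, sizeof(int), cmp_int);
    return 0;
}

/* Wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Times a single call of sort() on n pseudo-random integers.
   Returns elapsed seconds, or -1.0 if allocation fails. */
static double time_sort(sort_fn sort, int n) {
    assert(n > 0);
    int *input = malloc((size_t)n * sizeof(int));
    int *sorted = malloc((size_t)n * sizeof(int));
    double elapsed = -1.0;
    if (input && sorted) {
        for (int i = 0; i < n; i++) {
            input[i] = rand();
        }
        double start = now_seconds();
        sort(input, sorted, n);
        elapsed = now_seconds() - start;
    }
    free(input);
    free(sorted);
    return elapsed;
}

/* Prints one row per size, doubling n from min_n up to max_n.
   Returns the number of rows printed. */
static int print_scaling_table(sort_fn sort, int min_n, int max_n) {
    int rows = 0;
    printf("%12s %12s %14s\n", "n", "seconds", "elements/s");
    for (int n = min_n; n > 0 && n <= max_n; ) {
        double t = time_sort(sort, n);
        if (t < 0.0) {
            break;                      /* allocation failed, stop here */
        }
        printf("%12d %12.6f %14.0f\n", n, t, t > 0.0 ? n / t : 0.0);
        rows++;
        if (n > max_n / 2) {
            break;                      /* doubling again would exceed max_n */
        }
        n *= 2;
    }
    return rows;
}
```

For example, print_scaling_table(standin_sort, 1024, 8192) prints rows for 1024, 2048, 4096, and 8192 elements. When timing a real CUDA implementation, the first call usually includes one-time CUDA initialization cost, so a warm-up call before the measured runs, and several repetitions per size, tend to give more representative numbers.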
Submit a single zip file containing everything.