The goal is to learn to program in the shared-memory model by implementing parallel LU decomposition. In particular, implement the following functions:
int luDecompose(double *A, double *l, double *u, int n);
int luDecomposeP(double *A, double *l, double *u, int n);
luDecompose is the serial version and luDecomposeP is the parallel version. Each should return 0 on successful completion and an error code less than 0 on error. The input matrix A is n×n. Expect n to be large -- it may exceed 106. For algorithms, see here and here.
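As a starting point, the serial interface can be sketched with the Doolittle scheme. This is only an illustration under assumptions the assignment does not state: A, l and u are taken to be row-major n*n arrays allocated by the caller, l receives a unit diagonal, no pivoting is performed (the signature has no output for a permutation), and the specific error codes -1 and -2 are hypothetical choices.

```c
#include <stddef.h>

/* Serial Doolittle LU without pivoting (sketch, not a reference solution).
 * Assumptions: A, l, u are caller-allocated, row-major, n*n.
 * Returns 0 on success, -1 on bad arguments, -2 on a zero pivot
 * (both negative codes are illustrative choices). */
int luDecompose(double *A, double *l, double *u, int n)
{
    if (A == NULL || l == NULL || u == NULL || n <= 0)
        return -1;

    size_t N = (size_t)n;

    /* Start from l = identity and u = 0. */
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            l[i * N + j] = (i == j) ? 1.0 : 0.0;
            u[i * N + j] = 0.0;
        }
    }

    for (size_t k = 0; k < N; k++) {
        /* Row k of U: u[k][j] = A[k][j] - sum_{s<k} l[k][s] * u[s][j]. */
        for (size_t j = k; j < N; j++) {
            double sum = 0.0;
            for (size_t s = 0; s < k; s++)
                sum += l[k * N + s] * u[s * N + j];
            u[k * N + j] = A[k * N + j] - sum;
        }

        /* Without pivoting, a zero pivot cannot be divided by. */
        if (u[k * N + k] == 0.0)
            return -2;

        /* Column k of L: l[i][k] = (A[i][k] - sum_{s<k} l[i][s] * u[s][k]) / u[k][k]. */
        for (size_t i = k + 1; i < N; i++) {
            double sum = 0.0;
            for (size_t s = 0; s < k; s++)
                sum += l[i * N + s] * u[s * N + k];
            l[i * N + k] = (A[i * N + k] - sum) / u[k * N + k];
        }
    }
    return 0;
}
```

A production version would likely add partial pivoting for numerical stability; whether that is expected here is not specified, so the sketch leaves it out.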
Compile the functions into a library called luDecompose (that is, luDecompose.so or luDecompose.a) so that the test code can call your functions directly. Submit the source code along with a makefile that builds the library. You should write your own application program to test the library. Scoring will be based on correctness, speed, and scalability on multicore shared-memory systems. Your code will be run on computers with different core counts.
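One way a test program might check correctness is to multiply the computed factors back together and compare against the input. The helper below is a sketch; the name luMaxResidual is hypothetical and not part of the required interface, and it assumes the same row-major n*n layout for all three arrays.

```c
#include <stddef.h>

/* Hypothetical test helper: returns max over (i, j) of |A[i][j] - (L*U)[i][j]|.
 * It multiplies the full matrices, so it also catches factors that are
 * not actually triangular. Cost is O(n^3), so use it on modest sizes. */
double luMaxResidual(const double *A, const double *l, const double *u, int n)
{
    size_t N = (size_t)n;
    double worst = 0.0;

    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < N; k++)
                sum += l[i * N + k] * u[k * N + j];

            double err = A[i * N + j] - sum;
            if (err < 0.0)
                err = -err;
            if (err > worst)
                worst = err;
        }
    }
    return worst;
}
```

In a test driver, the returned value would typically be compared against a small tolerance scaled to the size of the entries of A, since floating-point rounding makes an exact zero unlikely for large inputs.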
In addition to the library, you must submit a document, design.pdf, describing your design and listing the major design decisions you made. On what basis did you make these decisions? List the experiments, and their results, that led you to each decision, and explain why. Does your code scale? Demonstrate.
You should use tasks to implement the parallel version. Besides correctness, you will also be graded on the speed and scalability of the implementation and on your design choices.
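A task-based structure could look like the sketch below. It assumes OpenMP tasks (the assignment says only "tasks", so the choice of OpenMP is an assumption), a row-major layout, and no pivoting. The constant LU_ROWS_PER_TASK and the error codes are hypothetical. At each elimination step k, one thread creates a task per block of trailing rows and then waits for all of them before moving to step k + 1.

```c
#include <stddef.h>

/* Hypothetical tuning knob: trailing rows updated by each task. */
#define LU_ROWS_PER_TASK 64

/* Task-parallel right-looking LU without pivoting (sketch).
 * Assumes OpenMP (compile with -fopenmp); without it the pragmas are
 * ignored and the code runs serially with the same result. */
int luDecomposeP(double *A, double *l, double *u, int n)
{
    if (A == NULL || l == NULL || u == NULL || n <= 0)
        return -1;

    size_t N = (size_t)n;
    int status = 0;

    /* u starts as a copy of A and is reduced in place; l starts as identity. */
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            u[i * N + j] = A[i * N + j];
            l[i * N + j] = (i == j) ? 1.0 : 0.0;
        }
    }

    #pragma omp parallel
    #pragma omp single
    {
        for (size_t k = 0; k < N; k++) {
            double pivot = u[k * N + k];
            if (pivot == 0.0) {          /* no pivoting: give up */
                status = -2;
                break;
            }

            /* One task per block of rows below the pivot row. */
            for (size_t r0 = k + 1; r0 < N; r0 += LU_ROWS_PER_TASK) {
                size_t r1 = r0 + LU_ROWS_PER_TASK;
                if (r1 > N)
                    r1 = N;

                #pragma omp task firstprivate(k, r0, r1, pivot) shared(l, u)
                {
                    for (size_t i = r0; i < r1; i++) {
                        double m = u[i * N + k] / pivot;
                        l[i * N + k] = m;
                        u[i * N + k] = 0.0;
                        for (size_t j = k + 1; j < N; j++)
                            u[i * N + j] -= m * u[k * N + j];
                    }
                }
            }

            /* Step k + 1 reads rows written by these tasks. */
            #pragma omp taskwait
        }
    }
    return status;
}
```

Each task writes only its own rows and reads only the pivot row, so tasks within one step do not conflict. The taskwait at every step is a simple but conservative synchronization; a blocked formulation with task dependences might scale better, and comparing such alternatives is the kind of experiment the design document asks for.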
Submit one zip file containing everything.