### COL 730 Assignment 2

Submit your assignment online on Moodle.

The goal is to learn to program in the distributed-memory model. Implement parallel LU decomposition. In particular, implement the following function:

```c
int luDecomposeMPI(char *filename, double *l, double *u, int *n);
```

`luDecomposeMPI` is the MPI version. It should return 0 on successful completion and an error code less than 0 on error. The input matrix A is in a file, and its dimension is returned through `n`. Expect n to be large: it may exceed 10^6, so this time the entire matrix may not fit on any single node. For algorithms, see here and here.

Use pivoting to deal with problems related to numerical instability.
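
To make the pivoting requirement concrete, here is a minimal serial sketch of LU with partial (row) pivoting. It may be useful as a small-n reference for checking an MPI implementation. It is not the required `luDecomposeMPI`: the name `lu_pivot_serial`, the `perm` output, and the unit-diagonal convention for L are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical serial reference: factor PA = LU with partial (row) pivoting.
 * a    : n*n input, row-major (not modified)
 * l, u : n*n outputs, row-major; l is unit lower triangular, u is upper triangular
 * perm : perm[i] is the row of A that becomes row i of PA
 * Returns 0 on success, -1 if a zero pivot is found or allocation fails. */
int lu_pivot_serial(const double *a, double *l, double *u, int *perm, int n)
{
    assert(a && l && u && perm && n > 0);
    size_t N = (size_t)n;
    double *w = malloc(N * N * sizeof *w);   /* working copy of A */
    if (!w) return -1;
    memcpy(w, a, N * N * sizeof *w);
    for (int i = 0; i < n; i++) perm[i] = i;

    for (size_t k = 0; k < N; k++) {
        /* Pick the row at or below k with the largest |entry| in column k. */
        size_t p = k;
        for (size_t i = k + 1; i < N; i++) {
            double cand = w[i * N + k] < 0 ? -w[i * N + k] : w[i * N + k];
            double best = w[p * N + k] < 0 ? -w[p * N + k] : w[p * N + k];
            if (cand > best) p = i;
        }
        if (w[p * N + k] == 0.0) { free(w); return -1; }

        if (p != k) {                        /* swap rows k and p, record it */
            for (size_t j = 0; j < N; j++) {
                double t = w[k * N + j];
                w[k * N + j] = w[p * N + j];
                w[p * N + j] = t;
            }
            int t = perm[k]; perm[k] = perm[p]; perm[p] = t;
        }

        /* Store multipliers below the diagonal, update the trailing block. */
        for (size_t i = k + 1; i < N; i++) {
            w[i * N + k] /= w[k * N + k];
            for (size_t j = k + 1; j < N; j++)
                w[i * N + j] -= w[i * N + k] * w[k * N + j];
        }
    }

    /* Unpack the packed factors into full L and U matrices. */
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            l[i * N + j] = (i > j) ? w[i * N + j] : (i == j ? 1.0 : 0.0);
            u[i * N + j] = (i <= j) ? w[i * N + j] : 0.0;
        }
    free(w);
    return 0;
}
```

Choosing the largest-magnitude entry in the column keeps every multiplier at most 1 in absolute value, which is what limits error growth. In a distributed version the same search would become a reduction across the processes that own the column.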

For algorithms, also see this (Figure 1.4 has pseudo-code) and this. (You may prefer to use the blocked version for this assignment.) You can also refer to the Matlab version of LU decomposition here.

Submit the source code along with a makefile that builds executables as described below. The scoring will be based on correctness, speed and scalability on up to 16 nodes (and up to 384 MPI processes).

This time you should do parallel I/O so that input reading and gathering are also parallelized. You should write a second program (name it "convertin") that takes the matrix in standard format in a file named "matrix" and rewrites it in a format of your choice on the file system in a file called "pmatrix". The matrix in the input file is stored as whitespace-separated "%f" format floats in row-major order, one row per line (so n is both the number of lines and the number of words per line). Your main program should read the matrix in "pmatrix", compute LU, and write the results in files "plmatrix" and "pumatrix", respectively. Finally, write a third program (named "convertout") that converts the "plmatrix" and "pumatrix" files to two row-major, whitespace-separated "%f" format ASCII files (one row per line) holding the full matrices in the standard format. You may name the output files "lmatrix" and "umatrix".
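
As one possible shape for "convertin", the sketch below rewrites the text matrix as a fixed-width binary file so that every row starts at a computable byte offset. The layout (an `int64_t` header holding n, followed by n*n doubles in row-major order) and the names `convert_in`, `row_offset`, and `rank_rows` are assumptions for illustration; the assignment leaves the format to you.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Count whitespace-separated words on the first line; this gives n. */
static int64_t count_first_row(FILE *in)
{
    int64_t count = 0;
    int c, in_word = 0;
    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (c == ' ' || c == '\t' || c == '\r') in_word = 0;
        else if (!in_word) { in_word = 1; count++; }
    }
    return count;
}

/* Hypothetical converter: text matrix -> [int64 n][n*n doubles, row-major].
 * Streams one value at a time, so it never holds the matrix in memory.
 * Returns 0 on success, -1 on any I/O or parse error. */
int convert_in(const char *in_path, const char *out_path)
{
    FILE *in = fopen(in_path, "r");
    if (!in) return -1;
    FILE *out = fopen(out_path, "wb");
    if (!out) { fclose(in); return -1; }

    int64_t n = count_first_row(in);
    rewind(in);
    int rc = (n > 0 && fwrite(&n, sizeof n, 1, out) == 1) ? 0 : -1;
    for (int64_t i = 0; rc == 0 && i < n * n; i++) {
        double v;
        if (fscanf(in, "%lf", &v) != 1 || fwrite(&v, sizeof v, 1, out) != 1)
            rc = -1;
    }
    fclose(in);
    if (fclose(out) != 0) rc = -1;
    return rc;
}

/* Byte offset of row r in the assumed layout. */
int64_t row_offset(int64_t n, int64_t r)
{
    return (int64_t)sizeof(int64_t) + r * n * (int64_t)sizeof(double);
}

/* Contiguous row range for one rank; the first (n % nprocs) ranks get one extra row. */
void rank_rows(int64_t n, int nprocs, int rank, int64_t *first, int64_t *count)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    int64_t base = n / nprocs, extra = n % nprocs;
    *count = base + (rank < extra ? 1 : 0);
    *first = rank * base + (rank < extra ? rank : extra);
}
```

With offsets like these, each rank could read only its own slice, for example through a collective MPI-IO call such as `MPI_File_read_at_all`. The contiguous row split is shown only because it is the simplest to compute; it is not necessarily the best distribution for the factorization itself.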

Also, produce the pivoted input matrix in a file called "ppmatrix". Its converted standard ASCII format version can be named "apmatrix".
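
Having the pivoted matrix on disk makes a direct correctness check possible on small inputs: multiply L by U and compare against the pivoted matrix. The helper below is a hypothetical sketch of such a check (the name `lu_residual` and the choice of the maximum absolute entrywise difference are assumptions); it expects all three matrices as full n*n row-major arrays, as in the converted output files.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check: returns max over (i, j) of |(L*U)[i][j] - PA[i][j]|.
 * Only the lower triangle of l and the upper triangle of u are read,
 * since the remaining entries are assumed to be zero. */
double lu_residual(const double *pa, const double *l, const double *u, int n)
{
    assert(pa && l && u && n > 0);
    size_t N = (size_t)n;
    double worst = 0.0;
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            /* l[i][k] = 0 for k > i and u[k][j] = 0 for k > j. */
            size_t kmax = i < j ? i : j;
            double sum = 0.0;
            for (size_t k = 0; k <= kmax; k++)
                sum += l[i * N + k] * u[k * N + j];
            double diff = sum - pa[i * N + j];
            if (diff < 0) diff = -diff;
            if (diff > worst) worst = diff;
        }
    }
    return worst;
}
```

This costs O(n^3) operations, so it is only practical for test matrices far smaller than the sizes the assignment targets.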

You will be timed only on your first (main) program, named "computeLU", which reads the input matrix in your format and writes the output in your format.

In addition to the executables and the makefile, you must also submit a document, design.pdf, describing your design and listing the major design decisions you made. On what basis did you make these decisions? List the experiments, and their results, that led you to each decision. Does your code scale? Demonstrate. Besides correctness, you will also be graded on the speed and scalability of the implementation and on your design choices.

Submit one zip file including everything.