Section 1: Parallel processing using cloud computing (40 marks)
The company you work for, a small consultancy with 40 consultants, needs to decide on its future computing strategy. The firm's workload requires approximately 1,600 CPU-hours per consultant each month for running parallel codes. You have been asked to prepare a report recommending whether the company should invest in an on-premise HPC cluster or migrate these workloads to a major cloud provider.
Your recommendations should include:
- An analysis of the advantages and disadvantages of the different commercial providers (e.g. Azure, AWS, Google Cloud) over an onsite HPC and compared to each other for multicore workloads using batch processing. Your answer should include referenced examples.
- A comparison of the financial cost of an onsite HPC vs using cloud computing. You should include numerical examples in your comparison.
- Your recommendations to the company. This should include what platform you think they should use and why. You must justify this using the information in parts 1 and 2.
- A reference list. Please use Harvard style referencing (https://www.scribbr.co.uk/referencing/harvard-style/) and reference everything, including the sources of your costs.
Marks will be awarded based on the following criteria:
15 Marks: Your analysis of the different platforms vs an onsite HPC
10 Marks: Your cost comparison
10 Marks: Your recommendations
5 Marks: Your use of references
This section should be approximately 1500 words.
Section 2: Parallel Programming Exercise (60 Marks)
To complete this assignment you will need the source code provided at the following URL.
https://moodlecurrent.gre.ac.uk/mod/resource/view.php?id=2163909
You are provided with a C program (jacobi2d.c) that solves a rectangular two-dimensional heat-conduction problem using the Jacobi iterative method.
This code can be compiled and linked to produce an executable file called jacobiSerial.out with the following command:
gcc jacobi2d.c -o jacobiSerial.out
To run the executable, type the executable name: ./jacobiSerial.out
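The Jacobi method mentioned above replaces each interior grid point with the average of its four neighbours from the previous sweep, repeating until the largest change falls below a tolerance. The sketch below illustrates the idea on a tiny grid; the names (jacobi_demo, a, b) and sizes are illustrative and will not match the identifiers in the provided jacobi2d.c:

```c
#include <math.h>

#define N 8          /* illustrative grid size */
#define TOL 1e-6     /* convergence tolerance */

/* Solve a tiny Laplace problem with a single hot top boundary (value
   `top`, remaining boundaries 0) and return the converged centre value. */
static double jacobi_demo(double top)
{
    double a[N][N] = {{0}}, b[N][N] = {{0}};
    for (int j = 0; j < N; j++) { a[0][j] = top; b[0][j] = top; }

    double diff;
    do {
        diff = 0.0;
        /* Each interior point becomes the average of its four neighbours
           from the PREVIOUS sweep -- hence the two separate arrays. */
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++) {
                b[i][j] = 0.25 * (a[i-1][j] + a[i+1][j]
                                + a[i][j-1] + a[i][j+1]);
                double d = fabs(b[i][j] - a[i][j]);
                if (d > diff) diff = d;
            }
        /* copy the interior back for the next sweep */
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                a[i][j] = b[i][j];
    } while (diff > TOL);

    return a[N/2][N/2];
}
```

Because the scheme only averages, every interior value ends up strictly between the boundary extremes, which is a useful sanity check when you later verify your modified code.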
In addition to your PDF report, your final ZIP archive submission must contain all the source code implemented in this section. The code should be clean and well-commented, and include:
- Your final serial implementation from Part 1
- Your final parallel implementation from Part 2
- The SLURM submission script used for Part 3
Compile and execute the codes using the University HPC. Note that this is a shared resource with a queue that may become busy near the hand-in date, so make sure you allow plenty of time to run your code and don't leave it to the last minute. If you are unsure how to use the HPC, please check the lab notes and the instructions on Moodle.
Please ensure you follow these steps carefully and work with the code provided; work based on other Jacobi codes will not be marked and will receive a grade of 1.
Step 1 (20 Marks)
You are required to modify the code with boundary conditions set at top 10°C, bottom 10°C, left 25°C and right 40°C. This should be tested by running the code with a range of problem sizes. To do this you are required to modify the codes to:
- Reflect the boundary conditions described above
- Record the run-time of your code for a range of problem sizes greater than 100×100 using different levels of compiler optimisation. Only include the wall-clock time for the main execution loop.
- You will need to stop the results from printing if you are to obtain realistic measurements of the execution time. Make this configurable with a command-line argument in jacobi2d.c and update the SLURM script accordingly.
In the report, please submit the snippets of code responsible for each of the modifications along with an explanation of how they work.
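The three modifications above might look something like the following sketch. The function and flag names (set_boundaries, wall_time, --quiet) are illustrative choices, not identifiers from the provided code, and the boundary values are the ones specified in this step:

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Apply the required fixed boundary conditions to an n-by-n grid stored
   in row-major order: top 10, bottom 10, left 25, right 40 (degrees C).
   Left/right are applied last, so the corners take the side values. */
static void set_boundaries(double *g, int n)
{
    for (int j = 0; j < n; j++) {          /* top and bottom rows */
        g[0 * n + j]       = 10.0;
        g[(n - 1) * n + j] = 10.0;
    }
    for (int i = 0; i < n; i++) {          /* left and right columns */
        g[i * n + 0]       = 25.0;
        g[i * n + (n - 1)] = 40.0;
    }
}

/* Wall-clock timer: seconds as a double, monotonic so it is unaffected
   by clock adjustments. Bracket ONLY the main iteration loop with it. */
static double wall_time(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* In main(), the print suppression and timing could be wired up as:
 *
 *   int quiet = (argc > 1 && strcmp(argv[1], "--quiet") == 0);
 *   double t0 = wall_time();
 *   ... main Jacobi loop only ...
 *   double t1 = wall_time();
 *   printf("loop time: %.6f s\n", t1 - t0);
 *   if (!quiet) print_grid(...);
 */
```

Using a monotonic wall clock rather than `clock()` matters later: `clock()` sums CPU time across threads, which would make the parallel version appear not to speed up at all.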
Step 2 (20 Marks)
You are then required to modify the application you created in Step 1 to produce a basic parallel version using OpenMP. Copy your code into a new file (e.g. jacobiparallel.c) so that both versions can be maintained. The following command will compile your parallel version on a platform that has OpenMP installed:
gcc -fopenmp jacobiparallel.c -o jacobiparallel.out
This version must be tested to establish correct operation using 1, 2, 4 and 8 threads/processors; these tests may run on any platform you choose, as performance is not an issue at this stage.
In your report, provide the results for a 16×16 case for 1, 2, 4 and 8 processors, demonstrating correct execution. Additionally, provide a graph of the runtime of the parallel version of this code against the number of cores.
Finally, describe the required changes and provide code snippets for each of the modifications you made. Discuss any OpenMP pragmas you might have used. Were any algorithm changes needed to avoid race conditions and if so, where?
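One common pattern for parallelising a Jacobi sweep with OpenMP is sketched below. This is an illustrative sketch, not the expected solution: the function name and flat row-major layout are assumptions, and your own code may organise the grid differently. The key points it demonstrates are that reading from one array while writing to the other avoids races on the grid itself, and that the convergence measure needs a `reduction` clause rather than unsynchronised writes to a shared variable:

```c
#include <math.h>

/* One Jacobi sweep over an n-by-n row-major grid, parallelised over rows.
   `a` is only read and `b` is only written, so threads never race on the
   grid. The maximum per-point change is combined with a max reduction;
   without it, concurrent updates to `diff` would be a race condition. */
static double jacobi_sweep(const double *a, double *b, int n)
{
    double diff = 0.0;
    #pragma omp parallel for reduction(max:diff)
    for (int i = 1; i < n - 1; i++)
        for (int j = 1; j < n - 1; j++) {
            b[i * n + j] = 0.25 * (a[(i - 1) * n + j] + a[(i + 1) * n + j]
                                 + a[i * n + (j - 1)] + a[i * n + (j + 1)]);
            double d = fabs(b[i * n + j] - a[i * n + j]);
            if (d > diff) diff = d;
        }
    return diff;
}
```

Note that the caller must still swap (or copy) the arrays between sweeps; making each sweep read from one buffer and write to the other is exactly the algorithmic property that lets the loop be parallelised without further synchronisation.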
Step 3 (20 Marks)
You will perform a rigorous performance evaluation of your OpenMP solver on an HPC system using the SLURM queue. The goal is to measure the scalability of your code and critically analyse the results. For all performance tests, you must disable the grid-printing output from your code to ensure your timing measurements are accurate.
The performance tests will need to be done in 2 steps:
- Compile your serial code from Part 1 with and without compiler optimisation flags. Run each for several large grid sizes (e.g., 512×512, 1024×1024, and 2048×2048) and record the execution time for each.
- Compile your parallel OpenMP code from Part 2 with the same optimisation flags as the optimised build in the previous step (adding -fopenmp for OpenMP support). For each of the large grid sizes you tested above, run the code using 2, 4, and 8 threads. Record the parallel execution time for each combination of grid size and thread count (P).
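A SLURM submission script covering the parallel runs above might look like the sketch below. The partition, time limit, and the command-line arguments passed to the executable (grid size and a print-suppression flag) are placeholder assumptions; check the University HPC instructions on Moodle for the actual partition names and adjust the arguments to match your own program's interface:

```shell
#!/bin/bash
#SBATCH --job-name=jacobi-scaling
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8        # enough cores for the largest thread count
#SBATCH --time=00:30:00
#SBATCH --output=jacobi_%j.out

# OpenMP thread count is controlled via OMP_NUM_THREADS; it should never
# exceed the cores requested with --cpus-per-task.
for threads in 2 4 8; do
    export OMP_NUM_THREADS=$threads
    for size in 512 1024 2048; do
        echo "threads=$threads size=$size"
        ./jacobiparallel.out "$size" --quiet   # hypothetical CLI from Step 1
    done
done
```

Requesting all 8 cores once and looping over thread counts inside the job keeps every measurement on the same node, which makes the timings more comparable than submitting separate jobs that may land on different hardware.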
Provide a report containing an interpretation of all your performance data. This report should contain, at a minimum:
- A table of all the collected timings, and a plot of the speedup vs the number of threads along with an analysis of your results.
- A discussion on how changing compiler optimisation flags changed the performance of your serial code, including a mention of potential issues that could occur with aggressive optimisation.
- A discussion on how well your code scales with large problem sizes, and what might be preventing you from obtaining ideal linear speedup. Additionally, discuss why you might not observe the expected speedup if you were to run your code with small problem sizes.
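The speedup plot requested above conventionally uses S(P) = T_serial / T_parallel(P), with parallel efficiency E(P) = S(P) / P; ideal linear scaling means S(P) = P, i.e. E(P) = 1. A tiny helper like the following (illustrative, not part of the provided code) can tabulate both from your recorded timings:

```c
/* Speedup S(P) = T1 / TP: how many times faster the parallel run is. */
static double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}

/* Efficiency E(P) = S(P) / P: fraction of ideal linear scaling achieved.
   E(P) = 1.0 is perfect; real codes fall below 1 as P grows. */
static double efficiency(double t_serial, double t_parallel, int p)
{
    return speedup(t_serial, t_parallel) / p;
}
```

For example, a serial time of 10 s and an 8-thread time of 2 s gives S = 5 and E = 0.625, i.e. only 62.5% of the ideal 8× speedup; quantifying the gap this way gives your discussion of non-ideal scaling something concrete to explain.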
The report portion for this section is expected to be approximately 1500 words.
Grading Criteria
Marks allocated according to the following rubric:
0-49%
- Part 1, Analysis of the different platforms (15 marks): An unsatisfactory analysis that doesn't cover the basics
- Part 1, Cost comparison (10 marks): An unsatisfactory comparison that doesn't cover the basics
- Part 1, Recommendations (10 marks): An unsatisfactory recommendation that doesn't cover the basics
- Part 1, Referencing (5 marks): Unsatisfactory referencing that doesn't use the correct style or sources
- Part 2, Step 1 (20 marks): An unsatisfactory step, lacking either timings of the code, changes to the boundary conditions, or any explanation in the report
- Part 2, Step 2 (20 marks): An unsatisfactory step, lacking the required parallelisation of the code or any explanation in the report
- Part 2, Step 3 (20 marks): An unsatisfactory step, with little or no evidence of timing the code, or no explanation in the report

50-59%
- Part 1, Analysis of the different platforms: A satisfactory analysis that could be improved by considering more points
- Part 1, Cost comparison: A satisfactory comparison that could be improved by considering more points
- Part 1, Recommendations: A satisfactory recommendation that could be improved by considering more points
- Part 1, Referencing: Satisfactory referencing, but more sources are needed
- Part 2, Step 1: A satisfactory step; however, there are errors in the code or a lack of detail in the report
- Part 2, Step 2: A satisfactory step; however, there are errors in the code or a lack of detail in the report
- Part 2, Step 3: A satisfactory step; however, there are errors in the timings or a lack of detail in the report

60-69%
- Part 1, Analysis of the different platforms: A good analysis that covers most of the salient points but could be improved
- Part 1, Cost comparison: A good comparison that covers most of the salient points but could be improved
- Part 1, Recommendations: A good recommendation that covers most of the salient points but could be improved
- Part 1, Referencing: Good referencing, but could be improved with a greater variety of sources
- Part 2, Step 1: A good step; the code is mostly correct and documented, but there is a lack of discussion of the results
- Part 2, Step 2: A good step; the code is mostly correct and documented, but there is a lack of discussion of the results
- Part 2, Step 3: A good step; the timings are mostly correct and documented, but there is a lack of discussion of the results

70-79%
- Part 1, Analysis of the different platforms: A very good analysis that covers most of the salient points
- Part 1, Cost comparison: A very good comparison that covers most of the salient points
- Part 1, Recommendations: A very good recommendation that covers most of the salient points
- Part 1, Referencing: Very good referencing, but could be improved with a greater variety of sources
- Part 2, Step 1: A very good step; the code and timings are correct, but there are a few limitations in the report
- Part 2, Step 2: A very good step; the code is correct, but there are a few limitations in the report
- Part 2, Step 3: A very good step; the timings are correct, but there are a few limitations in the report