Assessment

Module Title | Cloud Computing
---|---
Module Code | CSC-40039
Assessment Type | Assignment
Assessment Title | CSC-40039 Assignment
Weighting (% of module mark) | 100%
Assessment Length (word count or equivalent) | 2000 words
Submission Deadline (date and time) | 20 December 2024, 1 pm
Format of Submission |
Feedback Release Date [please ensure that this aligns with the requirements of Section 8 of the Assessment and Feedback Code of Practice] | 22/01/2025
Staff contact details | Dr Goksel Misirli, Email: g.misirli@keele.ac.uk, Office: CR040, Colin Reeves Building

Assessment Details:
The coursework involves writing a technical report (90%) and developing an application (10%) using Hadoop and MapReduce for a Bioinformatics use case. Completing four AWS modules from the AWS Academy Cloud Foundations course accounts for 20 of the 90 percentage points available for the technical report. You DO NOT need previous Bioinformatics knowledge for this assignment. Input files are provided in plain text and are tab-delimited. Please see the “Additional information” section for submission and formatting guidelines and for details of the input files.
Part I – Application (10%)
Background. A biopharmaceutical company requires a computational analysis platform to compare various bacterial organisms based on their features. Your task is to investigate how cloud resources can be utilised to develop a scalable application to analyse several organisms as and when required. As the lead developer, you have been given seven input files (see the Appendix). Once developed, the application can compare and analyse thousands of organisms.
Each line in these files includes an identifier for a protein in the second column and information about the protein’s function in the fifth column (Figure 1). A protein’s function is specified via a related Gene Ontology (GO) term. Figure 1 shows example rows for the P0A9Q1 protein. The GO:0003677 term in the “GO ID” column (5th column) indicates the “DNA binding” function. Hence, it can be concluded that the P0A9Q1 protein can bind to DNA.
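To make the column layout concrete, the following minimal Python sketch pulls the protein identifier (2nd column) and the GO term (5th column) out of one tab-delimited line. The function name `protein_and_go_term` and the sample line are illustrative assumptions; only the P0A9Q1 identifier and the GO:0003677 term come from the text above, and the remaining fields are placeholders. It also assumes that header or comment lines begin with "!", as in the GAF format.

```python
def protein_and_go_term(line):
    """Return (protein_id, go_id) from one tab-delimited line, or None.

    Hypothetical helper for illustration; not part of the provided files.
    """
    # Skip blank lines and header/comment lines (assumed to start with "!").
    if not line.strip() or line.startswith("!"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return None  # too few columns to hold a GO ID
    # Columns are 1-based in the text, so column 2 is index 1 and column 5 is index 4.
    return fields[1], fields[4]


# Illustrative line: only the protein ID and GO ID are taken from the text;
# every other field is a placeholder.
sample = "\t".join(["UniProtKB", "P0A9Q1", "symbol", "", "GO:0003677", "ref", "code"])
print(protein_and_go_term(sample))  # ('P0A9Q1', 'GO:0003677')
```

The same index arithmetic applies in Java: split on the tab character and read elements 1 and 4.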
Task. Using Java or Python, you will create a Hadoop MapReduce application for the following task. Use the files provided via KLE as input (see the Appendix for these files).
The GO:0030420 term (“establishment of competence for transformation”) and its subterms indicate whether a bacterium can have the competence feature, that is, whether it can be transformed to take up exogenous genetic material. Your task is to identify which organisms may exhibit “competence” as a physiological state, using the input files.
Hint: In the context of this coursework, the use of the GO:0030420 term and its subterms in a file suggests that the corresponding organism may exhibit the competence feature. Therefore, you can count the GO assignments for the GO:0030420 term and related subterms for each file.
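The counting idea in the hint can be sketched as a map step and a reduce step. The code below is a local, cluster-free simulation in plain Python, not a finished Hadoop job; the names `map_line`, `reduce_counts`, and `run_locally`, the demo file names, and the demo lines are all hypothetical. `TARGET_TERMS` contains only the parent term because the subterm identifiers are not listed in this brief and would need to be added. In a real Hadoop Streaming job the input file name is commonly read from the job environment rather than passed in as an argument; that detail is an assumption to verify against your Hadoop version.

```python
from collections import defaultdict

# Only the parent term is given in the brief. The IDs of its subterms are not
# listed there, so they are left out here and would need to be added.
TARGET_TERMS = {"GO:0030420"}


def map_line(file_name, line, target_terms=TARGET_TERMS):
    """Map step: emit (file_name, 1) when column 5 holds a target GO term."""
    if line.startswith("!"):  # header/comment line (assumed GAF convention)
        return
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 5 and fields[4] in target_terms:
        yield file_name, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each file."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_locally(inputs):
    """Simulate the job over {file_name: [lines]} without a cluster."""
    emitted = []
    for file_name, lines in inputs.items():
        for line in lines:
            emitted.extend(map_line(file_name, line))
    return reduce_counts(emitted)


# Placeholder data: made-up lines, not taken from the real input files.
demo = {
    "organism_a.gaf": ["DB\tP1\ts\t\tGO:0030420", "DB\tP2\ts\t\tGO:0003677"],
    "organism_b.gaf": ["DB\tP3\ts\t\tGO:0003677"],
}
print(run_locally(demo))  # {'organism_a.gaf': 1}
```

Note that a file with no matching annotations produces no key at all in this sketch, so an absent file name should be read as a count of zero.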
Figure 1. Example rows for the P0A9Q1 protein. The first row includes “GO:0003677” in the 5th column.
Part II – Technical report (90%)
For part two of this coursework, your task is to produce a report of approximately 2000 words. The report will cover the following topics.
a) Design a cloud-native application for medical imaging that will be used to access different types of data, including patient details, diagnosis results, related images, and metadata about the images (60%). The application will be available as a web service. In your design, you can only refer to AWS resources covered in lectures and the AWS Academy Cloud Foundations course. Part II does not involve any coding. You are expected to present your design only.
• Explain the IaaS, PaaS, and SaaS concepts using real-world examples from existing cloud providers and the literature in the context of your design (25%).
• Describe your design and how you plan to integrate different AWS products for this web service, considering cloud-related characteristics, such as storage, scalability, and security (10%).
• Provide an overview diagram of your design (10%).
• Describe your web service design and its REST interface for different access and search options (15%).
b) Describe and evaluate the bioinformatics application (see Part I) you will develop (10%).
• Describe your source and output files and folders and how they should be used.
• Critically evaluate design aspects and findings related to your solution (see Part I).
c) AWS Academy Cloud Foundations course (20%). List four AWS modules you completed from the online AWS Academy Cloud Foundations course and your score for each module. The course is accessible from the link below.
https://awsacademy.instructure.com//courses/96465
Invitation emails have been sent. Your score must be 70% or more to complete an AWS module.
Module Learning Outcomes:
In this assignment all module learning outcomes will be assessed.
https://www.keele.ac.uk/catalogue/current/csc-40039.htm
Assessment Criteria:
Submissions will be marked according to the university’s generic assessment criteria, available at https://www.keele.ac.uk/policyzone/data/assessmentcriterialevel3456ug.
Marks for Part I will be awarded for design, coding and mark-up quality, functionality, presentation quality, and code comments. For Part II.c, the AWS scores will be verified via AWS Academy using the course link provided.
Feedback to Students:
The feedback and the provisional marks will be provided via KLE.
Inclusive Practice:
This assignment has been designed to be inclusive. Examples include:
• The assignment includes various tasks, some of which correspond to specific practicals. You will be informed in advance when a practical contributes towards the assignment. This approach helps spread the work required during the semester, giving you flexibility. Moreover, this module comes with a complementary online AWS course. Completing some parts of this online AWS course contributes directly to the assignment as specified in Part II. You will be informed about this during the first lecture.
• The asynchronous AWS materials give you an opportunity to learn industry-relevant skills at your own pace.
• Although the assignment tasks are provided, you have the flexibility to design the cloud application in Part II in different ways.
• You can implement the application in Part I using either Java or Python.
• The reading list provides online materials to access information.
• Lectures are recorded, and you can watch any lecture session later.
• You can ask questions during the lectures and practicals regarding the assignment. You will be informed about how a session may relate to the assignment in lectures.
• Six practical sessions are scheduled, although you will be given materials for four practicals. You can use the remaining practicals to work on your assignment and clarify any issues, getting direct support from the demonstrators.
• Seven-day automatic extensions are supported for this assignment. Please check the university regulations, which, for example, allow you to apply for automatic extensions for up to three assignments. You may be given additional time to submit your work if you require reasonable adjustments.
Use of Artificial Intelligence (AI):
AI can be used as a research tool and an efficient search mechanism.
Academic Misconduct:
Academic misconduct is doing something that could give you an unfair advantage in an assessment. It includes, but is not limited to, the following: plagiarism; collusion; contract cheating; cheating in an examination; falsification of data or sources; falsification of official documents or signatures. The University treats academic misconduct very seriously and penalties will be given for proven cases, including termination of studies in serious cases. It is therefore very important that you understand how to prepare and take assessments honestly. In order to assist you with this there are various resources and help available both as part of your programme of study and also centrally. For more information please visit: https://www.keele.ac.uk/students/academiclife/appeals-complaints-conduct/studentacademicconduct/
Academic Skills Support:
The Academic and Digital Skills team provide a range of additional online resources (e.g., study guides, Sways, Podcasts, workshops etc) to help you with your academic work and assessments. You can find more information here.
Additional information:
• Submission guidelines.
o Submit a technical report saved in PDF format that describes your work using the “report submission drop-box”. Name the PDF according to your 8-digit student ID (e.g., 09015680.pdf). The PDF report must also include, as an Appendix at the end of the file, the source code for your application, the list of output folders and files, and the content of these files for Part I. The word limit does not apply to this Appendix.
o Additionally, upload a single ZIP file using the “supplementary software submission dropbox” and include all the source code, output files and folders. Name the zip file according to your 8-digit student ID (e.g., 09015680.zip).
o Checklist.
▪ The report can contain up to five display items (figures or tables).
▪ The number of words MUST NOT exceed 2000, including the title, references, and figure and table captions.
▪ The student ID and the number of words MUST be provided at the beginning of the report.
▪ Make sure you reference existing research from the literature and follow academic conventions throughout the report, for example, using the Harvard style.
• How to create a better submission.
o Your code should be clean, commented, and carefully tested.
o The technical report should be written in an academic style, referencing the literature and existing approaches. You can use existing information from the literature to provide examples, back up your claims, or construct your arguments. However, do not copy text directly, and make sure you follow academic conduct principles. Please see the Academic Misconduct section of this document.
o Design decisions should be clearly explained. You should critically evaluate your designs, findings, and the potential integration with the cloud.
o Make sure to complete the practicals. Revise the lecture slides in KLE and modules from the AWS Academy Cloud Foundations course.
• Input files.
o The table below includes the list of bacterial organisms and corresponding input files. The files are available from the KLE assessment page.

No. | Organism | Input File
---|---|---
1 | Escherichia coli K-12 | Escherichia_coli_K-12_ecocyc_83333.gaf
2 | Bacillus subtilis 168 | Bacillus_subtilis_168-224308.gaf
3 | Bacillus amyloliquefaciens FZB42 | Bacillus_amyloliquefaciens_FZB42-326423.gaf
4 | Bacillus licheniformis ATCC 14580 | Bacillus_licheniformis_ATCC_14580-279010.gaf
5 | Bacillus megaterium DSM 319 | Bacillus_megaterium_DSM_319-592022.gaf
6 | Geobacillus kaustophilus HTA426 | Geobacillus_kaustophilus_HTA426-235909.gaf
7 | Geobacillus thermodenitrificans NG80 | Geobacillus_thermodenitrificans_NG80_2-420246.gaf

o File format. Data are available in the Gene Annotation File (GAF) format, which defines 17 tab-delimited fields, as shown in the table below. More details can be found at http://geneontology.org/docs/go-annotation-file-gaf-format-2.0.

Column | Content | Required? | Cardinality | Example
---|---|---|---|---
1 | DB | required | 1 | UniProtKB
2 | DB Object ID | required | 1 | P12345
3 | DB Object Symbol | required | 1 | PHO3
4 | Qualifier | optional | 0 or greater | NOT
5 | GO ID | required | 1 | GO:0003993
6 | DB:Reference (\|DB:Reference) | required | 1 or greater | SGD_REF:S000047763
7 | Evidence Code | required | 1 | IMP
8 | With (or) From | optional | 0 or greater | GO:0000346
9 | Aspect | required | 1 | F
10 | DB Object Name | optional | 0 or 1 | Toll-like receptor 4
11 | DB Object Synonym (\|Synonym) | optional | 0 or greater | hToll
12 | DB Object Type | required | 1 | protein
13 | Taxon(\|taxon) | required | 1 or 2 | taxon:9606
14 | Date | required | 1 | 20090118
15 | Assigned By | required | 1 | SGD
16 | Annotation Extension | optional | 0 or greater | part_of(CL:0000576)
17 | Gene Product Form ID | optional | 0 or 1 | UniProtKB:P12345-2
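
As an illustration of the 17 columns listed in the table above, the sketch below parses one GAF line into a typed record. The `GafRecord` class, its attribute names, and the helpers `split_multi` and `parse_gaf_line` are hypothetical names chosen for this example. It assumes that fields with a cardinality above 1 separate their values with "|" and that a short line can be padded with empty strings; the Annotation Extension column is kept as a raw string. The sample line is assembled from the Example column of the table, so it is not a real annotation.

```python
from dataclasses import dataclass
from typing import List

GAF_COLUMNS = 17  # number of tab-delimited fields in the table above


@dataclass
class GafRecord:
    """One annotation line; attribute order follows the column order."""
    db: str
    db_object_id: str
    db_object_symbol: str
    qualifiers: List[str]
    go_id: str
    references: List[str]
    evidence_code: str
    with_or_from: List[str]
    aspect: str
    db_object_name: str
    synonyms: List[str]
    db_object_type: str
    taxons: List[str]
    date: str
    assigned_by: str
    annotation_extension: str
    gene_product_form_id: str


def split_multi(value):
    """Split a pipe-separated field into a list, dropping empty entries."""
    return [item for item in value.split("|") if item]


def parse_gaf_line(line):
    """Parse one tab-delimited GAF line into a GafRecord (illustrative only)."""
    f = line.rstrip("\n").split("\t")
    f += [""] * (GAF_COLUMNS - len(f))  # pad when trailing optional fields are absent
    return GafRecord(
        f[0], f[1], f[2], split_multi(f[3]), f[4], split_multi(f[5]),
        f[6], split_multi(f[7]), f[8], f[9], split_multi(f[10]), f[11],
        split_multi(f[12]), f[13], f[14], f[15], f[16],
    )


# Sample line assembled from the Example column of the table above.
sample = "\t".join([
    "UniProtKB", "P12345", "PHO3", "NOT", "GO:0003993", "SGD_REF:S000047763",
    "IMP", "GO:0000346", "F", "Toll-like receptor 4", "hToll", "protein",
    "taxon:9606", "20090118", "SGD", "part_of(CL:0000576)", "UniProtKB:P12345-2",
])
record = parse_gaf_line(sample)
print(record.db_object_id, record.go_id, record.qualifiers)  # P12345 GO:0003993 ['NOT']
```

Keeping optional multi-valued fields as lists means an empty Qualifier column becomes an empty list rather than a list containing an empty string.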