Testing data transformations in MapReduce programs
Citations
18 citations
Cites background or methods from "Testing data transformations in Map..."
...2015 [9] Big Data and Cloud Computing to process large data....
[...]
...[9] propose a testing technique named MRFlow, which is based on data flow test criteria and oriented to transformations analysis between the input and the output, and it can test defects in MapReduce programs....
[...]
13 citations
Cites background from "Testing data transformations in Map..."
...Several research lines propose generating the test input data through different approaches: using data flow [50], based on a bacteriological algorithm [51], or with input domain analysis together with combinatorial testing [52]....
[...]
11 citations
9 citations
Cites methods from "Testing data transformations in Map..."
...Other testing techniques focus on the generation of test input data with different approaches: using data flow [41], based on a bacteriological algorithm [42], or with input domain together with combinatorial testing [43]....
[...]
5 citations
Cites methods from "Testing data transformations in Map..."
...The MRFlow technique in [19] builds a data flow graph to define the paths to test and uses graph-based testing [1] to search for test cases in MapReduce....
[...]
References
20,309 citations
"Testing data transformations in Map..." refers background in this paper
...process large volumes of information in data-intensive applications, such as Big Data environments....
[...]
17,663 citations
1,084 citations
"Testing data transformations in Map..." refers background or methods in this paper
...In addition to the graph, the definition and uses of every variable are determined [25]....
[...]
...The testing technique named MRFlow (MapReduce data Flow) is based on data flow test criteria [25]....
[...]
...In the case of DEF-K/DEF-V to P-USE-TRANS(var, n, seq), instead of creating one tp, several tp are created following all of the next nodes after the conditional statement n, as in other data flow test criteria [25]....
[...]
834 citations
"Testing data transformations in Map..." refers background in this paper
...Categories and Subject Descriptors D.2.4 [Software Engineering]: Software/Program Verification – Validation General Terms Reliability, Verification....
[...]
750 citations
"Testing data transformations in Map..." refers background in this paper
...3 Related Work Several testing approaches exist over the MapReduce programs, but most of them are focused on testing the performance [16][14][9] and few are oriented to testing the functionality, that is the goal of this paper....
[...]
Frequently Asked Questions (16)
Q2. What have the authors stated for future works in "Testing data transformations in mapreduce programs" ?
As future work the authors plan to apply MRFlow in more programs and to automate the technique in areas such as test coverage items, the execution of test cases, the derivation of test cases or the graph on which these test cases are derived.
Q3. What is the function that solves the problem?
The Map function receives the input data and emits <key, value> pairs. The Reduce function then receives <key, list(values)> pairs, each containing all the information about one subproblem, and solves it by emitting <key, value> pairs.
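The Map, grouping and Reduce flow described above can be modelled in a single process. The sketch below is a hypothetical in-memory model, not Hadoop or any other framework. `run_mapreduce`, `wc_map` and `wc_reduce` are illustrative names, and the WordCount logic assumes whitespace tokenization.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def run_mapreduce(
    records: Iterable[str],
    map_fn: Callable[[str], Iterator[Pair]],
    reduce_fn: Callable[[str, List[int]], Iterator[Pair]],
) -> Dict[str, int]:
    """Single-process model of a MapReduce job (no framework involved)."""
    # Map phase + shuffle: every emitted <key, value> is grouped by key,
    # producing one <key, list(values)> subproblem per distinct key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)

    # Reduce phase: each subproblem is solved independently and
    # emits its own <key, value> pairs.
    output: Dict[str, int] = {}
    for key, values in groups.items():
        for out_key, out_value in reduce_fn(key, values):
            output[out_key] = out_value
    return output


def wc_map(line: str) -> Iterator[Pair]:
    # WordCount map: emit <word, 1> for each whitespace-separated token.
    for word in line.split():
        yield word, 1


def wc_reduce(key: str, values: List[int]) -> Iterator[Pair]:
    # WordCount reduce: the list of counts for one word becomes its total.
    yield key, sum(values)


print(run_mapreduce(["a b a", "b a"], wc_map, wc_reduce))  # {'a': 3, 'b': 2}
```

A real framework distributes the map tasks, the grouping and the reduce tasks across machines. The dictionary here only stands in for that shuffle step.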
Q4. What is the purpose of the transformations in MRFlow?
The program has five transformation paths (tp): one from the transformation of values into sum (tp1), three from the absence of values transformations (tp2, tp3 and tp4), and one from the absence of key transformations (tp5).
Q5. What is the purpose of MRFlow graph?
In the MRFlow graph, the paths under test start at the definition of the key/value and finish at each possible last transformation of those variables.
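Enumerating paths that start at a definition and end at each possible last transformation is essentially a graph search. The sketch below is a minimal illustration under stated assumptions. The graph, its node numbering and the function name `transformation_paths` are hypothetical, and the rule "use each edge at most once" is a simplification chosen here to keep loops finite. It is not necessarily how MRFlow bounds loops.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = Dict[int, List[int]]


def transformation_paths(graph: Graph, start: int, targets: Set[int]) -> List[List[int]]:
    """Enumerate paths from a definition node to every reachable target node.

    Each edge is used at most once per path, so a loop body is either
    skipped or traversed once, which keeps the enumeration finite.
    """
    paths: List[List[int]] = []

    def dfs(node: int, path: List[int], used: FrozenSet[Tuple[int, int]]) -> None:
        if node in targets:
            paths.append(path)  # path ends at a last transformation
        for nxt in graph.get(node, []):
            edge = (node, nxt)
            if edge not in used:
                dfs(nxt, path + [nxt], used | {edge})

    dfs(start, [start], frozenset())
    return paths


# Hypothetical graph: 0 defines the input, 1 is a loop header,
# 2 is the loop body, 3 holds the last transformation.
toy = {0: [1], 1: [2, 3], 2: [1], 3: []}
print(transformation_paths(toy, 0, {3}))  # [[0, 1, 2, 1, 3], [0, 1, 3]]
```

Each returned path is one candidate path under test. A test case then needs input data that drives execution along it (here, one input that enters the loop and one that skips it).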
Q6. What is the purpose of the MapReduce program?
MapReduce programs are often used in Big Data applications [28], which process large amounts of data (Volume) with the required performance (Velocity), and handle different types of data, data from different sources, and data with no apparent data model, such as emails or videos (Variety).
Q7. What is the purpose of the MRFlow graph?
In the MRFlow graph of both programs, the <key, list(value)> input variables have one definition, and each program contains three transformations: transformation of values into another variable, no values transformation, and no key transformation.
Q8. What is the output of the program that receives a word?
If the WordCount program receives "hello, hello, hello", the expected output is hello:3, but the actual output is hello:1 and hello,:2, because the Reduce function receives the invalid key "hello,", which is not a word.
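The WordCount failure above can be reproduced with a tokenizer that splits on whitespace only. The sketch is a minimal model: `map_phase`, `reduce_phase` and both tokenizers are hypothetical names. The "fixed" tokenizer assumes that a valid key is a run of ASCII letters, which is an assumption made here and not the paper's definition of a word.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def map_phase(text: str, tokenize: Callable[[str], List[str]]) -> List[Tuple[str, int]]:
    # Emit <token, 1>; whatever tokenize returns becomes a Reduce key.
    return [(token, 1) for token in tokenize(text)]


def reduce_phase(pairs: List[Tuple[str, int]]) -> Dict[str, int]:
    # Group by key and sum, as WordCount's Reduce does per key.
    counts: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def faulty_tokenize(text: str) -> List[str]:
    # Whitespace split: "hello," keeps its comma and is not a word.
    return text.split()


def fixed_tokenize(text: str) -> List[str]:
    # Hypothetical fix: a key is a run of ASCII letters.
    return re.findall(r"[A-Za-z]+", text)


text = "hello, hello, hello"
print(reduce_phase(map_phase(text, faulty_tokenize)))  # {'hello,': 2, 'hello': 1}
print(reduce_phase(map_phase(text, fixed_tokenize)))   # {'hello': 3}
```

The letters-only pattern is deliberately simple. It would split contractions such as "don't", so a production tokenizer may need a different rule.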
Q9. Why do the programs under test fail?
The faults are caused by the lack of key validation. In other programs, MRFlow could detect other defects related to the transformations of keys and values.
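Because the faults come from unvalidated keys, a test for the key path needs an oracle that checks every output key, not only the counts. The sketch assumes the same hypothetical definition of a valid word (ASCII letters only). `invalid_keys` and `check_key_path` are illustrative names, not part of MRFlow.

```python
import re
from typing import Dict, List

# Assumed definition of a valid WordCount key: one or more ASCII letters.
WORD = re.compile(r"[A-Za-z]+")


def invalid_keys(output: Dict[str, int]) -> List[str]:
    """Test oracle for the key path: return output keys that are not words."""
    return sorted(key for key in output if not WORD.fullmatch(key))


def check_key_path(output: Dict[str, int], expected: Dict[str, int]) -> List[str]:
    """Collect human-readable failures for one test case (empty list = pass)."""
    failures = [f"invalid key {key!r}" for key in invalid_keys(output)]
    if output != expected:
        failures.append(f"expected {expected}, got {output}")
    return failures


# Output reported in Q8 for the input "hello, hello, hello".
actual = {"hello": 1, "hello,": 2}
print(check_key_path(actual, {"hello": 3}))
```

Keeping the key check separate from the equality check makes the report say *why* the output is wrong, which mirrors the separate key and value transformation paths.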
Q10. What is the importance of MapReduce programs?
Quality in MapReduce programs is important because they are used in critical sectors such as health (DNA alignment [27]) and security (image processing in ballistics [17]).
Q11. What is the definition of the values variable?
The values variable is defined in node 0, and its last transformation is either sum or values itself, depending on whether statement 3 is reached.
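The dependence of the last transformation on whether statement 3 is reached can be made observable by instrumenting a Reduce function. This is a sketch under a loudly stated assumption: the original program is not shown here, so "statement 3" is modelled, purely for illustration, as the body of the summation loop. `reduce_with_trace` is a hypothetical name.

```python
from typing import List, Tuple


def reduce_with_trace(key: str, values: List[int]) -> Tuple[Tuple[str, int], str]:
    """WordCount-style reduce that also reports the last transformation of values.

    Assumption for illustration: 'statement 3' is the loop body below.
    """
    total = 0
    last_transformation = "values"  # loop body never runs: values is not transformed
    for value in values:
        total += value  # assumed statement 3: values flows into sum
        last_transformation = "sum"
    return (key, total), last_transformation


# One test input per transformation path.
print(reduce_with_trace("hello", [1, 1, 1]))  # (('hello', 3), 'sum')
print(reduce_with_trace("hello", []))         # (('hello', 0), 'values')
```

Frameworks such as Hadoop normally invoke Reduce only for keys that have at least one value, so under this particular modelling the empty-list path may be infeasible in a real job. The sketch only shows how two inputs separate the two paths.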
Q12. What connector indicates that a transformation exists with both elements of the sequence?
The conjunction connector indicates that one transformation involves both elements of the sequence, whereas the disjunction connector indicates that several transformations exist, one for each element of the sequence.
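The two connectors differ in how many paths a sequence produces: conjunction yields one path covering every element, and disjunction yields one path per element. The sketch below uses the strings "and" and "or" as a hypothetical encoding of the connectors. The paper's own notation is not reproduced here.

```python
from typing import List, Sequence


def paths_for_sequence(elements: Sequence[str], connector: str) -> List[List[str]]:
    """Expand a sequence of transformed elements into transformation paths.

    connector == "and": one path whose transformation involves every element.
    connector == "or":  one path per element.
    """
    if connector == "and":
        return [list(elements)]
    if connector == "or":
        return [[element] for element in elements]
    raise ValueError(f"unknown connector: {connector!r}")


print(paths_for_sequence(["key", "value"], "and"))  # [['key', 'value']]
print(paths_for_sequence(["key", "value"], "or"))   # [['key'], ['value']]
```

Rejecting unknown connectors with an exception keeps a typo in a hand-written graph from silently producing zero paths.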
Q13. What is the purpose of the transformation paths?
The paths under test, called transformation paths (tp), are extracted from the transformations between input and output in the MRFlow graph.
Q14. What type of tests can be performed in MRFlow?
To explore the applicability of the testing technique, MRFlow is applied to two popular programs: WordCount [3], which counts the occurrences of each word in a text, and IPCountry [24], which counts the number of IP (Internet Protocol) addresses in each country.
Q15. What is the purpose of this paper?
To detect these defects, this paper proposes a testing technique that analyzes the program transformations that could produce the failures.