Ately (see Materials and Methods). The final expression matrices from the various compendia were used to construct the condition-independent and condition-dependent co-expression networks described in this study. Correlation matrices were first calculated using all probesets (30,217), with Pearson’s correlation coefficient (r) defining expression similarity between probesets. Given the difficulty in distinguishing between poorly expressed genes and background noise, and in order to provide sufficient coverage for GCA, all probesets represented on the array were included in the analysis. Given the low level of functional annotation for each probeset within the Genechip citrus genome array initially compiled by Affymetrix, the latest gene annotation of the sweet orange genome was retrieved from the Citrus sinensis Annotation Project (CAP). The sweet orange genome annotation, which was based on evidence-based annotation and ab initio gene finding programs (described thoroughly in ), provides an accurate representation of the genes of sweet orange. Therefore, we attempted to reannotate the probesets. By using the consensus sequence of each probeset and performing a BLASTx search against all sweet orange protein-coding genes (described in the Methods section), 23,178 probesets (from a total of 30,217) were successfully annotated. Similarly, a separate annotation previously conducted by Zheng and Zhao, based on Arabidopsis orthologs and homologs, ascribed a putative function to 22,773 probesets. In most cases, the probeset annotations from our analysis and from the latter study were similar. Nevertheless, the union of these annotations resulted in 25,147 probesets having at least one putative function (based on either approach), which constitutes an improvement over previous functional annotation attempts and provides a better overview of the function of citrus genes represented on the array.
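The correlation step described above can be illustrated with a minimal sketch. It assumes the expression matrix is held as a NumPy array with probesets as rows and samples as columns; the function name `pearson_matrix` and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np


def pearson_matrix(expr: np.ndarray) -> np.ndarray:
    """Return the probeset-by-probeset matrix of Pearson's r.

    expr is assumed to have shape (n_probesets, n_samples).
    np.corrcoef treats each row as one variable by default.
    """
    return np.corrcoef(expr)


# Toy data (hypothetical): 5 probesets measured across 12 samples.
rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 12))
expr[1] = 2.0 * expr[0] + 1.0  # probeset 1 is a linear function of probeset 0
expr[2] = -expr[0]             # probeset 2 is perfectly anti-correlated with probeset 0

r = pearson_matrix(expr)
```

Here `r[0, 1]` is 1 and `r[0, 2]` is -1 up to floating-point error, and the matrix is symmetric. At the scale reported in the text (30,217 probesets), a dense float64 matrix has about 913 million entries, roughly 7.3 GB, so a real analysis would probably need float32 storage or block-wise computation.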
Wong et al. BMC Plant Biology 2014, 14:186 http://www.biomedcentral.com/1471-2229/14/
Next, raw r values for every relationship between probesets were transformed into highest reciprocal ranks (HRR), which serve as an index for gene co-expression. Similar to mutual ranks (MR), HRR defines the mutual co-expression relationship between two entities (genes) of interest, is relatively simple to calculate, and is robust to outliers while effectively retaining weak but significant co-expression relationships [14,26]. The statistical significance of HRR values, estimated from the distribution of HRR values across 100 permutations of the microarray data, corresponded to thresholds between 310 and 340 (P < 0.01), which would provide a reasonable cut-off to infer co-expression relationships in most cases (Additional file 1: Table S3). This HRR cut-off for biological relevance is similar to those previously reported for HRR GCN in Arabidopsis (HRR cut-off 228) and grapevine (HRR cut-off 350). Additionally, HRR values ≤ 1,200 were also statistically significant (P < 0.05) in most cases. While this analysis revealed that HRR values ≤ 340 (and ≤ 1,200) would be statistically reliable to construct the various GCNs, we empirically determined that the top 100 HRR (top k = 100) for each gene would also be a reasonable threshold for managing the list of co-expressed genes while maintaining biological relevance (and statistical significance). Previous studies have discussed several examples in which defining a.