Step 1: Species selection

The first step consists of selecting the species the user wants to analyze (see figure 1). Currently, seven different species are implemented:
Step 2: Input and gene table selection

This step requires the user to select an input option and a gene table to be matched (figure 2).

Input options

Currently, three different input options have been implemented:
Depending on the selected input option, ConDist performs the statistical test that best suits the input data (for a more detailed description, please refer to the user manual).

Gene tables

Currently, the user can select between two different gene tables: RefSeq genes and Ensembl genes. The vast majority of the annotations depend on the genomic context, i.e., on the DNA sequence. Therefore, we opted to include only gene tables for which genomic coordinates are available.
Step 3: Selection of available annotations for statistical comparison

After the species, the input option and the gene table have been selected, the PHP interface looks up all annotations available for that combination. The third input page can be divided into three parts. At the top, the parameters already selected are shown and can be changed if desired. The second section consists of a list of default annotation sets. The advantage of these default sets is that the user does not have to click through all annotations but can choose one of the default sets and launch the analysis directly. Several different default sets have therefore been generated. For example, one set gives a cross-section of all available feature groups and contains the most important annotations of each group. In addition, a default set with the most important annotations exists for each feature group.
Finally, the third part of the page lists all existing annotations. Because the annotation database comprises approximately 200 features, we opted to present the annotations in a compact way. By default, only the six feature groups are shown: i) base composition, ii) physical properties of DNA and chromatin, iii) evolution, iv) general gene/protein properties, v) overlap with genomic elements and vi) gene expression (see the bottom of figure 3). Clicking on a feature group name makes all annotations available within that group visible (see figure 4). Each feature group may consist of several subgroups. Figure 4 shows the expansion of one feature group (base composition), which consists of several subgroups. Those visible in figure 4 include GC-content in different genomic gene regions, GC-content in the coding region and density of dinucleotides in different promoter regions. Each annotation appears with a short name (under which it is stored in the database) and a short description of the feature. Moreover, hovering the mouse over the feature name opens a pop-up window with a longer description. Each annotation can be selected by means of a check-box. After all desired annotations have been selected, the Send Features button at the bottom of the page launches the program (the job is submitted to a queuing system).
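The collapsible list of feature groups, subgroups and annotations can be modelled as a small tree. This is a minimal Python sketch of that structure, not ConDist's actual PHP code. The class names, the `visible_rows` helper and the annotation short name `gc_cds` are hypothetical. Only the six group names and the one subgroup name are taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    short_name: str        # name under which the feature is stored in the database
    description: str       # short description shown next to the check-box
    long_description: str  # longer text shown in the mouse-over pop-up


@dataclass
class SubGroup:
    name: str
    annotations: list[Annotation] = field(default_factory=list)


@dataclass
class FeatureGroup:
    name: str
    subgroups: list[SubGroup] = field(default_factory=list)
    expanded: bool = False  # groups start collapsed

    def toggle(self) -> None:
        """Clicking the group name shows or hides its contents."""
        self.expanded = not self.expanded


def visible_rows(groups: list[FeatureGroup]) -> list[str]:
    """Rows the user sees: every group name, plus the contents of expanded groups."""
    rows = []
    for group in groups:
        rows.append(group.name)
        if group.expanded:
            for sub in group.subgroups:
                rows.append("  " + sub.name)
                rows.extend(f"    [ ] {a.short_name}: {a.description}" for a in sub.annotations)
    return rows


groups = [FeatureGroup(name) for name in (
    "Base composition",
    "Physical properties of DNA and chromatin",
    "Evolution",
    "General gene/protein properties",
    "Overlap with genomic elements",
    "Gene expression",
)]
groups[0].subgroups.append(SubGroup(
    "GC-content in the coding region",
    [Annotation("gc_cds", "GC-content of the coding sequence",  # hypothetical entry
                "Longer text for the pop-up window.")],
))
collapsed = visible_rows(groups)   # only the six group names
groups[0].toggle()
expanded = visible_rows(groups)    # group, subgroup, one check-box row, then five other groups
```

The default annotation sets described for the second part of the page would then simply be predefined lists of `short_name` values.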
After the program has been launched, an intermediate page displays the current status of the job (figure 5). The PHP interface queries the queuing system for information about the job and shows its current status, e.g., pending (together with the position in the queue) or executing. The job ID and a link to the output page are also shown. When the program finishes, the output page is loaded automatically in the browser. Alternatively, the user can bookmark the link and check the output later.
Figures 6-8 show the output pages for the three possible input options. Each of the three pages consists of a common part (virtually identical in all three pages) and an individual part (which corresponds to the statistical analysis for the selected input option). The common part contains:
Figure 6 shows the output for the analysis of an input gene list compared to a set of reference genes (the input list is a subset of the reference set). As mentioned above, the top part of the page displays a summary of the input data that is common to all three types of analysis. The bottom table shows the output specific to this analysis. The result of the Kolmogorov-Smirnov (KS) test appears on the left, and the outcome of the randomization test appears on the right. For the KS test, the p-value, the maximal difference and the cumulative fraction plot are given. For the random sampling method, the z-score, p-value, sampling mean and standard deviation are given, together with the distribution of the sampling means. Note that this distribution should be approximately Gaussian; otherwise, the test loses its validity.
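The two statistics on this page can be approximated in a few lines, assuming the annotation values are plain lists of floats. The following is a minimal Python sketch, not ConDist's implementation. `ks_statistic` computes the maximal difference D between two cumulative fraction curves. Here it compares the input list against the full reference list, which is an assumption. If a p-value is also needed, `scipy.stats.ks_2samp` returns both the statistic and the p-value. `random_sampling_test` rests on further assumptions not confirmed by the text: random gene sets of the input list's size are drawn without replacement from the reference set, and the p-value is two-sided from the normal approximation of z. Both function names and their parameters are hypothetical.

```python
import math
import random
import statistics


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Maximal vertical distance D between the cumulative fraction curves of a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both curves past every value equal to x (handles ties correctly).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def random_sampling_test(inputs: list[float], reference: list[float],
                         n_samples: int = 10_000, seed: int = 0) -> dict:
    """z-score of the input mean against means of random reference subsets of equal size."""
    rng = random.Random(seed)
    observed = statistics.fmean(inputs)
    # Assumption: samples are drawn without replacement from the whole reference set.
    means = [statistics.fmean(rng.sample(reference, len(inputs)))
             for _ in range(n_samples)]
    mu = statistics.fmean(means)
    sd = statistics.stdev(means)
    z = (observed - mu) / sd
    # Two-sided p-value from the normal approximation; only valid if `means` looks Gaussian.
    p = math.erfc(abs(z) / math.sqrt(2))
    return {"z": z, "p_value": p, "sampling_mean": mu, "sampling_sd": sd, "means": means}


reference = [float(v) for v in range(100)]  # toy annotation values, not real data
inputs = reference[90:]                      # an input list that is a subset of the reference
print(ks_statistic(inputs, reference))       # 0.9
result = random_sampling_test(inputs, reference)
print(result["z"], result["p_value"])
```

Plotting a histogram of `result["means"]` gives the sampling-mean distribution. That histogram is what should look roughly Gaussian before the z-based p-value is trusted.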
Figure 7 shows the output for a comparison of two gene lists. The top part of the output page is virtually identical to the output explained above (figure 6), and so is the left part of the bottom table (the KS test). The difference lies in the other statistical test, displayed on the right of the bottom table (randomization test of the means). This table shows the observed distance between the means of the two gene lists, the mean distance of the random reassignments (which should be virtually 0), the standard deviation and the p-value. The p-value is the fraction of random reassignments that show a larger absolute distance (Abs(difference)) than the observed distance.
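The randomization test of the means can be sketched as a standard permutation test. The two lists are pooled, shuffled and split back into lists of their original sizes many times. This minimal Python sketch returns the four quantities listed in the paragraph above. The function name `randomization_test`, the number of reassignments and the random seed are hypothetical choices, not ConDist's settings.

```python
import random
import statistics


def randomization_test(a: list[float], b: list[float],
                       n_perm: int = 10_000, seed: int = 0) -> dict:
    """Randomization (permutation) test of the difference between the means of a and b."""
    rng = random.Random(seed)
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    diffs = []
    for _ in range(n_perm):
        # Randomly reassign every gene to one of the two lists, keeping the list sizes.
        rng.shuffle(pooled)
        diffs.append(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
    # Fraction of reassignments whose absolute difference exceeds the observed one.
    p = sum(abs(d) > abs(observed) for d in diffs) / n_perm
    return {
        "observed": observed,
        "random_mean": statistics.fmean(diffs),  # should be close to 0
        "random_sd": statistics.stdev(diffs),
        "p_value": p,
    }


list_a = [float(v) for v in range(10, 20)]  # toy values for gene list A
list_b = [float(v) for v in range(10)]      # toy values for gene list B
print(randomization_test(list_a, list_b))
```

In this toy case, the observed split is already the most extreme one possible, so no reassignment exceeds it and the reported fraction is 0.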
Finally, figure 8 shows the results for the comparison of an input gene list with the corresponding homologous genes in a different species. Again, the top part is the common part explained above. For this analysis type, only one statistical test has been implemented: the paired t-test (the two gene lists are not independent, as they hold the homologs of the same genes). The output table (bottom) shows the paired mean difference (the mean of the differences between each pair of homologous genes), its standard deviation, the Student t and the corresponding p-value in the normal approximation. In this approximation, t is treated as if it were z, which is a very good approximation from sample sizes of about 30 upward. Moreover, the right side of the table shows the distribution of the differences between the gene pairs.
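The quantities in the figure 8 output table follow from a short calculation. This minimal Python sketch takes two equally long lists in which position i holds the annotation value of a gene and of its homolog, respectively. It also assumes a two-sided p-value, which the text does not specify, and the function name `paired_t_test_normal` is hypothetical. As stated above, treating t as z is only reasonable for roughly 30 or more gene pairs.

```python
import math
import statistics


def paired_t_test_normal(x: list[float], y: list[float]) -> dict:
    """Paired t-test whose p-value uses the normal approximation (t treated as z)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least two pairs")
    diffs = [xi - yi for xi, yi in zip(x, y)]  # one difference per homologous gene pair
    n = len(diffs)
    mean_d = statistics.fmean(diffs)           # paired mean difference
    sd_d = statistics.stdev(diffs)             # standard deviation of the differences
    t = mean_d / (sd_d / math.sqrt(n))         # Student t statistic
    p = math.erfc(abs(t) / math.sqrt(2))       # two-sided p-value, t treated as z
    return {"mean_diff": mean_d, "sd_diff": sd_d, "t": t, "p_value": p, "diffs": diffs}


# Toy values for four gene pairs, far below the ~30 pairs where the approximation is good.
res = paired_t_test_normal([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 1.0, 3.0])
print(res["mean_diff"], res["sd_diff"], res["t"])  # 1.25 0.5 5.0
```

The returned `diffs` list is what the histogram on the right side of the output table would be drawn from.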