Guide for Annotation-Modules
In the following, we will briefly discuss the different steps in Annotation-Modules.
Species
After launching the Annotation-Modules PHP interface, the first page asks for the species to be analysed (see Figure 1).
Figure 1: The species currently implemented in Annotation-Modules.
Input & Parameters
The second page is divided into two parts: the upper part shows the input options, and the lower part shows the different parameters.
Three different input options are implemented:
| ID | Annotations (comma separated) |
| NM_152486 | blue,yellow |
| NM_032129 | blue |
| NM_198576 | yellow,black |
Figure 2: The input options of Annotation-Modules. Note that on every page the top box gives a short review of the options that have already been chosen. In this case it is the species/database, which can also be changed in this box.
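The ID/annotation input shown in the table above can be read with a few lines of code. The following is a minimal sketch, not the tool's own parser: it assumes one gene per line, with the ID separated from the comma-separated annotations by whitespace (the exact separator expected by Annotation-Modules is not stated here), and the function name `parse_annotation_table` is hypothetical.

```python
from typing import Dict, List


def parse_annotation_table(text: str) -> Dict[str, List[str]]:
    """Map each gene ID to its list of annotations.

    Assumed layout (not confirmed by the guide): one gene per line,
    ID first, then whitespace, then comma-separated annotations.
    """
    table: Dict[str, List[str]] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        # Split the ID from the annotation field at the first whitespace run.
        parts = line.split(None, 1)
        gene_id = parts[0]
        annotation_field = parts[1] if len(parts) > 1 else ""
        # Drop empty entries such as those produced by a trailing comma.
        annotations = [a.strip() for a in annotation_field.split(",") if a.strip()]
        table[gene_id] = annotations
    return table


if __name__ == "__main__":
    sample = "NM_152486\tblue,yellow\nNM_032129\tblue\nNM_198576\tyellow,black\n"
    for gene_id, annotations in parse_annotation_table(sample).items():
        print(gene_id, annotations)
```

A dictionary keyed by gene ID keeps the later steps simple, because each gene can be looked up directly when annotations are combined into modules.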
| Choose a gene table | Algorithms that accept different gene identifiers need to map the input IDs to an internal "working" table (for example Ensembl IDs). In general, this is invisible to the user and the table cannot be chosen. The disadvantage is that mapping between gene IDs (like NCBI gene symbols), protein IDs (like IPI, Swiss-Prot) and transcript IDs (Ensembl ENST or RefSeq) always forces ambiguous decisions, such as how to handle multiple mappings. We therefore offer different "cross-link tables" so that unnecessary mappings can be avoided in some cases. For example, if the user is only interested in GO categories (which are annotated at the protein level) and the input list consists of protein IDs, a protein table like IPI or Swiss-Prot can be chosen to avoid unnecessary mappings. | |
| The filter method | Several genes in a list may start or end (or both) at the same position in the genome. The following options exist: (1) no filtering: do not perform any filtering, accept all genes; (2) TSS: group transcripts which start at the same position in the genome and remove all except one; (3) TSSTES: group transcripts which start and end at the same position in the genome and remove all except one. |
| The combination depth | The maximal number of items combined in one module, i.e. the largest combination size that is considered. | |
| The p-value limit | The p-value limit can be chosen by the user from several values. If p-value = 1 is chosen, only the 20 Annotation-Modules with the lowest p-values are presented, whether they are significant or not. | |
| The maximal number of combinations | The maximal number of combinations which are considered at each level. This parameter is important for limiting computation time. If you use a high combination depth (maximal number of items in a set of annotations) and a large number of different annotations, this parameter should not be set to a high value; otherwise the computation time increases considerably. |
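To illustrate how the combination depth and the maximal number of combinations interact, the sketch below enumerates annotation combinations level by level (size 1, size 2, ...) and caps how many are kept at each level. It is only an illustration under stated assumptions: the function name `enumerate_modules` is hypothetical, and keeping the first combinations in sorted order is an arbitrary choice made for the example, since the guide does not say how Annotation-Modules selects the combinations it keeps.

```python
from itertools import combinations, islice
from typing import List, Sequence, Tuple


def enumerate_modules(annotations: Sequence[str],
                      depth: int,
                      max_per_level: int) -> List[Tuple[str, ...]]:
    """List candidate modules up to `depth` items, at most `max_per_level` per size.

    Assumption for illustration only: the first combinations in sorted
    order are kept; the real tool may use a different selection rule.
    """
    unique = sorted(set(annotations))
    modules: List[Tuple[str, ...]] = []
    for size in range(1, depth + 1):
        # combinations() is lazy, so islice() stops early without
        # materialising every combination of this size.
        level = combinations(unique, size)
        modules.extend(islice(level, max_per_level))
    return modules


if __name__ == "__main__":
    for module in enumerate_modules(["blue", "yellow", "black"], depth=2, max_per_level=2):
        print(module)
```

The cap matters because the number of combinations grows quickly with depth: 50 different annotations already give 230,300 combinations of size four.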
Figure 3: The parameters implemented in Annotation-Modules. The input data (the test data at the top of the page) consists of RefSeq identifiers, so we have chosen RefSeq genes as the cross-link table. The test gene list consists of genes which have a CpG island in their promoter regions. We therefore group the genes by their TSS coordinates to eliminate the redundancy caused by transcripts that share a TSS.
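The three filter methods (no filtering, TSS, TSSTES) can be sketched as a grouping step that keeps one transcript per group. This is a minimal sketch, not the tool's implementation: the `Transcript` record, its field names, and the rule of keeping the first transcript encountered in each group are assumptions made for illustration, and the IDs and coordinates in the example are invented.

```python
from typing import Iterable, List, NamedTuple


class Transcript(NamedTuple):
    """Hypothetical record; field names are assumptions for this sketch."""
    transcript_id: str
    chrom: str
    strand: str
    tss: int  # transcription start site
    tes: int  # transcription end site


def filter_transcripts(transcripts: Iterable[Transcript],
                       mode: str = "none") -> List[Transcript]:
    """Apply 'none', 'TSS' or 'TSSTES' filtering, keeping the first of each group."""
    if mode == "none":
        return list(transcripts)  # accept all genes
    if mode not in ("TSS", "TSSTES"):
        raise ValueError(f"unknown filter mode: {mode!r}")
    seen = set()
    kept: List[Transcript] = []
    for t in transcripts:
        # TSS groups by start position only; TSSTES also requires the same end.
        key = (t.chrom, t.strand, t.tss)
        if mode == "TSSTES":
            key = key + (t.tes,)
        if key not in seen:
            seen.add(key)
            kept.append(t)  # first transcript of the group is kept
    return kept


if __name__ == "__main__":
    # Invented example data, not real annotation.
    data = [
        Transcript("tx_a", "chr1", "+", 1000, 5000),
        Transcript("tx_b", "chr1", "+", 1000, 7000),
        Transcript("tx_c", "chr1", "+", 1000, 5000),
    ]
    for mode in ("none", "TSS", "TSSTES"):
        print(mode, [t.transcript_id for t in filter_transcripts(data, mode)])
```

Including chromosome and strand in the grouping key avoids merging transcripts that merely share a coordinate value on different chromosomes or strands.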
Figure 4: The annotations available for human RefSeq genes.
Figure 5: While the program is running, the PHP interface checks periodically whether the application has finished. The user can leave the browser window open, or bookmark the link and come back later. The results will be posted to this window.
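The periodic check described in Figure 5 follows a simple polling pattern. The sketch below shows that pattern in general terms only; it is not the interface's actual code. The function name `wait_for_job`, the callback `is_finished`, and the default interval and check limit are all assumptions chosen for the example.

```python
import time
from typing import Callable


def wait_for_job(is_finished: Callable[[], bool],
                 interval_seconds: float = 5.0,
                 max_checks: int = 100,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `is_finished` until it returns True.

    Returns True as soon as the job reports completion, or False if
    `max_checks` checks pass without completion.
    """
    for _ in range(max_checks):
        if is_finished():
            return True
        sleep(interval_seconds)  # wait before asking again
    return False


if __name__ == "__main__":
    # Hypothetical job that reports completion on the third check.
    state = {"checks": 0}

    def fake_job_done() -> bool:
        state["checks"] += 1
        return state["checks"] >= 3

    print(wait_for_job(fake_job_done, interval_seconds=0.1))
```

Passing the sleep function as a parameter keeps the loop easy to test without real waiting.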
Figure 6: The output page presenting the 4 different output files.
Figure 7: The output in HTML format with red shading (see documentation).
Figure 8: A short summary of the job.