CDS complement (839 - 1108) /gene="1" /product="gp1" /function="hypothetical protein" /locus tag="GalacticEye_1" /note=Original Glimmer call @bp 1108 has strength 6.21; Genemark calls start at 1108 /note=SSC: 1108-839 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp01 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 1.02239E-59 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.543, -3.5316035464369326, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp01 [Gordonia phage Woes] ],,YP_009273392,100.0,1.02239E-59 SIF-HHPRED: SIF-Syn: NKF, the upstream gene is a DNA helicase (Pham 54968) conserved in phages Anamika and Hello, both of which are in subcluster CS3. There is no downstream gene for this reverse gene, as it is the first gene in the genome. /note=Primary Annotator Name: Juarez, Sabrina /note=Auto-annotation: Glimmer and GeneMark both call a start site at 1108. /note=Coding Potential: Good coding potential is found in both Self- and Host-Trained GeneMark in the reverse reading frame. The chosen start site includes all the coding potential for the ORF in Host-Trained GeneMark and covers all atypical potential, but falls slightly short of typical coding potential in GeneMark Self-Trained. This site results in the longest ORF, begins with a common start codon, ATG, and is long enough (270 bp) to be valid. /note=SD (Final) Score: The final score is the best option, -3.532. It has the second best z-score, 2.543. /note=Gap/overlap: A 4 bp overlap with the downstream gene suggests the gene may be part of an operon. This overlap is conserved in other phages, such as Hello and Newt. /note=Phamerator: The Pham number as of 03/25/2022 is 10099. It is conserved in other phages within subcluster CS3, such as Anamika and Guillaume. /note=Starterator: The most called start site is not present in GalacticEye. Instead, start site 3 is called, which is 1108 in GalacticEye. It is found in 21/60 of the genes within this pham. 
However, it is called 100% of the time when present and was manually annotated for 13 CS3 subcluster phages. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence, this is a real gene, and most likely starts at 1108. /note=Function call: No Known Function. The top two phagesDB BLAST hits have the function listed as unknown function in Gordonia phage Woes (E-value of 3e-47 and 100% identity) and Gordonia phage Teal (E-value of 3e-47 and 100% identity). The top three NCBI BLAST hits are listed as hypothetical proteins (100% coverage, 87%+ identity, and E-value <7e-50). CDD and HHpred had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Araque, Colette /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. Note: You might want to include more supporting information in your analysis, such as whether the chosen start site has a common start codon or results in the LORF. 
CDS complement (1105 - 3405) /gene="2" /product="gp2" /function="DNA helicase" /locus tag="GalacticEye_2" /note=Original Glimmer call @bp 3405 has strength 9.66; Genemark calls start at 3405 /note=SSC: 3405-1105 CP: yes SCS: both ST: SS BLAST-Start: [DNA helicase [Gordonia phage Teal]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.527, -3.5660483139923818, no F: DNA helicase SIF-BLAST: ,,[DNA helicase [Gordonia phage Teal]],,QDF16864,100.0,0.0 SIF-HHPRED: DNA replication licensing factor MCM7; protein-DNA complex, replisome, AAA+ helicase, CMG, GINS, fork DNA, MCM, fork protection complex, CIP-box, REPLICATION; HET: ANP; 3.4A {Saccharomyces cerevisiae (strain ATCC 204508 / S288c)},,,6SKO_7,48.9556,99.3 SIF-Syn: The function of this gene is DNA helicase, which is conserved in phages Anamika and Hail2Pitt, both of which are in subcluster CS3. Upstream gene is in pham 10099, as it is in both Anamika and Hail2Pitt. Downstream gene is in pham 4181, which is also the same in Anamika and Hail2Pitt. /note=Primary Annotator Name: Li, Shally /note=Auto-annotation: Glimmer and GeneMark both list the start site at 3405. /note=Coding Potential: The self-trained GeneMark has high coding potential throughout the entire gene. The host-trained GeneMark has High coding potential through some parts of the gene, but not all. For both the self and host trained GeneMarks, the coding potential is only on the third reading frame in the reverse direction, indicating that this gene is in the reverse direction. /note=SD (Final) Score:-3.566. This is the best final score with a reasonable gap. /note=Gap/overlap: Overlap of 4 bp (-4bp), this indicates that this gene may be part of an operon. /note=Phamerator: The pham is 54968 as of 3/31/22. Pham 54968 has 60 members, 14 of which are drafts, and all belong to cluster CS. The gene is conserved in all 46 non-draft members, including phages Anamika and Hail2Pitt. 
/note=Starterator: Start site 3 is called by 38 of the 46 non-draft members of the pham. This start site is called 100% of the time when it is present. Start site 3 corresponds to position 3405 in GalacticEye. This start position is agreed upon by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene with start site 3405. /note=Function call: DNA Helicase. The top 2 non-draft hits (Teal and Newt) on phagesdb BLAST have the function of DNA helicase with e-values of 0 and high scores (1518 and 1517, respectively). Both of these phages also belong to the same subcluster as GalacticEye (CS3). The top two NCBI BLAST hits also call the function of DNA Helicase with e-values of 0, 100% coverage, and 99% identity. HHPRED had several strong hits with the function of DNA helicase (e-value < 10^-10, probability 99%, coverage > 47%). These hits were from a diverse range of organisms, from yeasts to humans, suggesting that this gene is highly conserved across species and may be evidence of horizontal gene transfer. There were no relevant hits in CDD. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any transmembrane domains, so this protein is not a membrane protein. /note=Secondary Annotator Name: Bharadwaj, Shreya /note=Secondary Annotator QC: I have QC'ed this annotation and agree with the primary annotator's location and function calls. 
CDS complement (3402 - 3620) /gene="3" /product="gp3" /function="hypothetical protein" /locus tag="GalacticEye_3" /note=Original Glimmer call @bp 3581 has strength 7.36; Genemark calls start at 3581 /note=SSC: 3620-3402 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_GUILLAUME_3 [Gordonia phage Guillaume] ],,NCBI, q1:s1 100.0% 4.36574E-45 GAP: -23 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.87, -3.4175091270247986, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_GUILLAUME_3 [Gordonia phage Guillaume] ],,QAX94289,100.0,4.36574E-45 SIF-HHPRED: SIF-Syn: Will refer to pham numbers for NKF: 4181, upstream gene is 3389, downstream gene is DNA helicase, just like in phage Hail2Pitt. 4181, upstream gene is 3389, downstream gene is 54968, just like in phage Newt. 4181, upstream gene is 3389, downstream gene is 54968, just like in phage Luker. Completed 3/31/2022 (in case pham numbers alter) /note=Primary Annotator Name: Patel, Rishi /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 3581. Both call the start codon as GTG. /note=Coding Potential: The coding potential for this open reading frame (ORF) is only on the reverse strand which shows that it is a reverse gene. In the Host-Trained GeneMark, the coding potential is apparent within the called start site (3581) and stop site (3402), which is evidence that it is a real gene. Additionally, the Self-Trained GeneMark also shows coding potential in the region, however, the potential extends beyond the called start site (3581) past the 3600 nucleotide mark. This is evidence that the gene is real, however, the start site may be different from what was originally called. /note=SD (Final) Score: The final score is not the best option of all the start sites listed (-5.344), in fact, it is not even the second or third best. The best Final Score is -2.505 (start site at 3653) and the second best is -3.418 (start site 3620). 
Additionally, the Z score is not even the third highest. The best two Z scores belong, once again, to the two start sites mentioned above, with 3653 being a bit better than 3620. Once again, this shows evidence against the called start site. /note=Gap/overlap: The gap between the stop site and the downstream gene (Gene #2) appears to be conserved across phages in the cluster (CS3) at -4, most likely indicating an operon. However, the gap between the start site and the upstream gene (Gene #4) does not appear to be conserved. The called start site shows a gap of 16 bp; however, when looking at phages of the same cluster, all of them show overlap with gene #4. For example, phage Guillaume has a -23 bp overlap with gene #4 and there are similar overlaps in phage Hello and Harambe. Thus, this may be an indication of an incorrect start site. However, all of the phages that I looked at (Harambe, Hello, Guillaume, Hail2Pitt) showed synteny between their respective gene #3, showing that this is likely a real gene. /note=Phamerator: Gene is found in pham number 4181 as of 3/25/2022. The gene is conserved when looking into phages Harambe, Hello, and Guillaume, which are all in the same cluster (CS3) as GalacticEye, thus it appears conserved in Cluster CS3 (evidence for real gene). This pham also contains phages from subclusters CS1 and CS2 along with CS3 – CS2 and CS3 make up the majority of the pham. Phamerator does not call a function. /note=Starterator: The called start site for my gene is Start site #7 (3581). Among 13 non-draft phages of cluster CS3, only 1 manual annotation called the start site at this site. On the other hand, start site #5 (3620) was called 10 times. Also, within the entire pham, start site #7 is only called once among 38 non-draft phages, while start site #5 is called 35 out of 38 times among non-draft phages. 
Also, all 51 phages of the pham (draft or non-draft) contain start site #5 (heavily conserved), while start site #7 is only in 22 out of 51 – and only called 22.7% of the time when it is present. A caveat here is that a lot of the phages in the pham do not have start site #7 at all, however, when looking at this more closely, start site #5 is still called more often among phages that have both #7 and #5. Among phages with both starts in the pham, 14/18 call start site #5 still, while only 2/18 call start site #7. It is apparent that the majority of manual annotations point in the direction of start site #5, which is strong evidence against the called start site for gene #3 of GalacticEye. /note=Location call: Considering the evidence above, this gene is most definitely a real gene. However, the called start site by GeneMark and Glimmer does not appear to be correct (3581). The evidence, despite the start codon being TTG (more rare), points to start site 3620 based on a multitude of evidence. The strongest evidence being that starterator calls 3620 (start site #5) the most times, meaning that this is the most conserved start site. Additionally, the Z score and Final score are a lot better for this start site compared to the called one. Finally, the gap between our gene (gene #3) and gene #4 appears to show a conserved overlap in a lot of phages from the same cluster, while, in GalacticEye with the called start, it shows a gap instead. Ultimately, the called start site (3581) will be altered to 3620 as of 3/31/2022. /note=Function call: Neither NCBI GeneBank Blast nor PhagesDB Blast garnered any evidence regarding a function call for the gene. Out of all of the results listed, none of them showed a function beyond “no known function” or “hypothetical protein.” Additionally, when looking at HHpred, a list of potential functions came up, however, the e-values were extremely high. 
For example, the top listed hit on HHpred showed a potential for hydrolase activity; however, the e-value was 7.2. Thus, this is not informative towards the function of the gene. Finally, CDD also did not produce any hits as to the function of the gene. Thus, it is safe to say, at the moment, that gene #3 of GalacticEye has no known function (NKF). /note=Transmembrane domains: No transmembrane domains (TMDs) were predicted by either TMHMM or TOPCONS; thus, it appears that my gene is not considered a “membrane protein.” At this point, this is in agreement with previous function calls for my gene as there were no sufficient calls for the function of the gene. As of now, this gene has no known function. /note=Secondary Annotator Name: Likwong, Chloe /note=Secondary Annotator QC: I agree with the primary annotator. Despite the Final Score, Z-score, and Start codon not being the best among the candidates, Starterator reports that Start@3620 has 35 MAs, while Start@3581 has only one MA. Furthermore, this start covers the coding potential and seems to be conserved in multiple phage genomes. 
CDS complement (3598 - 3723) /gene="4" /product="gp4" /function="hypothetical protein" /locus tag="GalacticEye_4" /note=Original Glimmer call @bp 3723 has strength 9.72; Genemark calls start at 3723 /note=SSC: 3723-3598 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp04 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 4.97957E-17 GAP: 123 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.793, -5.855923068877513, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp04 [Gordonia phage Woes] ],,YP_009273395,100.0,4.97957E-17 SIF-HHPRED: SIF-Syn: NKF, upstream gene is in Pham 4181 (Possible Function being NKF), downstream gene is in Pham 3524 (possible function being DNA Primase), just like phages Woes and Harambe. 
/note=Primary Annotator Name: Patel, Sahaj /note=Auto-annotation: Gene Mark and Glimmer called the start site to be at 3723 and the stop site to be at 3598. Additionally, it is a gene that runs in the reverse direction. /note=Coding Potential: Yes. Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. High coding potential is found in the GeneMark Self, but low coding potential found in the GeneMark Host. Therefore, there will be coding potential in this gene since the GeneMark Self supports the gene being active. /note=SD (Final) Score: -5.856, which was the second highest Final Score. The Z score is 1.793, which is the second-highest Z score. This justifies the existence of the start site in question, as it has a high Final and Z score, and it is further supported by the starterator. /note=Gap/overlap:The gap value is 123, which signifies no overlap and the possibility of the gene not being part of an operon. This overlap is conserved in other phages (compared to Hail2Pitt and Harambe, amongst others), so it, therefore, makes sense. There is a gap present after the gene which is about a 100 base pairs long, and there is no atypical activity present in the gap. The gap is present in other complete viral genomes (such as Hail2Pitt and Harambe), which justifies the presence of the gap. /note=Phamerator: 3389. Date 2/04/2022. It is conserved, found in Hail2Pitt (CS), Harambe (CS), and Woes (CS). /note=Starterator: Start site 14 in Starterator was manually annotated in 15/51 non-draft genes in this Pham. Start 14 is 3723 in Powerpuff. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 3723. /note=Function call: The function of this protein cannot be determined for numerous reasons. Phages DB Blast states that this protein has no known function, and this is reflected by the hits this database presented. 
They were found in the phages Teal and Nimil3, and the hits had high scores and low e-values (compared to the rest of the data presented). Additionally, NCBI BLAST reflected the same findings as the Phages DB Blast, with the hits coinciding with two phages, Woes and Anamika, and also calling that the protein has no known function. No data was gathered from CDD. Lastly, HHPred called two possible functions for the protein (NADH dehydrogenase and Ubiquinone Oxidoreductase), but the e-values were extremely high (4.8 and 5.4 respectively) and the probability was below 90% (71.97 and 70.63 respectively). This signifies that this protein indeed has no known function. /note=Transmembrane domains: TMHMM and TOPCONS predict no transmembrane domains (predicted to be 0). /note=Secondary Annotator Name: Pramana, Martin /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. However, there are minor changes that need to be made such as stating that the function is NKF in the function call. Furthermore, small typos in the function call section such as stating the “the values were extremely high” for HHpred which was referring to the e-values. 
CDS complement (3847 - 4758) /gene="5" /product="gp5" /function="DNA primase/polymerase" /locus tag="GalacticEye_5" /note=Original Glimmer call @bp 4629 has strength 7.75; Genemark calls start at 4758 /note=SSC: 4758-3847 CP: yes SCS: both-gm ST: SS BLAST-Start: [DNA primase [Gordonia phage Guillaume]],,NCBI, q1:s1 100.0% 0.0 GAP: 139 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.024, -2.5052746077145835, yes F: DNA primase/polymerase SIF-BLAST: ,,[DNA primase [Gordonia phage Guillaume]],,QAX94291,99.3399,0.0 SIF-HHPRED: AE_Prim_S; AE_Prim_S: primase domain similar to that found in the small subunit of archaeal and eukaryotic (A/E) DNA primases.,,,cd04860,43.8944,99.2 SIF-Syn: Gene is conserved in other non-draft phages, including Anamika, Harambe, Hello, etc. 
DNA primase consistently called as the function. /note=Primary Annotator Name: Lin, Yuri /note=Auto-annotation: Glimmer calls the start site at 4629 bp with a start codon of ATG, and GeneMark calls the start site at 4758 bp with a start codon of TTG. /note=Coding Potential: Good coding potential in the reverse direction is indicated for this ORF on Genemark for both the host- and self-trained algorithms. The stop site includes all of the coding potential for both GeneMarks, but the start site of 4629 bp (Glimmer) does not; the coding potential extends to about 4750 bp on the self-trained GeneMark and about 4700 bp on the host-trained GeneMark. The GeneMark start site of 4758 would be a better choice to encompass all of this coding potential. /note=SD (Final) Score: The start site at 4758 bp has the least negative Final Score of -2.505 and the highest Z-score of 3.024, while the start site at 4629 has a much more negative score of -4.994. These values support a start site of 4758 bp. /note=Gap/overlap: The start site at 4758 bp has a gap of 139 bp, and the start site at 4629 has a gap of 268. The large gaps make sense in this context, as the subsequent gene switches in orientation. /note=Phamerator: As of 3/31/22, the pham number is 3524. The gene is conserved in 73 other phages, of which 21 are drafts. Other phages that have this gene with the same pham number are in clusters CS, CX, DF, and two are singletons. /note=Starterator: Start site 21 in Starterator was manually annotated in 43/52 non-draft genes in this pham and is the most often called start site in published annotations. Start site 21 is at position 4758 bp on GalacticEye. This further supports the start site called by GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 4758. The coding potential, SD Score, and most importantly, conservation of this start site in other genes in this pham support this call. /note=Function call: DNA primase. 
Two of the four top PhagesDB BLAST hits have the function of DNA primase (e-value <1e-155), and the top two hits on NCBI BLAST were also DNA primase (100% coverage, 98.46%+ identity, and E-value=0.0). The top hit in HHpred calls the gene a DNA primase small subunit PriS, with a 98% probability, 3e-8 e-value, 93.65 score/coverage. CDD had a hit with the AE_Prim_S_like super family member COG4951 (e-value 3.46e-07). Hits to HHpred also have polymerase function. /note=Transmembrane domains: No TMDs were predicted by TmHmm and TOPCONS. /note=Secondary Annotator Name: Pay, Iona /note=Secondary Annotator QC: Need to explain further why Starterator supports Start 4 over Start 18 -- your description in this section makes it sound like Start 18 would be the more supported site. Otherwise, very nicely done! [Edit: Addressed, thank you for catching that! Start sites 4 and 18 were typos--should have been start site 21 for both] CDS 4898 - 6295 /gene="6" /product="gp6" /function="hypothetical protein" /locus tag="GalacticEye_6" /note=Original Glimmer call @bp 4898 has strength 6.63; Genemark calls start at 4907 /note=SSC: 4898-6295 CP: yes SCS: both-gl ST: SS BLAST-Start: [terminase small subunit [Gordonia phage Diabla]],,NCBI, q1:s1 100.0% 0.0 GAP: 139 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.921, -5.40983604789782, no F: hypothetical protein SIF-BLAST: ,,[terminase small subunit [Gordonia phage Diabla]],,QOP65336,86.9658,0.0 SIF-HHPRED: SIF-Syn: NKF, upstream DNA primase, downstream terminase, large subunit. Similar phages: Lahirium, Anamika /note=Primary Annotator Name: Perez, Joshua /note=Auto-annotation: Gene was called by both Glimmer and Genemark, but do not start at the same site. The preferred program, Glimmer, has a start site at 4898 and calls ATG. Genemark has a start site at 4907 and calls ATG. /note=Coding Potential: Yes, this gene has reasonable coding potential predicted within the second RF forward. 
This chosen start site does cover all this coding potential in both the Host and Self GeneMark. /note=SD (Final) Score: -5.410; has the second highest final score present. The Z score present for this start is 1.921, which is decently high in terms of Z scores. /note=Gap/overlap: Glimmer start has a gap of 268bp and seems to be conserved in the Pham Maps. This large gap is due to a change in direction from reverse in gene 5 to forward in gene 6. This gap seems to be conserved but slightly shorter (around 150 bp) in other phages, with a few being Anamika, Neovie, and Guillaume. /note=Phamerator: Pham 100279 3/31/22 is present in other members of the cluster CS. I used Austin, Diabla, and 5 others from cluster CS. 7 of the similar members called a terminase small subunit as their function. /note=Starterator: Start site 11 (4898 bp) is one of the most annotated start sites; it has 43/61 manual draft annotations and is called 93.3% of the time when present. This is considered to be the same as the suggested start on the drop down menu. /note=Location call: The overall evidence shows that this gene is most likely real at start 11 and bp number 4898 due to high coding potential in its second RF forward and evidence from Starterator. Glimmer start site of 4898 is the most likely start site, as it has the second highest ORF and second lowest gap bp. This is most likely a real gene as it has coding potential. /note=Function call: The top 5 NCBI hits sorted by e-value had low e-values (0), but only a decent % identity (around 74%). For example, phage Diabla has a 73.99% identity and an e-value of 0. This gene is said to be a terminase small subunit, which must be confirmed with other programs. The first hits were NKF. There is no data available from CDD. HHpred had no significant data, only hits with high e-values. Overall, the low percent identities and lack of evidence toward this gene’s function classify it as NKF. 
/note=Transmembrane domains: No predicted TMDs by TOPCON or TMHMM, which leads me to believe that absence of TMDs indicates that this gene is not a membrane protein and that the function remains unknown. /note=Secondary Annotator Name: Cho, Emily /note=Secondary Annotator QC: I have QCed this location call and agree with the primary annotator. Note: For the starterator section, clarify if the start site was the most conserved start site to use “suggested start” drop down menu. Otherwise, it should be NI. Fill synteny box. Do not check draft genomes as evidence for BLAST hits. Do not check NCBI BLAST hits that are called with a function and low identity to call this protein NKF. Very good explanation in your location call and gap on why it is okay to have a relatively large gap upstream more (does it show synteny with other final genomes? etc.) and selections of evidence for choosing the start site. CDS 6288 - 8264 /gene="7" /product="gp7" /function="terminase, large subunit" /locus tag="GalacticEye_7" /note=Original Glimmer call @bp 6288 has strength 14.08; Genemark calls start at 6288 /note=SSC: 6288-8264 CP: yes SCS: both ST: SS BLAST-Start: [terminase [Gordonia phage Hello] ],,NCBI, q1:s1 100.0% 0.0 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.32, -3.9822859353357907, no F: terminase, large subunit SIF-BLAST: ,,[terminase [Gordonia phage Hello] ],,QAX95278,100.0,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,81.459,100.0 SIF-Syn: Terminase, upstream gene is NKF in pham 100279 and downstream gene is a portal protein in pham 3448, just like in phage Guillaume. /note=Primary Annotator Name: Shah, Aayushi /note=Auto-annotation: Called in Glimmer and GeneMark, both at start site 6288 /note=Coding Potential: There is reasonable coding potential between the putative ORF, and the chosen start site covers all this coding potential. 
/note=SD (Final) Score: -3.982, best final score on PECAAN. Z-score is 2.32, which is tied for the best z-score called on PECAAN. /note=Gap/overlap: Overlap of 8 bp; this is a large overlap, but it is the best choice based on coding potential and final score. /note=Phamerator: The pham as of 03/31/22 is 40564. A lot of other phages in this cluster have this pham present, as seen in Adgers, Beaver, and Hello. /note=Starterator: Start site 69 is conserved across many manually annotated genomes in the pham, and represents the start site at bp 6288. 45/52 non-draft genes call this start site, which is strong evidence. It is the most annotated start site. /note=Location call: Based on evidence the gene is real and has a start site of 6288, which covers all coding potential and has the most reasonable overlap and final score. This is also the most manually annotated start site for this gene. /note=Function call: The top 5 PhagesDB BLAST hits call the function as a terminase with e values of 0 for all of them, constituting strong evidence. In the NCBI BLAST, the top 8 calls sorted by e value called it as a terminase, with e values of 0 for all, also strong evidence. While no CDD hits were found, HHpred found multiple hits with e values of under 1e-28, all with above 99% probability and above 80% coverage, that called it as a terminase or large terminase. The listed function of terminase, large subunit makes sense, but this must be checked against whether a small subunit of terminase is also present in this genome. If not, the function will be finalized as just terminase. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, so it is not a membrane protein. /note=Secondary Annotator Name: Enos, Alex /note=Secondary Annotator QC: I have QCed this location call and agree with the primary annotator. 
CDS 8384 - 10936 /gene="8" /product="gp8" /function="portal protein" /locus tag="GalacticEye_8" /note=Original Glimmer call @bp 8384 has strength 6.41; Genemark calls start at 8384 /note=SSC: 8384-10936 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Gordonia phage Anamika]],,NCBI, q1:s1 100.0% 0.0 GAP: 119 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.36, -3.9166971242758706, no F: portal protein SIF-BLAST: ,,[portal protein [Gordonia phage Anamika]],,ATW61105,99.7647,0.0 SIF-HHPRED: Portal protein; G20C, portal protein, bacteriophage, transport protein; 1.9A {Thermus phage P7426},,,5NGD_B,34.1176,99.7 SIF-Syn: The function of this gene is Portal Protein, which is conserved in phages Anamika and Harambe, both of which are in subcluster CS3. Upstream gene is in pham 40564, as it is in both Anamika and Harambe. Downstream gene is in pham 14254; in Anamika and Harambe it is 14171. /note=Primary Annotator Name: Villarreal, Alexia /note=Auto-annotation: Glimmer and Genemark Start sites are both at 8384. The Start codon is GTG. /note=Coding Potential: Coding Potential of this ORF is only on the forward strand, which indicates it is a forward gene. There is strong Coding Potential indicated by both Genemark Self and Host. Extensive synteny conservation of this gene in other phages is also observed, consistent with high coding potential. /note=SD (Final) Score: The final score for this start site is -3.917, which is one of the better values listed, indicating that this start site is the best and most probable. /note=Gap/overlap: There is a gap of 119 bp, which may appear to be large; however, of the given values, this start site provides the smallest gap between this gene and the others. This start site also contains the longest ORF, which further supports it. /note=Phamerator:pham:3448. Date 03/31/2022. It is conserved in other phages within the same cluster CS such as Anamika, Luker, and Newt. 
/note=Starterator: Start site 10 in Starterator was manually annotated in 37/52 non-draft genes in this pham. Start 10 is at 8384 in GalacticEye. This evidence agrees with the site predicted by Glimmer and GeneMark. This Start site is the most annotated start site within this pham. /note=Location call: The start site at 8384 is confidently confirmed. This start site is kept based on the coding potential, the matching start site predictions from Glimmer and GeneMark, high synteny with other phages, the most reasonable (smallest) gap among the candidate start sites, a GTG start codon (one of the more common initiation codons), a good Final score of -3.917, and a good z-score of 2.36. Based on the above evidence, this is a real gene and the most likely start site is 8384. /note=Function call: According to the data obtained by PhagesDB Blastp and NCBI Blastp, as well as HHpred, the suggested function of the gene is a portal protein, as there is promising evidence with high query coverage (100%), high percent identity (99%), and extremely low E-values of 0. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Maraziti, Gabriela /note=Secondary Annotator QC: I have QC'ed this function call and agree with the first annotator. 
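The Gap/overlap values reported for genes 1 through 8 can be re-derived from the CDS coordinates alone. The following is a minimal Python sketch, assuming (this is not stated in the records) that the gap is the next feature's left coordinate minus the previous feature's right coordinate minus 1, so that a negative value means an overlap. The names CDS_COORDS, gap_bp, orf_length, and adjacent_gaps are hypothetical and used only for illustration.

```python
from typing import List, Tuple

# (gene number, left coordinate, right coordinate), copied from the CDS lines
# of genes 1-8 above. Strand is ignored: only genome positions matter here.
CDS_COORDS: List[Tuple[int, int, int]] = [
    (1, 839, 1108),
    (2, 1105, 3405),
    (3, 3402, 3620),
    (4, 3598, 3723),
    (5, 3847, 4758),
    (6, 4898, 6295),
    (7, 6288, 8264),
    (8, 8384, 10936),
]


def gap_bp(prev_right: int, next_left: int) -> int:
    """Bases between two adjacent features (assumed convention).

    A negative result means the two features overlap by that many bases.
    """
    return next_left - prev_right - 1


def orf_length(left: int, right: int) -> int:
    """Length in bp of a feature with inclusive coordinates."""
    return right - left + 1


def adjacent_gaps(coords: List[Tuple[int, int, int]]) -> List[Tuple[int, int, int]]:
    """Return (gene_a, gene_b, gap) for each pair of neighbouring features."""
    ordered = sorted(coords, key=lambda c: c[1])
    return [
        (a[0], b[0], gap_bp(a[2], b[1]))
        for a, b in zip(ordered, ordered[1:])
    ]


if __name__ == "__main__":
    for gene_a, gene_b, gap in adjacent_gaps(CDS_COORDS):
        kind = "overlap" if gap < 0 else "gap"
        print(f"gene {gene_a} / gene {gene_b}: {gap} bp ({kind})")
    print("gene 1 length:", orf_length(839, 1108), "bp")
```

Under this assumed convention the sketch gives -4 bp for genes 1 and 2, -23 bp for genes 3 and 4, and 119 bp for genes 7 and 8, which agrees with the GAP fields in those records, and a length of 270 bp for gene 1, as stated in its Coding Potential note.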
CDS 10923 - 13151 /gene="9" /product="gp9" /function="minor tail protein" /locus tag="GalacticEye_9" /note=Original Glimmer call @bp 10923 has strength 5.42; Genemark calls start at 10923 /note=SSC: 10923-13151 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Gordonia phage Newt]],,NCBI, q1:s1 100.0% 0.0 GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.024, -2.794070146961553, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Newt]],,QDH48656,99.8652,0.0 SIF-HHPRED: SIF-Syn: This gene encodes a minor tail protein, which is conserved in phages Teal and Luker of the CS3 subcluster. The upstream gene is in pham 3448 and is a portal protein, which is the same as the upstream gene in the genomes of both Teal and Luker. The downstream gene is in pham 101844 and has no known function, which is the same as the downstream gene in the genomes of both Teal and Luker. /note=Primary Annotator Name: Araque, Colette /note=Auto-annotation: Both Glimmer and GeneMark called the same start site of 10923. Start site 10923 has a corresponding start codon of ATG. /note=Coding Potential: The gene has coding potential predicted within the putative ORF according to both GeneMark Self and Host; however, this coding potential is not consistent throughout (it goes up and down at various points throughout the ORF). The chosen start site covers all the corresponding coding potential. /note=SD (Final) Score: For the 10923 start site, there is an SD score of -2.794 and a Z score of 3.024. -2.794 is strong and is the best SD score as it is the least negative compared to the other scores. 3.024 is very strong since it’s above 2 and is the second best score after that of start site 10878. /note=Gap/overlap: There is a -14bp overlap which is longer than what is ideal, but can still be considered reasonable as it lies below the 50bp maximum. This overlap/gap is the smallest compared to the other start sites. 
The length of the gene with this corresponding start site is 2229 bp, which is acceptable as it is well over the 120 bp minimum. There is one other alternative start candidate that creates a longer ORF, but its overlap is over 50 bp (-59 bp) and its SD score (-2.804) is more negative than that of 10923. /note=Phamerator: As of 3/31/2022 this gene is found in Pham 14254. There are 49 total members in this pham (38 non-drafts and 11 drafts). This pham is present in other members of the same cluster (CS) which my phage belongs to. Some phages used for comparison are Luker, Guillaume, and Teal. The Phams database did not have a function called for this gene of the GalacticEye phage, but the majority of the members of pham 14254 did have a function called: minor tail protein. This function is consistent and found within the approved function list. /note=Starterator: There is a reasonable start site choice that is conserved among the members of pham 14254. The number of this conserved start site is 26, which corresponds to the 10923 base pair coordinate of my phage (site predicted by Glimmer and GeneMark). While this start site is found in only 10 of 49 genes in pham 14254, when it is present, it is called 100% of the time. There are 6 manual annotations of this start site, the most of any start site for this gene. /note=Location call: Gene 9 of the GalacticEye phage appears to be a real gene due to it being conserved in phamerator and it having considerable coding potential. 
The 10923 start site seems the most likely as it has a strong SD and Z score, encompasses all the coding potential, and is called 100% of the time when present in genes of the same pham. While the overlap is not ideal, I think the aforementioned reasons are more weighty in showing the potential of this start site. /note=Function call: Gene 9 most likely encodes for a minor tail protein. There are multiple phagesDB BLAST hits that suggest this function and those hits have a strong e value of 0. There are also multiple NCBI BLAST hits that call this function with strong identity percentages and e values. /note=Transmembrane domains: TMHMM and TOPCONS did not predict any TMDs meaning that this is not a transmembrane protein. /note=Secondary Annotator Name: Nelson, Sarah (Shiloh) /note=Secondary Annotator QC: solid work on this annotation, including the synteny box! I agree with all of the information listed, location call, function call, etc & all evidence has been considered CDS 13222 - 13527 /gene="10" /product="gp10" /function="hypothetical protein" /locus tag="GalacticEye_10" /note=Original Glimmer call @bp 13393 has strength 3.55; Genemark calls start at 13393 /note=SSC: 13222-13527 CP: yes SCS: both-cs ST: NI BLAST-Start: [hypothetical protein SEA_LUKER_10 [Gordonia phage Luker]],,NCBI, q1:s12 100.0% 2.75868E-69 GAP: 70 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.329, -4.9523647543926295, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LUKER_10 [Gordonia phage Luker]],,QDH48258,90.1786,2.75868E-69 SIF-HHPRED: SIF-Syn: NKF. The upstream gene is in pham 15254 and encodes a minor tail protein similar to Guillaume. The downstream gene is in pham 3379 and encodes a capsid maturation protease similar to Lahirium. /note=Primary Annotator Name: Bharadwaj, Shreya /note=Auto-annotation: Neither Glimmer nor GeneMark called start sites of 13222. Both Glimmer and GeneMark called a start site of 13393. 
However, according to the Starterator report, the start site with the greatest number of manual annotations was start site 6 @13222, with 29 manual annotations. /note=Coding Potential: Coding potential is present according to both host-trained and self-trained GeneMark. There are black humps in the third frame of both the self-trained and host-trained GeneMark coding potential graphs. There is an upward tick mark at roughly the 13222 start site and a downward tick mark at the 13527 stop site. This is indicative of a real gene that is a forward gene. /note=SD (Final) Score: -4.952. This is the third best final score, but it makes sense given that all of the start sites have large gaps. The z-score associated with this start site was 2.329, which is the third best z-score. /note=Gap/overlap: There is a gap of 70 bp. This is on the larger side for gaps, but it has been conserved across other phage genomes. /note=Starterator: only 2 other non-draft genes and they both call a very short gene which leaves a large gap. /note=Location call: With all of the evidence taken together, this is a real gene with a most likely start site of 13222. /note=Function call: NKF. All of the PhagesDB BLAST hits indicate with a low e-value (2e-57) that the function is unknown for this gene. HHPRED hits indicated that it could be a cytochrome c oxidase, but the e-values are very high (2.9-4.6) and the % coverage was very low (32-38%). NCBI BLAST hits indicate a “hypothetical protein” with a low e-value of >2e-68. CDD had no relevant hits. /note=Transmembrane domains: TMHMM and TOPCONS do not predict any TMDs, so this is not a membrane protein. /note=Secondary Annotator Name: Ruiz, Paola /note=Secondary Annotator QC: I have QC’ed this location call and do not agree with primary annotator. Starterator supports 13222 start site. 
Although Glimmer and GeneMark do not call the start site at 13222, I would still mention that Glimmer and GeneMark do agree with each other and call start site 13393. I would add the z-score under SD (Final) Score. For Function call, you mention that phagesDB hits indicate a “high e value”; I think you meant “low e value” since the values were good! Maybe change the wording to “All of the phagesDB hits with low e value of 2e-57 indicated that the function is unknown.” Unselect HHPRED hits because the e-values are too high (they should be <10e-3) and do not really support the function call. CDS 13542 - 14756 /gene="11" /product="gp11" /function="capsid maturation protease" /locus tag="GalacticEye_11" /note=Original Glimmer call @bp 13542 has strength 15.97; Genemark calls start at 13542 /note=SSC: 13542-14756 CP: yes SCS: both ST: SS BLAST-Start: [capsid maturation protease [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 0.0 GAP: 14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.537, -4.116905853494032, no F: capsid maturation protease SIF-BLAST: ,,[capsid maturation protease [Gordonia phage Woes] ],,YP_009273402,100.0,0.0 SIF-HHPRED: Phage_GPO ; Phage capsid scaffolding protein (GPO) serine peptidase,,,PF05929.14,38.3663,99.8 SIF-Syn: Capsid maturation protease, upstream gene NKF in pham 103570, downstream is pham 3174 with function Capsid Decoration Protein, just like in phages Jormungandr and Luker /note=Primary Annotator Name: Likwong, Chloe /note=Auto-annotation: Glimmer and GeneMark predicted the same start site at Start@13542. /note=Coding Potential: In both GeneMark maps (host and self), Start@13542 covers the coding potential. /note=SD (Final) Score: Final Score of -4.117 and a Z-score of 2.537 for the start site @13542. These values are not the best compared to the other start site candidates, but are the best among reasonable start sites (those without too large a gap). /note=Gap/overlap: There is a 14 bp gap with the upstream gene. 
In the gap of 14 bp, there is no coding potential present in either GeneMark Self or Host maps. /note=Phamerator: This gene is in pham 3379 as of the date 3/31/2022. There are 73 members, 14 of which are non-final drafts. GalacticEye is found in cluster CS with members like Austin and Butterball, and the majority of the members are part of this cluster as well. The remaining members are part of clusters DF or CX, while a few others are singletons. The majority of the Final Draft genes list the function capsid maturation protease. /note=Starterator: In pham 3379, GalacticEye has the “Most Annotated” start, which is Start site 11 @13542 that has 46 MA’s. /note=Location call: GalacticEye gene 11 seems to be a real gene given that Start@13542 covers the coding potential, has a gap of 14 bp, and has the most MA’s (46) of the potential start sites; hence, the location call is Start@13542, which also has the best Z-score and Final Score among the reasonable start site candidates. /note=Function call: The function of the gene is: capsid maturation protease. In PhagesDB, there are multiple hits with strong e-values of 0 that list the function as capsid maturation protease. The NCBI BLAST hits, also with strong e-values of 0, list capsid maturation protease as the function; the top three hits in NCBI BLASTP list a value >99% for %coverage, %identity, and %alignment. /note=Transmembrane domains: No transmembrane domains detected by TMHMM or TOPCONS, indicating that the gene product is not a membrane protein. /note=Secondary Annotator Name: Juarez, Sabrina /note=Secondary Annotator QC: QC complete, I agree with the primary annotator's annotation and location call. You may want to include more evidence for the function calls, such as coverage, % identity, or probability for a couple of hits from PhagesDB or NCBI BLAST. HHpred seems to have some possibly relevant hits as well. 
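The GAP values quoted in these records can be reproduced from the CDS coordinates alone. The sketch below is only an illustration: the function name `intergenic_gap` is hypothetical, and it assumes both genes lie on the forward strand and that coordinates are 1-based and inclusive, which matches the numbers listed in this file. A negative result means the two genes overlap.

```python
def intergenic_gap(upstream_stop: int, start: int) -> int:
    """Bases strictly between the upstream stop and this start.

    Assumes forward-strand genes with 1-based inclusive coordinates.
    A negative value is an overlap of that many bases.
    """
    return start - upstream_stop - 1


# Coordinates taken from the CDS lines in this file.
neighbours = {
    "gene 10 -> gene 11": (13527, 13542),
    "gene 11 -> gene 12": (14756, 14793),
    "gene 16 -> gene 17": (17555, 17552),
}

for label, (upstream_stop, start) in neighbours.items():
    print(label, intergenic_gap(upstream_stop, start))
```

For the three pairs above this gives 14, 36, and -4, the same values recorded in the GAP fields of genes 11, 12, and 17.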
CDS 14793 - 15263 /gene="12" /product="gp12" /function="capsid decoration protein" /locus tag="GalacticEye_12" /note=Original Glimmer call @bp 14793 has strength 10.86; Genemark calls start at 14793 /note=SSC: 14793-15263 CP: yes SCS: both ST: SS BLAST-Start: [capsid decoration protein [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 1.73579E-111 GAP: 36 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.927, -2.7863799983944713, yes F: capsid decoration protein SIF-BLAST: ,,[capsid decoration protein [Gordonia phage Woes] ],,YP_009273403,100.0,1.73579E-111 SIF-HHPRED: HDPD ; Bacteriophage lambda head decoration protein D,,,PF02924.17,85.2564,99.4 SIF-Syn: Capsid decoration protein (3174), the upstream gene is capsid maturation protease (3379), downstream is major capsid protein (3542), similarly found in CS3 phages Guillaume and Anamika. /note=Primary Annotator Name: Pramana, Martin /note=Auto-annotation: Both GeneMark (Self and Host) and Glimmer call the same start site of 14793. /note=Coding Potential: There is reasonable coding potential in the ORF in the forward direction (located in the third frame). This coding potential includes all possible start sites and is found in both GeneMark Self and Host. /note=SD (Final) Score: -2.786. It is the best Final Score on PECAAN. /note=Gap/overlap: There is an upstream gap of 36 bp. This gap is reasonable because it is conserved in other phages such as Guillaume and Anamika. Furthermore, there is no coding potential between the upstream gene and the start site. This start site gives the longest ORF. /note=Phamerator: As of 3/31/2022 the gene is found in Pham 3174. It is conserved in cluster CS3 (Anamika and Guillaume) but also found in other clusters such as CS2 (Diabla and Beaver). /note=Starterator: Start site 2 is conserved in 52/52 non-draft members and is the most manually annotated start site. Start site 2 has a position of 15087 bp in Anamika. The start site agrees with both Glimmer and GeneMark. 
/note=Location call: Based on the evidence above, this is a real gene and the start site is most likely 14793 bp. Starterator agrees with both Glimmer and GeneMark. /note=Function call: Capsid Decoration Protein. The top 2 matches from PhagesDB BLASTp function is capsid decoration protein. Both hits have high % identity (100%), low E values (9e-88), and high query coverage. Similarly, the 2 hits from NCBI BLASTp call for the same function of capsid decoration protein, with low E values (<3.55102e-110), reasonable % identity (>98.7179%), and high query coverage. CDD did not give any matches/ hits. The top 3 matches from HHpred all have high probability (>99.3), high % coverage (>75.641%), and low E-values (<8.9e-11). /note=Transmembrane domains: Both TMHMM and TOPCONS did not predict any TMDs, so it is not a transmembrane protein. /note=Secondary Annotator Name: Li, Shally /note=Secondary Annotator QC: I agree with the above location and function calls. CDS 15276 - 16475 /gene="13" /product="gp13" /function="major capsid protein" /locus tag="GalacticEye_13" /note=Original Glimmer call @bp 15276 has strength 14.45; Genemark calls start at 15276 /note=SSC: 15276-16475 CP: yes SCS: both ST: SS BLAST-Start: [major capsid protein [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 0.0 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.268, -2.0720764396375664, yes F: major capsid protein SIF-BLAST: ,,[major capsid protein [Gordonia phage Woes] ],,YP_009273404,99.7494,0.0 SIF-HHPRED: Major head protein; Virus, Virion, Capsid, Major Capsid Protein, Decoration Protein, Icosahedral Virus, Caudovirus, Thermophilic, Bacteriophage; 2.8A {Thermus virus P74-26},,,6O3H_B,92.7318,100.0 SIF-Syn: This phage displays synteny with CS3 phages Hail2Pitt, Neoevie and Teal; and is consistently called as major capsid protein from pham 3542. The upstream gene displays synteny, and is consistently called as a capsid decoration protein. 
The downstream gene displays synteny, and is a protein with NKF. /note=Primary Annotator Name: Pay, Iona /note=Auto-annotation: GeneMark and Glimmer both call the same start site, 15276, which begins with GTG. /note=Coding Potential: Self-trained GeneMark shows coding potential in the putative ORF in both the third and fifth frame, with the potential in the third frame appearing much more significant and continuing from 13600, indicating that this gene might be part of an operon. The host-trained GeneMark also shows similar continuity in the putative ORF in the third frame, and no results in the fifth frame. This gene is also a reasonable length at 1200 bp. /note=SD (Final) Score: This start has the highest Z-score, and the least negative final score by about 2, indicating that the automatically annotated start is likely correct. /note=Gap/overlap: This gene has a reasonable gap of 12 bp, which is difficult to check against the coding potential maps given the large scale used by GeneMark. /note=Phamerator: As indicated by the coding maps and small gap, this gene appears very close or possibly overlapping with the genes either side of it in several CS3 phage genomes, including Neoevie and Luker. /note=Starterator: Start 7 (15276) has been annotated manually 47 times, and found in 91.9% of the genes in this phamily, and called 100% of the time when present, indicating that this is the correct start site. /note=Location call: Given the strong Starterator evidence for this start site, strong coding potential, and high degree of conservation in Phamerator, this gene appears to be real and start at 15276. /note=Function call: This gene is a member of 3542, and is listed by many phages in clusters CS and CX as a major capsid protein. NCBI BLASTp lists several very strong hits (E values of 0), all corresponding to Gordonia phage major capsid proteins, such as accession numbers YP_009273404 and QAX94299. 
PhagesDB BLAST confirms these results with phages such as Anamika, Harambe and Hail2Pitt, which all have similarly low e-values and ascribed functions of major capsid protein. CDD returns 1 hit with a low e-value and good percent coverage, also for a major capsid protein (pfam03864). HHpred also returns several results with excellent e-values for major capsid proteins. /note=Transmembrane domains: TMHMM does not predict any transmembrane domains. /note=Secondary Annotator Name: Patel, Rishi /note=Secondary Annotator QC: I agree with the annotations of this gene. The location call at start site 15276 appears to be the correct one, as it appears to be heavily conserved among genes within the same pham and among other Cluster CS3 phages. Additionally, the coding potential appears within the called start and stop sites and the gap appears to be conserved. The function call also appears to be correct: major capsid protein. The e-values are extremely low (close to 0, and some are 0) for other phages within Cluster CS3 that call a major capsid protein as well. Just remember to do the synteny box, check all evidence (I believe you forgot PhagesDB BLAST), and check whether coding potential is present within the putative ORF (update this when done). 
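Several notes in this file quote a gene length and compare it with the 120 bp minimum. A minimal sketch of that arithmetic follows; the helper names `orf_length` and `codon_count` are hypothetical, and coordinates are assumed to be 1-based and inclusive as they appear on the CDS lines. The codon count includes the stop codon.

```python
MIN_ORF_BP = 120  # minimum gene length cited in these annotation notes


def orf_length(start: int, stop: int) -> int:
    """Length in bp of a CDS given 1-based inclusive coordinates."""
    return abs(stop - start) + 1


def codon_count(start: int, stop: int) -> int:
    """Number of codons (stop codon included); the length must be a multiple of 3."""
    length = orf_length(start, stop)
    if length % 3 != 0:
        raise ValueError(f"length {length} bp is not a whole number of codons")
    return length // 3


# Gene 13 of GalacticEye, coordinates from its CDS line.
length = orf_length(15276, 16475)
print(length, codon_count(15276, 16475), length >= MIN_ORF_BP)
```

With the gene 13 coordinates this reports 1200 bp and 400 codons, consistent with the length stated in the Coding Potential note for that gene.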
CDS 16485 - 16679 /gene="14" /product="gp14" /function="hypothetical protein" /locus tag="GalacticEye_14" /note=Original Glimmer call @bp 16485 has strength 9.84; Genemark calls start at 16485 /note=SSC: 16485-16679 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ANAMIKA_14 [Gordonia phage Anamika] ],,NCBI, q1:s1 100.0% 2.60181E-37 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.268, -2.0111200136961407, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ANAMIKA_14 [Gordonia phage Anamika] ],,ATW61111,100.0,2.60181E-37 SIF-HHPRED: SIF-Syn: NKF, upstream gene is major capsid protein (pham 3542, 3/31/22), downstream is NKF (pham 53201, 3/31/22), just like in phage Anamika and Harambe. /note=Primary Annotator Name: Cho, Emily /note=Auto-annotation: Both Glimmer and GeneMark call 16,485 as the start site. /note=Coding Potential: The host-trained GeneMark shows reasonable coding potential predicted within the putative ORF, and the chosen start site 16,485 covers all this coding potential. /note=SD (Final) Score: -2.011, The highest (best) value among 4 start site candidates /note=Gap/overlap: Upstream gap of 9. In the reasonable range (<50bp). The length of the gene is also reasonable (>120bp) /note=Phamerator: Found in Pham 7132 as of 30th March, 2022. The pham is in other members of the cluster CS, and there was no function called for any of the members. /note=Starterator: Start number 3 (16,485) was called most often, and 46 of the 46 non-draft genes called this start site, meaning it is conserved in all members in the pham. Starterator was informative. /note=Location call: The gene highly seems to be real based on the starterator (which showed the most conserved and manually annotated start site) and phamerator with a start site of 16,485 and good coding potential, all covered by the called start site. The gene also shows synteny with many other annotated genomes like Anamika and Harambe. 
It also uses the common ATG start codon and has the best final score and a Z score of 3.268, which is above 2. The start site also yields the longest open reading frame. /note=Function call: The top 5 PhagesDB BLASTp hits, sorted by E-value, suggested no known function, with a score of 131 bits (329), 100% identity, and low E-values. All hits except two called no known function (NKF), and they had e-values ranging from 6e-15 to 6e-31. The two hits that called the functions minor tail protein and metalloprotease had e-values of 6.7 and 8.7, so they were unlikely hits. The top 5 NCBI BLASTp hits, sorted by E-value, also suggested NKF, with 100% query coverage, high % identity (>89.06%), and low E-values ranging from 1e-31 to 3e-37. There were no NCBI Conserved Domain Database hits. HHpred hits showed very low probability and % coverage, so there were no meaningful hits. Therefore, the gene seems to have NKF. /note=Transmembrane domains: No TMHs reported from TMHMM. No TMDs reported from TOPCONS. This protein with no known function does not seem to have a transmembrane domain. /note=Secondary Annotator Name: Patel, Sahaj /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 16746 - 17030 /gene="15" /product="gp15" /function="hypothetical protein" /locus tag="GalacticEye_15" /note=Original Glimmer call @bp 16746 has strength 14.55; Genemark calls start at 16746 /note=SSC: 16746-17030 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp15 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 2.23512E-55 GAP: 66 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.024, -2.970161406017234, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp15 [Gordonia phage Woes] ],,YP_009273406,98.9362,2.23512E-55 SIF-HHPRED: SIF-Syn: The upstream gene is NKF (pham 7132), downstream gene is head-to-tail adaptor (pham 101914). 
This is similar to other cluster CS phages including Luker, Hello, Anamika, and Newt. /note=Primary Annotator Name: Enos, Alex /note=Auto-annotation: Glimmer and GeneMark both call 16746 as the start site. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The chosen start site covers all coding potential. /note=SD (Final) Score: The Final score of -2.970 was the best for the possible start sites. The alternative was -5.999 /note=Gap/overlap: The gap is 66 for which is the smaller of the 2 possible start sites and while it is a bit large, is conserved in other phages such as Anamika and Hello /note=Phamerator: Pham 53201 as of 3/31/22. It is conserved and found in Anamika (CS) and Hello (CS). /note=Starterator: Start site 1 in Starterator was manually annotated in 36/37 non-draft genes in this pham. Start 1 is 16746 in GalacticEye. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the consistent evidence from Glimmer, GeneMark, and Starterator, this is a real gene and the most likely start site is 16746 . /note=Function call: Unknown Function. All hits from phagesDB BLAST called unknown function with small E-values <10^-44. All NCBI BLAST hits also call "hypothetical protein" (100% coverage, 96%+ identity, and E-value <10^-55). HHpred also called an unknown protein but had very low probability and coverage. CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Lin, Yuri /note=Secondary Annotator QC: I have QC`d and agree with the primary annotator. 
CDS 17037 - 17555 /gene="16" /product="gp16" /function="head-to-tail adaptor" /locus tag="GalacticEye_16" /note=Original Glimmer call @bp 17037 has strength 3.51; Genemark calls start at 17037 /note=SSC: 17037-17555 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail adaptor [Gordonia phage Hail2Pitt] ],,NCBI, q1:s1 100.0% 3.97405E-123 GAP: 6 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.535, -5.921045285829382, no F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail adaptor [Gordonia phage Hail2Pitt] ],,AVP43202,100.0,3.97405E-123 SIF-HHPRED: a.229.1.1 (A:) Hypothetical protein YqbG {Bacillus subtilis [TaxId: 1423]},,,d1xn8a_,42.4419,97.6 SIF-Syn: Head-to-tail adapter, upstream gene is NKF in pham 53201, downstream gene is NKF in pham 65696, like in phages Anamika and Lahirium. /note=Primary Annotator Name: Maraziti, Gabriela. /note=Auto-annotation: Glimmer and GeneMark both call the start at 17037. /note=Coding Potential: The gene has reasonable coding potential within the ORF and the start includes all typical and atypical coding potential. /note=SD (Final) Score: -5.921; this is not the most negative score. The Z-score is not one of the highest at 1.535. /note=Gap/overlap: 6 bp gap. This is the smallest gap of the possible start sites and creates the longest possible ORF. /note=Phamerator: Pham 101914 as of 3/31/2022. The gene and function call is conserved in many other members of the same cluster, CS, such as Adgers and Butterball. /note=Starterator: 37/46 non-draft genes in the pham call start site 4, which corresponds to position 17037 in the GalacticEye genome. This agrees with the start site called by Glimmer and GeneMark. This start site is called 96.1% of the time it is present. /note=Location call: Considering all of the evidence above, this gene is a real gene and has a start site at 17037 bp. Starterator agrees with Glimmer and Genemark. /note=Function call: Head-to-tail adapter. 
All of the non-draft phagesdb BLAST hits have the head-to-tail adaptor function (e-value <10^-96), and the top three NCBI BLAST hits have the same function (99% coverage, 99% identity, and E-value <10^-122). CDD and HHPred did not return any informative hits. /note=Transmembrane domains: no TMDs predicted by TMHMM or TOPCONS, therefore it is not a membrane protein. /note=Secondary Annotator Name: Perez, Joshua /note=Secondary Annotator QC: Great work! Some suggestions I have: Be more clear about which line has coding potential (first RF forward, second RF reverse, etc.); I believe it is the 3rd RF forward. For the gaps, maybe mention 3(+) phages this gap is conserved with. If Phamerator returns some functions, maybe it would be useful to input that too (unless it didn't). For Starterator, is it the most conserved start site? What % of the time is it called? For location call, maybe add that it has coding potential and a conserved gap with "x" phages as evidence as well. Great work on the function and TMDs. Also, for the synteny box, maybe input 1 or 2 more phages it shares this synteny with (so we have more coverage and a larger sample size for this synteny). CDS 17552 - 18037 /gene="17" /product="gp17" /function="hypothetical protein" /locus tag="GalacticEye_17" /note=Original Glimmer call @bp 17543 has strength 4.45; Genemark calls start at 17543 /note=SSC: 17552-18037 CP: no SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_TEAL_17 [Gordonia phage Teal]],,NCBI, q1:s1 100.0% 1.58861E-112 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.343, -5.178441980375233, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TEAL_17 [Gordonia phage Teal]],,QDF16879,100.0,1.58861E-112 SIF-HHPRED: SIF-Syn: /note=AF 8/11/22: Overwhelming (58/60 non-draft calls) starterator data for start @ 17543, but why?? Upstream gene is head-to-tail adaptor; this gene has strong HHpred hit for "PF05069.16 Phage_tail_S ; Phage virion morphogenesis family". 
Next start site of 17552 has gap of -4, suggesting operon, which negates poorer RBS score. Going with start of 17552; unsure as to why so many people chose first one. /note=Primary Annotator Name: Nelson, Shiloh /note=Auto-annotation: The gene is 495 bp long, with coding potential in both glimmer and genemark. Both genemark and glimmer call the same start site, at 17543. As this start site is highly conserved, this shows that the gene has a greater chance of being ‘real.’ /note=Coding Potential: Glimmer and genemark show that there is coding potential, without violation of the basic guiding principles. All criteria are met — this is a real gene. The gene is 495 basepairs long, well over the 40 codon minimum length listed in the guiding principles, the gene shows synteny with non-draft phage genes sticker17 and guillaume. There are no switches in gene orientation. The genemark self annotation map shows that the gene is in the 2nd ORF in the forward direction, with negligible coding potential in the 1st reverse direction frame. /note=SD (Final) Score: The Final Score is -4.400 for the start site 17543. This is not the most negative score, however it is a very reasonable score — as it is more negative than -2. /note=Gap/overlap: The gap is -13 for the upstream gene. As this value is negative, this denotes an overlap of -13 bp. /note=Phamerator: All of the genes in this pham belong to the CS cluster. /note=Starterator: Start site 3 (17543) called 58/60 non-draft times.The start site of 17543 is conserved in 70 of the 73 nondraft genes in this pham. This is the most annotated start site. /note=Location call: This is a real gene, as it has good coding potential, which is conserved in the phamerator. From the evidence collected, the start site is 17543. /note=Function call: NKF — hypothetical protein — based on the data provided, it seems that the function of my gene is a hypothetical protein. 
As there were no hits and the values for HHPRED were substandard, I believe that the function of my ORF is NKF. The CDD site backs up my hypothesis, as there are no hits on this site. Based on the data provided, it seems that the function of my gene is a hypothetical protein. The NCBI values and function calls corroborate these findings — the very strong e-values indicated a hypothetical protein, as the first two hits on NCBI Blast have e-values less than 1e-113, with 100% coverage, 100% alignment and nearly 100% identity. /note=Transmembrane domains: zero — neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Shah, Aayushi /note=Secondary Annotator QC: Typo on location call, says "the start site is 65696." Maybe be a bit more explicit about why the start site with a 16 overlap was chosen over the 4 overlap with better final score, as I assume it is because of the starterator data and final score but it would be better if it was more clear. Also, the less negative the final score, the better– this wording was confusing in your explanation. Otherwise, I agree with this annotation. All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score, and the calling as no known function. 
/note=Shiloh: secondary annotator's comments considered 4.13.22; the start site with a 13 bp overlap was chosen over the start site with a 4 bp overlap as it has a less negative final score CDS 18034 - 18519 /gene="18" /product="gp18" /function="hypothetical protein" /locus tag="GalacticEye_18" /note=Original Glimmer call @bp 18034 has strength 3.13; Genemark calls start at 18034 /note=SSC: 18034-18519 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp18 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 1.6246E-114 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.939, -5.2484617369160205, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp18 [Gordonia phage Woes] ],,YP_009273409,100.0,1.6246E-114 SIF-HHPRED: SIF-Syn: Function unknown. The Pham number is 19209, upstream Pham number 65696, downstream Pham number 8730, just like in phages Guillaume and Harambe. Upstream and downstream Phams are also no known function. /note=Primary Annotator Name: Ruiz, Paola /note=Auto-annotation: Glimmer and GeneMark both call the start at 18034. /note=Coding Potential: For both GeneMark Host and Self-Trained, the coding potential in this ORF is mainly on the forward strand, indicating that this is a forward gene. Both show an upward hash around 18034 and a downward hash at 18519. In Self-Trained GeneMark, the typical and atypical coding potential spans the length from the start site to the end site. In GeneMark Host, the typical coding potential does not span the length from the start site to the end site. Overall, the gene has good coding potential. /note=SD (Final) Score: The final score is -5.248. This is not the best final score on PECAAN; it is only the fifth best. The z-score is 1.939, which is not the best and is below 2. These numbers make sense as the gene may be part of an operon, where RBS scores are typically poor. /note=Gap/overlap: There is a 4 bp overlap indicating that it may be part of an operon. 
It is conserved in other phages such as Hello and Lahirium. There is no coding potential in the overlap that may be a new gene. /note=Phamerator: The pham number is 19209 as of 3/31/2022. It is conserved; found in Luker and Hello, which are both in cluster CS3. /note=Starterator: Start site number is 7, which corresponds to start site 18034 bp for GalacticEye. Start 7 is the most called start number. It was manually annotated 13/21 times for cluster CS3. /note=Location call: Based on the above evidence, this is a real gene. There is a 4 bp overlap, which explains the poor final and Z scores. This gene is likely part of an operon, with a start site of 18034. /note=Function call: Function is unknown. The top 21 phagesdb BLAST hits have unknown function (e-value 4e-89, 100% identity, 100% positives). For NCBI BLAST, the top hit was a hypothetical protein with an e-value of 2e-114 and 100% identity and positives. CDD and HHPRED were not helpful in determining function, and most HHpred hits had very high e-values >100. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Villarreal, Alexia /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 18534 - 18854 /gene="19" /product="gp19" /function="hypothetical protein" /locus tag="GalacticEye_19" /note=Original Glimmer call @bp 18534 has strength 13.29; Genemark calls start at 18534 /note=SSC: 18534-18854 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp19 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 8.09698E-68 GAP: 14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.749, -3.0194634687155197, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp19 [Gordonia phage Woes] ],,YP_009273410,100.0,8.09698E-68 SIF-HHPRED: SIF-Syn: NKF. This gene has no known function and Pham 8730 is conserved in phages Anamika and Luker. 
Genome architecture is conserved in other phages such as Newt and Nimi13, the upstream gene is in Pham 19209 and the downstream gene is a major tail protein (Pham 8388). /note=Primary Annotator Name: Juarez, Sabrina /note=Auto-annotation: Glimmer and GeneMark both call a start site at 18534. /note=Coding Potential: Good coding potential is found in both Self- and Host-Trained GeneMark in the forward reading frame. The chosen start site includes all of the coding potential for the ORF in Host- and covers all atypical potential, but misses a few base pairs of coding potential at the start in GeneMark Self-Trained. This is the longest ORF at 321 bp. /note=SD (Final) Score: The final score is the second-best option, -3.019. It has the second-best z-score, 2.749. The ORF with the best final score and z-score does not cover all of the coding potential for this gene. /note=Gap/overlap: This gene has a 14 bp gap with the upstream gene, which is the smallest possible gap for the possible start sites. It is conserved in other phages, such as Hello and Jormungandr. /note=Phamerator: The Pham number as of 04/01/2022 is 8730. It is conserved in other phages within subcluster CS3, such as Anamika and Guillaume. It is represented in clusters CS1 and CS2, as well. /note=Starterator: Start site 2 in Starterator was manually annotated in 37/38 non-draft genes in this Pham. Start 2 is 18534 in GalacticEye. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence, this is a real gene, and most likely starts at 18534. /note=Function call: No Known Function. The top two phagesDB BLAST hits have the function listed as unknown function in Gordonia phage Woes (E-value of 2e-55 and 100% identity) and Gordonia phage Teal (E-value of 2e-55 and 100% identity). The top six NCBI BLAST hits are listed as hypothetical proteins. The top result was Gordonia phage Woes (71% coverage, 100% identity, and E-value of 2e-31). 
The following hits are also Gordonia phages (70% coverage, 66%+ identity, and E-values <2e-14). CDD and HHpred had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Villarreal, Alexia /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 18858 - 19877 /gene="20" /product="gp20" /function="major tail protein" /locus tag="GalacticEye_20" /note=Original Glimmer call @bp 18858 has strength 15.14; Genemark calls start at 18858 /note=SSC: 18858-19877 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Gordonia phage Anamika] ],,NCBI, q1:s1 100.0% 0.0 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.707, -3.2495416658190135, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Gordonia phage Anamika] ],,ATW61117,100.0,0.0 SIF-HHPRED: Phage_tube_2 ; Phage tail tube protein,,,PF18906.3,91.7404,99.9 SIF-Syn: The function of this gene is major tail protein, as it is in Guillaume and Anamika. The upstream gene has NKF and is in pham 8730 and downstream gene is the tail assembly chaperone (pham 57139), which is also true of the upstream and downstream genes in Guillaume and Anamika. /note=Primary Annotator Name: Li, Shally /note=Auto-annotation: Glimmer and GeneMark both call the start site to be 18858. /note=Coding Potential: Both the host trained and self trained GeneMarks have high coding potential throughout the gene. /note=SD (Final) Score: -3.250. This is the best final score on PECAAN. /note=Gap/overlap: 3bp gap. This is the smallest gap on PECAAN and the one that makes the most sense. /note=Phamerator: Pham 8388 as of 3/31/22. There are 73 members of the pham, 21 of which are drafts. 41/52 non-draft members are in cluster CS like GalacticEye. /note=Starterator: Start site 1 is called by 52/52 non-draft genes in the pham and is the start site that has the most MA for GalacticEye. 
This corresponds to position 18858, which is agreed on by both Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene with start site 18858. The proposed start has a Z-score of 2.707, which is the highest Z-score on PECAAN. The start codon is ATG, and it is the longest open reading frame. /note=Function call: Major tail protein. The top 5 non-draft hits (Anamika, Guillaume, Hail2Pitt, Harambe, Lahirium) on phagesdb BLAST have the function major tail protein with e-values of 0 and scores of 670. HHPRED has several strong hits with the function of phage tail protein. The top two hits have probability >99%, coverage >91%, and e-values <10^-11. NCBI BLAST also has several strong hits with the function of major tail protein. The top three hits have identity >99%, 100% coverage, and e-values of 0. There were no hits in CDD. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any transmembrane domains, indicating that this is not a membrane protein. /note=Secondary Annotator Name: Araque, Colette /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. Note: May want to add some more supporting information regarding the chosen start site's Z score, whether it encodes for a common start codon, whether it results in the LORF and/or a reasonable gene length, etc. 
tRNA 19955 - 20027 /gene="21" /product="tRNA-Asn(gtt)" /locus tag="GALACTICEYE_21" /note=tRNA-Asn(gtt) CDS 20076 - 20669 /gene="22" /product="gp22" /function="tail assembly chaperone" /locus tag="GalacticEye_22" /note=Original Glimmer call @bp 20076 has strength 16.98; Genemark calls start at 20076 /note=SSC: 20076-20669 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 2.57412E-141 GAP: 198 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.546, -3.525474288708562, no F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Gordonia phage Woes] ],,YP_009273413,100.0,2.57412E-141 SIF-HHPRED: SIF-Syn: Tail assembly chaperone, upstream gene is a major tail protein, downstream gene is NKF (pham 40702), just like phage Hail2Pitt Tail assembly chaperone, upstream gene is a major tail protein, downstream gene is NKF (pham 40702), just like phage Luker Tail assembly chaperone, upstream gene is a major tail protein, downstream gene is NKF (pham 40702), just like phage Newt Completed 4/1/2022 (in case pham numbers alter) /note=Primary Annotator Name: Patel, Rishi /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 20076. Both call the start codon as ATG. /note=Coding Potential: The coding potential for this open reading frame (ORF) is only on the forward strand which shows that it is a forward gene. In the Self-Trained GeneMark, the coding potential is apparent within the called start site (20076) and stop site (20669), which is evidence that it is a real gene. Additionally, the Host-Trained GeneMark also shows coding potential in the region between the called start site and the stop site. /note=SD (Final) Score: The final score is the second best option of all the start sites listed (-3.525) – the best one listed is -3.430 so they are very close. Additionally, the Z score was also the second best at 2.546 with the best one being only very slightly higher at 2.948. 
The evidence for the called start site is still strong considering that the scores are high and nearly the best; however, a closer look at the start site with the slightly better scores is warranted. /note=Gap/overlap: The gap between this gene and the upstream gene (gene #20) is 198 base pairs, which seems a bit large for a tightly compact phage genome; however, this gap seems to be conserved with other phages in cluster CS3. For example, phage Anamika also shows a 198 base pair gap with its upstream gene. Additionally, the phage Harambe also shows this 198 base pair gap, so it appears heavily conserved. Synteny is shown between gene equivalents in other Cluster CS3 phages. /note=Phamerator: Gene is found in pham number 57139 as of 4/1/2022. The gene is conserved when looking into phages Harambe, Anamika, and Hail2Pitt, which are all in the same cluster (CS3) as GalacticEye, thus it appears conserved in Cluster CS3 (evidence for real gene). This pham also contains phages from 10 subclusters – singleton, DF, DF1, DF3, DF2, CX, CS4, CS1, CS3, CS2. Phamerator does not call a function. /note=Starterator: The called start for my gene (gene 21) is start site #10 (20076). This start site is the most annotated start site. It is called in 42 out of 104 non-draft genes in the pham. It is also called in 42 out of the 52 genes that actually have the start site available including draft genes. /note=Location call: Considering all of the evidence described above, I believe that the start site should remain the one that is called (20076). Firstly, the start site called, 20076 (start site 10), is manually annotated as the start most often; in fact, when the start site is available, it is called 42/52 times. Additionally, the start site is called 26 times for Cluster CS3, which is, by far, the most called. The Z-score and Final Score are also good. Both are the second best (just worse than the best) among all of the called start sites. 
Finally, the coding potential is within the range of this called start site and the stop site and the gap upstream appears to be conserved when the gene starts at this site. Thus, the start site should remain 20076. /note=Function call: It is clear from the evidence that this gene is a tail assembly chaperone. There are significant hits in both NCBI GenBank BLAST and PhagesDB BLAST that call the function as a tail assembly chaperone. The e-values of these calls are all extremely low as well, which is a good sign. For NCBI, it shows Gordonia phages Woes, Hail2Pitt, and Harambe all called this gene as a tail assembly chaperone with an e value of 2.5e-141. PhagesDB shows similar data. Neither HHpred nor CDD call a significant function; however, not all genes have useful hits in HHpred or CDD. The evidence in NCBI GenBank and PhagesDB is enough to conclude that this gene has a function as a tail assembly chaperone. /note=Transmembrane domains: No transmembrane domains (TMDs) were predicted by either TMHMM or TOPCONS /note=Secondary Annotator Name: Bharadwaj, Shreya /note=Secondary Annotator QC: I agree with the primary annotator's location and function calls. I would, however, take a look at the synteny box again because there seems to be some repetition and I don't know if it is intentional. CDS join(20076..20657,20657..20944) /gene="23" /product="gp23" /function="tail assembly chaperone" /locus tag="GalacticEye_23" /note= /note=SSC: 20076-20944 CP: no SCS: neither ST: NI BLAST-Start: GAP: -594 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.546, -3.525474288708562, no F: tail assembly chaperone SIF-BLAST: SIF-HHPRED: SIF-Syn: Upstream gene is also tail assembly chaperone. /note=Manually annotated by AF, using Sticker17 (CS3) as guidance. 
CDS 20910 - 21050 /gene="24" /product="gp24" /function="hypothetical protein" /locus tag="GalacticEye_24" /note=Genemark calls start at 20910 /note=SSC: 20910-21050 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein PBI_HAIL2PITT_23 [Gordonia phage Hail2Pitt] ],,NCBI, q1:s1 100.0% 5.40409E-25 GAP: -36 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.621, -6.694648908462305, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PBI_HAIL2PITT_23 [Gordonia phage Hail2Pitt] ],,AVP43208,100.0,5.40409E-25 SIF-HHPRED: SIF-Syn: This gene interrupts usual synteny, but several other CS3 phages have the same arrangement. Upstream gene is tail assembly chaperone, downstream gene is tape measure protein, just like phages Luker, Newt, and Hail2Pitt. /note=Primary Annotator Name: Patel, Sahaj /note=Auto-annotation: Genemark only. /note=Coding Potential: High coding potential is found in the GeneMark Self, but low coding potential found in the GeneMark Host. /note=SD (Final) Score: -6.695, which was the lowest Final Score. The Z score is 1.621, which is the lowest highest Z score. /note=Phamerator: 5198. It is conserved, found in Hail2Pitt (CS), Harambe (CS), and Newt (CS). /note=Starterator: Start site 2 in Starterator was manually annotated in 3/3 non-draft genes in this Pham. Start 2 is 20910 in GalacticEye. This evidence agrees with the site predicted by GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 20910. /note=Function call: No significant hits for function in any of the databases checked. /note=Transmembrane domains: TMHMM and TOPCONS predict no transmembrane domains (predicted count is 0). /note=Secondary Annotator Name: Likwong, Chloe /note=Secondary Annotator QC: I agree with the primary annotator. 
CDS 21054 - 31022 /gene="25" /product="gp25" /function="tape measure protein" /locus tag="GalacticEye_25" /note=Original Glimmer call @bp 21054 has strength 9.18; Genemark calls start at 21054 /note=SSC: 21054-31022 CP: yes SCS: both ST: SS BLAST-Start: [tape measure protein [Gordonia phage Jormungandr]],,NCBI, q1:s1 100.0% 0.0 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.263, -4.101958845833712, no F: tape measure protein SIF-BLAST: ,,[tape measure protein [Gordonia phage Jormungandr]],,QBP30302,99.7592,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,4.39494,99.9 SIF-Syn: This gene demonstrates synteny with other non-draft phages (Harambe, Hello, Newt, etc.). The gene upstream is in pham 40702, NKF, and does not show consistent synteny. It is missing in some published phages (Guillaume, Woes, etc.) but present in others (Hail2Pitt, Luker, etc.). The gene downstream is a minor tail protein in pham 8732 and shows synteny with other non-draft phages (Luker, Hello, Guillaume, etc.). /note=Primary Annotator Name: Lin, Yuri /note=Auto-annotation: Glimmer and GeneMark agree on a start site of 21054 bp (start codon TTG). The autoannotated start site also results in the LORF. /note=Coding Potential: All coding potential is contained within the ORF, but there are some gaps in coding potential sporadically in the host-trained GeneMark and a gap in coding potential from roughly 20600-20900 bp in the self trained GeneMark. /note=SD (Final) Score: The start site at 21054 bp has the eighth highest Final Score of -4.102, but all of the candidate start sites with better Final Scores have gaps of >2000 bp. The Z-score is 2.263, which is not the highest but a reasonable score, and the candidate start sites with better Z-scores also have the same issue of large gaps (>1000 bp). Since both values are reasonable in this context, they support the autoannotated start site. 
/note=Gap/overlap: The start site at 21054 has a gap of 3 bp, which is the smallest of all the candidate start sites. This supports the autoannotated start site, considering it is a gene in a series of forward genes and is not adjacent to any orientation switches. /note=Phamerator: As of 4/5/22, the pham number is 14531. The gene is conserved in 61 other phages, of which 15 are drafts. All other phages that have this gene with the same pham number are in cluster CS like GalacticEye, such as Luker and Jormungandr. /note=Starterator: Start site 2 in Starterator was manually annotated in 37/46 non-draft genes in this pham and is called 100% of the time in published annotations when present. Start site 2 is at position 21054 bp on GalacticEye. In phage Luker, start site 2 is also called and is at position 21066 bp. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 21054. /note=Function call: Tape measure protein. Two of the five top PhagesDB BLAST hits have the function of tape measure protein (e-value = 0.0), and the top two hits on NCBI BLAST were also tape measure protein (87% coverage, 99%+ identity, and E-value=0.0). The top hit in HHpred also calls the gene a tape measure protein, with a 99.94% probability, 6e-18 e-value, 93.65 score/coverage. The top hit in CDD was with a phage-related protein in the COG5412 super family (e-value 6.59e-15), with the second hit being a Phage-related minor tail protein in the PhageMin_Tail super family (e-value = 1.55e-13). It was noted that members of this family are found in putative phage tail tape measure proteins. /note=Transmembrane domains: Both TmHmm and TOPCONS predicted multiple TMHs. /note=Secondary Annotator Name: Pramana, Martin /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. However, changes should be made for the SD (Final) Score section because the final score of -4.102 is not the least negative. 
This SD score should be the eighth-highest among the SD scores, not the highest score. The least negative final score for all potential start sites should be -2.996 @ 30693 bp. The Z score of 2.263 is the sixteenth highest score of all possible start sites. In the Phamerator section, mentioning some phage names from cluster CS would help clarify. Furthermore, in the starterator section, please include another non-draft phage that also has start site 2 and indicate its start site position such as Jormungandr has start site 2 at position 21086 bp. Since both TMHMM and TOPCONS call for multiple transmembrane domains, checking them both as evidence is recommended. [Edit: All concerns have been addressed by primary annotator] CDS 31019 - 33550 /gene="26" /product="gp26" /function="minor tail protein" /locus tag="GalacticEye_26" /note=Original Glimmer call @bp 31019 has strength 9.38; Genemark calls start at 31019 /note=SSC: 31019-33550 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Gordonia phage Anamika]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.61, -5.491506094349139, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Anamika]],,ATW61120,99.7627,0.0 SIF-HHPRED: Receptor-type tyrosine-protein phosphatase delta; Trans-synaptic complex, Synapse organizer, HYDROLASE-IMMUNE SYSTEM complex; HET: BMA, MAN, NAG; 4.4A {Mus musculus},,,4YH7_A,57.8885,99.6 SIF-Syn: minor tail protein, upstream: DNA tape measure protein; downstream: minor tail protein. Similar phages include: Anamika, Lahirium /note=Primary Annotator Name: Perez, Joshua /note=Auto-annotation: Gene was called by both Glimmer and Genemark, both starting at 31019 and calling GTG. /note=Coding Potential: Yes, this gene has reasonable coding potential predicted within the second RF forward. This chosen start site does cover all this coding potential in both the Host and Self GeneMark. /note=SD (Final) Score: -5.492; a reasonable final score. 
The Z score present for this start is 1.61, which is decently high in terms of Z scores. Other start sites are variable in final score, but only 3 are less negative than this start site. /note=Gap/overlap: This start has a gap of -4 bp and seems to be conserved in the Pham Maps. This negative gap suggests that the gene is part of an operon. This gap seems to be conserved in other phages, with a few being Anamika, Neovie, and Guillaume. /note=Phamerator: Pham 8732 4/05/22 is present in 74 members of the cluster CS. I used Austin, Diabla, and 5 others from cluster CS. A majority of the members (51) called a minor tail protein as their function. /note=Starterator: This is a reasonable conserved start site. The start site number is 1, and the coordinate base pair number is 31019. This is included as one of the most annotated genes, it has 51 manual draft annotations, is called 98.6% of the time when present, and has 74 members in its pham. This is good evidence that start site 1 with bp number 31019 is the correct start site. /note=Location call: The overall evidence shows that this gene is most likely real at start 1 and bp number 31019 due to high coding potential in its second RF forward and evidence from Starterator. Start site of 31019 is the most likely start site, as it has the longest ORF and smallest gap. This is most likely a real gene as it has coding potential. /note=Function call: The top 5 NCBI hits sorted by e-value had low e-values (0) and high % identity (around 99.5%). For example, phage Anamika has a 99.64% identity and an e-value of 0. The gene is said to be a minor tail protein, which must be confirmed with other programs. The first hits were minor tail proteins. HHpred contained a good hit for a hydrolase protein, which could be a tail protein function. There is no data available from CDD. Overall, the high percent identities and evidence toward this gene’s function classify it as minor tail protein. 
/note=Transmembrane domains: No predicted TMDs by TOPCONS or TMHMM. /note=Secondary Annotator: Pay, Iona /note=Secondary Annotator QC: Need to clarify auto annotation section -- you mention two different start codons for the same site? Would be nice to provide comparisons of other start sites in the SD Score section. Otherwise, this reads well, nicely done! CDS 33550 - 35289 /gene="27" /product="gp27" /function="minor tail protein" /locus tag="GalacticEye_27" /note=Original Glimmer call @bp 33547 has strength 9.03; Genemark calls start at 33550 /note=SSC: 33550-35289 CP: yes SCS: both-gm ST: SS BLAST-Start: [minor tail protein [Gordonia phage Anamika] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.439, -3.75071250087134, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Anamika] ],,ATW61121,100.0,0.0 SIF-HHPRED: Tail protein, 43 kDa; tail protein, structural genomics, PSI, MCSG, Protein Structure Initiative, Midwest Center for Structural Genomics, UNKNOWN FUNCTION; 2.1A {Neisseria meningitidis MC58} SCOP: b.106.1.1,,,3D37_B,54.2314,99.8 SIF-Syn: Minor tail protein, upstream gene is a minor tail protein and downstream gene is a minor tail protein, just like in phage Guillaume. /note=AF: chose 33550 based on more starterator data and the fact that -1 is most likely start site for an operon (more than -4) /note=Primary Annotator Name: Shah, Aayushi /note=Auto-annotation: Called in GeneMark at start site 33550, called in Glimmer at start site 33547 /note=Coding Potential: There is reasonable coding potential within the putative ORF, though there is a very little bit not covered upstream in the Genemark call. The Glimmer start site covers all the coding potential. /note=SD (Final) Score: The best final score is -3.751, for the 33550 start site, while the second best final score is -4.721, for the start site of 33547. 
Based on the overlaps seen, this gene is in an operon, and therefore RBS score is not as relevant for a start call. /note=Gap/overlap: The 33550 start site has an overlap of 1, while the start site of 33547 has an overlap of 4. This overlap indicates an operon with either start site. The start site of 33547 creates the longest operon as well. /note=Phamerator: The pham as of 04/01/22 is 20894. A lot of other phages in this cluster have this pham present, as seen in Adgers, Butterball, and Gibbles. The function called in phamerator is a minor tail protein. /note=Starterator: Start site 2 is conserved across many manually annotated genomes in the pham, and represents the start site at bp 33550. 34/52 non-draft genes call this start site. Start site 1 is the second most conserved start site, representing bp 33547, with 11/52 non-draft genomes calling this start site. Though start site 1 is less conserved, the weight of other evidence means it is more likely to be the correct start site for this gene. /note=Location call: Based on evidence the gene is real and has a start site of 33547, which covers all coding potential and has a reasonable gap, the best final score, as well as moderate evidence from starterator and phamerator as conserved within a pham. /note=Function call: Minor tail protein. The top 5 non-draft PhagesDB BLAST hits call the function as a minor tail protein with e values of 0 for all of them, constituting very strong evidence. In the NCBI BLAST, the top 5 calls sorted by e value called it as a minor tail protein, with e values of 0 for all, and over 99% coverage, also strong evidence. While no CDD hits were found, HHpred has a hit with an e value of 8.8e-16, 99.8% probability, and around 55% coverage that calls it a tail protein. There are no TMDs, which makes sense with this given function. Thus, there is strong evidence for this function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, so it is not a membrane protein. 
/note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 35289 - 37313 /gene="28" /product="gp28" /function="minor tail protein" /locus tag="GalacticEye_28" /note=Original Glimmer call @bp 35289 has strength 10.93; Genemark calls start at 35289 /note=SSC: 35289-37313 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Gordonia phage Nimi13] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.729, -3.2026596621721617, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Gordonia phage Nimi13] ],,QBP31803,99.8516,0.0 SIF-HHPRED: b.18.1.7 (A:) Xylan-binding domain {Clostridium thermocellum [TaxId: 1515]},,,d1h6ya_,22.2552,98.8 SIF-Syn: The function of this gene is Minor Tail Protein, which is conserved in phages Anamika and Harambe, both of which are in subcluster CS3. Upstream gene is minor tail, as it is in both Anamika and Harambe. Downstream gene pham conserved in both Anamika and Harambe as well. /note=Primary Annotator Name: Villarreal, Alexia /note=Auto-annotation: Glimmer and Genemark Start sites are both at 35289. The Start codon is ATG. /note=Coding Potential: Coding Potential of this ORF is only on the forward strand, which indicates it is a forward gene. There is strong Coding Potential indicated by GeneMark Self and lower, but present, coding potential from GeneMark Host. Lots of synteny conservation observed of gene within other phages indicate high coding potential within this gene. /note=SD (Final) Score: The final score for this gene is -3.203, which is the best value listed indicating that this start site is the best and most probable. /note=Gap/overlap: Small overlap of 1 is in good standing and is preferred as it falls under 50 bp range; this start site contains the longest ORF and is in good standing supporting this start site. /note=Phamerator:pham:102778. Date 04/5/2022. It is conserved in other phages within the same cluster CS such as Anamika, Luker, and Newt. 
/note=Starterator: Start site 1 in Starterator was manually annotated in 22/22 non-draft genes in this pham. Start 1 is at 35289 in GalacticEye. This evidence agrees with the site predicted by Glimmer and GeneMark. This start site is the most annotated start site within this pham. /note=Location call: Confidently confirm this gene’s start site at 35289. Would choose to keep this start site based on the data suggested by guidelines: coding potential, the same start site predicted by both Glimmer and GeneMark, high synteny with other phage data, the most reasonable (smallest) overlap of 1 among all candidate start sites, a start codon of ATG, which is one of the more common initiation codons, as well as a good Final score of -3.203 and a good z-score of 2.729. Based on the above evidence, this is a real gene and the most likely start site is 35289. /note=Function call: According to the data obtained by PhagesDB Blastp and NCBI Blastp, the suggested function of the gene is a minor tail protein, as there is promising evidence with high query coverage (100%), high percentage of identity conservation (99%), as well as extremely low E-values of 0. There is also a hit in HHpred for a xylan-binding domain; tail proteins often bind to carbohydrates on the cell surface. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Enos, Alex /note=Secondary Annotator QC: I have QCed this location call and agree with the primary annotator. 
CDS 37316 - 37891 /gene="29" /product="gp29" /function="hypothetical protein" /locus tag="GalacticEye_29" /note=Original Glimmer call @bp 37316 has strength 5.24; Genemark calls start at 37316 /note=SSC: 37316-37891 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_GUILLAUME_28 [Gordonia phage Guillaume] ],,NCBI, q1:s1 100.0% 1.43333E-129 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.39, -6.0153754689637315, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_GUILLAUME_28 [Gordonia phage Guillaume] ],,QAX94312,100.0,1.43333E-129 SIF-HHPRED: SIF-Syn: NKF which is conserved in phages Teal, and Luker of the CS3 subcluster. The upstream gene is a minor tail protein, same as the upstream gene in the genomes of both Teal and Luker. The downstream gene is no known function which is the same as the downstream gene in the genomes of both Teal and Luker. /note=Primary Annotator Name: Araque, Colette /note=Auto-annotation: Both Glimmer and GeneMark called the same start site of 37316. Start site 37316 has a corresponding start codon of ATG. /note=Coding Potential: The gene has coding potential predicted within the putative ORF according to both Genemark Self and Host. The chosen start site covers all the corresponding coding potential. /note=SD (Final) Score: For the 37316 start site, there is a SD score of -6.015 and a Z score of 1.39. -6.015 is not a strong SD score as it is very negative. It is also not the best score compared to the other start sites, one having an SD score of -4.151. 1.39 is not strong either since it’s below 2. One start site has a Z score as high as 2.21. /note=Gap/overlap: There is a 2bp overlap which is below 4bp and is thus highly ideal. There is no coding potential in this gap nor would there be any space for another gene to exist in this gap. This overlap/gap is the smallest compared to the other start sites. 
The start site of 37754 has the best SD and Z score, but has a gap of 440bp which is extremely long and could fit more than one gene. The length of the gene with the start site of 37316 is 576bp long which is acceptable as it is well over the 120bp minimum and is the LORF. /note=Phamerator: As of 4/3/2022 this gene is found in Pham 13565. There are 47 total members in this pham (31 non-drafts and 16 drafts). This pham is present in other members of the same cluster (CS) which my phage belongs to. Some phages used for comparison are Beaver, Butterball, and Newt. The Phams database did not have a function called for this gene of the GalacticEye phage, but one of the members of pham 13565 did have a function called: minor tail protein. This function is consistent and found within the approved function list. /note=Starterator: There is a reasonable start site choice that is conserved among the members of pham 13565. The number of this conserved start site is 1 which corresponds with the 37316 base pair coordinate of my phage (site predicted by Glimmer and GeneMark). This start site is found in 47 of 47 genes in pham 13565, and when it is present, it is called 93.6% of the time. There are 30 manual annotations of this start site. This evidence highly supports the start site predicted by both Glimmer and GeneMark. /note=Location call: Gene 27 of the GalacticEye phage appears to be a real gene due to it being conserved in phamerator and it having considerable coding potential. The 37316 start site seems the most likely as it has the smallest gap/overlap, results in the LORF, encompasses all the coding potential, is called 93.6% of the time when present in genes of the same pham, and has 30 MA’s. While the SD and Z scores are not ideal, I think the aforementioned reasons are more weighty in showing the potential of this start site. /note=Function call: Gene 27 most likely encodes for a protein of an unknown function. 
There are multiple phagesDB BLAST hits that suggest function unknown and those hits have a strong e value of 1e-10910. There are also multiple NCBI BLAST hits that call function unknown with strong identity percentages (99.5% and 100%) and e values (1.43e-129 and 1.49e-128). /note=Transmembrane domains: TMHMM and TOPCONS did not predict any TMDs meaning that this is not a transmembrane protein. /note=Secondary Annotator Name: Maraziti, Gabriela /note=Secondary Annotator QC: I have QC`ed this function call and agree with the first annotator both on the start site and the function call. Side note: please select and add the best NCBI BLAST hits as evidence. CDS 37901 - 38071 /gene="30" /product="gp30" /function="hypothetical protein" /locus tag="GalacticEye_30" /note=Original Glimmer call @bp 37901 has strength 13.12; Genemark calls start at 37901 /note=SSC: 37901-38071 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ANAMIKA_29 [Gordonia phage Anamika] ],,NCBI, q1:s1 100.0% 1.64121E-32 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.127, -4.325675514574479, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ANAMIKA_29 [Gordonia phage Anamika] ],,ATW61124,100.0,1.64121E-32 SIF-HHPRED: SIF-Syn: NKF. The upstream gene is in pham 13565 and has NKF similar to Anamika. The downstream gene is in pham 102988 and has NKF similar to Anamika. /note=Primary Annotator Name: Bharadwaj, Shreya /note=Auto-annotation: GeneMark and Glimmer both call start sites of 37901 /note=Coding Potential: The coding potential of this ORF is on the forward strand, which means that this is a forward gene. I found coding potential for this gene in both GeneMark Self and Host. /note=SD (Final) Score: The final score was -4.326. This was the best final score according to PECAAN. 
/note=Gap/overlap: There is a gap of 9 bp. This is slightly outside the -7 to +7 range for gaps/overlaps, but it is the best of all the suggested start sites and belongs to the start site with the best SD score and Z-score. This gap is also conserved in the non-draft genomes of Hail2Pitt and Harambe. /note=Phamerator: 70203 was the pham of the gene as of 4/5/22. It has also been found in phages Hello_29 and Newt_30. /note=Starterator: Start site 2 was manually annotated 31 times in non-draft genomes. Start site 2 corresponds to start coordinate 37901 in GalacticEye, which agrees with the Glimmer and GeneMark auto-annotation. /note=Location call: Based on the evidence specified above, this is a real gene with a start site of 37901. /note=Function call: NKF. All of the PhagesDB BLAST hits indicate, with a low e-value (5e-27), that the function is unknown for this gene. HHPRED hits indicated that it could be a hydrolase, but the e-value is very high (55) and the % coverage was very low (48.4%). NCBI BLAST hits indicate a “hypothetical protein” with a low e-value of >1.7e-15. CDD had no relevant hits. /note=Transmembrane domains: TMHMM and TOPCONS do not predict any TMDs, so this is not a membrane protein. /note=Secondary Annotator Name: Nelson, Shiloh /note=Secondary Annotator QC: Thorough work on the annotation; great detail in your function call! 
I am in agreement with this annotation & all evidence considered CDS 38068 - 38970 /gene="31" /product="gp31" /function="hypothetical protein" /locus tag="GalacticEye_31" /note=Original Glimmer call @bp 38068 has strength 7.78; Genemark calls start at 38068 /note=SSC: 38068-38970 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_HELLO_30 [Gordonia phage Hello] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.343, -5.479471976039214, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_HELLO_30 [Gordonia phage Hello] ],,QAX95299,100.0,0.0 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF in pham 70203, downstream is NKF in pham 11109, just like in phages Newt and Luker /note=Primary Annotator Name: Likwong, Chloe /note=Auto-annotation: Glimmer and GeneMark predicted the same starting sites at Start@38068. /note=Coding Potential: The GeneMark Host map does not depict any coding potential. In the GeneMark Self Map, Start@38068 covers the coding potential present and there is no violation of the guiding principles. /note=SD (Final) Score: Start@38068 has a Final Score of -5.479 and a Z-score of 2.343. It is important to note, the Final Score of Start@38068 is not the most positive and the Z-score score is not the highest. /note=Gap/overlap: There is a 4bp overlap with the upstream gene–the gene might be part of an operon. /note=Phamerator: This gene is in pham 102242 as of the date 3/31/2022. There are 675 members, 66 of which are non-final drafts. GalacticEye is found in cluster CS with members like FelixAlejandro and Hello. There are multiple clusters present in this phamerator, including clusters like CR, B, and N among many others. Majority of the Final Draft genes list “minor tail protein” as the function. /note=Starterator: In pham 102242, GalacticEye does not have the “Most Annotated” start. It lists Start 173 @38068 with 22 MA’s. 
/note=Location call: The gene seems to be a real gene given that in the GeneMark Self map, Start@38068 covers the coding potential. It also has an overlap of 4 and has 22 MA’s done compared to the other potential Start sites that have a bigger gap from the upstream gene; hence, the location call is at Start @38068. It is important to note that Start@38068 does not have the most positive Final Score or the highest Z-score. /note=Function call: The function is unknown. In PhagesDB, several hits with e-values of 1e-177 listed no known function. Similarly, in NCBI BLASTP, majority of the hits list “hypothetical function;” the top three hits in NCBI BLASTP list a value >99% for %coverage, %identity, and %alignment. For HHpred, the e-value of 790 is a weak hit. /note=Transmembrane domains: No transmembrane domains detected by TMHMM or TOPCONS, indicating that the gene is not a membrane protein. /note=Secondary Annotator Name: Ruiz, Paola /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. As of 4/12/22, pham has changed to 102988 and Galactic Eye now lists Start 199 @38068 with 22 MA’s. CDS 38967 - 39215 /gene="32" /product="gp32" /function="hypothetical protein" /locus tag="GalacticEye_32" /note=Original Glimmer call @bp 38985 has strength 2.82; Genemark calls start at 38967 /note=SSC: 38967-39215 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein SEA_HELLO_31 [Gordonia phage Hello] ],,NCBI, q1:s1 100.0% 5.82219E-53 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.393, -4.867713808513585, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_HELLO_31 [Gordonia phage Hello] ],,QAX95300,100.0,5.82219E-53 SIF-HHPRED: SIF-Syn: NKF (pham 11109), upstream gene is NKF (pham: 101660), downstream is NKF (pham: 102988), similarly found in phage Luker and Newt. /note=Primary Annotator Name: Pramana, Martin /note=Auto-annotation: GeneMark and Glimmer did not agree on the same start site. 
GeneMark (Self and Host) calls start site 38967, while Glimmer calls start site 38985. /note=Coding Potential: There is reasonable coding potential in the ORF in the forward direction (located in the third frame). This coding potential does not include all possible start sites. This coding potential is found in both Glimmer and GeneMark. /note=SD (Final) Score: Start site 38985 has an SD score of -6.345, which is the most negative Final Score on PECAAN. This start site has a relatively low Z score of 1.666. Start site 38967 has an SD score of -4.868, which is the second lowest SD score on PECAAN. This start site has a Z score of 2.393, which is above 2 and better than that of the other sites. /note=Gap/overlap: Start site 38985 has an upstream gap of 14 bp, while start site 38967 has an upstream gap of -4 bp (a 4 bp overlap), which suggests that this gene may be part of an operon. Phages Guillaume and Jormungandr have a 20 bp gap, while phage Newt has a similar gap of 14 bp. /note=Phamerator: As of 4/3/2022 the gene is found in Pham 11109. It is conserved in cluster CS3 (Newt and Harambe) but also found in other clusters such as CR2 (Kabluna and MerCougar). /note=Starterator: Starterator reports that start site 15 corresponds to 38985 bp. However, it is conserved in only 1/70 non-draft members. Therefore the start site is adjusted to start site 12 (38967 bp), which was manually annotated in 33/70 non-draft members. The start site agrees with GeneMark but does not agree with Glimmer. /note=Location call: Based on the evidence above, this is a real gene and the start site is most likely 38967 bp. Starterator agrees with GeneMark. Starterator does not agree with Glimmer. /note=Function call: NKF. The top 2 results from both NCBI and PhagesDB BLAST state that this gene doesn't have a known function. The protein matches with phages Hello, Teal, and Luker have high query coverage (100%), high % identities (>98.78%), and low E-values (<2.99e-52). CDD shows no hits for this gene. 
HHpred has no good hits in the databank. The first match from HHpred has a high E-value (76), a low probability (38.2%), and low coverage (25.609%). Therefore, there are no relevant hits from CDD or HHpred. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, so it is not a transmembrane protein. /note=Secondary Annotator Name: Juarez, Sabrina /note=Secondary Annotator QC: QC complete. I agree with the primary annotator on the annotation and location call. Good job! Maybe include the number of manual annotations for start site 12 in the Starterator section, and don't forget to include the transmembrane domain section, even if no TMDs are present. For the synteny box, include the pham numbers for all genes. CDS 39217 - 39582 /gene="33" /product="gp33" /function="hypothetical protein" /locus tag="GalacticEye_33" /note=Original Glimmer call @bp 39217 has strength 3.84; Genemark calls start at 39217 /note=SSC: 39217-39582 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ANAMIKA_32 [Gordonia phage Anamika] ],,NCBI, q1:s1 100.0% 2.65614E-78 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.496, -4.077881745592325, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ANAMIKA_32 [Gordonia phage Anamika] ],,ATW61127,100.0,2.65614E-78 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Pay, Iona /note=Auto-annotation: Glimmer and GeneMark call the same start site (39217). /note=Coding Potential: Host-trained GeneMark displays a small amount of coding potential in the second frame from the bottom. Self-trained GeneMark displays several overlapping regions of coding potential in the first frame from the top and second frame from the bottom, which requires further investigation. This gene does consistently display synteny with other phages from CS3, including Guillaume and Harambe. /note=SD (Final) Score: The Z-score for this site is the best available, as it is closest to 3 (2.496). 
The final score for this site is the best available (-4.078), as all other values are smaller. /note=Gap/overlap: Gene starts with ATG, and has a small gap of 1bp. This is a reasonable gap. /note=Phamerator: This gene is conserved consistently in Phamerator, such as in other CS3 phages Neoevie and Harambe. /note=Starterator: Most annotated start site is site 93 (39217bp), which is called in 381/568 non-draft genes, and 99% of the time when present; GalacticEye calls this start site. /note=Location call: Taken together, the evidence suggests that this gene is a real gene, starting at 39217. Z scores and final scores are favourable, and supported by Phamerator. It shows some synteny with other phages. /note=Function call: NCBI BLASTp returned only hypothetical Gordonia phage proteins with no assigned functions. These proteins did have high % identity and % coverage. This is encouraging, as it confirms that this is a real gene, but not helpful for assigning function. HHPRED only returned a single non-draft hit -- a cryptic loci regulator protein – with low probability, similarity and % identity, indicating this is not a good fit. CDD returned no hits. /note=Transmembrane domains: TmHmm and TopCons returned no significant hits. CDS 39653 - 40090 /gene="34" /product="gp34" /function="holin" /locus tag="GalacticEye_34" /note=Original Glimmer call @bp 39653 has strength 10.04; Genemark calls start at 39653 /note=SSC: 39653-40090 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Gordonia phage RoyalG]],,NCBI, q1:s19 93.1034% 3.10233E-76 GAP: 70 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.76, -3.077027835973012, yes F: holin SIF-BLAST: ,,[membrane protein [Gordonia phage RoyalG]],,QRI45617,74.1176,3.10233E-76 SIF-HHPRED: Phage_holin_7_1 ; Mycobacterial 2 TMS Phage Holin (M2 Hol) Family,,,PF16081.8,42.7586,96.4 SIF-Syn: Just two genes upstream of Lysin A genes. Based on synteny, HHpred, and TMDs, calling holin. /note=3 TMDs in TMHMM, 4 in SOSUI. 
Just two genes upstream of Lysin A genes. /note=Primary Annotator Name: Cho, Emily /note=Auto-annotation: Both Glimmer and GeneMark call 39,653 as the start site. /note=Coding Potential: The host-trained GeneMark shows reasonable coding potential predicted within the putative ORF, and the chosen start site 39,653 covers all this coding potential. /note=SD (Final) Score: -3.077, The highest (best) value among 8 start site candidates /note=Gap/overlap: Upstream gap of 70, not drastically out of the reasonable range (<50bp). The length of the gene (438 bp) is also reasonable (>120bp). /note=Phamerator: Found in Pham 40732 as o Aug 11, 2022. The pham is in other members of the cluster CS but mostly found in different cluster (K, CQ). The function called for CS cluster was putative membrane protein or holin, while other clusters called holin or putative holin. /note=Starterator: The start number called the most often in the published annotations is 48, it was called in 107 of the 314 non-draft genes in the pham. GalacticEye did not have the most called start site. Two other common options: (Start: 39 @39599 has 13 MA`s; called 27% of time when present), (Start: 57 @39653 has 36 MA`s). 57 (39,653) is a reasonable start site, called 77% of time when present (all in CS phages). /note=Location call: The gene highly seems to be real based on the starterator and phamerator with a start site of 39,653 and good coding potential, all covered by the called start site. Although it does not show the longest ORF and has a large gap upstream, the gene also shows synteny with many other annotated genomes like Anamika and Harambe, and the gap seems to be conserved also. It is also the common ATG start site with the best final score and Z score of 2.76, which is close to 2. Although the start site was not the most conserved one in the pham, it was still called by some genomes in the same cluster and had 28 manual annotations. /note=Function call: Good BLASTp hits for both NKF and holin. 
There were no NCBI Conserved Domain Database hits. A few HHpred hits corresponded to holin and had probabilities greater than 90%, but they had e-values much greater than 10e-3 (0.03, 0.093) and very low % coverage (30-40%), so there were no meaningful hits. However, there are transmembrane domains and high structural similarity to holin (HHPred). NCBI BLAST showed some hits with low e-values and high coverage that called membrane protein. Therefore, the protein seems to be a membrane protein. /note=Transmembrane domains: 3 TMHs reported from TmHmm. 4 TMDs found by SOSUI. This makes sense, as HHpred predicts that this peptide query has a structure similar to holin. /note=Secondary Annotator Name: Patel, Rishi /note=Secondary Annotator QC: I agree that the start site is 39653. The coding potential is apparent between the called start site and stop site, and the gaps appear conserved. However, I disagree that Starterator is not informative. Starterator shows that the start site that was called is heavily conserved among other CS cluster phages, including those in CS3. Even though it is not the most called start site, GalacticEye does not have the most called start site. I also do not agree with the function call. I believe that this is a membrane protein of sorts. With three transmembrane domains called by TMHMM (enough to check as evidence, so do not forget to do that), this is evidence for a membrane protein or perhaps a holin. Additionally, NCBI BLAST calls membrane proteins with really low e-values as well. Thus, I think there is enough evidence to call this a membrane protein or a holin, since these are proteins of the membrane (more research needed). 
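Each record in this file carries a compact summary note (the one beginning with SSC:) next to the free-text evidence. As a rough illustration of how that summary could be read programmatically, the following sketch pulls out the coordinates, the gap, and the LO flag from the gene 34 summary. The helper name `parse_summary`, the output keys, and the regular expression are assumptions made for this example and are inferred from the notes in this file; they are not a documented PECAAN format.

```python
import re

# Hypothetical pattern: the field order (SSC, CP, SCS, ..., GAP, LO) is inferred
# from the summary notes in this file, not from a published specification.
SUMMARY_RE = re.compile(
    r"SSC:\s*(?P<start>\d+)-(?P<stop>\d+)\s+"
    r"CP:\s*(?P<cp>\w+)\s+"
    r"SCS:\s*(?P<scs>[\w-]+)"
    r".*?GAP:\s*(?P<gap>-?\d+)\s*bp gap\s+"
    r"LO:\s*(?P<lo>\w+)",
    re.DOTALL,
)


def parse_summary(note: str) -> dict:
    """Extract the numeric and yes/no fields from one SSC summary note."""
    m = SUMMARY_RE.search(note)
    if m is None:
        raise ValueError("no SSC summary found in note")
    return {
        "start": int(m["start"]),
        "stop": int(m["stop"]),
        "cp": m["cp"] == "yes",    # CP flag as written in the note
        "scs": m["scs"],           # e.g. "both", "both-gm", "genemark"
        "gap_bp": int(m["gap"]),   # negative values are overlaps
        "lo": m["lo"] == "yes",    # LO flag (longest ORF) as written in the note
    }


# Summary note for gene 34, copied from the record above (truncated after F:).
note = (
    "SSC: 39653-40090 CP: yes SCS: both ST: SS BLAST-Start: "
    "[membrane protein [Gordonia phage RoyalG]],,NCBI, q1:s19 93.1034% "
    "3.10233E-76 GAP: 70 bp gap LO: no RBS: Kibler 6, Karlin Medium, "
    "2.76, -3.077027835973012, yes F: holin"
)
summary = parse_summary(note)
print(summary)
```

The parsed values can then be compared against the free-text notes, for example to confirm that the Gap/overlap note (70 bp) and the Location call (not the longest ORF) agree with the summary line.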
CDS 40166 - 40486 /gene="35" /product="gp35" /function="hypothetical protein" /locus tag="GalacticEye_35" /note=Original Glimmer call @bp 40166 has strength 13.25; Genemark calls start at 40166 /note=SSC: 40166-40486 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp79 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 5.37009E-73 GAP: 75 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.939, -4.72126161785304, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp79 [Gordonia phage Woes] ],,YP_009273424,100.0,5.37009E-73 SIF-HHPRED: SIF-Syn: Downstream gene is lysin A, N-acetylmuramoyl-L-alanine amidase domain (pham 101314). This is similar to other cluster CS phages including Luker, Hello, Anamika, and Newt. /note=Primary Annotator Name: Enos, Alex /note=Auto-annotation: Glimmer and GeneMark both call 40166 as the start site. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. The chosen start site covers all coding potential. /note=SD (Final) Score: The Final Score of -4.721 was the best among the possible start sites. All others were more negative than -5. /note=Gap/overlap: The gap is 75 bp, which is a bit large, but it is conserved in other phages such as Anamika and Hello. There is also a start site with a 63 bp gap, but it has a much more negative Final Score and was therefore less desirable. /note=Phamerator: Pham 11085 as of 3/31/22. It is conserved and found in Anamika (CS) and Hello (CS). /note=Starterator: Start site 9 in Starterator was manually annotated in 40/48 non-draft genes in this pham. GalacticEye @40166 has 38 MAs. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the consistent evidence from Glimmer, GeneMark, and Starterator, this is a real gene and the most likely start site is 40166. /note=Function call: Unknown Function. 
All hits from phagesDB BLAST called unknown function with small E-values <6^-60. All NCBI BLAST hits also call "hypothetical protein" (100% coverage, 99%+ identity, and E-value <10^-73). HHpred also called an unknown protein but had very low probability and coverage. CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Patel, Sahaj /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 40483 - 41118 /gene="36" /product="gp36" /function="lysin A, N-acetylmuramoyl-L-alanine amidase domain" /locus tag="GalacticEye_36" /note=Original Glimmer call @bp 40483 has strength 5.12; Genemark calls start at 40483 /note=SSC: 40483-41118 CP: yes SCS: both ST: SS BLAST-Start: [lysin A, N-acetylmuramoyl-L-alanine amidase domain [Gordonia phage Teal] ],,NCBI, q1:s1 100.0% 3.07933E-155 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.024, -2.583959800616441, yes F: lysin A, N-acetylmuramoyl-L-alanine amidase domain SIF-BLAST: ,,[lysin A, N-acetylmuramoyl-L-alanine amidase domain [Gordonia phage Teal] ],,QDF16895,100.0,3.07933E-155 SIF-HHPRED: d.118.1.1 (A:1-157) N-acetylmuramoyl-L-alanine amidase PlyG {Anthrax bacillus (Bacillus anthracis) [TaxId: 1392]},,,d1yb0a1,70.6161,99.7 SIF-Syn: Lysin A, N-acetylmuramoyl-L-alanine amidase domain in pham 107652; upstream gene is NKF in pham 11085, downstream gene is lysin A in pham 3871, like in phage Luker. /note=Primary Annotator Name: Maraziti, Gabriela. /note=Auto-annotation: Glimmer and GeneMark both call the start at 40483. /note=Coding Potential: The gene has reasonable coding potential within the ORF and the start includes all typical and atypical coding potential. /note=SD (Final) Score: -2.584; this is the least negative score, and the Z-score is the highest at 3.024. /note=Gap/overlap: 4 bp overlap, indicating this gene is part of an operon. 
This start site also creates the longest possible ORF. /note=Phamerator: Pham 101314 as of 3/31/2022. The gene is conserved in many other members of the same cluster, CS, such as Adgers_35 and Anamika_35. This gene is also found in clusters DF and DM. The function call for this gene is a lysin A, N-acetylmuramoyl-L-alanine amidase domain and this function is largely conserved across Phamerator. /note=Starterator: 46/50 non-draft genes in the pham call start site 6, which corresponds to position 40483 in the GalacticEye genome. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this gene is real and has a start site at 40483. /note=Function call: Lysin A, N-acetylmuramoyl-L-alanine amidase domain. The top 3 non-draft PhagesDB BLAST hits call the lysin A, N-acetylmuramoyl-L-alanine amidase domain function (e-values <10^-126, score ~ 448), and the top 6 NCBI BLAST hits also call the same function (e-values <10^-154, score ~ 435) in genes from the same pham. HHPred has two hits for N-acetylmuramoyl-L-alanine amidase proteins (99% probability, e-value <10^-14, >70% coverage). CDD had one hit which was an N-acetylmuramoyl-L-alanine amidase domain hit (e-value <10^-8, 60% coverage). /note=Transmembrane domains: No TMDs predicted by TMHMM or TOPCONS, therefore it is not a membrane protein. /note=Secondary Annotator Name: Lin, Yuri /note=Secondary Annotator QC: I have QC'd and agree with the primary annotator. 
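The Gap/overlap and length statements in these records follow from simple coordinate arithmetic. The sketch below recomputes them for genes 35 to 37 from the CDS coordinates listed in this file. The function names are hypothetical, and the formula gap = start - upstream stop - 1 is an assumption inferred from the GAP values reported here (a negative result is read as an overlap); it is written for forward-strand genes only.

```python
def gap_bp(upstream_stop: int, start: int) -> int:
    """Bases between two forward genes; a negative value means an overlap."""
    return start - upstream_stop - 1


def orf_length_bp(start: int, stop: int) -> int:
    """Inclusive length of a gene given 1-based start and stop coordinates."""
    return abs(stop - start) + 1


# (gene number, start, stop) copied from the CDS lines for genes 34-37.
genes = [
    (34, 39653, 40090),
    (35, 40166, 40486),
    (36, 40483, 41118),
    (37, 41115, 41948),
]

# Pair each gene with the one immediately upstream of it.
for (_, _, prev_stop), (gene, start, stop) in zip(genes, genes[1:]):
    length = orf_length_bp(start, stop)
    print(
        f"gene {gene}: gap {gap_bp(prev_stop, start):+d} bp, "
        f"length {length} bp ({length // 3} codons, "
        f"multiple of 3: {length % 3 == 0})"
    )
```

Under these assumptions the computed gaps match the GAP fields in the summary notes (75 bp for gene 35, -4 bp for genes 36 and 37), and the 834 bp length quoted for gene 37 is reproduced.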
CDS 41115 - 41948 /gene="37" /product="gp37" /function="lysin A, protease M23 domain" /locus tag="GalacticEye_37" /note=Original Glimmer call @bp 41115 has strength 9.38; Genemark calls start at 41115 /note=SSC: 41115-41948 CP: yes SCS: both ST: SS BLAST-Start: [lysin A, protease M23 domain [Gordonia phage Harambe] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.646, -5.337115709971144, no F: lysin A, protease M23 domain SIF-BLAST: ,,[lysin A, protease M23 domain [Gordonia phage Harambe] ],,QAX94643,100.0,0.0 SIF-HHPRED: ToxR-activated gene (TagE); M23B family metallopeptidase, Heterodimer, HYDROLASE; 2.27A {Helicobacter pylori},,,5J1L_A,52.3466,98.4 SIF-Syn: Upstream gene another subunit of Lysin A in several other phages. /note=Primary Annotator Name: Nelson, Shiloh /note=Auto-annotation: The gene is 834 bp long, with coding potential in both glimmer and genemark. Both genemark and glimmer call the same start site, at 41115. As this start site is highly conserved, this shows that the gene has a greater chance of being ‘real.’ /note=Coding Potential: Glimmer and genemark show that there is coding potential, without violation of the basic guiding principles. All criteria are met — this is a real gene. The gene is 834 basepairs long, well over the 40 codon minimum length listed in the guiding principles, the gene shows synteny with non-draft phage genes harambe and neoevie. There are no switches in gene orientation. The genemark self annotation map shows that the gene is in the 3rd ORF in the forward direction.There is also coding potential in the 2nd ORF in the reverse direction, but this is extremely slight. /note=SD (Final) Score: The Final Score is -5.337 for the start site 41115. This is not the most negative score, however it is a very reasonable score — as it is more negative than -2. /note=Gap/overlap: The gap is -4 for the upstream gene. As this value is negative, this denotes an overlap of 4 bp. 
As -4 is within the range of -1 to -4, this denotes an operon. This gap is also present in the downstream gene. /note=Phamerator: This gene is in pham 3871 — as of 3/25/2022. All of the genes belong to the CS cluster. /note=Starterator: The start site of 41115 is highly conserved, in 60 of the 61 nondraft genes in this pham. This is the most annotated start site. /note=Location call: This is a real gene, as it has good coding potential, which is conserved in the phamerator. From the evidence collected, the start site is 41115. /note=Function call: lysin A,M23 domain — there are multiple phagesDB BLAST hits with the suggested function lysin A, protease M23 domain, with small e values of 1e-162. HHPRED has hits that correspond to unique SEA-PHAGES requirements for this gene. NCBI Blast has hits that largely call it as a lysin A, with e-values of 1e-145 or lower, showing 100% coverage and identity and alignment values of 80% or greater. /note=Transmembrane domains: zero — neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Perez, Joshua /note=Secondary Annotator QC: Amazing work! For Phamerator, did it turn any of similar functions? Could be worth noting. For gap/overlap, maybe include a couple phages that has a similar gap and denote conservation of this gap. For location call, include the evidence from Starterator as well. You have a lot of good evidence for this start site. This was great work, really really awesome job! 
/note=— Shiloh — secondary annotator`s comments considered on 4.13.22 — downstream gene in pham maps was also a lysin A, but the upstream gene was a different type of protein CDS 41954 - 42406 /gene="38" /product="gp38" /function="membrane protein" /locus tag="GalacticEye_38" /note=Genemark calls start at 41954 /note=SSC: 41954-42406 CP: yes SCS: genemark ST: SS BLAST-Start: [membrane protein [Gordonia phage Anamika] ],,NCBI, q1:s1 100.0% 1.02588E-104 GAP: 5 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.613, -3.4470622626190464, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Gordonia phage Anamika] ],,ATW61132,100.0,1.02588E-104 SIF-HHPRED: SIF-Syn: Function is membrane protein. The Pham number is 99978, upstream Pham number 3871 with function lysin A , downstream Pham number is 103694 and a membrane protein; just like in phages Hello and Harambe. /note=Primary Annotator Name: Ruiz, Paola /note=Auto-annotation: GeneMark only and calls start site at 41954. /note=Coding Potential: Coding potential spans only GeneMark Self-Trained. The hashes are accurate as there is an upward hash around 41954 and downward at 42406. Overall, the gene does have good coding potential. /note=SD (Final) Score: The final score is -3.447 and is the second best. The z score is 2.613 which is the best on PECAAN. /note=Gap/overlap: There is a 5 bp gap which is good. It is conserved in other phages such as Hello and Lahirium. There is no coding potential in the overlap that may be a new gene. /note=Phamerator: The pham number is 99978 as of 4/5/2022. It is conserved; found in Lahirium and Neoevie which are all in cluster CS3. /note=Starterator: Start site number is 35 which correlates to start 41954 bp for GalacticEye. Start 35 is the most called start number. It was called in 34 of the 163 non-draft genes in the pham which includes GalacticEye. /note=Location call: Based on the above evidence, this is a real gene with start site 41954; starterator agrees with GeneMark. 
/note=Function call: Function is membrane protein The top 12 phagesdb BLAST hits have unknown function (e value 3e-84, 100% identity, 100% positives). For NCBI BLAST, the top 6 hits predict membrane protein and have e value ranging from -104 to -64. CDD and HHPRED were not helpful in determining function. CDD had no hits and HHPRED top hits had poor e values and many were domain of unknown function. /note=Transmembrane domains: TMHMM and Topcons predicts 4 TMDs, based on this evidence and NCBI BLAST, this is likely a membrane protein. /note=Secondary Annotator Name: Shah, Aayushi /note=Secondary Annotator QC: The starterator analysis claims "21/163 times for cluster CS3" which is confusing as before it says there are total 163 non-draft genes in the pham. Otherwise I agree with this annotation and functional call. All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score. CDS 42394 - 42942 /gene="39" /product="gp39" /function="membrane protein" /locus tag="GalacticEye_39" /note=Original Glimmer call @bp 42499 has strength 4.98; Genemark calls start at 42613 /note=SSC: 42394-42942 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_ANAMIKA_38 [Gordonia phage Anamika] ],,NCBI, q1:s1 100.0% 9.22433E-130 GAP: -13 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.771, -3.0541649530135073, no F: membrane protein SIF-BLAST: ,,[hypothetical protein SEA_ANAMIKA_38 [Gordonia phage Anamika] ],,ATW61133,100.0,9.22433E-130 SIF-HHPRED: SIF-Syn: Membrane protein. This gene is Pham 99984 and is conserved in phages, Newt and Neovie. Genome architecture is broadly conserved in other phages such as Luker, Teal, and Anamika, with slight changes in gene gaps and overlaps, the upstream gene is listed as a membrane protein in Pham 99978 and the downstream gene is listed as a membrane protein in Pham 22494. /note=LOCATION NOTES FOR TRICKY GENE: Based on the evidence below, start site @ 42394. 
The location call for this gene is difficult to make because there is evidence that could support several start sites, including 42394, 42403, and 42499. Start 42394 provides the longest ORF among reasonable calls at 549 bp, and the most favorable z-score and final score of the reasonable calls. It also has a more likely start codon, ATG, than the other two. The 13 bp overlap is a little longer than typical but shows synteny with several other phages within this cluster. This start site also has the most manual annotations for this Pham, although it is just 2 more calls than 42403. /note=Primary Annotator Name: Juarez, Sabrina /note=Auto-annotation: Glimmer calls 42499 as the start site while GeneMark calls start site 42613. /note=Coding Potential: Coding potential is found in both Self- and Host-Trained GeneMark in the forward reading frame. All coding potential is covered by the site called by Glimmer (42499). The site called by GeneMark does not cover all of the atypical coding potential in GeneMark Self but does cover all typical coding potential in Self- and Host-Trained GeneMark. Start sites at 42394 and 42403 cover all coding potential in both Self- and Host-Trained GeneMark. /note=SD (Final) Score: The start site at 42499 begins with a TTG start codon and has a z-score of 1.884 and a final score of -4.977. The start site at 42403 begins with a TTG start codon and has a more favorable z-score, 2.571, and final score, -4.301. The start site at 42394 begins with an ATG start codon and has the most favorable scores for a reasonable overlap: a z-score of 2.771 and a final score of -3.054. /note=Gap/overlap: Start site 42613 has a 206 bp gap with the upstream gene, which is not supported by synteny with other phages. Start site 42499 has a 92 bp gap with the upstream gene, which is conserved in phages Luker and Newt. The start site at 42403 has a 4 bp overlap, and this slight overlap is conserved in phages Teal and Woes. 
The start site at 42394 has a 13 bp overlap, and this slight overlap is conserved in several phages in this cluster, including Hello and Anamika. /note=Phamerator: The Pham number as of 04/01/2022 is 99984. It is conserved in other phages within subcluster CS3 and is represented in 19 clusters. /note=Starterator: (as of 4/1/22) GalacticEye does not contain the most annotated start. Based on the Starterator report, three start sites are likely. Start 36 at 42394 has 22 manual annotations and was manually annotated 8 times in cluster CS3. Start 38 at 42403 has 15 manual annotations and was manually annotated 3 times in cluster CS3. Start 63 at 42499 has 3 manual annotations and was manually annotated 2 times in cluster CS3. /note=Function call: No Known Function. The top phagesDB BLAST hits have the function listed as function unknown in Gordonia phage Neoevie (E-value of e-101 and 100% identity) and Gordonia phage Lahirium (E-value of e-101 and 100% identity). The top five NCBI BLAST hits are listed as hypothetical proteins. The top result was Gordonia phage Anamika (100% coverage, 100% identity, and E-value of 9e-130). The following hits are also Gordonia phages (80%+ coverage, 98.6%+ identity, and E-values <6e-103). CDD and HHpred had no relevant hits. /note=Transmembrane domains: TMHMM predicts four transmembrane domains. Based on this evidence, this gene can be assumed to have real TMDs and can be listed as a “membrane protein.” /note=Secondary Annotator Name: Shah, Aayushi /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score, and an understandable NKF call. 
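The comparison of candidate start sites for gene 39 can be laid out side by side. The sketch below tabulates the three candidates discussed in the notes, using the scores and manual annotation counts quoted there, and derives ORF length and upstream gap from the coordinates. The `Candidate` class and the ranking by final score and then z-score are illustrative assumptions only; the annotators also weighed synteny, start codon, and Starterator evidence, which a simple sort does not capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    start: int
    codon: str
    z_score: float
    final_score: float
    manual_annotations: int


STOP = 42942           # stop coordinate of gene 39
UPSTREAM_STOP = 42406  # stop coordinate of gene 38

# Values copied from the SD (Final) Score and Starterator notes for gene 39.
candidates = [
    Candidate(42394, "ATG", 2.771, -3.054, 22),
    Candidate(42403, "TTG", 2.571, -4.301, 15),
    Candidate(42499, "TTG", 1.884, -4.977, 3),
]


def describe(c: Candidate) -> tuple:
    """Return (start, ORF length in bp, gap to upstream gene in bp)."""
    length = STOP - c.start + 1
    gap = c.start - UPSTREAM_STOP - 1  # negative means overlap
    return (c.start, length, gap)


rows = [describe(c) for c in candidates]
for c, (start, length, gap) in zip(candidates, rows):
    print(
        f"{start} {c.codon}: length {length} bp, gap {gap:+d} bp, "
        f"z {c.z_score}, final {c.final_score}, MAs {c.manual_annotations}"
    )

# Simplified ranking: least negative final score first, z-score as tie-breaker.
best = max(candidates, key=lambda c: (c.final_score, c.z_score))
print("highest-scoring candidate:", best.start)
```

With the values from the notes, this ranking selects 42394, which is also the candidate with the longest ORF (549 bp) and the most manual annotations.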
CDS 42935 - 43267 /gene="40" /product="gp40" /function="membrane protein" /locus tag="GalacticEye_40" /note=Original Glimmer call @bp 42935 has strength 4.79; Genemark calls start at 42935 /note=SSC: 42935-43267 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Gordonia phage SteveFrench] ],,NCBI, q1:s1 100.0% 8.80022E-49 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.674, -3.240215147495286, no F: membrane protein SIF-BLAST: ,,[membrane protein [Gordonia phage SteveFrench] ],,YP_009824805,84.5455,8.80022E-49 SIF-HHPRED: SIF-Syn: This gene is a membrane protein and belongs to pham 22494. The homologous genes in Anamika and Guillaume belong to the same pham. The upstream gene is in pham 99984 and the downstream gene is in pham 1172, as it is in Anamika and Guillaume. /note=Primary Annotator Name: Li, Shally /note=Auto-annotation: Both Glimmer and GeneMark call the start site to be 42935. /note=Coding Potential: There is strong coding potential in the self-trained GeneMark on the second forward reading frame. The proposed start site covers the entire high coding potential region. In the host-trained GeneMark, there is a narrow peak of coding potential in the middle of the gene in the same reading frame in the forward direction. This indicates that the gene is a forward gene. /note=SD (Final) Score: -3.240. This is the best reading score on PECAAN. /note=Gap/overlap: 8bp overlap (-8bp). This is the most feasible overlap/gap in PECAAN. /note=Phamerator: The pham is 22494 as of 4/3/22. There are 51 members of the pham, 38 of which are non-drafts, and all are in cluster CS like GalacticEye. /note=Starterator: Start number 3 is called by 38/38 of the non-draft genes in the pham. This is the start site with the most MAs for GalacticEye and corresponds to position 42935, which was agreed upon by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene with start site 42935. /note=Function call: Membrane protein. 
NCBI BLAST has one hit with the function of membrane protein with 67% identity, 100% coverage, and an e-value of 8.8e-49. TmHmm and TopCons both predict that there is a transmembrane domain. Phagedb BLAST and HHPRED did not have any feasible hits with a known function, and there were no hits in CDD. /note=Transmembrane domains: Both TmHmm and TopCons predict 1 transmembrane domain. /note=Secondary Annotator Name: Villarreal, Alexia /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS complement (43264 - 43509) /gene="41" /product="gp41" /function="membrane protein" /locus tag="GalacticEye_41" /note=Original Glimmer call @bp 43413 has strength 4.78; Genemark calls start at 43509 /note=SSC: 43509-43264 CP: yes SCS: both-gm ST: SS BLAST-Start: [membrane protein [Gordonia phage Anamika] ],,NCBI, q1:s1 100.0% 1.90869E-53 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.024, -2.970161406017234, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Gordonia phage Anamika] ],,ATW61135,100.0,1.90869E-53 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Patel, Rishi /note=Auto-annotation: Glimmer and GeneMark. They disagree on the start site. Glimmer calls the start at 43413 with a start codon of ATG, while GeneMark calls the start at 43509 also with a start codon of ATG. (In these situations, Glimmer is the more accurate program so that will be kept in account) /note=Coding Potential: The coding potential for this open reading frame (ORF) is only on the reverse strand which shows that it is a reverse gene. In the Host-Trained GeneMark, the coding potential is not apparent within the called Glimmer start site (43413) and stop site (43264) nor the called GeneMark start site (43509) and the stop site (43264). 
Additionally, the Self-Trained GeneMark also shows small peaks of typical coding potential in the region between the GeneMark start site (43509) and the stop site, and there appears to be atypical potential that fits between the stop site and the called GeneMark start site (43509). This is evidence that the gene is real (small amounts of typical potential means more evidence is needed but it is present), however, it may be evidence for the GeneMark start site. /note=SD (Final) Score: For the Glimmer called start site (43413), the final score and the Z score are not good. The Z score is below 2 at 1.634 and the final score is -5.441, both of which are not even close to the best scores for the called start sites. On the other hand, for the GeneMark called start site (43509), the final score and Z score are the best at -2.970 and 3.024 respectively. This is additional evidence for the GeneMark start site. /note=Gap/overlap: The gap between the stop site and the downstream gene (Gene #38) appears to be conserved across phages in the cluster (CS3) at a fairly lengthy overlap. However, the gap between the Glimmer start site and the upstream gene (Gene #40) does not appear to be conserved. The called start site shows a gap of 97 bp, however, when looking at phages of the same cluster, all of them show only a gap of 1bp with gene #40. For example, phage Anamika has a 1 bp gap with its respective upstream gene and these gaps are also shown in phages Hello and Guillaume. Thus, this may be an indication that the Glimmer called start site is not right. The start site called by GeneMark, however, does have a 1bp gap with the upstream gene, which is great evidence that 43509 is the correct start. Additionally, all of the phages that I looked at (Harambe, Hello, Guillaume, Hail2Pitt) showed synteny between their respective gene and gene #39 of GalacticEye, showing that this is likely a real gene since it appears conserved in the genomes of Cluster CS3 phages. 
/note=Phamerator: Gene is found in pham number 1172 as of 4/1/2022. The gene is conserved when looking into phages Harambe, Hello, and Guillaume, which are all in the same cluster (CS3) as GalacticEye, thus it appears conserved in Cluster CS3 (evidence for real gene). This pham also contains phages from both clusters CS3 and CS2. Phamerator does not call a function. /note=Starterator: The called start site (Glimmer) for my gene is Start site #10 (43413). Among the 47 members of the pham, of which 37 are non-draft, none of them call this start site. In fact, Start site #1, which is the called start site for GeneMark (43509), is called in 36/37 manual annotations of non-draft genes in the pham. Additionally, for the Cluster CS3 specifically, Start site #1 is called or manually annotated for 12/13 members of this cluster. The other call is Start site #6, so it is not even Start Site #1. Thus, there appears to be overwhelming evidence for the start site called by GeneMark (43509) over the start site called by Glimmer (43413). /note=Location call: Considering the evidence above, the Glimmer start site (43413) is not the right start site for the gene. It has one of the worst Z scores and Final scores. It is not manually annotated at all for the entire cluster (CS3) nor is it manually annotated for the entire pham. Additionally, the coding potential seems to extend past the start site. Finally, the gap with the upstream gene (gene #40) is not conserved among other phages within the cluster. On the other hand, the GeneMark start site (43509) has the best Z score and Final score. It is also manually annotated in 36/37 non-draft phages in the pham. The coding potential fits within the start and stop site, and the gap with the upstream gene (gene #40) is conserved (1bp gap) among other phages in the cluster. Thus, the start site will be altered to the GeneMark call – 43509. 
(LORF and reasonable gene length) /note=Function call: PhagesDB Blast does come up with significant hits with e-values of 10^-45, however, the functions are unknown for all of these hits. NCBI Blast, on the other hand, produces significant hits that actually call a function. NCBI Blast calls the gene as a membrane protein with significant e-values of 10^-53 to 10^-38. It shows that this gene relates significantly to an equivalent membrane protein gene in Gordonia phages Newt, Woes, Hello, etc. Neither HHpred nor CDD calls a significant function, however, not all genes have useful hits in these two programs. The fact that NCBI Blast calls the function of membrane protein with such significance is evidence for that function. Evidence is furthered via TMHMM and TOPCONS analyses of transmembrane domains. Thus, the function of this gene is likely a membrane protein. /note=Transmembrane domains: TMHMM does provide a single transmembrane prediction and TOPCONS also calls transmembrane helices. Thus, it is confirmed that this gene is likely a membrane protein, and this does agree with the functional evidence gathered above. /note=Secondary Annotator Name: Araque, Colette /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. Note: May want to add whether the chosen start site results in the LORF and/or a reasonable gene length. 
CDS complement (43511 - 44713) /gene="42" /product="gp42" /function="RecB-like exonuclease/helicase" /locus tag="GalacticEye_42" /note=Original Glimmer call @bp 44713 has strength 14.0; Genemark calls start at 44713 /note=SSC: 44713-43511 CP: yes SCS: both ST: SS BLAST-Start: [Cas4 family exonuclease [Gordonia phage Anamika] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.103, -2.338663114094478, yes F: RecB-like exonuclease/helicase SIF-BLAST: ,,[Cas4 family exonuclease [Gordonia phage Anamika] ],,ATW61136,100.0,0.0 SIF-HHPRED: Exonuclease V; HYDROLASE; HET: EDO; 2.5A {Homo sapiens},,,7LW7_A,66.75,99.8 SIF-Syn: /note=Primary Annotator Name: Patel, Sahaj /note=Auto-annotation: GeneMark and Glimmer both call the start site at 44713. Additionally, it is a gene that runs in the reverse direction. /note=Coding Potential: Yes. Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. High coding potential is found in GeneMark Self, but only low coding potential is found in GeneMark Host. The GeneMark Self result therefore supports this being a real gene. /note=SD (Final) Score: -2.339, which was the highest Final Score. The Z score is 3.103, which is the highest Z score. /note=Gap/overlap: The gap value is 175, which signifies no overlap and the possibility of the gene not being part of an operon. This overlap is conserved in other phages (compared to Hello and Harambe, amongst others), so it, therefore, makes sense. There is a gap present after the gene which is about 100 base pairs long, and there is no atypical activity present in the gap. The gap is not present in other complete viral genomes (such as Hello and Harambe), which questions the existence of the gap, and warrants further investigation (the possible insertion of a gene). /note=Phamerator: 1172. Date 2/04/2022. It is conserved, found in Hail2Pitt (CS), Harambe (CS), and Woes (CS). 
/note=Starterator: Start site 10 in Starterator was manually annotated in 50/52 non-draft genes in this Pham. Start 10 is 44713 in GalacticEye. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 44713. /note=Function call: RecB-like exonuclease/helicase. HHpred evidence for both functions. Secondary option is Cas4 family exonuclease, but since helicase present in HHpred, calling RecB. /note=Transmembrane domains: TMHMM and TOPCONS predict no transmembrane domains (0 predicted). /note=Secondary Annotator Name: Bharadwaj, Shreya /note=Secondary Annotator QC: I agree with the primary annotator's location and function call. For the auto-annotation, I don't think you need to add the stop site coordinate because that is a given. All you need is the start site called by Glimmer and GeneMark. For the final score, you are looking for the one that is the least negative and for the z-score you want the one that is the closest to 2. I would edit that in your "final score" section. The gap/overlap section is a little confusing because first you are talking about a gap and then an overlap, so I wasn't sure which you meant. CDS complement (44713 - 44814) /gene="43" /product="gp43" /function="WhiB family transcription factor" /locus tag="GalacticEye_43" /note= /note=SSC: 44814-44713 CP: yes SCS: neither ST: NI BLAST-Start: [WhiB family transcription factor [Gordonia phage Diabla]],,NCBI, q1:s1 100.0% 2.07911E-13 GAP: 74 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.884, -4.835658240213919, yes F: WhiB family transcription factor SIF-BLAST: ,,[WhiB family transcription factor [Gordonia phage Diabla]],,QOP65370,96.9697,2.07911E-13 SIF-HHPRED: Transcriptional regulator WhiB1; Iron-sulfur cluster, transcription regulation, redox-sensing, TRANSCRIPTION; HET: MSE, SF4; 1.85A {Mycobacterium tuberculosis H37Rv},,,6ONU_C,93.9394,99.3 SIF-Syn: /note=Added gene. 
Strong CP on GM-self. None on GM-host. Fills gap. Gene also found in phages Hello, Anamika. CDS complement (44889 - 45311) /gene="44" /product="gp44" /function="hypothetical protein" /locus tag="GalacticEye_44" /note=Original Glimmer call @bp 45311 has strength 2.85; Genemark calls start at 45311 /note=SSC: 45311-44889 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp70 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 2.40579E-93 GAP: 32 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.024, -2.523003374675015, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp70 [Gordonia phage Woes] ],,YP_009273433,100.0,2.40579E-93 SIF-HHPRED: SIF-Syn: This gene demonstrates synteny and is also NKF in other non-draft phages (Harambe, Hello, Newt, etc.). The gene upstream is NKF, pham 17482, and shows synteny with other published phages (Hello, Newt, etc.). The gene downstream is a Cas4 Family Exonuclease, pham 2853, that is conserved in published phages (Hello, Newt, etc.). However, other published phages have an additional gene downstream (NKF, pham 21374) that GalacticEye does not currently have (Hello, Newt, Harambe, Guillaume, etc.). /note=Primary Annotator Name: Lin, Yuri /note=Auto-annotation: Glimmer and GeneMark agree on a start site of 45311 bp (start codon GTG). /note=Coding Potential: There is good coding potential within the ORF and all coding potential is contained within the autoannotated start site. /note=SD (Final) Score: The start site at 45311 bp has the least negative Final Score of -2.523, which supports the autoannotated start site. The Z-score is 3.024 for this start site, which is high and further supports the autoannotated start site. /note=Gap/overlap: The start site at 45311 has a gap of 32 bp, which is the smallest of all the candidate start sites. This supports the autoannotated start site, considering it is a gene in a series of forward genes and is not adjacent to any orientation switches. 
Gaps for other candidate start sites are >50 bp. /note=Phamerator: As of 4/5/22, the pham number is 97788. The gene is conserved in 61 other phages, of which 15 are drafts. All other phages that have this gene with the same pham number are in cluster CS like GalacticEye. /note=Starterator: Start site 3 in Starterator was manually annotated in 31/46 non-draft genes in this pham and is called 87.2% of the time when present. Start site 3 is at position 45311 bp on GalacticEye. This is strong support for the autoannotated start site. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 45311. /note=Function call: NKF - All of the top PhagesDB BLAST hits reported no known function, and NCBI BLAST only returned hypothetical proteins. HHpred only returned one extremely unlikely hit with an e-value of 150, and CDD returned no results. /note=Transmembrane domains: TMHMM and TOPCONS predicted 0 TMHs. /note=Secondary Annotator Name: Likwong, Chloe /note=Secondary Annotator QC: I agree with the primary annotator. Glimmer and GeneMark both call Start@45311; this start also has the most positive Z-score and Final Score, as well as a small gap. Coding potential is also covered by the start site. Starterator lists Start@45311 with 31 MAs, while other candidates did not have any. 
CDS complement (45344 - 45505) /gene="45" /product="gp45" /function="hypothetical protein" /locus tag="GalacticEye_45" /note=Original Glimmer call @bp 45505 has strength 8.97; Genemark calls start at 45505 /note=SSC: 45505-45344 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp69 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 4.60862E-31 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.095, -4.744199388040119, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp69 [Gordonia phage Woes] ],,YP_009273434,100.0,4.60862E-31 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Perez, Joshua /note=Auto-annotation: The gene was called by both Glimmer and GeneMark, which agree on a start site at 45505 with start codon ATG. /note=Coding Potential: Yes, this gene has reasonable coding potential predicted within the first reverse reading frame. The chosen start site covers all of this coding potential in Self-Trained GeneMark, but does not cover all of the coding potential in Host-Trained GeneMark. /note=SD (Final) Score: -4.744; this is the least negative final score present. The Z-score for this start is 2.095, which is above the threshold of 2. /note=Gap/overlap: The Glimmer start has a gap of 0 bp, which seems to be conserved in the Pham Maps. This short gap is most likely due to the gene being part of an operon (it sits between many other genes). The gap seems to be conserved in other phages, including Anamika, Neovie, and Guillaume. /note=Phamerator: Pham 17482 (as of 4/05/22) is present in other members of cluster CS. I used Anamika, Diabla, and 5 others from cluster CS for comparison. /note=Starterator: The start site number is 9, and the coordinate base pair number is 45505. The most annotated start site (10) is not found in GalacticEye. Start site 9 (45505) is called 21/49 times (100% when present). 
/note=Location call: The overall evidence shows that this gene is most likely real, starting at start 9 (bp 45505), due to high coding potential in its first reverse reading frame and evidence from Starterator. The Glimmer start site of 45505 is the most likely start site, as it gives the longest ORF and the smallest gap. This is most likely a real gene as it has decent coding potential. /note=Function call: The top 3 NCBI hits sorted by e-value had low e-values (0) and a good % identity (around 95%). For example, phage Nimi13 has a 98.11% identity and an e-value of 0. However, all of these hits are hypothetical proteins, which must be confirmed with other programs. The first hits were NKF. There is no data available from CDD. HHpred also did not provide good evidence due to the high e-values. Overall, the lack of evidence toward this gene’s function classifies it as NKF. /note=Transmembrane domains: Neither TOPCONS nor TMHMM predicts any TMDs, which indicates that this gene is not a membrane protein; the function remains unknown. /note=Secondary Annotator Name: Pramana, Martin /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. In the Starterator section, mention another non-draft phage that shares start site 9 and its coordinate; for example, phage Anamika has start site 9 at coordinate 45822 bp. In the function call section, please mention that there are no good hits from HHpred due to high e-values. 
CDS complement (45506 - 45646) /gene="46" /product="gp46" /function="hypothetical protein" /locus tag="GalacticEye_46" /note=Original Glimmer call @bp 45640 has strength 3.17 /note=SSC: 45646-45506 CP: yes SCS: glimmer-cs ST: NI BLAST-Start: [hypothetical protein SEA_HARAMBE_45 [Gordonia phage Harambe] ],,NCBI, q1:s1 100.0% 6.1578E-24 GAP: 16 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.152, -5.374608596945346, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_HARAMBE_45 [Gordonia phage Harambe] ],,QAX94651,100.0,6.1578E-24 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shah, Aayushi /note=Auto-annotation: Called in Glimmer only at start site 45640. /note=Coding Potential: No coding potential is found in the GMHost but the GMSelf contains some coding potential given the start site of 45640. /note=SD (Final) Score: The final scores of all the start sites are below -5 for any gene longer than the 120 bp minimum. With the start site of 45640, the final score is -6.395, which is low, and the Z-score is 1.642, which is below the threshold of 2. The start site of 45646 has a Z-score of 2.152, the only one over 2, and a final score of -5.375, the least negative final score of the sites over 120 bp. /note=Gap/overlap: For the autoannotated start site, there is a gap of 22 bp, and for the start site of 45646, there is a gap of 16 bp, which is better. The longest ORF has an overlap of 23 bp, which is too large. /note=Phamerator: The pham as of 04/01/22 is 17867. A lot of other phages in this cluster have this pham present, as seen in Adgers_47, Hello_45, and Butterball_45. The function called in Phamerator is not given. /note=Starterator: Diverse pham in terms of manual choices. Start site 4 (45646) is called in 10/43 genes; 52% of the time when present. Start site 6 (45640) is called most often (13/43 non-draft genes; 36% of the time when present). Start 2 @45685 has 10 MAs, but it results in a -23 bp gap (a 23 bp overlap), which is large. 
Based on the stronger Z-score and final score for start site 4, that seems the most likely start site. /note=Location call: Start site 45646. It has the best final score and Z-score, one of the smallest gaps, and has strong evidence in Starterator. /note=Function call: No known function. There are no hits in PhagesDB BLAST. In the NCBI BLAST, the top 5 calls sorted by e value called it as a hypothetical protein, with e values of 6.1578e-24 for all, and 100% coverage, pointing to no known function. No CDD hits were found. HHpred has no hits with an e value below 0.011, and had no hits with strong coverage. There are no TMDs. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, so it is not a membrane protein. /note=Secondary Annotator Name: Pay, Iona /note=Secondary Annotator QC: Nicely done! Love the reasoning you've provided in each section; it's very clear for the reader and illuminates an otherwise confusing data set. CDS complement (45663 - 46382) /gene="47" /product="gp47" /function="DNA methyltransferase" /locus tag="GalacticEye_47" /note=Original Glimmer call @bp 46382 has strength 10.3; Genemark calls start at 46382 /note=SSC: 46382-45663 CP: yes SCS: both ST: SS BLAST-Start: [DNA methylase [Gordonia phage Anamika] ],,NCBI, q1:s1 100.0% 1.19357E-177 GAP: 122 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.593, -4.398340293094632, no F: DNA methyltransferase SIF-BLAST: ,,[DNA methylase [Gordonia phage Anamika] ],,ATW61141,100.0,1.19357E-177 SIF-HHPRED: c.66.1.11 (A:) Methyltransferase mboII {Moraxella bovis [TaxId: 476]},,,d1g60a_,95.8159,100.0 SIF-Syn: /note=Primary Annotator Name: Villarreal, Alexia /note=Auto-annotation: Glimmer and GeneMark start sites are both at 46382. The start codon is ATG. /note=Coding Potential: Coding potential of this ORF is only on the reverse strand, which indicates it is a reverse gene. There is strong coding potential indicated by GeneMark Self and strong coding potential from GeneMark Host. 
Extensive synteny of this gene with other phages further supports that it is a real gene. /note=SD (Final) Score: The final score for this gene is -4.398, which is not the best value listed but is still reasonable compared with the other candidates, indicating that this start site is plausible. /note=Gap/overlap: There is a large gap of 122 bp; however, it is the smallest gap among the possible start sites, and a similar gap is observed in other phages with the conserved gene. This start site also gives the longest ORF, which further supports it. /note=Phamerator: Pham 97122 as of 04/5/2022. It is conserved in other phages within the same cluster CS such as Anamika_8, Adgers_48, and Newt_47. /note=Starterator: Start site 136 in Starterator was manually annotated in 53/239 non-draft genes in this pham (called 100% of the time when present). This is also in agreement with the site predicted by Glimmer and GeneMark. /note=Location call: Confidently confirm this gene’s start site at 46382. This start site is kept based on the evidence called for by the guidelines: it covers the coding potential, it is the start predicted by both Glimmer and GeneMark, it shows high synteny with other phages, it gives the most reasonable (smallest) gap of 122 bp among all candidate start sites, its start codon ATG is one of the more common initiation codons, and it has a good Final score of -4.398 and a good z-score of 2.593. GalacticEye does not call the most annotated start site; however, based on the above evidence, this is a real gene and the most likely start site is 46382. 
/note=Function call: According to the data obtained by PhagesDB Blastp and NCBI Blastp, as well as HHpred, the suggested function of the gene is DNA methyltransferase, as there is promising evidence with high query coverage (100%), high percentage identity (99%-100%), and extremely low E-values such as 1e-177, 2e-177, and 7e-177. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Cho, Emily /note=Secondary Annotator QC: I have QCed this location call and agree with the primary annotator. NOTE: you cannot use “suggested start” for the Starterator dropdown menu when the start site is not the most conserved start site. For the transmembrane domains section, be sure to follow the annotation manual and explain why having no TMDs makes sense in terms of the protein’s function. Fill in synteny box. Good explanation of large gap upstream and analysis of Starterator. CDS complement (46505 - 46774) /gene="48" /product="gp48" /function="hypothetical protein" /locus tag="GalacticEye_48" /note=Original Glimmer call @bp 46774 has strength 19.05; Genemark calls start at 46774 /note=SSC: 46774-46505 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp66 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 1.01755E-47 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.948, -3.2535385006449706, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp66 [Gordonia phage Woes] ],,YP_009273437,100.0,1.01755E-47 SIF-HHPRED: SIF-Syn: This gene encodes a protein of unknown function, which is conserved in phages Teal and Luker of the CS3 subcluster. The upstream gene is in pham 98658 and has no known function, which is the same as the upstream gene in the genomes of both Teal and Luker. The downstream gene is in pham 97122 and is a DNA methylase, which is the same as the downstream gene in the genomes of both Teal and Luker. 
/note=Primary Annotator Name: Araque, Colette /note=Auto-annotation: Both Glimmer and GeneMark called the same start site of 46774. Start site 46774 has a corresponding start codon of ATG. /note=Coding Potential: The gene has coding potential predicted within the putative ORF according to both GeneMark Self and Host. The chosen start site covers all the corresponding coding potential. /note=SD (Final) Score: For the 46774 start site, there is an SD score of -3.254 and a Z score of 2.948. -3.254 is strong and is the best SD score as it is the least negative compared to the other scores. 2.948 is very strong since it’s above 2 and is the highest compared to the other potential start sites. /note=Gap/overlap: There is a 2 bp gap, which is very small and thus highly ideal. There is no coding potential in this gap nor would there be any space for another gene to exist in this gap. This gap is the smallest compared to the other start sites. There are two other potential start sites, but both have worse SD and Z scores and have larger gaps. The gene with the start site of 46774 is 270 bp long, which is acceptable as it is well over the 120 bp minimum and is the LORF. /note=Phamerator: As of 4/3/2022 this gene is found in Pham 56719. There are 51 total members in this pham (41 non-drafts and 10 drafts). This pham is present in other members of the same cluster (CS) which my phage belongs to. Some phages used for comparison are Beaver, Butterball, and Newt. The Phams database did not have a function called for this gene of the GalacticEye phage nor did it have a function called for any other genes within this pham. /note=Starterator: There is a reasonable start site choice that is conserved among the members of pham 56719. The number of this conserved start site is 8 which corresponds with the 46774 base pair coordinate of my phage (site predicted by Glimmer and GeneMark). 
This start site is found in 47 of 51 genes in pham 56719, and when it is present, it is called 100% of the time. There are 37 manual annotations of this start site. This evidence highly supports the start site predicted by both Glimmer and GeneMark. /note=Location call: Gene 45 of the GalacticEye phage appears to be a real gene due to it being conserved in Phamerator and it having considerable coding potential. The 46774 start site seems the most likely as it has strong SD and Z scores, has the smallest gap/overlap, results in the LORF, encompasses all the coding potential, is called 100% of the time when present in genes of the same pham, and has 37 MA’s. /note=Function call: Gene 45 most likely encodes a protein of unknown function. There are multiple phagesDB BLAST hits that suggest function unknown and those hits have a strong e value of 4e-42. There is also an NCBI BLAST hit that calls function unknown with a strong identity percentage (100%) and e value (1e-47). /note=Transmembrane domains: TMHMM and TOPCONS did not predict any TMDs, meaning that this is not a transmembrane protein. /note=Secondary Annotator Name: Enos, Alexander (Alex) /note=Secondary Annotator QC: I have QCed this location call and agree with the primary annotator. CDS complement (46777 - 47004) /gene="49" /product="gp49" /function="hypothetical protein" /locus tag="GalacticEye_49" /note=Original Glimmer call @bp 47004 has strength 17.24; Genemark calls start at 47004 /note=SSC: 47004-46777 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp65 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 2.2127E-45 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.506, -3.5923013867905444, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp65 [Gordonia phage Woes] ],,YP_009273438,100.0,2.2127E-45 SIF-HHPRED: SIF-Syn: NKF. The downstream gene is in pham 56719 and has NKF similar to Anamika. 
The upstream gene is in pham 98173 and encodes a MazG-like nucleotide pyrophosphohydrolase similar to Neovie. /note=Primary Annotator Name: Bharadwaj, Shreya /note=Auto-annotation: GeneMark and Glimmer both call a start site of 47004. /note=Coding Potential: The coding potential of this ORF is on the reverse strand, which means that this is a reverse gene. I found coding potential for this gene in both GeneMark Self and Host. /note=SD (Final) Score: The final score was -3.592. This was the best final score according to PECAAN. /note=Gap/overlap: -4, which is well within the guidelines and indicates an overlap of 4 bp, which is fairly good. An overlap of 4 bp is indicative of an operon, which is highly favorable. /note=Phamerator: 98658 was the pham of the gene as of 4/5/22. It has also been found in phages Hail2Pitt_47 and Harambe_48. /note=Starterator: Start site 4 was manually annotated 37 times in non-draft genomes. Start site 4 corresponds to start coordinate 47004 in GalacticEye, which agrees with the Glimmer and GeneMark auto-annotation. /note=Location call: Based on the evidence specified above, this is a real gene with a start site of 47004. /note=Function call: NKF. All of the PhagesDB Blast hits indicate with a low e-value (2e-36) that the function is unknown for this gene. HHPRED hits indicated that it could be a VbhA antitoxin but the e-value is very high (11-29) and the %coverage was very low (24-34.7%). NCBI BLAST hits indicate a “hypothetical protein” with low e-values of 9.87e-31 or lower. CDD had no relevant hits. /note=Transmembrane domains: TOPCONS and TMHMM do not call any transmembrane domains, so this is not a membrane protein. /note=Secondary Annotator Name: Maraziti, Gabriela /note=Secondary Annotator QC: I have QC'ed this function call and agree with the first annotator both on the start site and the function call. Notes: The 4bp overlap indicates that this gene is part of an operon, which is highly favorable. 
Also, an e-value of 2e-36 as referred to in the function call section is rather low, not high. Finally, for the transmembrane section, I believe you mean TOPCONS and TMHMM do not call any transmembrane domains. CDS complement (47001 - 47504) /gene="50" /product="gp50" /function="MazG-like nucleotide pyrophosphohydrolase" /locus tag="GalacticEye_50" /note=Original Glimmer call @bp 47504 has strength 10.96; Genemark calls start at 47504 /note=SSC: 47504-47001 CP: yes SCS: both ST: SS BLAST-Start: [MazG-like nucleotide pyrophosphohydrolase [Gordonia phage Harambe] ],,NCBI, q1:s1 100.0% 2.79297E-120 GAP: 10 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.024, -2.970161406017234, yes F: MazG-like nucleotide pyrophosphohydrolase SIF-BLAST: ,,[MazG-like nucleotide pyrophosphohydrolase [Gordonia phage Harambe] ],,QAX94655,100.0,2.79297E-120 SIF-HHPRED: MazG-like ; MazG-like family,,,PF12643.10,57.485,98.6 SIF-Syn: MazG-like nucleotide pyrophosphohydrolase, downstream gene is NKF in pham 98658, upstream is RusA-like resolvase in pham 55747, just like in phages Hello and Anamika /note=Primary Annotator Name: Likwong, Chloe /note=Auto-annotation: Glimmer and GeneMark predicted the same start site, Start@47504. /note=Coding Potential: Via GeneMark, Start@47504 covers the coding potential found in both the host and self maps, and there is no violation of the guiding principles. /note=SD (Final) Score: Final Score of -2.970 and a Z-score of 3.024 for the start site @47504. The values for the Final Score and Z-Score for Start@47504 are the most positive compared to the other start site candidates. /note=Gap/overlap: There is a 10bp gap with the upstream gene. A 10bp gap is small and cannot contain another gene, since a functional gene must be at least 120bp long. /note=Phamerator: This gene is in pham 98173 as of the date 3/31/2022. There are 138 members, 30 of which are non-final drafts. GalacticEye is found in cluster CS with members like GMA7 and Ekhein. 
There are multiple clusters present in this pham, including clusters like EG and DF among many others. The majority of the Final Draft genes list “MazG-like nucleotide pyrophosphohydrolase” as the function. /note=Starterator: In pham 98173, GalacticEye has the “Most Annotated” start, which is Start site 46 @47504 that has 37 MAs. /note=Location call: The gene seems to be a real gene given that Start@47504 covers the coding potential, has a gap of 10bp, has 37 MAs, and has the most positive Z-score and Final Score compared to the other start site candidates; hence, the location call is Start@47504. /note=Function call: The function listed is MazG-like nucleotide pyrophosphohydrolase. In PhagesDB, several hits with e-values of 2e-96 listed MazG-like nucleotide pyrophosphohydrolase as the function. Similarly, in NCBI BLASTP, the majority of the hits are strong, with e-values as low as 2.79e-120, and list MazG-like nucleotide pyrophosphohydrolase as the function; the top two hits in NCBI BLASTP list a value >99% for %coverage, %identity, and %alignment. /note=Transmembrane domains: No transmembrane domains detected by TMHMM or TOPCONS, indicating that the gene product is not a membrane protein. /note=Secondary Annotator Name: Nelson, Shiloh /note=Secondary Annotator QC: terrific work! very thorough! 
I agree with the primary annotator and have vetted evidence checks. CDS complement (47515 - 48084) /gene="51" /product="gp51" /function="RusA-like resolvase (endonuclease)" /locus tag="GalacticEye_51" /note=Original Glimmer call @bp 48084 has strength 6.41; Genemark calls start at 48084 /note=SSC: 48084-47515 CP: yes SCS: both ST: SS BLAST-Start: [RusA-like resolvase [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 3.83748E-137 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.718, -3.0856802691973226, yes F: RusA-like resolvase (endonuclease) SIF-BLAST: ,,[RusA-like resolvase [Gordonia phage Woes] ],,YP_009273440,100.0,3.83748E-137 SIF-HHPRED: RusA ; Endodeoxyribonuclease RusA,,,PF05866.14,67.1958,99.8 SIF-Syn: RusA-like resolvase, upstream gene is MazG-like nucleotide pyrophosphohydrolase, downstream is nucleoside deoxyribosyltransferase, similarly found in phage Jormungandr. /note=Primary Annotator Name: Pramana, Martin /note=Auto-annotation: Both GeneMark (Self and Host) and Glimmer call the same start site, 48084. /note=Coding Potential: There is reasonable coding potential in the ORF in the reverse direction (located in the sixth frame). This coding potential includes all possible start sites and is found in GeneMark Self, but not in GeneMark Host. /note=SD (Final) Score: -3.086. It is the best Final Score on PECAAN. Start site 48084 also has the best Z score of 2.718. /note=Gap/overlap: The upstream gap is -8 bp, i.e. an overlap of 8 bp with the upstream gene. This overlap is reasonable because it is conserved in other phages such as Harambe and Anamika. This start site results in the longest ORF. /note=Phamerator: As of 3/31/2022 the gene is found in Pham 55747. It is conserved in cluster CS3 (Anamika and Harambe) but also found in other clusters such as CS2 (Diabla and RoyalG). /note=Starterator: Start site 29 is conserved in 18/52 non-draft members. Start site 29 has a position of 48116 bp in Jormungandr. 
The start site agrees with both Glimmer and GeneMark. GalacticEye does not have the most annotated start site, start 30, for pham 55747. /note=Location call: Based on the evidence above, this is a real gene and the start site is most likely 48084 bp. Starterator agrees with both Glimmer and GeneMark. /note=Function call: RusA-like resolvase. The top 2 matches from PhagesDB BLASTp both list the function as RusA-like resolvase. Both hits have high % identity (100%), low E values (1e-108), and high query coverage. Similarly, the 2 hits from NCBI BLASTp call the same function of RusA-like resolvase, with low E values (<1.5278e-101), reasonable % identity (>76.72%), and high query coverage. CDD resulted in 1 match, which had low % coverage (24.8677%), low % probability (0.00699), and a high E-value (0.00699). Therefore there are no significant hits from CDD. The top 3 matches from HHpred all have high probability (>99.7), high % coverage (>67.1958%), and low E-values (<2e-16). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, so it is not a transmembrane protein. /note=Secondary Annotator Name: Ruiz, Paola /note=Secondary Annotator QC: I have QC’ed this location call and do not agree with the first annotator. Coding potential spans the entire length in GeneMark Self-Trained but not in Host. I would mention that this start site has the best z score at 2.718. There is an 8 bp overlap, not a gap! The start number called most often is 30, which is not present in GalacticEye. GalacticEye has start number 29, which corresponds to start site 48084. You list location call twice, delete one. You also mention that the start site is 47515; this is the stop site. Given all the above evidence, the start site is most likely 48084. Starterator agrees with both Glimmer and GeneMark. 
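The GAP values recorded in the SSC notes can be reproduced from the CDS coordinates alone. The following is a minimal Python sketch of that arithmetic, not part of PECAAN or any annotation tool; the `CDS` tuple and the `gap_between` helper are hypothetical names introduced for illustration, and the sketch assumes the features are on the same strand and listed in ascending coordinate order.

```python
from typing import NamedTuple


class CDS(NamedTuple):
    """One CDS feature; coordinates are 1-based and inclusive (hypothetical helper type)."""
    gene: str
    left: int   # smaller coordinate (the stop end for a complement CDS)
    right: int  # larger coordinate (the start for a complement CDS)


def gap_between(lower: CDS, higher: CDS) -> int:
    """Number of bases strictly between two neighbouring features.

    A negative result is an overlap of that many bases.
    """
    return higher.left - lower.right - 1


# Coordinates copied from the CDS lines of genes 49 to 52 in this record.
genes = [
    CDS("49", 46777, 47004),
    CDS("50", 47001, 47504),
    CDS("51", 47515, 48084),
    CDS("52", 48077, 48442),
]

gaps = {f"{a.gene}/{b.gene}": gap_between(a, b) for a, b in zip(genes, genes[1:])}
for pair, gap in gaps.items():
    label = "overlap" if gap < 0 else "gap"
    print(f"genes {pair}: {abs(gap)} bp {label}")
```

The results agree with the recorded GAP fields: a 4 bp overlap for genes 49/50, a 10 bp gap for genes 50/51, and an 8 bp overlap for genes 51/52.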
CDS complement (48077 - 48442) /gene="52" /product="gp52" /function="nucleoside deoxyribosyltransferase" /locus tag="GalacticEye_52" /note=Original Glimmer call @bp 48442 has strength 8.92; Genemark calls start at 48442 /note=SSC: 48442-48077 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp62 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 4.63798E-83 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.725, -3.9787615777791516, yes F: nucleoside deoxyribosyltransferase SIF-BLAST: ,,[hypothetical protein BH793_gp62 [Gordonia phage Woes] ],,YP_009273441,100.0,4.63798E-83 SIF-HHPRED: c.23.14.1 (A:9-160) Nucleoside 2-deoxyribosyltransferase {Trypanosome (Trypanosoma brucei) [TaxId: 5691]},,,d2f62a2,84.2975,99.7 SIF-Syn: The function of this gene is nucleoside deoxyribosyltransferase, which is conserved in phages Lahirium and Luker, both of which are in subcluster CS3. Downstream gene is a RusA-like resolvase (endonuclease) in Pham 55747, as it is in both Lahirium and Luker. Upstream gene is a DNA Pol III sliding clamp (beta) in Pham 100851, which is also the same in Lahirium and Luker. /note=Primary Annotator Name: Pay, Iona /note=Auto-annotation: Glimmer and GeneMark both call the start site at 48442. This site begins with ATG, a common start codon. /note=Coding Potential: Host-trained GeneMark displays coding potential in the fourth frame, as does the self-trained GeneMark. It consistently displays synteny with other phage genomes. /note=SD (Final) Score: Start site 48442 has the lowest final score (-3.979), and the highest Z score (2.725), indicating that this is likely the correct start site. This start site also presents the longest ORF at 366 bp. /note=Gap/overlap: This gene does not display a shift in direction, and has a 1 bp overlap, indicating that there may be an operon at this location. /note=Phamerator: This gene is a member of Pham 97723. Phamerator shows small overlaps with other adjacent genes, potentially indicating an operon. 
/note=Starterator: Start 49 (48442) has 23 manual annotations, and was manually annotated for this phage’s cluster (CS3) 13 times. This start is also called almost 70% of the time when present, lending credence to this start site. /note=Location call: All evidence suggests that this is a real gene starting at 48442: favourable Z and final scores, conservation in Phamerator, and a high number of manual annotations at this start site. /note=Function call: Top NCBI BLAST hits all describe hypothetical Gordonia phage proteins, only some of which have a tentative function listed as nucleoside deoxyribosyltransferase. There is a high % of identity and coverage (100% each for top hits, which are hypothetical), but this alone is not enough to ascribe function. CDD does not elucidate function either, returning only 2 results (both with low % identity and alignment). However, HHPred returns many draft and confirmed results for nucleoside deoxyribosyltransferases, most meeting or exceeding the thresholds of high probability, low E values and high scores. With this in mind, we could tentatively label this protein a nucleoside deoxyribosyltransferase. /note=Transmembrane domains: TMHMM does not predict any transmembrane domains, nor does TOPCONS. /note=Secondary Annotator Name: Juarez, Sabrina /note=Secondary Annotator QC: QC complete, I agree with the primary annotator on the location and function calls. Specify the values for the hits on NCBI. Don't forget to check the boxes and complete the drop-down menus for evidence, as well as the synteny box. 
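ORF lengths quoted in these notes follow directly from the inclusive coordinates, and the same arithmetic gives a quick sanity check on a start coordinate. The sketch below is illustrative only; `orf_length`, `is_in_frame`, and `MIN_GENE_BP` are hypothetical names, and the 120 bp minimum is the guideline value cited elsewhere in these notes. The coordinate 48422 is used purely as an example of a value that cannot share a reading frame with the stop at 48077.

```python
MIN_GENE_BP = 120  # minimum gene length cited in these notes


def orf_length(left: int, right: int) -> int:
    """Length in bp of a feature spanning left..right, both ends inclusive."""
    return right - left + 1


def is_in_frame(left: int, right: int) -> bool:
    """True when the span is a whole number of codons."""
    return orf_length(left, right) % 3 == 0


# Gene 52: CDS complement (48077 - 48442)
length_52 = orf_length(48077, 48442)
codons_52 = length_52 // 3  # includes the stop codon

print(length_52, codons_52, length_52 >= MIN_GENE_BP)
print(is_in_frame(48077, 48442))  # called start
print(is_in_frame(48077, 48422))  # example of an out-of-frame coordinate
```

For gene 52 this gives 366 bp (122 codons including the stop), matching the ORF length stated in the SD score note.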
CDS complement (48442 - 49575) /gene="53" /product="gp53" /function="DNA polymerase III sliding clamp (Beta)" /locus tag="GalacticEye_53" /note=Original Glimmer call @bp 49575 has strength 14.43; Genemark calls start at 49575 /note=SSC: 49575-48442 CP: yes SCS: both ST: SS BLAST-Start: [DNA polymerase III sliding clamp [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 0.0 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.787, -3.019720185458058, no F: DNA polymerase III sliding clamp (Beta) SIF-BLAST: ,,[DNA polymerase III sliding clamp [Gordonia phage Woes] ],,YP_009273442,100.0,0.0 SIF-HHPRED: DNA polymerase III, beta subunit; DNA clamp, Structural Genomics, Joint Center for Structural Genomics, JCSG, Protein Structure Initiative, PSI-BIOLOGY, TRANSFERASE; HET: ACT, MSE, EDO, PGR; 2.26A {Eubacterium rectale},,,3T0P_A,98.6737,100.0 SIF-Syn: DNA polymerase III sliding clamp (Beta). Upstream gene is nucleoside deoxyribosyltransferase (pham 97723, 3/31/22), downstream is NKF (pham 12505, 3/31/22), just like in phage Luker. Many other final genomes have NKF of pham 97723 (3/31/22) upstream. /note=Primary Annotator Name: Cho, Emily /note=Auto-annotation: Both Glimmer and GeneMark called 49,575 as the start site. /note=Coding Potential: The host-trained GeneMark’s reverse frame shows reasonable coding potential predicted within the putative ORF, and the chosen start site 49,575 covers all this coding potential. /note=SD (Final) Score: -3.020, the third highest (best) value among 32 start site candidates. Candidates with higher final scores have unreasonable gaps (787, 922 bp). /note=Gap/overlap: Upstream gap of 1 bp, which is within the reasonable range (<50bp), and the start yields the longest ORF. The length of the gene (1134 bp) is also reasonable (>120bp). /note=Phamerator: Found in Pham 100851 as of 31st March, 2022. The pham is found only in other members of cluster CS and a few singletons. 
The function called for all of the CS cluster members was DNA polymerase III sliding clamp, while one singleton called DnaN. /note=Starterator: The start number called the most often in the published annotations is 3; it was called in 46 of the 46 non-draft genes in the pham. Starterator called the suggested start. Start site number 3 (bp 49,575) is a reasonable start site that is conserved in other pham members and is the most manually annotated start number. /note=Location call: The gene is very likely real based on Starterator (which showed the most conserved and manually annotated start site) and Phamerator, with a start site of 49,575 and good coding potential, all covered by the called start site. The gene also shows synteny with many other annotated genomes like Anamika and Harambe. The start site uses the common start codon GTG and has the third best final score and a Z score of 2.787, which is above 2. The start site also yields the longest open reading frame. /note=Function call: The top 7 BLASTp hits from PhagesDB, sorted by e-value (0.0), suggested no known function with a score of 751 bits, and they showed perfect query coverage and % identity. However, the next several hits with high identities (99%) and a score of 750 bits called the function DNA polymerase III sliding clamp with an e-value of 0.0. The top 3 BLASTp hits from NCBI, also sorted by e-value (0.0), suggested the function to be DNA polymerase III sliding clamp, and they showed 100% query coverage and high % identity (>99.20%). Hits below (lower % identity) still called DNA polymerase III sliding clamp. Therefore, the NCBI and PhagesDB BLASTp results showed that the expected function would be DNA polymerase III sliding clamp based on amino acid sequence. This function call also had the highest frequency in the subcluster CS2. There were no NCBI Conserved Domain Database hits. 
On HHpred, there were several hits with an e-value much smaller than 10e-3, 100% probability, and coverage greater than 90% that called DNA polymerase III clamp and beta subunit clamp. Therefore, the gene seems to have the function of DNA polymerase III sliding clamp. /note=Transmembrane domains: No TMHs reported from TMHMM. No TMDs reported from TOPCONS. This protein is likely a DNA polymerase III sliding clamp, which does not function near a membrane. It would be part of the DNA replication machinery, so it would not need a transmembrane domain. /note=Secondary Annotator Name: Li, Shally /note=Secondary Annotator QC: I agree with the above location and function calls. Remember to check evidence boxes for phagesdb. CDS complement (49577 - 49774) /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="GalacticEye_54" /note=Original Glimmer call @bp 49774 has strength 8.11; Genemark calls start at 49774 /note=SSC: 49774-49577 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_NEOEVIE_53 [Gordonia phage Neoevie]],,NCBI, q1:s1 100.0% 1.76847E-33 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.441, -3.8079208184165134, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_NEOEVIE_53 [Gordonia phage Neoevie]],,QAX95414,93.8462,1.76847E-33 SIF-HHPRED: SIF-Syn: The downstream gene is DNA polymerase III sliding clamp (Beta)(pham 100851), upstream gene is DnaQ-like (DNA polymerase III subunit) (pham 95256). This is similar to other cluster CS phages including Luker, Hello, Anamika, and Newt. /note=Primary Annotator Name: Enos, Alex /note=Auto-annotation: Glimmer and GeneMark both call 49774 as the start site. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. The chosen start site covers all coding potential. 
/note=SD (Final) Score: The Final score of -3.808 was the second best for the possible start sites, with the best being -3.747. However, the start site with the better final score had an undesirable gap/overlap. /note=Gap/overlap: The gap is 2 bp, which is a very desirable gap, and is conserved in other phages such as Anamika and Hello. /note=Phamerator: Pham 12505 as of 4/5/22. It is conserved and found in Anamika (CS) and Hello (CS). /note=Starterator: Start site 12 in Starterator was manually annotated in 12/12 non-draft genes in this pham. Start site 12 is 49774 in GalacticEye. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the consistent evidence from Glimmer, GeneMark, and Starterator, this is a real gene and the most likely start site is 49774. /note=Function call: Unknown Function. All hits from phagesDB BLAST called unknown function with small E-values <5^-26. All NCBI BLAST hits also call "hypothetical protein" (100% coverage, 90%+ identity, and E-value <10^-33). HHpred also called an unknown protein but had very low probability and coverage. CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Patel, Rishi /note=Secondary Annotator QC: I agree with these annotations. The start site of 49774 is heavily conserved among other genes within the pham. Additionally, the coding potential is within the putative ORF and the gaps are conserved. The function call is also correct, as there is no evidence for any sort of function from any program; thus, there is no known function at this time. 
CDS complement (49777 - 50403) /gene="55" /product="gp55" /function="DnaQ-like (DNA polymerase III subunit)" /locus tag="GalacticEye_55" /note=Original Glimmer call @bp 50403 has strength 12.53; Genemark calls start at 50403 /note=SSC: 50403-49777 CP: yes SCS: both ST: SS BLAST-Start: [DnaQ-like DNA polymerase III subunit [Gordonia phage Guillaume] ],,NCBI, q1:s1 100.0% 4.63561E-153 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.024, -2.523003374675015, yes F: DnaQ-like (DNA polymerase III subunit) SIF-BLAST: ,,[DnaQ-like DNA polymerase III subunit [Gordonia phage Guillaume] ],,QAX94337,100.0,4.63561E-153 SIF-HHPRED: DNA polymerase III subunit epsilon; DNA editing Proofreading Exonuclease Polymerase, DNA Binding protein; 6.7A {Escherichia coli K12},,,5M1S_D,95.1923,99.8 SIF-Syn: DnaQ-like (DNA polymerase III subunit) in pham 105386; upstream gene is a helix-turn-helix DNA binding domain in pham 105936; downstream gene is NKF in pham 12505, like in phage Harambe. /note=Primary Annotator Name: Maraziti, Gabriela /note=Auto-annotation: Glimmer and GeneMark both call the start at 50403. /note=Coding Potential: The gene has reasonable coding potential within the ORF for both the self and host models, and the start includes all typical and atypical coding potential. /note=SD (Final) Score: -2.523; this is not the most negative score, but the Z-score is the highest by far at 3.024. /note=Gap/overlap: 4 bp overlap, indicating this gene is part of an operon. /note=Phamerator: Pham 95256 as of 4/5/2022. The gene is conserved in many other members of the same cluster, CS, such as Adgers_59 and Anamika_54. This gene is also found in cluster DF and a couple singletons. The most common function call for this gene is DnaQ-like DNA polymerase III subunit, and this function is largely conserved across Phamerator for cluster CS. /note=Starterator: 45/52 non-draft genes in the pham call start site 32, which corresponds to position 50403 in the GalacticEye genome. 
This start is called 100% of the time when it is present. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this gene is real and has a start site at 50403. /note=Function call: DnaQ-like (DNA polymerase III subunit). The first 13 non-draft phagesdb BLAST hits call this function (e-values <10^-119, score >416). The top 3 NCBI BLAST hits call the same function (e-values <10^-150, ~100% identity, ~100% aligned) in genes from the same pham. HHPred has several strong hits for the same function (~100% probability, >95% coverage, e-value <10^-16). CDD has one strong hit for the same function (e-value 10^-5, 85% coverage); most other hits corresponded to various exonuclease domains, which makes sense for the given function. /note=Transmembrane domains: No TMDs predicted by TMHMM or TOPCONS, therefore it is not a membrane protein. /note=Secondary Annotator Name: Patel, Sahaj /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS complement (50400 - 50594) /gene="56" /product="gp56" /function="helix-turn-helix DNA binding domain" /locus tag="GalacticEye_56" /note=Original Glimmer call @bp 50594 has strength 10.68; Genemark calls start at 50594 /note=SSC: 50594-50400 CP: yes SCS: both ST: NI BLAST-Start: [helix-turn-helix DNA binding protein [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 5.75042E-39 GAP: 267 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.957, -2.6623718267059564, yes F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding protein [Gordonia phage Woes] ],,YP_009273445,100.0,5.75042E-39 SIF-HHPRED: Regulatory protein cox; helix-turn-helix, DNA binding, VIRAL PROTEIN; 2.401A {Enterobacteria phage P2},,,4LHF_A,84.375,99.1 SIF-Syn: helix-turn-helix DNA binding protein; pham 67437 as of 3.25.22, since changed to pham 103045 as of 4.1.22 1.) 
for 103045: upstream gene is 99535, downstream gene is 95256, akin to phage Bianmat 2.) for 103045: upstream gene is 99535, downstream gene is 95256, akin to phage Lidong 3.) for 103045: upstream gene is 99535, downstream gene is 95256, akin to phage Minos. Synteny filed 4.8.22 /note=Primary Annotator Name: Nelson, Shiloh /note=Auto-annotation: The gene is 195 bp long, with coding potential in both Glimmer and GeneMark. Both GeneMark and Glimmer call the same start site, at 50594. As this start site is highly conserved, the gene has a greater chance of being ‘real.’ /note=Coding Potential: Glimmer and GeneMark show that there is coding potential, without violation of the basic guiding principles. All criteria are met: this is a real gene. The gene is 195 base pairs long, well over the 40 codon minimum length listed in the guiding principles, and the gene shows synteny with non-draft phages Anamika and Teal. There are no switches in gene orientation. The GeneMark self annotation map shows that the gene is in the 3rd ORF in the reverse direction. /note=SD (Final) Score: The Final Score is -2.662 for the start site 50594. This is a strong score, as it is close to 0; the least negative Final Score is the best. /note=Gap/overlap: The gap is 267 bp for the downstream gene (gap not conserved in other phages). /note=Phamerator: This gene is in pham 67437 as of 3/25/2022. All of the genes belong to the CS cluster. /note=Starterator: The start site of 50594 is highly conserved, in all 60 of the 60 non-draft genes in pham 67437. /note=Location call: This is a real gene, as it has good coding potential and is conserved in Phamerator. From the evidence collected, the start site is 50594. /note=Function call: helix-turn-helix DNA binding protein. There are multiple phagesDB BLAST hits with the suggested function helix-turn-helix DNA binding protein, with small e-values of 6e-31. HHPRED has hits that correspond to unique SEA-PHAGES requirements for this gene. 
NCBI BLAST has hits that largely call it a helix-turn-helix DNA binding protein (with e-values of 6e-16 and 2e-11 for the first 2 hits and about 90% coverage). There are 2 CDD hits. Both CDD hits list the function as a DNA binding domain, with nearly 75 percent coverage. /note=Transmembrane domains: zero. Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. More specifically, it is a helix-turn-helix DNA binding protein. /note=Secondary Annotator Name: Lin, Yuri /note=Secondary Annotator QC: Make sure to select dropdowns under the Starterator and "All GM Coding Capacity". For final score, the best score is the least negative, not the most. Gap seems really large; maybe add if it's conserved in other phages? For the Starterator, what was the start site number? Did CDD return any hits? Even if not, add a brief note saying so. Lastly, don't forget to check evidence for the PhagesDB/NCBI BLASTs and HHpred in PECAAN (and CDD if applicable). /note=Shiloh: secondary annotator's QC addressed as of 4.13.22 and applicable edits/revisions made. CDS complement (50862 - 51161) /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="GalacticEye_57" /note=Original Glimmer call @bp 51161 has strength 10.58; Genemark calls start at 51161 /note=SSC: 51161-50862 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp57 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 1.01722E-65 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.76, -3.1379842619144376, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp57 [Gordonia phage Woes] ],,YP_009273446,100.0,1.01722E-65 SIF-HHPRED: SIF-Syn: Function unknown. The Pham number is 99535, downstream Pham number 103045 and a helix-turn-helix DNA binding protein, upstream Pham number is 21372 and a DNA helicase. Pham is conserved just like in phages Teal and Newt. 
/note=Primary Annotator Name: Ruiz, Paola /note=Auto-annotation: Called by both Glimmer and GeneMark at 51161. /note=Coding Potential: For both GeneMark Host and Self-Trained, the coding potential in this ORF is mainly on the reverse strand, indicating that this is a reverse gene. It shows an upward hash around 51161 and a downward hash at 50862 for both Host and Self-Trained. In Self-Trained and Host GeneMark, the typical and atypical coding potential also spans the region between the start and stop sites. Overall, the gene has good coding potential. /note=SD (Final) Score: The final score is -3.138 and the z score is 2.76, which are the best scores on PECAAN. /note=Gap/overlap: There is a 4 bp overlap, indicating that it may be part of an operon. It is conserved in other phages such as Hello and Guillaume. There is no coding potential in the overlap that may be a new gene. /note=Phamerator: The pham number is 99535 as of 4/5/2022. It is conserved; found in Anamika and Hail2Pitt, both of which are in subcluster CS3. /note=Starterator: Start site number is 1, which corresponds to start site 51161 bp for GalacticEye. Start 1 is the most called start number. It was called in 13/13 non-draft genes in the pham. /note=Location call: Based on the above evidence, this is a real gene with start site 51161. Starterator agrees with Glimmer and GeneMark. /note=Function call: Function is unknown. The top 8 phagesdb BLAST hits have unknown function (e value 2e-50, 100% identity, 100% positives). For NCBI BLAST, there were only 3 hits and all were hypothetical proteins. The e values were 1e-65, 2e-65 and 1e-64; all >98% identity and positives. CDD and HHPRED were not helpful in determining function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Perez, Joshua /note=Secondary Annotator QC: I have QC'ed this gene and agree with the primary annotation. 
I have a couple of suggestions: For coding potential, mention that it is the 1st RF (reverse) to be a bit more clear which ORF it is. Great job on gap explanation. If Phamerator turned out any functions, those could be useful to note. For Starterator, it could also be helpful to note what % of the time it was called. For location call, give a bit more detail about what shows this: Starterator most annotated start site, high coding potential in 1st RF reverse, etc. For the synteny box, make sure to input the upstream and downstream functions as well. Good job! CDS complement (51158 - 52570) /gene="58" /product="gp58" /function="DNA helicase" /locus tag="GalacticEye_58" /note=Original Glimmer call @bp 52570 has strength 11.98; Genemark calls start at 52699 /note=SSC: 52570-51158 CP: yes SCS: both-gl ST: SS BLAST-Start: [DNA helicase [Gordonia phage Teal] ],,NCBI, q1:s1 100.0% 0.0 GAP: 73 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.948, -2.6013996449736907, yes F: DNA helicase SIF-BLAST: ,,[DNA helicase [Gordonia phage Teal] ],,QDF16916,100.0,0.0 SIF-HHPRED: Helicase SWR1; Chromatin, Remodeller, ATPase, Histone, NUCLEAR PROTEIN; HET: ADP;{Saccharomyces cerevisiae (strain ATCC 204508 / S288c)},,,6GEJ_M,93.8298,100.0 SIF-Syn: DNA helicase. This gene is in Pham 21372 and is conserved in phages Anamika and Luker. Genome architecture is conserved in other phages such as Newt and Neoevie; the upstream gene is in Pham 15176 and the downstream gene is in Pham 99535. /note=Primary Annotator Name: Juarez, Sabrina /note=Auto-annotation: Glimmer calls 52570 as the start site while GeneMark calls the start site 52699. /note=Coding Potential: Good coding potential is found in both Self- and Host-Trained GeneMark in the first reverse reading frame. Coding potential is covered by the site called by GeneMark, 52699. The start site called by Glimmer, 52570, does not cover all of the coding potential in either GeneMark Self or Host. 
/note=SD (Final) Score: Start site 52699 has the third-best z-score, 2.174, and the fourth best final score, -4.367. This site provides a longer ORF of 1542 bp. Start site 52570 has the best z-score, 2.948, and the best final score, -2.601. This call site does not result in the longest ORF but is 1413 bp. /note=Gap/overlap: Start site 52699 has a 56 bp overlap with the upstream gene. Genes do not typically overlap by more than a few bp. This significant overlap was allowed in the final annotation of Gordonia phage Woes. Start site 52570 has a 73 bp gap with the upstream gene that is conserved in all other non-draft phages within this cluster including, Anamika and Newt. /note=Phamerator: The Pham number as of 04/01/2022 is 21372. It is conserved in other phages within subcluster CS3 and is represented in 9 other clusters. In cluster CS, this Pham is listed as a DNA helicase or helicase protein for most phages within this cluster. /note=Starterator: The most annotated start site is not present in this gene. Starterator predicts Start 35 (52570), found in 34/74 genes in this Pham and is called about 50% of the time. Start 35 (52570) has 14 manual annotations and Start 22 (52699) has 1 manual annotation, which as mentioned previously is phage Woes. /note=Location call: Based on the evidence, this is a real gene and likely starts at 52570. Despite not including all initial coding potential at the beginning of this gene, start site 52570 predicted by Glimmer seems to have more evidence in terms of synteny with other phages and history with starterator manual annotation calls. Start site 52570 has the best combination of scores, results in a long ORF, 1413 bp, though not the longest, and has a typical start codon, GTG. /note=Function call: DNA Helicase. The top non-draft phagesDB BLAST hits have the function listed as DNA helicase in Gordonia phage Teal (E-value of 0 and 100% identity) and Gordonia phage Newt (E-value of 0 and 100% identity). 
The top seven NCBI BLAST hits are listed as DNA helicase. These results list Gordonia phages, with the top hit in Gordonia phage Teal (100% coverage, 99%+ identity, an E-value of 0). The top hit in CDD is for a Superfamily II DNA/RNA helicase with 94.68% coverage and an E-value of 0, although the identity (14.09%) is rather low. The top two hits in HHPRED are a Nuclear protein STH1/NPS1 (94.3% coverage, 100% probability, and E-value of 0) and a Helicase SWR1 (93.8% coverage, 100% probability, and E-value of 0). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Perez, Joshua /note=Secondary Annotator QC: I have QC'ed this gene and agree with the primary annotation. A few suggestions: For coding potential, denote what RF it is (I believe 1st reverse?). Good job on gap conservation. If Phamerator returned some functions, those might be helpful to include (only if it did). Great job overall! CDS complement (52644 - 52925) /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="GalacticEye_59" /note=Original Glimmer call @bp 52925 has strength 4.29; Genemark calls start at 52925 /note=SSC: 52925-52644 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp55 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 1.0242E-60 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.021, -4.54905795997258, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp55 [Gordonia phage Woes] ],,YP_009273448,100.0,1.0242E-60 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Li, Shally /note=Auto-annotation: Both Glimmer and GeneMark call the start site to be 52925. /note=Coding Potential: There is strong coding potential in the self-trained GeneMark in the second reading frame of the reverse direction; however, the proposed start of the gene does not have high coding potential. There is very little coding potential in the host-trained GeneMark. 
/note=SD (Final) Score: -4.549. This is the second best final score on PECAAN. However, the best final score on PECAAN would result in an overlap of 28 bp, which is not very likely. /note=Gap/overlap: 4 bp overlap (-4 bp). This indicates that the gene may be a part of an operon. /note=Phamerator: The pham is 15176 as of 4/4/22. There are 47 members of the pham, 37 of which are non-drafts. All members of the pham belong to cluster CS like GalacticEye. /note=Starterator: Start number 6 is called in 34/37 non-draft genes in the pham. This corresponds to start site 52925 in GalacticEye, which is called by both Glimmer and GeneMark. The start site with the second most MAs is start number 3, corresponding to 52931 in GalacticEye. However, this start site has a low Z-score (0.817) and final score (-7.157) in PECAAN, making it unlikely to be the correct start site for GalacticEye. /note=Location call: Based on the above evidence, this is a real gene with start site 52925. /note=Function call: NKF. There are no hits in PhagesDB BLAST with a known function, and there are no feasible HHPRED hits. All HHPRED hits have extremely high e-values (>9.7). NCBI BLAST also has no hits with a known function, and there are no CDD hits. /note=Transmembrane domains: There are no TMDs as detected by TMHMM and TOPCONS, indicating that this is not a membrane protein. /note=Secondary Annotator Name: Shah, Aayushi /note=Secondary Annotator QC: I agree with this annotation and functional call. All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score, and an understandable NKF call.
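The gap/overlap values in these records can be checked by hand from the feature coordinates. The following is a minimal sketch, assuming 1-based inclusive coordinates as written in the CDS lines; the names Cds, length_bp, and gap_bp are hypothetical helpers for illustration and are not part of PECAAN or any annotation tool.

```python
from typing import NamedTuple


class Cds(NamedTuple):
    """One CDS feature, using 1-based inclusive coordinates (assumed)."""
    gene: str
    left: int   # smaller coordinate, as written in the CDS line
    right: int  # larger coordinate, as written in the CDS line


def length_bp(cds: Cds) -> int:
    """Length of the feature in base pairs (both ends included)."""
    return cds.right - cds.left + 1


def gap_bp(lower: Cds, higher: Cds) -> int:
    """Bases between two neighbouring features; negative means overlap."""
    return higher.left - lower.right - 1


# Coordinates copied from the CDS lines for genes 59, 60, and 61.
gp59 = Cds("59", 52644, 52925)
gp60 = Cds("60", 52922, 53278)
gp61 = Cds("61", 53329, 54393)

for lower, higher in [(gp59, gp60), (gp60, gp61)]:
    print(f"gene {lower.gene}: {length_bp(lower)} bp, "
          f"gap to gene {higher.gene}: {gap_bp(lower, higher)} bp")
```

With these coordinates, the 4 bp overlap reported for gene 59 and the 50 bp gap reported for gene 60 both follow from the same subtraction, and a length that is a multiple of 3 is a quick sanity check on the start and stop coordinates.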
CDS complement (52922 - 53278) /gene="60" /product="gp60" /function="hypothetical protein" /locus tag="GalacticEye_60" /note=Original Glimmer call @bp 53212 has strength 4.87; Genemark calls start at 53212 /note=SSC: 53278-52922 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_ANAMIKA_59 [Gordonia phage Anamika] ],,NCBI, q1:s1 100.0% 1.10852E-81 GAP: 50 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.053, -4.482213237254482, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ANAMIKA_59 [Gordonia phage Anamika] ],,ATW61154,100.0,1.10852E-81 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Patel, Rishi /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 53212. Both call the start codon as ATG. /note=Coding Potential: The coding potential for this open reading frame (ORF) is only on the reverse strand which shows that it is a reverse gene. In the Host-Trained GeneMark, the coding potential is not apparent within the called start site (53212) and stop site (52922), in fact, it does not appear to show coding potential in the region at all. The Self-Trained GeneMark, however, shows coding potential in the region. The typical coding potential fits within the range of the called start site and the stop site, however, it is worth noting that the atypical coding potential extends outside of the range of the start site (this is slight evidence against the called start site). However, it is clear that the gene appears real from the analysis of the Self-Trained GeneMark. /note=SD (Final) Score: The final score is the best option of all the start sites listed (-3.598). Additionally, the Z score is the second best out of all of the called start sites (2.474) – the best one being 2.87. The scores for the called start site are pretty good, however, more evidence is still needed to come to a conclusion on the start site. 
/note=Gap/overlap: The gap between the stop site and the downstream gene (Gene #56) appears to be conserved across phages in the cluster (CS3) at -4, most likely indicating an operon. However, the gap between the start site and the upstream gene (Gene #58) does not appear to be conserved in all of the phages. The called start site shows a gap of 116 bp (large for a compact phage genome), however, when looking at phages of the same cluster, some of them show gaps of around 50 bp. For example, phage Harambe has a 51 bp overlap with its upstream gene and there are equal gaps in phage Hello and Jormungandr. Thus, this may be an indication of an incorrect start site considering one of the other called start sites (53278) has a gap of 50 bp which is what we see in some phages of cluster CS3 phages. After closer examination, it appears that the gap with the upstream gene is conserved in a lot of draft phages like Bianmat and Sticker17, however, more final calls show the 50 bp gap. All of the phages that I looked at (Harambe, Hello, Jormungandr, Anamika) showed synteny with this gene #57 from GalacticEye, showing that this is likely a real gene. /note=Phamerator: Gene is found in pham number 40827 as of 4/1/2022. The gene is conserved when looking into phages Harambe, Hello, and Anamika, which are all in the same cluster (CS3) as GalacticEye, thus it appears conserved in Cluster CS3 (evidence for real gene). This pham also only contains phages from cluster CS3. Phamerator does not call a function. /note=Starterator: (Start: 1 @53278 has 9 MA`s), (Start: 2 @53212 has 9 MA`s) - equal number of calls among all non-draft genes. /note=Location call: Considering the evidence above, this gene is most definitely a real gene. The evidence points to start site 53278 based on a multitude of evidence. 
The strongest evidence is that Starterator calls 53278 (start site #1) the most times (8/13 times among non-draft genes in the pham), meaning that this is the most conserved start site within the cluster itself. It also closes the gap. Although the Z score and final score for the start site at 53278 are not the best, this form of evidence carries less weight than conservation of the start site. The Z score and Final score are also not the worst, with the Z score still being better than 2 and the Final score lower than that of the called start site by only a small margin. Finally, typical coding potential lies within range for both start sites; however, atypical coding potential extends beyond the called start site but fits within start site 53278. Ultimately, I believe that the start site should be altered to 53278 due to gathered evidence and the differentiation between draft phages and non-draft phages. /note=Function call: Neither NCBI GenBank BLAST nor PhagesDB BLAST garnered any evidence regarding a function call for the gene. Out of all of the results listed, none of them showed a function beyond “no known function” or “hypothetical protein.” Additionally, when looking at HHpred, there were no hits listed. The only program that listed a function was CDD, which called this gene an outer membrane protein. However, the e-value was too high, so it was not good evidence and should not be considered. Additionally, neither TMHMM nor TOPCONS calls a membrane protein, so this is further evidence that this CDD hit should be dismissed. Thus, it is safe to say, at the moment, that gene #57 of GalacticEye has no known function (NKF). /note=Transmembrane domains: No transmembrane domains (TMDs) were predicted by either TMHMM or TOPCONS; thus, it appears that my gene is not considered a “membrane protein.” At this point, this is in agreement with previous function calls for my gene as there were no sufficient calls for the function of the gene.
As of now, this gene has no known function. /note=Secondary Annotator Name: Villarreal, Alexia /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS complement (53329 - 54393) /gene="61" /product="gp61" /function="hypothetical protein" /locus tag="GalacticEye_61" /note=Original Glimmer call @bp 54393 has strength 15.03; Genemark calls start at 54393 /note=SSC: 54393-53329 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ANAMIKA_60 [Gordonia phage Anamika] ],,NCBI, q1:s1 100.0% 0.0 GAP: 104 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.117, -4.487636275856737, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ANAMIKA_60 [Gordonia phage Anamika] ],,ATW61155,100.0,0.0 SIF-HHPRED: SIF-Syn: Synteny: NKF, downstream gene is in Pham 40827 (possible function being NKF), upstream gene is in Pham 11095 (Possible Function being helix-turn-helix DNA binding Protein), just like phages Woes and Harambe. /note=Primary Annotator Name: Patel, Sahaj /note=Auto-annotation: GeneMark and Glimmer called the start site to be at 54393 and the stop site to be at 53329. Additionally, it is a gene that runs in the reverse direction. /note=Coding Potential: Yes. Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. High coding potential is found in both GeneMark Self and GeneMark Host, which supports this being a real gene. /note=SD (Final) Score: -4.488, which was the highest Final Score. The Z score is 2.117, which is the highest Z score. This supports the existence of the gene, as well as it having the correct start and stop sites. /note=Gap/overlap: The gap is 104 bp, which means there is no overlap and the gene may not be part of an operon. This gap is conserved in other phages (for example, Hello and Harambe, amongst others), so it is reasonable.
There is a gap present after the gene which is about 100 base pairs long, and there is no atypical activity present in the gap. The gap is present in other complete viral genomes (such as Hello and Harambe). /note=Phamerator: 19660. Date 2/04/2022. It is conserved, found in Hail2Pitt (CS), Harambe (CS), and Lahirium (CS). /note=Starterator: Start site 3 in Starterator was manually annotated in 46/46 non-draft genes in this Pham. Start 3 is 54393 in GalacticEye. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 54393. /note=Function call: The function of this protein cannot be determined. PhagesDB BLAST hits list this protein as having no known function. The hits were found in the phages Teal and Nimi13, and they had high scores and low e-values (compared to the rest of the data presented). NCBI BLAST gave the same result, with hits to phages Anamika and Woes that are also listed with no known function. No data was gathered from CDD. Lastly, HHpred called a possible function for the protein (scaffold protein), but the e-value was extremely high (48) and the probability was far below 90% (33.95%). The protein therefore has no known function. /note=Transmembrane domains: TMHMM and TOPCONS predict no transmembrane domains (0 predicted). /note=Secondary Annotator Name: Araque, Colette /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. Note: May want to add some more supporting information like whether the chosen start site encodes for a common start codon and whether it results in the LORF and/or a reasonable gene length.
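The reasoning used above to dismiss the HHpred scaffold-protein hit can be written as a simple filter. This is a minimal sketch: the 90% probability floor is taken from the note above, while the e-value ceiling of 1e-3 and the function name is_credible_hit are assumptions made for illustration, not settings of HHpred or PECAAN.

```python
def is_credible_hit(probability_pct: float,
                    e_value: float,
                    min_probability_pct: float = 90.0,
                    max_e_value: float = 1e-3) -> bool:
    """Return True when a hit passes both thresholds.

    min_probability_pct follows the 90% figure quoted in the notes;
    max_e_value is an assumed cutoff chosen only for this sketch.
    """
    return probability_pct >= min_probability_pct and e_value <= max_e_value


# Scaffold-protein hit described for gene 61: probability 33.95%, e-value 48.
print(is_credible_hit(33.95, 48))

# Putative DNA-binding protein hit described for gene 62:
# probability 98.8%, e-value 4.1e-8.
print(is_credible_hit(98.8, 4.1e-8))
```

Under these assumed thresholds the gene 61 hit is rejected and the gene 62 hit is kept, which matches the NKF call here and the helix-turn-helix call for gene 62.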
CDS complement (54498 - 54821) /gene="62" /product="gp62" /function="helix-turn-helix DNA binding domain" /locus tag="GalacticEye_62" /note=Original Glimmer call @bp 54821 has strength 4.28; Genemark calls start at 54821 /note=SSC: 54821-54498 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA binding protein [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 1.78005E-74 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.189, -8.395393723254939, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding protein [Gordonia phage Woes] ],,YP_009273451,100.0,1.78005E-74 SIF-HHPRED: Putative DNA-binding protein; BldC, S. coelicolor, developmental switch, MerR-like, DNA BINDING PROTEIN-DNA complex; 3.09A {Streptomyces venezuelae},,,6AMA_A,68.2243,98.8 SIF-Syn: This gene shows synteny as a helix-turn-helix binding domain in other published phages (Harambe, Hello, Newt, etc.). The gene upstream is NKF, pham 55772, and shows synteny with other published phages (Hello, Newt, etc.). The gene downstream is also NKF, pham 19660, and shows synteny in other published phages (Hello, Newt, etc.). /note=Primary Annotator Name: Lin, Yuri /note=Auto-annotation: Glimmer and GeneMark agree on a start site of 54821 bp (start codon ATG). The autoannotated start site also has the LORF, which supports this location call. /note=Coding Potential: There is good coding potential within the ORF and all coding potential is contained within the autoannotated start site. Self-trained and host gene marks both indicate that this is a reverse gene. /note=SD (Final) Score: The start site at 54821 bp has a very negative Final Score of -8.395 and a Z-score of 0.189, both of which are poor values and indicate there may be better candidates. /note=Gap/overlap: The start site at 54821 has an overlap of 4 bp, which indicates it may be part of an operon. It is in a long sequence of reverse genes, so this is a possibility. 
/note=Phamerator: As of 4/7/22, the pham number is 11095. The gene is conserved in 61 other phages, of which 15 are drafts. All other phages that have this gene with the same pham number are in cluster CS like GalacticEye, with the exception of one (DocB7) which is a singleton. /note=Starterator: Start site 9 in Starterator was manually annotated in 37/46 non-draft genes in this pham and is called 94.1% of the time when present. Start site 9 is at position 54821 bp on GalacticEye. This is strong support for the autoannotated start site. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 54821. Even though the Final score and Z-scores are poor for this start site, the conservation and manual annotation of this start site and the 4 bp overlap should be weighed above these values. /note=Function call: Helix-turn-helix DNA binding domain - Three of the five top PhagesDB BLAST hits were helix-turn-helix DNA binding proteins (identity 100%, e-value = 4e-57), and the top two hits on NCBI BLAST were also helix-turn-helix DNA binding proteins (coverage 100%, e-value < 8e-61, percent identity > 80.37%). CDD returned no results but HHpred returned multiple hits, of which the top two were a putative DNA-binding protein (98.8% probability, e-value = 4.1e-8, score = 59) and a helix-turn-helix, DNA binding protein (98.57% probability, e-value = 8.8e-7, score = 57.29). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMHs. /note=Secondary Annotator Name: Bharadwaj, Shreya /note=Secondary Annotator QC: I have QC'ed this gene and agree with the location and function call. I would add a note on the coding potential about whether or not self/host trained GeneMark indicate that this is a reverse gene. You could also add a line about the 54821 start site having the longest open reading frame as support for your location call. Don't forget to fill out your synteny box.
[Edit: All concerns addressed by primary annotator] CDS complement (54818 - 55177) /gene="63" /product="gp63" /function="hypothetical protein" /locus tag="GalacticEye_63" /note=Original Glimmer call @bp 55177 has strength 8.69; Genemark calls start at 55177 /note=SSC: 55177-54818 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ANAMIKA_62 [Gordonia phage Anamika] ],,NCBI, q1:s1 100.0% 2.71254E-84 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.343, -4.701320725655571, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ANAMIKA_62 [Gordonia phage Anamika] ],,ATW61157,100.0,2.71254E-84 SIF-HHPRED: SIF-Syn: NKF, upstream NKF with Pham 57406, downstream helix-turn-helix DNA binding domain. Similar phages: Anamika, Lahirium /note=Primary Annotator Name: Perez, Joshua /note=Auto-annotation: Gene was called by both Glimmer and Genemark, and starts at the same site at 55177 and calls GTG. /note=Coding Potential: Yes, this gene has decent coding potential predicted within the first RF reverse. This chosen start site does cover all this coding potential in both the Self GeneMark, but does not cover all of the coding potential in Host Genemark. /note=SD (Final) Score: -5.80; has the second lowest negative final score present. The Z score present for this start is 2.343, which is high in terms of being close to 2. /note=Gap/overlap: Glimmer start has a gap of 2 bp and seems to be conserved in the Pham Maps. This small gap is most likely due to being part of an operon. This gap seems to be conserved in other phages, with a few being Anamika, Neovie, and Guillaume. /note=Phamerator: Pham 55772 4/05/22 is present in other members of the cluster CS. I used Austin, Diabla, and 5 others from cluster CS. /note=Starterator: This is a reasonable conserved start site. The start site number is 4, and the coordinate base pair number is 55177. 
This is one of the most annotated start sites: it has 37 manual annotations, is called 92.2% of the time when present, and has 51 members in its pham. This is good evidence that start site 4 with bp number 55177 is the correct start site. /note=Location call: The overall evidence shows that this gene is most likely real at start 4 and bp number 55177 due to decent coding potential in its first RF reverse and evidence from Starterator. Glimmer start site of 55177 is the most likely start site, as it has the second longest ORF and the smallest gap. This is most likely a real gene as it has coding potential. /note=Function call: The top 5 NCBI hits sorted by e-value had low e-values (0) and a high % identity (around 74%). For example, phage Guillaume has a 99.16% identity and an e-value of 0. However, the gene is said to have NKF, which must be confirmed with other programs. The first hits were NKF. There is no data available from CDD. Overall, the lack of evidence toward this gene’s function classifies it as NKF. /note=Transmembrane domains: No TMDs were predicted by TOPCONS or TMHMM; the absence of TMDs indicates that this gene is not a membrane protein and that the function remains unknown. /note=Secondary Annotator Name: Likwong, Chloe /note=Secondary Annotator QC: I agree with the primary annotator. Although Start@55177 does not have the most positive Final Score and Z-score, Starterator lists that Start@55177 has 37 MA's, while other start site candidates have none. Glimmer and GeneMark also call the same start site. Start@55177 also covers the coding potential.
CDS complement (55180 - 55476) /gene="64" /product="gp64" /function="hypothetical protein" /locus tag="GalacticEye_64" /note=Original Glimmer call @bp 55476 has strength 2.39; Genemark calls start at 55476 /note=SSC: 55476-55180 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp50 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 2.47622E-66 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.007, -3.5800013779028776, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp50 [Gordonia phage Woes] ],,YP_009273453,100.0,2.47622E-66 SIF-HHPRED: SIF-Syn: No known function, downstream gene is NKF in pham 55772 and upstream gene is NKF in pham 56384, just like in phage Guillaume. /note=Primary Annotator Name: Shah, Aayushi /note=Auto-annotation: Called in Glimmer and GeneMark, both at start site 55476. /note=Coding Potential: There is reasonable coding potential within the putative ORF in GeneMark Self, though the autoannotated start site of 55476 doesn’t fully cover all the coding potential. There is no coding potential found in the GeneMark Host in this region. Not all of the coding potential is covered, but there are no potential start sites that could make this longer to cover all the coding potential, and there is no coding potential in the GeneMark Host at all. This could be explained through the start site showing an operon, however. /note=SD (Final) Score: The best final score is -3.580, for the 55476 start site. Based on the overlaps seen, this gene is in an operon, and therefore RBS score is not as relevant for a start call, but it also has the best Z-score, of over 3. /note=Gap/overlap: The start site of 55476 has an overlap of 4. This overlap indicates an operon with either start site. The start site of 55476 creates the longest operon as well. /note=Phamerator: The pham as of 04/05/22 is 57406. A lot of other phages in this cluster have this pham present, as seen in Adgers_68, Hello_63, and Butterball_68.
The function called in Phamerator is not given. /note=Starterator: Start site 9 is conserved across many manually annotated genomes in the pham, and represents the start site at bp 55476. 35/37 non-draft genes call this start site, which is strong evidence. It is the most annotated start site. /note=Location call: Based on the evidence, the gene is real and has a start site of 55476, which covers almost all of the coding potential found and has a reasonable overlap placing it in an operon. The gene has strong synteny, the start site has the best final score, and there is strong evidence from Starterator and Phamerator that it is conserved within the pham. /note=Function call: No known function. The top 5 non-draft PhagesDB BLAST hits call the function as function unknown with e values of 3e-55 for all of them. In the NCBI BLAST, the top 5 calls sorted by e value called it as a hypothetical protein, with e values of 6.1578e-24 for all, and 100% coverage, pointing to no known function. No CDD hits were found, and HHpred has no hits with an e value below 0.0027, and had no hits with strong coverage. There are no TMDs. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, so it is not a membrane protein. /note=Secondary Annotator Name: Pramana, Martin /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator.
CDS complement (55473 - 55601) /gene="65" /product="gp65" /function="hypothetical protein" /locus tag="GalacticEye_65" /note=Original Glimmer call @bp 55601 has strength 15.36; Genemark calls start at 55601 /note=SSC: 55601-55473 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp49 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 8.46739E-18 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.729, -3.588861267572955, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp49 [Gordonia phage Woes] ],,YP_009273454,100.0,8.46739E-18 SIF-HHPRED: SIF-Syn: There is no known function for this gene, which is observed in phages Anamika and Harambe, both of which are in subcluster CS3. Upstream gene is in pham 57406, as it is in both Anamika and Harambe. Downstream gene is in pham 102768 and is conserved in both Anamika and Harambe as well. /note=Primary Annotator Name: Villarreal, Alexia /note= /note=Auto-annotation: Glimmer and Genemark Start sites are both at 55601. The start codon is ATG. /note= /note=Coding Potential: Coding Potential of this ORF is only on the reverse strand, which indicates it is a reverse gene. There is strong Coding Potential indicated by GeneMark Self and strong Coding Potential from GeneMark Host. Lots of synteny conservation observed of gene within other phages indicate high coding potential within this gene. /note=SD (Final) Score: The final score for this gene is -3.589, which is the best value of the two listed, indicating that this start site is good and very probable. /note=Gap/overlap: Small overlap of 1 is in good standing and is preferred as it falls under 50 bp range; this start site is not the longest ORF however the 1 nucleotide overlap is optimal and supports this starting site being likely. /note=Phamerator: pham:56384. Date 04/07/2022. It is conserved in other phages within the same cluster CS such as Anamika_64, Diabla_71, and Butterball_69. 
/note=Starterator: Start site 9 in Starterator was manually annotated in 12/37 non-draft genes in this pham. Start 9 is at 55601 in GalacticEye. Although this is not the most annotated start site, it is an alternative start site that is called 90.9% of the time when present, whereas the most annotated start site is called only 57.4% of the time when present. It is reasonable to still consider 55601 as the probable start site within this gene. /note=Location call: I confidently confirm this gene’s start site at 55601. I would keep this start site based on the coding potential, the same start site being predicted by Glimmer and GeneMark, high synteny with other phages, the most reasonable (smallest) overlap of 1 among all candidate start sites, the ATG start codon, which is one of the more common initiation codons, a good Final score of -3.589, and a good z-score of 2.729. Based on the above evidence, this is a real gene and the most likely start site is 55601. /note=Function call: According to PhagesDB BLASTp, NCBI BLASTp, and HHpred, the gene encodes a hypothetical protein: the hits have high query coverage (100%), high percent identity (95.348%-100%), and low E-values such as 8.46739e-18, 1.80433e-17, and 1.93072e-17. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Pay, Iona /note=Secondary Annotator QC: Need to add information about Z score in SD Score section, not just in location call, and fill in the synteny box. Otherwise, nicely done!
CDS complement (55601 - 56227) /gene="66" /product="gp66" /function="hypothetical protein" /locus tag="GalacticEye_66" /note=Original Glimmer call @bp 56227 has strength 8.51; Genemark calls start at 56227 /note=SSC: 56227-55601 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein BH793_gp48 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 2.37348E-153 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.516, -3.50848394673489, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp48 [Gordonia phage Woes] ],,YP_009273455,100.0,2.37348E-153 SIF-HHPRED: SIF-Syn: This gene encodes a protein of unknown function, which is conserved in phages Teal and Luker of the CS3 subcluster. The upstream gene is in pham 102610 and has no known function, which is the same as the upstream gene in the genomes of both Teal and Luker. The downstream gene is in pham 56384 and has no known function, which is the same as the downstream gene in the genomes of both Teal and Luker. /note=Primary Annotator Name: Araque, Colette /note=Auto-annotation: Both Glimmer and GeneMark called the same start site of 56227. Start site 56227 has a corresponding start codon of ATG. /note=Coding Potential: The gene has coding potential predicted within the putative ORF according to both GeneMark Self and Host. The chosen start site covers all the corresponding coding potential. /note=SD (Final) Score: For the 56227 start site, there is an SD score of -3.508 and a Z score of 2.516. -3.508 is strong and is the best SD score as it is the least negative compared to the other scores. 2.516 is very strong since it’s above 2 and is the second highest compared to the other potential start sites. The highest Z score (2.593) belongs to start site 56137, but this start site has a more negative SD score and results in a 101 bp gap which is quite large.
/note=Gap/overlap: There is an 11 bp gap, which is longer than what is ideal but can still be considered reasonable as it lies below the 50 bp maximum. There is no coding potential in this gap, nor would there be any space for another gene to exist in this gap. This overlap/gap is the smallest compared to the other start sites. There are no other good alternative start site options. The gene with the start site of 56227 is 627 bp long, which is acceptable as it is well over the 120 bp minimum and is the LORF. /note=Phamerator: As of 4/5/2022 this gene is found in Pham 102768. There are 61 total members in this pham (49 non-drafts and 12 drafts). This pham is present in other members of the same cluster (CS) which my phage belongs to. Some phages used for comparison are Beaver, Butterball, and Newt. The Phams database did not have a function called for this gene of the GalacticEye phage nor did it have a function called for any other genes within this pham. /note=Starterator: There is a reasonable start site choice that is conserved among the members of pham 102768. The number of this conserved start site is 15 which corresponds with the 56227 base pair coordinate of my phage (site predicted by Glimmer and GeneMark). This start site is found in 48 of 61 genes in pham 56719, and when it is present, it is called 91.7% of the time. There are 33 manual annotations of this start site. This evidence highly supports the start site predicted by both Glimmer and GeneMark. /note=Location call: Gene 63 of the GalacticEye phage appears to be a real gene due to it being conserved in Phamerator and it having considerable coding potential. The 56227 start site seems the most likely as it has strong SD and Z scores, has the smallest gap/overlap, results in the LORF, encompasses all the coding potential, is called 91.7% of the time when present in genes of the same pham, and has 33 MA’s. /note=Function call: Gene 63 most likely encodes a protein of unknown function.
There are multiple phagesDB BLAST hits that suggest function unknown and those hits have strong e values of 1e-123 and 1e-122. There is also an NCBI BLAST hit that calls function unknown with strong identity percentages (99.3%-99.5%) and e values (2.4e-153 and 4.34e-153). /note=Transmembrane domains: TMHMM and TOPCONS did not predict any TMDs meaning that this is not a transmembrane protein. /note=Secondary Annotator Name: Cho, Emily /note=Secondary Annotator QC: I have QCed this location call and agree with the primary annotator. NOTE: you cannot use “suggested start” for starterator dropdown menu when the start site is not the most conserved start site, so clarify if the start site was the most conserved start site. Do not check draft genomes as evidence for BLAST hits. Good explanation of upstream gap. CDS complement (56239 - 56787) /gene="67" /product="gp67" /function="hypothetical protein" /locus tag="GalacticEye_67" /note=Original Glimmer call @bp 56787 has strength 6.53; Genemark calls start at 56754 /note=SSC: 56787-56239 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_ANAMIKA_66 [Gordonia phage Anamika] ],,NCBI, q1:s1 100.0% 2.62108E-131 GAP: -44 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.957, -2.6623718267059564, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ANAMIKA_66 [Gordonia phage Anamika] ],,ATW61161,100.0,2.62108E-131 SIF-HHPRED: SIF-Syn: NKF. The downstream gene is in pham 102768 and has NKF similar to Anamika. The upstream gene is in pham 4187 and has NKF similar to Anamika. /note=Primary Annotator Name: Bharadwaj, Shreya /note=Auto-annotation: GeneMark calls a start site of 56754 and Glimmer calls a start site of 56787. /note=Coding Potential: The coding potential of this ORF is on the reverse strand, which means that this is a reverse gene. I found coding potential for this gene in both GeneMark Self and Host. /note=SD (Final) Score: The final score for start site 56787 is -2.662. 
This is the best final score according to PECAAN and this start site agrees with the one called by Glimmer. /note=Gap/overlap: -44 for start site 56787, which is indicative of a 44 bp overlap. This is a relatively large overlap which does not fall within the -7 to +7 range. However, this overlap has been conserved across the genomes for phages Harambe and Guillaume. Additionally, start site 56787 has the best final score, z-score, and start codons out of all of the other start sites. /note=Phamerator: 102610 was the pham of the gene as of 4/5/22. It has also been found in phages Guillaume_66 and Harambe_66. /note=Starterator: Start site 15 was manually annotated 22 times in non-draft genomes. The start site 15 corresponds to start coordinate 56787 in GalacticEye which agreed with the Glimmer auto-annotation but not the GeneMark auto-annotation. /note=Location call: Based on the above evidence, this is a real gene with a likely start site of 56787. Glimmer called this start site in the auto-annotation while GeneMark did not. Even though these two programs do not agree, there is strong evidence for the 56787 start site as it has a well-conserved overlap, 22 manual annotations, as well as a strong final score of -2.662. /note=Function call: NKF. All of the PhagesDB BLAST hits indicate with a low e-value (1e-104) that the function is unknown for this gene. HHPRED hits indicated that it could be a coat protein but the e-value is very high (87) and the %coverage was very low (26.9%). NCBI BLAST hits indicate a “hypothetical protein” with a low e-value of <2.27e-127. CDD had no relevant hits. /note=Transmembrane domains: TMHMM and TOPCONS do not predict any TMDs, so this is not a membrane protein. /note=Secondary Annotator Name: Enos, Alex /note=Secondary Annotator QC: I have QCed this location call and agree with the primary annotator.
CDS complement (56744 - 56995) /gene="68" /product="gp68" /function="hypothetical protein" /locus tag="GalacticEye_68" /note=Original Glimmer call @bp 56995 has strength 4.43; Genemark calls start at 56995 /note=SSC: 56995-56744 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp46 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 1.30452E-54 GAP: 14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.884, -5.3628583592769, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp46 [Gordonia phage Woes] ],,YP_009273457,100.0,1.30452E-54 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Likwong, Chloe /note=Auto-annotation: Glimmer and GeneMark predicted the same start site at Start@56995. /note=Coding Potential: Via GeneMark, Start@56995 covers the coding potential found in both the host and self maps, and there is no violation of the guiding principles. /note=SD (Final) Score: Final Score of -5.363 and a Z-score of 1.884 for the start site @56995. Note that the Final Score of Start@56995 is not the most positive and the Z-score is not the highest. /note=Gap/overlap: There is a 14 bp gap with the upstream gene. A 14 bp gap is also too small to contain another gene, since a functional gene must be at least 120 bp long. /note=Phamerator: This gene is in pham 4187 as of the date 3/31/2022. There are 61 members, 12 of which are non-final drafts. GalacticEye is found in cluster CS with members like Diabla and Beaver. As of now, there is one singleton and the only cluster is CS. The majority of the Final Draft genes list no known function. /note=Starterator: In pham 4187, GalacticEye has the “Most Annotated” start, which is Start site 8 @56995 that has 37 MA’s. /note=Location call: The gene seems to be a real gene given that Start@56995 covers the coding potential, has a gap of 14 bp, and has 37 MA’s done compared to the other potential Start sites; hence, the location call is at Start @56995. 
It is important to note that the Final Score of Start@56995 is not the most positive and the Z-score is not the highest. /note=Function call: The function is unknown. In PhagesDB, several hits with e-values of 9e-45 list no known function. Similarly, in NCBI BLASTP, the top hits, with the strongest being 1.30e-54, list hypothetical protein as the function; the top two hits in NCBI BLASTP list a value >89% for %coverage, %identity, and %alignment. In HHPRED, the hits have weak e-values. /note=Transmembrane domains: No transmembrane domains detected by TMHMM or TOPCONS, indicating that the gene is not a membrane protein. /note=Secondary Annotator Name: Maraziti, Gabriela /note=Secondary Annotator QC: I have QC'ed this function call and agree with the first annotator. All sections have been checked. Note: please adjust the synteny box to reflect the functions of the upstream and downstream genes as opposed to their phams. CDS complement (57010 - 57903) /gene="69" /product="gp69" /function="hypothetical protein" /locus tag="GalacticEye_69" /note=Original Glimmer call @bp 57903 has strength 9.65; Genemark calls start at 57909 /note=SSC: 57903-57010 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_JORMUNGANDR_68 [Gordonia phage Jormungandr] ],,NCBI, q1:s1 100.0% 0.0 GAP: -41 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.516, -3.859592806742189, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_JORMUNGANDR_68 [Gordonia phage Jormungandr] ],,QBP30345,99.6633,0.0 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF, downstream is NKF, similarly found in phage Jormungandr and Lahirium. /note=Primary Annotator Name: Pramana, Martin /note=Auto-annotation: GeneMark (Self and Host) calls for start site 57909. Glimmer calls for start site 57903. /note=Coding Potential: There is a reasonable coding potential in the ORF with a reverse direction (located at the sixth frame). This coding potential does not include all possible start sites. 
This coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -3.860. It is the third-best Final Score on PECAAN. /note=Gap/overlap: There is a large upstream gap of 438 bp. This gap is reasonable because it is conserved in other phages such as Jormungandr, Guillaume, and Anamika. This start site is not the longest ORF. /note=Phamerator: As of 4/4/2022 the gene is found in Pham 3928. It is conserved in cluster CS3 (Anamika and Guillaume) but also found in other clusters such as CS2 (Diabla and Monty). /note=Starterator: Start site 24 is conserved in 14/37 non-draft members (called 44% of time when present). Start site 24 has a position of 57903 in GalacticEye. The start site agrees with Glimmer but not with GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the start site is most likely 57903 bp. Starterator agrees with Glimmer. Starterator does not agree with GeneMark. /note=Function call: NKF. Both NCBI and PhagesDB BLAST’s top 2 results state that this gene does not have a known function. The protein matches with phages Jormungandr, Lahirium, and Nimi13 have high query coverage of 100%, high % identities (>99.326%), and low E-values (0). CDD shows no matches or hits for this gene, while HHpred has no good hits in the databank. The first match from HHpred has a high E-value (6.6), a high probability (74.4%), and low coverage (7.744%). Therefore, there are no relevant hits from CDD or HHpred. /note=Transmembrane domains: Both TMHMM and TOPCONS did not predict any TMDs, so it is not a transmembrane protein. /note=Secondary Annotator Name: Nelson, Shiloh /note=Secondary Annotator QC: great job! 
nice incorporation of data sets on location and function calls CDS complement (57863 - 58339) /gene="70" /product="gp70" /function="hypothetical protein" /locus tag="GalacticEye_70" /note= /note=SSC: 58339-57863 CP: yes SCS: neither ST: NI BLAST-Start: [hypothetical protein SEA_NEWT_70 [Gordonia phage Newt]],,NCBI, q1:s1 100.0% 2.33402E-108 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.506, -3.6709865796924017, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_NEWT_70 [Gordonia phage Newt]],,QDH48740,100.0,2.33402E-108 SIF-HHPRED: SIF-Syn: /note=Added gene. CP on Genemark-Self. None on GM host. /note=This region between the upstream/downstream genes has three different possibilities. Some CS3 phages call no gene and leave a large gap. Some CS3 phages (Lidong, Luker, MrWormie) call a very short gene. Others call a much bigger gene (Newt, Sticker17 are CS3 examples; other members of this pham are in other CS cluster phages, though they are all ~100 bp shorter). Ultimately went with this larger ORF because there were more genes in this pham, the gene fills the gap better, and the coding potential does happen throughout the ORF. Also chose this start site based on Newt/Sticker17, smaller gap, and best RBS score. CDS complement (58342 - 58587) /gene="71" /product="gp71" /function="hypothetical protein" /locus tag="GalacticEye_71" /note=Original Glimmer call @bp 58587 has strength 7.13; Genemark calls start at 58587 /note=SSC: 58587-58342 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp44 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 1.56026E-50 GAP: 60 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.257, -2.0162541296952132, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp44 [Gordonia phage Woes] ],,YP_009273459,100.0,1.56026E-50 SIF-HHPRED: SIF-Syn: The function of this gene is unknown, which is reflected in phages Lahirium and Luker, both of which are in subcluster CS3. 
Downstream gene has NKF and is in Pham 3928, as it is in both Lahirium and Luker. Upstream gene also has NKF and is in Pham 4758, which is also the same in Lahirium and Luker. /note=Primary Annotator Name: Pay, Iona /note=Auto-annotation: Glimmer and GeneMark agree that this gene begins at 58587. However, this start site uses an unusual codon, TTG. /note=Coding Potential: This gene displays coding potential in the 6th frame of the self-trained GeneMark, with a very slight overlap in the frame above. The host-trained GeneMark does not have this overlap, and only displays potential in the corresponding 6th frame. /note=SD (Final) Score: 58587 has the highest Z score (3.257) and least negative final score (-2.016), marking it as the best start site despite the TTG start codon. /note=Gap/overlap: This gene is a reasonable length of 246 bp, with a gap of 60 bp. Other phages from the same cluster also report a similar sized gap prior to the gene. /note=Phamerator: This gene is a member of 11306 (4/5/22), and is also annotated in Cluster CS3 phages Sticker17, Neoevie, and Teal. A function is not listed. /note=Starterator: Start 3 (58587, the same as the predicted start) is found in all genes of the Pham, has been manually annotated 35 out of 37 times, and is called more than 95% of the time when present. This is almost certainly the start of the gene. /note=Location call: All evidence being taken into account, this gene appears to be a real gene starting at 58587. Excellent data from Starterator, very favourable Z and final scores, and GeneMark maps all support this start site. /note=Function call: NCBI BLAST returns only one hit with acceptably high % identity and alignment; however, this protein is a hypothetical protein with no proposed function. CDD returned no hits. HHPred returned no hits with acceptable E values (all results were 30 or greater). /note=Transmembrane domains: TmHmm predicted no transmembrane domains, and neither did Topcons. 
/note=Secondary Annotator Name: Ruiz, Paola /note=Secondary Annotator QC: I have QC’ed this location call and agree with first annotator. List the final and z score and specify the values for top hits for NCBI BLAST. Look at phagesDB to determine function call (don't forget to check evidence in PECAAN) for PhagesDB BLAST and NCBI BLAST. Complete synteny box. CDS complement (58648 - 59385) /gene="72" /product="gp72" /function="hypothetical protein" /locus tag="GalacticEye_72" /note=Original Glimmer call @bp 59385 has strength 13.25; Genemark calls start at 59385 /note=SSC: 59385-58648 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ANAMIKA_70 [Gordonia phage Anamika] ],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.707, -3.188585239877588, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ANAMIKA_70 [Gordonia phage Anamika] ],,ATW61165,100.0,0.0 SIF-HHPRED: SIF-Syn: NKF, downstream gene is NKF (pham 11306, 3/31/22), upstream is NKF (pham 83026, 3/31/22), just like in phage Anamika and Harambe. /note=Primary Annotator Name: Cho, Emily /note=Auto-annotation: Both Glimmer and GeneMark call 59,385 as the start site. /note=Coding Potential: The host-trained GeneMark shows reasonable coding potential predicted within the putative ORF, and the chosen start site 59,385 covers all this coding potential. /note=SD (Final) Score: -3.189, the highest (best) value among 15 start site candidates. /note=Gap/overlap: Upstream overlap of 1. In the reasonable range (<50bp). May be evidence of an operon. The length of the gene (738) is also reasonable (>120bp). /note=Phamerator: Found in Pham 4758 as of 31st March, 2022. The pham is in other members of the cluster CS only, and there was no function called for any of the members. /note=Starterator: The start number called the most often in the published annotations is 8; it was called in 25 of the 46 non-draft genes in the pham. Starterator was non informative. 
Suggested start for GalacticEye was 9 (59,385), which is a reasonable start site that is conserved in other pham members, although it was not the most called start number. The most called site was 8 which was called by only about half of the genomes, and start number 9 also seems to be conserved in other genomes with 13 manual annotations. Also, this start site was called 100.0% of time when present and was called by other CS cluster genomes like Harambe and Anamika. /note=Location call: The gene seems very likely to be real based on Starterator and Phamerator, with a start site of 59,385 and good coding potential, all covered by the called start site. It shows the longest ORF and has a reasonable overlap upstream (-1). The gene also shows synteny with many other annotated genomes like Anamika and Harambe. It is also the common ATG start site with the best final score and Z score of 2.707, which is close to 2. Although the start site was not the most conserved one in the pham, it was still called by some genomes in the same cluster and had 13/46 manual annotations. /note=Function call: The top 5 PhagesDb BLASTp hits, sorted by E-value, suggested no known function, with a score of 519, 100% identity, and low E-value of e-147. The first hit that called a function of HTH DNA binding domain protein had an e-value of 3e-05 but had low % identity (26%). The Phagesdb function frequency of calling this function is also only once out of five times and also in different clusters, so it seems irrelevant. The top 4 NCBI BLASTp hits, sorted by E-value, suggested NKF (hypothetical protein) also, with 100% query coverage, high % identity (>98.37%), and low E-values ranging from 2e-167 to 2e-169. There were no NCBI Conserved Domain Database hits. Several HHpred hits called functions of various proteins and had probability greater than 80%, but they had e-values much greater than 10e-3 (1.6, 3.7, etc.) and very low % coverage (<20%), so there were no meaningful hits. 
Therefore, the gene seems to have NKF. /note=Transmembrane domains: No TMHs reported from TmHmm. No TMDs reported from TOPCONS. This protein with no known function does not seem to have a transmembrane domain. /note=Secondary Annotator Name: Juarez, Sabrina /note=Secondary Annotator QC: QC complete, I agree with the annotation and location call with the primary annotator. I think the starterator information would be considered informative because the location call does agree with the suggested start in starterator even though it is not the most annotated start. CDS complement (59385 - 59561) /gene="73" /product="gp73" /function="hypothetical protein" /locus tag="GalacticEye_73" /note=Original Glimmer call @bp 59561 has strength 8.16; Genemark calls start at 59561 /note=SSC: 59561-59385 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PBI_HAIL2PITT_70 [Gordonia phage Hail2Pitt] ],,NCBI, q1:s1 100.0% 5.16816E-32 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.212, -5.2479250502251915, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PBI_HAIL2PITT_70 [Gordonia phage Hail2Pitt] ],,AVP43255,100.0,5.16816E-32 SIF-HHPRED: SIF-Syn: The downstream gene is NKF (pham 4758), upstream gene is NKF (pham 55577). This is similar to other cluster CS phages including Luker, Hello, Anamika, and Newt. /note=Primary Annotator Name: Enos, Alex /note=Auto-annotation: Glimmer and GeneMark both call 59561 as the start site. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. The chosen start site covers all coding potential. /note=SD (Final) Score: The Final score of -5.248 was the second best for the possible start sites with the first being -2.479. However, the better final score had an undesirable gap of 155. /note=Gap/overlap: The gap is -1 which indicates an operon, and is conserved in other phages such as Anamika and Hello. 
/note=Phamerator: Pham 83026 as of 4/5/22. It is conserved and found in Anamika (CS) and Hello (CS). /note=Starterator: (Start: 5 @59579 has 9 MA's), (Start: 8 @59561 has 9 MA's). Calling 59561 based on likely operon. /note=Location call: Based on the consistent evidence from Glimmer, GeneMark, and Starterator, this is a real gene and the most likely start site is 59561. /note=Function call: Unknown Function. All hits from phagesDB BLAST called unknown function with small E-values <4e-29. All NCBI BLAST hits also call "hypothetical protein" (100% coverage, 90%+ identity, and E-value <5e-32). HHpred called a protein but had very low probability and coverage and an E value in the 100s. CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Li, Shally /note=Secondary Annotator QC: I agree with the above location and function calls. It is also of interest to note that in Starterator, the most annotated start site in the pham (start site 9) is not present in GalacticEye. CDS complement (59561 - 59788) /gene="74" /product="gp74" /function="hypothetical protein" /locus tag="GalacticEye_74" /note=Original Glimmer call @bp 59788 has strength 11.93; Genemark calls start at 59788 /note=SSC: 59788-59561 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp41 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 1.98849E-47 GAP: 32 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.948, -4.207781010084296, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp41 [Gordonia phage Woes] ],,YP_009273462,100.0,1.98849E-47 SIF-HHPRED: SIF-Syn: NKF in pham 55577; upstream gene is NKF in pham 20935, downstream gene is NKF in pham 107947, like in phage Harambe. /note=Primary Annotator Name: Maraziti, Gabriela /note=Auto-annotation: Glimmer and GeneMark both call the start at 59788. 
/note=Coding Potential: The gene has reasonable coding potential within the ORF for both the self and host models, and the start includes all typical and atypical coding potential. /note=SD (Final) Score: -4.208; this is the least negative score, and the Z-score is the highest at 2.948. /note=Gap/overlap: 32 bp gap. This is the second smallest gap of the possible start sites and creates the second longest possible ORF. /note=Phamerator: Pham 55577 as of 4/5/2022. /note=The gene is conserved in many other members of the same cluster, CS, such as Harambe_72 and Anamika_72. There is no function called for any of the genes of this pham. /note=Starterator: 13/14 non-draft genes in the pham call start site 4, which corresponds to position 59788 in the GalacticEye genome. This start is called 90% of the time when it is present. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this gene is real and has a start site at 59788. /note=Function call: NKF; phagesdb BLAST, NCBI BLAST, HHPred and CDD did not return any informative hits. /note=Transmembrane domains: No TMDs predicted by TMHMM or TOPCONS, therefore it is not a membrane protein. /note=Secondary Annotator Name: Patel, Rishi /note=Secondary Annotator QC: I agree with these annotations. The start site 59788 is the most conserved start site among other genes in the pham. Additionally, the gaps are conserved and the coding potential is within the putative ORF. The function call is also correct as there is no evidence of a function across all programs; thus, there is no known function at this time. 
CDS complement (59821 - 60225) /gene="75" /product="gp75" /function="membrane protein" /locus tag="GalacticEye_75" /note=Original Glimmer call @bp 60225 has strength 11.34; Genemark calls start at 60225 /note=SSC: 60225-59821 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp40 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 1.85715E-91 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.87, -2.827683592113848, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein BH793_gp40 [Gordonia phage Woes] ],,YP_009273463,100.0,1.85715E-91 SIF-HHPRED: Mycobact_memb ; Mycobacterium membrane protein,,,PF05423.16,76.8657,99.8 SIF-Syn: /note=Primary Annotator Name: Nelson, Shiloh /note=Auto-annotation: The gene is 405 bp long, with coding potential in both Glimmer and GeneMark. Both GeneMark and Glimmer call the same start site, at 60225. As this start site is highly conserved, this shows that the gene has a greater chance of being ‘real.’ /note=Coding Potential: Glimmer and GeneMark show that there is coding potential, without violation of the basic guiding principles. All criteria are met, so this is a real gene. The gene is 405 base pairs long, well over the 40 codon minimum length listed in the guiding principles, and the gene shows synteny with non-draft phage genes in Anamika and Teal. There are no switches in gene orientation. The GeneMark self annotation map shows that the gene is in the 1st ORF in the reverse direction, with atypical coding potential in the 1st forward direction open reading frame. /note=SD (Final) Score: The Final Score is -2.828 for the start site 60225. This is not the most negative score; however, it is a reasonable score, as it is more negative than -2. /note=Gap/overlap: The gap is 1, for the downstream gene. /note=Phamerator: This gene is in pham 20935 as of 3/25/2022. There are 15 clusters represented in this pham: DO, DL, DQ, DF, DC2, DE2, DE4, singleton, DZ, DV, CS4, CS2, DR, CS3, CT. 
72 draft genes belong to GalacticEye, with 20 of them sharing a single track, track 7. /note=Starterator: The start site 25 (corresponding to 60225), is found in about 30 percent of the 102 genes in this cluster. /note=Location call: This is a real gene, as it has good coding potential, which is conserved in the phamerator. From the evidence collected, the start site is 60225. /note=Function call: membrane protein /note=Transmembrane domains: one predicted by TMHMM and SOSUI /note=Secondary Annotator Name: Patel, Sahaj /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS complement (60227 - 60550) /gene="76" /product="gp76" /function="membrane protein" /locus tag="GalacticEye_76" /note=Original Glimmer call @bp 60550 has strength 13.79; Genemark calls start at 60550 /note=SSC: 60550-60227 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp39 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 2.31414E-63 GAP: 149 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.024, -2.442961286954254, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein BH793_gp39 [Gordonia phage Woes] ],,YP_009273464,100.0,2.31414E-63 SIF-HHPRED: SIF-Syn: Function is membrane protein. The Pham number is 56091, downstream Pham number 20935, upstream Pham number 56259, just like in phages Neoevie and Nimi13. Upstream and downstream phams have no known function. /note=Primary Annotator Name: Ruiz, Paola /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 60550. /note=Coding Potential: For GeneMark Self-Trained, there is coding potential in both the forward and reverse strands but it is more concentrated in the reverse strand. In GeneMark Host, the coding potential is mainly focused on the reverse strand. In both, the coding potential spans the length between the start and end sites. It accurately shows an upward hash around 60550 and a downward hash at 60227 for both. 
Overall, the gene has good coding potential indicating that this is a reverse gene. /note=SD (Final) Score: The final score is -2.443 and z score is 3.024. Both are the best scores on PECAAN. /note=Gap/overlap: There is a 149 bp gap, which is large but is the smallest gap among the candidate start sites. There is no coding potential in the gap that may be a new gene. /note=Phamerator: The pham number is 56091 as of 4/6/2022. It is conserved; found in Harambe and Luker, which are both in cluster CS3. /note=Starterator: Start site number is 3 which correlates to start site 60550 bp for GalacticEye. Start 3 is the most called start number and was called for in 13/14 of non-draft genes in the pham including GalacticEye. /note=Location call: Based on the above evidence, this is a real gene that most likely starts at 60550. Starterator agrees with Glimmer and Genemark. /note=Function call: Function is a membrane protein. The top 12 non-draft phagesdb BLAST hits have unknown function (e value 8e-57, 100% identity, 100% positives). For NCBI BLAST, the top 2 hits were hypothetical proteins with e values of 1e-19 and had 100% identity and positives. CDD was not helpful in determining function. HHPRED’s second hit with e value 1.4 (although high) suggested membrane domain as the function. /note=Transmembrane domains: Both TMHMM and Topcons call for one TMD suggesting that this may be a membrane protein. /note=Secondary Annotator Name: Lin, Yuri /note=Secondary Annotator QC: I have QC'd and agree with the primary annotator. 
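The length checks cited in the gp75 and gp76 notes (a gene should be at least 120 bp, or 40 codons) follow from the CDS coordinates alone. The sketch below is illustrative only: `orf_length` and `MIN_GENE_BP` are hypothetical names, and the 120 bp threshold is simply the figure quoted in these notes.

```python
from typing import Tuple

MIN_GENE_BP = 120  # minimum gene length quoted in the notes (40 codons)


def orf_length(left: int, right: int) -> Tuple[int, int]:
    """Return (length in bp, number of codons including the stop codon)."""
    length = right - left + 1  # coordinates are inclusive
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return length, length // 3


# gp75: complement (59821 - 60225)
print(orf_length(59821, 60225))  # (405, 135)

# gp76: complement (60227 - 60550)
bp, codons = orf_length(60227, 60550)
print(bp, codons, bp >= MIN_GENE_BP)  # 324 108 True
```

The 405 bp result matches the length stated in the gp75 Auto-annotation note.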
CDS complement (60700 - 61125) /gene="77" /product="gp77" /function="hypothetical protein" /locus tag="GalacticEye_77" /note=Original Glimmer call @bp 61125 has strength 13.24; Genemark calls start at 61125 /note=SSC: 61125-60700 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp38 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 8.78468E-97 GAP: 121 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.948, -2.6013996449736907, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp38 [Gordonia phage Woes] ],,YP_009273465,100.0,8.78468E-97 SIF-HHPRED: SIF-Syn: NKF. This gene has no known function and Pham 56259 is conserved in phages Guillaume and Luker. Genome architecture is conserved in other phages such as Newt and Neoevie; the upstream gene is in Pham 104066 and the downstream gene is listed as a membrane protein (Pham 56091). /note=Primary Annotator Name: Juarez, Sabrina /note=Auto-annotation: Glimmer and GeneMark both call a start site at 61125. /note=Coding Potential: Good coding potential is found in both Self- and Host-Trained GeneMark in the reverse reading frame. The chosen start site does not seem to include all of the coding potential for the gene, missing a few base pairs of the initial coding region in both Self- and Host-Trained, but this is the longest ORF for this gene. /note=SD (Final) Score: The final score is the best option, -2.601. The z-score is also the best option, 2.948. This start has the best combination of scores and covers the most coding potential of possible start sites. /note=Gap/overlap: This gene has a 121 bp gap, which is the smallest possible gap for the given start sites. It is conserved in other phages, such as Guillaume and Nimi13. /note=Phamerator: The Pham number as of 04/01/2022 is 56259. It is conserved in other phages within subcluster CS3, such as Anamika and Guillaume. It is represented in clusters CS1, CS2, and CS4 as well. 
/note=Starterator: Start site 11 in Starterator was manually annotated in 45/46 non-draft genes in this Pham. Start site 11 is 61125 in GalacticEye. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence, this is a real gene, and most likely starts at 61125. /note=Function call: No Known Function. The top two phagesDB BLAST hits have the function listed as unknown function in Gordonia phage Woes (E-value of 5e-75 and 100% identity) and Gordonia phage Teal (E-value of 5e-75 and 100% identity). The top five NCBI BLAST hits are listed as hypothetical proteins, the top results are Gordonia phages (97%+ coverage, 72%+ identity, and E-value of <2e-67). CDD and HHpred had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Lin, Yuri /note=Secondary Annotator QC: Small typo in SD Score section, I think you meant to say Z-score in the second sentence. CDS complement (61247 - 61810) /gene="78" /product="gp78" /function="hypothetical protein" /locus tag="GalacticEye_78" /note=Original Glimmer call @bp 61810 has strength 11.12; Genemark calls start at 61810 /note=SSC: 61810-61247 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp37 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 5.12716E-135 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.948, -2.66371296573402, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp37 [Gordonia phage Woes] ],,YP_009273466,100.0,5.12716E-135 SIF-HHPRED: SIF-Syn: There is NKF for this gene. It belongs to pham 102488. Its homologous genes in Guillaume and Anamika are in the same pham. The upstream gene is in pham 102637, and the downstream gene is in pham 56259 which is also true of its homologous genes in Guillaume and Anamika. 
/note=Primary Annotator Name: Li, Shally /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site 61810. /note=Coding Potential: The host-trained GeneMark has peaks of high coding potential throughout the gene in the first reading frame in the reverse direction. The self-trained GeneMark has high coding potential throughout the entire gene. The proposed start site encompasses all of the coding potential. There is no high coding potential in any other reading frame, suggesting that this is a reverse gene. /note=SD (Final) Score: -2.664. This is the best final score on PECAAN. /note=Gap/overlap: 3bp gap. This is the most probable gap in PECAAN. /note=Phamerator: This gene belongs to pham 102488 as of 4/5/22. There are 61 members of the pham, 46 of which are non-drafts. 36 of them belong to cluster CS. /note=Starterator: The most annotated start site is start number 6, it was called by 34/46 non-draft genes in the pham. In the 12 genes that it is not called, the start is not present, so start 6 is called 100% of the time that it is present. In GalacticEye, it corresponds to position 61810, which was the auto-annotated start site agreed upon by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene with start site 61810. /note=Function call: NKF. Phagesdb and NCBI BLAST do not have any hits with a known function. HHPRED does not have any reasonable hits (e-values all >15). There are no hits in CDD. /note=Transmembrane domains: There are no transmembrane domains as listed by TmHmm and TopCons, indicating that this is not a membrane protein. /note=Secondary Annotator Name: Perez, Joshua /note=Secondary Annotator QC: I have QC`ed this gene and agree with the primary annotation. Here are a few suggestions: make sure to denote the Z score in coding potential. For the gap, make sure to input a few other phages it is conserved with and say it is conserved (or if not say that too). 
For location call, discuss the evidence from Starterator a bit more. Great job overall! CDS complement (61814 - 62053) /gene="79" /product="gp79" /function="hypothetical protein" /locus tag="GalacticEye_79" /note=Original Glimmer call @bp 62053 has strength 10.35; Genemark calls start at 62053 /note=SSC: 62053-61814 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp36 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 1.35907E-50 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.132, -4.376977241479107, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp36 [Gordonia phage Woes] ],,YP_009273467,100.0,1.35907E-50 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Patel, Rishi /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 62053. Both call the start codon as ATG. /note=Coding Potential: The coding potential for this open reading frame (ORF) is only on the reverse strand which shows that it is a reverse gene. In the Host-Trained GeneMark, the coding potential is apparent within the called start site (62053) and stop site (61814). The Self-Trained GeneMark also shows coding potential in the region. The typical coding potential as well as the atypical coding potential both fit within the range of the called start site and the stop site, which is good evidence for the start site that was called. It is also clear that the gene appears real from the analysis of the Self-Trained GeneMark and Host-Trained GeneMark. /note=SD (Final) Score: The final score is the best option of all the start sites listed (-4.377). Additionally, the Z score is also the best out of all of the called start sites (2.132). The scores for the called start site are really good and the best out of all called start sites, which is also good evidence for the start site. /note=Gap/overlap: The gap between the stop site and the downstream gene (Gene #74) appears to be conserved across phages in the cluster (CS3) at -5. 
The gap between the start site and the upstream gene (Gene #76) also appears to be conserved in all of the phages within the cluster. The called start site shows an overlap of 8 bp and, when looking at phages of the same cluster, they all show overlaps of exactly 8 bp. For example, phage Harambe has an 8 bp overlap with its upstream gene and there are equal gaps in phages Hello and Hail2Pitt. Thus, this is an indication that the start site is correct as it appears to be heavily conserved among phages of the same cluster. All of the phages that I looked at (Harambe, Hello, Hail2Pitt, Jormungandr) showed synteny with this gene #75 from GalacticEye, showing that this is likely a real gene. /note=Phamerator: Gene is found in pham number 102637 as of 4/1/2022. The gene is conserved when looking into phages Harambe, Hello, and Hail2Pitt, which are all in the same cluster (CS3) as GalacticEye, thus it appears conserved in Cluster CS3 (evidence for real gene). This pham also contains phages from cluster CS1 and CS2 along with CS3. Phamerator does not call a function. /note=Starterator: The called start site for my gene is Start site #4 (62053), called in 13/38 non-draft phages within the pham. Despite not being the most called start site within the pham, Start site #4 is heavily conserved among Cluster CS3 phages like GalacticEye. Moreover, it appears that 13/13 non-draft genes of Cluster CS3 call Start site #4 to be the start site. Thus, of the CS3 phages within the pham, 100% of them call Start site #4. This is excellent evidence for Start site #4 (62053) to be the correct start site for this particular gene (gene #75) of GalacticEye. /note=Location call: Considering the evidence above, this gene is most definitely a real gene. Additionally, the called start site appears to be correct (62053). Multiple lines of evidence point to start site 62053. 
The strongest evidence is that Starterator calls 62053 (Start site #4) for all Cluster CS3 genes (13/13 non-draft genes). Even though it is not the most called start site in the pham, none of the Cluster CS3 phages contain the most called start site, but Start site #4 (62053) appears conserved throughout the entirety of the cluster. Additionally, the Final score and Z score of the called start site are the best out of all of the called start sites. Moreover, the coding potential lies completely within the stop site and called start site. Finally, the gaps between the upstream and downstream genes for gene #75 are heavily conserved, which is also great evidence for the start site. Ultimately, I believe that the called start site (62053) is the correct start site for this gene. /note=Function call: Neither NCBI GenBank BLAST nor PhagesDB BLAST garnered any evidence regarding a function call for the gene. Out of all of the results listed, none of them showed a function beyond “no known function” or “hypothetical protein.” Additionally, when looking at HHpred, there were no significant hits listed. Finally, when looking at CDD, this program also did not have any significant hits listed. Additionally, neither TMHMM nor TOPCONS calls a membrane protein, so this is further evidence for no known function. Thus, it is safe to say, at the moment, that gene #75 of GalacticEye has no known function (NKF). /note=Transmembrane domains: No transmembrane domains (TMDs) were predicted by either TMHMM or TOPCONS, thus, it appears that my gene is not considered a “membrane protein.” At this point, this is in agreement with previous function calls for my gene as there were no sufficient calls for the function of the gene. As of now, this gene has no known function. /note=Secondary Annotator Name: Shah, Aayushi /note=Secondary Annotator QC: I agree with this annotation and functional call. 
All of the evidence categories have been considered, including coding potential, starterator, gap/overlap, synteny, and final score, and an understandable NKF call. CDS complement (62046 - 62294) /gene="80" /product="gp80" /function="hypothetical protein" /locus tag="GalacticEye_80" /note=Original Glimmer call @bp 62294 has strength 13.95; Genemark calls start at 62294 /note=SSC: 62294-62046 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp35 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 1.29248E-49 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.603, -3.9797067239234645, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp35 [Gordonia phage Woes] ],,YP_009273468,98.7805,1.29248E-49 SIF-HHPRED: SIF-Syn: NKF, downstream gene is in Pham 102637 (possible function being NKF), upstream gene is in Pham 24824 (NKF), just like phages Woes and Harambe. /note=Primary Annotator Name: Patel, Sahaj /note=Auto-annotation: GeneMark and Glimmer called the start site to be at 62294 and the stop site to be at 62046. Additionally, it is a gene that runs in the reverse direction. /note=Coding Potential: Yes. Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. High coding potential is found in the GeneMark Self and GeneMark Host. Therefore, there will be coding potential in this gene since the GeneMark Self and GeneMark Host support the gene being active. /note=SD (Final) Score: -3.980, which was the second highest Final Score. The Z score is 2.603, which is the second highest Z score. This supports the existence of the gene, as well as it having the correct start and stop sites. /note=Gap/overlap: The gap value is -4, which signifies slight overlap and the possibility of the gene being part of an operon. This overlap is conserved in other phages (compared to Hello and Harambe, amongst others), so it, therefore, makes sense. There are no gaps present, and there is no atypical activity present. 
The gap is present in other complete viral genomes (such as Hello and Harambe). /note=Phamerator: 97983. Date 2/04/2022. It is conserved, found in Hail2Pitt (CS), Harambe (CS), and Lahirium (CS). /note=Starterator: Start site 1 in Starterator was manually annotated in 37/37 non-draft genes in this Pham. Start 1 is 62294 in Powerpuff. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 62294. /note=Function call: The function of this protein cannot be determined for numerous reasons. PhagesDB BLAST states that this protein has no known function, and this is reflected by the hits this database presented. They were found in the phages Teal and Woes, and the hits had high scores and low e-values (compared to the rest of the data presented). Additionally, NCBI BLAST reflected the same findings as the PhagesDB BLAST, with the hits coinciding with two phages, Nimi13 and Woes, and also calling the protein as having no known function. No data was gathered from CDD. Lastly, HHpred called 2 possible functions for the protein (RNA polymerase-associated protein), but the e-values were extremely high (13 and 18) and the probability was below 90% (69.94% and 69.68% respectively). This signifies that this protein has no known function. /note=Transmembrane domains: TMHMM and TOPCONS predict there to be no transmembrane domains present (predicted to be 0). /note=Secondary Annotator Name: Villarreal, Alexia /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
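The GAP values quoted in the records above can be reproduced from the CDS coordinates alone. The following is a minimal sketch, assuming inclusive 1-based coordinates as printed on the CDS lines; the helper name `gap_bp` is hypothetical and is not part of PECAAN or any other tool named in these notes.

```python
def gap_bp(left_end: int, right_start: int) -> int:
    """Number of bases strictly between two features.

    Coordinates are assumed to be inclusive and 1-based.
    A negative result means the two features overlap by that many bases.
    """
    return right_start - left_end - 1


# Coordinates copied from the CDS lines in this file.
print(gap_bp(62053, 62046))  # gene 79 (61814-62053) vs gene 80 (62046-62294)
print(gap_bp(62294, 62291))  # gene 80 (62046-62294) vs gene 81 (62291-62581)
print(gap_bp(66050, 66131))  # gene 83 (64491-66050) vs gene 84 (66131-66772)
```

Under this convention a 4 bp overlap is reported as -4, which is how the GAP field and the Gap/overlap notes use the sign.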
CDS complement (62291 - 62581) /gene="81" /product="gp81" /function="hypothetical protein" /locus tag="GalacticEye_81" /note=Original Glimmer call @bp 62581 has strength 5.51; Genemark calls start at 62581 /note=SSC: 62581-62291 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp34 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 1.48258E-64 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.373, -4.6382689799084815, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp34 [Gordonia phage Woes] ],,YP_009273469,100.0,1.48258E-64 SIF-HHPRED: SIF-Syn: This gene shows synteny with other published phages, including Newt, Harambe, and Hello. The gene upstream is a CobT-like cobalamin biosynthesis protein, pham 103389, and the gene downstream is NKF, pham 97983. Both the upstream and downstream genes show synteny in other published phages like Newt and Luker. /note=Primary Annotator Name: Lin, Yuri /note=Auto-annotation: Glimmer and GeneMark agree on a start site of 62581 bp (start codon ATG). The autoannotated site results in the LORF and a reasonable gene length of 291 bp. /note=Coding Potential: There is good coding potential within the ORF and all coding potential is contained within the autoannotated start site. /note=SD (Final) Score: The start site at 62581 bp has the least negative Final Score of -4.638 and a Z-score of 2.373, both of which are the best out of the available scores. This supports the autoannotated start site. /note=Gap/overlap: The start site at 62581 has an overlap of 4 bp, which indicates it may be part of an operon. /note=Phamerator: As of 4/7/22, the pham number is 24824. The gene is conserved in 122 other phages, of which 26 are drafts. Other phages that have this gene with the same pham number are in many different clusters, including CS like GalacticEye, A, M, DV, DQ, etc. 
/note=Starterator: Start site 10 in Starterator was manually annotated in 40/96 non-draft genes in this pham and is called 91.4% of the time when present. Start site 10 is at position 62581 bp on GalacticEye. This is strong support for the autoannotated start site. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 62581. /note=Function call: NKF - All of the top PhagesDB BLAST hits were function unknown, but one hit further down with phage Neoevie_79 was a CobT-like cobalamin biosynthesis protein (identity 98%, e-value = 2e-53). NCBI BLAST also had hypothetical protein for all hits except the third, which was with Neoevie_79 and CobT-like cobalamin biosynthesis protein again (coverage 100%, e-value < 2e-63, percent identity > 98.96%). CDD returned no results, and HHpred only returned hits for DUFs with high e-values. Because there was only one hit in the BLASTs that called this gene a CobT-like cobalamin biosynthesis protein (with everything else providing no information on function), and because the downstream gene has already been called a CobT-like cobalamin biosynthesis protein with far stronger evidence than this gene, I think it would be best to say that this gene has no known function right now. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMHs. /note=Secondary Annotator Name: Araque, Colette /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. Note: I'm guessing you left the synteny box empty because functions of upstream and downstream genes haven't been called yet for other phages. I would be careful calling a definite function for this gene as there are so many hits with unknown functions that have strong E-values not too far from the selected hit. May want to add whether the chosen start site results in the LORF and/or a reasonable gene length. 
[Edit: All concerns addressed by primary annotator] CDS complement (62578 - 64494) /gene="82" /product="gp82" /function="CobT-like cobalamin biosynthesis protein" /locus tag="GalacticEye_82" /note=Original Glimmer call @bp 64494 has strength 14.36; Genemark calls start at 64494 /note=SSC: 64494-62578 CP: yes SCS: both ST: SS BLAST-Start: [CobT-like cobalamin biosynthesis protein [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.516, -3.649482460397077, no F: CobT-like cobalamin biosynthesis protein SIF-BLAST: ,,[CobT-like cobalamin biosynthesis protein [Gordonia phage Woes] ],,YP_009273470,99.8433,0.0 SIF-HHPRED: CobT ; Cobalamin biosynthesis protein CobT,,,PF06213.15,41.3793,99.8 SIF-Syn: CobT-like cobalamin biosynthesis, upstream AAA-ATPase, downstream NKF with Pham 24824. Similar phages: Anamika, Lahirium /note=Primary Annotator Name: Perez, Joshua /note=Auto-annotation: The gene was called by both Glimmer and GeneMark, which agree on a start at 64494 with start codon GTG. /note=Coding Potential: Yes, this gene has decent coding potential predicted within the third RF reverse. This chosen start site does not cover all, but most, of this coding potential in both the Host and Self GeneMark. /note=SD (Final) Score: -3.649, the least negative score. The Z score present for this start is 2.516, which is close to 2. /note=Gap/overlap: Glimmer start has a gap of -4 bp (a 4 bp overlap) and seems to be conserved in the Pham Maps. This overlap is most likely due to the gene being part of an operon/gene family. It seems to be conserved in other phages, with a few being Anamika, Neovie, and Guillaume. /note=Phamerator: Pham 102121 4/05/22 is present in 62 members of the cluster CS. I used Austin, Diabla, and 5 others from cluster CS. 44 of the similar members called a CobT-like cobalamin biosynthesis protein as their function. /note=Starterator: This is a reasonably conserved start site. 
The start site number is 13, and the coordinate base pair number is 64494. It has 13 manual draft annotations, is called 100% of the time when present, and has 62 members in its pham. /note=Location call: The overall evidence shows that this gene is most likely real at start 13 and bp number 64494 due to decent coding potential in its third RF reverse and evidence from Starterator. Glimmer start site of 64494 is the most likely start site, as it has the longest ORF and lowest gap bp. This is most likely a real gene as it has coding potential. /note=Function call: The top 5 NCBI hits sorted by e-value had low e-values (0), and a high % identity (around 99%). For example, phage Woes has a 99.69% identity and an e-value of 0. The gene is said to be a CobT-like cobalamin biosynthesis protein, which must be confirmed with other programs. The first hits were cobalamin biosynthesis proteins. There is no data available from CDD. The HHpred has very high e-values and does not have significant results. Overall, the high percent identities and lack of evidence toward any other function for this gene classify it as a CobT-like cobalamin biosynthesis protein. /note=Transmembrane domains: No predicted TMDs by TOPCONS or TMHMM, which makes sense if the gene is a CobT-like cobalamin biosynthesis protein, as this function deals with a vitamin rather than the membrane. /note=Secondary Annotator: Bharadwaj, Shreya /note=Secondary Annotator QC: I have QC'ed this gene and agree with the primary annotator's function and location calls. Don't forget to do the "All GM Coding Potential" drop-down menu. For the final score, you want the least negative one and for the z-score you want one that's closest to 2 so I would maybe edit that section. I would mention that a negative gap is an overlap. I would add a note on what your HHPRED results say. Don't forget to include pham numbers of upstream/downstream genes in your synteny box. 
CDS complement (64491 - 66050) /gene="83" /product="gp83" /function="AAA-ATPase" /locus tag="GalacticEye_83" /note=Original Glimmer call @bp 66050 has strength 14.64; Genemark calls start at 66050 /note=SSC: 66050-64491 CP: yes SCS: both ST: SS BLAST-Start: [AAA-ATPase [Gordonia phage Luker]],,NCBI, q1:s1 100.0% 0.0 GAP: 80 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.257, -2.86135216970947, yes F: AAA-ATPase SIF-BLAST: ,,[AAA-ATPase [Gordonia phage Luker]],,QDH48329,99.8073,0.0 SIF-HHPRED: CbbQ/NirQ/NorQ domain protein; ATPase, AAA+ domain protein, carboxysome-associated, PROTEIN BINDING; HET: ADP; 2.8A {Halothiobacillus neapolitanus (strain ATCC 23641 / c2)},,,5C3C_A,52.7938,99.8 SIF-Syn: AAA-ATPase, downstream gene is a CobT-like cobalamin biosynthesis protein in pham 103389 and upstream gene is NKF in pham 13627, just like in phage Guillaume. /note=Primary Annotator Name: Shah, Aayushi /note=Auto-annotation: Called in Glimmer and GeneMark, both at start site 66050. /note=Coding Potential: There is reasonable coding potential between the putative ORF, and the chosen start site covers all this coding potential. /note=SD (Final) Score: -2.861, the best final score on PECAAN, and also has the best z-score on PECAAN of 3.257 /note=Gap/overlap: 80 BP, second lowest possible gap on PECAAN. Large but reasonable as this is preserved in synteny and no significant coding potential is seen in that region. /note=Phamerator: The pham as of 04/02/22 is 20351. A lot of other phages in this cluster have this pham present, as seen in Adgers_92, Beaver_92, and Butterball_90. The function called in phamerator is an AAA-ATPase. /note=Starterator: Start site 28 is conserved across many manually annotated genomes in the pham, and represents the start site at bp 66050. 37/52 non-draft genes call this start site, which is strong evidence. 
/note=Location call: Based on the evidence, the gene is real and has a start site of 66050, which covers all coding potential and has the most reasonable gap and final score, as well as evidence from Phamerator that it is conserved within a pham. /note=Function call: AAA-ATPase. The top 5 non-draft PhagesDB BLAST hits call the function as a AAA-ATPase with e values of 0 for all of them, constituting very strong evidence. In the NCBI BLAST, the top 5 calls sorted by e value called it as a AAA-ATPase, with e values of 0 and 100% coverage for all, and over 99% identity, also strong evidence. The best CDD hits all called the function as an ATPase of some kind, with an e-value of 1.10436e-13, around 39% coverage, which is not strong evidence but still supports the function call. HHpred has a hit with an e value of 2.1e-16, 99.8% probability, and around 52% coverage that calls it an ATPase, AAA+ domain protein. There are no TMDs, which makes sense with this given function. Thus, there is strong evidence for this function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, so it is not a membrane protein. /note=Secondary Annotator Name: Likwong, Chloe /note=Secondary Annotator QC: I agree with the primary annotator. Both Glimmer and GeneMark call Start@66050, and it has the most positive Z-score and Final Score out of the start site candidates. Furthermore, Starterator lists that Start@66050 has 37 MA's, while other start site candidates do not. Function is AAA-ATPase based on the findings of PhagesDB, NCBI BLASTP hits, CDD Hits, and HHPred. 
CDS complement (66131 - 66772) /gene="84" /product="gp84" /function="hypothetical protein" /locus tag="GalacticEye_84" /note=Original Glimmer call @bp 66757 has strength 11.16; Genemark calls start at 66772 /note=SSC: 66772-66131 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein BH793_gp31 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 3.86068E-155 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.603, -3.4076099559729456, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp31 [Gordonia phage Woes] ],,YP_009273472,100.0,3.86068E-155 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Villarreal, Alexia /note=Auto-annotation: The Glimmer start site is at 66757 while the GeneMark start is at 66772. The start codon for the Glimmer start site is GTG while GeneMark’s is ATG, both holding equally probable chances of being a legitimate start site. With consideration of final scores, z-values and Starterator data, it can be seen that GeneMark’s start at site 66772 is more probable than the site at 66757. /note=Coding Potential: Coding potential of this ORF is only on the reverse strand, which indicates it is a reverse gene. There is strong coding potential indicated by GeneMark Self and strong coding potential from GeneMark Host inclusive of both start sites. Lots of synteny conservation of this gene observed within other phages indicates high coding potential within this gene. /note=SD (Final) Score: The final score for this gene is -3.408, which is one of the better values listed, indicating that this start site is good and very probable. /note=Gap/overlap: A small overlap of 4 bp is found with this gene. It remains in good standing and is preferred, as it is the smallest overlap available among the possible start sites, while others have large gaps. This start site does not give the longest ORF; however, other evidence indicates this start site has higher potential than the one listed as the LORF. 
/note=Phamerator: pham:13627. Date 04/07/2022. It is conserved in other phages within the same cluster CS such as Anamika_82, Adgers_93, and Newt_84. /note=Starterator: Start site 12 in Starterator was manually annotated in 45/46 non-draft genes in this pham. Start 12 is at 66772 in GalacticEye. This start site is the most annotated start site, which supports the likelihood of this being the start site for the gene. /note=Location call: Confirm this gene’s start site at 66772. Would choose to keep this start site for the data suggested by guidelines regarding coding potential, high synteny with other phage data, most reasonable (smallest) overlap of 4 recorded between all other start sites, the start codon being ATG is highly probable as it is one of the more common initiation codons, as well as a good Final score of -3.408 and an excellent z-score of 2.603. Based on the above evidence, this is a real gene and the most likely start site is 66772, which differs from the assumed start at 66757. /note=Function call: According to the data obtained by PhagesDB Blastp and NCBI Blastp, as well as HHpred, suggested function of the gene is a hypothetical protein as there is evidence with high query coverage (100%), high percentage of identity conservation (100%-99.53%) as well as extremely low E-values such as 4e-155, 1e-154, and 3e-154. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Pramana, Martin /note=Secondary Annotator QC: I have QC’ed this location call and do not agree with the first annotator. In the Coding potential section, there is a coding potential at the fourth frame of Genemark self and host. This indicates that this gene is on the reverse strand. For start site 66772, the SD score of -3.408 is the third-best SD score. For the gap/overlap, there is a minor typo that indicates an overlap of 4 bp. This overlap is conserved in other phages such as Harambe. 
In the starterator section, mention other non-draft phages that have start site 12, such as Anamika which has a start site 12 of 67089 bp. The function should be NKF instead of hypothetical protein. Separate the searches from HHpred from PhagesDB BLASTp and NCBI BLASTp. Mention that there are no good HHpred hits due to the high e-values. Mention that there are no CDD hits available for this protein. Please fill out the synteny box. CDS complement (66769 - 67119) /gene="85" /product="gp85" /function="hypothetical protein" /locus tag="GalacticEye_85" /note=Original Glimmer call @bp 67122 has strength 8.15; Genemark calls start at 67119 /note=SSC: 67119-66769 CP: no SCS: both-gm ST: NI BLAST-Start: [hypothetical protein BH793_gp30 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 4.60148E-80 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.257, -2.4811409279978642, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp30 [Gordonia phage Woes] ],,YP_009273473,100.0,4.60148E-80 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Araque, Colette /note=Auto-annotation: Glimmer called the start site of 67122 which has a corresponding start codon of ATG. GeneMark called the start site of 67119 which has a corresponding start codon of ATG. /note=Coding Potential: The gene has coding potential predicted within the putative ORF according to both Genemark Self and Host. Both potential start sites cover all the corresponding coding potential. /note=SD (Final) Score: For the 67122 start site, there is a SD score of -3.259 and a Z score of 3.257. -3.259 is strong and is the second best SD score as it is less negative compared to the other scores. 3.257 is very strong since it’s above 2 and is tied for the highest compared to the other potential start sites. For the 67119 start site, there is a SD score of -2.481 and a Z score of 3.257. -2.481 is a stronger score than that of the 67122 start site as it is less negative. 
3.257 is very strong since it’s above 2 and is tied for the highest with that of the 67122 start site. /note=Gap/overlap: For the 67122 start site, there is a -4bp overlap which indicates an operon and is thus highly ideal. For the 67119 start site, there is a -1bp overlap which is below 4bp and is thus also highly ideal. Both these overlaps are the smallest compared to the other start sites. There are no other good alternative start sites besides the two auto-annotated. The length of the gene with the start site of 67122 is 354bp long which is acceptable as it is well over the 120bp minimum and is the LORF. The length of the gene with the start site of 67119 is 351bp long which is acceptable as it is well over the 120bp minimum, but it is not the LORF. /note=Phamerator: As of 4/5/2022 this gene is found in Pham 101211. There are 49 total members in this pham (39 non-drafts and 10 drafts). This pham is present in other members of the same cluster (CS) which my phage belongs to. Some phages used for comparison are Beaver, Butterball, and Newt. The Phams database did not have a function called for this gene of the GalacticEye phage nor did it have a function called for any other genes within this pham. /note=Starterator: The “Most Annotated” start site is not found in this gene. According to the Starterator data, the start site @67119 has 12 manual annotations which is the most out of any of the other start sites. This evidence supports the start site predicted by GeneMark. /note=Location call: Gene 81 of the GalacticEye phage appears to be a real gene due to it being conserved in phamerator and it having considerable coding potential. The 67119 start site seems the most likely as it has strong SD and Z scores, has the smallest gap/overlap, encompasses all the coding potential, and has 12 MA’s. /note=Function call: Gene 81 most likely encodes for a protein of an unknown function. 
There are multiple phagesDB BLAST hits that suggest function unknown and those hits have a strong e value of 5e-65. There is no NCBI BLAST data available. /note=Transmembrane domains: TMHMM and TOPCONS did not predict any TMDs meaning that this is not a transmembrane protein. /note=Secondary Annotator Name: Pay, Iona /note=Secondary Annotator QC: Excellent annotation Colette! Very thorough and well reasoned. CDS complement (67119 - 67556) /gene="86" /product="gp86" /function="hypothetical protein" /locus tag="GalacticEye_86" /note=Original Glimmer call @bp 67430 has strength 8.5; Genemark calls start at 67556 /note=SSC: 67556-67119 CP: yes SCS: both-gm ST: SS BLAST-Start: [hypothetical protein BH793_gp29 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 1.61618E-102 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.468, -4.660078681078926, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp29 [Gordonia phage Woes] ],,YP_009273474,100.0,1.61618E-102 SIF-HHPRED: SIF-Syn: NKF. The downstream gene is in pham 66769 and has NKF similar to Luker. The upstream gene is in pham 55296 and has NKF similar to Luker. /note=Primary Annotator Name: Bharadwaj, Shreya /note=Auto-annotation: Glimmer called a start site of 67430 while GeneMark called a start site of 67556. /note=Coding Potential: The coding potential of this ORF is on the reverse strand, which means that this is a reverse gene. I found coding potential for this gene in both GeneMark Self and Host. /note=SD (Final) Score: Start site 67556 has a final score of -4.660. This is the second best final score according to PECAAN. /note=Gap/overlap: Start site 67556 has a gap/overlap of -1. This is indicative of an overlap of 1 bp. This is well within the guidelines of -7 to +7 which is fairly good. /note=Phamerator: 55590 was the pham of the gene as of 4/7/22. It has also been found in phages Bianmat_84 and Luker_86. There are 46 non-draft genomes in this pham. 
Start site 6 was called 100% of the time when present. /note=Starterator: Start site 6 was manually annotated 36 times in non-draft genomes. Start site 6 corresponds to start coordinate 67556 in GalacticEye, which agrees with the GeneMark auto-annotation but not the Glimmer auto-annotation. /note=Location call: Based on the above evidence, this is a real gene with a likely start site of 67556. Glimmer did not call this start site but GeneMark did. Even though these two programs do not agree, there is strong evidence for the 67556 start site as it has a small, well-conserved overlap, 36 manual annotations, as well as a strong final score of -4.660. /note=Function call: NKF. All of the PhagesDB BLAST hits indicate with a low e-value (1e-80) that the function is unknown for this gene. HHPRED hits indicated that it could be an infectivity protein but the e-value is very high (14) and the %coverage was very low (38.6%). NCBI BLAST hits indicate a “hypothetical protein” with a low e-value of >1.6e-102. CDD had one hit for a histone deacetylase complex but it had a very high e-value of 0.00141. /note=Transmembrane domains: TMHMM and TOPCONS do not predict any TMDs, so this is not a membrane protein. /note=Secondary Annotator Name: Cho, Emily /note=Secondary Annotator QC: I have QCed this location call and agree with the primary annotator. NOTE: you cannot use “suggested start” for starterator dropdown menu when the start site is not the most conserved start site, so clarify if the start site was the most conserved start site. Do not check draft genomes as evidence for BLAST hits. Could have more notes for Phamerator. Good explanation of upstream overlap. 
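The ORF lengths compared in the gene 85 notes (354 bp versus 351 bp, against a 120 bp minimum) follow directly from the stop and candidate start coordinates. A minimal sketch, assuming inclusive 1-based coordinates; `orf_length`, `check_orf`, and `MIN_GENE_BP` are hypothetical names introduced only for illustration.

```python
MIN_GENE_BP = 120  # minimum gene length quoted in the gene 85 notes


def orf_length(low: int, high: int) -> int:
    """Length in bp of a feature spanning low..high, both ends inclusive."""
    return high - low + 1


def check_orf(low: int, high: int) -> tuple[int, bool, bool]:
    """Return (length, is a whole number of codons, meets the minimum length)."""
    length = orf_length(low, high)
    return length, length % 3 == 0, length >= MIN_GENE_BP


# Gene 85 ends at 66769 on the reverse strand; the two candidate starts
# discussed in the notes are 67122 (Glimmer) and 67119 (GeneMark).
print(check_orf(66769, 67122))  # (354, True, True)
print(check_orf(66769, 67119))  # (351, True, True)
```

Both candidates pass the length checks, so length alone only identifies which one is the LORF; the choice between them rests on the other evidence in the notes.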
CDS complement (67556 - 68554) /gene="87" /product="gp87" /function="hypothetical protein" /locus tag="GalacticEye_87" /note=Original Glimmer call @bp 68554 has strength 15.35; Genemark calls start at 68557 /note=SSC: 68554-67556 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_NEWT_87 [Gordonia phage Newt]],,NCBI, q1:s1 100.0% 0.0 GAP: 24 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.257, -2.0949393225970705, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_NEWT_87 [Gordonia phage Newt]],,QDH48730,100.0,0.0 SIF-HHPRED: SIF-Syn: NKF, downstream gene is NKF in pham 55590, upstream is NKF in pham 98082, just like in phage Luker and Newt /note=Primary Annotator Name: Likwong, Chloe /note=Auto-annotation: Glimmer and GeneMark did not predict the same start site. Glimmer predicted Start@68554 and GeneMark predicted Start@68557. /note=Coding Potential: Via GeneMark, Start@68554 and Start@68557 cover the coding potential found in both the host and self maps, and there is no violation of the guiding principles. /note=SD (Final) Score: For Start@68554, it has a Final Score of -2.095 and a Z-score of 3.257 for. For Start@68557 has a Final Score of -2.034 and a Z-score of 3.257. /note=Gap/overlap: For Start@68554, there is a 24bp gap with the upstream gene. For Start@68557, there is a 21bp gap with the upstream gene. The length of 12 or 9bp is also small–a functional gene must at least be 120bps long. Because both starts have the same Z-score, Start@68557 has a slightly more positive Final Score. /note=Phamerator: This gene is in pham 55296 as of the date 3/31/2022. There are 60 members, 12 of which are drafts. GalacticEye, with the members in this pham, is part of Cluster CS with members like Teal and Woes. GalacticEye is found in cluster CS with members like Diabla and Beaver. Majority of the Final Draft genes list no known function. /note=Starterator: In pham 55296, GalacticEye does not have the “Most Annotated” start. 
It lists Start 3 @68554 with 18 MA’s. /note=Location call: The location call is at Start @68554. The gene seems to be a real gene given that Start@68554 covers the coding potential, has a gap of 24bp, and has 13 MA’s done compared to the other potential Start sites. It is important to note that the Final Score of Start@56995 is not the most positive. /note=Function call: The function is unknown. In PhagesDB, there are multiple hits with strong e-values of 0 that list no function. For NCBI BLAST hits, there are also strong e-values of 0 that list hypothetical protein as the function; the top three hits in NCBI BLASTP list a value >99% for %coverage, %identity, and %alignment. /note=Transmembrane domains: No transmembrane domains detected by TMHMM or TOPCONS, indicating that the gene is not a membrane protein. /note=Secondary Annotator Name: Enos, Alexander /note=Secondary Annotator QC: I have QCed this location call and agree with the primary annotator. CDS complement (68579 - 68710) /gene="88" /product="gp88" /function="hypothetical protein" /locus tag="GalacticEye_88" /note=Original Glimmer call @bp 68710 has strength 2.62 /note=SSC: 68710-68579 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein BH793_gp27 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 2.56599E-21 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.633, -5.362493594322671, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp27 [Gordonia phage Woes] ],,YP_009273476,100.0,2.56599E-21 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF, downstream is NKF, similarly found in phage Jormungandr. /note=Primary Annotator Name: Pramana, Martin /note=Auto-annotation: Glimmer calls a start site of 68710. GeneMark (Self and Host) did not call a start site. /note=Coding Potential: There is reasonable coding potential in the ORF in the reverse direction (located in the third frame). This coding potential is found in GeneMark Self, but not in GeneMark Host. 
/note=SD (Final) Score: -5.362. It is the second best Final Score on PECAAN. This start site has the best Z score of 2.095 out of all the possible start sites. /note=Gap/overlap: There is an upstream overlap of 4 bp. This overlap of 4 bp suggests that the gene may be part of an operon. This overlap is reasonable because it is conserved in other phages such as Guillaume and Jormungandr. This start site does not give the longest ORF. /note=Phamerator: As of 4/3/2022 the gene is found in Pham 98082. It is conserved in cluster CS3 (Anamika and Harambe) but also found in other clusters such as CS2 (Boneham and Breezic). /note=Starterator: Start site 14 is conserved in 24/32 non-draft members. Start site 14 has a position of 69030 bp in Harambe. Start site 14 has a position of 68710 bp in GalacticEye. The start site agrees with Glimmer. /note=Location call: Based on the evidence above, this is a real gene and the start site is most likely 68710 bp. Starterator agrees with Glimmer. /note=Function call: NKF. The top 2 results from both NCBI and PhagesDB BLAST state that this gene doesn’t have a known function. The protein matches with phages Anamika, Guillaume, and Harambe have high query coverage of 100%, high % identities (100%), and low E-values (<2.565e-21). CDD shows no matches/hits for this gene, while HHpred has no good hits in the databank. The first match from HHpred has a high E-value (0.024), a high probability (94%), and high coverage (83.7209%). Therefore, there are no relevant hits from CDD or HHpred. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs, so it is not a transmembrane protein. /note=Secondary Annotator Name: Maraziti, Gabriela /note=Secondary Annotator QC: I have QC’ed this function call and agree with the start site reflected in PECAAN. Notes: please comment on the Z-score for the given start site. The negative ‘gap’ indicates that this is an overlap. 
Additionally, an overlap of 4bp indicates that the gene is part of an operon which is highly favorable. Please also check the start site in your ‘Location Call’ section; it is different from the rest of the start sites mentioned. CDS complement (68707 - 68871) /gene="89" /product="gp89" /function="hypothetical protein" /locus tag="GalacticEye_89" /note=Original Glimmer call @bp 68871 has strength 15.32; Genemark calls start at 68871 /note=SSC: 68871-68707 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein BH793_gp26 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 9.2847E-29 GAP: 201 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.846, -2.958041784599676, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp26 [Gordonia phage Woes] ],,YP_009273477,90.0,9.2847E-29 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Pay, Iona /note=Auto-annotation: Glimmer and GeneMark agree that this gene starts at 68871 using ATG. /note=Coding Potential: Self-trained GeneMark shows significant coding potential in the third frame, as does host-trained GeneMark. This gene is also a reasonable length at 165 bp. /note=SD (Final) Score: Start 68871 has the better Z score (2.846) and final score (-2.958) of the two starts available, though their Z scores don’t differ significantly. /note=Gap/overlap: This gap is reasonable at 201 bp, which appears similar in other CS3 phages in which the gene is conserved. /note=Phamerator: This gene is conserved in other CS3 phages including Luker and Lahirium. /note=Starterator: Interestingly, this gene only seems to display one start! This would be Start 1 (68871), which has been manually annotated 13 times, and called 100% of the time when called. /note=Location call: Given the strong Starterator evidence, coding potential and conservation, it is safe to say that this is a real gene. /note=Function call: This Pham has very little information available in PhagesDB, returning 21 NKF protein hits. 
NCBI BLASTp returns only one result, a hypothetical Gordonia phage protein. CDD does not return any results, and HHPred only returns hits with e-values that are 20 or above, indicating function is not determined. /note=Transmembrane domains: TmHmm and TopCons do not predict any transmembrane domains. /note=Secondary Annotator Name: Nelson, Shiloh /note=Secondary Annotator QC: Great start; please add specifics (specific data values) to your function call. Also, don’t forget to hit the down arrow and make a selection on the GM Coding Capacity and fill in your synteny box. CDS complement (69073 - 70017) /gene="90" /product="gp90" /function="hypothetical protein" /locus tag="GalacticEye_90" /note=Original Glimmer call @bp 70017 has strength 18.32; Genemark calls start at 70017 /note=SSC: 70017-69073 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_NEOEVIE_88 [Gordonia phage Neoevie] ],,NCBI, q1:s1 100.0% 0.0 GAP: 206 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.948, -2.9525085049809894, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_NEOEVIE_88 [Gordonia phage Neoevie] ],,QAX95449,100.0,0.0 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (pham 42570, 3/31/22), downstream is NKF (pham 10767, 3/31/22), just like in phages Anamika and Neoevie. /note=Primary Annotator Name: Cho, Emily /note=Auto-annotation: Both Glimmer and GeneMark call 70,017 as the start site. /note=Coding Potential: The host-trained GeneMark shows reasonable coding potential predicted within the putative ORF, and the chosen start site 70,017 covers all this coding potential. /note=SD (Final) Score: -2.953, the second highest (best) value among 18 start site candidates. /note=Gap/overlap: This candidate has the smallest gap (206) possible, while the candidate with the best final score has a gap of 290. Although there is a large gap upstream, the gap seems to be conserved and shows synteny with other genomes like Harambe and Neoevie. 
The length of the gene (945 bp) is also reasonable (>120bp). /note=Phamerator: Found in Pham 15232 as of 31st March, 2022. The pham is found only in other members of cluster CS and one singleton. There were no functions called for any of the pham members. /note=Starterator: The start number called the most often in the published annotations is 7; it was called in 44 of the 46 non-draft genes in the pham. Starterator was informative. The suggested start site, start number 7 (70017), was the most called site. /note=Location call: The gene is very likely real, with a start site of 70,017, based on Starterator (which showed the most conserved and manually annotated start site), Phamerator, and good coding potential, all covered by the called start site. The gene also shows synteny with many other annotated genomes like Anamika, Harambe, and Neoevie. It is also the common ATG start site with the second best final score and Z score of 2.948, which is close to 2. The start site also yields the longest open reading frame. /note=Function call: The top 10 PhagesDb BLASTp hits, sorted by E-value, suggested no known function, with a score of 615 to 616, 99-100 % identity, and low E-value of e-176. The first hit that called a function of major capsid hexamer protein had an e-value of 0.024 and had low % identity (20%). The Phagesdb function frequency of calling holin is also only 9% and also in different clusters, so it seems irrelevant. The top 4 NCBI BLASTp hits, sorted by E-value, also suggested NKF (hypothetical protein), with 85-92% query coverage, high % identity (>82.90%), and low E-values ranging from 5e-143 to 0. There were no NCBI Conserved Domain Database hits. There was one HHpred hit corresponding to Calpain small subunit, but it had a low probability of 32.1%, an e-value much greater than 10e-3 (640), and very low % coverage (23.2484 %), so there were no meaningful hits. Therefore, the gene seems to have NKF. 
/note=Transmembrane domains: No TMHs reported from TmHmm. No TMDs reported from Topcon. This protein with no known function does not seem to have a transmembrane domain. /note=Secondary Annotator Name: Ruiz, Paola /note=Secondary Annotator QC: I have QC’ed this location call and agree with first annotator. List z score. CDS complement (70224 - 70382) /gene="91" /product="gp91" /function="hypothetical protein" /locus tag="GalacticEye_91" /note=Original Glimmer call @bp 70382 has strength 5.1 /note=SSC: 70382-70224 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein BH793_gp24 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 5.215E-28 GAP: 113 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.546, -3.5077455217481304, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp24 [Gordonia phage Woes] ],,YP_009273479,100.0,5.215E-28 SIF-HHPRED: SIF-Syn: The downstream gene is NKF (pham 55997), upstream gene is NKF (pham 15232). This is similar to other cluster CS phages including Luker, Hello, Anamika, and Newt. /note=Primary Annotator Name: Enos, Alex /note=Auto-annotation: Glimmer calls start site 70382 but GeneMark does not call a start site. /note=Coding Potential: Coding potential in this ORF is on the reverse strand only, indicating that this is a reverse gene. Coding potential is found both in GeneMark Self and Host. The chosen start site covers all coding potential. /note=SD (Final) Score: The Final score of -3.508 was the best possible start site. /note=Gap/overlap: The gap is 113 which is large, but is the smallest of the options and is conserved in other phages such as Anamika and Hello. /note=Phamerator: Pham 10767 as of 4/7/22. It is conserved and found in Anamika (CS) and Hello (CS). /note=Starterator: Start site 22 in Starterator was manually annotated in 34/38 non-draft genes in this pham, and was the start site called by starterator for this gene. 
Start site 22 corresponds to start site of 70382 which is consistent with previous evidence. /note=Location call: Based on the consistent evidence from Glimmer, GeneMark, and Starterator, this is a real gene and the most likely start site is 70382. /note=Function call: Unknown Function. All hits from phagesDB BLAST called unknown function with small E-values <2e-23. All NCBI BLAST hits also call "hypothetical protein" (100% coverage, 90%+ identity, and E-value <5e-28). HHpred called a protein but had very low probability and coverage and an E value of 16. CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Juarez, Sabrina /note=Secondary Annotator QC: QC complete, I agree with the annotation and location call with the primary annotator. CDS complement (70496 - 70816) /gene="92" /product="gp92" /function="hypothetical protein" /locus tag="GalacticEye_92" /note=Original Glimmer call @bp 70816 has strength 10.72; Genemark calls start at 70804 /note=SSC: 70816-70496 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein BH793_gp23 [Gordonia phage Woes] ],,NCBI, q1:s1 100.0% 1.7885E-70 GAP: 102 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.314, -3.934452679007171, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein BH793_gp23 [Gordonia phage Woes] ],,YP_009273480,100.0,1.7885E-70 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Maraziti, Gabriela /note=Auto-annotation: Glimmer calls the start at 70816 and GeneMark calls the start at 70804. /note=Coding Potential: The gene has reasonable coding potential within the ORF for both the self and host models, and both starts include almost all typical and atypical coding potential. /note=SD (Final) Score: For start 70816, the final score is -3.934; this is the least negative final score. The Z-score is the highest at 2.314. 
For start 70804, the final score is -6.101, which is far more negative. The Z-score is 1.311. /note=Gap/overlap: Start 70816 creates a 102 gap and the longest possible ORF. Start 70804 creates a 114 gap. /note=Phamerator: Pham 55997 as of 4/5/2022. The gene is conserved in many other members of the same cluster, CS, such as Harambe_90 and Anamika_91. There is no function called for any of the genes of this pham. The genes are of varying lengths, but 321 bp is somewhat conserved; this is the length of the ORF bordered by start 70816. /note=Starterator: 20/32 non-draft genes in the pham call start site 5, which is not present in the GalacticEye genome. Start site 4 is called 100% of the time it is present, has the most manual annotations in the GalacticEye phage, and corresponds to position 70816. This evidence agrees with the start site predicted by Glimmer but not GeneMark. Start 70804 has the least manual annotations at 6. /note=Location call: Based on the evidence above, this gene is real and has a start site at 70816. /note=Function call: NKF; phagesdb BLAST, NCBI BLAST, HHPred and CDD did not return any informative hits. /note=Transmembrane domains: No TMDs predicted by TMHMM or TOPCONS, therefore it is not a membrane protein. /note=Secondary Annotator Name: Li, Shally /note=Secondary Annotator QC: I agree with the above location and function call. Minor correction: a less negative final score is more desirable, so the final score of -3.934 for the chosen start is actually the best final score given. 
CDS complement (70919 - 71098) /gene="93" /product="gp93" /function="hypothetical protein" /locus tag="GalacticEye_93" /note=Original Glimmer call @bp 71098 has strength 10.86; Genemark calls start at 71098 /note=SSC: 71098-70919 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_HARAMBE_91 [Gordonia phage Harambe] ],,NCBI, q1:s1 100.0% 7.45752E-34 GAP: 22 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.257, -2.0949393225970705, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_HARAMBE_91 [Gordonia phage Harambe] ],,QAX94697,100.0,7.45752E-34 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Nelson, Shiloh /note=Auto-annotation: The gene is 180 bp long, with coding potential in both glimmer and genemark. Both genemark and glimmer call the same start site, at 71098. As this start site is highly conserved, this shows that the gene has a greater chance of being ‘real.’ /note=Coding Potential: Glimmer and genemark show that there is coding potential, without violation of the basic guiding principles. All criteria are met — this is a real gene. The gene is 180 basepairs long, well over the 40 codon minimum length listed in the guiding principles, the gene shows synteny with non-draft phage genes lahirium and teal. There are no switches in gene orientation. The genemark self annotation map shows that the gene is in the 1st ORF in the reverse direction, with atypical coding potential in the 3rd forward direction open reading frame. /note=SD (Final) Score: The Final Score is -2.095 for the start site 71098. This is not the most negative score, however it is a reasonable score — as it is more negative than -2. /note=Gap/overlap: The gap is at 22 bp, for the downstream gene. /note=Phamerator: This gene is in pham 40771 — as of 3/25/2022. All of the genes belong to the CS cluster. 
/note=Starterator: According to the starterator report, “Pham number 40771 has 21 members, 8 are drafts. The start number called the most often in the published annotations is 2, (corresponding to 71098), as it was called in 12 of the 13 non-draft genes in the pham.” /note=Location call: This is a real gene, as it has good coding potential, which is conserved in the phamerator. From the evidence collected, the start site is 71098. This is the most annotated start site. /note=Function call: NKF. From the NCBI data in the PECAAN notes, the gene product is a hypothetical protein, and multiple NCBI BLAST hits concur with this finding. As there were no hits in CDD and the values for HHPRED were substandard, I believe that the function of my ORF is NKF. /note=Transmembrane domains: Zero; neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. More specifically, it is a protein with no known function. /note=Secondary Annotator Name: Patel, Rishi /note=Secondary Annotator QC: I agree with these annotations. The start site 71098 is the most heavily conserved among genes in the pham. Additionally, the coding potential is within the putative ORF and the gaps are conserved. The function call is also correct as there is no evidence for a function throughout all the programs, thus, there is no known function at this time. 
CDS complement (71121 - 71528) /gene="94" /product="gp94" /function="hypothetical protein" /locus tag="GalacticEye_94" /note=Original Glimmer call @bp 71528 has strength 4.61; Genemark calls start at 71528 /note=SSC: 71528-71121 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LUKER_94 [Gordonia phage Luker]],,NCBI, q1:s1 100.0% 1.57615E-90 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.846, -3.3442433900004698, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LUKER_94 [Gordonia phage Luker]],,QDH48339,100.0,1.57615E-90 SIF-HHPRED: SIF-Syn: Function unknown. The Pham number is 3970, downstream Pham number 40771 and has no known function, just like in phages Nimi13 and Guillaume. /note=Primary Annotator Name: Ruiz, Paola /note=Auto-annotation: Glimmer and GeneMark both call the start at 71528. /note=Coding Potential: For both GeneMark Host and Self-Trained the coding potential in this ORF is mainly on the reverse strand, indicating that this is a reverse gene. It accurately shows an upward hash around 71528 and ends with a downward hash at 71121 for both. In Self-Trained GeneMark, the typical and atypical coding potential does not span the full length from the start site to the end site, but it does in GeneMark Host. Overall, the gene has good coding potential. /note=SD (Final) Score: The final score is -3.344. This is the second best final score on PECAAN. The z score is 2.846, the best score on PECAAN. /note=Gap/overlap: There is no gap listed in PECAAN as this gene is the last of the genome. /note=Phamerator: The pham number is 3970 as of 4/7/2022. It is conserved; found in Luker and Hello, which are both in cluster CS3. /note=Starterator: Start site number is 2, which corresponds to start site 71528 bp for GalacticEye. Start 2 is the most called start number. It was manually annotated in cluster CS3 13/33 times. Starterator agrees with GeneMark and Glimmer. /note=Location call: Based on the above evidence, this is a real gene. 
/note=Function call: Function is unknown. The top 7 NCBI BLAST hits have unknown function (e value 98% identity, 100% positives). For phagesdb BLAST, the top 4 non-draft hits were hypothetical proteins with e value of 2e-73 and >99% identity and positives. CDD and HHPRED were not helpful in determining function and most HHpred hits had very high e values at 220. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Patel, Sahaj /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered.