CDS 143 - 925 /gene="1" /product="gp1" /function="hypothetical protein" /locus tag="Snek_1" /note=Original Glimmer call @bp 143 has strength 19.73; Genemark calls start at 143 /note=SSC: 143-925 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_1 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.11776E-179 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.028, -2.583959800616441, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_1 [Arthrobacter phage Tweety19]],,QNO12662,100.0,1.11776E-179 SIF-HHPRED: SIF-Syn: Pham 86035, downstream gene is a terminase just like in phage Tweety19. /note=Primary Annotator Name: Arshad, Iqra /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 143. /note=Coding Potential: There is significant coding potential in the ORF for this gene, and it is found only on the forward strand, indicating that this is a forward gene. /note=SD (Final) Score: -2.584, which is the highest final score. /note=Gap/overlap: There is no gene gap before the start site and no overlap, indicating that this gene is most likely not part of an operon. /note=Phamerator: pham: 86035. Date 04/25/22. It is conserved; found in Tweety19 (AZ). /note=Starterator: Start site 1 in Starterator was manually annotated in 1/1 non-draft genes in this pham. Start 1 is 143 in Snek. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 143. /note=Function call: Based on all the data I have collected, the function of this protein cannot be determined. The PhagesDB BLAST hits have no assigned function. NCBI BLAST results also show hits to Tweety19 and unknown Rhodococcus bacteria, neither of which has a function listed. CDD also gave no hits for the given query. 
HHpred did give 2 hits: SPBc2 prophage-derived protein YomS and Bacillus subtilis 168 XepA, Viral Protein. The probability of SPBc2 prophage-derived protein YomS was fairly high (97.78), the coverage was decent (46.1538%), and the E-value was good as well (0.00093); however, there was no classified function for this protein. The second hit was for Bacillus subtilis 168 XepA, which was classified as a Viral Protein; it had a high probability (97.7) but a high E-value (0.02). Due to the E-value of this hit and its low coverage (39.6154%), I do not believe that this would be good evidence for function. Based on all this evidence, I do not believe we can call the function of this ORF. /note=Transmembrane domains: TMHMM and TOPCONS do not predict any TMDs. Based on this evidence, this gene is not predicted to encode a membrane protein. /note=Secondary Annotator Name: Gleason, Zoe /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. /note=The transmembrane domain section needs to be filled out on PECAAN. Additionally, HHpred should not be checked as the function is unknown. I agree with this annotation. All of the evidence categories have been considered. 
CDS 958 - 1422 /gene="2" /product="gp2" /function="terminase, small subunit" /locus tag="Snek_2" /note=Original Glimmer call @bp 958 has strength 12.33; Genemark calls start at 958 /note=SSC: 958-1422 CP: yes SCS: both ST: SS BLAST-Start: [terminase small subunit [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.79839E-107 GAP: 32 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.028, -2.5052746077145835, yes F: terminase, small subunit SIF-BLAST: ,,[terminase small subunit [Arthrobacter phage Tweety19]],,QNO12663,100.0,1.79839E-107 SIF-HHPRED: Terminase small subunit; genome packaging, bacteriophage, DNA binding, VIRAL PROTEIN; 1.4A {Enterobacteria phage HK97},,,6Z6E_B,54.5455,98.9 SIF-Syn: Terminase small subunit, upstream gene has NKF, downstream is terminase large subunit, like Tweety19. /note=Primary Annotator Name: Mak, Amanda /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 958. /note=Coding Potential: There is coding potential on the forward strand only and there is coding potential in both the Genemark Self and Host. /note=SD (Final) Score: The final score is the best at -2.505 and the Z-score is the highest at 3.028. /note=Gap/overlap: There is a 32 bp gap, which is not too high and the gene is also conserved in other phages like Adumb2043. /note=Phamerator: Pham: 102883, Date: 04/22/22. The start site is conserved and is found in Adumb2043 (AZ) and Tweety19 (AZ). /note=Starterator: Start site 43 in Starterator was called in 29 out of 138 non-draft genes in the pham, which correlates to a start site of 958 bp for Snek. /note=Location call: Considering all the information above, this gene is a real gene and has a start site at 958 bp. Glimmer, Genemark, and Starterator agree on the start site. /note=Function call: Terminase small subunit. In PhagesDB BLAST, the first two results with phages Tweety19 and DrSierra have the function of a terminase small subunit with an e-value of 1e-86 and 2e-74, respectively. 
NCBI Blastp results also have strong hits with the function of terminase small subunit and small e-values of 2e-107 to 1e-91. HHPred has a hit with small terminase in HK97, with 98.87% probability, 54.55% coverage, and E-value of 9.6E-9. CDD had no relevant hits. /note=Transmembrane domains: TMHMM and TOPCONS do not predict any TMDs. Based on this evidence, this gene does not encode for a transmembrane protein. /note=Secondary Annotator Name: Olvera, Kevin /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 1419 - 3110 /gene="3" /product="gp3" /function="terminase, large subunit" /locus tag="Snek_3" /note=Original Glimmer call @bp 1419 has strength 13.6; Genemark calls start at 1419 /note=SSC: 1419-3110 CP: yes SCS: both ST: SS BLAST-Start: [terminase large subunit [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.089, -4.92192487950304, no F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Arthrobacter phage Tweety19]],,QNO12664,100.0,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,92.54,100.0 SIF-Syn: Large subunit terminase, upstream gene is the small subunit terminase, downstream is portal protein, just like in phage Tweety19. /note=Primary Annotator Name: Farazi, Sepp /note=Auto-annotation: Glimmer and GeneMark. Both call the gene and agree on 1419. Start codon GTG. /note=Coding Potential: Coding Potential is strong throughout the entire ORF. Found in forward direction on both GeneMark Self and Host. /note=SD (Final) Score: -4.922. Not the best score, but still reasonable, ensuring minimal gap and coverage of coding potential /note=Gap/overlap: Overlap: -4. Acceptable gap size, potentially part of operon. Length of gene in 1692, acceptable size. /note=Phamerator: pham: 98020. Date 4/27/22. 
This gene is conserved among other phages of the same cluster AZ, such as Tweety19, Adolin, and Crewmate. /note=Starterator: Start site 114 in Starterator was manually annotated in 22 of 961 phages. This start site was manually annotated in 22 of the 26 AZ phages present in the pham. Start site 114 is 1419 in Snek. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and most likely starts at 1419. /note=Function call: Large subunit terminase. The top three PhagesDB BLAST hits have the function of large subunit terminase (all with E-values of 0). The top three NCBI BLAST hits also have the same function (100% coverage, 80+% identity, and E-values of 0). HHpred had a hit for terminase large subunit and DNA packaging domain with 92% probability and an E-value smaller than 6.3e-34. CDD had 2 hits for phage terminase with coverage greater than 44% and E-values < 1e-10, but with low identity of 10%. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore it is not predicted to be a membrane protein. /note=Secondary Annotator Name: Kidd, Conner /note=Notes: Looks very thorough and well done. I am in agreement about the start site and function of the gene. Well done. 
CDS 3136 - 4500 /gene="4" /product="gp4" /function="portal protein" /locus tag="Snek_4" /note=Original Glimmer call @bp 3136 has strength 16.16; Genemark calls start at 3136 /note=SSC: 3136-4500 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 0.0 GAP: 25 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.263, -2.033982896655645, yes F: portal protein SIF-BLAST: ,,[portal protein [Arthrobacter phage Tweety19]],,QNO12665,100.0,0.0 SIF-HHPRED: PORTAL PROTEIN; BACTERIOPHAGE SPP1, DNA TRANSLOCATION, MOLECULAR MOTOR, VIRAL PORTAL PROTEIN, VIRAL PROTEIN; HET: CA, HG; 3.4A {BACTERIOPHAGE SPP1},,,2JES_Q,88.7665,100.0 SIF-Syn: Portal protein, upstream gene is a terminase large subunit, downstream is a MuF-like minor capsid protein, just like in phage Tweety19 /note=Primary Annotator Name: Garin V, Paul /note=Auto-annotation: Both GeneMark and Glimmer called the start site at 3136 /note=Coding Potential: High coding potential throughout, as reported from both the host and self-trained auto-annotations. This gene is found on the forward strand. /note=SD (Final) Score: -2.034 /note=Gap/overlap: 25, a small enough gap to where a new gene does not need to be squeezed in before this gene. /note=Phamerator: On 4/25/22, the gene is in pham 102899 which has other AZ subcluster phages in it. There are 457members of this pham and 38 are drafts. I used crewmate_4 and AMyev_3 for comparison. /note=Starterator: The start site number called most often in the published annotations was 22. 154 of the 419 non-draft phage annotations called it. The autoannotated site was 58, which corresponds to a start nucleotide of 3136. The autoannotated site does not match the most common start site of the pham. /note=Location call: Based on previous evidence, this is a real gene that starts at 3136. /note=Function call: Portal protein. 
Multiple PhagesDB BLASTp and NCBI BLASTp hits of the gene were portal proteins with very low e-values (0.0), high query identities (80%+), and 97% coverage. The CDD and HHpred hits were very promising, with extremely small e-values for other portal proteins. /note=Transmembrane domains: No hits from TMHMM or TOPCONS. The absence of TMDs is consistent with the called function, since a portal protein is a capsid component rather than a membrane protein. /note=Secondary Annotator Name: Chew, Brandon /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 4503 - 5270 /gene="5" /product="gp5" /function="hypothetical protein" /locus tag="Snek_5" /note=Original Glimmer call @bp 4503 has strength 14.41; Genemark calls start at 4503 /note=SSC: 4503-5270 CP: yes SCS: both ST: SS BLAST-Start: [MuF-like minor capsid protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 0.0 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.873, -3.6727816321281046, yes F: hypothetical protein SIF-BLAST: ,,[MuF-like minor capsid protein [Arthrobacter phage Tweety19]],,QNO12666,99.6078,0.0 SIF-HHPRED: SIF-Syn: My Snek gene 5 displays synteny with Tweety19’s gene 5. They are also in the same pham (101588). Additionally, upstream (portal, Pham 104471) and downstream (scaffold, Pham 98009) genes are highly conserved in the same order for both genes. /note=QC: Changed to NKF because MuF is no longer being called. -AF /note=Primary Annotator Name: Orr, Max /note=Auto-annotation: Gene (stop@5270 F), both Genemark and Glimmer call at 4503. /note=Coding Potential: High, start site covers all the coding potential for the gene, only in the forward ORF meaning it is a forward gene. Found on both Genemark Host and Self. 
/note=SD (Final) Score: -3.673, highest SD score in PECAAN /note=Gap/overlap: 2, no alternative start candidates meaning no coding potential in gap, length of gene is acceptable (768 bp), conserved with Tweety19, Maureen, and Liebe. /note=Phamerator: 101588, 04/25/22, commonly annotated and can be seen conserved with Liebe (AZ), Maureen (AZ) and Tweety19 (AZ), function of MuF-like minor capsid protein consistent with other phages in pham /note=Starterator: (Start: 3 @4503 has 3 MAs), 3/6 non-draft genes call site #3 in this pham, evidence agrees with Glimmer and Genemark predicted start site. /note=Location call: Based on evidence above it is a real gene, calls start site at 4503 bp, and Glimmer and GeneMark also agree with Starterator /note=Function call: There is evidence for MuF, but as of April 2022 SEA-PHAGES is no longer calling this minor capsid protein MuF because it does not show up in the capsid, despite strong BLAST hits from both NCBI and PhagesDB to Tweety19 (e-value = 0.0) and Liebe (e-value = 4e-118), both of which call MuF-like minor capsid protein. HHpred produced two significant hits: Phage Mu protein F like protein with 98.01% probability, 27.8431% coverage, and e-value of 0.00008 from PF04233.17, and Phage minor capsid protein with 97.73% probability, 29.8039% coverage, and e-value of 0.00026 from PF06152.14. Low coverage indicates lower likelihood of being associated with these protein functions. No hits were recorded in CDD. /note=Transmembrane domains: No TMDs were predicted from TMHMM or TOPCONS, therefore this is not a membrane protein. /note=Secondary Annotator Name: Yan, Lisa /note=Secondary Annotator QC: I have QC’ed this location call; primary annotator should update drop-down menu for Starterator, update the RBS and comment on its relevancy, and answer whether the gene is a real gene and what the potential start site candidate is under "Location Call." 
For the Phamerator section, can note that the pham also contains phages in both EZ and AH clusters. CDS 5333 - 5875 /gene="6" /product="gp6" /function="scaffolding protein" /locus tag="Snek_6" /note=Original Glimmer call @bp 5333 has strength 16.48; Genemark calls start at 5333 /note=SSC: 5333-5875 CP: yes SCS: both ST: SS BLAST-Start: [scaffolding protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 4.95283E-121 GAP: 62 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.873, -3.6727816321281046, yes F: scaffolding protein SIF-BLAST: ,,[scaffolding protein [Arthrobacter phage Tweety19]],,QNO12667,100.0,4.95283E-121 SIF-HHPRED: Scaffold protein; major capsid protein, HK97-like fold, scaffolding protein, procapsid, VIRUS; 3.72A {Staphylococcus phage 80alpha},,,6B0X_f,58.8889,98.2 SIF-Syn: /note=QC: Juliet Stephenson unchecked hit in HHPRED for high e-value /note=Primary Annotator Name: Liu, Jinge /note=Auto-annotation: Gene (stop@5875 F) Both Glimmer and Genemark call the start site at 5333 with good coding potential. Start codon ATG is used, this is the most used codon in Snek. /note=Coding Potential: High coding potential throughout the gene. The autoannotated start site covers all the coding potential. /note=SD (Final) Score: -3.673. Highest and best score. /note=Gap/overlap: 62 bp. The smallest gap possible and a reasonable gap. The length of the gene is 543 bp, an acceptable gene length. /note=Phamerator: Pham 98009, run on 4/27/22, conserved with Tweety19 (AZ) and Liebe (AZ). Evidence from Phamerator and the phams database called for scaffolding protein, which is also called in Tweety19 and Liebe for this gene; this function is consistently called and found in the approved function list. /note=Starterator: The auto-annotated start site and the most annotated start site are both site 12. This is the most conserved start site. 30 out of 33 non-draft genes called this start site. This site is 5333 in Snek, and is consistent with Glimmer and Genemark. 
/note=Location call: This is a real gene conserved in Phamerator and has good coding potential. Based on the evidence, site 12 @5333 is the start site for this real gene as supported by GeneMark, Glimmer, and Starterator and covers all coding potential. /note=Function call: Scaffolding protein. All BLAST results from PhagesDB suggested that this is a scaffolding protein, and so did NCBI BLAST. The top 2 PhagesDB BLAST results have good e-values (below e-59) and good identities (above 63%). The top 2 NCBI BLAST results have good e-values (below e-60), high query coverage (above 98%), high identity (above 64%), and both call for a scaffolding protein. The top hit for HHpred also supported it as a scaffolding protein with good query coverage (59%), average identity (12%), and good e-value (e-04). PhagesDB function frequency also unanimously agreed on scaffolding protein with up to 31% frequency in AZ clusters. /note=Transmembrane domains: This is not a transmembrane protein. This is a soluble protein and has no transmembrane domain as suggested by SOSUI. TOPCONS and TMHMM found no TMDs. This is reasonable, since phage scaffolding proteins assist procapsid assembly and are not expected to be anchored in a membrane. /note=Secondary Annotator Name: Arshad, Iqra /note=Secondary Annotator QC: Needs attention: please check the evidence boxes. 
CDS 5954 - 6895 /gene="7" /product="gp7" /function="major capsid protein" /locus tag="Snek_7" /note=Original Glimmer call @bp 5954 has strength 17.95; Genemark calls start at 6011 /note=SSC: 5954-6895 CP: yes SCS: both-gl ST: SS BLAST-Start: [major capsid protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 0.0 GAP: 78 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.759, -5.160958035523474, no F: major capsid protein SIF-BLAST: ,,[major capsid protein [Arthrobacter phage Tweety19]],,QNO12668,100.0,0.0 SIF-HHPRED: P22_CoatProtein ; P22 coat protein - gene protein 5,,,PF11651.11,92.3323,100.0 SIF-Syn: Major capsid protein, upstream gene is scaffolding protein, downstream is head-to-tail adaptor, just like in phage Tweety19. /note=Primary Annotator Name: Grunden, Kyla /note=Auto-annotation: Glimmer and GeneMark, however they do not call the same start site. Glimmer: 5954 GeneMark: 6011 /note=Coding Potential: Gene shows coding potential in both self-checked and host-checked GeneMark /note=SD (Final) Score: -5.161, not the best final score in PECAAN but start site is highly conserved /note=Gap/overlap: 78 bp, which is greater than the expected 50 bp, however other related phages (like Tweety19) have similar gaps and there is no coding potential in the gap /note=Phamerator: pham 52753 on 4/25/22. Pham is called in many other phages in cluster AZ (Adolin, Adumb2043, Berry, Cassia…). Commonly called function: major capsid protein. /note=Starterator: Start site 8 is 5954 in Snek. It is the most conserved, manually called in 159/211 non-draft phages. This evidence agrees with the start site called by Glimmer. /note=Location call: 5954 /note=Function call: Major Capsid Protein. The top three phagesdb BLAST hits have the function of major capsid protein (E-value <10^-143),And 4 out of 5 top NCBI BLAST hits also have the function of major capsid protein. (89% coverage, 70%+ identity, and E-value <10^-133). 
The top 5 HHpred hits were for major capsid proteins (99.9%+ probability, 89.8%+ coverage, and E-value <2.9e-20). CDD had a hit for P22_CoatProtein, which is involved in the formation of pro-capsid shells E value 2.25e-08. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Bagdatli, Dila /note=Secondary Annotator QC: Given that the chosen start site covers all of the coding potential as opposed to the other suggested start sites, and it has been conserved in the pham, along with the fact that it has the smallest gap, this is most likely the correct start site of the gene. The location and functional call both look very good. I agree with this annotation. Synteny is also supportive of the calls. CDS 6968 - 7366 /gene="8" /product="gp8" /function="head-to-tail adaptor" /locus tag="Snek_8" /note=Original Glimmer call @bp 6968 has strength 15.22; Genemark calls start at 6968 /note=SSC: 6968-7366 CP: no SCS: both ST: SS BLAST-Start: [head-to-tail adaptor [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 3.52731E-87 GAP: 72 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.263, -2.033982896655645, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail adaptor [Arthrobacter phage Tweety19]],,QNO12669,100.0,3.52731E-87 SIF-HHPRED: 15 PROTEIN; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_C,82.5758,99.1 SIF-Syn: This gene is a head-to-tail adaptor (Pham 76440), upstream gene is major capsid protein (Pham 57253 ), downstream gene has NKF (Pham 105363), just like in Tweety19. /note=QC: Juliet Stephenson unchecked HHPRED evidence bc of poor e-value. Checked Bacillus protein yqbG in HHPRED to align with SEA-PHAGES requirements. /note=Primary Annotator Name: Cho, Clara /note=Auto-annotation: Both Glimmer and GeneMark call the start at 6968. 
/note=Coding Potential: Coding potential in this ORF is on the forward strand, indicating that it is a forward gene. Coding potential is found on both GeneMark Self and Host. The start and stop sites on GeneMark do not cover all of the coding potential. /note=SD (Final) Score: -2.034. Best final score on PECAAN. /note=Gap/overlap: 72 bp gap with the upstream gene. There is a 1 bp overlap with the downstream gene, which suggests this gene could be part of an operon. /note=Phamerator: 76440. Date 4/22/22. It is conserved; found in Tweety19 (AZ) and DrSierra (AZ). /note=Starterator: Start site 4 in Starterator was manually annotated in 30 non-draft genes in this pham. Start 4 is @6968 bp in Snek. This evidence agrees with the site predicted by Glimmer and GeneMark. It is also the start site with the longest open reading frame. /note=Location call: The evidence suggests that this is a real gene and the most likely start site is 6968 bp. /note=Function call: Head-to-tail adaptor. Both PhagesDB BLAST and NCBI BLAST called the gene’s function as a head-to-tail adaptor with e-values less than e-56 and percent coverage greater than 94%. There were no hits from CDD, and HHpred had hits for this gene. The head-to-tail adaptor function was called in three final phages: Tweety19, DrSierra, and Reedo. /note=Transmembrane domains: 0. Neither TMHMM nor TOPCONS predicted transmembrane domains for this gene. /note=Secondary Annotator Name: De Jesus, Jorja /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
CDS 7366 - 7482 /gene="9" /product="gp9" /function="hypothetical protein" /locus tag="Snek_9" /note=Original Glimmer call @bp 7366 has strength 17.77; Genemark calls start at 7366 /note=SSC: 7366-7482 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_9 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 3.35314E-17 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.952, -4.759312672633257, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_9 [Arthrobacter phage Tweety19]],,QNO12670,100.0,3.35314E-17 SIF-HHPRED: SIF-Syn: This gene has NKF (Pham 105363), the upstream gene is head-to-tail adapter, and the downstream gene is head-to-tail stopper, just like in phage Tweety19. /note=QC: Juliet Stephenson changed Starterator menu to SS to reflect the data in Starterator. /note=Primary Annotator Name: Chen, Daniel /note=Auto-annotation: Glimmer and Genemark. Both call the start at 7366. Start codon is ATG. /note=Coding Potential: Coding potential is found both in GeneMark Self and Host. There is reasonable coding potential predicted on the gene`s ORF, though the coding potential starts a little bit before the start site. Gene is found on the forward strand only. /note=SD (Final) Score: SD final score is -4.759. Z-score at 1.952. Only option on PECAAN. /note=Gap/overlap: The overlap is 1bp, meaning there is slight overlap that implies gene might be part of an operon. /note=Phamerator: Pham: 104254. Date 4/25/22. It is conserved; found in Tweety19 (AZ), Reedo (AZ), KeAlii (AZ). There was no function called for the gene. /note=Starterator: Start site 1 was manually annotated in 3/24 non-draft genes in this Pham. Start site 1 is 7366, which agrees with the start site predicted by Glimmer and GeneMark. Start site 2 was the most annotated for this pham, but it does not exist in Snek. All genes that did not have start site 2 had start site 1. 
/note=Location call: Based on above evidence, the gene is real and the start site is most likely at 7366. /note=Function call: Function unknown. No programs returned any informative results. /note=Transmembrane domains: No transmembrane domains predicted. /note=Secondary Annotator Name: Chang, Stacy /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 7472 - 7813 /gene="10" /product="gp10" /function="head-to-tail stopper" /locus tag="Snek_10" /note=Original Glimmer call @bp 7472 has strength 15.05; Genemark calls start at 7523 /note=SSC: 7472-7813 CP: yes SCS: both-gl ST: SS BLAST-Start: [head-to-tail stopper [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 9.58641E-75 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.119, -2.3158002311349737, yes F: head-to-tail stopper SIF-BLAST: ,,[head-to-tail stopper [Arthrobacter phage Tweety19]],,QNO12671,100.0,9.58641E-75 SIF-HHPRED: HEAD COMPLETION PROTEIN GP16; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_E,95.5752,99.7 SIF-Syn: Head-to-tail stopper, upstream is NKF (pham is 105363), downstream is NKF (pham is 10439), just like in phage Tweety19 /note=QC: Juliet Stephenson changed Starterator menu to SS to reflect the data in Starterator. /note=Primary Annotator Name: Han, Maggie /note=Auto-annotation: Glimmer: 7472, GeneMark: 7523, Glimmer was autoannotated, start codon: TTG /note=Coding Potential: Reasonable coding potential in putative ORF, chosen start site covers all of coding potential /note=SD (Final) Score: SD score is the best. /note=Gap/overlap: There is an overlap with the previous gene that is reasonable. Overlap is 11 base pairs. The length of the gene is acceptable. /note=Phamerator: Gene was in pham 104222 as of 4/22/2022. The pham is conserved among cluster AZ. Phages used for comparison were Tweety19, DrSierra, and Adolin. 
Phamerator suggested that this gene was a head-to-tail stopper. /note=Starterator: Start site 3 at 7472 is a reasonable, relatively conserved start site in the pham. There are 45 other members in this pham. Most conserved site is start 5 at 8384. 38/46 members of the pham call the most conserved site. Snek does not contain the most conserved site. /note=Location call: Gene is a real gene. 7472 seems to be the most likely start site. /note=Function call: The top two PhagesDB BLAST hits suggest that the gene is a head-to-tail stopper, with phages Tweety19 and DrManhattan having low e-values of 6*10^-60 and 1*10^-36 and high % identity (>60%). The top two NCBI BLAST hits also suggest the gene is a head-to-tail stopper with low e-values of less than 10^-30, high identity (>50%), and high query coverage (>89%). HHpred gave results that had high probability (>97), high coverage (>95%) as well as low e-values (<10^-15). The functions all were indicative of a head-to-tail stopper. /note=Transmembrane domains: No transmembrane domains /note=Secondary Annotator Name: Doan, Pearl /note=Secondary Annotator QC: I agree with this call. Glimmer and Starterator are consistent with the start site at 7472 (GeneMark calls 7523), and all other categories of evidence have been considered. PhagesDB BLAST and NCBI BLAST also show that this is a head-to-tail stopper. 
CDS 7822 - 8127 /gene="11" /product="gp11" /function="hypothetical protein" /locus tag="Snek_11" /note=Original Glimmer call @bp 7822 has strength 12.96; Genemark calls start at 7822 /note=SSC: 7822-8127 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU70_gp10 [Arthrobacter phage Liebe] ],,NCBI, q2:s3 95.0495% 3.37364E-37 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.122, -4.4057176022952405, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU70_gp10 [Arthrobacter phage Liebe] ],,YP_009817042,76.699,3.37364E-37 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chew, Brandon /note=Auto-annotation: Both Glimmer and GeneMark were used to autoannotate the gene and agreed on an ATG start site at 2309. /note=Coding Potential: Very strong coding potential from ~2320 to ~3640 with few dips in between. /note=SD (Final) Score: -2.481 /note=Gap/overlap: 18 bp gap (without coding potential), a reasonable gap and ORF length /note=Phamerator: Pham 48570 (accessed 9/27/22); highly conserved across several clusters, prominently clusters A and AZ. 1544 hits in total, the large majority of which call portal protein function, indicating strong evidence this is a real gene. /note=Starterator: Not informative, since most annotated start site is not present and most likely start site is unique amongst the entire pham. Evidence still agrees with the site auto-annotated by Glimmer and GeneMark. /note=Location call: Coding potential, synteny, overlap and spacing, and gene length indicate this is a real gene. Matching GeneMark and Glimmer start site auto-annotation, along with start codon identity, lack of large gaps before start site, coverage of coding potential and favorable RBS and Z-score indicate 2309 is the correct start site. /note=Transmembrane domains: TMHMM did not predict any TMDs, therefore it is highly unlikely to be a membrane protein. 
/note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 8127 - 8540 /gene="12" /product="gp12" /function="tail terminator" /locus tag="Snek_12" /note=Original Glimmer call @bp 8127 has strength 9.93; Genemark calls start at 8127 /note=SSC: 8127-8540 CP: yes SCS: both ST: SS BLAST-Start: [tail terminator [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 2.22406E-89 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.546, -3.525474288708562, yes F: tail terminator SIF-BLAST: ,,[tail terminator [Arthrobacter phage Tweety19]],,QNO12673,100.0,2.22406E-89 SIF-HHPRED: TAIL-TO-HEAD JOINING PROTEIN GP17; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_G,97.0803,99.4 SIF-Syn: Tail terminator, upstream gene is a member of pham 104222, downstream gene is major tail protein, just like phages DrManhattan and Crewmate as of 5/3/22 /note=Primary Annotator Name: Vu, Trinity /note=Auto-annotation: Both Glimmer and Genemark were used and call the start site at 8127. The start codon is GTG. /note=Coding Potential: It has reasonable coding potential within the putative ORF and the chosen start site covers the potential. Coding potential is found in both GeneMark Self and Host in the forward direction. /note=SD (Final) Score: -3.525 (best on PECAAN) /note=Gap/overlap: There is a 1 base pair overlap which is reasonable. The other alternatives were not chosen as they result in a gap of greater than 150 basepairs. /note=Phamerator: This gene is found in pham 2023 as of 4/25/2022. The pham of this gene is conserved as other AZ members have it such as Maureen, Niobe, Phives, Powerpuff, Reedo, Tbone, Crewmate, and DrManhattan. All phages with this gene call the tail terminator function meaning the function is consistent and it is found in the approved function list. 
/note=Starterator: Start site #2 is the most conserved in the pham but is not present in Snek. There are 46 members in this pham, and 37 of them (26 of which are non-draft genomes) call conserved site #2. /note=Location call: This is a real gene whose most likely start site is 8127, agreeing with the auto-annotated call, as it has good coding potential and is conserved in its pham. The most likely start site is site #3, the auto-annotated site, as this site is called in 5 other phages in the pham (Adeline, DrManhattan, DrSierra, Tweety19, and VResidence) and Snek doesn’t have the most conserved site #2. /note=Function call: The top 3 PhagesDB BLAST hits have tail terminator function (2e-70, 8e-60, 8e-60) and the top 3 NCBI BLAST hits also have this function (2.2e-89, 1.3e-66, 1.3e-66) with 100%, 84.8%, and 84.8% sequence identity respectively (all 100% query coverage). The absence of TMDs shown by TMHMM and TOPCONS further supports the tail terminator function. /note=Transmembrane domains: It is not a membrane protein because neither TMHMM nor TOPCONS predicts any TMDs. 
/note=Secondary Annotator Name: Paul Garin /note=Secondary Annotator QC: Very thorough notes, I agree with the location call CDS 8559 - 9110 /gene="13" /product="gp13" /function="major tail protein" /locus tag="Snek_13" /note=Original Glimmer call @bp 8559 has strength 13.65; Genemark calls start at 8559 /note=SSC: 8559-9110 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 4.53067E-127 GAP: 18 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.165, -2.5074698667202133, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Arthrobacter phage Tweety19]],,QNO12674,99.4536,4.53067E-127 SIF-HHPRED: YSD1_22 major tail protein; Bacteriophage tail, helical assembly, VIRAL PROTEIN; 3.5A {Bacteriophage sp.},,,6XGR_M,92.3497,98.4 SIF-Syn: Major tail protein, upstream gene is tail terminator, downstream is tail assembly chaperone, just like in Tweety19 /note=Primary Annotator Name: Hsu, Norman /note=Auto-annotation: Glimmer and GeneMark, start site @8559, ATG /note=Coding Potential: Yes, the chosen start site covers all coding potential /note=SD (Final) Score: -2.507, it has the best SD score among all entries /note=Gap/overlap: 18bp, the gap is reasonably small as it cannot fit another gene and the start remains close to the upstream gene /note=Phamerator: Pham #: 102101; Date: 4/25/2022; It is present in Tweety19, which is within the same cluster; it functions as a major tail protein /note=Starterator: Yes, start site #18 is the most conserved start site among the members in the pham. Start 18 is @8559 in Snek. 78 out of 117 non-draft genes call this start site, and this agrees with the site predicted by Glimmer and GeneMark.
/note=Location call: Based on all evidence, the most likely start site is @8559 /note=Functional call: The top 3 NCBI Blast results with the smallest evalue shows strong evidence that our gene codes for major tail protein, with high query coverage of 100%, and % identity above 80%, and low e values close to 0. /note=The top result of HHpred, 6XGR, suggests that this gene calls for the function of a major tail protein, with a probability of 98.4, a coverage of 92.3497, and an E-value close to 0. /note=Based on all evidence, this gene calls for major tail protein. /note=Transmembrane domains: both TmHmm and Topcons display no TMDs, and it makes sense given the type of protein /note=Secondary Annotator Name: Arshad, Iqra /note=Secondary Annotator QC: yes, I agree with all the evidence provided. CDS 9207 - 9476 /gene="14" /product="gp14" /function="tail assembly chaperone" /locus tag="Snek_14" /note=Genemark calls start at 9207 /note=SSC: 9207-9476 CP: yes SCS: genemark ST: SS BLAST-Start: [tail assembly chaperone [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 9.94615E-56 GAP: 96 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.122, -4.97781437024576, no F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Tweety19]],,QNO12675,100.0,9.94615E-56 SIF-HHPRED: SIF-Syn: tail assembly chaperone, upstream gene is major tail protein, downstream is also tail assembly chaperone, just like in phage Tweety19 and DrSierra /note=QC: Juliet Stephenson unchecked evidence in HHPRED for poor e-values. /note=Primary Annotator Name: Chang, Stacy /note=Auto-annotation: Not called by Glimmer. GeneMark calls the start at 9207 (ATG). This site contains all of the coding potential of the gene. /note=Coding Potential: Strong coding potential in forward strand in both host and self-trained GeneMark. /note=SD (Final) Score: -4.978. It is not the best final score in PECAAN but is not the worst. 
Z-score is 2.122, which is not the best but is above the threshold of 2 for a good score. /note=Gap/overlap: There is a somewhat large gap before the start of the gene (96bp). However, this gene is conserved in several other phages and the gap was seen in the other phages as well, such as phage Tweety19 and Adolin. In addition, there is no coding potential for the gap. /note=Phamerator: Pham 104444. Date 4/25/2022. It is conserved; found in 65 other AZ phages including Tweety19 and Crewmate. /note=Starterator: Start site 11 (9207bp) is called in 32 of the 49 non-draft genes in the pham. Of the genes that contain this start site, all but one (Percival) call it. /note=Location call: This is a real gene and the start site is most likely 9207. This is supported by GeneMark and Starterator (Glimmer did not call a start site). /note=Function call: tail assembly protein. Function called on all hits with known function on BLAST for phagesdb (top three: e <= 5 * 10^-29) and NCBI database (top three: e <= 1 * 10^-34 with 100% coverage and >65% identity). No good evidence to support function from CDD. HHpred calls some viral hits with the tail assembly protein function, but e-value is >3. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any transmembrane domains, therefore this gene does not encode a membrane protein. /note=Secondary Annotator Name: BarcikWeissman, Sara /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered.
CDS join(9207..9470,9470..9826) /gene="15" /product="gp15" /function="tail assembly chaperone" /locus tag="Snek_15" /note= /note=SSC: 9207-9826 CP: yes SCS: neither ST: NI BLAST-Start: [tail assembly chaperone [Arthrobacter phage Tweety19] ],,NCBI, q1:s1 100.0% 1.63254E-146 GAP: -270 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.122, -4.97781437024576, no F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Tweety19] ],,YP_010678405,100.0,1.63254E-146 SIF-HHPRED: SIF-Syn: /note=added by AF, according to Tweety19 frameshift annotation CDS 9838 - 12117 /gene="16" /product="gp16" /function="tape measure protein" /locus tag="Snek_16" /note=Original Glimmer call @bp 9838 has strength 12.05; Genemark calls start at 9838 /note=SSC: 9838-12117 CP: yes SCS: both ST: SS BLAST-Start: [tape measure protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 0.0 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.274, -1.993391246735709, yes F: tape measure protein SIF-BLAST: ,,[tape measure protein [Arthrobacter phage Tweety19]],,QNO12677,99.7365,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,83.3992,99.7 SIF-Syn: Tape Measure Protein, upstream gene is tail assembly chaperone, downstream is minor tail protein, just like in phage Tweety19. /note=Gene (stop@12117F) /note=Primary Annotator Name: Farazi, Sepp /note=Auto-annotation: Glimmer and GeneMark. Both call the gene and agree on 9838. Start codon GTG /note=Coding Potential: Coding Potential is strong throughout the entire ORF. Found in the forward direction on both GeneMark Self and Host. The start site suggested by glimmer and genemark covers all the coding potential. /note=SD (Final) Score: -1.993. This is the best final score on PECAAN. Best RBS score as well, gene is not located in an operon. /note=Gap/overlap: Gap: 11bp. 
Small gap size, conserved gap size found in non-draft phages Tweety19 and Liebe, no coding potential found in gap. Acceptable length of 2280bp. /note=Phamerator: Pham: 103070. Date 4/25/22. The gene is conserved in other phages such as Tweety19, Adolin, and Crewmate. All these phages are found in the same cluster (AZ); the gene’s function call is tape measure protein, which is approved on the SEA-PHAGES list. /note=Starterator: Start site 5 was manually annotated in 26 of 102 non-draft phages in this pham. Start 5 is 9838 in Snek. The start site was called in 26 of 26 (AZ) non-draft phages. This is in agreement with Glimmer and GeneMark’s call. /note=Location call: This is a real gene with a most evident start site of 9838. /note=Function call: Tape measure protein. The top three phagesdb BLAST hits call the function tape measure protein, all with an E value of 0.0. Two NCBI BLAST hits also have the same function (85%+ coverage, 50%+ identity, E value 0.0). HHpred had a hit for a baseplate (99% probability, 83% coverage, and an e value of 9e-10). CDD had a hit for tape measure protein with an e value of 0.000006. /note=Transmembrane domains: TMHMM predicts four TMDs and SOSUI predicts seven TMDs. Based on this evidence, the protein contains TMDs and is a “membrane protein”, but more specifically it is a tape measure protein. /note=Secondary Annotator Name: Max Orr /note=Secondary Annotator QC: Thorough and well done but needs to update the proper location call with start site, rather than pham. I have performed my second QC for this gene and all the evidence aligns properly, and PECAAN is filled out properly.
CDS 12117 - 12995 /gene="17" /product="gp17" /function="minor tail protein" /locus tag="Snek_17" /note=Original Glimmer call @bp 12117 has strength 9.85; Genemark calls start at 12117 /note=SSC: 12117-12995 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.951, -3.2535385006449706, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Tweety19]],,QNO12678,100.0,0.0 SIF-HHPRED: ORF46; Distal tail protein, Receptor-binding protein, Phage baseplate, host adsorption apparatus, genome injection device, VIRAL PROTEIN; 3.8A {Lactococcus phage TP901-1},,,4V96_AX,98.9726,100.0 SIF-Syn: Comparison with the phage Tweety19 shows synteny as the upstream gene is tape measure protein in both and it is followed by consecutive minor tail proteins downstream. /note=QC: Juliet Stephenson changed Starterator menu to SS in order to match Starterator info. /note=Gene (stop@12995 F) /note=Primary Annotator Name: Bagdatli, Dila Zeynep /note=Auto-annotation: GeneMark, Glimmer both with start site number 1 at 12117. The start codon is ATG. /note=Coding Potential: There is high coding potential throughout the gene and all of the potential is covered by the chosen start site. /note=SD (Final) Score: -3.254 is the best RBS score that is reasonable. Also, the gene has a 1 bp overlap, meaning that there is a high likelihood that it is part of an operon in which case the RBS is not very relevant. It is still the least negative score and thus the best one compared to all other start sites. /note=Gap/overlap: -1 bp is suggestive of the gene being part of an operon. The 1 bp overlap was the smallest gap, had the longest ORF and the best RBS. It also covered all of the coding potential. /note=Phamerator: The gene belongs to the pham 103836 (4/22/2022). The auto-annotated start site is not the most conserved start site, and is not found in this gene. 
However, the auto-annotated start site is the most called start site in all of the genes in the same pham that belong to the same cluster as Snek, cluster AZ, which supports the fact that this is the correct start site for the pham genes in the cluster AZ. The auto-annotated start site was kept. PhagesDB called the function as minor tail protein for all of the final genes in the pham. /note=Starterator: The auto-annotated start site is conserved among all the members of the pham that are in the same cluster as Snek. The conserved start site is 2 which is not present in this Snek gene. The start site number for this gene is 1 which is conserved among cluster AZ phage genes. 36/41 genes call this conserved start site and 10/41 call the start site 1 which is the most appropriate for this gene. All 10 that call start site 1 are members of the cluster AZ in which Snek is found. Starterator is, therefore, non-informative as the most conserved start site is not found in 10/41 of the members. /note=Location call: This is a real gene with the start site 1 at 12117 supported by starterator cluster data, coding potential, and pham maps. /note=Function call: The top 3 NCBI BLAST hits, sorted by e-values show that the function is minor tail protein (98%+ coverage, a high identity (>82%) and an e-value 99.7468%), high % identity (99.7%, 82.8%, 79.2%, and 74.3%), and low E-values (0). HHpred found several significantly (e < 10^-12) similar tail protein genes in known prophages and bacteria, further strengthening evidence of this protein being a tail protein. /note=Transmembrane domains: TOPCONS and TMHMM do not predict any TMDs. It is unlikely that this protein has TMDs. 
/note=Secondary Annotator Name: Chettiyar, Rajshree /note=Secondary Annotator QC: I have QC'ed this location call and agree with the primary annotator CDS 14174 - 15217 /gene="19" /product="gp19" /function="minor tail protein" /locus tag="Snek_19" /note=Original Glimmer call @bp 14174 has strength 11.81; Genemark calls start at 14174 /note=SSC: 14174-15217 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 0.0 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.515, -3.649482460397077, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Tweety19]],,QNO12680,100.0,0.0 SIF-HHPRED: Endo-N-acetylneuraminidase; Chaperone, Glycosidase, Hydrolase; HET: TAM, PEG; 2.6A {Enterobacteria phage K1F},,,3GW6_B,71.4697,98.0 SIF-Syn: Minor tail protein, minor tail protein upstream and downstream, just like in phage Tweety19 /note=Primary Annotator Name: Bradley, Allyssa /note=Auto-annotation: Glimmer & GeneMark agree; start@14174F; ATG /note=Coding Potential: Best coding potential overall in this ORF, but poor at the beginning of the gene (first 10%). Chosen start site does cover all of coding potential. /note=SD (Final) Score: -3.649. It's pretty mid-range relative to others, but less significant because it's part of an operon. /note=Gap/overlap: 0. This is a reasonable gap. This is the longest ORF, so all good. The gene is over 140 bp, which is an acceptable length. /note=Phamerator: Pham 99997, as of 4/25/2022. Great match for other AZ phages Tweety19, Reedo, KeAlii, Crewmate. I didn't see a function call on Phamerator. /note=Starterator: Start 37 is common in 12.4% of genes in pham. Basepair 14174. 23/186 call site #37. /note=Location call: Considering similarity with other phages' genes, I believe that this gene is a real gene.
/note= The start site seems to be Start 37 at location 14174, which is conserved in Starterator, covers all coding potential, and is used by other phages on Phamerator. /note=Function call: Minor tail protein. This Snek gene shows the most synteny with other phages’ manually-annotated minor tail protein genes. The top 8 Phagesdb BLAST hits call minor tail protein, and the top hit from Tweety19 yielded an e-value of 0. The top 3 NCBI BLAST hits, which found minor tail protein, found query coverage >97.9%, identity >81%, and e-values of 0. While CDD and HHpred interestingly provided some similar proteins related to tail function, since they had no significant e-values, I don’t think the functions found in those called proteins are relevant to Snek. /note=Transmembrane domains: None called in TMHMM or TOPCONS. /note=Secondary Annotator Name: Hernandez, Betania /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. **Note: just fill in starterator and coding potential drop down menu, add a bit more detail on PECAAN notes to make easier to understand such as in location call section (using annotation manual), might want to include orientation of strand in coding potential section** CDS 15214 - 16278 /gene="20" /product="gp20" /function="minor tail protein" /locus tag="Snek_20" /note=Original Glimmer call @bp 15214 has strength 13.17; Genemark calls start at 15214 /note=SSC: 15214-16278 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.028, -2.5052746077145835, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Tweety19]],,QNO12681,99.7175,0.0 SIF-HHPRED: Receptor Binding Protein; beta sandwich domain, phage receptor binding protein, Lactococcus lactis pellicle cell wall polyphosphosaccharide, VIRAL PROTEIN; 1.75A {Lactococcus phage 1358},,,4L9B_A,49.435,99.3 SIF-Syn: minor tail 
protein, upstream gene is a minor tail protein, downstream gene is in pham 104034 (as of 5/6/2022), which is also found in Tweety19. /note=Gene (stop@16278 F) /note=Primary Annotator Name: Ramesh, Naren /note=Auto-annotation: Both Glimmer and Genemark agree on the same start site, 15214. The start codon is ATG. /note=Coding Potential: The gene has a reasonably high coding potential predicted within the putative ORF. The chosen start site covers all this coding potential. /note=SD (Final) Score: -2.505. It is the best final score on PECAAN. /note=Gap/overlap: -4bp. Smallest gap and it is the best gap of all start candidates. It can indicate an operon. /note=Phamerator: 104116. Date 04/25/2022. It is conserved; found in other phages of the AZ cluster. /note=Starterator: Start site 6 in Starterator was manually annotated in 7/7 non-draft genes in this pham. Start 6 is 15214 in Snek. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 15214. /note=Function call: Multiple phagesDB BLAST has hits with the suggested function minor tail protein with small e values of 0 to 1e-156 and NCBI BLAST hits with e-values of 0. HHPRED results are not consistent with this conclusion but the results from HHPRED are not nearly as significant (1.7e-10 for the best candidate). No CDD results. /note=Transmembrane domains: TMHMM predicts zero TMD. TOPCONS also predicts zero TMD. This makes sense given the function call of a minor tail protein. /note=Secondary Annotator Name: Mehra, Muskaan /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
** maybe mention possibility of an operon** CDS 16347 - 16814 /gene="21" /product="gp21" /function="membrane protein" /locus tag="Snek_21" /note=Original Glimmer call @bp 16347 has strength 11.05; Genemark calls start at 16347 /note=SSC: 16347-16814 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.91738E-102 GAP: 68 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.932, -5.822493004643739, no F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Tweety19]],,QNO12682,100.0,1.91738E-102 SIF-HHPRED: SIF-Syn: Gene 21 has synteny with Tweety 19. the upstream genes are minor tail proteins and downstream genes are tail needle proteins. there is also synteny with Reedo though the synteny breaks a few genes downstream. /note=QC: Juliet Stephenson checked more evidence for membrane protein in NCBI BLAST. /note=Primary Annotator Name: Rajshree Chettiyar /note=Auto-annotation: both, start site called @16347 and it is a ATG start codon /note=Coding Potential: very good coding potential, chosen start site covers all coding potential /note=SD (Final) Score: Throughout this module there were 2 potential other start sites other than the predicted start site@16347 (ATG). This start site has good coding potential though has a 68bp gap with the upstream gene. One potential other start site was at 16329 but it is a TTG codon with the start site inside the coding potential. Its Z-score was also 0.248. Another potential start site was at 16407 with a z-score of 2.058 but a gap of 128bp and once again the start site would be placed after the coding potential. Thus the predicted start site is the best @16347 /note=Gap/overlap: 68 base pair gap with upstream gene. possible upstream start site @16329 - 50 bp gap instead but final score of -8.877 and a z score of 0.248... also a TTG start codon. gene with 16347 start site is acceptable length (467bp) /note=Phamerator: as of 04/25/22 gene is found in pham 104034. 
Of the 36 members in the pham, 19 of them (not including Snek) are members of cluster AZ. Phage Tweety19 also has a gene in this pham. /note=Starterator: Start: 10 @16347 has 11 MAs. 11/24 non-draft genes have this start site and this start site is called 94.7% of the time when it is present in the genome /note=Location call: start site at 16347 based on coding potential, z-score, final score, and Starterator /note=Function call: This protein’s function is unknown because CDD and HHpred gave no good matches but PhagesDB BLAST gave very good matches with other AZ phages (Tweety19 and Reedo), so the gene would be a hypothetical protein. However, there are 4 transmembrane domains shown by TMHMM and SOSUI, so this protein is called a membrane protein /note=Transmembrane domains: 4 transmembrane domains /note=Secondary Annotator Name: Liu, Jinge /note=Secondary Annotator QC: I agree with the primary annotator. Starting site 16347 has the best RBS score, it is the most annotated start site in the pham when the site is present (>90%) as well as the most annotated start site. Note that the final score and the Z-score are not the best. Note that Phamerator did not call for a function. Comment that it is a real gene in the location call section. Transfer the information of the programs and evidence used to determine the transmembrane domains from the location call section to the transmembrane domains section.
CDS 16814 - 17230 /gene="22" /product="gp22" /function="membrane protein" /locus tag="Snek_22" /note=Original Glimmer call @bp 16814 has strength 14.69; Genemark calls start at 16814 /note=SSC: 16814-17230 CP: yes SCS: both ST: SS BLAST-Start: [tail needle protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.46406E-94 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.208, -4.148052427098672, yes F: membrane protein SIF-BLAST: ,,[tail needle protein [Arthrobacter phage Tweety19]],,QNO12683,100.0,1.46406E-94 SIF-HHPRED: SIF-Syn: Through Pham maps, it is seen that the gene is conserved and has synteny with tweety 19 .The upstream and downstream portions near 53 are conserved and have synteny. Both genes that are directly upstream and downstream seem to be NKF, but the pham numbers are the same. Directly upstream the pham number is 104034, and directly downstream the pham number is 10993. (These pham numbers are as of 5/6/2022) /note=AF: switched to membrane protein. One found in DeepTMHMM. (HHpred alignments do not fully support tail needle) /note=Primary Annotator Name: Pena, Melina /note=Auto-annotation: 16814, Glimmer and Genemark both call the gene at this site. The start codon is ATG. /note=Coding Potential: Coding potential is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The gene encapsulates the entire coding potential. /note=SD (Final) Score: -4.148. This is the best final (least negative) score on PECAAN. /note=Gap/overlap:-1. This overlap is a very reasonable with the upstream gene. The length of my gene is acceptable as it is 416bp long. There seems to be no other best start site. /note=Phamerator: The pham number is 21631. Date 04/22/2022. The pham that my gene is in is conserved; it is found in other AZ phages, namely in Tweety19 and Crewmate. In addition, PECAAN seems to indicate through PhagesDB that this gene codes for a tail needle protein. 
/note=Starterator: There are only 13 non-draft members of this pham. 8/13 non-draft members call start site 6. However, this start site is not in Snek. An alternative would be start site 8, which corresponds to a start at 16814 bp and matches the information from Tweety19. Therefore, Starterator is uninformative. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 16814 bp. The gene is conserved in Phamerator and has good coding potential. Starterator agrees with Glimmer and GeneMark on 16814 bp, as it is conserved in Starterator and covers all coding potential. /note=Function call: The top NCBI BLASTp hit, sorted by E-value, suggests that the function is a tail needle protein with high query coverage (100%), high % identity (100%), and a low E-value of 4e-74. CDD also seems to propose that this is a tail needle protein, with an e-value of 4.39e-5. /note=Transmembrane domains: Based on TMHMM, there seems to be evidence that there is one TMD. This was found by the one hit given by both TMHMM and TOPCONS. This would make sense with the function of the gene as it is a tail needle protein and would need to be able to cross into the membrane of a bacterium. /note=Secondary Annotator Name: BarcikWeissman, Sara /note=Secondary Annotator QC: I agree with this annotation.
All of the evidence categories have been considered CDS 17230 - 17565 /gene="23" /product="gp23" /function="membrane protein" /locus tag="Snek_23" /note=Original Glimmer call @bp 17230 has strength 16.3; Genemark calls start at 17230 /note=SSC: 17230-17565 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 4.09577E-62 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.778, -2.9625409806968013, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Tweety19]],,QNO12684,98.1982,4.09577E-62 SIF-HHPRED: SIF-Syn: Gene is a membrane protein, upstream gene is pham 21631, downstream gene is endolysin. This is exactly the same as Tweety19. /note=Primary Annotator Name: Doan, Pearl /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 17230. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.963. It is the best final score on PECAAN. /note=Gap/overlap: Upstream overlap: 1 base pair. This overlap is fine and suggests that this gene is a viable gene. Downstream gap: 30 base pairs.This gap is medium-sized and cannot fit other genes, and there is no coding potential in the gap that might be a new gene. /note=Phamerator: Pham: 10933. Date 4/22/2022. It is conserved; it is found in Adolin (AZ) and Berrie (AZ). There is an unknown gene with unknown function. /note=Starterator: Start site 7 in Starterator was manually annotated in 6/37 non-draft genes in this pham. Start 7 is 17230 in Powerpuff. This is not the most conserved start site, which was start site 6. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 17,230. /note=Functional call: Membrane protein. 
The top three phagesdb BLAST hits have conflicting information (endolysin, holin, putative structural protein). NCBI BLAST hits also have the function of unknown function. (51.6% coverage, 30%+ identity, and E-value <10^-6). HHpred had no relevant hits. CDD had no relevant hits. However, TMHMM shows that there are two transmembrane domains in this protein, signifying that this protein is a membrane protein. /note=Transmembrane domains: TMHMM predict 2 TMDs and Sosui predicts three TMDs, therefore there is high evidence this is a membrane protein. /note=Secondary Annotator Name: Mak, Amanda /note=Secondary Annotator QC: I agree with this call. Glimmer, GeneMark, and Starterator are all consistent with the start site at 17230 and all other categories of evidence have been considered. TMHMM data is consistent with function being determined as membrane protein. CDS 17562 - 19064 /gene="24" /product="gp24" /function="endolysin" /locus tag="Snek_24" /note=Original Glimmer call @bp 17595 has strength 12.41; Genemark calls start at 17562 /note=SSC: 17562-19064 CP: yes SCS: both-gm ST: NA BLAST-Start: [endolysin [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.322, -6.898293885147568, no F: endolysin SIF-BLAST: ,,[endolysin [Arthrobacter phage Tweety19]],,QNO12685,100.0,0.0 SIF-HHPRED: Bifunctional autolysin; peptidoglycan, autolysin, amidase, N-acetylmuramoyl-L-alanine amidase, HYDROLASE; HET: IMD, PEG; 1.124A {Staphylococcus aureus subsp. aureus},,,4KNK_A,30.0,99.3 SIF-Syn: [Endolysin, upstream gene is in pham 10993 (NKF) as of 05/09/2022, downstream gene is deoxynucleoside monophosphate kinase] since this gene has moved in other phages, Tweety19 is the only phage that has a high similarity to it (synteny). 
On Tweety19, the upstream gene is in the same pham as our upstream gene (NKF) and the downstream gene is called adenylate kinase, which is assumed to have the same function as deoxynucleoside monophosphate kinase (very similar). /note=Primary Annotator Name: Tarighat, Asefeh /note=Auto-annotation: Glimmer has marked the start site to be 17595 and GeneMark has marked 17562 /note=Coding Potential: Coding potential is found in the forward direction only. Both self- and host-trained GeneMark confirm the coding potential /note=SD (Final) Score: -6.898 is for start site 17562, and -5.115 for 17595 (the originally chosen one). All the options seem to have reasonable SD scores. Not too much emphasis was put on this SD score. /note=Gap/overlap: -4 bp (overlapping an upstream gene) for ss 17562 and 29 for ss 17595. (17562 is much more favorable) /note=Phamerator: Pham number as of 02/26/22 is 104252. The start site does not look to be conserved. Highly variable length for this gene is seen (618 bp to 1503 bp). /note=Starterator: Start site 17 is MA 10 times but never called for Snek and Tweety19. Start site 17 @ 18057 would not include a large area with coding potential and is not called by either Glimmer or GeneMark. Most importantly, upon reviewing the pham maps, we discovered that this gene has moved, supporting the conclusion that start site 17 is not the best start site for Snek. Consequently, I am choosing start site 1 @ 17562, which is the start called by GeneMark in the auto-annotation. /note=Location call: Despite the original start site @17595 and the most manually annotated start site @18057, I call 17562 to be the right start site, considering two important facts: the genes in the pham maps have moved a lot, so start @18057 would not cover a large area with coding potential, and ss 17595 has a larger gap (29 bp) whereas ss 17562 has a favorable overlap of -4. /note=Function call: Endolysin.
The top nine phagesdb BLAST hits have the function of endolysin (E-value for the top hit is zero and for the ninth one is 10^-129); similarly, the top six hits on NCBI BLAST also have the function of endolysin (100% coverage, 99.8% identity, and an E-value of zero are recorded for Tweety19). The data gathered are convincing as a whole since they all confirm this gene has a role in lysing the peptidoglycan layer in the bacterial cell wall and is an endolysin. The probability is high on all the tools investigated (>99%), the E-value is low (<10e-3), and it is zero on phagesdb BLAST and NCBI. The coverage reported on HHpred is lower than we desire but still worth recording since the description, probability, and E-value seem to be reasonable and are quite consistent across numerous hits found on HHpred. On CDD (identity=50%, E-value=1.23e-22, coverage=21%), the data are not too strong but still checked as evidence since the hit is reported from pfam and the function reported is peptidase_M23, which cleaves the peptidoglycan cell wall similar to endolysin. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. This supports the other evidence.
/note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 19187 - 19789 /gene="25" /product="gp25" /function="deoxynucleoside monophosphate kinase" /locus tag="Snek_25" /note=Original Glimmer call @bp 19187 has strength 15.33; Genemark calls start at 19211 /note=SSC: 19187-19789 CP: yes SCS: both-gl ST: SS BLAST-Start: [adenylate kinase [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 2.25081E-141 GAP: 122 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.028, -2.5052746077145835, yes F: deoxynucleoside monophosphate kinase SIF-BLAST: ,,[adenylate kinase [Arthrobacter phage Tweety19]],,QNO12686,100.0,2.25081E-141 SIF-HHPRED: c.37.1.1 (A:) Deoxynucleoside monophosphate kinase {Bacteriophage T4 [TaxId: 10665]},,,d1deka_,93.0,99.9 SIF-Syn: This gene (Deoxynucleotide Monophosphate Kinase) shows synteny with genes 22, 23, 25, and 26 in phage Tweety19. Where this gene (pham 54294 on 5.6.22) is upstream of genes in Phams 21631 and 10993 (in that order) and downstream of genes from pham 2502 and 105129 (in that order). /note=Primary Annotator Name: Kidd, Conner /note=Auto-annotation: 19187, GTG start codon. /note=Coding Potential: This gene shows very strong coding potential due to good indication from Glimmer and GeneMarkS as well as very very strong BLAST hits and good synteny with other AZ phages. /note=SD (Final) Score: -2.505, best (highest) final score. /note=Gap/overlap: 122, this gap is the smallest gap out of all potential starts and is conserved across other phage genomes in class AZ. /note=Phamerator: Pham 54294 on 4.26.22. This pham has 185 members and is present in 26 other AZ phages. /note=Starterator: Start site 39 @19187 with 30 MA`s. It is not the most conserved start but that most conserved site is not present in this gene. Instead, this gene has the second most conserved start site and calls it. /note=Location call: 19187 is the most likely potential start. 
This gene is likely a real gene with good coding potential, confirmed in Phamerator. /note=Function call: Deoxynucleoside monophosphate kinase. This protein likely functions as a deoxynucleoside monophosphate kinase. Strangely enough, however, its most significant BLASTp hit, with 100% similarity, called this protein an adenylate kinase. The difference in function between these two enzymes is that an adenylate kinase interconverts ATP, ADP, and AMP, whereas a deoxynucleoside monophosphate kinase phosphorylates deoxynucleoside monophosphates (dNMPs) to the corresponding diphosphates using ATP. Because CDD called this protein a member of the nucleoside kinase superfamily, it is more likely that this protein is a deoxynucleoside monophosphate kinase. /note=Transmembrane domains: None Predicted /note=Secondary Annotator Name: Farazi, Sepp /note=Secondary Annotator QC: I QC'd this gene and agree with all the evidence provided. My recommendations are to add the start codon for the auto-annotation, state whether Glimmer and GeneMark agree, and elaborate on the SD final score. 
CDS 19891 - 20475 /gene="26" /product="gp26" /function="hypothetical protein" /locus tag="Snek_26" /note=Original Glimmer call @bp 19885 has strength 11.01; Genemark calls start at 20104 /note=SSC: 19891-20475 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_26 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.15547E-135 GAP: 101 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.495, -4.458092987303931, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_26 [Arthrobacter phage Tweety19]],,QNO12687,99.4845,1.15547E-135 SIF-HHPRED: SIF-Syn: NKF, upstream gene deoxynucleoside monophosphate kinase is in pham 54294, downstream gene Cas4 family exonuclease is in pham 105129 (as of 5/2/2022), which is also found in Tweety19. /note=Primary Annotator Name: Yan, Lisa /note=Auto-annotation: Glimmer and GeneMark both call the gene, but Glimmer has the start site at 19885 while GeneMark has the start site at 20104. 
/note=Coding Potential: Coding potential in this ORF is strongest in the forward strand in both Glimmer and GeneMark, but there is also some coding potential in the reverse strand. It is unlikely that the gene is in the reverse direction, because this gene is surrounded by other forward genes. /note=SD (Final) Score: -3.631, which is the best final score on PECAAN (for the 19885 bp start site). The Z-score is 2.495, which is also the best and above 2. /note=Gap/overlap: The gap for the 19885 bp start site is 95 bp, which is somewhat large. However, this ~100 bp gap is also observed in other confirmed phage genomes, and there is no coding potential in the gap in either Glimmer or GeneMark. The length of the gene is reasonable (591 bp). /note=Phamerator: As of the analysis conducted on April 22, 2022, the gene is found in Pham 2502. The pham is also found in other members of cluster AZ, to which Snek belongs. Many such AZ phages, such as Tweety19 and Warda, also contain this pham. The function of the gene in this pham is still unknown. /note=Starterator: The reasonable start site choice that is conserved among members of the pham is start site 15 at 19891 bp. 27/34 non-draft genes in the pham have manually annotated start site 15. The Starterator data do not agree with Glimmer or GeneMark, but Z-score data and synteny indicate start site 15 as a better start site. /note=Location call: Based on the evidence, this is a real gene, and the most likely start site is 19891 bp. /note=Function call: Multiple phages in phagesDB BLAST and NCBI BLAST have hits with this Snek gene with low e-values. However, these hits, such as Tweety19 (e-104), Adumb2043 (3e-71), and Arthrobacter sp. EPSL27 (3e-75), also do not have a called function for this gene. HHpred and CDD also had uninformative results (no matches with low e-values). No transmembrane domains were called. 
/note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any transmembrane domains, so it is not a membrane protein. /note=Secondary Annotator Name: Ramesh, Naren /note=Secondary Annotator QC: I agree with this annotation and this function call. All of the evidence categories have been considered. 
CDS 20682 - 21515 /gene="27" /product="gp27" /function="exonuclease" /locus tag="Snek_27" /note=Original Glimmer call @bp 20682 has strength 14.2; Genemark calls start at 20580 /note=SSC: 20682-21515 CP: yes SCS: both-gl ST: SS BLAST-Start: [Cas4 family exonuclease [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 0.0 GAP: 206 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.108, -2.356391881054909, yes F: exonuclease SIF-BLAST: ,,[Cas4 family exonuclease [Arthrobacter phage Tweety19]],,QNO12688,100.0,0.0 SIF-HHPRED: Cas4_I-A_I-B_I-C_I-D_II-B; CRISPR/Cas system-associated protein Cas4. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and associated Cas proteins comprise a system for heritable host defense by prokaryotic cells against phage and other foreign DNA.,,,cd09637,46.5704,99.0 SIF-Syn: There is synteny with phage Tweety19, as the preceding gene in both genomes does not have a known function (pham 2502) and the following gene is a nucleoside deoxyribosyltransferase (67497). /note=AF: changed to exonuclease per recent AZ harmonization /note=Primary Annotator Name: Olvera, Kevin /note=Auto-annotation: Both Glimmer and GeneMark call the gene, although at different start sites; Glimmer: 20682, GeneMark: 20580 /note=Coding Potential: Coding potential in this gene is primarily in the forward direction. Coding potential is found in both GeneMark Host and GeneMark Self. The length of the gene is acceptable. /note=SD (Final) Score: -2.356. This is the best score on PECAAN. /note=Gap/overlap: 206 bp. Large gap, but this gap is conserved in another phage (Tweety19) and there is no coding potential in the gap. /note=Phamerator: pham 104748; Date 4/25/2022. 
It is conserved and found in Adolin (AZ) and Tweety19 (AZ). Phamerator called for Cas4 family exonuclease. /note=Starterator: Date run: 04/22/22; Start site 37 is the most annotated start site; other genes called start site 29. Start 37 is called in 37 out of 92 non-draft genomes and is at position 20682 in Snek. This site is supported by Glimmer. /note=Location call: Based on the prior evidence, this is a real gene with start at 20682. /note=Function call: Cas4 family exonuclease. Multiple PhagesDB and GenBank hits for exonuclease products. Tweety19 and Reedo have e-values of 1e-160 and 1e-153, respectively, and both are exonucleases. HHpred had a hit for Cas4 family exonuclease with a probability of 99, % coverage of 46.57, and an e-value of 5e-8. Other phages were not compared when analyzing the HHpred data. /note=Transmembrane domains: No TMDs detected in TMHMM or TOPCONS; therefore, not a membrane protein. /note=Secondary Annotator Name: Liu, Jinge /note=Secondary Annotator QC: I agree with the primary annotator. Compared with the other auto-annotated site, site 20682 has better RBS and Z-scores; it is also the most annotated start site in the pham. Note that the Glimmer and GeneMark sites are different. Add whether the coding potential is within the ORF. Also note that the length of the gene is acceptable. Comment that Phamerator called for the Cas4 family exonuclease function. For the Starterator section, comment on what the start sites are for the other genes that did not call the most conserved start site, since only 37/92 (well under half) called the conserved site; notice that start site 29 is also called in 39/119 genes (including the draft genes). I agree with the function call. Comment on which phage was used to compare the HHpred values. I agree that it is not a transmembrane protein. 
CDS 21512 - 21880 /gene="28" /product="gp28" /function="nucleoside deoxyribosyltransferase" /locus tag="Snek_28" /note=Original Glimmer call @bp 21512 has strength 8.28; Genemark calls start at 21536 /note=SSC: 21512-21880 CP: yes SCS: both-gl ST: SS BLAST-Start: [nucleoside deoxyribosyltransferase [Arthrobacter phage Adolin]],,NCBI, q1:s1 100.0% 7.4029E-63 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.316, -6.002696769995274, no F: nucleoside deoxyribosyltransferase SIF-BLAST: ,,[nucleoside deoxyribosyltransferase [Arthrobacter phage Adolin]],,QHB36608,86.0656,7.4029E-63 SIF-HHPRED: c.23.14.1 (A:9-160) Nucleoside 2-deoxyribosyltransferase {Trypanosome (Trypanosoma brucei) [TaxId: 5691]},,,d2f62a2,89.3443,99.7 SIF-Syn: nucleoside deoxyribosyltransferase, upstream gene is Cas4 family exonuclease, downstream is LAGLIDADG endonuclease, just like in phage Tweety19 /note=Primary Annotator Name: Hernandez, Betania /note=Auto-annotation: Glimmer and GeneMark call the start at different sites. Glimmer calls the start at 21512 (GTG codon) while GeneMark calls the start at 21536 (ATG codon). /note=Coding Potential: Reasonable coding potential found within the ORF. Coding potential in the forward strand on Glimmer and GeneMark indicates this is a forward gene. The start site at 21512 includes all of the coding potential, while start site 21536 excludes some of it at the start. /note=SD (Final) Score: -6.003. The gene shows a reasonable SD score and a Z-value of 1.316. The start site at 21536 has a better final score and a higher Z-value, but the 4 bp overlap created by start 21512 makes it the better start site, since the overlap indicates that the gene is part of an operon. /note=Gap/overlap: -4 bp; the overlap indicates the gene may be part of an operon. This start site creates the longest ORF, with an acceptable length of 369 bp. /note=Phamerator: Pham: 67697. Date: 4/25/26. It is conserved in Tweety19, Adolin, and Amyev, all from cluster AZ. 
Phamerator suggests the function is a nucleoside deoxyribosyltransferase, which is found on the approved SEA-PHAGES list. /note=Starterator: Start site 32 was called by 21/34 non-draft members. This start site corresponds to start site 21512 in Snek. /note=Location call: Gathered evidence suggests this is a real gene. The most likely start site is at 21512, which covers all coding potential. /note=Function call: nucleoside deoxyribosyltransferase. The top 10 hits on phagesdb BLAST have the function of nucleoside deoxyribosyltransferase with e-values of 6e-67 to 2e-54. For 3 out of the top 5 NCBI BLASTp hits, sorted by E-value, the suggested function is nucleoside deoxyribosyltransferase. These hits have high query coverage (100%), high % identity (>81.15%), and low E-values (5e-81 to 8e-6.3). HHpred displayed hits with Trypanosoma brucei (99.7% probability, 89.3443% coverage, e-value: 1.3e-15) and Lactobacillus leichmannii (99.63% probability, 89.3443% coverage, e-value: 2.6e-14). CDD had a relevant hit for Nucleoside 2-deoxyribosyltransferase with an e-value of 4.84e-05. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Gleason, Zoe /note=Secondary Annotator QC: **add direction of the gene in coding potential; maybe write about how the other final score is better and why you didn't choose that one?** I have QC’ed this location call and agree with the first annotator. I agree with this annotation. All of the evidence categories have been considered. 
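The gap/overlap and gene-length figures quoted in these records follow directly from the start and stop coordinates. The snippet below is a minimal sketch of that arithmetic, not part of the PECAAN output; the function names `gene_length` and `gap_to_upstream` are hypothetical, and it assumes 1-based inclusive coordinates with both genes on the forward strand.

```python
def gene_length(start: int, stop: int) -> int:
    """Length in bp of a feature given inclusive 1-based coordinates."""
    return abs(stop - start) + 1


def gap_to_upstream(prev_stop: int, start: int) -> int:
    """Number of bases strictly between the upstream gene's last base and
    this gene's first base. A negative value means the genes overlap."""
    return start - prev_stop - 1


# Coordinates taken from the records above (forward-strand genes).
# Gene 27 spans 20682-21515; gene 28 is called at 21512-21880.
length_28 = gene_length(21512, 21880)            # length of gene 28
overlap_28 = gap_to_upstream(21515, 21512)       # chosen start for gene 28
gap_28_genemark = gap_to_upstream(21515, 21536)  # GeneMark's alternative start

# Gene 25 ends at 19789; gene 26 has candidate starts 19885 and 19891.
gap_26_glimmer = gap_to_upstream(19789, 19885)
gap_26_called = gap_to_upstream(19789, 19891)

print(length_28, overlap_28, gap_28_genemark, gap_26_glimmer, gap_26_called)
```

Under these assumptions the values agree with the notes: 369 bp and a -4 bp overlap for gene 28, and gaps of 95 bp and 101 bp for the two candidate starts of gene 26.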
CDS 21883 - 22287 /gene="29" /product="gp29" /function="LAGLIDADG endonuclease" /locus tag="Snek_29" /note=Original Glimmer call @bp 21883 has strength 14.57; Genemark calls start at 21883 /note=SSC: 21883-22287 CP: yes SCS: both ST: SS BLAST-Start: [LAGLIDADG endonuclease [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.38725E-90 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.543, -4.103700314387452, yes F: LAGLIDADG endonuclease SIF-BLAST: ,,[LAGLIDADG endonuclease [Arthrobacter phage Tweety19]],,QNO12690,100.0,1.38725E-90 SIF-HHPRED: d.95.2.0 (A:257-389) automated matches {Baker's yeast (Saccharomyces cerevisiae) [TaxId: 4932]},,,d2ab5a1,90.2985,99.6 SIF-Syn: LAGLIDADG endonuclease. Gene upstream is nucleoside deoxyribosyltransferase (pham 67497) and downstream gene is a recombination directionality factor (4822), just like in phage Tweety19. /note=Primary Annotator Name: Gleason, Zoe /note=Auto-annotation: Called by both Glimmer and GeneMark /note=Coding Potential: High coding potential in host- and self-trained GeneMark. Coding potential drops towards the end of the gene. Coding potential is found in the forward direction, so the gene is forward. /note=SD (Final) Score: -4.104. This is the best final score. /note=Gap/overlap: 2 bp gap. Slightly too long to be an operon, but below the recommended 50 bp limit. Results in the longest possible ORF (405 bp, which is acceptable). /note=Phamerator: 104402 (as of 4/25/22). Other genes in this pham belong to phages in the AZ subcluster. /note=Starterator: Start number 13 @21883. Start number 13 is the most frequently called for this pham. The location 21883 is also called by Glimmer and GeneMark. When this start site is present, it is called as the start of the gene 100% of the time. /note=Location call: 21883. This is called by Glimmer, GeneMark, and Starterator. Based on the above data, it is the best possible location call. /note=Function call: LAGLIDADG endonuclease. 
This gene has strong similarity to HNH, DNA, and LAGLIDADG endonucleases, but is most similar to the Tweety19 gene, whose function is LAGLIDADG endonuclease. Tweety19 LAGLIDADG endonuclease has an e-value of 1e-72 and a score of 269. The LAGLIDADG domain has slightly better similarity than the other 2, and CDD only resulted in a hit from the LAGLIDADG domain. HHpred's best hit was a LAGLIDADG endonuclease that had 90% coverage, 99.6% probability, and a 4.8e-15 e-value. For the top 5 NCBI BLASTp hits, sorted by E-value, the suggested function is LAGLIDADG endonuclease, with high query coverage (100%), high % identity (>89%), and low E-values (<1.3e-80). /note=Transmembrane domains: There are no TMDs called by either TMHMM or TOPCONS. This aligns with the hypothesized function of the gene. /note=Secondary Annotator Name: Rivera, Wendy /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. 
CDS 22425 - 23138 /gene="30" /product="gp30" /function="recombination directionality factor" /locus tag="Snek_30" /note=Original Glimmer call @bp 22425 has strength 18.89; Genemark calls start at 22362 /note=SSC: 22425-23138 CP: yes SCS: both-gl ST: SS BLAST-Start: [recombination directionality factor [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 6.3061E-173 GAP: 137 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.244, -4.134629096801125, no F: recombination directionality factor SIF-BLAST: ,,[recombination directionality factor [Arthrobacter phage Tweety19]],,QNO12691,100.0,6.3061E-173 SIF-HHPRED: Gp3-like ; Recombination directionality factor-like,,,PF18897.3,88.6076,100.0 SIF-Syn: There is synteny upstream and downstream of the gene when compared to similar phages Tweety19 and Reedo. The upstream gene function has been called as a LAGLIDADG endonuclease, and the downstream gene has no known function. 
The genes also belong to the same phams (104402 upstream and 105507 downstream). /note=Primary Annotator Name: Mehra, Muskaan /note=Auto-annotation: Glimmer and GeneMark both call the gene but do not agree on the start site (Glimmer: 22425, GeneMark: 22362). /note=Coding Potential: The coding potential for this ORF is on the forward strand only, thus confirming that this is a forward gene. Coding potential is found for this ORF in both GeneMark Self and Host-Trained formats. The final start codon is TTG. /note=SD (Final) Score: -4.135. This is the best (least negative) final score on PECAAN. /note=Gap/overlap: 137 bp. Relatively large, but the gap is conserved in other related, manually annotated genomes. It is unlikely that a gene can be added here. No coding potential found in any direction in this gap. /note=Phamerator: pham: 4822 on 04/25/2022. It is conserved, found in Tweety19 and Asa16 (both cluster AZ). /note=Starterator: Start site 35 (22425) is MA'ed 51/110 times. Called 96% of the time when present. /note=Location call: Based on the above evidence, this is a real gene with start site 22425. /note=Function call: Recombination directionality factor. The top three PhagesDB and NCBI BLAST hits call the function as recombination directionality factor with good E-values for both (<10^-113 and <10^-142, respectively). They also have good coverage (>99%). HHpred had one hit with a good E-value (5.1e-35) that also calls the function as a recombination directionality factor. There were no hits on CDD, so no conclusions can be drawn from CDD; therefore, the function call based on the BLAST hits and HHpred stands as recombination directionality factor. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Pena, Melina /note=Secondary Annotator QC: Good job! If anything, there could be more notes on the Location call section, but the annotation is good as is! -> looks fixed!!! 
/note=New note: 5/6 Just add the start codon in the autoannotation section 
CDS 23138 - 23254 /gene="31" /product="gp31" /function="hypothetical protein" /locus tag="Snek_31" /note=Original Glimmer call @bp 23138 has strength 17.13; Genemark calls start at 23138 /note=SSC: 23138-23254 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_31 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.43843E-15 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.128, -4.455662434380964, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_31 [Arthrobacter phage Tweety19]],,QNO12692,100.0,1.43843E-15 SIF-HHPRED: SIF-Syn: Pham 10407, upstream gene is recombination directionality factor, downstream gene is Pham 103043, just like in Tweety19. /note=TMD predicted but it is a signal peptide -AF /note=Primary Annotator Name: Arshad, Iqra /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 23138. /note=Coding Potential: There is significant coding potential in the ORF for this gene, and the coding potential is only in the forward strand, indicating that this is a forward gene. /note=SD (Final) Score: The RBS Final Score for the called start site is -4.456; this is the highest final score. /note=Gap/overlap: There is a 1 bp overlap (gap of -1 bp) between the start site of this gene and the stop site of the previous gene, which could be evidence for this gene being part of an operon. /note=Phamerator: pham: 104079. Date 04/26/22. It is conserved; found in Crewneck (AZ) and Amyev (AZ). /note=Starterator: Start site 4 in Starterator was manually annotated in 25/25 non-draft genes in this pham. Start 4 is 23138 in Snek. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 23138. /note=Function call: No hits were found that would give any information about the function of the gene with the stop codon at 23254. 
The probabilities of the HHpred hits were 50% or below and the E-values were extremely high (13 being the lowest). The hits from phagesDB and NCBI BLAST were phage Tweety19 (100% coverage and an E-value of 1.43843e-15) and phage Reedo (92.1053% coverage and an E-value of 0.000757501), neither of which had a function listed. Based on the available data, I was unable to determine the function of this gene. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Cho, Clara /note=Secondary Annotator QC: I agree with the location call. Evidence supports it. 
CDS 23329 - 23622 /gene="32" /product="gp32" /function="hypothetical protein" /locus tag="Snek_32" /note=Original Glimmer call @bp 23329 has strength 11.38; Genemark calls start at 23329 /note=SSC: 23329-23622 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_32 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.77804E-65 GAP: 74 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.028, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_32 [Arthrobacter phage Tweety19]],,QNO12693,100.0,1.77804E-65 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mak, Amanda /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree that the start site is at 23329 bp. /note=Coding Potential: There is high coding potential in the forward direction. There is coding potential in both GeneMark host and self. /note=SD (Final) Score: The best SD is -2.443 on PECAAN. /note=Gap/overlap: There is a 74 bp gap upstream of the start of the gene, which is a bit large. However, this is ultimately reasonable because the gap is conserved in other phages like Adumb2043 and the gap is too small to encode another gene. There is also no coding potential in the gap. /note=Phamerator: Pham 103043, date: 04/22/22. 
Conserved start site– found in Liebe (AZ), Maureen (AZ), and Tweety19 (AZ) /note=Starterator: There are 15 non-draft members (out of 18) in this pham. 8 out of 15 non-draft members call start number 3, which corresponds to Snek’s start site of 23329. /note=Location call: This gene is most likely a real gene with a start site at 23329. Starterator data (most annotated and most conserved start sites) agree with Glimmer and GeneMark. /note=Function call: NKF. Although there are PhagesDB hits to phages Tweety19 and KeAlii, neither has a known function. NCBI yielded the same results– similar amino acid sequences in Arthrobacter phages Tweety19 and KeAlii, but both were hypothetical proteins. CDD and HHpred yielded no significant hits. /note=Transmembrane domains: TMHMM and TOPCONS both had no TMD hits; therefore, this gene does not encode a membrane protein. /note=Secondary Annotator Name: Rivera, Wendy /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator 
CDS 23631 - 23897 /gene="33" /product="gp33" /function="NrdH-like glutaredoxin" /locus tag="Snek_33" /note=Original Glimmer call @bp 23631 has strength 12.09; Genemark calls start at 23631 /note=SSC: 23631-23897 CP: yes SCS: both ST: SS BLAST-Start: [NrdH-like glutaredoxin [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 9.98412E-53 GAP: 8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.274, -2.45827804503836, yes F: NrdH-like glutaredoxin SIF-BLAST: ,,[NrdH-like glutaredoxin [Arthrobacter phage Tweety19]],,QNO12694,93.1818,9.98412E-53 SIF-HHPRED: c.47.1.0 (A:1-84) automated matches {Mycobacterium tuberculosis [TaxId: 1773]},,,d2lqoa1,87.5,99.5 SIF-Syn: Genomic architecture in this region shows perfect synteny with phage Tweety19. Upstream gene is NKF (pham 103043), downstream is phosphoesterase (pham 96672). Appears to be part of a five-gene operon. 
/note=Primary Annotator Name: Stephenson, Juliet /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree that the start site is at 23631 bp. /note=Coding Potential: There is high coding potential in the forward direction, which includes the start/stop sites. There is coding potential in both GeneMark host and self. /note=SD (Final) Score: The best SD is -2.458, which corresponds to the selected site of 23631. /note=Gap/overlap: There is an 8 bp gap upstream of the start of the gene, which is reasonable. Gene length is 267 bp, which is adequate. /note=Phamerator: Pham 108681, date: 06/01/22. 964 members, mostly with function NrdH-like glutaredoxin. /note=Starterator: Start 77 @23631 with 1 MA is the most MA'd. MA'd start in pham is not available. /note=Location call: This gene is most likely a real gene with a start site at 23631. Starterator data (most annotated and most conserved start sites) agree with Glimmer and GeneMark. /note=Function call: NrdH-like glutaredoxin. Present in fellow AZ phages Tweety19, Warda, and Tbone. Also good hits with HHpred to M. tuberculosis (87.5% coverage, 2.9e-11), and glutaredoxin hits in the CDD. /note=Transmembrane Domains: TMHMM and TOPCONS detect no TMDs. 
/note=Secondary Annotator Name: Rivera, Wendy /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator 
CDS 23894 - 24478 /gene="34" /product="gp34" /function="metallophosphoesterase" /locus tag="Snek_34" /note=Original Glimmer call @bp 23894 has strength 11.94; Genemark calls start at 23894 /note=SSC: 23894-24478 CP: yes SCS: both ST: SS BLAST-Start: [phosphodiesterase [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.36973E-137 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.028, -2.5052746077145835, no F: metallophosphoesterase SIF-BLAST: ,,[phosphodiesterase [Arthrobacter phage Tweety19]],,QNO12695,99.4845,1.36973E-137 SIF-HHPRED: Phosphodiesterase MJ0936; phosphodiesterase, Structural Genomics, PSI, Protein Structure Initiative, Berkeley Structural Genomics Center, BSGC, HYDROLASE; 3.0A {Methanocaldococcus jannaschii} SCOP: d.159.1.7,,,2AHD_A,86.0825,99.8 SIF-Syn: Phosphodiesterase; the upstream gene has no known function (same pham, 103043), and the downstream gene is Holliday junction resolvase, just like in phage Tweety19 /note=AF: called metallophosphoesterase per recent AZ harmonization + required motif /note=Primary Annotator Name: Garin V, Paul /note=Auto-annotation: Both Glimmer and GeneMark called the start site at 23894 /note=Coding Potential: High coding potential throughout the gene, from both Glimmer and GeneMark, apart from a dip in coding potential around bp 24100. The coding potential is found on the forward strand only, suggesting that this is a forward gene. /note=SD (Final) Score: -2.505 (best on PECAAN) /note=Gap/overlap: -4 bp, which suggests that this gene is part of an operon. /note=Phamerator: Gene is within pham 99672 as of 4/25/22, and the pham includes other AZ subcluster phages. There are 122 phages in the pham; 22 are drafts. I used phages DrManhattan_34 and Iter_35 for comparison. /note=Starterator: Start site number 51 was called most often in the published annotations. 
22/22 of the non-draft phage annotations called start site number 51. This was also the autoannotated start site, corresponding to 23894. /note=Location call: Based on the previous evidence, this is likely a real gene that starts at 23894. /note=Function call: Phosphodiesterase. Multiple phagesdb BLASTp and NCBI BLASTp hits of the gene were phosphodiesterases with very low e-values (e-90 or lower), high query identities (80%+), and 87%+ coverage. The CDD hit was not very helpful, as it suggested the gene may code for a metallophosphatase, when previous evidence suggested a phosphodiesterase. One of the HHpred hits suggests that the gene could code for the predicted phosphodiesterase, and the other hit suggested a vacuolar protein sorting-associated protein, which was not previously considered. /note=Transmembrane domains: N/A, no hits from TMHMM or TOPCONS. The absence of TMDs makes sense in the context of the hypothesized function of the gene, because a phosphoesterase does not need to cross membranes; its role is to cleave phosphoester bonds. /note=Secondary Annotator Name: Cho, Clara /note=Secondary Annotator QC: I agree with the location call. Add the date of the Phamerator run and 2 other non-draft genomes in which the gene is found. 
CDS 24475 - 24915 /gene="35" /product="gp35" /function="holliday junction resolvase" /locus tag="Snek_35" /note=Original Glimmer call @bp 24475 has strength 7.56; Genemark calls start at 24475 /note=SSC: 24475-24915 CP: yes SCS: both ST: SS BLAST-Start: [holliday junction resolvase [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 4.89165E-101 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.981, -4.760102963860426, no F: holliday junction resolvase SIF-BLAST: ,,[holliday junction resolvase [Arthrobacter phage Tweety19]],,QNO12696,100.0,4.89165E-101 SIF-HHPRED: Holliday junction resolvase; archeal holliday junction resolvase helicase DNA binding enzyme phage 15-6 thermus thermophilus, RECOMBINATION; HET: MSE, SO4; 2.5A {Thermus thermophilus phage 15-6},,,7BGS_A,80.8219,99.6 SIF-Syn: My Snek gene 36 displays synteny with Tweety19’s gene 35 for Holliday junction resolvase. They are also part of the same pham (104158). The upstream (phosphoesterase, Pham 96672) and downstream (NKF, Pham 788) genes are highly conserved and in the same order. /note=Primary Annotator Name: Orr, Max /note=Auto-annotation: Gene (stop@24915 F), Glimmer and GeneMark both call the start site at 24475. /note=Coding Potential: High except for the last ~100 bp; found on both GeneMark Self and Host. Coding potential is found in this ORF on the forward strand only, indicating a forward gene. The start site covers all of the coding potential. /note=SD (Final) Score: -4.760, third highest SD score in PECAAN /note=Gap/overlap: -4 bp; there is an overlap, but it is reasonable, as the gene is likely part of the same operon as the upstream gene. /note=Phamerator: 104158, 04/25/22, conserved in Tweety19 (AZ) and DrSierra (AZ), and shares the gene function of Holliday junction resolvase with other members of the pham. /note=Starterator: Start site 84 in Starterator was manually annotated in 28 of the 299 non-draft genes in this pham. All 28 of the manual annotations are in the same AZ cluster. 
Start site 84 is 24475 in Snek, and agrees with Glimmer and GeneMark's start site. /note=Location call: Based on the evidence above, this is a real gene with the start site at 24475; GeneMark, Glimmer, and Starterator all agree. /note=Function call: Holliday junction resolvase. The top 9 phagesdb non-draft BLAST hits call Holliday junction resolvase (E-value < 10^-62). The top 7 Arthrobacter phage hits from NCBI BLAST also call Holliday junction resolvase (100% coverage, E-value < 10^-72, 83%+ identity). The most significant HHpred hit was a Holliday junction resolvase, 7BGS_A, with 99.64% probability, 80.8219% coverage, and an e-value of 3e-15. Other significant hits with >98.9% probability, >70% coverage, and <10^-7 e-value were also observed for 1OB8_B and d1ob8a_ in HHpred. No hits were observed in CDD. /note=Transmembrane domains: There were no TMDs predicted by TMHMM or TOPCONS; therefore, this would not be a membrane protein. /note=Secondary Annotator Name: Tarighat, Asefeh /note=Secondary Annotator QC: I have QC'd this location call and agree with the first annotator. 
CDS complement (24912 - 25070) /gene="36" /product="gp36" /function="hypothetical protein" /locus tag="Snek_36" /note=Original Glimmer call @bp 25049 has strength 4.44; Genemark calls start at 25049 /note=SSC: 25070-24912 CP: yes SCS: both-cs ST: NI BLAST-Start: [hypothetical protein SEA_TWEETY19_36 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 3.98181E-28 GAP: 138 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.546, -4.097571056659081, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_36 [Arthrobacter phage Tweety19]],,QNO12697,100.0,3.98181E-28 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Liu, Jinge /note=Auto-annotation: Gene (stop@24912 R), Glimmer and GeneMark both call the start site at 25049. Start codon ATG is used; it is the most used start codon in Snek. /note=Coding Potential: High coding potential throughout most of the gene. 
A small region near the end does not have coding potential; nevertheless, there is only one stop site that includes all the coding potential. /note=SD (Final) Score: -6.159. Lowest out of the three possible sites. Not the best score. It is still a reasonable score for a ribosome binding site. /note=Gap/overlap: Downstream gap of 159 bp. Large gap, but there is a change in direction, so it is reasonable. However, this can still be minimized by choosing one of the other start sites. It is a downstream gap because there is a change in direction. Length of the gene is 138 bp with the autoannotated start site, a reasonable length. /note=Phamerator: Pham 788, run on 4/27/22, conserved in Tweety19 (AZ) and Reedo (AZ). Phamerator and the phams database did not call a function. /note=Starterator: Autoannotated start site is 21, called in 17/23 of the non-draft genes. This is the most conserved site, but it is not found in Snek. Among the start sites Snek contains, start site 22 was manually annotated in 3 out of the 23 non-draft genes, and all three annotations are part of the AZ cluster. Start site 22 is 25070 in Snek and does not agree with Glimmer and GeneMark. /note=Location call: This gene has good coding potential and is conserved in Phamerator, so it is a real gene. Evidence suggests that the auto-annotated start site 25049 is not the best site. A different start site, 25070, is called instead, as suggested by Starterator and by the RBS and Z-scores; this site also covers all the coding potential. /note=Function call: NKF. Function unknown; all similar genes in other phages were labeled "function unknown" on phagesDB BLAST, and NCBI BLAST labeled all significant alignments as hypothetical proteins. The first two hits in NCBI BLAST have good query coverages (above 88%), high identities (above 48%), and good e-values (below e-08). The first two hits in phagesDB BLAST also had high identities (above 62%) and good e-values (below e-09). 
/note=Transmembrane domains: This is not a transmembrane protein. This is a soluble protein as suggested by Sosui. Topcons and TmHmm both predicted no TMDs. /note=Secondary Annotator Name: Yan, Lisa /note=Secondary Annotator QC: I agree with this annotation. The SD score annotation should be fixed, as the final score was -4.098; in addition, there should be comments on the Z-score in this section as well. The annotation on the gap should also be fixed, as you ultimately ended up choosing the start site with a better final and Z score. In the Phamerator section, include information about whether the pham contains other members of the cluster to which Snek belongs to (AZ). In the Starterator section, should comment that the manual annotation at site 22 matches the Z and final score data, and answer other questions such as is the site conserved among members of pham and how many other members are in the pham. CDS 25209 - 27740 /gene="37" /product="gp37" /function="DNA primase/helicase" /locus tag="Snek_37" /note=Original Glimmer call @bp 25209 has strength 13.22; Genemark calls start at 25209 /note=SSC: 25209-27740 CP: yes SCS: both ST: SS BLAST-Start: [DNA primase/helicase [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 0.0 GAP: 138 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.708, -3.170856472917156, no F: DNA primase/helicase SIF-BLAST: ,,[DNA primase/helicase [Arthrobacter phage Tweety19]],,QNO12698,100.0,0.0 SIF-HHPRED: Primase; primase, helicase, ssDNA-binding protein, TRANSFERASE; HET: SO4; 2.406A {Nitratiruptor phage NrS-1},,,6K9C_A,39.6204,100.0 SIF-Syn: DNA primase/helicase, upstream gene is NKF (Pham 788), downstream is NKF (Pham 5451), just like in phage Tweety19 /note=Primary Annotator Name: Grunden, Kyla /note=Auto-annotation: Glimmer and GeneMark. 
Both call the start site at 25209 /note=Coding Potential: Coding potential is on the forward strand only, indicating the gene is in the forward direction; high coding potential for GeneMark self and host; start site covers all coding potential /note=SD (Final) Score: -3.171, it is the best SD on PECAAN /note=Gap/overlap: 138 bp, there is a change in gene direction so a gap of at least 50 bp is expected /note=Phamerator: pham 83049 (4/26/22); present in other members of the AZ cluster (Adolin, Asa16, Berrie, Cassia, Cremate, etc.); commonly called function: DNA primase/helicase /note=Starterator: start site 37 is 25209 in Snek. It is manually called in 41/85 of non-draft phages. This evidence agrees with the start site called by Glimmer and GeneMark. /note=Location call: 25209 /note=Function call: DNA primase/helicase. The top 9 NCBI hits have the function of DNA primase/helicase (100% coverage, 83%+ identity, and all E-values are 0). The top 6 non-draft phagesDB BLAST hits have the function of DNA primase/helicase (all E-values are 0). The top 2 HHpred hits (99.92%+ probability, E-value<1.5e-22) are for DNA primases. The third top HHpred hit (99.83%+, E-value 1.9e-18) is for a helicase. There were 4 relevant CDD hits, 3 for DNA Primase (E-value>9.19e-7) and 1 for Protein D5 (E-value>1.55e-27), which is a DNA helicase. The info from HHpred and CDD fulfills the SEA-PHAGES requirement to call a protein DNA Primase/Helicase because there is evidence for both functions. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Orr, Max /note=Secondary Annotator QC: I agree with all the evidence provided by Glimmer, GeneMark, and Starterator. I would suggest providing more Starterator information. I have performed my second QC and all the evidence aligns properly. 
CDS 27749 - 27862 /gene="38" /product="gp38" /function="hypothetical protein" /locus tag="Snek_38" /note=Original Glimmer call @bp 27749 has strength 6.6 /note=SSC: 27749-27862 CP: no SCS: glimmer ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_38 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 8.56693E-19 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.122, -4.4057176022952405, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_38 [Arthrobacter phage Tweety19]],,QNO12728,100.0,8.56693E-19 SIF-HHPRED: SIF-Syn: This gene has NKF (Pham 5451), upstream gene is a DNA primase/helicase (Pham 83049), downstream gene has NKF (Pham 437), just like in phage Tweety19. /note=Primary Annotator Name: Cho, Clara /note=Auto-annotation: Only Glimmer called the start at 27749. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The chosen start site does not cover all the coding potential. /note=SD (Final) Score: -4.406. Best final score in PECAAN. /note=Gap/overlap: Gap: 8. Gap between the upstream gene is reasonable because there are no start codons in the gap /note=Phamerator: 54541. Date 4/22/22. It is conserved; found in Tweety19 (AZ) and KeAlii (AZ) /note=Starterator: Start site 7 in Starterator was manually annotated one time in this Pham and found in 2/38 of genes in Pham. Start 7 is @27749. This was the only start site called in Snek. This evidence agrees with the site predicted by Glimmer. The start site also corresponds to the longest open reading frame. /note=Location call: The evidence suggests that this is a real gene and the most likely start site is 27749. /note=Function call: NKF. Phagesdb BLAST had two hits from Tweety19 and KeAlii with low e-values, 2e-17 and 9e-13, but the function was unknown. 
NCBI also produced two hits from Tweety19 and Phives with low e-values, 8.6e-19 and 9.4e-12, but they both predicted a hypothetical protein. There were no hits from CDD. The top 3 hits from HHpred did not have good e-values (<1e-3) and were not on the approved functions list. /note=Transmembrane domains: 0. Both TMHMM and TOPCONS predicted that this gene does not have any transmembrane domains. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 27868 - 28032 /gene="39" /product="gp39" /function="hypothetical protein" /locus tag="Snek_39" /note=Original Glimmer call @bp 27847 has strength 8.71; Genemark calls start at 27868 /note=SSC: 27868-28032 CP: yes SCS: both-gm ST: NI BLAST-Start: [hypothetical protein SEA_TWEETY19_39 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.79046E-30 GAP: 5 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.345, -5.943378579789011, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_39 [Arthrobacter phage Tweety19]],,QNO12699,100.0,1.79046E-30 SIF-HHPRED: SIF-Syn: This gene has NKF (Pham 437), upstream gene has NKF (Pham 54541), and the downstream is DNA Polymerase I, just like in phage Tweety19. /note=Primary Annotator Name: Chen, Daniel /note=Auto-annotation: Glimmer calls the start site at 27847. GeneMark calls the start site at 27868. The start codon that is called is GTG. /note=Coding Potential: Coding potential is found both in GeneMark Self and Host. The ORF has reasonable coding potential and the predicted start site covers all of it. /note=SD (Final) Score: SD final score of -5.381. Z-score of 2.052. These are the best scores on PECAAN. /note=Gap/overlap: Overlap of 16 bp. Although this overlap is slightly larger than usual, this overlap is conserved in other phages (Tweety19 and Reedo). /note=Phamerator: Pham: 437. Date 4/25/22. It is conserved; found in Tweety19 (AZ), Reedo (AZ), KeAlii (AZ), and more. There was no function called for the gene. 
/note=Starterator: Start site 7 was manually annotated in 4/15 non-draft genes in this Pham. Start site 7 is 27868, which agrees with the start site predicted by GeneMark. Start site 6 was the most annotated for this pham, but it does not exist in Snek. /note=Location call: Based on above evidence, the gene is real and the start site is most likely at 27868. /note=Function call: Function unknown. No programs returned any informative results. /note=Transmembrane domains: No transmembrane domains predicted. /note=Secondary Annotator Name: Bradley, Allyssa /note=Secondary Annotator QC: I'm a little confused why you marked Not Informative for Pham Starterator drop-down on top, since it seems like you picked the suggested start site. (I trust what you pick, though.) Everything else looks great, though! CDS 28185 - 30029 /gene="40" /product="gp40" /function="DNA polymerase I" /locus tag="Snek_40" /note=Original Glimmer call @bp 28185 has strength 15.01; Genemark calls start at 28185 /note=SSC: 28185-30029 CP: yes SCS: both ST: SS BLAST-Start: [DNA polymerase I [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 0.0 GAP: 152 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.122, -4.387988835334809, no F: DNA polymerase I SIF-BLAST: ,,[DNA polymerase I [Arthrobacter phage Tweety19]],,QNO12700,100.0,0.0 SIF-HHPRED: Prex DNA polymerase; DNA polymerase, TRANSFERASE; HET: SO4; 2.9A {Plasmodium falciparum},,,5DKT_A,96.7427,100.0 SIF-Syn: DNA Polymerase 1, upstream is NKF (pham 437), downstream is NKF (pham 16103) just like Tweety19 /note=Primary Annotator Name: Han, Maggie /note=Auto-annotation: Glimmer and GeneMark: 28185, Start codon: TTG /note=Coding Potential: Reasonable coding potential in putative ORF, Chosen start site covers all of the coding potential /note=SD (Final) Score: The SD score is not the best but is still reasonable to suggest the presence of a credible ribosome binding site /note=Gap/overlap: There is a gap with the upstream gene. 
No other site creates a longer ORF and the length of the gene is acceptable /note=Phamerator: 4/25/22 Pham 47481, Pham is conserved in cluster AZ. Phages used are Adolin, Tweety 19, and Asa16. Function given was DNA Polymerase 1. /note=Starterator: Start site 61 at 28185 is a reasonable start site. 52/892 genes in this pham call this site. Most often called site is 62 at 31266. This was in 826/892 members of the pham but was not present in phage Snek. /note=Location call: Evidence suggests that gene is a real gene and that the most likely start site is 28185 /note=Function call: Evidence from both PhagesDb and NCBI suggests that this gene is a DNA Polymerase 1. The top 2 PhagesDb hits have an e-value of 0 which is very low and have high identity (>90%). For NCBI, the top 2 hits have an e-value of 0 which is very low, high identity (>90%), and high query coverage (100%). CDD showed multiple domains of DNA Polymerase 1. All with low e-values (<10^-50). HHpred also showed hits that have 100% probability, an e-value of 0 (which is the best e-value possible), and high coverage (>95%). These hits suggest that the function of the gene is DNA Polymerase 1. 
/note=Transmembrane domains: No Transmembrane domains /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 30099 - 30401 /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="Snek_41" /note=Original Glimmer call @bp 30099 has strength 10.26; Genemark calls start at 30099 /note=SSC: 30099-30401 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_42 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 5.52732E-64 GAP: 69 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.961, -2.6623718267059564, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_42 [Arthrobacter phage Tweety19]],,QNO12701,100.0,5.52732E-64 SIF-HHPRED: SIF-Syn: NKF (pham 16103) DNA polymerase 1 upstream and DNA binding protein downstream is exactly 1-to-1 with Tweety19. /note=QC: Noah Canio- Synteny box: changed downstream function from "RNA sigma factor" to "DNA binding protein." Note: RNA sigma factor is not on the approved functional list, its alternative approved name is DNA binding protein. Also: added function and pham of this gene and deleted the excess upstream and downstream gene info. /note= /note=Primary Annotator Name: Chew, Brandon /note=Auto-annotation: Both Glimmer and GeneMark were used to autoannotate the gene and agreed on an ATG start, start site #30, coordinate 30099. /note=Coding Potential: Significant coding potential over the entire auto-annotated region /note=SD (Final) Score: -2.662 (best) /note=Gap/overlap:+69bp gap; reasonable gap and ORF length /note=Phamerator: Pham 16103 (accessed 4/25/22); highly conserved across AZ cluster; compared to Adolin, DrManhattan and Tweety19 /note=Starterator: Highly conserved, most annotated start site #30 with 64/73 non-draft phage annotations also having called it; Start #30 @30099 /note=Location call: Coding potential, synteny, overlap and spacing, and gene length indicate this is a real gene. 
Matching GeneMark and Glimmer start site auto-annotation, along with start codon identity, lack of large gaps before start site, coverage of coding potential, favorable RBS and Z-score, and conservation of most annotated start site indicate 30099 is the correct start site. /note=Function call: Despite PhagesDB BLAST and NCBI BLASTp returning strong hits with low E-values, high identity and query match, none returned a known function. Based on a lack of hits with any function from BLAST or BLASTp, a lack of CDD hits, and no acceptable HHpred hits, the function is still NKF. /note=Transmembrane domains: There are no transmembrane domains to analyze. /note=Secondary Annotator Name: De Jesus, Jorja /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 30604 - 31398 /gene="42" /product="gp42" /function="DNA binding protein" /locus tag="Snek_42" /note=Original Glimmer call @bp 30544 has strength 16.63; Genemark calls start at 30544 /note=SSC: 30604-31398 CP: no SCS: both-cs ST: NI BLAST-Start: [RNA polymerase sigma factor [Arthrobacter phage Tweety19]],,NCBI, q1:s21 100.0% 0.0 GAP: 202 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.961, -2.644643059745525, yes F: DNA binding protein SIF-BLAST: ,,[RNA polymerase sigma factor [Arthrobacter phage Tweety19]],,QNO12702,92.9577,0.0 SIF-HHPRED: RNA polymerase sigma factor RpoS; Transcription-activator, DNA/RNA, SigmaS, beta`, TRANSCRIPTION, Transferase-DNA complex; 3.26A {Escherichia coli},,,6OMF_F,97.3485,100.0 SIF-Syn: DNA binding protein, upstream gene is a member of pham 16103, downstream gene is a member of pham 11290, just like phages KeAlii and DrManhattan as of 5/3/22 /note=Primary Annotator Name: Vu, Trinity /note=Auto-annotation: Both Glimmer and Genemark were used and call the start site at 30544. The start codon is GTG. /note=Coding Potential: It has reasonable coding potential within the putative ORF. 
The chosen start site is different from the auto-annotated and does not cover all of the coding potential. Coding potential is found in both the GeneMark Self and Host maps in the forward direction. /note=SD (Final) Score: -2.645 (best on PECAAN) /note=Gap/overlap: The gap is 202 bp which is large but conserved in other genomes such as Crewmate and Iter. I didn't choose a gap that would create a longer ORF because there was no additional coding potential to cover outside the auto-annotated ORF and because start sites that would extend the ORF had poor final and z-scores. /note=Phamerator: It's in pham 97248 as of 4/25/2022. It's conserved in other AZ phages such as Iter, Lego, Liebe, Maureen, and Percival. The function for this gene called was consistent between DNA binding protein and RNA Polymerase sigma factor which are both in the approved function list. /note=Starterator: The conserved start site is site #22 which corresponds to basepair 30604 in Snek. There are 48 members in the pham and 38/48 call site #22 (22/34 non-draft annotations call this site). Start site #22 is reasonable because it is present in 79.2% of the phages in the pham. /note=Location call: This is a real gene whose most likely start site is 30604 because this is the "Most Annotated" start site. Even though this start site results in the loss of some coding potential, because only 1 other phage, Tweety19, contains the auto-annotated start site #2, the Most Annotated is better as many other phages in the pham have it. /note=Function call: DNA binding protein. The top phagesdb BLAST hit has the RNA polymerase binding factor (1e-144) and the top NCBI BLAST hit also has this function with 100% identity (4e-129). The 2nd hit in phagesdb BLAST has DNA binding protein function (e-value 5e-91 and 100% coverage), and the 2nd hit in NCBI BLAST aligns with an e-value of 3e-77 and 61.62% sequence identity with 99% query coverage. 
CDD hits show 2 conserved domains listed as RNA polymerase with low e-values and the top 2 HHpred hits also give structures that correspond to RNA polymerase sigma factor function with probabilities of 99.7, % coverage of over 97 for both, and low e-values. However, SEA-PHAGES approved function list says to not call RNA polymerase sigma factor and instead call the function DNA binding protein. The function is further supported by the absence of TMDs. /note=Transmembrane domains: It is not a membrane protein because neither TMHMM nor TOPCONS predicts any TMDs. /note=Secondary Annotator Name: Chang, Stacy /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 31511 - 31789 /gene="43" /product="gp43" /function="hypothetical protein" /locus tag="Snek_43" /note=Original Glimmer call @bp 31511 has strength 16.18; Genemark calls start at 31511 /note=SSC: 31511-31789 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_44 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 2.39073E-58 GAP: 112 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.343, -4.519979978165142, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_44 [Arthrobacter phage Tweety19]],,QNO12703,100.0,2.39073E-58 SIF-HHPRED: SIF-Syn: NKF, upstream gene is DNA binding protein, downstream is SprT-like protease, just like in Tweety19 /note=Primary Annotator Name: Hsu, Norman /note=Auto-annotation: Glimmer and GeneMark, start site @31511, ATG /note=Coding Potential: Yes, the chosen start sites covers all coding potential /note=SD (Final) Score:-4.52, it has the best SD score /note=Gap/overlap: 112bps, the gap is larger than 50bps, but it seems like the gap does not have any coding potential and no genes can be inserted /note=Phamerator: Pham #: 11290; Date: 4/26/2022; It is present in Tweety19 that is within the same cluster; it has an unknown function. 
/note=Starterator: Yes, start site #5 is the most conserved start site among the members in the pham. Start5 is @31511 in Snek. 12 out of 12 non-draft genes call for this start site, and this agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on all evidence, the most likely start site is @31511 /note=Function call: The top 2 NCBI BLASTp hits, TWEETY19_44 and KEALII_43, suggested function remains unknown, with high query coverage (>97%), % identity (100% and 45.63% respectively), and low E-values close to 0. The top result of HHpred, 6O5L_A, suggests that this gene calls for the function of a DNA binding protein, with a probability of 90.8, a coverage of 92.3913, but a high E-value of 5.2. Therefore, there is no strong evidence that this particular gene calls for the function of DNA binding protein, which is why the gene has no known function. /note=Transmembrane domains: both TmHmm and Topcons display no TMDs /note=Secondary Annotator Name: Grunden, Kyla /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 31914 - 32510 /gene="44" /product="gp44" /function="SprT-like protease" /locus tag="Snek_44" /note=Original Glimmer call @bp 31914 has strength 14.99; Genemark calls start at 31914 /note=SSC: 31914-32510 CP: yes SCS: both ST: SS BLAST-Start: [SprT-like protease [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 8.18781E-143 GAP: 124 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.028, -2.523003374675015, yes F: SprT-like protease SIF-BLAST: ,,[SprT-like protease [Arthrobacter phage Tweety19]],,QNO12704,100.0,8.18781E-143 SIF-HHPRED: SprT-like domain-containing protein Spartan; DPC repair protease, DNA BINDING PROTEIN; HET: ADP, MLZ, FLC; 1.5A {Homo sapiens},,,6MDW_A,51.5152,99.6 SIF-Syn: SprT-like protease. Difficult to determine synteny with adjoining genes, as phams are not often conserved and functions are primarily NKF. 
However, synteny can be found with Tweety19, where the upstream pham is 11290 and the downstream pham is 6637. /note=QC: Noah Canio- Checked evidence on CDD. /note=Primary Annotator Name: Chang, Stacy /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree on the start site at 31914 bp. /note=Coding Potential: Strong coding potential in forward strand, weak but present coding potential in reverse strand in both host- and self-trained GeneMark. /note=SD (Final) Score: The final score and Z score are the best options at final score = -2.523 and z score = 3.028 respectively /note=Gap/overlap: 124. Somewhat large, but ultimately reasonable because the gap is conserved in other phages (Tweety19, DrManhattan) and there is no coding potential in the gap that might be a new gene. /note=Phamerator: Pham 19450. Date 4/25/2022. It is conserved; found in 38 other AZ phages including Tweety19 and Crewmate. /note=Starterator: Start site 40 is called in 29 of the 58 non-draft genes in the pham. Of the phams that contain this start site, all but one (Percival) has called it. /note=Location call: This is a real gene and the start site is most likely 31914. This is supported by Glimmer, GeneMark, and Starterator. /note=Function call: SprT-like protease. All results on phagesdb BLAST call this function (top three are e <= 1e-101). The top ten results on NCBI Blast call this function (top three are e <= 1.03157e-127, with 100% coverage and >90% identity). CDD and HHpred support this function (HHpred top 3 e <= 1.7e-9, but only found hits in non-viral genomes). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any transmembrane domains, therefore this gene does not encode a membrane protein. /note=Secondary Annotator Name: Tarighat, Asefeh /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
CDS 32578 - 32718 /gene="45" /product="gp45" /function="hypothetical protein" /locus tag="Snek_45" /note=Original Glimmer call @bp 32578 has strength 11.19; Genemark calls start at 32578 /note=SSC: 32578-32718 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_46 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 3.0249E-25 GAP: 67 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.165, -2.2364030944336752, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_46 [Arthrobacter phage Tweety19]],,QNO12705,100.0,3.0249E-25 SIF-HHPRED: SIF-Syn: NKF; upstream gene is a sprt-like protease just like in phage Tweety19. The downstream gene is NKF and matches the corresponding gene (from Pham 105490) in size and has the same label color on Pham Maps when compared with Tweety19. However the gene for Tweety19 on Pham Maps does not have a function listed. Because the two genes are connected by a line, they have synteny despite this. The neighboring upstream genes and the neighboring downstream genes also show synteny. /note=Primary Annotator Name: De Jesus, Jorja /note=Auto-annotation: Glimmer Start@32578, GeneMark: Start@32578, start codon sequence: ATG /note=Coding Potential: The coding potential for this sequence is high and is covered by the start site. On both the Host-trained and self-trained reports, this is true. /note=SD (Final) Score: The SD (Final) Score is -2.236. This is the only provided SD score. /note=Gap/overlap: There is a gap of 67 bp between this gene and the upstream gene and a gap of 76 bp between this gene and the downstream gene. Both gaps are slightly larger than usual but on GeneMark there is no coding potential where those gaps would be, so they are reasonable enough. /note=Phamerator: As of 04/22/2022 this gene is a part of the 6637 pham. It is conserved and found within Tweety19 and TripleJ. Tweety19 is the only pham member within the same cluster as Snek. 
TripleJ is a singleton, and the rest are not within the same cluster. /note=Starterator: The start site is site 16 and the start position should be 32578. This matches the start site listed by Glimmer and GeneMark. This position also has 5 manual annotations which counts for 5 out of the 6 total manual annotations. The remaining annotation was for site 15 which is only found in 4/12 genes whereas site 16 is found in 10/12 genes of the pham. /note=Location call: This is most likely a real gene and the evidence indicates that 32578 should be the start site of this gene. Regardless of the other potential sites or start positions listed on Starterator, start position 32578 is the only candidate listed on PECAAN so it is the most likely start site. /note=Function call: NKF. There is no data for this gene under Phagesdb Function Frequency. The top 10 hits on Phagesdb BLAST (with e-values of 5e-14 to 2e-22) all indicate that this gene’s function is unknown. The top 2 hits were marked as evidence because they show good e-values. On PECAAN, HHPRED’s hit with the highest probability has a probability of 73.1% which is moderately high, does not have high coverage (63%), and does not have a low e-value (4.9). On the official HHpred program, the top hits include 3OT2_A which lists the gene as an uncharacterized protein and 2CBP_A which lists the gene as cucumber basic protein. However all of these hits have poor identity, alignment, and e-values so they were not checked as evidence. NCBI BLAST had hits with high identity, low e-values, and good scores so the top 2 hits (matching the NIH NCBI BLAST results) were recorded as evidence. CDD has no available data for this gene. /note=Transmembrane domains: TMHMM indicates that there are 0 predicted TMHs. Topcons has no data for this gene. With these two factors considered, this is not a transmembrane protein. 
/note=Secondary Annotator Name: Bradley, Allyssa /note=Secondary Annotator QC: Maybe elaborate in final score column that it's the only option, and say if that's a good gap to have? I think starterator should be a fraction if it's one of the common start sites. Overall though looks good! CDS 32794 - 33606 /gene="46" /product="gp46" /function="hypothetical protein" /locus tag="Snek_46" /note=Original Glimmer call @bp 32794 has strength 17.59; Genemark calls start at 32794 /note=SSC: 32794-33606 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CREWMATE_54 [Arthrobacter phage Crewmate]],,NCBI, q2:s3 99.6296% 1.22565E-104 GAP: 75 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.726, -3.133663537764895, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CREWMATE_54 [Arthrobacter phage Crewmate]],,UIW13306,78.2288,1.22565E-104 SIF-HHPRED: SIF-Syn: NKF, upstream gene (pham#6637) is NKF, downstream gene (pham #87679) is NKF, just like in Tweety19 as of 5/5/22. /note=QC: Noah Canio- Changed Starterator dropdown to "suggested start." Just because the most annotated start site is not called doesn't mean that the call is uninformative. Starterator called a start site that is called 100% of the time it exists in non-draft phage genomes; therefore, it is a suggested start. /note=Primary Annotator Name: Rivera, Wendy /note=Auto-annotation: Gene (stop@33606 F); Both; Glimmer and GeneMark both agree on start site at 32794 with start codon ATG /note=Coding Potential: Reasonable coding potential; and start site is included in coding potential /note=SD (Final) Score: SD score of -3.134 is the best with a z score of 2.726 /note=Gap/overlap: No overlap; gap of 75 bp upstream of the gene is reasonable. Length of gene is acceptable with the chosen start site. The candidate start site with the longest ORF was chosen as the best start site. /note=Phamerator: As of 4/25/22, the gene is found in pham 104648 and the pham numbers did not change frequently over time. 
Yes, the pham in which the gene is conserved is found in other members of the same cluster like Tweety19 and TforTory. Phamerator did not have a function for this gene. /note=Starterator: Yes, there is a reasonable start site choice that is conserved among the members of the pham that the gene belongs to such as Tweety19 and TforTory. Start site number 7 at 32794 bp. 7/12 called for start site 7. /note=Location call: Considering all evidence from above, this gene is a real gene and has a start site @ 32794 bp. Starterator agrees with Glimmer and GeneMark. /note=Function call: The top 2 NCBI BLASTp hits that are non-drafts are Tweety19 with an e-value of 9e-118 and Crewmate with an e-value of 3e-70. Both phages have no suggested function. Tweety19 has a query coverage of 100% and a percent identity of 99.63%, while Crewmate has a query coverage of 98% and a percent identity of 67.04%. For CDD and HHpred, there were no significant hits or informative results that revealed the function of the gene, so it is recorded as NKF. Also, PhagesDB BLAST results show that non-draft phages Tweety19 and Crewmate display Function Unknown and have e-values of e-148 and 2e-94 respectively. /note=Transmembrane domains: No transmembrane domains were predicted in TMHMM or in Topcons, therefore it’s not a membrane protein. /note=Secondary Annotator Name: Han, Maggie /note=Secondary Annotator QC: I have QC'd this gene and agree with the annotation. 
CDS 33662 - 33787 /gene="47" /product="gp47" /function="hypothetical protein" /locus tag="Snek_47" /note=Original Glimmer call @bp 33662 has strength 18.99; Genemark calls start at 33662 /note=SSC: 33662-33787 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_48 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 97.561% 1.0142E-19 GAP: 55 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.112, -4.426679849915311, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_48 [Arthrobacter phage Tweety19]],,QNO12707,97.561,1.0142E-19 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF and downstream gene is NKF just as in phage Tweety19. The upstream genes are both in pham 104648, and the downstream genes are both in pham 89053. /note=Primary Annotator Name: Farazi, Sepp /note=Auto-annotation: Glimmer and GeneMark. Both call the gene and agree on 33662. Start codon ATG /note=Coding Potential: Coding Potential is strong throughout the entire ORF. Found in forward direction on both GeneMark Self and Host. The start site suggested by Glimmer and GeneMark does not cover all the coding potential. /note=SD (Final) Score: -4.427. Not the best score, still reasonable for credible RBS. /note=Gap/overlap: Gap: 55bp. Mildly large gap, same gap exists in Tweety19 phage, no coding potential in gap. Acceptable gene length of 126bp. /note=Phamerator: pham: 87679. Date 4/25/2022. It is conserved; found in Tweety19 (AZ). No function call for this gene by the Pham database. /note=Starterator: There is only 1 non-draft phage in this pham. 1/1 non-draft genes were manually annotated for start site 1. Start 1 is conserved and is 33662 in Snek, which agrees with Glimmer and GeneMark. /note=Location call: This is a real gene with a start site of 33662. Despite not covering the entire coding potential, it is still the most likely start site for this gene. /note=Function call: No known function. 
The top hit on phagesDB was no known function in Tweety19 with an E value of 7e-18. The top hit on NCBI BLAST was for a hypothetical protein in Tweety19 with 100% identity, and an e value of 6e-17. HHpred had no relevant hits, CDD had no hits at all. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Bagdatli, Dila /note=Secondary Annotator QC: Given the evidence, I agree with this annotation. There is no start site that can cover all of the coding potential, therefore, this start site is the most likely. CDS 33859 - 34101 /gene="48" /product="gp48" /function="hypothetical protein" /locus tag="Snek_48" /note=Original Glimmer call @bp 33859 has strength 15.98; Genemark calls start at 33859 /note=SSC: 33859-34101 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_49 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.50918E-52 GAP: 71 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.36, -3.8308929311341546, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_49 [Arthrobacter phage Tweety19]],,QNO12708,100.0,1.50918E-52 SIF-HHPRED: SIF-Syn: NKF, Comparison with Tweety19 shows synteny given by the identical phams of the genes both upstream and downstream of the gene. These genes also have unknown functions. The upstream gene is a member of pham 87679, and the downstream gene is a member of pham 11961. /note=Primary Annotator Name: Bagdatli, Dila /note=Auto-annotation: Glimmer, GeneMark with start site 33859 and start codon ATG. /note=Coding Potential: The coding potential is high for the region between the start and stop sites and the start site covers all of the coding potential. /note=SD (Final) Score: -3.831 is the best RBS score and is lower than the alternative start sites. /note=Gap/overlap: There is a 71 bp gap between the start of this gene and the stop of the previous gene. This is also the longest ORF. 
This is an acceptable gap that does not require the insertion of a new gene and cannot be filled by changing the start site gene because there is no coding potential before the start site and there are no start sites before the chosen start site of the gene. /note=Phamerator: The gene is found in the pham 89053 (4/22/2022). All of the members of this pham belong to the cluster AZ that Snek belongs to. The pham is conserved in other members of the same cluster. There was no function called for this gene. /note=Starterator: The most conserved start site is present and called as the start site of this gene. The start site is 1 and is located at 33859 bp in this gene. There is another gene that is identical to this Snek gene that called the same start site which supports the fact that this is the correct start site. 4/4 of the members of this pham call this start site 1. /note=Location call: The coding potential, synteny, and starterator support the fact that this is a real gene, and the start site at 33859 is the best candidate as it covers all of the coding potential and the other start site has a very large gap which most likely requires the insertion of a new gene. This candidate also has the best RBS score, the smallest gap and is the longest ORF. It is also predicted by starterator and is the most conserved start site among the members of the same cluster in this pham. /note=Function call: There is no function call by NCBI BLAST although there are strong hits with a hypothetical protein of the phages Tweety19 and Liebe (e-value 16). PhagesDB blast also had strong hits with proteins of Tweety19, Liebe, and Maureen (e-value < e-27), however, these proteins had unknown functions as well. /note=Transmembrane domains: None found in TMHMM or Topcons, not a membrane protein. /note=Secondary Annotator Name: Farazi, Sepp /note=Secondary Annotator QC: I QC`d this location call and agree with the primary annotator. Very detailed and thorough. 
CDS 34211 - 34417 /gene="49" /product="gp49" /function="hypothetical protein" /locus tag="Snek_49" /note=Original Glimmer call @bp 34211 has strength 17.6; Genemark calls start at 34211 /note=SSC: 34211-34417 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_50 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.84141E-37 GAP: 109 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.263, -2.033982896655645, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_50 [Arthrobacter phage Tweety19]],,QNO12709,100.0,1.84141E-37 SIF-HHPRED: SIF-Syn: NKF, upstream gene is pham 89053, downstream gene is pham 92039, just like in Tweety19. /note=Primary Annotator Name: BarcikWeissman, Sara /note=Auto-annotation: Both Glimmer and GeneMark call the same start site. This auto-annotated site is at 34,211 bp. The start codon is ATG. /note=Coding Potential: There is coding potential throughout the ORF, which appears to be encapsulated by the auto-annotated start and stop in GeneMark Host. GeneMark Self’s coding potential starts very slightly sooner in the genome, so a very small amount of coding potential may not be included by the start site (though all appears to be included near the stop). /note=SD (Final) Score: The final score for the auto-annotated start is the best by a significant margin, but the longest open reading frame (not the auto-annotated start) also yields a reasonable SD score. The RBS score is of consequence here because the gap between this gene and the upstream one indicates it is not a part of an operon, making the quality of the ribosome binding site an important factor. /note=Gap/overlap: There is a gap of 109 base pairs for the auto-annotated start, which is slightly large. 
This may be evidence for choosing a different start that creates a longer (the longest of the predicted starts) ORF, reducing the gap to 61 bp (a more reasonable level closer to the 50 bp we might expect for the apparent ORF switch we see here going from the upstream gene to this one), and encapsulating all the coding potential that was barely missed in the self trained genemark. In both cases, the gene is over 200 bp long which is an acceptable gene length. /note=Phamerator: The gene is found in pham 11961 as of 4/27/22. This gene appears to only be present in members of cluster AZ, including Amyev and DrSierra. The function has not been called and is listed as unknown for all annotated non-draft phage genomes. /note=Starterator: The auto-annotated start site 10 which is present at 34,211 bp is conserved in all phages containing this gene. Of the 26 members of this pham, all 18 of the non-draft genomes call this as the start site. This is an informative number of non-draft genomes. /note=Location call: Evidence including coding potential, phamerator conservation, gene length, and more suggests that this is a real gene. Further evidence including having the best RBS and Z scores, encapsulating nearly all (if not all) the coding potential, and having a 100% conservation and manual call rate in starterator indicate that site 10 at 34,211 base pairs is the appropriate start site. /note=Function call: There is no known function (NKF). All BLAST hits were for genes with unknown functions. PhagesDB function frequency reported hits in completely different phams from the gene of interest, with two completely different functions, indicating that it is not good evidence for function. CDD showed no conserved domains and HHpred only had hits with very high e values and functions that didn’t make sense, indicating that this evidence is not significant. /note=Transmembrane domains: There are no TMDs predicted by TMHMM or TOPCONS. 
As a result, this is unlikely to be a membrane protein. /note=Secondary Annotator Name: Chen, Daniel /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 34420 - 34545 /gene="50" /product="gp50" /function="hypothetical protein" /locus tag="Snek_50" /note=Original Glimmer call @bp 34420 has strength 6.09; Genemark calls start at 34420 /note=SSC: 34420-34545 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_51 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 4.39911E-20 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.944, -6.856994079240039, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_51 [Arthrobacter phage Tweety19]],,QNO12710,100.0,4.39911E-20 SIF-HHPRED: SIF-Syn: NKF, upstream gene is pham 11961, downstream gene is serine integrase, just like in phage Tweety19 /note=QC: Noah Canio- Synteny box: information was incorrect, upstream gene is gene to the left while downstream is gene to the right. Changed upstream to be pham 11961 while downstream changed to be serine integrase /note=Primary Annotator Name: Bradley, Allyssa /note=Auto-annotation: Glimmer & GeneMark agree; start@34420; ATG /note=Coding Potential: Very high at first 75% of ORF; covers all coding potential. /note=SD (Final) Score: -6.857. This isn't the best score but it is very similar to the second option. I still think this is the correct start site because of gap length. /note=Gap/overlap: 2 - This suggests the presence of an operon. While not ideal ([-1,0,1]), I think this is better than 41. Length is 126, which is longer than the alternative one and same length as in Tweety19. /note=Phamerator: Pham 92039, as of 4/26/2022. Yes, the one other phage (Tweety19) is also an AZ phage. No function /note=Starterator: Yes, this pham only has 1 other phage which chose the one listed start site option. Start site number 1 at basepair 34420. 1/1 calls site #1. 
/note=Location call: Although this ORF is pretty short and there is very little evidence for this gene's start site, I think that Start Site 1 at basepair 34420 is correct because this phage is so similar to Tweety19, so I think this gene would have the same start site as its counterpart in Tweety19. /note=Function call: NKF. The data culminate to find no known function [NKF] for this gene; synteny for it is found only in Tweety19 where its function has also not yet been determined, and NCBI BLAST identified this synteny with an e-value of 2e-17. No proteins at all were found in CDD, and none with significant e-values were found in HHpred. /note=Transmembrane domains: None called in TMHMM or TOPCONS. /note=Secondary Annotator Name: Doan, Pearl /note=Secondary Annotator QC: I agree with this call. Glimmer, GeneMark, and Starterator are all consistent with the start site at 34420 and all other categories of evidence have been considered. PhagesDB Blast and NCBI blast also show that this gene has no known function. CDS 34577 - 36082 /gene="51" /product="gp51" /function="serine integrase" /locus tag="Snek_51" /note=Original Glimmer call @bp 34634 has strength 12.67; Genemark calls start at 34634 /note=SSC: 34577-36082 CP: yes SCS: both-cs ST: SS BLAST-Start: [serine integrase [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 0.0 GAP: 31 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.228, -4.2482196028815515, no F: serine integrase SIF-BLAST: ,,[serine integrase [Arthrobacter phage Tweety19]],,QNO12711,99.6008,0.0 SIF-HHPRED: INTEGRASE; HYDROLASE, SERINE RECOMBINASE, UNIDIRECTIONAL, SITE-SPECIFIC RECOMBINATION; 2.15A {STREPTOMYCES PHAGE PHIC31},,,4BQQ_B,65.0699,100.0 SIF-Syn: serine integrase, upstream gene is in pham 92039, downstream gene is in pham 104530 (as of 5/6/2022), which is also found in Tweety19. /note=QC: Noah Canio- Changed Starterator dropdown to "suggested start." 
Just because the most annotated start site is not called doesn't mean that the call is uninformative (it does not have this start site, so the program didn't call it). Starterator called a start site that is called for a few non-draft phage genomes. /note=Primary Annotator Name: Ramesh, Naren /note=Auto-annotation: Both agree on the same start site, 34634. The start codon is ATG. /note=Coding Potential: The gene does have a reasonably high coding potential within the ORF in question. The chosen start site covers all this coding potential. /note=SD (Final) Score: 7.561. It is not the best final score on PECAAN. The best final score is -4.248 for start site 34577. /note=Gap/overlap: 88bp. It’s large, and there is a smaller gap in start site 34577. /note=Phamerator: 78437. Date 04/25/2022. It is conserved; found in Tbone, Tweety19, Warda, Yang, YesChef, Adolin, Adumb2043, all part of AZ. /note=Starterator: Start site 104 in Starterator was manually annotated in 4/5 phages and is consistent with data from Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 34577. There is inconsistent data but the auto-annotated start site of 34634 is rarely found and rarely manually annotated. Ultimately, I picked 34577 as it has a significantly better Z-score and Final score. /note=Function call: PhagesDB BLAST has multiple hits with the suggested function serine integrase, with small e-values of 0 to 1e-166. HHPRED has hits that correspond to unique SEA-PHAGES requirements for this gene. The gene aligns with QNO12711, a known serine integrase (99.6% alignment, 100% coverage, and e-value of 0), and has hits with two serine integrases in HHPRED. /note=Transmembrane domains: TMHMM predicts no TMDs. TOPCONS predicts no TMDs. This makes sense given the type of protein. /note=Secondary Annotator Name: Hsu, Norman /note=Secondary Annotator QC: Please fill out the Synteny section. 
Other than that, I agree with the primary annotator. CDS 36382 - 36615 /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="Snek_52" /note=Original Glimmer call @bp 36382 has strength 12.96; Genemark calls start at 36382 /note=SSC: 36382-36615 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_53 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 3.06734E-40 GAP: 299 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.274, -2.58321678164666, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_53 [Arthrobacter phage Tweety19]],,QNO12712,98.7013,3.06734E-40 SIF-HHPRED: SIF-Syn: NKF, gene_52 has synteny with Tweety19. The upstream gene is an integrase (conserved in Tweety19) and the downstream gene is an RNA binding protein (conserved in Tweety19). It does not have true synteny with Reedo because there was a gene insertion in Snek at gene 53, but the other genes in the neighborhood have synteny /note=QC: Noah Canio- Synteny box: added "NKF" for the function of this gene. /note=Primary Annotator Name: Rajshree Chettiyar /note=Auto-annotation: called by both Genemark and Glimmer, start site @36382 /note=Coding Potential: very good coding potential in both host and self-trained, coding potential covers after start site /note=SD (Final) Score: only one possible start site given by PECAAN with RBS final score = -2.583 and z-score = 3.274 /note=Gap/overlap: 299 gap with upstream gene - same gap and upstream gene found conserved in other phage genomes using pham map (ex. Asa16, DrManhattan). 4bp overlap with downstream gene. length of gene with given ORF = 234bp /note=Phamerator: On April 25, 2022 gene is part of pham 104530. This pham has 35 members, and of the 35, 27 are non-draft genes. All genes in this pham are from phages in cluster AZ. /note=Starterator: (Start: 7 @36382 has 22 MAs), this start site has been called in 22 of 23 non-draft genes and it is called 97.1% of the time it is present in the gene. 
/note=Location call: start site @36382 given the z-score, coding potential, and Starterator /note=Function call: This protein’s function is unknown because CDD and HHpred gave no good matches, and although PhagesDB Blast gave very good matches with other AZ phages (Tweety19 and Reedo), those matches have no known function, so the gene is a hypothetical protein. /note=Transmembrane domains: 0 TM domains according to TMHMM and SOSUI /note=Secondary Annotator Name: Ramesh, Naren /note=Secondary Annotator QC: I agree with this annotation and function call. All of the evidence categories have been considered. CDS 36612 - 36860 /gene="53" /product="gp53" /function="RNA binding protein" /locus tag="Snek_53" /note=Original Glimmer call @bp 36612 has strength 19.56; Genemark calls start at 36612 /note=SSC: 36612-36860 CP: yes SCS: both ST: SS BLAST-Start: [RNA binding protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.93757E-50 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.873, -2.9063687850157054, yes F: RNA binding protein SIF-BLAST: ,,[RNA binding protein [Arthrobacter phage Tweety19]],,QNO12713,100.0,1.93757E-50 SIF-HHPRED: SM-ATX ; Ataxin 2 SM domain,,,PF14438.9,52.439,97.5 SIF-Syn: Through Pham maps, it is seen that the gene is conserved and has synteny with Tweety19. The upstream and downstream portions near 53 are conserved and have synteny. Both genes that are directly upstream and downstream seem to be NKF, but the pham numbers are the same. Directly upstream the pham number is 104530, and directly downstream the pham number is 37057. (These pham numbers are as of 5/6/2022) /note=Primary Annotator Name: Pena, Melina /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 36612. The start codon is called as GTG. /note=Coding Potential: Coding potential is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host and the start and stop site encapsulates the entire gene. 
/note=SD (Final) Score: -2.906. This is the best final (least negative) score on PECAAN. The Z score is 2.873. /note=Gap/overlap: -4. This is based on the gene candidates section on PECAAN. This could indicate that it is part of an operon. The overlap with the upstream gene is reasonable and the length of the gene is 249bp, which is acceptable. /note=Phamerator: Pham number 34460. Date 04/22/2022. It has 3 members in the pham, one of which is a draft. It is conserved; found in other AZ phages, and it is found in Tweety19. According to PhagesDB this might be an RNA binding protein. /note=Starterator: There are only 3 members of this Pham. 1/2 non-draft members call start site 3. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 36612 bp and at start site 3. Both Glimmer and GeneMark call the start at 36612. /note=Function call: Based on the evidence given to me, this seems to be an RNA binding protein. There were only a couple of NCBI BLASTp hits, but based on the information there, sorted by E-value, the suggested function is an RNA binding protein with high query coverage (100% for the first hit and 60% for the next hit). There is high % identity (100%) for the first hit. It is important to note that the HHpred E-values are low (1.6e-3 and 2.7e-2) but there are only 3 phages in the pham to compare it to. However, NCBI BLASTp had e-values that were below 1e-5. PhagesDB Blast also seemed to agree with the gene coding for an RNA binding protein, as its hits share the same pham. Also, CDD did not offer any information. /note=Transmembrane domain: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Vu, Trinity /note=Secondary Annotator QC: Overall, I agree with the function call but HHpred notes mention 2 top hits while only 1 hit is checked off in the box. 
Additionally, it could be beneficial to also add a note about phagesdb blast in your function call. CDS 36857 - 37057 /gene="54" /product="gp54" /function="RNA binding protein" /locus tag="Snek_54" /note=Original Glimmer call @bp 36857 has strength 12.39; Genemark calls start at 36857 /note=SSC: 36857-37057 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_55 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.08594E-38 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.873, -2.9063687850157054, yes F: RNA binding protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_55 [Arthrobacter phage Tweety19]],,QNO12714,98.4848,1.08594E-38 SIF-HHPRED: Host Factor for Q beta; Hfq, hexamer, RNA binding protein, translational regulator, Sm motif, TRANSLATION; HET: ACY; 1.55A {Staphylococcus aureus} SCOP: b.38.1.2,,,1KQ1_M,77.2727,95.4 SIF-Syn: This gene has no known function. The gene upstream Pham is 34460 and the function is RNA binding protein. The gene downstream Pham is 105494 and the function is an RNA binding protein. This is identical to Tweety. /note=QC: Noah Canio- Changed Starterator dropdown to "suggested start." Just because the most annotated start site is not called doesn't mean that the call is uninformative (it does not have this start site, so the program didn't call it). Starterator called a start site that is called for a non-draft phage. /note=Primary Annotator Name: Doan, Pearl /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 36857. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.963. It is the best final score on PECAAN. /note=Gap/overlap: Upstream overlap: 1 base pair. This overlap is fine and suggests that this gene is a viable gene. 
Downstream gap: 30 base pairs. This gap is medium-sized and cannot fit other genes, and there is no coding potential in the gap that might be a new gene. /note=Phamerator: Pham: 55203. Date 4/22/2022. It is conserved; it is found in Adolin (AZ) and Crewmate (AZ). There is an unknown gene with unknown function. /note=Starterator: Start site 9 in Starterator was manually annotated in 1/36 non-draft genes in this pham. Start 9 is 36857 in Snek. This was not the most conserved start site (start site 6), which Snek did not contain. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 36857. /note=Function call: Unknown function. The top three phagesdb BLAST hits have unknown function (E-value <10^-31), and 3 out of 5 top NCBI BLAST hits are also proteins of unknown function (100% coverage, 98%+ identity, and E-value <10^-38). HHpred had no relevant hits. CDD had no relevant hits and HHPred shows a protein with no known function (e-value = 0.014). /note=Transmembrane domains: No transmembrane domains are detected. TMHMM and SOSUI do not detect any transmembrane domains. /note=Secondary Annotator Name: Hsu, Norman /note=Secondary Annotator QC: I have QC’ed this location call and mostly agree with the primary annotator. Please edit the Synteny section, and include the function of the genes upstream and downstream. Also, please add a phage that displays synteny with your phage. 
CDS 37054 - 37248 /gene="55" /product="gp55" /function="RNA binding protein" /locus tag="Snek_55" /note=Original Glimmer call @bp 37054 has strength 13.77; Genemark calls start at 37054 /note=SSC: 37054-37248 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_56 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.425E-37 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.455, -4.540783695108708, yes F: RNA binding protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_56 [Arthrobacter phage Tweety19]],,QNO12715,100.0,1.425E-37 SIF-HHPRED: RNA-binding protein Hfq; Hfq, Caulobacter, sRNA, RNA-protein interactions, natively unstructured protein, RNA BINDING PROTEIN; 2.15A { Caulobacter vibrioides CB15} SCOP: b.38.1.0,,,6GWK_O,93.75,96.1 SIF-Syn: pham number 105864 as of 05/09/2022, the upstream gene, pham number 55203 as of 05/09/2022, and the downstream gene, pham number 625 as of 05/09/2022 just like in phage Tweety19. (No known function for any of the genes) /note=QC: Noah Canio- Changed Starterator drop-down to be "suggested start (ss)" /note=Primary Annotator Name: Tarighat, Asefeh /note=Auto-annotation: Both Glimmer and Genemark marked the start site @37054 bp. /note=Coding Potential: Coding potential in this ORF is on the forward strand only (no frame switching), indicating that this is a forward gene. Coding potential is very high in both GeneMark Host and Self, and the gene is called by Glimmer and GeneMark. There is synteny (especially with Tweety19), and the gene is long enough to be a real gene. /note=SD (Final) Score: -4.541 (highest/best score on PECAAN) /note=Gap/overlap: -4 bp overlapping with an upstream gene. Should be a part of an operon. /note=Phamerator: The pham number as of 02/26/22 is 103210. The gene is well conserved with either 192 or 195 bp length in the pham members. The start site is conserved in other pham members such as Tweety19 and Yang, all in the same cluster, AZ. 
/note=Starterator: Start site number 4 @37054 bp. 13/13 MAs agree on this, as does the auto-annotated start site. Start site conserved in the pham. This evidence agrees with the Genemark and Glimmer call. /note=Location call: Based on all the evidence, this is a real gene with the SS of 37054. /note=Function call: CDD had no relevant calls. PhagesDB BLAST and NCBI BLAST had strong hits with no function called ("hypothetical protein"): e-value less than 10e-22, high coverage (more than 90%), and high probability (above 94%). HHPRED calls a function, RNA binding protein, but since the e-value is not low enough (almost 0), HHPRED is not the best tool to rely on, and Tweety19 does not call a function for this gene, I will call no known function for this gene. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. This supports the other evidence. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 37245 - 37406 /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="Snek_56" /note=Original Glimmer call @bp 37245 has strength 9.93; Genemark calls start at 37245 /note=SSC: 37245-37406 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_57 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.39974E-28 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.952, -4.820269098574683, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_57 [Arthrobacter phage Tweety19]],,QNO12716,100.0,1.39974E-28 SIF-HHPRED: SIF-Syn: This gene (NKF and pham 625 on 5.6.22) is upstream of an RNA binding protein (pham 105494) and downstream of a gene in pham 99335 (NKF) followed by a membrane protein in pham 105616 just like in phage KeAlii. /note=Primary Annotator Name: Kidd, Conner /note=Auto-annotation: 37245 (Glimmer and Genemark agree) ATG. 
/note=Coding Potential: Yes, this gene shows strong evidence for coding potential due to favorable BLAST results and synteny with phages Tweety19 and Reedo. Additionally, this start site includes all of the coding potential for this gene. /note=SD (Final) Score: -4.820 /note=Gap/overlap: -4, likely part of an operon. /note=Phamerator: Pham 625 on 4.25.22. Pham is only present in other AZ phages and the pham has a total of 4 members including Snek (3 others are non-draft). These phages are Tweety19, Reedo, and KeAlii. /note=Starterator: Start site 2 @37245 is the most conserved start site present in this phage with 2 MAs out of 3 total genomes. 2/3 non-draft annotations call this start site (100% of the time when present). /note=Location call: 37245 because of favorable Z-score, most conserved status, and a gap/overlap corresponding to an operon. Likely a real gene. /note=Function call: Unknown, related genes in phages Tweety19 and Reedo (e-values less than 1e-7) do not have any known function. Therefore, the function cannot be reasonably hypothesized based on BLASTp results. Additionally, there were no CDD hits at all and no HHpred hits with e-values below 3.1 (or hits to other phages). /note=Transmembrane domains: No predicted TMDs. /note=Secondary Annotator Name: Hernandez, Betania /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. 
**Note: fill out auto-annotation section, missing some details in coding potential (include if gene covers all of coding potential), SD score (state if it was the best), and overlap sections (include if length of gene acceptable), might want to state outright that the gene is real in the location call section (just make sure you're addressing all points from the annotation manual)** CDS 37403 - 37558 /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="Snek_57" /note=Original Glimmer call @bp 37403 has strength 15.02; Genemark calls start at 37403 /note=SSC: 37403-37558 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_58 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 3.75152E-26 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.93, -2.996490344739583, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_58 [Arthrobacter phage Tweety19]],,QNO12717,100.0,3.75152E-26 SIF-HHPRED: SIF-Syn: NKF, upstream gene NKF is in pham 625, downstream gene membrane protein is in pham 105616, similar to Tweety19. /note=QC: Noah Canio- Changed Starterator dropdown to "suggested start." Just because the most annotated start site is not called doesn't mean that the call is uninformative. Starterator called a start site that is called 100% of the time it exists in non-draft phage genomes; therefore, it is a suggested start. /note=Primary Annotator Name: Lisa Yan /note=Auto-annotation: This gene is called by both Glimmer and GeneMark, and both agree on the same start site of 37403 bp. The start codon associated with 37403 bp is ATG. /note=Coding Potential: The coding potential in this ORF is on the forward strand only, and both Glimmer and GeneMark show high coding potential for this gene up to the chosen start site. /note=SD (Final) Score: The final score is -2.996, with a Z-score of 2.93, which are both the best scores on PECAAN. 
/note=Gap/overlap: The overlap is 4bp, which is reasonable because it would indicate that this gene is part of an operon, and the coding potential supports this claim. The length of the gene is also acceptable (156bp). The arrangement of genes in close succession is also supported by comparison to other phage genomes. /note=Phamerator: The gene is found in pham 99335 as of April 25, 2022. All other genes in the pham also belong to the same AZ cluster as Snek does, including genes from phages Berrie, Eraser, and DrManhattan. The function for this gene is currently unknown. /note=Starterator: The Most Annotated start site in this pham (start site 12) is not found in Snek, although the most reasonable start site for Snek is 37403bp at start site 14, which is found in 4/25 manual annotations of genes in this pham. Starterator is mostly uninformative because there is a diverse range of start sites for genes in this pham. /note=Location call: Based on the aforementioned evidence, the gene is a real gene and has a start site at 37403bp. /note=Function call: PhagesDB BLAST calls a few sequences with significant alignments including Tweety19 (e-value 5e-22), Reedo_54 (e-value 5e-17), and DrManhattan (e-value 1e-16). NCBI BLAST also calls similar phages with high % identity (from 85-100%). However, none of these were informative, as the hits also had genes with unknown function. CDD and HHpred are uninformative; there are no hits with low e-values to support a function for this gene. There were no transmembrane domains predicted. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any transmembrane domains, so it is not a membrane protein. 
/note=Secondary Annotator Name: Han, Maggie /note=Secondary Annotator QC: I have QC'd the gene and agree with the annotation CDS 37551 - 37787 /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="Snek_58" /note=Original Glimmer call @bp 37551 has strength 13.57; Genemark calls start at 37551 /note=SSC: 37551-37787 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_59 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 9.14932E-46 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.592, -3.410574749511644, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_59 [Arthrobacter phage Tweety19]],,QNO12718,100.0,9.14932E-46 SIF-HHPRED: SIF-Syn: /note=DeepTMHMM detects a signal peptide, not true TMD. -AF /note=Primary Annotator Name: Olvera, Kevin /note=Auto-annotation: Glimmer and GeneMark: Glimmer start site: 37551, GeneMark start: 37551 /note=Coding Potential: Coding potential in this gene is in the forward direction. Coding potential is found in both GeneMark Host and GeneMark Self /note=SD (Final) Score: -3.411. This is the best score on PECAAN /note=Gap/overlap: -8 bp. This is a small overlap, but since it is not 4 or 1 bp long it is likely not part of an operon /note=Phamerator: Pham 103411- Date: 04/26/2022, Conserved in Reedo, Tweety19, and KeAlii /note=Starterator: Snek does not contain start 14, which is called in 147/200 non-draft genomes. Instead it contains start 12, located at 37551 on Snek, with 23 MAs; Starterator agrees with Glimmer and GeneMark /note=Location call: This is most likely a real gene with start site at 37551. /note=Function call: Membrane protein- All of the hits on PhagesDB BLAST displayed unknown function and the top two hits, Tweety19 and Reedo, had e-values of 7e-39 and 8e-9. CDD and HHpred did not yield useful results. 
Based on TMD evidence this is a membrane protein /note=Transmembrane domains: TMHMM had one hit and SOSUI had supporting evidence for TMD which indicates that this is a membrane protein. TOPCONS also suggests this as a membrane protein /note=Secondary Annotator Name: Grunden, Kyla /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 37917 - 38114 /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="Snek_59" /note=Original Glimmer call @bp 37917 has strength 15.26; Genemark calls start at 37917 /note=SSC: 37917-38114 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_60 [Arthrobacter phage Tweety19]],,NCBI, q1:s22 100.0% 4.44647E-40 GAP: 129 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.028, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_60 [Arthrobacter phage Tweety19]],,QNO12719,75.5814,4.44647E-40 SIF-HHPRED: SIF-Syn: NKF, pham 104541, upstream gene is pham 105616, downstream is pham 18955, just like in phage Tweety19 /note=QC: Noah Canio- Synteny Box: added "NKF" /note=Primary Annotator Name: Hernandez, Betania /note=Auto-annotation: Both Glimmer and GeneMark call the start at 37917 bp (ATG codon). /note=Coding Potential: Both Glimmer and GeneMark show coding potential in the forward strand. The ORF contains most of the coding potential in this region. There is a small region of coding potential before the start site but the ORF contains all the highest peaks. /note=SD (Final) Score: -2.443. This is the best final score with a high Z-value of 3.028. /note=Gap/overlap: 129 bp gap. This is a large gap but reasonable since this gap is conserved in other final phage genomes such as Asa16 and Amyev. Gene length of 198 bp is acceptable. /note=Phamerator: Pham: 10454. Date: 4/25/22. The gene is conserved. It is found in phages such as Adolin, Amyev, London; all from cluster AZ. 
/note=Starterator: Start site 12 is manually annotated in 10/20 non-draft genes in this pham. Start site 12 is 37917 bp in Snek. This start site agrees with the start site called by Glimmer and GeneMark. /note=Location call: The gathered evidence suggests this is a real gene with a start site at 37917 bp. /note=Function call: Multiple hits on phagesdb BLAST had e-values lower than 3e-17 but had no known function. NCBI BLAST had 5 hits of hypothetical proteins with e-values lower than 1e-19 but also had no known function. CDD and HHpred did not have any significant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Chew, Brandon /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 38237 - 38578 /gene="60" /product="gp60" /function="hypothetical protein" /locus tag="Snek_60" /note=Original Glimmer call @bp 38237 has strength 17.14; Genemark calls start at 38252 /note=SSC: 38237-38578 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_61 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 6.04515E-75 GAP: 122 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.93, -2.707694805492614, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_61 [Arthrobacter phage Tweety19]],,QNO12720,100.0,6.04515E-75 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gleason, Zoe /note=Auto-annotation: Called by both Glimmer and GeneMark, but they do not agree on the start site. Glimmer-38237; GeneMark- 38252 /note=Coding Potential: The coding potential is very high throughout the entirety of the gene and is all in the forward direction, so the gene is in the forward direction. /note=SD (Final) Score: -2.708- This is the best final score and supports the start site. /note=Gap/overlap: 122. 
There is coding potential before my gene, but it is in the reverse direction, so it would not make sense to include an additional gene. There is not usually a gene before this present in other phage genomes. The ORF is 342, which is acceptable. /note=Phamerator: 18955- All genes in this pham are of phages from subcluster AZ so it is conserved. /note=Starterator: start site- 14 @38237. This start site is the most annotated start for this pham and matches the start site called by Glimmer. When this site is present it is called as the start of the gene 91% of the time. /note=Location call: 38237. This location was called by both Glimmer and Starterator. Based on the above data it is the best location call and it is a real gene. /note=Function call: Function unknown. PhagesDB and NCBI BLASTp match this gene well to other genes, but their functions are unknown. CDD does not have any hits and all the hits on HHpred have low similarity and high e values. /note=Transmembrane domains: There are no TMDs called by either TMHMM or TOPCONS. The hypothesized function of the gene is not known so the absence of TMDs neither validates nor invalidates the function. /note=Secondary Annotator Name: Mak, Amanda /note=Secondary Annotator QC: I agree with this call. All evidence categories have been considered and the data points to a start site of 38237. Alignment is consistent with PhagesDB and NCBI Blastp data and the lack of TMHMM and TOPCONS shows that it is not a membrane protein. 
CDS 38593 - 38904 /gene="61" /product="gp61" /function="hypothetical protein" /locus tag="Snek_61" /note=Original Glimmer call @bp 38593 has strength 8.6; Genemark calls start at 38593 /note=SSC: 38593-38904 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_62 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 6.48864E-69 GAP: 14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.176, -2.78563697942469, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_62 [Arthrobacter phage Tweety19]],,QNO12721,100.0,6.48864E-69 SIF-HHPRED: SIF-Syn: NKF, pham 104229. There is synteny upstream and downstream of the gene when compared to its closest related genome, Tweety19. Genes surrounding this gene have no known function as well: upstream gene is pham 18955, downstream is pham 89448 /note=QC: Noah Canio- Synteny box: added NKF and pham # of this gene, also added upstream and downstream gene pham #'s /note=Primary Annotator Name: Mehra, Muskaan /note=Auto-annotation: Glimmer and GeneMark. Both call the start site as 38593. Start codon is GTG. /note=Coding Potential: The coding potential for this ORF is on the forward strand only, thus confirming that this is a forward gene. Coding potential is found for this ORF for both GeneMark Self and Host-Trained formats. All coding potential is included within the suggested start codon. /note=SD (Final) Score: -2.786. This is the best (least negative) final score on PECAAN. /note=Gap/overlap: 14bp. Relatively small, no space for insertion of a new gene. No coding potential in any direction in this gap. /note=Phamerator: pham: 104229 on 04/25/2022. It is conserved, found in Tweety19 and KeAlii (both cluster AZ). /note=Starterator: Start site 37 is manually annotated in 27 phages. It is not the most annotated start site, however, it is called 100% of the time it is present. The most annotated start site is not present in Snek (38). 
/note=Location call: Based on the above evidence, this is a real gene with start site 38593. /note=Function call: NKF. The top three NCBI BLAST hits have no known function, with a good E-value (<10^-57) and good coverage (>97%). The top three phagesDB BLAST hits also have no known function with a good E-value (<10^-48). The E-values of all HHpred hits were poor, so no conclusion can be drawn from them (best hit had an e-value of 0.25). CDD had no hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Chen, Daniel /note=Secondary Annotator QC: I have QC’ed this location call and agree with the first annotator. CDS 38904 - 39179 /gene="62" /product="gp62" /function="hypothetical protein" /locus tag="Snek_62" /note=Original Glimmer call @bp 38901 has strength 12.62; Genemark calls start at 38901 /note=SSC: 38904-39179 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein PQE19_gp46 [Arthrobacter phage Tweety19] ],,NCBI, q1:s2 100.0% 3.84769E-56 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.028, -4.049342652064859, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE19_gp46 [Arthrobacter phage Tweety19] ],,YP_010678453,98.913,3.84769E-56 SIF-HHPRED: SIF-Syn: NKF, upstream gene (pham # 104229) is NKF, downstream gene (pham #105187) is NKF, just like in phage Tweety19 as of 5/5/2022. /note=QC: Noah Canio- Changed Starterator dropdown to "suggested start." Just because the most annotated start site is not called doesn't mean that the call is uninformative. Starterator called a start site that is called 100% of the time it exists in non-draft phage genomes; therefore, it is a suggested start. 
/note=Primary Annotator Name: Rivera, Wendy /note=Auto-annotation: Gene (stop@39179 F) Both; Glimmer and Genemark both agree on start site 38901 with start codon ATG /note=Coding Potential: Gene has reasonable coding potential predicted within the ORF; and start site is included in coding potential /note=SD (Final) Score: Has the best Final score of -3.350 with the highest Z score of 3.028 /note=Gap/overlap: There is a 4 bp overlap with the upstream gene which indicates that this gene is part of an operon and has a reasonable length. /note=Phamerator: As of 4/25/2022, the gene is found in pham 89448. Yes, the pham in which the gene is conserved is found in other members of the same cluster such as: Adolin, DrManhattan, DrSierra, and Tweety19. No, Phamerator does not list the function. /note=Starterator: Yes, there is a reasonable start site that is conserved among the members of the pham. Start number 7 @38901 bp. 7/18 call #7. /note=Location call: The evidence suggested that this gene is a real gene with a potential start site at 38901 bp. /note=Function call: The top 2 NCBI BLASTp hits are from Tweety19 with an e value of 5e^-57 and DrSierra with an e value of 2e^-40. Both phages have no known function but Tweety19 has a Query coverage of 100% and a % identity of 100%. As for DrSierra it has a Query coverage of 98% and a % identity of 76.92%. For CDD and HHpred, there were no significant hits/informative results that revealed the function of the gene and is recorded as NKF. Also, the results from PhagesDB BLAST from non-draft phages Tweety19 and DrSierra display function unknown with e values of 7e^-47 and 7e^-34 respectively. /note=Transmembrane domains: There are no TMDs as predicted by TOPCONS and TMHMM, therefore it is not a membrane protein. /note=Secondary Annotator Name: Vu, Trinity /note=Secondary Annotator QC: I agree with the function call. It may be beneficial to also add how phagesdb blast also yielded NKF results. 
Additionally, you could also add if the upstream and downstream genes in the synteny box are in the same pham since they're all NKF. CDS 39170 - 39511 /gene="63" /product="gp63" /function="hypothetical protein" /locus tag="Snek_63" /note=Original Glimmer call @bp 39170 has strength 11.57; Genemark calls start at 39170 /note=SSC: 39170-39511 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_64 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.17758E-75 GAP: -10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.274, -2.2821867859826788, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_64 [Arthrobacter phage Tweety19]],,QNO12723,100.0,1.17758E-75 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vu, Trinity /note=Auto-annotation start source: Both GeneMark and Glimmer call the start site 39170 which is the GTG start codon. /note=Coding Potential: There is high coding potential in the putative ORF in the forward direction and the chosen start site covers all of the coding potential. /note=SD (Final) Score: The final score -2.282 is the best. /note=Gap/overlap: There is a 10 bp overlap which is reasonable. The only alternative start site results in a shorter ORF. /note=Phamerator: The gene is in the pham 103426 and is commonly annotated in other AZ phages such as Tweety19, DrManhattan, and Lego. There was no function called. /note=Starterator: The start site conserved among pham members is start site 31 which is called in 27/134 of the non-draft members (48/147 members including the drafts). Snek does not contain this start site. Snek's suggested start is site 18. /note=Location call: This is a real gene whose start site is likely (start @39170), start site 18, because it covers all of the coding potential, there is no unreasonable gap, and this start site is called in 2 other phages. /note=Function call: Function unknown. 
All of the phagesdb BLAST and NCBI BLAST hits have function unknown but because there are some published phage genomes with unknown function and the top phagesdb BLAST (2e-59) and NCBI BLAST (1.1e-75) hits have 100% identity sequence and 100% query coverage, it is evidence the gene is still real. There are no CDD hits. The top HHpred hit 5Z1V_B has an 83.14% probability (and all other hits have probabilities below 80%) and has the unknown function classification but all hits including the top one have e-values greater than 4 meaning that none of the hits are useful evidence. /note=Transmembrane domains: This is not a membrane protein as neither TOPCONS nor TMHMM predicts any TMDs. /note=Secondary Annotator Name: Tarighat, Asefeh /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 39505 - 39636 /gene="64" /product="gp64" /function="membrane protein" /locus tag="Snek_64" /note=Original Glimmer call @bp 39505 has strength 14.16; Genemark calls start at 39505 /note=SSC: 39505-39636 CP: no SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.87982E-20 GAP: -7 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.198, -2.0895162839948163, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Tweety19]],,QNO12724,100.0,1.87982E-20 SIF-HHPRED: SIF-Syn: NKF. Upstream gene is pham 107640 while downstream gene is pham 17297, just like phage Tweety19. /note=Primary Annotator Name: Tarighat, Asefeh /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 39505. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host- trained. The gene is also long enough to be a real gene. /note=SD (Final) Score: The final score is the best option at -2.090 and the z score is the highest at 3.198. 
/note=Gap/overlap: There is an overlap of -7 with an upstream gene. /note=Phamerator: The pham number as of May 8th, 2022 is 87094. The gene is conserved in phage Tweety19 (AZ). There are only 2 members in this pham. They are both 132 bp long. No function is called for this gene. /note=Starterator: start site 1 which is the only manually annotated start site in 2/2 non-draft genes in this pham. Start site 1 is 39505 in Snek and 39506 in Tweety19, the other member of the pham. This evidence confirms the auto-annotated start site from GeneMark and Glimmer. /note=Location call: Based on the above evidence, this is a real gene and the start site is 39505. /note=Function call: NKF. There is only one strong corresponding hit on phagesDB BLAST (e-value = 6e-17) and NCBI BLAST, with a percent identity of 100%, percent coverage of 100%, and an e-value < 10e-19. The hit corresponds to a gene in phage Tweety19. HHpred and CDD had no relevant hits. /note=Transmembrane domains: There is one TMD predicted by TmHmm, but there are none predicted by TOPCONS (only signaling peptides). Interestingly enough, Tweety19's NCBI BLAST entry for its related gene is designated "membrane protein." However, because the requirements for the functional call are not met, the function of this gene cannot be designated as a "membrane protein." /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 39626 - 39811 /gene="65" /product="gp65" /function="hypothetical protein" /locus tag="Snek_65" /note=Original Glimmer call @bp 39626 has strength 6.4 /note=SSC: 39626-39811 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein SEA_TWEETY19_66 [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 1.67902E-35 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.028, -3.095100142625534, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TWEETY19_66 [Arthrobacter phage Tweety19]],,QNO12725,100.0,1.67902E-35 SIF-HHPRED: SIF-Syn: NKF (pham 17297). 
There is synteny shared with Tweety19; both phages have genes in pham 87094 upstream and in pham 89129 downstream of this gene. /note=QC: Noah Canio- Synteny Box: added "NKF" and pham # for this gene. /note=Primary Annotator Name: Yan, Lisa /note=Auto-annotation: Only Glimmer calls the start site at 39626bp with start codon GTG. GeneMark does not call a start site for this gene. /note=Coding Potential: The gene does have reasonable coding potential within the putative ORF, and the chosen start site does cover all of the coding potential in both GeneMark and Glimmer in the forward direction. /note=SD (Final) Score: The SD score is 3.028, which is the best score out of all of the possible start sites. The RBS final score is the least negative at -3.095, which is the best final score on PECAAN. /note=Gap/overlap: The overlap with the upstream gene is 11bp, but based on synteny with other genes as well as the length of the ORF, the overlap is reasonable and conserved. The length of the gene is 186bp, which is reasonable. /note=Phamerator: As of 4/27/2022, the gene is found in pham 17297. All other genes in this pham are also in phages in the AZ cluster, for example, Tweety19, Reedo, and Warda. The function is currently unknown for this gene. /note=Starterator: The reasonable start site for this gene is start site 9 at 39626bp, which is manually annotated in 23/24 of non-draft genes. The Starterator data agrees with the Glimmer predicted start site, which also agrees with the final score and Z-score. /note=Location call: Based on the evidence, this is a real gene, and the most likely start site is at 39626bp. /note=Function call: Multiple phages in PhagesDB and NCBI BLAST have hits for this gene, such as Tweety19 (e-value = 3e-31) and DrSierra (e-value = 3e-12). However, none of these phages have called functions for this gene. CDD and HHpred are both uninformative (e-values are high or no hits found). 
/note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any transmembrane domains, so it is not a membrane protein. /note=Secondary Annotator Name: Grunden, Kyla /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 39795 - 39935 /gene="66" /product="gp66" /function="membrane protein" /locus tag="Snek_66" /note=Original Glimmer call @bp 39795 has strength 13.42; Genemark calls start at 39795 /note=SSC: 39795-39935 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 100.0% 2.78499E-24 GAP: -17 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.951, -2.66371296573402, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Tweety19]],,QNO12726,100.0,2.78499E-24 SIF-HHPRED: SIF-Syn: Membrane protein (pham 89129). Upstream gene is pham 17297 while downstream HNH endonuclease, just like in phage Tweety19. /note=Primary Annotator Name: Noah Canio /note=Auto-annotation: Glimmer and Genemark both show start at 39795 bp. /note=Coding Potential: A substantial amount of coding potential exists on both Host-trained GeneMark and Self-trained GeneMark. This exists on the forward strand more so than the reverse strand, indicating that this is a forward gene. /note=SD (Final) Score: -2.664. This is the best score on PECAAN. /note=Gap/overlap: -17 bp. While this overlap is abnormally large, this feature is conserved in a related gene in phage Tweety19. /note=Phamerator: (As of 5/24/22), this gene is designated as part of pham 89129. Tweety19 is the only other phage with a gene that is a part of this pham as well. /note=Starterator: Starterator calls this gene at start 1, corresponding to 39795 bp. This gene is only shared with phage Tweety19, and its start site is also called at start 1. /note=Location call: The evidence provided suggests that this is a real gene with a start site at 39795 bp. 
/note=Function call: There is only 1 significant hit for both PhagesDB BLAST (e-value = 2e-20) and NCBI BLAST (% identity and % query coverage = 100%, e-value < 10e-23) for a hypothetical protein/unknown function. It is for a gene in phage Tweety19. HHPRED and CDD do not have any relevant hits. The evidence suggests that this gene has NKF. /note=Transmembrane domains: TmHmm calls 1 TMH while TOPCONS calls a TMH in 4/6 of its programs. This evidence suggests that the gene's function can be designated as a "membrane protein." /note=Secondary Annotator Name: /note=Secondary Annotator QC: CDS 39932 - 40216 /gene="67" /product="gp67" /function="hypothetical protein" /locus tag="Snek_67" /note=Original Glimmer call @bp 39932 has strength 6.68; Genemark calls start at 39932 /note=SSC: 39932-40216 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease [Arthrobacter phage Tweety19] ],,NCBI, q1:s1 100.0% 5.24474E-62 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.158, -4.3323003410763175, yes F: hypothetical protein SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage Tweety19] ],,YP_010678458,100.0,5.24474E-62 SIF-HHPRED: SIF-Syn: hnh endonuclease, upstream gene is in pham 89129, no downstream gene (as of 5/6/2022). upstream gene is found in Tweety19 but Tweety19 has one more downstream gene. /note=NKF per investigation during AZ harmonization -AF /note=Primary Annotator Name: Ramesh, Naren /note=Auto-annotation start source: Glimmer and GeneMark. Both call the start at 39932. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.332. It is the best final score on PECAAN. /note=Gap/overlap: Gap: -4bp. This is the most ideal gap and implies overlap. /note=Phamerator: pham: 16105. Date 04/27/2022. It is conserved; found in multiple phages in AZ. 
/note=Starterator: Start site 20 in Starterator was manually annotated in 1 non-draft gene in this pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 39932. /note=Function call: HNH Endonuclease is the most likely function according to Phagesdb and both BLASTs, with extremely low e-values for HNH endonuclease genes. HHpred and CDD don't match with the suggested function from other resources, but their e-values are not significant enough to reconsider this function as HNH endonuclease. /note=Transmembrane domains: None are predicted by TOPCONS or TmHmm. This makes sense given the function. /note=Secondary Annotator Name: /note=Secondary Annotator QC: CDS 40495 - 40800 /gene="68" /product="gp68" /function="HNH endonuclease" /locus tag="Snek_68" /note= /note=SSC: 40495-40800 CP: no SCS: neither ST: NI BLAST-Start: [HNH endonuclease [Arthrobacter phage Tweety19] ],,NCBI, q1:s1 100.0% 4.37152E-66 GAP: 278 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.108, -2.8035499123971284, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage Tweety19] ],,YP_010678459,99.0099,4.37152E-66 SIF-HHPRED: HNH endonuclease; Thermophilic bacteriophage, HNH Endonuclease, DNA nicking, HYDROLASE; 1.52A {Geobacillus virus E2},,,5H0M_A,67.3267,98.1 SIF-Syn: