CDS 85 - 537 /gene="1" /product="gp1" /function="terminase, small subunit" /locus tag="Tallboi_1" /note=Original Glimmer call @bp 85 has strength 10.86; Genemark calls start at 85 /note=SSC: 85-537 CP: yes SCS: both ST: SS BLAST-Start: [terminase small subunit [Arthrobacter phage Iter] ],,NCBI, q1:s1 100.0% 1.90893E-101 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.224, -2.0162541296952132, yes F: terminase, small subunit SIF-BLAST: ,,[terminase small subunit [Arthrobacter phage Iter] ],,URQ04989,100.0,1.90893E-101 SIF-HHPRED: Terminase small subunit; genome packaging, bacteriophage, DNA binding, VIRAL PROTEIN; 1.4A {Enterobacteria phage HK97},,,6Z6E_B,52.0,98.7 SIF-Syn: Terminase small subunit, no upstream gene, downstream gene appears to be terminase large subunit, just like in final phage Adolin. /note=Primary Annotator Name: Erfanian, Kiana /note=Auto-annotation: Glimmer and GeneMark. Both called the same start site of 85. /note=Coding Potential: This gene has good coding potential within the putative ORF, and covers all this coding potential. /note=SD (Final) Score: The RBS Final Score of -2.016 for the original start at 85 is the highest of the suggested starts, and therefore the best. The same is true of the Z-score for this start, at 3.224. /note=Gap/overlap: This gene has an overlap of 4 bp with its downstream gene because its stop is at 537, whereas the start of the downstream gene is 534. /note=Phamerator: This gene was found in pham 95633, which has 159 members. Due to a phamerator error, there are either 17 or 23 draft phages in this pham; the number is unclear at this time. Additionally, 27 phages in this pham belong to cluster AZ. /note=Starterator: Using information from the Starterator analysis run most recently on 1/7/22, it was found that the most conserved start site number is 41. The auto-annotated start is called at start number 41 (85), which matches the most conserved start. 
Phage Tallboi’s track contains start site 3 by a yellow line, which denotes it as an auto-annotated start. Start site 3 in Tallboi’s track corresponds to that of other phages in the cluster, such as phages Adolin and Crewmate. Start site 41 has been determined to be the Final Human Annotated start, as represented by a green line on the track representing these phages. The analogous start site between Tallboi and other phages in this cluster is therefore promising, indicating that the auto-annotated start site 41 at 85 is indeed correct. /note=Location call: The evidence gathered indicates that the suggested start site of 85 as called by Glimmer and GeneMark appears to be the most probable site. /note=Function call: Terminase small subunit. Both PhagesDB BLASTp and NCBI BLASTp have several hits with low e values, high identity percentages, and reasonable scores; all of which serve as strong evidence for the gene’s function. The top non-draft hit on PhagesDB BLASTp was for a gene in Amyev, a phage within the same cluster as Tallboi (AZ). This hit has a significantly low E-value at 2e-83 and reasonable score. Furthermore, the top non-draft hit on NCBI BLASTp was also for a gene from a phage (named Phives) in the same cluster as Tallboi. This hit has an E-value even lower than the first hit on PhagesDB of 3.795e-97, a reasonable score, as well as a very high identity percentage of 94.67%. Each of these first hits have functions of terminase small subunit. This is also the listed function for several other strong hits on both PhagesDB BLASTp and NCBI BLASTp whose data can reasonably serve as further evidence. Top hits for HHPRED alignments further point to terminase small subunit as the gene function. CDD however, provides no data regarding this gene. Given the above data, there is enough evidence to conclude that this is indeed the function of this gene. /note=Transmembrane domains: No TMDs called by TmHmm or TOPCONS. The protein is not a membrane protein. 
/note=Secondary Annotator Name: Ostroske, Elyse /note=Secondary Annotator QC: I have QC`d this gene and agree with the primary annotator. CDS 534 - 2240 /gene="2" /product="gp2" /function="terminase, large subunit" /locus tag="Tallboi_2" /note=Original Glimmer call @bp 534 has strength 8.86; Genemark calls start at 534 /note=SSC: 534-2240 CP: yes SCS: both ST: SS BLAST-Start: [terminase large subunit [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.094, -4.394706008439538, no F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Arthrobacter phage London] ],,QOP64305,99.1197,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,93.3099,100.0 SIF-Syn: terminase, large unit, upstream gene is terminase, small unit, downstream is portal protein, just like in Asa16, DrManhatten, Elezi and a number of other phages in the same Cluster AZ /note=Primary Annotator Name: Khaine, Aye Myat /note=Auto-annotation: GeneMark and Glimmer. Both programs call the start site at 534. /note=Coding Potential: The ORF has good coding potential. The start site 534 includes all the coding potential. /note=SD (Final) Score: The final SD score is -4.395 with a z-score of 2.094. This is the best option since this start has the least gap/overlap upstream and is the LORF. /note=Gap/overlap: The overlap is 4 bp which indicates this gene could be part of an operon. /note=Phamerator: This gene belongs to pham 91114 as of 01/10/2022. This pham has 1002 members with more than 10 non-draft phages that belong to the same Cluster AZ. /note=Starterator: Start site 109 at 534 has been manually annotated 21 out of 935 times, all of which are phages of Cluster AZ. This agrees with the start site called by Glimmer and GeneMark. /note=Location call: Based on the evidence, the gene is real with the start site at 534. 
/note=Function call: Terminase large subunit. Top PhagesDB BLAST hits have terminase large subunit as the function with e-values of 0. Top NCBI BLASTp hits also call the function as terminase large subunit with e-values of 0, 95%+ identity and 100% coverage. HHpred hits correspond to terminase large subunit with e-values of <1e-32 and 88%+ coverage. One CDD hit shows the gene belongs to the phage terminase family with 17% identity, 84% coverage and e-value of 9.3e-11. /note=Transmembrane domains: No TMDs predicted by either TMHMM or TOPCONS. This gene is not a membrane protein. /note=Secondary Annotator Name: Santos, Charysa /note=Secondary Annotator QC: All evidence supports the location (start @ 534) and function call (terminase, large subunit). CDS 2284 - 3660 /gene="3" /product="gp3" /function="portal protein" /locus tag="Tallboi_3" /note=Original Glimmer call @bp 2284 has strength 8.76; Genemark calls start at 2284 /note=SSC: 2284-3660 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 0.0 GAP: 43 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.224, -2.0162541296952132, yes F: portal protein SIF-BLAST: ,,[portal protein [Arthrobacter phage London] ],,QOP64306,99.5633,0.0 SIF-HHPRED: PORTAL PROTEIN; BACTERIOPHAGE SPP1, DNA TRANSLOCATION, MOLECULAR MOTOR, VIRAL PORTAL PROTEIN, VIRAL PROTEIN; HET: CA, HG; 3.4A {BACTERIOPHAGE SPP1},,,2JES_Q,91.048,100.0 SIF-Syn: Function is portal protein. When comparing Tallboi to DrSierra and Lego, all genomes have a portal protein at gene 3; the upstream gene is terminase, large subunit and the downstream gene is VIP2-like ADP-ribosyltransferase toxin. /note=Primary Annotator Name: McLinden, Katherine /note=Auto-annotation: Glimmer and GeneMark both call the start site as 2284 and the Glimmer score is 8.76. The start codon at 2284 is ATG. 
/note=Coding Potential: In GeneMark, there is coding potential and an open reading frame that covers the stop site, but the coding potential stops short of the start site suggested by Glimmer and GeneMark by less than 100 base pairs. The Self-trained GeneMark exhibits the same pattern. However, the other suggested start sites do not fit the coding potential as well. /note=SD (Final) Score: -2.016. This is the best Final score. /note=Gap/overlap: The gap is 43 base pairs. This is the gap that makes the most sense based on start sites because all other suggested start sites present a gap larger than 163 base pairs. /note=Phamerator: The pham for this gene is 95678. The date is 1/10/22. This pham has 1456 members and is present in several clusters. /note=Starterator: Start site 96 was called 94.3% of the time when present. Start 96 in Tallboi is 2284, which is the start site suggested by Glimmer and GeneMark. /note=Location call: This is most likely a real gene. Based on the coding potential in Glimmer and GeneMark and the information present for the start site, the start site is most likely 2284. /note=Function call: Based on PhagesDB BLASTp, NCBI BLASTp, PhagesDB Function Frequency, CDD, and HHPRED, it is highly likely that the gene has a portal protein function. E-values are very low, with the PhagesDB BLASTp E-values being 0. Additionally, all of the programs agree that this gene encodes a portal protein. /note=Transmembrane domains: There were no TMDs called and no other evidence to suggest a TMD function within the other databases. /note=Secondary Annotator Name: Sheppy, Tyler /note=Secondary Annotator QC: I agree with this annotation. Both the location call of start site 2284 and the function call of portal protein are well supported with evidence. 
CDS 3679 - 5730 /gene="4" /product="gp4" /function="capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin" /locus tag="Tallboi_4" /note=Original Glimmer call @bp 3679 has strength 7.21; Genemark calls start at 3679 /note=SSC: 3679-5730 CP: yes SCS: both ST: SS BLAST-Start: [capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin [Arthrobacter phage Adolin]],,NCBI, q3:s4 99.7072% 0.0 GAP: 18 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.836, -2.9063687850157054, yes F: capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin SIF-BLAST: ,,[capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin [Arthrobacter phage Adolin]],,QHB36586,89.3431,0.0 SIF-HHPRED: SIF-Syn: /note=Sasha Semaan: Unchecked evidence under HHPRED and CDD because of low coverage. /note=Primary Annotator Name: Uvarov, Evgeniy /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 3679 (site 1) with a common ATG start codon. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF by both Glimmer and GeneMark. The chosen start (site 1) covers all the coding potential. This ORF only has forward strand coding potential, thus this is a forward gene. /note=SD (Final) Score: Start site 1 has a final score of -2.906 and a good Z-score of 2.836. This start site has the best (highest) values on PECAAN for this gene. /note=Gap/overlap: Start (site 1) has the smallest gap of 18 bp indicative of no operon. This start site creates the LORF with a gene length of 2052 bp which is good. /note=Phamerator: The pham number as of 1/7/2022 is 55022. The gene is conserved in phages Adolin (AZ), Amyev (AZ), and Crewmate (AZ), as well as other non draft phages. Any of these genomes can be used for comparison since they are all non-draft. Based on PhagesDB the function call for the gene is either a VIP2-like ADP-ribosyltransferase toxin or an ADP-ribosyltransferase domain and MuF-like fusion protein. 
These are both on the approved SEA-PHAGES list. /note=Starterator: Based on the 1/7/2022 run the most annotated start site 3 is a reasonable choice that is conserved among members of pham 55022. There are 27 members total with 25 being non-draft. 22/25 of non-draft members and 2/2 draft members call start site 3, which correlates to 3679 (site 1) for Tallboi. /note=Location call: Considering all of the evidence above, this gene is a real gene that is conserved in phamerator as well as starterator, has good coding potential and covers all of it with a start site at 3679 (site 1). Starterator agrees with Glimmer and Genemark. /note=Function call: VIP2-like ADP-ribosyltransferase toxin. PhagesDB BLASTp top three non-draft hits ranked by score with known functions are VIP2-like ADP-ribosyltransferase toxin and so are the great majority of the top hits. The overall top result is Eraser_4 (e: 0, id: 90%), third is Amyev_4 (e: 0, id: 89%), and fourth is Crewmate_5 (e: 0, id: 88%). All these phages are in cluster AZ, same as Tallboi. Phagesdb Function Frequency shows VIP2-like ADP-ribosyltransferase toxin with the highest total frequency (21%) for the AZ cluster that Tallboi is part of. Muf-like minor capsid protein has the overall highest total frequency (34%) but it is for cluster L2. NCBI BLASTp shows three strong hits with zero e-values that are VIP2-like ADP-ribosyltransferase toxin in Eraser (e: 0, id: 90.9224%, cov: 100%), DrSierra (e: 0, id: 86.3836%, cov: 100%), and Adolin (e: 0, id: 81.4599%, cov: 99.7072%). Two strong CDD hits are ADP-ribosyltransferase exoenzyme (e: 6.13329e-23, id: 31.6583%, cov: 26.94%) and VIP2 (e: 3.90754e-21, id: 25.3731%, cov: 27.2328%). Since both exhibit such low e-values the VIP2-like ADP-ribosyltransferase toxin is a likely function as it combines these two CDD results. HHpred three top hits were all closely related to the structure and function of VIP2-like ADP-ribosyltransferase toxin. 
The top hit was d1gxya_ (prob: 99.9%, cov: 37.9209%, e: 4.9e-22) a eukaryotic mono-ADP-ribosyltransferase. The second hit was 4DV8_A (prob: 99.8%, cov: 38.9458%, e: 1.4e-18) a toxic anthrax lethal factor. The third hit was 1GXY_B (prob: 99.8%, cov: 38.653%, e: 8e-18) a T-cell ecto-ADP-ribosyltransferase. Function call was very close between VIP2-like ADP-ribosyltransferase toxin and ADP-ribosyltransferase but due to its prevalence in Phagesdb Function Frequency and Phagesdb BLAST, VIP2-like ADP-ribosyltransferase toxin was chosen. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore this is not a membrane protein. /note=Secondary Annotator Name: Stephenson, Juliet /note=Secondary Annotator QC: I agree with this annotation. All evidence has been considered. CDS 5786 - 6145 /gene="5" /product="gp5" /function="hypothetical protein" /locus tag="Tallboi_5" /note=Original Glimmer call @bp 5786 has strength 8.23; Genemark calls start at 5786 /note=SSC: 5786-6145 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_PHIVES_6 [Arthrobacter phage Phives]],,NCBI, q1:s1 100.0% 2.51387E-71 GAP: 55 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.224, -2.305049668942183, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_PHIVES_6 [Arthrobacter phage Phives]],,QOP65134,94.958,2.51387E-71 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Alvarez, Alondra /note=Auto-annotation: Glimmer and GeneMark both call start site 5786. Its start codon ATG is common and likely to be used. /note=Coding Potential: The Host-Trained GeneMark and Self-Trained GeneMark both have coding potential predicted within the putative ORF. Coding potential is found on the forward strand, indicating that it is a forward gene. The chosen start site at 5786 includes both coding potentials. /note=SD (Final) Score: Putative start site 5786 has a RBS final score of -2.305 and a Z-score of 3.224. These are the best scores out of the potential start sites. 
This start site also has the longest ORF. /note=Gap/overlap: There is a gap of 55 bp between the gene and the gene upstream. The putative start site minimizes the gap and creates the longest ORF. This gap also seems to be conserved in many other genomes within the AZ cluster. /note=Phamerator: As of 1/6/2021 the gene belongs to pham 94784. Of the 27 members in the pham, 25 are non-draft genes. The pham contains members of the AZ and EH clusters. Phamerator does not call a function for this gene. /note=Starterator: The most conserved start site among the members of the pham is at number 5. It is called in 20 of the 25 non-draft genes, including Tallboi. The position corresponds to base pair coordinate 5786 in Tallboi. /note=Location call: Based on the coding potential, conservation of genome architecture with other non-draft genes within cluster AZ, and statistically significant PhagesDB BLAST hits, we can determine that this gene is “real.” Coding potential and Starterator confirm that the best-fit start site is 5786. /note=Function call: NKF. All hits returned in PhagesDB BLASTp and NCBI BLASTp had no known function, regardless of statistical significance. No output was returned from CDD and HHpred failed to return any statistically significant hits (lowest e-value was 24). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any transmembrane domains. Gene is not a membrane protein. /note=Secondary Annotator Name: Thorp, Jocelyn /note=Secondary Annotator QC: I agree with this annotation, and all categories have been considered. 
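The gap and overlap figures quoted in these notes can be reproduced from the CDS coordinates alone. The following is a minimal sketch, assuming all genes are on the forward strand and that a gap counts the bases strictly between the upstream stop and the next start (a negative value is an overlap). That convention matches the GAP fields above (for example, -4 for gene 2 and 43 for gene 3). The helper names `gap_to_upstream` and `gaps` are hypothetical and are not part of PECAAN.

```python
# Forward-strand CDS coordinates (start, stop) copied from the records above.
CDS = {
    1: (85, 537),
    2: (534, 2240),
    3: (2284, 3660),
    4: (3679, 5730),
    5: (5786, 6145),
}


def gap_to_upstream(upstream_stop, start):
    """Bases strictly between two genes; a negative value is an overlap."""
    return start - upstream_stop - 1


def gaps(cds):
    """Map each gene number to its gap relative to the preceding gene."""
    genes = sorted(cds)
    return {
        gene: gap_to_upstream(cds[prev][1], cds[gene][0])
        for prev, gene in zip(genes, genes[1:])
    }


for gene, gap in gaps(CDS).items():
    kind = "overlap" if gap < 0 else "gap"
    print(f"gene {gene}: {abs(gap)} bp {kind}")
```

Under this convention gene 5 sits 55 bp downstream of gene 4, which agrees with the GAP field in its SSC note.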
CDS 6263 - 6811 /gene="6" /product="gp6" /function="scaffolding protein" /locus tag="Tallboi_6" /note=Original Glimmer call @bp 6263 has strength 12.09; Genemark calls start at 6263 /note=SSC: 6263-6811 CP: yes SCS: both ST: SS BLAST-Start: [scaffolding protein [Arthrobacter phage Powerpuff] ],,NCBI, q1:s1 100.0% 2.66781E-104 GAP: 117 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.99, -3.3503726477288405, yes F: scaffolding protein SIF-BLAST: ,,[scaffolding protein [Arthrobacter phage Powerpuff] ],,QGZ17304,96.0894,2.66781E-104 SIF-HHPRED: Scaffold protein; major capsid protein, HK97-like fold, scaffolding protein, procapsid, VIRUS; 3.72A {Staphylococcus phage 80alpha},,,6B0X_f,59.8901,97.8 SIF-Syn: Synteny: Scaffolding protein (pham 84364), upstream gene is pham 94784, downstream is major capsid protein (pham 57253), just like in phage Lizalica. /note=Primary Annotator Name: Baughman, Lexie /note=Auto-Annotation: Glimmer and Genemark. Both agree on the same start site of 6263, with a start codon of ATG. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF. Chosen start site of 6263 covers all of the coding potential. /note=SD (Final) Score: The SD (Final) Score is -3.350, and the Z-score is 2.99. These are the best out of all the listed possible start sites. /note=Gap/Overlap: There is a 117 base pair gap between this gene and the upstream gene, which may be cause for concern. The auto-annotated start site does not create the longest ORF, with another start site at position 6170 creating only a 24 base pair gap. This is still a larger gap than would be expected; however, since it is the longest ORF, and since it is acceptable for there to be a gap in coding potential between this start site and the auto-annotated one, this start site may warrant further consideration. In all cases, the length of the gene is acceptable. /note=Phamerator: As of 01/10/2022, the gene is found in Pham 84364. 
The pham is conserved in other members of the cluster; comparison was done between Tallboi and a few other non-draft genomes, including Kaylissa and Lizalica. Both Phamerator and PhagesDB called the function of this gene as “scaffolding protein,” which is on the approved function list. /note=Starterator: The “Most Annotated” start site (12) is present in 29 of 32 non-draft genes in this pham, and it is present in Tallboi. This start site corresponds to base pair position 6263, which is the auto-annotated start site. /note=Location Call: The gathered evidence suggests that this is a real gene and that its start site is likely at position 6263. While there is another proposed start site that creates the longest ORF, the start site at 6263 is conserved in Starterator and covers all of the coding potential. /note=Function Call: The suggested function of the top 2 NCBI BLASTp hits is scaffolding protein, with high query coverage (100%), high % identity (>54%), and low e-values (<2e-99). The suggested function of the top 2 PhagesDB BLASTp hits is also scaffolding protein, with high % identity (90%) and low e-values (<7e-88). Thus, the two databases seem to be in agreement. While there were no hits in CDD, one of the top two hits in HHpred was informative, with high probability (97.79%), high coverage (59.8901%), and a low e-value (0.0032), and listed a similar function. This function is also conserved in finalized phage genomes (Kaylissa and Lizalica, for example). /note=Transmembrane Domains: No predicted transmembrane domains. /note=Secondary Annotator Name: Zhuang, Chuzhi /note=Secondary Annotator QC: I agree with your annotation. Good job! 
CDS 6841 - 7788 /gene="7" /product="gp7" /function="major capsid protein" /locus tag="Tallboi_7" /note=Original Glimmer call @bp 6841 has strength 10.02; Genemark calls start at 6841 /note=SSC: 6841-7788 CP: yes SCS: both ST: SS BLAST-Start: [major head protein [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 0.0 GAP: 29 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.741, -3.3136498407041004, yes F: major capsid protein SIF-BLAST: ,,[major head protein [Arthrobacter phage Yang] ],,YP_009815625,96.5079,0.0 SIF-HHPRED: Major capsid protein; P22 Bacteriophage, VIRUS; 3.3A {Salmonella phage P22},,,5UU5_C,92.381,99.9 SIF-Syn: Upstream gene is scaffolding protein, downstream is head to tail adaptor, just like in Asa16, Adumb2043, Amyev and other phages in Cluster AZ /note=Primary Annotator Name: Dooley, Naomi /note=Auto-annotation: The gene is called by Glimmer and Gene mark at the same start site 6841 along with the same start codon at ATG. This is a very common start codon so there is reason to believe that this may be the correct start site. /note=Coding Potential: The gene has reasonable coding potential that is covered by the start site. /note=SD (Final) Score: The SD score was reasonable at -3.314 which is the best score listed. The z-score of 2.741 (>2) suggests that this may be the correct start site. This Z-score was the best score listed. /note=Gap/overlap: There is a reasonable gap of 29bp which is the smallest gap listed. This is a reasonable gap and it is unlikely for a new gene to be found in this gap. /note=Phamerator: The pham number as of 1/12/2022 is 57253 and is found in Adumb2043 /note=Starterator: The start number is 7 which corresponds to a 6841 start site. /note=Location call: Based on the data it appears that this is a real gene with a 7788 stop site and 6841 start site. /note=Function call: Based on the phages Adumb2043 and Phives, it is safe to conclude that the function of this gene is major capsid protein. 
The top two BLAST hits on both PhagesDB have E-values of 1e-169 and 1e-161 respectively; additionally, they both have high scores of 592 and 565, both of which have a major capsid protein function listed (both have high coverage, 99.6825% for both, 65%+ identity and e-value of 0). HHpred have major capsid protein listed with 99.9% probability, 90%+ coverage, and E-values of 1.2E-24 and 1.2E-21. CDD contains a hit with an e-value of 1.54e-13 that indicates that this protein may be conserved with the P22 coat protein being conserved. /note=Transmembrane domains: Neither TMHMM or TOPCONS predicted any TMDs. Thus we can conclude that this is not a membrane protein. /note=Secondary Annotator Name: Erfanian, Kiana /note=Secondary Annotator QC: Everything looks good, great notes! Don`t forget to make a selection in the coding capacity drop-down menu. CDS 7862 - 8269 /gene="8" /product="gp8" /function="head-to-tail adaptor" /locus tag="Tallboi_8" /note=Original Glimmer call @bp 7862 has strength 11.08; Genemark calls start at 7862 /note=SSC: 7862-8269 CP: yes SCS: both ST: SS BLAST-Start: [head-tail adaptor Ad1 [Arthrobacter phage Yang] ],,NCBI, q1:s1 95.5556% 2.15506E-74 GAP: 73 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.224, -2.0162541296952132, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-tail adaptor Ad1 [Arthrobacter phage Yang] ],,YP_009815626,92.3664,2.15506E-74 SIF-HHPRED: 15 PROTEIN; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_C,80.7407,99.0 SIF-Syn: Synteny: Head-to-tail adaptor. Tallboi genes 7 (upstream), 8, and 9 (downstream) show synteny with genes from phage Adolin (cluster AZ). Tallboi gene 7 and Adolin gene 7 show synteny to each other and both have the function of major capsid protein. Tallboi gene 8 and Adolin gene 8 correspond together and are both head-to-tail adaptor genes. 
Tallboi gene 9 (pham 94918) shows synteny with Adolin gene 9 (pham 95490). /note=PECAAN Notes /note=Primary Annotator Name: Wu, Meigan /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 7862. /note=Coding Potential: Coding potential found for this ORF in the forward frame on host-trained GeneMark and self-trained GeneMark. According to this data, this gene is a forward gene. Start site 7862 is not included in the ORFs observed but is included for the coding potentials found on GeneMark Host and GeneMark Self due to the corresponding TTG start codon. /note=SD (Final) Score: Final score is -2.016, which is the best final score available. /note=Gap/overlap: Gap size of 73. Although this is a large gap, there is no coding potential found in this gap. /note=Phamerator: The pham as of January 9, 2022 is 76440. Gene conservation with Adolin, Amyev, and Crewmate, which are all in the same cluster as Tallboi (AZ). The commonly listed function call is head-to-tail adaptor, which is an approved function by SEA-PHAGES. /note=Starterator: Pham 76440 has 32 non-draft members. Start number 3 was called by Tallboi; this is the most annotated start number and was called in 30/32 non-draft genes. Start number 3 corresponds to start site 7862 bp, which is supported by the Glimmer and GeneMark calls. /note=Location call: This gene is likely a real gene with a start site at 7862. /note=Function call: Head-to-tail adaptor. 19 hits from NCBI BLASTp yielded e-values < 2e-38 corresponding to the head-to-tail adaptor. Multiple hits from PhagesDB BLASTp yielded e-values < 5e-34 for the head-to-tail adaptor function. No CDD hits. HHpred provided two hits that are required as evidence to support the head-to-tail adaptor functional assignment. 
HHpred outputted an alignment with 99% probability, 80.74% coverage, and a 2.8e-8 e-value for SPP1 15 (5A21 chain C or D in the macromolecular complex) and an alignment with 98% probability, 87.41% coverage, and a 0.00031 e-value for HK97 Gp6. /note=Transmembrane domains: TMHMM and Topcons both did not predict any transmembrane domains. Therefore, this gene is not a transmembrane protein gene. /note=Secondary Annotator Name: Khaine, Aye Myat /note=Secondary Annotator QC: All the evidence provided agree with the start site call at 7862 and function call of head-to-tail adaptor. CDS 8282 - 8389 /gene="9" /product="gp9" /function="hypothetical protein" /locus tag="Tallboi_9" /note=Genemark calls start at 8282 /note=SSC: 8282-8389 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_ITER_9 [Arthrobacter phage Iter] ],,NCBI, q1:s1 97.1429% 8.61713E-12 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.089, -4.6767843745817785, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ITER_9 [Arthrobacter phage Iter] ],,URQ04997,94.4444,8.61713E-12 SIF-HHPRED: SIF-Syn: NFK (pham 94918), upstream gene is a head-to-tail adaptor, downstream gene is a head-to-tail stopper just like phage Amyev in cluster AZ. /note=Primary Annotator Name: Lee, Adrienne /note=Auto-annotation: Only GeneMark calls the start site at 8282. No Glimmer call. /note=Coding Potential: There is coding potential in both GeneMark Self and Host. There is no switching between forward and reverse in the coding potential and there is an open reading frame around the auto-annotated start and stop. /note=SD (Final) Score: The final score is -4.677, which is the only score on PECAAN but it is still a good score. /note=Gap/overlap: There is a 12 base pair gap. This is reasonable because it is small and there is no coding potential in that gap. /note=Phamerator: This gene is part of Pham 94918 as of January 10, 2022. This pham has 20 phages in total including Adumb2043 and Amyev. 
All of the phages have a gene length of 108 or 111 bp, which is consistent with this phage’s gene length of 108 bp given the recommended start site. /note=Starterator: The most conserved start site is 1, at 8282 for Tallboi. This start site is called for 18/18 non-draft genes in this pham. /note=Location call: Based on the evidence, this is a real gene and the start site is 8282, which was called by GeneMark. This start site was also determined by Starterator and is conserved in all non-draft phages in the pham. /note=Function call: NKF: There were no good hits from CDD or HHpred. They all had high e-values. Based on PhagesDB BLAST, the other phages in the same cluster and pham also have unknown function for this gene. All the phages came up as hypothetical protein for NCBI BLAST, all with extremely small e-values. Adumb2043 from cluster AZ had an e-value of 3e-11 with function unknown in PhagesDB BLAST, and NCBI BLAST showed an identity of 97.1429%, aligned 97.1429%, and an e-value of 7.10391e-12. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: McLinden, Katherine /note=Secondary Annotator QC: Looks good! I would just make sure to check any evidence you used, like in PhagesDB BLAST. 
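The gene lengths cited in the notes follow directly from the 1-based, inclusive CDS coordinates. Below is a small sketch of that arithmetic, using the coordinates of genes 4 and 9 from the records above. The function names are hypothetical, and the protein length assumes an intact reading frame that ends in a single stop codon.

```python
def cds_length(start, stop):
    """Length in bp of a CDS given 1-based, inclusive coordinates."""
    return stop - start + 1


def protein_length(start, stop):
    """Amino-acid count excluding the stop codon (assumes an intact frame)."""
    length = cds_length(start, stop)
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3 - 1


# Gene 9 (8282-8389) and gene 4 (3679-5730) from the records above.
print(cds_length(8282, 8389))      # 108, the length cited for gene 9
print(cds_length(3679, 5730))      # 2052, the length cited for gene 4
print(protein_length(8282, 8389))  # 35
```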
CDS 8389 - 8736 /gene="10" /product="gp10" /function="head-to-tail stopper" /locus tag="Tallboi_10" /note=Genemark calls start at 8389 /note=SSC: 8389-8736 CP: yes SCS: genemark ST: SS BLAST-Start: [head closure Hc1 [Arthrobacter phage Kaylissa] ],,NCBI, q1:s1 100.0% 1.86457E-68 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.535, -3.534757715066017, yes F: head-to-tail stopper SIF-BLAST: ,,[head closure Hc1 [Arthrobacter phage Kaylissa] ],,YP_010678056,95.6522,1.86457E-68 SIF-HHPRED: HEAD COMPLETION PROTEIN GP16; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_E,96.5217,99.7 SIF-Syn: head-to-tail stopper, upstream gene is NKF (pham 94918), downstream is NKF (95480), just like in phage Kaylissa /note=Primary Annotator Name: Magaling, Janelle /note=Auto-annotation: No Glimmer, only GeneMark at 8389. Start codon is ATG. /note=Coding Potential: There is coding potential in one forward ORF and is shown in both GeneMark Self and Host. Start site 8389 covers all coding potential. /note=SD (Final) Score: Final score is -3.535. Not the best, but likely an operon. Z score is 2.535 which supports the start. /note=Gap/overlap: 1bp overlap which indicates an operon. Length of 348 is acceptable. There is one other start but this is the LORF and the other has a huge gap. /note=Phamerator: 1/19/22 pham 96582. 32/40 phages like Adolin and Amyev in the pham are from AZ like Tallboi. All genes with a function in the pham called head-to-tail stopper. /note=Starterator: The most annotated start site is 4 and is called by 30/32 non-draft genes and talllboi calls this start 4 too @8389 /note=Location call: Start covers all coding potential. Phamerator and starterator shows conservation. This is a real gene. /note=Function call: head-to-tail stopper. 
Phagesdb function frequency table, phagesdb blast, and NCBI blast show many other genes with low e-values (> 4e^56) as head-to-tail stopper. HHpred has a good hit with low e values (>2.7e-17) and high coverage (96.5217) for 5A21_E which is required for head-to-tail stopper function. There were no CDD hits. /note=Transmembrane domains: No hits for TmHmm so cannot look at Topcons /note=Secondary Annotator Name: Uvarov, Evgeniy /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. Note: Look at the pham since it might have updated, also check some supporting evidence boxes for NCBI blast. CDS 8748 - 9050 /gene="11" /product="gp11" /function="hypothetical protein" /locus tag="Tallboi_11" /note=Original Glimmer call @bp 8478 has strength 11.52; Genemark calls start at 8748 /note=SSC: 8748-9050 CP: yes SCS: both-gm ST: NA BLAST-Start: [hypothetical protein SEA_CASSIA_11 [Arthrobacter phage Cassia]],,NCBI, q1:s1 99.0% 1.11617E-54 GAP: 11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.224, -2.033982896655645, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CASSIA_11 [Arthrobacter phage Cassia]],,WGH21084,94.0,1.11617E-54 SIF-HHPRED: SIF-Syn: Tallboi displays synteny with phage Adolin. Although this gene is an orpham and has NKF and the corresponding gene 11 in Adolin is not and is in pham 98136, the upstream and downstream genes match. Gene 12 in both phages is in pham 9463 and is a tail terminator, while Gene 10 in both phages is in pham 97801 and is a head to tail stopper. /note=PECAAN Notes /note=Primary Annotator Name: Ostroske, Elyse /note=Auto-annotation: Glimmer and GeneMark. Glimmer called the start at 8478 and GeneMark called the start at 8748. /note=Coding Potential: Coding Potential for this gene is mainly found on the forward strand, indicating that this is a forward gene. Coding Potential is found in both GeneMark Self and Host. /note=SD (Final) Score:-2.034. 
It is the best final score on PECAAN. /note=Gap/overlap: Gap of 11 bp. This is the smallest gap/overlap given out of the potential start sites. /note=Phamerator: This gene is sorted into pham 97778 and is the only member of this pham (orpham) as of 1/23/22. /note=Starterator: Starterator says that this phage is an orpham and therefore provides no report. /note=Location call: Based on the above evidence, this is a real gene. The most likely start site is 8748, as called by GeneMark, because it contains all the coding potential, has the best possible final score, the only Z-score greater than 2 out of all the start sites, and the only reasonable gap. /note=Function call: NKF. Niobe, London, Eraser, Elezi and Asa16 were all checked as evidence to their extremely low e-values under PhagesDB BLAST. Yang and Adumb2043 were checked as evidence under NCBI BLASTp for their high query cover percentages, as were the same phages checked under PhagesDB. No hits were found in CDD and the only result in HHpred with a high probability and low e-value turned out to be a DUF. /note=Transmembrane domains: TMDs were not predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: Alvarez, Alondra /note=Secondary Annotator QC:I have QC`ed this location call and agree with the first annotator. Make sure to try to run starterator and phamerator again, as well as fill in synteny box. 
CDS 9050 - 9463 /gene="12" /product="gp12" /function="tail terminator" /locus tag="Tallboi_12" /note=Original Glimmer call @bp 9050 has strength 5.62; Genemark calls start at 9050 /note=SSC: 9050-9463 CP: yes SCS: both ST: SS BLAST-Start: [tail terminator [Arthrobacter phage Kaylissa] ],,NCBI, q1:s1 100.0% 5.75388E-93 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.671, -3.1085431521568267, yes F: tail terminator SIF-BLAST: ,,[tail terminator [Arthrobacter phage Kaylissa] ],,YP_010678058,99.2701,5.75388E-93 SIF-HHPRED: TAIL-TO-HEAD JOINING PROTEIN GP17; VIRAL PROTEIN, VIRAL INFECTION, TAILED BACTERIOPHAGE, SIPHOVIRIDAE, SPP1, VIRAL ASSEMBLY, HEAD-TO-TAIL INTERFACE, DNA GATEKEEPER, ALLOSTERIC MECHANISM; 7.2A {BACILLUS PHAGE SPP1},,,5A21_G,95.6204,99.4 SIF-Syn: This gene shares synteny with all of the phages in cluster AZ, which indicates that this gene is highly conserved across these genomes. For instance, phage Adolin – also from pham 2023 – starts at 9024 and stops at 9440, which is very similar to the start/stop of gene 12 in Tallboi. /note=Primary Annotator Name: Santos, Charysa /note=Auto-annotation: Glimmer and GeneMark. Both start at 9050. This gene has a Glimmer score of 5.62. /note=Coding Potential: Coding potential in this ORF is found in the direct/forward sequence, thus this is a forward gene. Coding potential is found in GeneMark Host and Self. /note=SD (Final) Score: -3.109. It is the best/least negative final score on PECAAN for this gene. /note=Gap/overlap: -1. Very small overlap, meaning there is no room for another gene in between the previous gene and this one. Gene 11 ends at site 9050 and gene 13 starts at site 9476, so the following gene has a 12 bp gap with gene 12. /note=Phamerator: As of 01/07/2022, this gene is found in Pham 2023 and is conserved in other members of cluster AZ, including Adumb2043 and Nitro. 
/note=Starterator: The “Most Annotated” start site (2) is present in 28 of 35 non-draft genes in this pham, and is present in Tallboi. This start site corresponds to base pair position 9050, the auto-annotated start site. /note=Location call: Using the evidence I found through Glimmer and GeneMark, this is a real gene and the most probable start site is 9050. /note=Function call: Based on the BLAST results for this sequence, I believe that my gene has a tail terminator function because the top hits were all tail terminator function genes and they had high alignment with my gene and very low e-values. Kaylissa, Lego, Powerpuff, and YesChef in particular had e-values of 5e-74 for the tail terminator function gene. The first 2 HHPRED hits had high probability (>99.3), high coverage (>94.1606), and low e-values (<1.5e-10). There were no CDD results to confirm this. /note=Transmembrane domains: There were no TMD predictions on either TMHMM or TOPCONS, so we can conclude that this gene is not a membrane protein. /note=Secondary Annotator Name: Baughman, Lexie /note=Secondary Annotator QC: I have QC`ed the location and functional calls and agree with the first annotator. Perhaps add some detail to your PECAAN notes, particularly in the functional call section. It might be helpful to write out the actual hits. Also, I would make a specific comparison in the synteny box between Tallboi and a finalized genome. 
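The GAP values reported for genes 11 through 13 can be reproduced from the CDS coordinates alone. The following is a minimal sketch, assuming the convention gap = start - upstream stop - 1 (inferred from the records here, not taken from PECAAN documentation); the helper name gap_bp and the cds list are hypothetical and exist only for illustration.

```python
def gap_bp(upstream_stop: int, start: int) -> int:
    """Bases strictly between two forward genes; negative means overlap.

    Assumed convention: gap = start - upstream_stop - 1.
    """
    return start - upstream_stop - 1


# (name, start, stop) copied from the CDS lines of genes 10-13
cds = [
    ("gp10", 8389, 8736),
    ("gp11", 8748, 9050),
    ("gp12", 9050, 9463),
    ("gp13", 9476, 10027),
]

# Pair each gene with its upstream neighbor and compute the gap
gaps = {
    name: gap_bp(prev_stop, start)
    for (_, _, prev_stop), (name, start, _) in zip(cds, cds[1:])
}

for name, gap in gaps.items():
    print(f"{name}: {gap} bp")
```

Under this assumed convention the results agree with the GAP fields recorded for these three genes (11, -1, and 12 bp).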
CDS 9476 - 10027 /gene="13" /product="gp13" /function="major tail protein" /locus tag="Tallboi_13" /note=Original Glimmer call @bp 9476 has strength 9.86; Genemark calls start at 9476 /note=SSC: 9476-10027 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Arthrobacter phage London] ],,NCBI, q1:s1 99.4536% 5.85678E-123 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.235, -2.0720764396375664, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Arthrobacter phage London] ],,QOP64316,96.7213,5.85678E-123 SIF-HHPRED: YSD1_22 major tail protein; Bacteriophage tail, helical assembly, VIRAL PROTEIN; 3.5A {Bacteriophage sp.},,,6XGR_M,92.3497,98.5 SIF-Syn: Major tail protein, upstream gene is tail terminator, downstream gene is tail assembly chaperone, just like in phage Adolin. /note=Primary Annotator Name: Sheppy, Tyler /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 9476. The start codon for that start site is ATG. /note=Coding Potential: There is reasonable coding potential predicted within the ORF according to both the Host-Trained GeneMark and the Self-Trained GeneMark. The chosen start site covers all of this coding potential. /note=SD (Final) Score: The Final Score for the start site is -2.072, which is the best (least negative) of all of the other possible start sites. The Z-score is the highest with 3.235. /note=Gap/overlap: There is a gap of 12bp between the start of this gene and the end of the upstream gene which is a reasonable size. /note=Phamerator: As of January 10, 2022, this gene is found in pham 88338. This pham is seen in other phages in the AZ cluster, such as phage Adolin and phage Amyev. The function called was major tail protein and it is consistent with other genes in this pham. /note=Starterator: There is a reasonable start site conserved among members of the pham. Start site number 18 is the most conserved and it corresponds to 9476 on the phage. 75/114 non-draft genes call this start site. 
/note=Location call: This is a real gene that has a start site of 9476. This start site covers all of the coding potential predicted in both the Host-Trained and Self-Trained GeneMark and is predicted by both Glimmer and GeneMark. This start is also predicted by Starterator. /note=Function call: The function of this gene is a major tail protein. There are strong hits in both the NCBI BLAST and the PhagesDB BLAST, as well as in HHpred. The top hit in NCBI BLAST has a 94.5355% identity, 96.7213% alignment, 99.4536% coverage, and an e-value of 1.12797e-123. The top hit in HHpred has a 92.3497% coverage and an e-value of 0.000014. This function is also called in other genes in the same pham, according to Phamerator. CDD was not informative. /note=Transmembrane domains: There are no transmembrane domains predicted by either TOPCONS or TmHmm. This gene is not a transmembrane protein. /note=Secondary Annotator Name: Dooley, Naomi /note=Secondary Annotator QC: I have QC`d this gene and agree with the primary annotator. CDS 10118 - 10387 /gene="14" /product="gp14" /function="tail assembly chaperone" /locus tag="Tallboi_14" /note=Genemark calls start at 10118 /note=SSC: 10118-10387 CP: yes SCS: genemark ST: SS BLAST-Start: [tail assembly chaperone [Arthrobacter phage Iter] ],,NCBI, q1:s1 100.0% 1.02741E-51 GAP: 90 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.235, -1.9310779259753799, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Iter] ],,URQ05002,96.6292,1.02741E-51 SIF-HHPRED: SIF-Syn: This gene shows good synteny with AZ phages Adolin, Adumb2043, Asa16, and Crewmate. Specifically, the architecture of this gene (pham 84303) overlapping with downstream genes of pham 94639, which is also listed as a tail assembly chaperone, is very highly conserved. Since these genes share a function, it makes sense that they overlap. 
Upstream synteny is not as well-conserved, but the large gap between this gene and the gene before it is present in all the phages listed. /note=Primary Annotator Name: Stephenson, Juliet /note=Auto-annotation: GeneMark calls start site @ 10118, codon ATG /note=Coding Potential: There is coding potential present in the second ORF (Host and Self-Trained GeneMark), and the selected start codon does include all the coding potential. /note=SD (Final) Score: This Start Site does have the best Final score (-1.931) as well as a good Z-score (3.235). This start site gives the LORF. /note=Gap/overlap: The gap between this gene and the one upstream is 90 bp, which is too small to accommodate a full gene. The length of the gene is 270 bp, which is reasonable. /note=Phamerator: Pham 95622 on 1/10/22. Common in cluster AZ, mostly other tail assembly chaperones, which is consistent with function call. /note=Starterator: Most annotated start is called. 30/32 non-draft phages call site 6, which corresponds to 10118 in Tallboi. /note=Location call: Based on the good coding potential in the forward direction and the number of hits in Phagesdb and NCBI Blast, as well as synteny with other phages in Cluster AZ, this is a real gene with start site 10118. /note=Function call: NCBI Blast calls up several matches for tail assembly chaperone, with perfect (100%) query coverage, high % identity (>90%), and low e-values. /note=Transmembrane domains: No transmembrane domains present according to TmHmm. /note=Secondary Annotator Name: Wu, Meigan /note=Secondary Annotator QC: Overall, great job! Here are my suggestions: Explain why the gap size is reasonable, since gap sizes of more than 7 bp are considered large. I would include the year when noting the date, just for future reference. I would also specifically note that Glimmer did not call a start site and note whether the coding potential is found on a forward or reverse strand. 
Your location call mentions HHpred evidence, but HHpred did not yield any reliable hits, so I would take this out or clarify this. Not sure if this matters, but the annotation lab manual says to just record 2-3 NCBI BLASTp hits and 2 PhagesDB BLASTp hits, so you can reduce the amount of hits you selected for PhagesDB BLASTp and NCBI BLASTp. Don`t forget to note if downstream synteny is present in the synteny box as well. CDS join(10118..10381,10381..10734) /gene="15" /product="gp15" /function="tail assembly chaperone" /locus tag="Tallboi_15" /note= /note=SSC: 10118-10734 CP: yes SCS: neither ST: NI BLAST-Start: [tail assembly chaperone [Arthrobacter phage Elezi] ],,NCBI, q1:s1 100.0% 3.4897E-129 GAP: -270 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.235, -1.9310779259753799, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Elezi] ],,YP_010677992,93.6585,3.4897E-129 SIF-HHPRED: SIF-Syn: CDS 10731 - 13055 /gene="16" /product="gp16" /function="tape measure protein" /locus tag="Tallboi_16" /note=Original Glimmer call @bp 10731 has strength 6.58; Genemark calls start at 10731 /note=SSC: 10731-13055 CP: yes SCS: both ST: SS BLAST-Start: [tape measure protein [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 0.0 GAP: -5 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.235, -2.0720764396375664, yes F: tape measure protein SIF-BLAST: ,,[tape measure protein [Arthrobacter phage London] ],,QOP64319,96.77,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,75.969,99.6 SIF-Syn: Tape measure protein, upstream gene is tail assembly chaperone, downstream gene appears to be minor tail protein, just like in final phage Adolin. /note=Primary Annotator Name: Erfanian, Kiana /note=Auto-annotation: Glimmer and GeneMark. Both called the same start site of 10731. 
/note=Coding Potential: This gene has good coding potential within the putative ORF, and covers all this coding potential. /note=SD (Final) Score: The RBS Final Score of -2.072 for the original start at 10731 is the highest of the suggested starts, and is therefore reasonable. The Z-score for this start however, is the highest at 3.224 and therefore the best score. /note=Gap/overlap: This gene has an overlap of 8 bp with its downstream gene because its stop is at 13055, whereas the start of the downstream gene is 13048. /note=Starterator: Start 6 (10731) called 37/55 times, 100% of time when present, most annotated. /note=Location call: The evidence gathered indicates that the suggested start site of 10731 as called by Glimmer and GeneMark appears to be the most probable site. /note=Function call: Tape Measure protein. Both PhagesDB BLASTp and NCBI BLASTp have several hits with low e values, high identity percentages, and reasonable scores; all of which serve as strong evidence for the gene’s function. The top non-draft hits on PhagesDB BLASTp are all for genes of the same cluster as the putative gene (AZ), and all have the lowest possible e-value of zero. Furthermore, the top non-draft hit on NCBI BLASTp was also for a gene from a phage (named Elezi) in the same cluster as Tallboi. This hit also has an E-value of zero, a reasonable score, as well as a very high identity percentage of 91.47%. Each of these hits on both PhagesDB BLASTp and NCBI BLASTp have the function of tape measure protein. Top hits for HHPRED alignments further point to tape measure protein as the gene function. The top hit on CDD lists a gene of 57.1059% coverage with the function of tape measure protein. Given the above data, there is enough evidence to conclude that this is indeed the function of this gene. /note=Transmembrane domains: TmHmm predicts 8 TMHs. No hits from TopCons. The protein is most likely a transmembrane protein. 
/note=Secondary Annotator Name: McLinden, Katherine /note=Secondary Annotator QC: Looks good! Just a few things, make sure to check all the evidence you used. Additionally, if it is a transmembrane protein, can it also be a tape measure protein? CDS 13048 - 13923 /gene="17" /product="gp17" /function="minor tail protein" /locus tag="Tallboi_17" /note=Original Glimmer call @bp 13048 has strength 6.43; Genemark calls start at 13048 /note=SSC: 13048-13923 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 0.0 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.836, -2.7653702713535186, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage London] ],,QOP64320,96.5636,0.0 SIF-HHPRED: Distal Tail Protein, gp58; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_BD,97.2509,100.0 SIF-Syn: minor tail protein, upstream gene is tape measure protein, downstream gene is minor tail protein, just like in phages Asa16, Niobe, Adumb2043 and others from the same cluster AZ /note=Primary Annotator Name: Khaine, Aye Myat /note=Auto-annotation: Both Glimmer and GeneMark agree and call the start site at 13048. /note=Coding Potential: The ORF has reasonable coding potential. The start at 13048 covers all coding potential. /note=SD (Final) Score: The final SD score is -2.765 with a z-score of 2.836. This is the best score. /note=Gap/overlap: There is an overlap upstream of 8 bp. /note=Phamerator: This gene belongs to pham 93352. This pham has 22 members with 18 non-draft phages from the same Cluster AZ as Tallboi. /note=Starterator: Start number 3 with start site 13048 is the most annotated start and has been manually annotated for 18 times. This start agrees with Glimmer and GeneMark. /note=Location call: Based on the evidence, this gene is real with the start at 13048. /note=Function call: Minor tail protein. 
Both phagesdb and NCBI protein BLAST hits have minor tail protein as function (phagesdb: e-values of 1e-156 and NCBI: e-values of 0, 100% coverage and 90%+ identity). HHpred hits correspond to distal tail protein with e-values of <2.6e-26, 100% probability and 97%+ identity. /note=Transmembrane domains: This is not a membrane protein. Both TMHMM and TOPCONS predicted no TMDs. /note=Secondary Annotator Name: Ostroske, Elyse /note=Secondary Annotator QC: I have QC`d this gene and agree with the primary annotator. CDS 13934 - 14932 /gene="18" /product="gp18" /function="minor tail protein" /locus tag="Tallboi_18" /note=Original Glimmer call @bp 13934 has strength 4.58; Genemark calls start at 13934 /note=SSC: 13934-14932 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Lego]],,NCBI, q1:s1 99.6988% 0.0 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.224, -2.033982896655645, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Lego]],,QIN94418,92.7273,0.0 SIF-HHPRED: Receptor Binding Protein; beta sandwich domain, phage receptor binding protein, Lactococcus lactis pellicle cell wall polyphosphosaccharide, VIRAL PROTEIN; 1.75A {Lactococcus phage 1358},,,4L9B_A,54.8193,99.4 SIF-Syn: Minor tail protein. In Asa16 gene 18 is equivalent to Tallboi gene 19. Both genes have minor tail protein functions and are surrounded upstream and downstream by minor tail proteins. Phives gene 19 and Tallboi gene 19 are equivalent and both have minor tail protein functions. They are also surrounded upstream and downstream by minor tail proteins. /note=Primary Annotator Name: McLinden, Katherine /note=Auto-annotation: Glimmer and GeneMark both call the same start codon at 13934 with a Glimmer score of 4.58. The start codon at this start site is ATG. /note=Coding Potential: Both GeneMark and Glimmer suggested coding potential at the putative ORF. The coding potential covers all of the start site and stop site. 
/note=SD (Final) Score: -2.034. This is the best Final Score. /note=Gap/overlap: The gap for this start site is 10 base pairs. This is the most likely site because the other start sites show gaps larger than 136 base pairs. /note=Phamerator: The pham for the gene is 15182. The date is 1/10/22. There are 22 members with 2 being singletons and the rest also in the AZ cluster. /note=Starterator: Tallboi calls the most annotated start. Start site 4 was called 100% of the time when present. Start 4 in Tallboi is 13934, which was suggested by Glimmer and GeneMark. /note=Location call: This is most likely a real gene. Based on Glimmer and GeneMark auto annotation and the coding potential, it is likely that 13934 is the start site for this gene. /note=Function call: The majority of hits in PhagesDB function frequency show up as a minor tail protein. PhagesDB BLASTp shows minor tail protein as at least the top ten hits with extremely low E-values (lowest being 1e-179). These hits are also in the same cluster. At least the first 10 entries for NCBI BLASTp also have minor tail protein being the function with E-values of 0. CDD had no viable evidence. Based on this collective evidence, there is a high likelihood that this gene has a minor tail protein gene function. /note=Transmembrane domains: There were no TMD`s called and no other evidence to suggest a TMD function within the other databases. /note=Secondary Annotator Name: Santos, Charysa /note=Secondary Annotator QC: There is sufficient evidence that this is a real gene with a minor tail protein function and a start site at 13934. 
CDS 14933 - 16054 /gene="19" /product="gp19" /function="minor tail protein" /locus tag="Tallboi_19" /note=Original Glimmer call @bp 14933 has strength 8.01; Genemark calls start at 14933 /note=SSC: 14933-16054 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 0.0 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.836, -3.1164791313608173, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage London] ],,QOP64322,97.8552,0.0 SIF-HHPRED: Tail protein, 43 kDa; tail protein, structural genomics, PSI, MCSG, Protein Structure Initiative, Midwest Center for Structural Genomics, UNKNOWN FUNCTION; 2.1A {Neisseria meningitidis MC58} SCOP: b.106.1.1,,,3D37_A,93.2976,99.7 SIF-Syn: Minor tail protein (pham 57436), upstream gene is minor tail protein (pham 15182), downstream is minor tail protein (pham 94340), just like in phage Adumb2043 and others. /note=Primary Annotator Name: Uvarov, Evgeniy /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 14933 (site 1) with a common ATG start codon. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF by both Glimmer and GeneMark. The chosen start (site 1) covers all the coding potential. This ORF only has forward strand coding potential, thus this is a forward gene. /note=SD (Final) Score: Start site 1 has a final score of -3.116 and a good Z-score of 2.836. This start site has the best (highest) values on PECAAN for this gene. /note=Gap/overlap: Start (site 1) has the smallest gap of 0 bp indicative of a possible operon. This start site creates the LORF with a gene length of 1122 bp which is good. /note=Phamerator: The pham number as of 1/7/2022 is 57436. The gene is conserved in phages Adumb2043 (AZ), Amyev (AZ), and Asa16 (AZ), as well as other non draft phages. Any of these genomes can be used for comparison since they are all non-draft. 
Based on PhagesDB the function call for the gene is a minor tail protein. This is on the approved SEA-PHAGES list. /note=Starterator: Based on the 1/7/2022 run the most annotated start site 3 is a reasonable choice that is conserved among members of pham 57436. There are 23 members total with 20 being non-draft. 18/20 of non-draft members and 3/3 draft members call start site 3, which correlates to 14933 (site 1) for Tallboi. /note=Location call: Considering all of the evidence above, this gene is a real gene that is conserved in phamerator as well as starterator, has good coding potential and covers all of it with a start site at 14933 (site 1). Starterator agrees with Glimmer and Genemark. /note=Function call: Minor tail protein. PhagesDB BLASTp top three non-draft hits ranked by score with known functions are minor tail protein and so are the great majority of the top hits. The overall top result is Adumb2043 (e: 0, id: 95%), second is Asa16 (e: 0, id: 95%), and third is Eraser (e: 0, id: 95%). All these phages are in cluster AZ, same as Tallboi. Phagesdb Function Frequency shows minor tail protein with the highest total frequency (26%) for the AZ cluster that Tallboi is part of, as well as all other clusters. NCBI BLASTp shows three strong hits with zero e-values that are minor tail protein in Adumb2043 (e: 0, id: 95.7105%, cov: 100%), London (e: 0, id: 95.1743%, cov: 100%), and Elezi (e: 0, id: 94.9062%, cov: 99.7072%). There are no strong CDD hits. HHpred three top hits were all closely related to the structure and function of minor tail protein. The top hit was PF14594.9 (prob: 99.9%, cov: 93.2976%, e: 1.3e-21) a Siphovirus protein related to phage endopeptidase tail proteins. The second hit was 3D37_A (prob: 99.7%, cov: 93.2976%, e: 5.5e-14) a tail protein from Neisseria meningitidis. The third hit was 3CDD_D (prob: 99.7%, cov: 93.5657%, e: 5e-14) a tail protein from Shewanella oneidensis. 
/note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore this is not a membrane protein. /note=Secondary Annotator Name: Sheppy, Tyler /note=Secondary Annotator QC: I agree with this annotation. All of the evidence was considered when making the location call (start: 14933) and the function call (minor tail protein). CDS 16061 - 18886 /gene="20" /product="gp20" /function="minor tail protein" /locus tag="Tallboi_20" /note=Original Glimmer call @bp 16061 has strength 6.28; Genemark calls start at 16061 /note=SSC: 16061-18886 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage London]],,NCBI, q15:s15 98.5122% 0.0 GAP: 6 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.812, -3.7230958680478965, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage London]],,QOP64323,85.5026,0.0 SIF-HHPRED: ENDO-1,4-BETA-XYLANASE Y; CARBOHYDRATE-BINDING MODULE, XYLAN-BINDING, XYLANASE; HET: MSE; 2.1A {CLOSTRIDIUM THERMOCELLUM} SCOP: b.18.1.7,,,1DYO_B,16.5781,97.1 SIF-Syn: minor tail protein (pham 94340), upstream gene is minor tail protein (pham 57436), downstream gene has NKF (pham 55453) like Amyev, Eraser, Elezi, London, Asa16, Niobe, Phives, Adumb2043 and Yang. /note=Primary Annotator Name: Alvarez, Alondra /note=Auto-annotation: Glimmer and GeneMark agree on the start site 16061. /note=Coding Potential: There is Host-Trained GeneMark potential, but it is slightly outside of the ORF’s stop site and start site at 16061. However, in Self-Trained GeneMark the coding potential is included in the start site at 16061. Coding potential is found on forward strand, indicating that the gene is a forward gene. /note=SD (Final) Score: The chosen start site has a z-score of 2.812 and final RBS score of -3.723. These are the best scores amongst the other possible start sites. The ORF is also the longest. Its ATG start codon is common and likely to be used. 
/note=Gap/overlap: There is a gap of 6 bp between this gene and the gene upstream, the smallest gap of the potential start sites. The gap is not large enough to consider the addition of another gene. /note=Phamerator: As of 1/10/22 the gene belongs to pham 94340. Of the 18 members, the 16 non-draft genes are all minor tail proteins belonging to the AZ cluster. /note=Starterator: The most conserved start site is number 1 corresponding to base pair coordinate 16061 in Tallboi. It is annotated in 16 of the 16 non-draft genes in the pham. /note=Location call: Based on the coding potential, conservation of genome architecture with other non-draft genes within cluster AZ, and many statistically significant phagesDB BLAST hits, we can determine that this gene is “real.” The chosen start site is 16061. /note=Function call: Minor tail protein. Many strong, low e-values returned by NCBI BLASTp for minor tail proteins. Several of these hits have query coverage above 90% and identity percentages above 70%. Similarly, phagesDB BLASTp also returned many low e-values for minor tail protein function. Query coverage percentages were consistent with those of NCBI BLASTp. No output was returned by the CDD program. E-values of HHpred outputs were not low enough to be considered informative. /note=Transmembrane domains: No transmembrane domains are called by TmHmm nor Topcons. The gene is not a membrane protein. /note=Secondary Annotator Name: Stephenson, Juliet /note=Secondary Annotator QC: I agree with this annotation and find that all evidence has been considered. 
CDS 18896 - 19258 /gene="21" /product="gp21" /function="hypothetical protein" /locus tag="Tallboi_21" /note=Original Glimmer call @bp 18896 has strength 5.56; Genemark calls start at 18896 /note=SSC: 18896-19258 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_POWERPUFF_21 [Arthrobacter phage Powerpuff] ],,NCBI, q1:s1 94.1667% 2.68497E-58 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.235, -2.0111200136961407, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_POWERPUFF_21 [Arthrobacter phage Powerpuff] ],,QGZ17319,92.9204,2.68497E-58 SIF-HHPRED: SIF-Syn: NKF (pham 55453), upstream gene is minor tail protein (pham 94340), downstream gene is pham 82310, just like in phage Kaylissa. /note=Primary Annotator Name: Baughman, Lexie /note=Auto-Annotation: Glimmer and Genemark. Both agree on the same start site of 18896, with a start codon of ATG. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF. Chosen start site of 18896 covers all of the coding potential. /note=SD (Final) Score: The SD (Final) Score is -2.011, and the Z-score is 3.235. These are the best out of all the listed possible start sites. /note=Gap/Overlap: There is a 9 base pair gap with the upstream gene, which is nearly acceptable. This start site creates the longest ORF and the length of the gene is acceptable. /note=Phamerator: As of 01/10/2022, the gene is found in Pham 55453. The pham is conserved in other members of the cluster - comparison was done between Tallboi and a few other non-draft genomes, including Kaylissa and Lizalica. The function is not called by either Phamerator or PhagesDB. /note=Starterator: The “Most Annotated” start site (10) is present in 16 of 23 non-draft genes in this pham, and it is present in Tallboi. This start site corresponds to base pair position 18896, which is the auto-annotated start site. 
/note=Location Call: The gathered evidence suggests that this is a real gene and that its start site is likely at position 18896. /note=Function Call: The top 2 NCBI BLASTp hits’ suggested function is hypothetical protein with high query coverage (94%), high % identity (81.42% and 82.30%), and low e-values (<1e-57). The top PhagesDB BLASTp hits’ suggested function is unknown, with high % identities (>82%) and low e-values (<6e-48). Hits in PhagesDB with suggested functions have high e-values. Similarly, the CDD and HHpred hits were uninformative, with very high e-values and low probabilities and coverages. As such, there does not seem to be enough evidence to call the function of this gene. /note=Transmembrane Domains: No predicted transmembrane domains. /note=Secondary Annotator Name: Thorp, Jocelyn /note=Secondary Annotator QC: Based upon the evidence, I agree with the annotation above. CDS 19248 - 19517 /gene="22" /product="gp22" /function="membrane protein" /locus tag="Tallboi_22" /note=Genemark calls start at 19248 /note=SSC: 19248-19517 CP: yes SCS: genemark ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Powerpuff] ],,NCBI, q6:s6 94.382% 2.24198E-47 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.085, -4.394817490717228, no F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Powerpuff] ],,QGZ17320,88.764,2.24198E-47 SIF-HHPRED: SIF-Syn: Upstream gene is minor tail protein, downstream is membrane protein, just like in Asa16, Adumb2043, Amyev and other phages in Cluster AZ /note=Primary Annotator Name: Dooley, Naomi /note=Auto-annotation: The gene is called only by GeneMark at the start site 19248 with a GTG start codon. This is a very common start codon so there is reason to believe that this may be the correct start site. /note=Coding Potential: The gene has reasonable coding potential that is covered by the start site. /note=SD (Final) Score: The SD score was reasonable at -4.395 which is the best score listed. 
The z-score of 2.085 (>2) suggests that this may be the correct start site. This Z-score was the best score listed. /note=Gap/overlap: There is a reasonable gap of -11bp which is the smallest gap listed. This is a bit of a significant overlap that is not usually indicative of an operon. /note=Phamerator: The pham number as of 1/12/2022 is 82310 and is found in Adumb2043 /note=Starterator: The start number is 11 which corresponds to a 19248 start site /note=Location call: Based on the data it appears that this might not be a real gene (not entirely sure; not called by Glimmer, a bit of an overlap in genes but has synteny) /note=Function call: Based on the phages Adumb2043 and Powerpuff, it is safe to conclude that the function of this gene is unknown. The top two BLAST hits on PhagesDB have E-values of 2e-37 and 5e-37 respectively; additionally, they both have high scores of 152 and 151, both of which have an unknown function listed (both have high coverage, 94.382% for both, 65%+ identity and e-value of 4.2693e-48 and 1.576e-47 respectively). HHpred has unknown function listed with 97.7% probability, 87%+ coverage, and E-value 0.0029. CDD contains no hits. /note=Transmembrane domains: TMHMM predicted one TMD while TOPCONS did not predict any. /note=Secondary Annotator Name: Zhuang, Chuzhi /note=Secondary Annotator QC: You should also include the pham number of genes in the synteny box. I think it is a real gene because of reasonable coding potential as well as synteny with other phages` genomes. Good job! 
CDS 19526 - 19774 /gene="23" /product="gp23" /function="membrane protein" /locus tag="Tallboi_23" /note=Original Glimmer call @bp 19526 has strength 11.62; Genemark calls start at 19526 /note=SSC: 19526-19774 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 9.32968E-33 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.323, -3.9166971242758706, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage DrManhattan] ],,YP_009815365,82.9268,9.32968E-33 SIF-HHPRED: SIF-Syn: Tallboi genes 23 (upstream), 24, and 26 (downstream) show synteny with phage Amyev, which is a member of cluster AZ. Tallboi gene 23 shows synteny with Amyev gene 22, and both are members of pham 82310. Tallboi gene 24 corresponds to Amyev gene 23, and both are a part of pham 10993. Synteny is also observed between Tallboi gene 26 and Amyev gene 24, which are both members of pham 56585. /note=PECAAN Notes /note=Primary Annotator Name: Wu, Meigan /note=Auto-annotation: Start site at 19526 is called by both Glimmer and Genemark. /note=Coding Potential: Coding potential found on Genemark Self and GeneMark Host in both a forward reading frame and reverse reading frame. Not all coding potentials cover the predicted start site. Although the coding potential evidence is not concrete, this autoannotation shows synteny with phage Crewmate (AZ) and phage Amyev (AZ) forward genes 24 and 23, respectively, and this gene sequence has been recognized by PhagesDB BLAST in Crewmate and Amyev as well. Therefore, there is significant evidence showing that this gene is likely a forward gene. /note=SD (Final) Score: The final score is -3.917. This is the best final score on PECAAN. /note=Gap/overlap: Gap size is 8, which is a reasonable length. /note=Phamerator: As of January 9, 2022, the pham number is 10993. Gene conservation in phages Adolin, Amyev, and DrManhattan, which are all members of Tallboi’s cluster (AZ). 
No function call is made for this gene. /note=Starterator: Pham 10993 has 24 non-draft members. Start number 4 was called in 10/24 non-draft genes, which makes it the most frequently called start number. Start number 4 in Tallboi corresponds to the start site at 19526 bp, which agrees with the start site called by Glimmer and GeneMark. /note=Location call: This gene is likely a real gene with a start site at 19526. /note=Function call: Membrane protein. Multiple hits with e-values <= 3e-22 with function unknown on PhagesDB BLAST. 5 hits with e-values <= 1e-16 with membrane protein function on NCBI BLASTp. No CDD hits. HHpred yielded no reliable hits. /note=Transmembrane domains: TMHMM yielded 2 predicted TMDs. TOPCONS also provided TMD evidence. This evidence supports a membrane protein function for this gene, which is further supported by the functions associated with plausible NCBI BLASTp hits. /note=Secondary Annotator Name: Erfanian, Kiana /note=Secondary Annotator QC: Great synteny notes. Based on Tallboi's pham map, this gene does indeed appear to be forward. I understand your concern given the coding potential maps, but I believe your conclusion is valid. I'm not convinced on the function though, since there isn't much evidence across all programs examined. I would personally think NKF, but you should check with the professor to get another opinion! Also, if you decide to go with membrane protein, you should uncheck the unknown function hits on PhagesDB. Everything else looks great. 
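The gap and overlap values quoted in these notes follow directly from the CDS coordinates. Below is a minimal Python sketch, assuming the convention that the gap is the number of bases strictly between the upstream stop and this gene's start, so that a negative value is an overlap. The function name `gap_bp` is hypothetical and is not part of PECAAN.

```python
def gap_bp(upstream_stop: int, start: int) -> int:
    """Bases between two forward genes; a negative result is an overlap.

    Assumed convention: gap = start - upstream_stop - 1.
    """
    return start - upstream_stop - 1


# Gene 22 stops at 19517 and gene 23 starts at 19526 (coordinates from the CDS lines).
print(gap_bp(19517, 19526))  # 8, matching "GAP: 8 bp gap" for gene 23

# Gene 1 stops at 537 and its downstream gene starts at 534.
print(gap_bp(537, 534))      # -4, i.e. the 4 bp overlap described for gene 1
```

Under this convention a gene that starts on the last base of the upstream gene has a gap of -1.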
CDS 19849 - 21330 /gene="24" /product="gp24" /function="endolysin" /locus tag="Tallboi_24" /note=Original Glimmer call @bp 19849 has strength 5.82; Genemark calls start at 19849 /note=SSC: 19849-21330 CP: yes SCS: both ST: SS BLAST-Start: [endolysin [Arthrobacter phage London] ],,NCBI, q1:s1 99.1886% 0.0 GAP: 74 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.224, -2.4811409279978642, yes F: endolysin SIF-BLAST: ,,[endolysin [Arthrobacter phage London] ],,QOP64328,93.3735,0.0 SIF-HHPRED: d.118.1.1 (A:9-175) Peptidoglycan recognition protein, PGRP-S {Human (Homo sapiens) [TaxId: 9606]},,,d1ycka1,28.3976,99.7 SIF-Syn: Endolysin, upstream gene is membrane protein, downstream is deoxynucleoside monophosphate kinase, just like in phage Eraser /note=Primary Annotator Name: Magaling, Janelle /note=Auto-annotation: GeneMark and Glimmer both agree on 19849. Start is ATG. /note=Coding Potential: There is one forward ORF for both Host and Self GeneMark. The start site covers all coding potential. /note=SD (Final) Score: Final score is -2.481 which is not good. Z score is 3.224 which is good. /note=Gap/overlap: 286bp gap. This is concerning. Another gene can fit. Length of 1482 is big enough for a gene. Pham maps show synteny with a similar long gene with a gap in Amyev. /note=Phamerator: 1/9/22 pham 56585. 14/40 phages such as Eraser and London in pham is in cluster AZ like tallboi. Many of the genes with functions called for endolysin or lysin A which are found in the approved function list. /note=Starterator: The most annotated start site is 5 which is called by 21/37 non-draft phages but tallboi does not. Tallboi calls start site 12 @19849 which was called by 12/37 non-draft phages in the pham. /note=Location call: synteny shows that there is a similar gene in Eraser. Start site covers all coding potential. start is conserved in phamerator and starterator. This is a real gene. /note=Function call: Endolysin. 
phagesDB BLAST and NCBI BLAST had good hits (e value 0) with functions lysin A and endolysin. Cannot be lysin A since phage infects arthrobacter. HHpred has good hits (e value 6.6e-17, okay coverage 28%, high probability 99.7) for peptidoglycan recognition protein. CDD has good hits (e value 7.8e-11, coverage 22.5, identity 26.2) for a cleaving protein. /note=Transmembrane domains: no TMHs so cannot look at topcons /note=Secondary Annotator Name: McLinden, Katherine /note=Secondary Annotator QC: Looks good! I would just look at your HHPRED evidence that is checked since the one checked is for homo sapiens and not for a bacteriophage. Also, on NCBI BLAST I would make sure the check the evidence for Eraser since it has an E-value of 0 and is mentioned in your notes. CDS 21468 - 22082 /gene="25" /product="gp25" /function="deoxynucleoside monophosphate kinase" /locus tag="Tallboi_25" /note=Original Glimmer call @bp 21468 has strength 12.41; Genemark calls start at 21468 /note=SSC: 21468-22082 CP: yes SCS: both ST: SS BLAST-Start: [deoxynucleoside monophosphate kinase [Arthrobacter phage London]],,NCBI, q1:s1 100.0% 1.17303E-134 GAP: 137 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.235, -1.993391246735709, yes F: deoxynucleoside monophosphate kinase SIF-BLAST: ,,[deoxynucleoside monophosphate kinase [Arthrobacter phage London]],,QOP64329,97.549,1.17303E-134 SIF-HHPRED: c.37.1.1 (A:) Deoxynucleoside monophosphate kinase {Bacteriophage T4 [TaxId: 10665]},,,d1deka_,93.1373,99.9 SIF-Syn: Tallboi displays synteny with Amyev. Downstream gene is an endolysin, this gene is an deoxynucleoside monophosphate kinase, and the upstream gene for both has NKF. /note=PECAAN Notes /note=Primary Annotator Name: Ostroske, Elyse /note=Auto-annotation: Glimmer and GeneMark. Both called the start at 21468. /note=Coding Potential: Coding Potential for this gene is mainly found on the forward strand, indicating that this is a forward gene. 
Coding Potential was found on Host-trained GeneMark and GeneMark S. /note=SD (Final) Score: -1.993. It is by far the best final score on PECAAN for this gene. /note=Gap/overlap: 137 bp gap. This is quite large, but it is still the most reasonable gap out of the given start sites. Every other possible start site has a gap of greater than 500 bp. /note=Phamerator: As of 1/23/22, this gene is in pham 54294, which has 177 members, including many other AZ phages. /note=Starterator: The most annotated start site was 43 in 52/153 non-draft genomes. However, Tallboi does not have this start site. The start site called for Tallboi was 38, which was manually annotated in 31/153 non-draft genomes. Called 91.7% when it is present. This start site corresponds to 21468, the start site called by both Glimmer and GeneMark. /note=Location call: Despite the large gap, based on the above evidence, this is likely a real gene. The most likely start site is 21468, as it is called by both Glimmer and GeneMark, contains all the coding potential, has the best final score and the only non-absurd gap. /note=Function call: Deoxynucleoside monophosphate kinase. Many phages were checked as evidence in both PhagesDB BLAST and NCBI BLAST, and were either deoxynucleoside monophosphate kinases or adenylate kinases. However, the top hits in CDD and HHpred were all deoxynucleoside monophosphate kinases. /note=Transmembrane domains:TMDs were not predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: Uvarov, Evgeniy /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. 
Note: Just don`t forget to do the synteny box; plus consider checking "1DEK_B" as function evidence in HHPRED since its description is "DEOXYNUCLEOSIDE MONOPHOSPHATE KINASE" CDS 22181 - 22759 /gene="26" /product="gp26" /function="hypothetical protein" /locus tag="Tallboi_26" /note=Original Glimmer call @bp 22181 has strength 11.9; Genemark calls start at 22181 /note=SSC: 22181-22759 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE13_gp27 [Arthrobacter phage Elezi] ],,NCBI, q1:s1 99.4792% 2.83201E-118 GAP: 98 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.006, -7.4954095581460995, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE13_gp27 [Arthrobacter phage Elezi] ],,YP_010678005,93.75,2.83201E-118 SIF-HHPRED: SIF-Syn: Upstream genes 22 and 23 also have NKF, while downstream genes 33 and 34 also have NKF. /note=Primary Annotator Name: Santos, Charysa /note=Auto-annotation: Glimmer and GeneMark. Both start at 22181. It has a Glimmer score of 11.9. /note=Coding Potential: Coding potential in this ORF is found in the direct/forward sequence, thus this is a forward gene. Coding potential is found in GeneMark Host and Self. /note=SD (Final) Score: -7.495. It is not the best/least negative final score on PECAAN for this gene; however, this start site (@22181) minimizes the gap size, unlike the other start sites which have over 300+ bp gaps. /note=Gap/overlap: 98. Relatively small overlap compared to the other start sites, meaning there is less space for another gene in between the previous gene and this one. /note=Phamerator: As of 01/07/2022, This gene is found in Pham 2502 and is conserved in other members of cluster AZ, including YesChef and Kaylissa. /note=Starterator: The “Most Annotated” start site (14) is present in 31 of 36 non-draft genes in this pham, and is present in Tallboi. This start site corresponds to base pair position 22181, the auto-annotated start site. 
/note=Location call: Using the evidence I found, this is a real gene and the most probable start site is 22181. /note=Function call: The top NCBI BLASTp hits showed that this gene is likely a hypothetical protein with high query coverage (99% and 100%), high % identity (88% and 84%), and low e-values (1.99e-118 and 1.53e-113). PhagesDB BLASTp hits suggested that function is unknown, with low e-values (1e-92). CDD and HHpred hits were not informative (very high e-values and low probabilities/coverages). Thus, there is not enough evidence to call the function of this gene. /note=Transmembrane domains: There were no TMD predictions on either TMHMM or TOPCONS, so we can conclude that this gene is not a membrane protein. /note=Secondary Annotator Name: Alvarez, Alondra /note=Secondary Annotator QC: I have QC`ed this location call and agree with the first annotator. CDS 22973 - 23800 /gene="27" /product="gp27" /function="exonuclease" /locus tag="Tallboi_27" /note=Original Glimmer call @bp 22973 has strength 10.06; Genemark calls start at 22973 /note=SSC: 22973-23800 CP: yes SCS: both ST: SS BLAST-Start: [exonuclease [Arthrobacter phage Lego]],,NCBI, q1:s1 100.0% 0.0 GAP: 213 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.07, -2.356391881054909, yes F: exonuclease SIF-BLAST: ,,[exonuclease [Arthrobacter phage Lego]],,QIN94426,99.6364,0.0 SIF-HHPRED: Mitochondrial genome maintenance exonuclease 1; human MGME1, DNA complex, DNA exonuclease, DNA BINDING PROTEIN; 2.702A {Homo sapiens},,,5ZYT_C,84.7273,99.8 SIF-Syn: exonuclease, upstream gene does not have a known function, downstream gene is nucleoside deoxyribosyltransferase, just like phage Adolin. /note=Primary Annotator Name: Sheppy, Tyler /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 22973. The start codon is GTG. 
/note=Coding Potential: Both the Host-Trained and Self-Trained GeneMark show reasonable coding potential in the open reading frame and the chosen start site covers all of this potential. /note=SD (Final) Score: The Final Score is -2.356, which is the least negative of all of the candidates and the Z-score is 3.07, which is the highest of all of the candidates. /note=Gap/overlap: The gap between this gene and the upstream gene is 213 bp. There are no other candidates that reduce the gap. There is no coding potential in the gap between this gene and the upstream gene. /note=Phamerator: As of January 10, 2022, this gene is found in pham 95523. This pham is located in other phages in the AZ cluster, such as phage Adolin and phage Amyev. The functions called in this pham are variations of exonuclease. /note=Starterator: There is a reasonable start site that is conserved. The most conserved start site is number 36 and this corresponds to position 22973 on the phage. 32 of 106 non-draft genes call this start. /note=Location call: This is a real gene and it has a start site of 22973. This start site is predicted by both Glimmer and GeneMark and covers the coding potential in both the Host-Trained and Self-Trained GeneMark. This location is also confirmed by Starterator. /note=Function call: The function of this gene is exonuclease. The top hit in NCBI BLAST has an identity of 99.6364%, alignment of 99.6364%, 100% coverage, and an e-value of 0. The top hit in PhagesDB BLAST has an e-value of 1e-159. The top hit in HHpred has an e-value of 1.2e-18 and a coverage of 84.7273. Proteins in the same pham also call this function. CDD was not informative. /note=Transmembrane domains: There are no transmembrane domains predicted by TmHmm or TOPCONS. This is not a transmembrane protein. /note=Secondary Annotator Name: Baughman, Lexie /note=Secondary Annotator QC: I have QC`ed the location and functional calls and agree with the first annotator. 
In the Gap/Overlap section, I would mention that there is no coding potential upstream of the start site to demonstrate that a gene doesn`t need to be added. The PECAAN notes look good otherwise! CDS 23797 - 24162 /gene="28" /product="gp28" /function="nucleoside deoxyribosyltransferase" /locus tag="Tallboi_28" /note=Original Glimmer call @bp 23797 has strength 9.43; Genemark calls start at 23797 /note=SSC: 23797-24162 CP: yes SCS: both ST: SS BLAST-Start: [nucleoside deoxyribosyltransferase [Arthrobacter phage Powerpuff] ],,NCBI, q1:s1 100.0% 5.45848E-75 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.879, -5.813835316181514, yes F: nucleoside deoxyribosyltransferase SIF-BLAST: ,,[nucleoside deoxyribosyltransferase [Arthrobacter phage Powerpuff] ],,QGZ17325,96.6942,5.45848E-75 SIF-HHPRED: c.23.14.1 (A:9-160) Nucleoside 2-deoxyribosyltransferase {Trypanosome (Trypanosoma brucei) [TaxId: 5691]},,,d2f62a2,90.9091,99.7 SIF-Syn: This gene shows good synteny with fellow AZ phage Asa16. This gene appears to be a part of a well-conserved operon consisting of a Cas4 family exonuclease (pham 95523 1/10), this nucleoside deoxyribosyltransferase (pham 67497 1/10), and a LAGLIDADG endonuclease (pham 67874 1/10). This operon is also present in Adolin and Amyev, among others. /note=Primary Annotator Name: Stephenson, Juliet /note=Auto-annotation: GeneMark and Glimmer call start site @ 23797, codon GTG /note=Coding Potential: There is coding potential present in the first ORF (Host and Self-Trained GeneMark), and the selected start codon does include all the coding potential. /note=SD (Final) Score: This Start Site does have the best Final score (-5.814) as well as the best Z-score (1.879). /note=Gap/overlap: There is a 4 bp overlap, which indicates this gene is a part of an operon. /note=Phamerator: Pham 67497 on 1/10. 
Common in cluster AZ, mostly nucleoside deoxyribosyltransferase, though some other functions are present, such as deoxycytidylate deaminase and hydrolase /note=Starterator: Most annotated start is called. 23/34 non-draft phages call site 25, which corresponds to 23797 in Tallboi. /note=Location call: Based on the good coding potential and the number of hits in Phagesdb and HHPred, as well as synteny with other phages in Cluster AZ, this is a real gene with start site 23797. /note=Function call: NCBI Blast, PhagesDB Blast, and HHPred call up several matches with known nucleoside deoxyribosyltransferase. These have good (90%) query coverage, high % identity (>90%), and low e-values. Most other phages in cluster AZ call this gene as a nucleoside deoxyribosyltransferase. /note=Transmembrane domains: No transmembrane domains present according to TmHmm. /note=Secondary Annotator Name: Dooley, Naomi /note=Secondary Annotator QC: I have QC`d this gene and agree with the primary annotator. CDS 24159 - 24563 /gene="29" /product="gp29" /function="LAGLIDADG endonuclease" /locus tag="Tallboi_29" /note=Original Glimmer call @bp 24159 has strength 13.11; Genemark calls start at 24159 /note=SSC: 24159-24563 CP: yes SCS: both ST: SS BLAST-Start: [LAGLIDADG endonuclease [Arthrobacter phage Iter] ],,NCBI, q1:s1 100.0% 1.49177E-86 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.062, -2.4329349954350334, yes F: LAGLIDADG endonuclease SIF-BLAST: ,,[LAGLIDADG endonuclease [Arthrobacter phage Iter] ],,URQ05017,96.2687,1.49177E-86 SIF-HHPRED: I-CREI; ENDONUCLEASE, GROUP I MOBILE INTRON, INTRON HOMING, CHLOROPLAST DNA, LAGLIDADG MOTIF; 3.0A {Chlamydomonas reinhardtii} SCOP: d.95.2.1,,,1AF5_A,76.1194,99.5 SIF-Syn: Endonuclease from pham 67874. Upstream is a gene of pham 69497, downstream is a recombination directionality factor of pham 4822, just like in phage Adumb2043. 
/note=Primary Annotator Name: Thorp, Jocelyn /note=Auto-annotation: Glimmer and GeneMark agree with 24,159 as the start site. /note=Coding Potential: The gene has reasonable coding potential within the putative ORF in the forward direction. The selected start site encompasses all of the coding potential, and there is coding potential on both GeneMark Self and Host. /note=SD (Final) Score: -2.433; this is the best final score of all possible start sites. The Z-score (3.062) is also the best among all the possible start sites. /note=Gap/overlap: There is a 4 bp overlap with the previous gene. This amount of overlap is preferred by the ribosome and supports this as the start site. /note=Phamerator: As of 1/9/2022, this gene is in pham 67874. The pham is conserved within all phages of the AZ cluster. There is no function currently called for genes of this pham. /note=Starterator: Start site 11 is conserved in 26 of the 34 non-draft genes in the pham and it is called 100% of the time when present. This start site corresponds to 24,159 in Tallboi. /note=Location call: Based upon the evidence above, particularly the 4 bp overlap and good final score and Z-score, this is a real gene with the start site at 24,159. /note=Function call: LAGLIDADG-like endonuclease. BLASTp returned many hits with low e-values, indicating a strong match to other endonucleases within the AZ cluster. For example, from the NCBI BLASTp results, there is a 100% identity match with the endonuclease from phage Adumb2043 with 100% coverage and an e-value of 7.45556e-92. There was a match of 97% shared identity with an e-value of 2.32068e-89 with the LAGLIDADG endonuclease gene seen in AZ phages Elezi, London, and others. HHpred had some hits with other endonucleases, including a 99.5% probability match to DNA endonuclease I-CreI from Chlamydomonas reinhardtii, with 78.35% coverage and an e-value of 9.3e-13. CDD returned a LAGLIDADG-like domain, but there was a low percent identity match (19.73%) and high e-value (0.0000164). 
It is worth noting that this gene does not actually contain the LAGLIDADG sequence, but is very similar. /note=Transmembrane domains: No TMDs are predicted by TMHMM or TOPCONS, therefore it is not a membrane protein. /note=Secondary Annotator Name: Wu, Meigan /note=Secondary Annotator QC: Perhaps also note that there is minor coding potential observed in Genemark Self in the three reverse frames. Also, this is very specific, but the AZ cluster has 35 members, and the pham is called in 33 AZ phages, so I would edit your phamerator evidence to say most AZ phages instead of all AZ phages. There also are functions noted for this pham now in PhagesDB. I think the CDD hit is significant since a low e-value is < 10^-3 for CDD and HHpred, so I would select this as evidence. Maybe you should only click and select evidence pointing to the LAGLIDADG endonuclease function instead of evidence for just an endonuclease function in PECAAN where possible, if you are making a LAGLIDADG endonuclease function call. Otherwise, I agree with your call and evidence! There seems to be more evidence for the LAGLIDADG endonuclease function, but let`s see what SEA-PHAGES says! Great job! 
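The QC comment above treats an e-value below 10^-3 as low for CDD and HHpred. The sketch below applies that rule of thumb to e-values quoted in these notes. The cutoff is the secondary annotator's stated threshold, not a PECAAN setting, and `is_low_evalue` and the hit labels are hypothetical names used only for illustration.

```python
CUTOFF = 1e-3  # rule of thumb quoted by the secondary annotator for CDD and HHpred


def is_low_evalue(evalue: float, cutoff: float = CUTOFF) -> bool:
    """Return True when the e-value is strictly below the cutoff."""
    return evalue < cutoff


# E-values copied from the notes; the labels are illustrative only.
hits = {
    "gene 29 CDD LAGLIDADG-like domain": 0.0000164,
    "gene 29 HHpred I-CreI": 9.3e-13,
    "gene 22 HHpred hit": 0.0029,
}

for label, evalue in hits.items():
    print(f"{label}: low e-value = {is_low_evalue(evalue)}")
```

By this cutoff the gene 29 CDD hit counts as low, which is the reviewer's point, while the gene 22 HHpred e-value of 0.0029 does not.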
CDS 24696 - 25409 /gene="30" /product="gp30" /function="recombination directionality factor" /locus tag="Tallboi_30" /note=Original Glimmer call @bp 24696 has strength 14.72; Genemark calls start at 24696 /note=SSC: 24696-25409 CP: yes SCS: both ST: SS BLAST-Start: [recombination directionality factor [Arthrobacter phage Powerpuff] ],,NCBI, q1:s1 100.0% 1.74849E-163 GAP: 132 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.21, -4.423424636048095, no F: recombination directionality factor SIF-BLAST: ,,[recombination directionality factor [Arthrobacter phage Powerpuff] ],,QGZ17327,97.4684,1.74849E-163 SIF-HHPRED: Gp3-like ; Recombination directionality factor-like,,,PF18897.3,88.1857,100.0 SIF-Syn: Recombination directionality factor (from pham 4822), upstream gene is from pham 67874, downstream is from pham 9039, just like in phage Amyev /note=Primary Annotator Name: Zhuang, Chuzhi /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site at 24696; the start codon is ATG. /note=Coding Potential: The gene has reasonable coding potential, and the chosen start site covers all of it. /note=SD (Final) Score: -4.423. It is not the best final score, but it is very close to the best one, which is -4.190. /note=Gap/overlap: 132 bp gap. This is the smallest gap among all possible start sites, and this start gives the longest open reading frame. /note=Phamerator: As of 1/11/2022, this gene is in pham 4822. The gene is conserved in other phages in cluster AZ; Amyev was used for comparison. No function specified. /note=Starterator: The conserved start site in the pham is 25, and it corresponds to 24696 in Tallboi. 42 of 85 final genes call site 25. /note=Location call: This is a real gene with a start at 24696. /note=Function call: Recombination directionality factor. NCBI BLAST, PhagesDB BLAST, and HHpred returned many recombination directionality factor hits with very low E-values. No CDD hits were returned. /note=Transmembrane domains: No transmembrane domains were predicted. 
/note=Secondary Annotator Name: Lee, Adrienne /note=Secondary Annotator QC: I agree with the location call and function call. Don't forget to fill out the coding potential menu. CDS 25409 - 25546 /gene="31" /product="gp31" /function="membrane protein" /locus tag="Tallboi_31" /note=Original Glimmer call @bp 25409 has strength 9.01; Genemark calls start at 25409 /note=SSC: 25409-25546 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Adolin]],,NCBI, q1:s1 100.0% 5.85334E-16 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.094, -4.455662434380964, no F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Adolin]],,QHB36611,91.1111,5.85334E-16 SIF-HHPRED: SIF-Syn: No known function, upstream gene is recombination directionality factor, downstream is NKF, just like in phage Adolin. /note=Primary Annotator Name: Erfanian, Kiana /note=Auto-annotation: Glimmer and GeneMark. Both called the same start site of 25409. /note=Coding Potential: This gene has good coding potential within the putative ORF, and the start site covers all of this coding potential. /note=SD (Final) Score: The RBS Final Score of -4.456 for the original start at 25409 is not the highest of the suggested starts, but is still reasonable. The Z-score for this start is the second highest at 2.094, and is a reasonable score. /note=Gap/overlap: This gene has an overlap of 1 bp with its upstream gene, given that the upstream gene stops at 25409, the same position as the suggested start. This overlap suggests that the gene may be part of an operon. /note=Phamerator: This gene was found in pham 9309, which has 25 members, two of which are drafts. Additionally, all phages in this pham belong to cluster AZ. /note=Starterator: Using information from the Starterator analysis run most recently on 1/7/22, it was found that the most conserved start site number is 3. The auto-annotated start is called at start number 3 (25409), which matches the most conserved start. 
Phage Tallboi’s track marks start site 3 with a yellow line, which denotes it as an auto-annotated start. Start site 3 in Tallboi’s track corresponds to that of other phages in the cluster, such as phages DrSierra and Yang. Start site 3 has been determined to be the Final Human Annotated start, as represented by a green line on the track representing these phages. The analogous start site between Tallboi and other phages in this cluster is therefore promising, indicating that the auto-annotated start site 3 at 25409 is indeed correct. /note=Location call: The evidence gathered indicates that the suggested start site of 25409 as called by Glimmer and GeneMark appears to be the most probable site. /note=Function call: No known function. Every single hit on PhagesDB BLASTp lists genes of unknown function. NCBI BLASTp has a few hits with known functions, but they are not top hits. Many of the top hits are hypothetical proteins, including the first hit. The same is true for HHPRED alignments. Given the above data, there is enough evidence to conclude that the function of this gene is indeed unknown. /note=Transmembrane domains: Only one TMD called by TmHmm, and none by TOPCONS. The protein is not a membrane protein. /note=Secondary Annotator Name: Magaling, Janelle /note=Secondary Annotator QC: The 1 bp overlap suggests this is part of an operon, so it's okay that the final score isn't the best. I wouldn't check the HHpred evidence since those hits have high e-values. Otherwise looks good! 
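The `RBS:` field on each CDS line carries the scores discussed under "SD (Final) Score". The following Python sketch assumes the field order is scoring matrix, spacing matrix, Z-score, final score, and a trailing yes/no flag. That order is inferred by matching the field against the values quoted in the notes (Z-score 2.094 and final score -4.456 for gene 31), not taken from PECAAN documentation. The meaning of the trailing flag is not stated in these notes, so it is kept as a raw string. `RbsScores` and `parse_rbs` are hypothetical names.

```python
from typing import NamedTuple


class RbsScores(NamedTuple):
    scoring_matrix: str   # e.g. "Kibler 6" (assumed meaning)
    spacing_matrix: str   # e.g. "Karlin Medium" (assumed meaning)
    z_score: float
    final_score: float
    flag: str             # meaning not stated in the notes; kept verbatim


def parse_rbs(field: str) -> RbsScores:
    """Split the comma-separated value of an RBS field into typed parts."""
    parts = [p.strip() for p in field.split(",")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 comma-separated values, got {len(parts)}")
    return RbsScores(parts[0], parts[1], float(parts[2]), float(parts[3]), parts[4])


# Value copied from the gene 31 CDS line.
rbs = parse_rbs("Kibler 6, Karlin Medium, 2.094, -4.455662434380964, no")
print(round(rbs.final_score, 3), rbs.z_score)  # -4.456 2.094
```

Reading the two scores from the field this way makes it easy to check that a note quotes the final score and the Z-score in the right order.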
CDS 25621 - 25971 /gene="32" /product="gp32" /function="hypothetical protein" /locus tag="Tallboi_32" /note=Original Glimmer call @bp 25621 has strength 9.8; Genemark calls start at 25621 /note=SSC: 25621-25971 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE12_gp31 [Arthrobacter phage Adumb2043] ],,NCBI, q1:s1 100.0% 5.70626E-76 GAP: 74 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.081, -2.253486910374644, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE12_gp31 [Arthrobacter phage Adumb2043] ],,YP_010677941,98.2759,5.70626E-76 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (pham 9039), downstream is NrdH-like glutaredoxin, just like in phages Niobe, London and others from the same cluster AZ /note=Primary Annotator Name: Khaine, Aye Myat /note=Auto-annotation: Both Glimmer and GeneMark agree to call the start site at 25621. /note=Coding Potential: The ORF has a good coding potential in the forward direction and the start at 25621 includes all the coding potential. /note=SD (Final) Score: The final SD score is -2.253 with a z-score of 3.081. This is the best score. /note=Gap/overlap: There is a 74 bp gap upstream. No coding potential observed in the gap. This start site is the LORF with the least gap upstream. /note=Phamerator: This gene belongs to pham 64637 as of 01/10/2022 with 24 members. 22 members are non-draft phages of the same cluster AZ. /note=Starterator: Starterator calls start site 3 at 25621 which is the most annotated start with 21 manual annotations. This agrees with Glimmer and GeneMark. /note=Location call: Based on the evidence, this gene is real with the start site at 25621. /note=Function call: Unknown function (NKF). Top ten phagesdb and five NCBI BLAST hits have unknown function. There is no reliable evidence from HHpred hits. There are no CDD hits. /note=Transmembrane domains: No TMDs are predicted by TMHMM and TOPCONS. This is not a membrane protein. 
/note=Secondary Annotator Name: Uvarov, Evgeniy /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. CDS 25971 - 26237 /gene="33" /product="gp33" /function="NrdH-like glutaredoxin" /locus tag="Tallboi_33" /note=Original Glimmer call @bp 25971 has strength 14.97; Genemark calls start at 25971 /note=SSC: 25971-26237 CP: yes SCS: both ST: SS BLAST-Start: [glutaredoxin [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 3.84852E-44 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.836, -2.827683592113848, yes F: NrdH-like glutaredoxin SIF-BLAST: ,,[glutaredoxin [Arthrobacter phage Yang] ],,YP_009815651,87.5,3.84852E-44 SIF-HHPRED: c.47.1.1 (A:) Glutaredoxin-like NRDH-redoxin {Escherichia coli [TaxId: 562]},,,d1h75a_,86.3636,99.1 SIF-Syn: NrDH-like glutaredoxin. In comparing gene 35 Tallboi to the equivalent gene 34 in Eraser (AZ), both show function of NrDH-like glutaredoxin. The downstream gene has no known function and the upstream gene 38 for Tallboi and gene 35 for Eraser both have a function of Holliday junction resolvase. However, genes 36 (metallophosphoesterase) and 37 (NKF) exist only in Tallboi. This occurs in a similar fashion when comparing TallBoi to Lizalica, but there is a gene between the two above mentioned genes in the comparison genome. /note=Primary Annotator Name: McLinden, Katherine /note=Auto-annotation: Both Glimmer and GeneMark show the suggested start site at 25971 with a Glimmer score of 14.97. The start codon at this site is ATG. /note=Coding Potential: Both GeneMark and Glimmer suggested coding potential at the putative ORF. The coding potential covers all of the start site and stop site. /note=SD (Final) Score: -2.828. This is the best Final Score for this gene. /note=Gap/overlap: The overlap is -1. This is the most likely start site because the others cause an overlap of over 200 base pairs or a gap of 182 base pairs, which is much less likely. 
/note=Phamerator: The pham for the gene is 95583. The date is 1/10/22. There are 930 members spread across several clusters. /note=Starterator: Does not call the most-annotated start site. Start site 84 was called 0.3% of the time when present. Start 84 in Tallboi is 25971, which was suggested by Glimmer and GeneMark. /note=Location call: This is most likely a real gene. Based on Glimmer and GeneMark auto annotation and the coding potential, it is likely that 25971 is the start site for this gene. /note=Function call: PhagesDB Function Frequency list several hits which suggest either nrdh-like glutaredoxin, glutaredoxin, or nrdh glutaredoxin, although all with varying frequencies. 8/10 of the top hits for PhagesDB Blastp show nrdh-like glutaredoxin as the function call where the E-values are all below 2e-33. HHPRED had no informative evidence because most hits aligned to bacterium instead of phages. NCBI BLASTp also has 9/10 of the top hits suggest nrdh-like glutaredoxin with e-values ranging from 2.40881e-35 to 8.9124e-53. The other protein function call was for glutaredoxin with an E-value of 2.0451e-43. The top hit in CDD is also nrdh-like glutaredoxin with an e-value of 3.74619e-24. From this information, the function is most likely nrdh-like glutaredoxin for this gene. /note=Transmembrane domains: There were no TMD`s called and no other evidence to suggest a TMD function within the other databases. /note=Secondary Annotator Name: Ostroske, Elyse /note=Secondary Annotator QC: I have QC`d this gene and agree with the primary annotator. 
CDS 26234 - 26827 /gene="34" /product="gp34" /function="metallophosphoesterase" /locus tag="Tallboi_34" /note=Original Glimmer call @bp 26234 has strength 10.13; Genemark calls start at 26234 /note=SSC: 26234-26827 CP: yes SCS: both ST: SS BLAST-Start: [phosphoesterase [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 95.9391% 3.73902E-110 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.99, -2.5052746077145835, yes F: metallophosphoesterase SIF-BLAST: ,,[phosphoesterase [Arthrobacter phage DrManhattan] ],,YP_009815377,85.7143,3.73902E-110 SIF-HHPRED: MPP_AQ1575; Aquifex aeolicus AQ1575 and related proteins, metallophosphatase domain. This family includes bacterial and archeal proteins homologous to AQ1575, an uncharacterized Aquifex aeolicus protein.,,,cd07390,90.3553,99.8 SIF-Syn: Metallophosphoesterase (pham 95602), upstream gene is NrdH-like glutaredoxin (pham 95583), first downstream is NKF (pham 82637) and second downstream is holliday junction resolvase (pham 94894). In Adolin the first upstream gene is NKF (pham 18209) second upstream is NrdH-like glutaredoxin (pham 95583), the downstream gene is holliday junction resolvase (pham 94894). /note=Primary Annotator Name: Uvarov, Evgeniy /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 26234 (site 3) with a common ATG start codon. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF by both Glimmer and GeneMark. The chosen start (site 3) covers all the coding potential. This ORF only has forward strand coding potential, thus this is a forward gene. /note=SD (Final) Score: Start site 3 has a final score of -2.505 and a good Z-score of 2.99. This start site has the best (highest) values on PECAAN for this gene. /note=Gap/overlap: Start (site 3) has a gap of -4 bp strongly indicative of a possible operon. This start site does not create the LORF and has a gene length of 594 bp which is good. /note=Phamerator: The pham number as of 1/7/2022 is 95602. 
The gene is conserved in phages Adolin (AZ), Adumb2043 (AZ), and A3Wally (GD), as well as other non-draft phages. Any of these genomes can be used for comparison since they are all non-draft, but those of the AZ cluster are better since Tallboi is from AZ. Based on PhagesDB the function call for the gene is a phosphoesterase or a metallophosphoesterase. Both of these are on the approved SEA-PHAGES list. /note=Starterator: Based on the 1/7/2022 run the most annotated start site 45 is a reasonable choice that is conserved among members of pham 95602. There are 113 members total with 97 being non-draft. 23/97 of non-draft members and 6/16 draft members call start site 45, which correlates to 26234 (site 3) for Tallboi. /note=Location call: Considering all of the evidence above, this gene is a real gene that is conserved in Phamerator as well as Starterator, has good coding potential and covers all of it with a start site at 26234 (site 3). Starterator agrees with Glimmer and Genemark. /note=Function call: Metallophosphoesterase. The second-, third-, and fourth-ranked non-draft PhagesDB BLASTp hits with known functions (ranked by score) are metallophosphoesterase, and so are other significant hits. Second is DrManhattan (e: 7e-93, id: 81%), third is Adolin (e: 2e-91, id: 79%), and fourth is KeAlii (e: 1e-90, id: 79%). All these phages are in cluster AZ, same as Tallboi. PhagesDB Function Frequency shows metallophosphoesterase with the highest total frequency (12%) for the DI cluster, 5% for the AZ cluster, as well as the majority of other clusters. NCBI BLASTp shows three strong hits with low e-values that are metallophosphoesterase in DrManhattan (e: 2.62841e-110, id: 78.5714%, cov: 95.9391%), Adolin (e: 9.7933e-109, id: 76.5306%, cov: 95.9391%), and KeAlii (e: 5.12049e-108, id: 75.1244%, cov: 95.9391%). There is a strong CDD hit for metallophosphatase domain (e: 1.65044e-29, id: 34.1176%, cov: 85.7868%). 
The third-ranked HHpred hit was closely related to the structure and function of metallophosphatase. This hit was cd07390 (prob: 99.8%, cov: 90.3553%, e: 1.1e-17), an Aquifex aeolicus AQ1575 metallophosphatase domain. There was also plenty of strong evidence for phosphoesterase, which is a similar but broader category than metallophosphatase. It was decided that there is enough evidence to choose the more specific metallophosphatase function. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore this is not a membrane protein. /note=Secondary Annotator Name: Santos, Charysa /note=Secondary Annotator QC: It seems like a close call, but I think that there might actually be more evidence to support phosphoesterase as the function call for this gene. For some of the evidence, phosphoesterase has stronger and more frequent hits, but I would ask an instructor for a third opinion, just in case. CDS 26824 - 27042 /gene="35" /product="gp35" /function="hypothetical protein" /locus tag="Tallboi_35" /note=Original Glimmer call @bp 26824 has strength 10.0; Genemark calls start at 26824 /note=SSC: 26824-27042 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_ASCELA_36 [Arthrobacter phage Ascela]],,NCBI, q1:s1 90.2778% 1.69823E-25 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.934, -2.5594668560256912, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ASCELA_36 [Arthrobacter phage Ascela]],,WGH21559,83.8235,1.69823E-25 SIF-HHPRED: SIF-Syn: In the AZ cluster the gene is only found in phage Crewmate_40. As in that phage, the gene is NKF and the downstream gene is Holliday junction resolvase. Upstream gene is not conserved. /note=Primary Annotator Name: Alvarez, Alondra /note=Auto-annotation: Both Glimmer and GeneMark agree on start site 26824. /note=Coding Potential: In both Host-Trained and Self-Trained GeneMark, all of the coding potential present in the ORF is covered by start site 26824. 
Coding potential is found on the forward strand, indicating that it is a forward gene. /note=SD (Final) Score: The chosen start site has a z-score of 2.934 and a final RBS score of -2.559. While the start site does not have the best scores among the potential sites, it provides the shortest overlap and gap. The chosen start site has a common ATG start codon. /note=Gap/overlap: There is an overlap of 4 bp between the gene and its upstream gene. This overlap may indicate that it is part of an operon. /note=Phamerator: As of 1/10/2022 the gene belongs to pham 82637. Other than Tallboi, the only other member of the pham is Crewmate_40 belonging to the AZ cluster. Phamerator does not call a function. /note=Starterator: The non-draft gene belonging to the pham calls a different start site number (five) than Tallboi (four). Because there are only two members in the pham, including draft gene Tallboi, the Starterator data was not informative. /note=Location call: Based on the coding potential within the ORF, possible association with an operon, a low e-value hit in PhagesDB BLAST and synteny with phage Crewmate_40, this gene is real. The start site is called by both Glimmer and GeneMark with a good RBS final score and z-score. /note=Function call: One hit returned by PhagesDB BLASTp for gene Crewmate_40 with an e-value of 6e-22. Function is unknown. Two hits returned from NCBI BLASTp for hypothetical proteins, neither with low e-values. No significant hits returned by HHpred (lowest e-value was 35). No output returned in CDD. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts transmembrane domains. The gene is not a membrane protein. /note=Secondary Annotator Name: Sheppy, Tyler /note=Secondary Annotator QC: I disagree with the notes for the location call. I think that this gene is a real gene because there is significant coding potential in the open reading frame. 
Also, the start site is called by both Glimmer and GeneMark and this start has a reasonable gap and a good z-score and final score. This gene was also manually annotated in phage Crewmate, which is another phage in the AZ cluster. Considering all of this evidence, I would call this a real gene with a start site at 26824. I agree with the function call. CDS 27029 - 27379 /gene="36" /product="gp36" /function="Holliday junction resolvase" /locus tag="Tallboi_36" /note=Original Glimmer call @bp 27029 has strength 10.0; Genemark calls start at 27029 /note=SSC: 27029-27379 CP: yes SCS: both ST: SS BLAST-Start: [RusA-like Holliday junction resolvase [Arthrobacter phage Elezi] ],,NCBI, q1:s1 100.0% 1.13302E-73 GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.99, -3.095100142625534, yes F: Holliday junction resolvase SIF-BLAST: ,,[RusA-like Holliday junction resolvase [Arthrobacter phage Elezi] ],,YP_010678013,96.5517,1.13302E-73 SIF-HHPRED: HOLLIDAY-JUNCTION RESOLVASE; HYDROLASE, ENZYME, HOMOLOGOUS RECOMBINATION, HOLLIDAY JUNCTION RESOLVING ENZYME, NUCLEASE, ARCHAEA, THERMOPHILE; HET: EDO, SO4; 1.8A {SULFOLOBUS SOLFATARICUS} SCOP: c.52.1.18,,,1OB8_B,84.4828,99.5 SIF-Syn: Holliday junction resolvase (pham 94894), downstream gene is DNA primase/helicase (pham 83049), just like in phage Kaylissa. Upstream gene is pham 82637 but does not show synteny. /note=Primary Annotator Name: Baughman, Lexie /note=Auto-Annotation: Glimmer and Genemark. Both agree on the same start site of 27029, with a start codon of TTG. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF. Chosen start site of 27029 covers all of the coding potential. /note=SD (Final) Score: The SD (Final) Score is -3.095, and the Z-score is 2.99. These are the best out of all the listed possible start sites. /note=Gap/Overlap: There is a 14 base pair overlap between this gene and the upstream gene, which may be cause for concern. 
The auto-annotated start site does not create the longest ORF, but the start site that does (position 26903) creates a 140 base pair overlap. In both cases, the length of the gene is acceptable. /note=Phamerator: As of 01/10/2022, the gene is found in Pham 94894. The pham is conserved in other members of the cluster; comparison was done between Tallboi and a few other non-draft genomes, including Kaylissa and Lizalica. Both Phamerator and PhagesDB called the function of this gene as “Holliday junction resolvase,” which is on the approved function list. /note=Starterator: The “Most Annotated” start site (37) is present in 26 of 86 non-draft genes in this pham, and it is present in Tallboi. This start site corresponds to base pair position 27029, which is the auto-annotated start site. /note=Location Call: The gathered evidence suggests that this is a real gene and that its start site is likely at position 27029. While this does create a potentially concerning overlap with the upstream gene, this start site is conserved in Starterator and covers all of the coding potential. Choosing another start site would exacerbate the overlap or create a large gap with the upstream gene. /note=Function Call: The top 2 NCBI BLASTp hits’ suggested function is Holliday junction resolvase, with high query coverage (100%), high % identity (>91.38%), and low e-values (<2e-73). The top 2 PhagesDB BLASTp hits’ suggested function is Holliday junction resolvase, with high % identity (92%) and low e-values (<2e-59). Thus, the two databases seem to be in agreement. While there were no hits in CDD, the top two hits in HHpred were informative, with high probabilities (>99.22%), high coverage (>84.4828%), and a low e-value (<7.8e-10), and listed a similar function. This function is also conserved in finalized phage genomes (Kaylissa and Lizalica, for example). /note=Transmembrane Domains: No predicted transmembrane domains. 
/note=Secondary Annotator Name: Stephenson, Juliet /note=Secondary Annotator QC: I agree with this annotation and find that all evidence has been considered. CDS complement (27376 - 27555) /gene="37" /product="gp37" /function="hypothetical protein" /locus tag="Tallboi_37" /note=Genemark calls start at 27555 /note=SSC: 27555-27376 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_LONDON_36 [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 5.89164E-25 GAP: 189 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.323, -4.938456714086051, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LONDON_36 [Arthrobacter phage London] ],,QOP64339,88.1356,5.89164E-25 SIF-HHPRED: SIF-Syn: Upstream gene is Holliday junction resolvase, downstream is DNA primase/helicase, just like in Asa16, Adumb2043, Amyev and other phages in Cluster AZ. /note=Primary Annotator Name: Dooley, Naomi /note=Auto-annotation: The gene is called by GeneMark at the start site 27555 with a start codon of ATG. This is a very common start codon so there is reason to believe that this may be the correct start site. The gene was not called by Glimmer. /note=Coding Potential: The gene has reasonable coding potential that is covered by the start site. /note=SD (Final) Score: The SD score was reasonable at -4.938, which is the second best score listed. The best score has a very large gap that is not reasonable. The z-score of 2.323 (>2) suggests that this may be the correct start site. This Z-score was the best score listed. /note=Gap/overlap: There is a large gap of 189 bp, which is the smallest gap listed. This gap is unreasonably large, and a new gene is likely to be found in it. /note=Phamerator: The pham number as of 1/12/2022 is 788 and is found in Adumb2043. /note=Starterator: The start number is 15, which corresponds to a 27555 start site. /note=Location call: Based on the data it appears that there is a chance that this is not a real gene. 
While the gene does display synteny with phage Adumb2043, there is a large gap between the upstream and downstream genes. Additionally the gene is the only reverse gene found in between a large number of forward genes. /note=Function call: Based on the phages Adumb and Asa16, it is safe to conclude that the function of this gene is unknown. The top two BLAST hits on PhagesDB have E-values of 7e-23 and 4e-22 and high scores of 104 and 101, and both list an unknown function (both have high coverage of 100%, 65%+ identity, and an e-value of 1.4e-25). CDD contains no hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. Thus we can conclude that this is not a membrane protein. /note=Secondary Annotator Name: Thorp, Jocelyn /note=Secondary Annotator QC: Based upon the evidence above, I believe this is in fact a real gene. Although there is a sudden change of gene direction, there is a significant amount of synteny displayed with other phages of the cluster. I agree with the function call. CDS 27745 - 30258 /gene="38" /product="gp38" /function="DNA primase/helicase" /locus tag="Tallboi_38" /note=Original Glimmer call @bp 27745 has strength 8.67; Genemark calls start at 27745 /note=SSC: 27745-30258 CP: yes SCS: both ST: SS BLAST-Start: [DNA primase [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 0.0 GAP: 189 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.99, -2.523003374675015, yes F: DNA primase/helicase SIF-BLAST: ,,[DNA primase [Arthrobacter phage DrManhattan] ],,YP_009815380,97.8495,0.0 SIF-HHPRED: Primase; primase, helicase, ssDNA-binding protein, TRANSFERASE; HET: SO4; 2.406A {Nitratiruptor phage NrS-1},,,6K9C_A,39.9044,100.0 SIF-Syn: DNA primase/helicase. Tallboi genes 39 (upstream), 40, and 41 (downstream) show synteny with Adolin genes 36, 37, and 38, respectively. Tallboi gene 39 and Adolin gene 36 are both members of pham 788. 
Tallboi gene 40 and Adolin gene 37 are both DNA primase/helicase genes. Tallboi gene 41 and Adolin gene 38 are both members of pham 54541. /note=PECAAN Notes /note=Primary Annotator Name: Wu, Meigan /note=Auto-annotation: Glimmer and GeneMark both call the start site at 27745. /note=Coding Potential: Coding potential found in GeneMark Self in a forward and reverse reading frame. Coding potential is also found in GeneMark Host in a forward and reverse reading frame. The ORFs observed in the reverse reading frames do not match the auto-annotated start and stop sites, however. Therefore, this gene is most likely a forward gene. /note=SD (Final) Score: Final score is -2.523, which is the best final score predicted by PECAAN. /note=Gap/overlap: Gap size is 189, which is the smallest gap size provided on PECAAN. This is a reasonable gap size to accommodate the gene direction switch from forward to reverse to forward. /note=Phamerator: As of January 10, 2022, the pham number is 83049. This gene is conserved in phages Adolin, Amyev, and DrManhattan, which share the same cluster as Tallboi (cluster AZ). The function noted for this gene is DNA primase/helicase. /note=Starterator: Pham 83049 has 83 non-draft members. Start number 32 was called in 41/83 non-draft phages, which is the most-called start number. This gene called start number 32, which corresponds to the start site being 27745 bp. The start site predicted by Starterator matches the start site called by both Glimmer and GeneMark. /note=Location call: This gene is likely a real gene with a start site at 27745. /note=Function call: DNA primase/helicase. Numerous hits found on PhagesDB BLASTp correspond to the DNA primase/helicase function, yielding an e-value of 0. Multiple hits found on NCBI BLASTp with an e-value of 0 for the DNA primase/helicase function as well. 
CDD yielded a COG3378 hit and primase_Cterm hit, which both relate to a phage/plasmid-associated primase, and the respective e-values are 7.39e-88 and 1.60e-86. HHpred shows multiple hits corresponding to DNA primase genes. HHpred hit 6K9C_A has a probability of 99.96%, e-value of 9.3e-28, and coverage of 39.90%. HHpred hit 2AU3_A has a probability of 99.89%, e-value of 1.4e-21, and coverage of 36.44%. /note=Transmembrane domains: No TMDs predicted by TMHMM and TOPCONS. This gene is not a membrane protein. /note=Secondary Annotator Name: Zhuang, Chuzhi /note=Secondary Annotator QC: I agree with your annotation. Good job! CDS 30267 - 30380 /gene="39" /product="gp39" /function="hypothetical protein" /locus tag="Tallboi_39" /note=Original Glimmer call @bp 30267 has strength 12.37; Genemark calls start at 30267 /note=SSC: 30267-30380 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQD88_gp38 [Arthrobacter phage Amyev] ],,NCBI, q1:s1 100.0% 4.321E-15 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.089, -4.4057176022952405, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD88_gp38 [Arthrobacter phage Amyev] ],,YP_010677741,91.8919,4.321E-15 SIF-HHPRED: SIF-Syn: NKF (pham 54541), upstream gene is DNA primase/helicase, downstream gene is DNA polymerase I, just like phage Adumb2043. /note=Primary Annotator Name: Lee, Adrienne /note=Auto-annotation: Both GeneMark and Glimmer call the start site at 30267. /note=Coding Potential: There is an open reading frame at the expected start and stop site, but it doesn’t look like there is coding potential at the stop site. /note=SD (Final) Score: The final score is -4.406, which is the least negative score on PECAAN. /note=Gap/overlap: There is an 8 base pair gap. This is reasonable because it is extremely small and there is no coding potential in that gap. /note=Phamerator: This gene is part of Pham 54541 as of January 10, 2022. This pham has 27 phages in total including Adumb2043 and Amyev. 
All of the phages have a gene length of 111, 114, or 117, which is consistent with this phage’s gene length of 114 given the recommended start site. /note=Starterator: The most conserved start site is start site 9, which does not exist for Tallboi. Starterator calls start site 7 at 30267 for Tallboi. It is also manually annotated in 7/25 non-draft genes in the pham and called 100% of the time when present. /note=Location call: Based on the evidence, this is a real gene and the start site is 30267, which was called by GeneMark and Glimmer. This start site was also determined by Starterator and is conserved in other phages in the pham. /note=Function call: NKF: There were no hits from CDD and no good hits from HHpred. They all had high e-values. Based on PhagesDB BLAST, the other phages in the same cluster and pham also have unknown function for this gene. All the hits in NCBI BLAST came up as hypothetical protein, all with extremely small e-values. DrSierra from cluster AZ had an e-value of 9e-15 and had function unknown in PhagesDB BLAST, and NCBI BLAST showed an identity of 86.4865%, aligned 94.5946%, and an e-value of 1.52819e-14. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Erfanian, Kiana /note=Secondary Annotator QC: Everything looks great. 
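The gene lengths quoted in these notes (for example 114 bp for gp39, alongside pham members of 111 or 117 bp) follow directly from the CDS coordinates. The following is a rough sketch in Python, assuming 1-based inclusive coordinates as printed in the CDS lines. The helper names `gene_length_bp` and `is_whole_codons` are hypothetical and do not come from any annotation tool.

```python
def gene_length_bp(start: int, stop: int) -> int:
    """Length of a gene from 1-based inclusive coordinates.

    Works for forward and reverse (complement) genes, because only the
    distance between the two ends matters.
    """
    low, high = sorted((start, stop))
    return high - low + 1


def is_whole_codons(length_bp: int) -> bool:
    """A protein-coding ORF, stop codon included, is a multiple of 3 bp."""
    return length_bp % 3 == 0


# (start, stop) pairs copied from the SSC fields in this file
for gene, (start, stop) in {
    "gp34": (26234, 26827),
    "gp37": (27555, 27376),  # reverse gene, start > stop
    "gp39": (30267, 30380),
    "gp40": (30542, 32407),
}.items():
    length = gene_length_bp(start, stop)
    print(gene, length, is_whole_codons(length))
```

For gp34, gp39, and gp40 this gives 594, 114, and 1866 bp, the lengths stated in the corresponding notes. A length that is not a multiple of 3 would point to a transcription error in the coordinates.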
CDS 30542 - 32407 /gene="40" /product="gp40" /function="DNA polymerase I" /locus tag="Tallboi_40" /note=Original Glimmer call @bp 30542 has strength 10.06; Genemark calls start at 30542 /note=SSC: 30542-32407 CP: yes SCS: both ST: SS BLAST-Start: [DNA polymerase I [Arthrobacter phage Adolin]],,NCBI, q1:s1 100.0% 0.0 GAP: 161 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.99, -2.5052746077145835, yes F: DNA polymerase I SIF-BLAST: ,,[DNA polymerase I [Arthrobacter phage Adolin]],,QHB36622,91.0377,0.0 SIF-HHPRED: Prex DNA polymerase; DNA polymerase, TRANSFERASE; HET: SO4; 2.9A {Plasmodium falciparum},,,5DKT_A,96.9404,100.0 SIF-Syn: DNA polymerase I, upstream gene is NKF (pham 54541), downstream is NKF (pham 49784), just like in phage Phives. /note=Primary Annotator Name: Magaling, Janelle /note=Auto-annotation: Glimmer and GeneMark agree on start 30542. /note=Coding Potential: There is one forward ORF in Self and Host trained GeneMark. Start site covers all coding potential. /note=SD (Final) Score: Final score is -2.505, which is not good but is the best out of the starts. Z score is 2.99, which is good. This is the LORF. /note=Gap/overlap: 161 bp gap, which is a bit large. Length of 1866 bp is good for a gene. Pham maps show that there is synteny of this gene with Niobe. /note=Phamerator: As of 1/9/22, the gene is in pham 47481. 41/859 phages in the pham, such as Adolin and Eraser, are in cluster AZ, the same cluster as Tallboi. All genes with functions called are DNA polymerase I, which is on the approved function list. /note=Starterator: The most annotated start site is 60, which was called in 779/810 non-draft genes in the pham. Tallboi also calls site 60 @30542. /note=Location call: Start 60 at 30542 is conserved in Phamerator and Starterator and covers all coding potential. Glimmer and GeneMark agree on the same start site. This is a real gene. /note=Function call: Top PhagesDB BLAST hits (e value >1e-167) are DNA polymerase I. 
CDD has a good hit (e value 5.80e-67) which is DNA polymerase I. HHpred had good hits (e value > 1.1e-71) with function DNA polymerase. NCBI BLAST had hits with e value 0, high cover (100%) and high identity (97%) with function DNA polymerase I. /note=Transmembrane domains: No TMHs were predicted, so TOPCONS was not consulted; it is not a membrane protein. /note=Secondary Annotator Name: Khaine, Aye Myat /note=Secondary Annotator QC: I generally agree with the annotation above. However, there is a small region of coding potential outside the start called. I would indicate the start site in the location call and also that it is not a membrane protein explicitly. CDS 32404 - 32592 /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="Tallboi_41" /note=Original Glimmer call @bp 32404 has strength 4.42; Genemark calls start at 32419 /note=SSC: 32404-32592 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein HOU48_gp41 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 2.70425E-21 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.305, -4.40029072999159, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp41 [Arthrobacter phage DrManhattan] ],,YP_009815384,88.8889,2.70425E-21 SIF-HHPRED: SIF-Syn: Tallboi displays synteny with Asa16. The downstream genes are in pham 47481 and are DNA Polymerase I, while the upstream genes are in pham 97248 and are RNA Polymerase sigma factors. /note=PECAAN Notes /note=Primary Annotator Name: Ostroske, Elyse /note=Auto-annotation: Glimmer and GeneMark. Glimmer called the start at 32404 and GeneMark called the start at 32419. /note=Coding Potential: Coding potential is mainly on the forward strand, indicating that this is a forward gene. Coding potential was found in both Host-trained GeneMark and GeneMark S. /note=SD (Final) Score: -4.400. This score is good but not the best. /note=Gap/overlap: -4 (likely an operon) /note=Phamerator: As of 1/23/22, this gene is in pham 49784. 
This pham has 42 members, 24 of which are in cluster AZ. /note=Starterator: Start Site 10 was manually annotated in 16/29 non-draft genomes. Start Site 10 corresponds to 32419, the start site called by GeneMark. Start site 8, which is automatically called for this gene, has 13 manual annotations out of 29 non-draft genomes, and corresponds to 32404, the start site called by Glimmer. /note=Location call: Based on the above evidence, this is a real gene. The start site could be either 32404 or 32419. Both start sites have similar Z-scores and final scores, although 32419’s final score is slightly better. I chose 32404 as the start site, though, because it incorporates all the coding potential, has a more likely overlap (-4, likely an operon), is the LORF, was automatically and manually called on Starterator, and Glimmer is preferred over GeneMark. /note=Function call: NKF. Many phages from cluster AZ were checked as evidence under PhagesDB BLAST due to their high e-values and query cover, but all had no known function. The top three hits in NCBI BLAST (DrManhattan, Tbone, Adolin) were also checked for their high e-values, and all were described as hypothetical proteins. No hits were found in CDD and no good hits were found in HHpred. /note=Transmembrane domains: TMDs were not predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: McLinden, Katherine /note=Secondary Annotator QC: Looks good! I agree on your start site choice. Please make sure to fill out phamerator notes and synteny box. Make sure to select the proper option for the drop down menu on coding potential. 
CDS 32585 - 32887 /gene="42" /product="gp42" /function="DNA ligase" /locus tag="Tallboi_42" /note=Original Glimmer call @bp 32585 has strength 11.52; Genemark calls start at 32585 /note=SSC: 32585-32887 CP: yes SCS: both ST: SS BLAST-Start: [DNA ligase [Arthrobacter phage Powerpuff] ],,NCBI, q4:s5 97.0% 2.3391E-56 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.99, -2.970161406017234, yes F: DNA ligase SIF-BLAST: ,,[DNA ligase [Arthrobacter phage Powerpuff] ],,QGZ17343,91.0891,2.3391E-56 SIF-HHPRED: d.142.2.2 (A:1-314) Adenylation domain of NAD+-dependent DNA ligase {Thermus filiformis [TaxId: 276]},,,d1dgsa3,98.0,99.5 SIF-Syn: This gene shows synteny with all phages in cluster AZ except for KeAlii, Liebe, Maureen, and Tweety19 as a DNA ligase gene. This indicates that this gene is highly conserved within these genomes. For instance, compared to phage Amyev, downstream genes 39 and 40 for Tallboi and Amyev are a primase/helicase and DNA polymerase I, respectively. The upstream gene 43 also does not have a known function for both Amyev and Tallboi. /note=Primary Annotator Name: Santos, Charysa /note=Auto-annotation: Glimmer and GeneMark. Both start at 32585. It has a Glimmer score of 11.52. /note=Coding Potential: Coding potential in this ORF is found in the direct/forward sequence, thus this is a forward gene. Coding potential is found in GeneMark Host and Self. /note=SD (Final) Score: -2.970. It is the best/least negative final score on PECAAN for this gene. This start site (@32585) minimizes the gap size (-8), unlike the other start sites with larger gaps. /note=Gap/overlap: -8. Relatively small overlap compared to the other start sites, meaning there is less space for another gene in between the previous gene and this one. /note=Phamerator: As of 01/07/2022, this gene is found in Pham 9766 and is conserved in other members of cluster AZ, including Tbone and YesChef. 
/note=Starterator: The “Most Annotated” start site (14) is present in 12 of 28 non-draft genes in this pham, and is present in Tallboi. This start site corresponds to base pair position 32585, the auto-annotated start site. /note=Location call: Using the evidence I found, this is a real gene and the most probable start site is 32585. /note=Function call: The top NCBI BLASTp hits showed that this gene is likely a DNA ligase due to its high query coverage, high % identity, and low e-values. PhagesDB BLASTp hits suggested that function is DNA ligase, also with low e-values. HHpred hits were informative and suggested DNA ligase as the function with low e-values and high coverages (>98%) and probabilities (>99.5%). Thus, there is enough evidence to call the function of this gene as DNA ligase. CDD also had hits with high coverages (>96%) and low e-values (e-57). /note=Transmembrane domains: There were no TMD predictions on either TMHMM or TOPCONS, so we can conclude that this gene is not a membrane protein. /note=Secondary Annotator Name: Uvarov, Evgeniy /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. 
Note: In the synteny box make sure to mention the upstream and downstream gene functions/phams as well and how they match or do not match with other phage genomes; possibly consider checking a few more evidence boxes for HHPRED and CDD that have "DNA ligase" in the descriptor CDS 32884 - 33189 /gene="43" /product="gp43" /function="hypothetical protein" /locus tag="Tallboi_43" /note=Original Glimmer call @bp 32884 has strength 8.76; Genemark calls start at 32884 /note=SSC: 32884-33189 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQD88_gp43 [Arthrobacter phage Amyev] ],,NCBI, q1:s1 100.0% 1.0913E-61 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.741, -3.3136498407041004, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD88_gp43 [Arthrobacter phage Amyev] ],,YP_010677746,98.0198,1.0913E-61 SIF-HHPRED: SIF-Syn: No known function, upstream gene is DNA ligase, downstream gene is RNA polymerase sigma factor, which is similar to phage Adolin. The corresponding gene in phage Adolin does not have a known function, its upstream gene is DNA ligase, and its downstream gene is a DNA binding protein. /note=Primary Annotator Name: Sheppy, Tyler /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 32884. The start codon is ATG. /note=Coding Potential: There is reasonable coding potential observed on both the Host-Trained and Self-Trained Genemark for the open reading frame. The start site covers all of this potential. /note=SD (Final) Score: The Final Score for the start site is -3.314 which is the least negative of all of the other candidates. The Z-score is 2.741, which is the highest of all of the other candidates. /note=Gap/overlap: There is an overlap of 4bp (gap of -4bp). This is indicative of this gene being part of an operon. /note=Phamerator: As of January 10, 2022, this gene is found in pham 16103. Genes in this pham are found in other phages in the AZ cluster, such as phage Adolin and phage Amyev. 
Phamerator does not have a function called for this gene. /note=Starterator: There is a start site that is conserved among members of this pham. The most-conserved start site is start site 29 and it corresponds to position 32884 in the phage. 64 of 71 non-draft genes call this start. /note=Location call: This is a real gene and it has a start site of 32884. The start site is predicted by both Glimmer and GeneMark. The coding potential is covered by the open reading frame, as shown by the Host-Trained and Self-Trained GeneMark. This location is also confirmed by Starterator. /note=Function call: The function of this protein is unknown. The top hits in Phagesdb BLAST and NCBI BLAST do not have a known function. CDD and HHpred were not informative. Phamerator does not have any functions called for genes in this pham. /note=Transmembrane domains: There are no transmembrane domains predicted by TmHmm or TOPCONS. This protein is not a transmembrane protein. /note=Secondary Annotator Name: Alvarez, Alondra /note=Secondary Annotator QC: I have QC'ed this location call and agree with the first annotator. CDS 33369 - 34190 /gene="44" /product="gp44" /function="DNA binding protein" /locus tag="Tallboi_44" /note=Original Glimmer call @bp 33369 has strength 10.29; Genemark calls start at 33369 /note=SSC: 33369-34190 CP: yes SCS: both ST: SS BLAST-Start: [RNA polymerase [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 99.6337% 2.16787E-154 GAP: 179 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.507, -4.103700314387452, yes F: DNA binding protein SIF-BLAST: ,,[RNA polymerase [Arthrobacter phage DrManhattan] ],,YP_009815387,88.8476,2.16787E-154 SIF-HHPRED: RNA polymerase sigma factor RpoS; Transcription-activator, DNA/RNA, SigmaS, beta', TRANSCRIPTION, Transferase-DNA complex; 3.26A {Escherichia coli},,,6OMF_F,93.0403,100.0 SIF-Syn: This gene shows good synteny with phages Asa16 and Eraser. Upstream is pham 16103 (NKF, 1/10). Downstream is pham 11290 (NKF, 1/10). 
This order does not appear to be especially well-conserved in many other phage in the cluster, which is not a problem because there is no indication that these genes form an operon. /note=Primary Annotator Name: Stephenson, Juliet /note=Auto-annotation: GeneMark and Glimmer call start site @ 33369, codon GTG. /note=Coding Potential: There is coding potential present in the third ORF (Host and Self-Trained GeneMark), and the selected start codon does include all the coding potential. /note=SD (Final) Score: This Start Site does have the best Final score (-4.104) as well as the best Z-score (2.507). /note=Gap/overlap: There is a sizable gap of 179 bp, but there is no coding potential present in the gap, so I do not think there is a gene missing. /note=Phamerator: Pham 55717 on 1/10. Common in cluster AZ, most common function is DNA binding protein. RNA polymerase sigma factor is a close second. /note=Starterator: Most annotated start for this pham (28) is not present in this phage. Start site 25 is called instead. 21/71 non-draft phages call site 25, which corresponds to 33369 in Tallboi. Comparatively, 28/71 non-draft phages called start site 28. /note=Location call: Based on the good coding potential and the number of hits in Phagesdb and HHPred, as well as synteny with other phages in Cluster AZ, this is a real gene with start site 33369. /note=Function call: NCBI Blast, PhagesDB Blast, and HHPred have the best matches for RNA polymerase sigma factor. These have good (90%) query coverage, high % identity (>90%), and low e-values. Most other phages in cluster AZ call this gene as a DNA binding protein, but it seems that this gene has a higher coverage % with RNA polymerase sigma factor. /note=Transmembrane domains: No transmembrane domains present according to TmHmm. /note=Secondary Annotator Name: Baughman, Lexie /note=Secondary Annotator QC: I have QC`ed the location and functional calls and agree with the first annotator. 
Although I do think there is stronger evidence for RNA polymerase sigma factor, you may still want to check the evidence for DNA binding protein in PhagesDB BLAST. Also, I would comment on the hits in CDD - this makes your call stronger. CDS 34244 - 34543 /gene="45" /product="gp45" /function="hypothetical protein" /locus tag="Tallboi_45" /note=Original Glimmer call @bp 34244 has strength 11.92; Genemark calls start at 34244 /note=SSC: 34244-34543 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE13_gp43 [Arthrobacter phage Elezi] ],,NCBI, q1:s1 100.0% 3.54125E-55 GAP: 53 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.893, -2.7863799983944713, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE13_gp43 [Arthrobacter phage Elezi] ],,YP_010678021,93.9394,3.54125E-55 SIF-HHPRED: SIF-Syn: NKF, the gene is of pham 11290, upstream is a gene of pham 55717 and downstream is a gene of pham 19450, just like in phage Asa16 of the same cluster (AZ). /note=Primary Annotator Name: Thorp, Jocelyn /note=Auto-annotation: Glimmer and GeneMark agree on 34,244 as the start site. /note=Coding Potential: The gene has reasonable coding potential in the forward direction within the putative ORF, with the chosen start site encompassing all of the coding potential. This is true for both GeneMark Self and Host. /note=SD (Final) Score: -2.78. Final score and z-score (2.893) are the best of all start sites for this gene. /note=Gap/overlap: 53 bp, an acceptable gap. This start codon produces a transcript of 300 bp, which is a reasonable length for a gene. /note=Phamerator: As of 1/9/2022 this gene is in pham 11290. It is conserved in 12 of 27 members of the AZ cluster. There is no function currently called for genes of this pham. /note=Starterator: Start site 5 is conserved in 12 of 12 non-draft genes of this pham. This start site corresponds to 34,244 in Tallboi. /note=Location call: Based upon the evidence above, this is a real gene with the start site at 34,244. 
/note=Function call: No known function. NCBI BLAST featured 6 hits with low e-values (less than 10e-10). Percent identity ranged from 49.46% to 87.88%. Hits had no known function, but were hypothetical proteins for Arthrobacter phages. The best matches were from phages in the same cluster (AZ), specifically Elezi, London, Niobe, Asa16, and Eraser having an e-value of 3e-55, a query coverage of 100% and percent identity 87.88%. CDD and HHPRED did not return informative hits. /note=Transmembrane domains: No TMDs are predicted by TMHMM or TOPCONS, therefore it is not a membrane protein. /note=Secondary Annotator Name: Dooley, Naomi /note=Secondary Annotator QC: I have QC`d this gene and agree with the primary annotator. CDS 34673 - 35269 /gene="46" /product="gp46" /function="SprT-like protease" /locus tag="Tallboi_46" /note=Original Glimmer call @bp 34673 has strength 8.91; Genemark calls start at 34673 /note=SSC: 34673-35269 CP: yes SCS: both ST: SS BLAST-Start: [SprT-like domain-containing protein [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 1.05124E-142 GAP: 129 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.893, -2.707694805492614, yes F: SprT-like protease SIF-BLAST: ,,[SprT-like domain-containing protein [Arthrobacter phage Yang] ],,YP_009815666,99.4949,1.05124E-142 SIF-HHPRED: SprT-like domain-containing protein Spartan; DPC repair protease, DNA BINDING PROTEIN; HET: ADP, MLZ, FLC; 1.5A {Homo sapiens},,,6MDW_A,49.4949,99.7 SIF-Syn: SprT-like protease(from Pham 19450), upstream gene is from pham 11290, downstream is from pham 72297, just like in phage Adolin /note=Primary Annotator Name: Zhuang, Chuzhi /note=Auto-annotation: both Glimmer and Genemark agree on the start site #34673, start codon ATG /note=Coding Potential: coding potential is observed for forward and reverse frames on both Host and Self-trained Genemark, chosen start site cover all this coding potential /note=SD (Final) Score: -2.708, it is the best final score /note=Gap/overlap: 129, the gap 
is the smallest among all possible start sites and has longest open reading frame /note=Phamerator: pham number - 19450, date - 1/11/2022, the gene is conserved in other phages in AZ cluster, Adolin is used for comparison. For the corresponding gene in Adolin, the function is SprT-like protease. /note=Starterator: The conserved start site in the pham is 36, and it corresponds to 34673 in my phage. 28/56 of final genes called site 36. /note=Location call: real gene, start at #34673 /note=Function call: SprT-like protease. NCBI BLAST, Phagesdb BLAST, CDD and HHPRED all returned many hits of SprT-like protease with very low E value. /note=Transmembrane domains: No hit for transmembrane domains from either TMHMM or TOPCONS. /note=Secondary Annotator Name: Wu, Meigan /note=Secondary Annotator QC: Overall, looks good! I would mention that coding potential is observed for forward and reverse frames on both Host and Self-trained Genemark. I would also clarify if there is coding potential in the gap and whether this gap is conserved in other phages through synteny. For your function call evidence, specify the values associated with the hits found (e.g. probability, coverage, and e-value when applicable). Not sure if this matters, but the annotation lab manual only requires 2-3 hits to be recorded for NCBI BLASTp and 2 hits for PhagesDB BLASTp, so you can reduce the amount of evidence you selected for these applications. I would specify the programs used for the transmembrane domain evidence (TMHMM and TOPCONS). 
CDS 35393 - 35728 /gene="47" /product="gp47" /function="hypothetical protein" /locus tag="Tallboi_47" /note=Original Glimmer call @bp 35393 has strength 15.16; Genemark calls start at 35393 /note=SSC: 35393-35728 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU48_gp47 [Arthrobacter phage DrManhattan] ],,NCBI, q3:s2 98.1982% 1.2741E-57 GAP: 123 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.893, -2.707694805492614, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp47 [Arthrobacter phage DrManhattan] ],,YP_009815390,90.991,1.2741E-57 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Erfanian, Kiana /note=Auto-annotation: Glimmer and GeneMark. Both called the same start site of 35393. /note=Coding Potential: This gene has good coding potential within the putative ORF, and the chosen start covers all of this coding potential. /note=SD (Final) Score: The RBS Final Score of -2.708 for the original start at 35393 is the highest of the suggested starts, and therefore the best. The same is true of the Z-score for this start, at 2.893. /note=Gap/overlap: This gene has a gap of 123 bp with its upstream gene, given that the upstream gene stops at 35269. This gene also has a gap of 114 bp with its downstream gene, which starts at 35843. /note=Phamerator: This gene was found in pham 72297, which has 31 members, two of which are drafts. Additionally, the majority of phages in this pham belong to cluster ED. /note=Starterator: According to the most recent Starterator analysis, run on 1/7/22, the most conserved start site number is 7. This was called in 11 of the 28 non-draft genes in the AZ pham. The auto-annotated start is called at start number 11 (35393), which does not match the most conserved start. Phage Tallboi’s track marks start site 11 with a yellow line, which denotes it as an auto-annotated start. 
Start site 11 in Tallboi’s track corresponds to that of other phages in the cluster, such as phages Reedo and DrManhattan, where it has been determined to be the Final Human Annotated start, as represented by a green line on the tracks for these phages. While this is not the most annotated start, the analogous start site between Tallboi and other phages in this cluster is still promising, indicating that the auto-annotated start site 11 at 35393 is most likely correct. /note=Location call: The evidence gathered indicates that the suggested start site of 35393 as called by Glimmer and GeneMark appears to be the most probable site. /note=Function call: Unknown function. The top hits on the first page of the PhagesDB BLASTp list genes of unknown function. All of the hits on NCBI BLASTp are hypothetical proteins. The top hit for HHPRED alignments lists a gene with the function of NADH dehydrogenase, with a 70.8% probability. Despite this hit, the above data indicate that the function of this gene is indeed unknown. /note=Transmembrane domains: No TMDs called by TmHmm or TOPCONS. The protein is not a membrane protein. /note=Secondary Annotator Name: Lee, Adrienne /note=Secondary Annotator QC: I agree with the location call and functional call. In your Starterator section, I recommend specifying how many of the phages in the pham also call that start site. This evidence would be helpful to know. For your synteny box, it looks like this gene has synteny with the downstream gene of Adolin and DrManhattan; both are SprT-like protease. It also shows synteny with Reedo for both the upstream and downstream genes. You may want to update your synteny information. 
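The gap and overlap values recorded in these notes can be checked directly from the gene coordinates. The following is a minimal sketch, assuming 1-based inclusive coordinates and forward-strand genes; the function name `gap_bp` is hypothetical and is not part of PECAAN, Glimmer, or GeneMark.

```python
def gap_bp(upstream_stop: int, downstream_start: int) -> int:
    """Return the number of bases strictly between two forward-strand genes.

    Coordinates are assumed to be 1-based and inclusive.
    A positive result is a gap; a negative result is an overlap of that many bp.
    """
    return downstream_start - upstream_stop - 1


# Coordinates taken from the records in this file.
print(gap_bp(35269, 35393))  # gene 46 stop -> gene 47 start
print(gap_bp(38468, 38465))  # gene 51 stop -> gene 52 start
print(gap_bp(39675, 39686))  # gene 55 stop -> gene 56 start
```

Under these assumptions the three calls give 123, -4, and 10, matching the GAP fields recorded for genes 47, 52, and 56.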
CDS 35843 - 36145 /gene="48" /product="gp48" /function="hypothetical protein" /locus tag="Tallboi_48" /note=Original Glimmer call @bp 35843 has strength 15.06; Genemark calls start at 35843 /note=SSC: 35843-36145 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LONDON_46 [Arthrobacter phage London]],,NCBI, q1:s1 100.0% 7.36081E-59 GAP: 114 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.127, -2.156361006712914, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LONDON_46 [Arthrobacter phage London]],,QOP64349,94.0,7.36081E-59 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (pham 72297), downstream gene is serine integrase, just like phages Amyev, Adumb2043, Elezi and others from the same cluster AZ /note=Primary Annotator Name: Khaine, Aye Myat /note=Auto-annotation: Glimmer and GeneMark. Both call the start site at 35843. /note=Coding Potential: There is good coding potential in the ORF. The start includes all coding potential. /note=SD (Final) Score: The final SD score is -2.156 with a z-score of 3.127. This is the only gene candidate (an LORF) and has a good SD score. /note=Gap/overlap: There is a 114 bp gap upstream which is considerably large. However, no coding potential is detected within the gap. Similar upstream gap sizes are found in other phages of the same cluster AZ. /note=Phamerator: This gene belongs to pham 77436 as of 01/10/2022. This pham has 33 members of which about 20 phages belong to the same Cluster AZ. /note=Starterator: Only one start number 15 is called by Starterator. This number corresponds to site 35843. This start number is not the most annotated start (number 14) but has been manually annotated 8 times all in phages belonging to Cluster AZ. /note=Location call: Based on the evidence, this is a real gene with the start at 35843. /note=Function call: Unknown function (NKF). Top phagesdb BLAST hits (at least ten) call the function unknown. Top NCBI BLAST hits also have hypothetical protein as the function. 
No HHpred hits are reliable. There are no CDD hits. /note=Transmembrane domains: No TMDs predicted by TMHMM and TOPCONS. This is not a membrane protein. /note=Secondary Annotator Name: Magaling, Janelle /note=Secondary Annotator QC: Don't forget to put the date for when you recorded the pham number! Also add info about the most annotated start site even though Tallboi doesn't call it. Otherwise looks good! CDS 36309 - 37727 /gene="49" /product="gp49" /function="serine integrase" /locus tag="Tallboi_49" /note=Original Glimmer call @bp 36309 has strength 8.1; Genemark calls start at 36306 /note=SSC: 36309-37727 CP: yes SCS: both-gl ST: SS BLAST-Start: [integrase [Arthrobacter phage Yang] ],,NCBI, q1:s1 100.0% 0.0 GAP: 163 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.791, -2.9392830384527056, yes F: serine integrase SIF-BLAST: ,,[integrase [Arthrobacter phage Yang] ],,YP_009815667,94.7034,0.0 SIF-HHPRED: INTEGRASE; HYDROLASE, SERINE RECOMBINASE, UNIDIRECTIONAL, SITE-SPECIFIC RECOMBINATION; 2.15A {STREPTOMYCES PHAGE PHIC31},,,4BQQ_A,64.4068,100.0 SIF-Syn: Serine Integrase. In comparing Tallboi to LizaLica, gene 51 is conserved in Tallboi as gene 49 in LizaLica. They both call serine integrase as the function. The upstream and downstream genes have NKF for both. This is the same between Tallboi and Amyev except both genes are number 51. /note=Primary Annotator Name: McLinden, Katherine /note=Auto-annotation: Glimmer and GeneMark do not call the same start site. Glimmer calls 36309 and GeneMark calls 36306. 36309 has a start codon of GTG and 36306 has a start codon of ATG. The Glimmer score is 8.1. /note=Coding Potential: There is coding potential for both start sites and the stop site in Host-Trained and Self-Trained GeneMark. /note=SD (Final) Score: Start site 36309 has the best Final score of -2.939. The other called site 36306 has the third best Final score of -3.909. 
/note=Gap/overlap: The gap of start site 36309 is 163 bp and the gap of start site 36306 is 160 bp, so they are very similar. The smallest gap is for start site 36291, but it has a final score of -6.952 and is not called by Glimmer or GeneMark. /note=Phamerator: The pham for this gene is 78437. The date is 1/11/22. There are 517 members from various clusters with a majority being from cluster A. /note=Starterator: Tallboi does not call the most annotated start. Tallboi calls start 96, which is called in 5% of the genes in the pham that have it. This start site 96 is located at 36309, which is called by Glimmer and has the best Final score. /note=Location call: This is most likely a real gene. While Glimmer and GeneMark called different start sites, 36309 has a better final score and is supported by Starterator. 36309 is therefore the most likely start site. /note=Function call: The function call for this gene is a serine integrase protein. The first 9/10 of the PhagesDB Frequency results show serine integrase. Additionally, PhagesDB BLASTp also calls at least 10 other serine integrase proteins with an E-value of 0. HHPRED calls two hits with e-values of 1.6e-37 and 6.9e-34 that call serine integrase. NCBI BLASTp also calls at least 10 serine integrase proteins with an E-value of 0. CDD does not have any useful evidence. /note=Transmembrane domains: There were no TMDs called and no other evidence to suggest a TMD function within the other databases. /note=Secondary Annotator Name: Ostroske, Elyse /note=Secondary Annotator QC: I have QC'd this gene and agree with the primary annotator. 
CDS 37975 - 38235 /gene="50" /product="gp50" /function="hypothetical protein" /locus tag="Tallboi_50" /note=Original Glimmer call @bp 37975 has strength 9.48; Genemark calls start at 37975 /note=SSC: 37975-38235 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein PQE13_gp49 [Arthrobacter phage Elezi] ],,NCBI, q1:s1 97.6744% 7.27779E-44 GAP: 247 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.089, -4.387988835334809, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE13_gp49 [Arthrobacter phage Elezi] ],,YP_010678027,90.6977,7.27779E-44 SIF-HHPRED: SIF-Syn: NKF (pham 95559), upstream gene is serine integrase (pham 78437), downstream is NKF (pham 55203), just like in phage Adumb2043 and others /note=Primary Annotator Name: Uvarov, Evgeniy /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 37975 (site 3) with a common GTG start codon. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF by both Glimmer and GeneMark. The chosen start (site 3) covers all the coding potential. This ORF only has forward strand coding potential, thus this is a forward gene. /note=SD (Final) Score: Start site 3 has a final score of -4.388 and a good Z-score of 2.089. This start site has the best (highest) values on PECAAN for this gene. /note=Gap/overlap: Start (site 3) has a gap of 247 bp which is fairly large but contains no coding potential so it is acceptable. This start site does not create the LORF and has a gene length of 261 bp which is acceptable. /note=Phamerator: The pham number as of 1/11/2022 is 95559. The gene is conserved in phages Adumb2043 (AZ), Amyev (AZ) and Asa16 (AZ), as well as other non draft phages. Any of these genomes can be used for comparison since they are all non-draft, but those of the AZ cluster are better since Tallboi is from AZ. Based on PhagesDB the function call for the gene is unknown. 
/note=Starterator: Based on the 1/7/2022 run the most annotated start site 5 is a reasonable choice that is conserved among members of pham 95559. There are 24 members total with 22 being non-draft. 21/22 of non-draft members and 2/2 draft members call start site 5, which correlates to 37975 (site 3) for Tallboi. /note=Location call: Considering all of the evidence above, this gene is a real gene that is conserved in phamerator as well as starterator, has good coding potential and covers all of it with a start site at 37975 (site 3). Starterator agrees with Glimmer and Genemark. /note=Function call: Not enough data to form a function hypothesis, but this is likely a real protein. PhagesDB BLASTp top hits are all function unknown with small e-values with the top three being Asa16, Elezi, Eraser all of cluster AZ and an e-value of 1e-37. Phagesdb Function Frequency shows no results for pham 95559 or cluster AZ. NCBI BLASTp top three hits have low e-values (5.11605e-44, 2.14481e-39, 2.67953e-28) that are hypothetical proteins thus supporting the conclusion that this is a real gene. There are no CDD hits. There are no significant HHpred hits. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore this is not a membrane protein. /note=Secondary Annotator Name: Santos, Charysa /note=Secondary Annotator QC: I agree with the NKF function call based on the evidence provided. 
CDS 38238 - 38468 /gene="51" /product="gp51" /function="RNA binding protein" /locus tag="Tallboi_51" /note=Genemark calls start at 38238 /note=SSC: 38238-38468 CP: yes SCS: genemark ST: NI BLAST-Start: [RNA binding protein [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 3.94004E-35 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.224, -2.86135216970947, yes F: RNA binding protein SIF-BLAST: ,,[RNA binding protein [Arthrobacter phage London] ],,QOP64353,89.4737,3.94004E-35 SIF-HHPRED: Asl2047 protein; HFQ, SM, RNA-BINDING PROTEIN, SRNA, TRANSLATIONAL REGULATION, RNA BINDING PROTEIN; 2.31A {Nostoc sp.},,,3HFN_A,84.2105,96.6 SIF-Syn: gene has NKF, upstream gene has NKF, downstream gene has NKF like many phage within the AZ cluster including Amyev, London and Eraser. /note=Primary Annotator Name: Alvarez, Alondra /note=Auto-annotation: Only GeneMark calls a start site located at base pair coordinate 38238. /note=Coding Potential: In both Host-Trained and Self-Trained GeneMark, the coding potential present in the ORF is included in start site 38238 called by GeneMark. The coding potential is found on the forward strand, indicating that it is a forward gene. /note=SD (Final) Score: The chosen start site has a z-score of 3.224 and a final RBS score of -2.861. These are the best scores out of the potential start sites. This start site also has the longest open reading frame and a common ATG start codon. /note=Gap/overlap: There is a gap of 2 bp between the gene and its upstream gene. This gap is not large enough to consider the addition of another gene, nor is there another potential start site to include the base pairs. /note=Phamerator: As of 1/12/2022 the gene belongs to pham 55203. Of the 25 members, 23 are non-draft genes. All belong to the AZ cluster. Phamerator does not call a function. /note=Starterator: The most conserved start site is number 8. It is annotated in 9 of the 23 non-draft genes in the pham. 
However, Tallboi did not have this annotated start position. Starterator was not informative. /note=Location call: Based on the coding potential, SD scores, significant e-values in PhagesDB BLAST, and conserved gene architecture within the AZ cluster, this gene is considered “real”. The chosen start site is 38238. /note=Function call: NKF. PhagesDB BLASTp returned statistically significant hits with e-values <10^-23 and identity percentages >72%, but none has a known function. The PhagesDB function frequency box in PECAAN did not have any data. The top 5 hits returned from NCBI BLASTp with e-values <10^-26 (query coverage >96% and identity percentage >69%) are hypothetical proteins. While there were many hits returned by HHpred with high probability percentages, none had an e-value lower than the 10^-7 threshold for statistical significance (lowest was 0.0012). Hits with the lowest e-values were also hypothetical proteins. There were some hits for RNA-binding protein function but the e-values were insignificant. No output was returned from CDD. /note=Transmembrane domains: Neither TmHmm nor TOPCONS predicts transmembrane domains. The protein is not a membrane protein. /note=Secondary Annotator Name: Sheppy, Tyler /note=Secondary Annotator QC: I agree with this annotation. The location call with the start site 38238 is supported by the evidence available. The evidence also supports the function call of NKF. Make sure to finish filling out the synteny box and consider checking the evidence boxes for the top hits in BLAST since they indicate that this is a real gene. 
CDS 38465 - 38656 /gene="52" /product="gp52" /function="RNA binding protein" /locus tag="Tallboi_52" /note=Original Glimmer call @bp 38465 has strength 5.81; Genemark calls start at 38465 /note=SSC: 38465-38656 CP: yes SCS: both ST: SS BLAST-Start: [RNA binding protein [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 6.97114E-29 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.402, -3.8116689268127657, yes F: RNA binding protein SIF-BLAST: ,,[RNA binding protein [Arthrobacter phage London] ],,QOP64354,90.4762,6.97114E-29 SIF-HHPRED: Hfq; bacterial Hfq-like. Hfq, an abundant, ubiquitous RNA-binding protein, functions as a pleiotropic regulator of RNA metabolism in prokaryotes, required for transcription of some transcripts and degradation of others.,,,cd01716,87.3016,96.8 SIF-Syn: NKF (pham 95536), upstream gene is pham 55203, just like in phage Reedo. Downstream gene is pham 30015 but does not show synteny. /note=Primary Annotator Name: Baughman, Lexie /note=Auto-Annotation: Glimmer and Genemark. Both agree on the same start site of 38465, with a start codon of GTG. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF. Chosen start site of 38465 covers all of the coding potential. /note=SD (Final) Score: The SD (Final) Score is -3.812, and the Z-score is 2.402. These are the best out of all the listed possible start sites. /note=Gap/Overlap: There is a 4 base pair overlap with the upstream gene, indicating that this gene is likely part of an operon. This start site creates the longest ORF and the length of the gene is acceptable. /note=Phamerator: As of 01/11/2022, the gene is found in Pham 95536. The pham is conserved in other members of the cluster - comparison was done between Tallboi and a few other non-draft genomes, including KeAlii and Reedo. The function is not called by either Phamerator or PhagesDB. 
/note=Starterator: The “Most Annotated” start site (1) is present in 4 of 4 non-draft genes in this pham, and it is present in Tallboi. This start site corresponds to base pair position 38465, which is the auto-annotated start site. /note=Location Call: The gathered evidence suggests that this is a real gene and that its start site is likely at position 38465. /note=Function Call: The top 2 NCBI BLASTp hits suggested function is hypothetical protein with high query coverage (100%), high % identity (85.71%), and low e-values (<5e-29). The top PhagesDB BLASTp hits’ suggested function is unknown, with high % identity (85%) and a low e-value (<6e-24). While there were no hits in CDD, one of the top two hits in HHpred was informative - with high probability (96.87%), high coverage (90.4762%), and a low e-value (0.0095) - that listed a similar function. As such, there does not seem to be enough evidence to call the function of this gene. /note=Transmembrane domains: No predicted transmembrane domains. /note=Secondary Annotator Name: Stephenson, Juliet /note=Secondary Annotator QC: I agree with this annotation and find that all evidence has been considered. 
CDS 38653 - 39066 /gene="53" /product="gp53" /function="hypothetical protein" /locus tag="Tallboi_53" /note=Original Glimmer call @bp 38653 has strength 11.46; Genemark calls start at 38653 /note=SSC: 38653-39066 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_POWERPUFF_55 [Arthrobacter phage Powerpuff] ],,NCBI, q3:s2 98.5401% 4.09387E-50 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.402, -3.8116689268127657, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_POWERPUFF_55 [Arthrobacter phage Powerpuff] ],,QGZ17353,80.8824,4.09387E-50 SIF-HHPRED: SIF-Syn: Upstream gene is serine integrase, just like in Asa16, Adumb2043, Amyev and other phages in Cluster AZ /note=Primary Annotator Name: Dooley, Naomi /note=Auto-annotation: The gene is called by Glimmer and GeneMark at the same start site, 38653, with an ATG start codon. This is a very common start codon, so there is reason to believe that this may be the correct start site. /note=Coding Potential: The gene has reasonable coding potential that is covered by the start site. /note=SD (Final) Score: The SD score was reasonable at -3.812, which is the best score listed. The Z-score of 2.402 (>2) suggests that this may be the correct start site. This Z-score was the best score listed. /note=Gap/overlap: There is a 4 bp overlap (gap of -4 bp) with the upstream gene, which is the smallest gap listed. It is unlikely for a new gene to be found here. The overlap could indicate an operon. /note=Phamerator: The pham number as of 1/12/2022 is 30015, also found in Adumb2043. /note=Starterator: The start number is 3, corresponding to a start site of 38653. /note=Location call: Based on the data, it appears that this is a real gene with a start site at 38653 and a stop site at 39066. /note=Function call: Based on the phages Powerpuff and YesChef, it is safe to conclude that the function of this gene is unknown. 
The top two BLAST hits on PhagesDB both have E-values of 1e-47 and high scores of 186, and both list an unknown function (both have 98% coverage, 65%+ identity, and an e-value of 2.872e-50). CDD contains no hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. Thus we can conclude that this is not a membrane protein. /note=Secondary Annotator Name: Thorp, Jocelyn /note=Secondary Annotator QC: I agree with the annotation above, and all categories have been considered. CDS 39063 - 39254 /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="Tallboi_54" /note=Original Glimmer call @bp 39063 has strength 8.23 /note=SSC: 39063-39254 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein SEA_LONDON_53 [Arthrobacter phage London] ],,NCBI, q6:s3 88.8889% 3.29876E-6 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 0.774, -7.152009625892923, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LONDON_53 [Arthrobacter phage London] ],,QOP64356,55.9322,3.29876E-6 SIF-HHPRED: SIF-Syn: Synteny: NKF. Tallboi genes 55 (upstream), 56, and 58 (downstream) show synteny with phage Crewmate (cluster AZ) genes 58, 59, and 60, respectively. Tallboi gene 55 and Crewmate gene 58 are both members of pham 30015. Tallboi gene 56 and Crewmate gene 59 are both a part of pham 82473. Tallboi gene 58 and Crewmate gene 60 are both members of pham 95108. /note=PECAAN Notes /note=Primary Annotator Name: Wu, Meigan /note=Auto-annotation: Glimmer called the start site to be 39063. GeneMark did not call a start site. /note=Coding Potential: Coding potential is present in a forward reading frame and a reverse reading frame in both GeneMark Self and GeneMark Host. However, this gene is likely a forward gene, since only the ORF for the coding potential in the forward reading frame matches with the stop site for this gene. 
This gene shows synteny with phage London and phage Crewmate, which are both members of cluster AZ. /note=SD (Final) Score: Final score is -7.152, and the corresponding z-score is 0.774. Start site 39063 provides the least favorable final score and z-score available, but this start site corresponds to the smallest gap/overlap size and is also supported by the manual annotations observed from Starterator. /note=Gap/overlap: The overlap is 4 bp. This overlap is suggestive of the gene being a part of an operon. /note=Phamerator: As of January 12, 2022, the pham number is 82473. This pham is conserved in phage Crewmate, Eraser, and London, which are all members of cluster AZ. No function is noted for this pham. /note=Starterator: Pham 82473 has 9 non-draft members. The most called start number was called for 6/9 non-draft phages (start number 9). Start number 9 is start site 39063 in Tallboi, which matches with the start site called by Glimmer. /note=Location call: This gene is likely a real gene with a start site at 39063. /note=Function call: NKF. PhagesDB BLASTp yielded 5 hits with e-values equal to or less than 5e-08. NCBI BLASTp yielded two hypothetical protein hits with e-values 5e-27 and 7e-06. No hits were detected by CDD. No reliable hits provided by HHpred. /note=Transmembrane domains: No TMDs recognized by TMHMM and TOPCONS. This gene is not a TMD gene. /note=Secondary Annotator Name: Zhuang, Chuzhi /note=Secondary Annotator QC: I think the starterator is updated, now the start of this gene is suggested at number 9, corresponding to start 39063. I think starterator is informative. This start also has a more reasonable gap, as a -4 gap may indicate an operon and is better than the -43 gap. I remember that the professors said starterator data should be weighted more than Z score and final score, so you may want to check with them. Overall, I think the start 39063 has more evidence to support it, and its final score is not too bad either. 
I agree with your function call. CDS 39247 - 39675 /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="Tallboi_55" /note=Original Glimmer call @bp 39247 has strength 11.18; Genemark calls start at 39247 /note=SSC: 39247-39675 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_POWERPUFF_56 [Arthrobacter phage Powerpuff] ],,NCBI, q3:s4 98.5916% 1.70743E-57 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.893, -2.7863799983944713, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_POWERPUFF_56 [Arthrobacter phage Powerpuff] ],,QGZ17354,78.169,1.70743E-57 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF, downstream is NKF, just like in phage Lego /note=Primary Annotator Name: Magaling, Janelle /note=Auto-annotation: Glimmer and GeneMark agree on start at 39247 with codon ATG. /note=Coding Potential: Self-trained and host-trained GeneMark show good coding potential in one forward ORF, and the start site covers all coding potential. /note=SD (Final) Score: Final score is -2.786, which is okay and is the best option. Z score is 2.893, which is good. /note=Gap/overlap: The 8 bp overlap with the upstream gene (GAP: -8 bp) is reasonable. This is the LORF and is the best start because it has the highest Z score and least negative final score. The length is 429 bp, which is reasonable. There is also synteny with Lego. /note=Phamerator: 1/11/22 pham 95108. 20/20 members of the pham are from AZ, such as Eraser and Lego. Phamerator shows no function for any members of the pham. /note=Starterator: The most annotated start site is 5, which is called in 16/18 non-draft genes in the pham. Tallboi calls this start site at 39247. /note=Location call: The gene is conserved in Phamerator and Starterator, and start 5 at 39247 covers good coding potential. This is a real gene. /note=Function call: NKF. NCBI BLAST (coverage 100%, identity 71.9%, e-value 2.4e-61) and PhagesDB BLAST (e-value 7e-52) show good hits for hypothetical proteins. There were no good HHpred or CDD hits. 
/note=Transmembrane domains: No TMHs were predicted, so TOPCONS could not be examined; this is not a membrane protein. /note=Secondary Annotator Name: Khaine, Aye Myat /note=Secondary Annotator QC: I agree with the primary annotation. I would include the start site called in the location call section and indicate that this is not a membrane protein. CDS 39686 - 39874 /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="Tallboi_56" /note=Original Glimmer call @bp 39686 has strength 11.53; Genemark calls start at 39686 /note=SSC: 39686-39874 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_LONDON_56 [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 9.36041E-25 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.235, -1.9310779259753799, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LONDON_56 [Arthrobacter phage London] ],,QOP64359,90.1639,9.36041E-25 SIF-HHPRED: SIF-Syn: Tallboi seems to display synteny with Eraser. Although most of the surrounding genes have NKF or do not list a function at all, the downstream genes are in the same phams and so are the upstream genes. The gene directly downstream in both phages is in pham 30015 and the gene directly upstream in both phages is in pham 96259. /note=PECAAN Notes /note=Primary Annotator Name: Ostroske, Elyse /note=Auto-annotation: Glimmer and GeneMark. Both call the start site at 39686. /note=Coding Potential: Coding potential is mainly on the forward strand, indicating that this is a forward gene. Coding potential is found in both Host-trained GeneMark and GeneMark S. /note=SD (Final) Score: -1.931. This is the best final score in PECAAN, as only this start site was given as an option. /note=Gap/overlap: 10 bp gap. /note=Phamerator: As of 1/23/22, this gene is in pham 97798, which has 34 members, of which 33 are phages in cluster AZ. /note=Starterator: This gene does not have the most annotated start site. 
Start Site 9 was automatically called, and it corresponds to 39686, the start site called by both Glimmer and GeneMark. Start Site 9 was manually annotated 4/25 non-draft genomes and is called 100% of the time when it is present. /note=Location call: Based on the above evidence, this is a real gene. The most likely start site is 39686, because it is the only start site listed in PECAAN. /note=Function call: NKF. Many phages in cluster AZ were checked as evidence under Phage DB BLAST due to their low e-values and high query cover but all had NKF. Under NCBI, Adumb, Elezi, Phives, London, and Yang were checked due to their low e-values and high query cover, but all were hypothetical proteins. No hits were found in CDD, no good hits were found in HHpred. /note=Transmembrane domains: TMDs were not predicted by TMHMM or TOPCONS. /note=Secondary Annotator Name: McLinden, Katherine /note=Secondary Annotator QC: Looks good! Just make sure to fill out the drop down menus, phamerator, and the synteny box. CDS 39874 - 40062 /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="Tallboi_57" /note=Original Glimmer call @bp 39874 has strength 11.56; Genemark calls start at 39874 /note=SSC: 39874-40062 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_LEGO_56 [Arthrobacter phage Lego]],,NCBI, q2:s3 98.3871% 3.76786E-13 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.976, -4.702334767722353, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LEGO_56 [Arthrobacter phage Lego]],,QIN94456,68.1159,3.76786E-13 SIF-HHPRED: SIF-Syn: The gene is NKF and is from Pham 12611. Its upstream gene is from pham 2649, and its downstream gene is from pham 94661, which is also seen in other similar phages, such as Adumb2043. /note=Primary Annotator Name: Santos, Charysa /note=Auto-annotation: Glimmer and GeneMark. Both start at 39874. It has a Glimmer score of 11.56. 
/note=Coding Potential: Coding potential in this ORF is found in the direct/forward sequence, thus this is a forward gene. Coding potential for this region is only found in GeneMark Self. /note=SD (Final) Score: -4.702. It is the best/least negative final score on PECAAN for this gene. This start site (@39874) minimizes the gap size (-1), unlike the other start sites with larger gaps. /note=Gap/overlap: -1. Very small overlap compared to the other start sites, meaning there is less space for another gene in between the previous gene and this one. /note=Phamerator: As of 01/07/2022, this gene is found in Pham 12611 and is conserved in other members of cluster AZ, including Adumb2043 and Amyev. /note=Starterator: The “Most Annotated” start site (4) is present in 10 of 14 non-draft genes in this pham, and is present in Tallboi. This start site corresponds to base pair position 39874, the auto-annotated start site. /note=Location call: Using the evidence I found, this is a real gene and the most probable start site is 39874, which is associated with a final score that is closest to 0 (-4.702), and a Z-score of 1.976, which is the best of the start sites listed. /note=Function call: The top 2 NCBI BLASTp hits showed that this gene is likely a hypothetical protein, with high query coverage (98.3871%), 57.971% identity, and low e-values (2.65e-13). PhagesDB BLASTp hits suggested that function is unknown, with e-values of >1e-19. CDD and HHpred hits were not informative (very high e-values and low probabilities/coverages). Thus, there is not enough evidence to call the function of this gene. /note=Transmembrane domains: There were no TMD predictions on either TMHMM or TOPCONS, so we can conclude that this gene is not a membrane protein. /note=Secondary Annotator Name: Uvarov, Evgeniy /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. 
Note: Don`t forget the synteny box, mention that a -1 overlap is indicative of a likely operon, mention the z-score, can check one or two more supporting evidence boxes for Phagesdb BLAST and NCBI Blast CDS 40066 - 40290 /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="Tallboi_58" /note=Original Glimmer call @bp 40066 has strength 12.22; Genemark calls start at 40066 /note=SSC: 40066-40290 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE12_gp55 [Arthrobacter phage Adumb2043] ],,NCBI, q1:s1 97.2973% 1.32978E-39 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.836, -2.7653702713535186, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE12_gp55 [Arthrobacter phage Adumb2043] ],,YP_010677965,88.6076,1.32978E-39 SIF-HHPRED: SIF-Syn: No known function, upstream gene does not have a known function, downstream gene does not have a known function, which is similar to phage Eraser. Phage Eraser has a gene downstream between the corresponding genes for phage Tallboi. /note=Primary Annotator Name: Sheppy, Tyler /note=Auto-annotation: Both Glimmer and GeneMark call the start at 40066. The codon for the start site is ATG. /note=Coding Potential: There is reasonable coding potential predicted in the open reading frame according to the Self-Trained GeneMark and the start site covers all of this coding potential. There is also coding potential predicted by the Host-Trained GeneMark, but it somewhat overlaps with the coding potential of the upstream gene. /note=SD (Final) Score: The Final Score is -2.765 and the Z-score is 2.836. This was the only start site candidate. /note=Gap/overlap: There is a reasonable gap of 3bp between this gene and the upstream gene. The length of this gene is 225bp. There are no other start site candidates. /note=Phamerator: As of January 12, 2022, this gene is found in pham 2649. This pham is conserved in other phages in the AZ cluster, such as Adolin and Amyev. 
There was no function called for this gene in Phamerator. /note=Starterator: There is a reasonable start site that is conserved among members of the pham. The most conserved start is start site 2, which corresponds to 40066 in the phage. 8 of 10 genes call the most conserved start site. /note=Location call: Since there is coding potential predicted by the Self-Trained GeneMark and that this gene is conserved in the Phamerator, this is a real gene. The start site of the gene is 40066, which is called by both Glimmer and GeneMark and conserved in the Starterator. /note=Function call: This gene does not have a known function. The top hits in both PhagesDB BLAST and NCBI BLAST do not have known functions. According to Phamerator, other genes in the same pham did not have a function called. HHpred and CDD were not informative. /note=Transmembrane domains: There are no transmembrane domains predicted by TmHmm or TOPCONS. This is not a transmembrane protein. /note=Secondary Annotator Name: Alvarez, Alondra /note=Secondary Annotator QC: I have QC`ed this location call and agree with the first annotator. CDS 40401 - 40598 /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="Tallboi_59" /note=Original Glimmer call @bp 40401 has strength 13.76; Genemark calls start at 40401 /note=SSC: 40401-40598 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CASSIA_60 [Arthrobacter phage Cassia]],,NCBI, q1:s1 100.0% 1.18311E-37 GAP: 110 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.127, -2.156361006712914, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CASSIA_60 [Arthrobacter phage Cassia]],,WGH21133,96.9231,1.18311E-37 SIF-HHPRED: SIF-Syn: This gene is not part of a well-conserved sequence of genes. It is always found towards the end of the genome. I was able to find two AZ phages, London and Niobe, where genes of pham 12611 and 2649 (both NKF) were nearby upstream, and genes of pham 18955 and 80686 (NKF) were nearby downstream. 
/note=Primary Annotator Name: Stephenson, Juliet /note=Auto-annotation: GeneMark and Glimmer call start site @ 40401, codon ATG. /note=Coding Potential: There is coding potential present in the third ORF (Host and Self-Trained GeneMark), and the selected start codon does include all the coding potential. There is also some coding potential in the fifth ORF, but it is too short to be a gene, and thus should be ignored. /note=SD (Final) Score: This Start Site does have the best Final score (-2.156) as well as the best Z-score (3.127). /note=Gap/overlap: There is a gap of 110 bp, but there is no coding potential present in the gap, so I do not think there is a gene missing. The length of the gene is acceptable at 198 bp. /note=Phamerator: Pham 50943 on 1/12. Unique to cluster AZ, no function identified. /note=Starterator: Most annotated start for this pham (5) is called in this phage. 10/11 non-draft phages call site 5, which corresponds to 40401 in Tallboi. /note=Location call: Based on the good coding potential and the number of hits in Phagesdb Blast, this is a real gene with start site 40401. /note=Function call: There are no good hits with genes of known function, but there are good hits for other phages in this cluster with genes of the same pham. NKF /note=Transmembrane domains: No transmembrane domains present according to TmHmm. /note=Secondary Annotator Name: Baughman, Lexie /note=Secondary Annotator QC: I have QC`ed the location and functional calls and agree with the first annotator. Perhaps add some detail to the functional call section - it might be helpful to write out the actual hits. The PECAAN notes look great otherwise! 
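The GAP values in the SSC summaries and the Gap/overlap notes in these records appear to follow one convention: start of the gene minus stop of the upstream gene minus 1, with negative values denoting overlaps. The following is a minimal Python sketch of that arithmetic, assuming forward-strand genes with 1-based inclusive coordinates. The function name `gap_to_upstream` is hypothetical and is not part of PECAAN or any other tool named in these notes.

```python
def gap_to_upstream(upstream_stop: int, start: int) -> int:
    """Number of bases strictly between two forward genes.

    Assumes 1-based inclusive coordinates. A negative result means the
    two genes overlap by that many bases.
    """
    return start - upstream_stop - 1


# Coordinates copied from the CDS lines of the records above.
print(gap_to_upstream(40290, 40401))  # gp58 stop, gp59 start -> 110
print(gap_to_upstream(40062, 40066))  # gp57 stop, gp58 start -> 3
print(gap_to_upstream(39874, 39874))  # gp56 stop, gp57 start -> -1
```

These three results agree with the GAP fields recorded for gp59, gp58, and gp57 respectively.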
CDS 40687 - 40878 /gene="60" /product="gp60" /function="hypothetical protein" /locus tag="Tallboi_60" /note=Original Glimmer call @bp 40687 has strength 14.41; Genemark calls start at 40687 /note=SSC: 40687-40878 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE12_gp59 [Arthrobacter phage Adumb2043] ],,NCBI, q1:s1 100.0% 9.96293E-33 GAP: 88 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.99, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE12_gp59 [Arthrobacter phage Adumb2043] ],,YP_010677969,98.4127,9.96293E-33 SIF-HHPRED: SIF-Syn: NKF, the gene is of pham 11795, upstream is a gene of pham 50943 and downstream is a gene of pham 18955, just like in phage Adumb2043. Some of the other phages in the cluster that share genes of this pham have a different pham upstream (56784) but the downstream pham is conserved in all phages with genes of a pham 11795. /note=Primary Annotator Name: Thorp, Jocelyn /note=Auto-annotation: Both Glimmer and GeneMark have 40,687 as the start site. /note=Coding Potential: The gene has reasonable coding potential in the forward direction within the putative ORF, with the chosen start site encompassing all of the coding potential. This is true for both GeneMark Self and Host. /note=SD (Final) Score:-2.505, the best score among the possible start sites. The z-score (2.99) for this start site was also the best. /note=Gap/overlap: 88, an acceptable gap. The selected start codon produces a transcript of 192 bp. /note=Phamerator: As of 1/9/2022, the gene belonged to pham 17795. This pham is conserved within 10 of the 27 current members of the AZ cluster. There is no function currently called for genes of this pham. /note=Starterator: Start site 6 is conserved in 9 of 9 non-draft genes of this pham. This start site corresponds to 40,687 in Tallboi. /note=Location call: Based upon the evidence above, this is a real gene with the start site at 40,687. /note=Function call: No known function. 
NCBI BLAST returned 4 hits with low e-values, ranging from 4e-21 to 7e-33. The top hit was to Arthrobacter phage Adumb2043, with a query coverage of 100% and 92.06% identity. HHPRED and CDD did not return any informative hits. /note=Transmembrane domains: There are no predicted TMDs in TOPCONS or TMHMM, therefore it is not a membrane protein. /note=Secondary Annotator Name: Dooley, Naomi /note=Secondary Annotator QC: I have QC'd this gene and agree with the primary annotator. CDS 41005 - 41346 /gene="61" /product="gp61" /function="hypothetical protein" /locus tag="Tallboi_61" /note=Original Glimmer call @bp 41005 has strength 12.71; Genemark calls start at 41005 /note=SSC: 41005-41346 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQD80_gp60 [Arthrobacter phage Lizalica] ],,NCBI, q3:s2 94.6903% 8.10565E-56 GAP: 126 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.893, -2.7863799983944713, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD80_gp60 [Arthrobacter phage Lizalica] ],,YP_010677625,91.8182,8.10565E-56 SIF-HHPRED: SIF-Syn: NKF(from Pham 18955), upstream gene is from pham 17795, downstream is from pham 80686, just like in phage Adumb2043 /note=Primary Annotator Name: Zhuang, Chuzhi /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site at 41005, with start codon ATG. /note=Coding Potential: The gene has reasonable coding potential, and the chosen start site covers all of it. /note=SD (Final) Score: -2.786, which is the best final score. /note=Gap/overlap: 126. There is another start with a smaller gap, but this start has a better Z score and final score. /note=Phamerator: Pham number 18955 as of 1/12/2022. The gene is conserved in other phages in the AZ cluster; Adumb2043 was used for comparison. No function specified. /note=Starterator: The conserved start site in the pham is 14, and it corresponds to 41005 in my phage. 19/23 of final genes called site 14. 
/note=Location call: real gene, start at #41005 /note=Function call: NKF, no blast returned hits with known function, but it matched with some proteins in the phages of AZ cluster with no known function. /note=Transmembrane domains: No hit for transmembrane domains. /note=Secondary Annotator Name: Wu, Meigan /note=Secondary Annotator QC: Overall, great job! I would clarify if the coding potential was observed on a forward and/or reverse frame and whether it was found on Host and/or Self-trained Genemark. I would also explain whether the gap size is reasonable based on presence/absence of coding potential and synteny with other phages. Note whether hits were found using HHpred and CDD for function call evidence. For the transmembrane domain sections, I would specify the programs used (TMHMM and TOPCONS). Also, don`t forget to select the top 2 hits for NCBI BLASTp, even if they aren`t associated with a function. CDS 41422 - 41733 /gene="62" /product="gp62" /function="hypothetical protein" /locus tag="Tallboi_62" /note=Genemark calls start at 41422 /note=SSC: 41422-41733 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein PQE12_gp61 [Arthrobacter phage Adumb2043] ],,NCBI, q1:s1 100.0% 2.02732E-68 GAP: 75 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.99, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE12_gp61 [Arthrobacter phage Adumb2043] ],,YP_010677971,100.0,2.02732E-68 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Erfanian, Kiana /note=Auto-annotation: No Glimmer report. Only one gene candidate, which has been called by GeneMark at 41422. /note=Coding Potential: This gene has good coding potential within the putative ORF, and covers all this coding potential. /note=SD (Final) Score: RBS Final Score of -2.443 and Z-score of 2.99. /note=Gap/overlap: This gene has a gap of 75 bp with its upstream gene, given that the upstream gene stops at 41346. 
This gene also has an overlap of 8 bp with its downstream gene, which starts at 41726. /note=Phamerator: This gene was found in pham 80686, which has 352 members, 23 of which are drafts. Additionally, the majority of phages in this pham belong to cluster K. /note=Starterator: There is only one suggested start. /note=Location call: Given that there is only one suggested start, the start location is determined to be 41422. /note=Function call: Unknown function. Every single hit on PhagesDB BLASTp lists genes of unknown function. All of the hits on NCBI BLASTp are hypothetical proteins. /note=Transmembrane domains: No TMDs called by TmHmm or TOPCONS. The protein is not a membrane protein. /note=Secondary Annotator Name: Lee, Adrienne /note=Secondary Annotator QC: I agree with the location call and function call. For starterator, could you give a bit more info on what the suggested start site is and whether it is conserved in other phages? Don't forget to fill out the synteny box. CDS 41726 - 42082 /gene="63" /product="gp63" /function="hypothetical protein" /locus tag="Tallboi_63" /note=Genemark calls start at 41726 /note=SSC: 41726-42082 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein PQE11_gp64 [Arthrobacter phage Warda] ],,NCBI, q1:s1 100.0% 5.2476E-47 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.99, -3.095100142625534, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE11_gp64 [Arthrobacter phage Warda] ],,YP_010677904,77.1186,5.2476E-47 SIF-HHPRED: SIF-Syn: NKF, upstream gene is NKF (pham 80686), downstream gene is a membrane protein, just like phages Niobe, Elezi and a number of other phages from the same cluster AZ /note=Primary Annotator Name: Khaine, Aye Myat /note=Auto-annotation: Only GeneMark calls the start site at 41726. /note=Coding Potential: The ORF has good coding potential. The start site 41726 includes all coding potential. /note=SD (Final) Score: The final SD score is -3.095 with z-score of 2.99. 
This is the best score and corresponds to the LORF. /note=Gap/overlap: There is an 8bp overlap upstream of the gene. This is the best option. Otherwise the gaps of other options are too large. /note=Phamerator: This gene belongs to pham 55605 as of 01/10/2022 which has 138 members. More than 10 members of this pham are from the same Cluster AZ. /note=Starterator: Starterator calls start number 17 at 41726. This start is not the most annotated start but is the first start possible with 19 manual annotations, most of which are of phages from Cluster AZ. /note=Location call: Based on the evidence provided, this is a real gene with start site at 41726. /note=Function call: Unknown function. Both phagesdb and NCBI BLAST hits indicate unknown function (phagesdb: e-values less than 1e-37 and NCBI: e-values of less than 1e-43, 100% coverage and about 64% identity). HHpred hits do not give reliable evidence and there are no CDD hits. /note=Transmembrane domains: This is not a membrane protein. TMHMM and TOPCONS predicted no TMDs. /note=Secondary Annotator Name: Magaling, Janelle /note=Secondary Annotator QC: i think it would be an operon if the overlap was 4bp and not 8bp. otherwise looks good! CDS 42082 - 42258 /gene="64" /product="gp64" /function="membrane protein" /locus tag="Tallboi_64" /note=Original Glimmer call @bp 42082 has strength 10.31; Genemark calls start at 42082 /note=SSC: 42082-42258 CP: no SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Powerpuff] ],,NCBI, q3:s2 96.5517% 8.90162E-17 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.557, -4.255672789525901, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Powerpuff] ],,QGZ17364,77.5862,8.90162E-17 SIF-HHPRED: SIF-Syn: NKF (pham 49938), upstream gene is NKF (pham 55605), downstream is NKF (pham 17297), just like in phage Asa16 and others. 
/note=Primary Annotator Name: Uvarov, Evgeniy /note=Auto-annotation start source: Glimmer and GeneMark both call the start at 42082 (site 1) with a common ATG start codon. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF by both Glimmer and GeneMark. The chosen start (site 1) covers all the coding potential. This ORF only has forward strand coding potential, thus this is a forward gene. /note=SD (Final) Score: Start site 1 has a final score of -4.256 and a good Z-score of 2.557. This start site has the best (highest) values on PECAAN for this gene. /note=Gap/overlap: Start (site 1) has a gap of 2 bp which is acceptable. This start site creates the LORF and has a gene length of 177 bp which is good. /note=Phamerator: The pham number as of 1/7/2022 is 49938. The gene is conserved in phages Adolin (AZ), Adumb2043 (AZ), Amyev (AZ), as well as other non draft phages. Any of these genomes can be used for comparison since they are all non-draft, but those of the AZ cluster are better since Tallboi is from AZ. Based on PhagesDB the function call for the gene is unknown. /note=Starterator: Based on the 1/7/2022 run the most annotated start site 3 is a reasonable choice that is conserved among members of pham 49938. There are 22 members total with 20 being non-draft. 16/20 of non-draft members and 2/2 draft members call start site 3, which correlates to 42082 (site 1) for Tallboi. /note=Location call: Considering all of the evidence above, this gene is a real gene that is conserved in phamerator as well as starterator, has good coding potential and covers all of it with a start site at 42082 (site 1). Starterator agrees with Glimmer and Genemark. /note=Function call: Not enough data to form a specific function hypothesis, but this has been determined to be a “membrane protein”. 
PhagesDB BLASTp all hits are function unknown with small e-values with the top three being Crewmate (e: 3e-19), Yang (e: 2e-18), DrSierra (e: 9e-18) all of cluster AZ. Phagesdb Function Frequency has no hits. NCBI BLASTp top hit is a membrane protein (e: 5.69848e-21) and the next two have low e-values (6.25755e-17, 3.20564e-13) that are hypothetical proteins thus supporting the conclusion that this is a real protein and has something to do with the cell membrane. There are no CDD hits. The top HHpred hit is Heme exporter protein D (CcmD) with an e-value of 4.9e-7. Since exporter proteins are associated with the cell membrane this supports this protein being a “membrane protein”. /note=Transmembrane domains: TMHMM predicts one TMD, and so does TOPCONS. Based on this evidence this gene can be assumed to have a real TMD and is therefore a “membrane protein”. Since no other function can be interpreted this is what will be called. /note=Secondary Annotator Name: Santos, Charysa /note=Secondary Annotator QC: Evidence heavily suggests that the gene has NKF. CDS 42251 - 42415 /gene="65" /product="gp65" /function="hypothetical protein" /locus tag="Tallboi_65" /note=Original Glimmer call @bp 42251 has strength 8.79; Genemark calls start at 42251 /note=SSC: 42251-42415 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQD88_gp64 [Arthrobacter phage Amyev] ],,NCBI, q1:s1 100.0% 4.89599E-25 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.99, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD88_gp64 [Arthrobacter phage Amyev] ],,YP_010677767,92.4528,4.89599E-25 SIF-HHPRED: SIF-Syn: The gene has NKF, upstream gene is a membrane protein, and downstream gene is a membrane protein. While these functions are not called in many other phages within the AZ cluster, the upstream and downstream gene sequences are conserved within the cluster. 
/note=Primary Annotator Name: Alvarez, Alondra /note=Auto-annotation: Glimmer and GeneMark agree on the start site 42251. /note=Coding Potential: There is good Host-Trained GeneMark and Self-Trained GeneMark coding potential, both included in the start site at 42251. No other potential start site includes the coding potential from both programs. Coding potential is found on forward strand, indicating that it is a forward gene. /note=SD (Final) Score: The chosen start site has a z-score of 2.99 and final RBS score of -2.505. These are the best scores amongst the other possible start sites. The ORF is also the longest and has a common ATG start codon. /note=Gap/overlap: There is an overlap of 8 bp between this gene and the gene upstream. Although genes do not often overlap by more than 7 bp, this start site provides an acceptable ORF length. The other two potential start sites also have gaps that are included by the putative start site. /note=Phamerator: As of 1/12/2022 the gene belongs to pham 17297. Of the 25 members, 23 are non-draft genes. All belong to the AZ cluster. Phamerator does not call a function. /note=Starterator: The most conserved start site number is 9. It is annotated in 22 of the 23 non-draft genes in the pham. Tallboi also calls this annotated start site at position 42251. /note=Location call: Based on the coding potential, the start site SD scores, conservation of genome architecture with other non-draft genes within cluster AZ, and many statistically significant phagesDB BLAST hits, we can determine that this gene is “real.” The chosen start site is 42251. /note=Function call: NKF. Top five significant hits returned by phagesDB with e-values <10^-20 and identity percentages >70% had no known function. The PhagesDB function frequency box in PECAAN did not have any data. NCBI BLASTp returned eight statistically significant hits for hypothetical proteins with e-values <10^-8–only the top four having identity percentages above 70%. 
HHpred did not return any statistically significant hits (lowest e-value was 22). No output was returned by the CDD program. /note=Transmembrane domains: No transmembrane domains are called by TMHMM or TOPCONS. The gene is not a membrane protein. /note=Secondary Annotator Name: Sheppy, Tyler /note=Secondary Annotator QC: I agree with this annotation. The location call (start: 42251) is well supported and there is not enough evidence for a known function, supporting the function call of NKF. Make sure to fill out the synteny box and consider checking the evidence boxes for the top hits in BLAST since they provide evidence of this being a real gene. CDS 42412 - 42525 /gene="66" /product="gp66" /function="membrane protein" /locus tag="Tallboi_66" /note=Original Glimmer call @bp 42412 has strength 10.94; Genemark calls start at 42412 /note=SSC: 42412-42525 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Adumb2043] ],,NCBI, q1:s1 100.0% 8.44673E-13 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.99, -2.583959800616441, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Adumb2043] ],,YP_010677975,94.5946,8.44673E-13 SIF-HHPRED: SIF-Syn: Membrane protein (pham 14469), upstream gene is pham 17297, downstream gene is pham 16105, just like in phage Asa16. /note=Primary Annotator Name: Baughman, Lexie /note=Auto-Annotation: Glimmer and Genemark. Both agree on the same start site of 42412, with a start codon of GTG. /note=Coding Potential: Reasonable coding potential predicted within the putative ORF. Chosen start site of 42412 covers all of the coding potential. /note=SD (Final) Score: The SD (Final) Score is -2.584, and the Z-score is 2.99. These are the best out of all the listed possible start sites. /note=Gap/Overlap: There is a 4 base pair overlap with the upstream gene, indicating that this gene is likely part of an operon. 
The auto-annotated start site does not create the longest ORF, but the start site that does (position 42525) creates a 142 base pair overlap, so it seems unlikely to be the true start site. The length of the gene is nearly acceptable (114 bp). /note=Phamerator: As of 01/11/2022, the gene is found in Pham 14469. The pham is conserved in other members of the cluster - comparison was done between Tallboi and a few other non-draft genomes, including Asa16 and Crewmate. The function is not called by either Phamerator or PhagesDB. /note=Starterator: The “Most Annotated” start site (7) is present in 16 of 20 non-draft genes in this pham, and it is present in Tallboi. This start site corresponds to base pair position 42412, which is the auto-annotated start site. /note=Location Call: The gathered evidence suggests that this is a real gene and that its start site is likely at position 42412. /note=Function Call: The top NCBI BLASTp hit’s suggested function is membrane protein with high query coverage (100%), high % identity (86.49%), and a low e-value (6e-13). The top PhagesDB BLASTp hits’ suggested function is unknown, with high % identity (>67%) and a low e-value (<4e-9). While there were no hits in CDD, one of the top two hits in HHpred was informative - with high probability (91.85%), high coverage (51.3514%), and a nearly low enough e-value (0.49) - that listed the function as membrane protein. HHpred also noted “transmembrane segments;” this, in conjunction with the predicted transmembrane domains, makes it seem that the function is in fact membrane protein. /note=Transmembrane Domains: 1 TMD called in TMHMM and multiple TOPCONS programs detected at least one TMD. This evidence supports a function call of “membrane protein,” which was predicted by NCBI BLASTp and HHpred. /note=Secondary Annotator Name: Stephenson, Juliet /note=Secondary Annotator QC: I agree with this annotation and find that all evidence has been considered. 
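The SD (Final) Score notes quote a Z-score and a final score that also appear, unrounded, in the RBS field of each SSC summary (for gp66: 2.99 and -2.583959800616441). The following is a minimal Python sketch for pulling those two numbers out of a summary string. The field order (scoring matrix, spacing matrix, Z-score, final score, yes/no flag) is an assumption inferred from comparing the summaries with the notes, and `parse_rbs` is a hypothetical helper, not a PECAAN function.

```python
import re

# Assumed layout: "RBS: <scoring>, <spacing>, <z-score>, <final score>, <yes|no>"
RBS_PATTERN = re.compile(
    r"RBS:\s*(?P<scoring>[^,]+),\s*(?P<spacing>[^,]+),\s*"
    r"(?P<z>-?\d+(?:\.\d+)?),\s*(?P<final>-?\d+(?:\.\d+)?),\s*(?P<flag>yes|no)"
)


def parse_rbs(summary: str):
    """Return the RBS fields of an SSC summary as a dict, or None if absent."""
    match = RBS_PATTERN.search(summary)
    if match is None:
        return None
    return {
        "scoring": match.group("scoring").strip(),
        "spacing": match.group("spacing").strip(),
        "z_score": float(match.group("z")),
        "final_score": float(match.group("final")),
        "flag": match.group("flag"),  # meaning of this flag is not stated in the notes
    }


# Fragment copied from the gp66 SSC summary.
summary = ("GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.99, "
           "-2.583959800616441, yes F: membrane protein")
rbs = parse_rbs(summary)
print(rbs["z_score"], round(rbs["final_score"], 3))  # 2.99 -2.584
```

Rounding the final score to three decimals reproduces the value quoted in the gp66 SD (Final) Score note.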
CDS 42522 - 42788 /gene="67" /product="gp67" /function="hypothetical protein" /locus tag="Tallboi_67" /note=Original Glimmer call @bp 42522 has strength 5.74; Genemark calls start at 42522 /note=SSC: 42522-42788 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ADOLIN_69 [Arthrobacter phage Adolin]],,NCBI, q2:s27 98.8636% 3.14381E-26 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.889, -4.74195517835217, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ADOLIN_69 [Arthrobacter phage Adolin]],,QHB36651,57.7586,3.14381E-26 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Dooley, Naomi /note=Auto-annotation: The gene is called by Glimmer and GeneMark at the same start site, 42522, with the start codon GTG. This is a very common start codon so there is reason to believe that this may be the correct start site. /note=Coding Potential: The gene has reasonable coding potential that is covered by the start site. /note=SD (Final) Score: The SD score is reasonable at -4.742, which is the best score listed. The z-score of 1.889 is below 2 and is not the best z-score listed; however, the start with the best z-score has an unreasonable gap and is therefore unlikely to be the correct start site. /note=Gap/overlap: There is a reasonable gap of -4 bp (a 4 bp overlap with the upstream gene), which is the smallest gap listed. It is unlikely for a new gene to be found in this gap. This could indicate an operon. /note=Phamerator: The pham number as of 1/12/2022 is 16105; this gene is also called in Adumb2043. /note=Starterator: The start number is 28, which corresponds to a start site of 42522. /note=Location call: Based on the data, it appears that this is a real gene with a start site of 42522 and a stop site of 42788. /note=Function call: Based on the phages Adumb2043 and Adolin, it is safe to conclude that the function of this gene is HNH endonuclease. 
The top two BLAST hits on PhagesDB have E-values of 9e-36 and 2e-25 and high scores of 147 and 112, and both list HNH endonuclease as the function (both have high coverage of 98% and low identity of less than 65%, although these are still the 1st and 3rd highest identities listed, with e-values of 1.4e-40 and 2.2e-26). CDD contains no hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicted any TMDs. Thus we can conclude that this is not a membrane protein. /note=Secondary Annotator Name: Thorp, Jocelyn /note=Secondary Annotator QC: I agree with the annotation above, and all categories have been considered. CDS 43038 - 43220 /gene="68" /product="gp68" /function="hypothetical protein" /locus tag="Tallboi_68" /note=Genemark calls start at 43014 /note=SSC: 43038-43220 CP: yes SCS: genemark-cs ST: SS BLAST-Start: [hypothetical protein SEA_LONDON_68 [Arthrobacter phage London] ],,NCBI, q1:s9 100.0% 1.10471E-27 GAP: 249 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.224, -2.305049668942183, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_LONDON_68 [Arthrobacter phage London] ],,QOP64371,77.9412,1.10471E-27 SIF-HHPRED: SIF-Syn: /note=PECAAN Notes /note=Primary Annotator Name: Wu, Meigan /note=Auto-annotation: Only GeneMark. GeneMark called the start site to be 43014. /note=Coding Potential: Significant coding potential observed in a forward reading frame and minor coding potential observed in a reverse reading frame in GeneMark Self. Coding potential in a forward reading frame only for GeneMark Host. This gene is likely a forward gene. /note=SD (Final) Score: For start site 43014, the final score is -4.700, and the corresponding z-score is 2.078. Start site 43014 has the third best set of scores on PECAAN. For start site 43038, the final score is -2.305, and the corresponding z-score is 3.224. Start site 43038 has the best set of scores on PECAAN. /note=Gap/overlap: Gap size for start site 43014 is 225. 
Gap size for start site 43038 is 249. Although these are large gaps, there is no coding potential observed in these prospective gaps. /note=Phamerator: As of January 12, 2022, the pham number is 17702. This gene is conserved in other phages in AZ, such as phages Adolin, Amyev, and Crewmate. No function is noted for this pham. /note=Starterator: Pham 17702 has 19 non-draft members. One of the start numbers called for this gene was start number 19, which corresponds to start site 43038. Start number 19 has 6 manual annotations. The other start number called was start number 16, which was called 1/19 times; this start number matches the start site called by GeneMark at 43014. /note=Location call: This gene is likely a real gene with a start site at 43038. Although this start site was not called by GeneMark or Glimmer, this start site was manually annotated in 6/19 non-draft genes for pham 17702, and this start site is associated with the best final score and z-score on PECAAN. /note=Function call: NKF. Multiple hits with e-values less than or equal to 6e-06 found on PhagesDB BLASTp corresponding to genes with unknown function. 9 hits with e-values less than or equal to 5e-05 for hypothetical protein genes found by NCBI BLASTp. CDD yielded two hits with e-values 1.29e-04 and 1.40e-04, which correspond to MSCRAMM_ClfB (MSCRAMM family adhesin clumping factor ClfB) and internalin_K (class 1 internalin InlK), respectively. HHpred yielded no reliable hits. /note=Transmembrane domains: No TMDs detected by TMHMM or TOPCONS. This gene is not a membrane protein gene. /note=Secondary Annotator Name: Zhuang, Chuzhi /note=Secondary Annotator QC: I think Starterator has been updated; it now shows (Start: 19 @43038 has 6 MA's), which I think is informative, suggesting 43038 as a better start. I agree with your annotation. Good job. 
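The gap and overlap figures quoted in the notes above can be reproduced with simple coordinate arithmetic. The following is a minimal sketch, assuming forward-strand genes and the convention gap = start - upstream stop - 1 (an assumption, chosen because it matches the GAP values in these records); the helper name `gap_bp` is hypothetical and is not part of PECAAN or DNA Master.

```python
def gap_bp(upstream_stop: int, start: int) -> int:
    """Bases between the upstream gene's stop and this gene's start.

    Assumed convention: start - upstream_stop - 1.
    A negative value means the two genes overlap by that many bases.
    """
    return start - upstream_stop - 1


# Coordinates copied from the CDS records in this file.
GP66_STOP = 42525  # stop of gene 66
GP67_STOP = 42788  # stop of gene 67
GP68_STOP = 43220  # stop of gene 68

print(gap_bp(GP66_STOP, 42522))  # gene 67 start: 4 bp overlap
print(gap_bp(GP67_STOP, 43014))  # gene 68, GeneMark start
print(gap_bp(GP67_STOP, 43038))  # gene 68, annotated start
print(gap_bp(GP68_STOP, 43220))  # gene 69 start: 1 bp overlap
```

Under this convention the two candidate starts for gene 68 give gaps of 225 and 249 bp, and genes 67 and 69 give -4 and -1 bp, in agreement with the GAP fields recorded for those genes.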
CDS 43220 - 43522 /gene="69" /product="gp69" /function="HNH endonuclease" /locus tag="Tallboi_69" /note=Original Glimmer call @bp 43220 has strength 3.65 /note=SSC: 43220-43522 CP: yes SCS: glimmer ST: SS BLAST-Start: [HNH endonuclease [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 99.0% 8.80458E-47 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.258, -4.033958934919761, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage DrManhattan] ],,YP_009815415,90.0,8.80458E-47 SIF-HHPRED: d.4.1.8 (A:513-673) CRISPR-associated endonuclease Cas9/Csn1, HNH domain {Actinomyces naeslundii [TaxId: 1115803]},,,d4ogca2,66.0,97.3 SIF-Syn: /note=Primary Annotator Name: Magaling, Janelle /note=Auto-annotation: Glimmer calls the start at 43220. There is no GeneMark start. The start codon is ATG. /note=Coding Potential: Host-trained and self-trained GeneMark show good coding potential, and the start site covers all of it. /note=SD (Final) Score: The final score is -4.034, which is acceptable. The z-score is 2.258, which is good. /note=Gap/overlap: This is the LORF. The length of 303 bp is acceptable. There is a gap of 125 bp, which is concerning, but synteny with Asa16 shows a similar gap. /note=Phamerator: As of 1/11/22, the pham is 94822. 27/675 genes in the pham are also in cluster AZ, such as those in Adolin and Adumb2043. The majority of the genes in this pham are called HNH endonuclease, which is an approved function. /note=Starterator: The most annotated start site is start 88, which was called in 419/649 non-draft genes in the pham. Tallboi calls start 88 at 43220. /note=Location call: Start 88 at 43220 is conserved in Phamerator and Starterator, and this start covers all coding potential. It also has synteny with Asa16. This is a real gene. /note=Function call: HNH endonuclease. The PhagesDB function table lists HNH endonuclease. The top PhagesDB BLAST hits (e-value 1e-53) call HNH endonuclease. The top NCBI BLAST hits (e-value 1.1e-51) call HNH endonuclease. HHpred and CDD had no good hits. 
/note=Transmembrane domains: No TMHMM hits, so TOPCONS could not be consulted; this is not a membrane protein. /note=Secondary Annotator Name: Khaine, Aye Myat /note=Secondary Annotator QC: I agree with the primary annotation. I would indicate specifically the gap observed, since it's a bit confusing which gap is between which genes in the note. I would also indicate the start called in the location call and that it's not a membrane protein.