CDS 87 - 554 /gene="1" /product="gp1" /function="terminase, small subunit" /locus tag="MissSwiss_1" /note=Original Glimmer call @bp 87 has strength 12.65; Genemark calls start at 87 /note=SSC: 87-554 CP: yes SCS: both ST: SS BLAST-Start: [terminase small subunit [Arthrobacter phage VResidence]],,NCBI, q1:s1 99.3548% 2.68033E-100 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.28, -2.305049668942183, yes F: terminase, small subunit SIF-BLAST: ,,[terminase small subunit [Arthrobacter phage VResidence]],,UYL87606,98.0519,2.68033E-100 SIF-HHPRED: Terminase_4 ; Phage terminase, small subunit,,,PF05119.15,47.0968,98.7 SIF-Syn: Phages Adolin, Amyev and VResidence have synteny with MissSwiss, all phages have terminase, large subunit gene downstream and portal protein downstream from the terminase, small subunit gene. /note=Primary Annotator Name: Esherick, Sophie /note=Auto-annotation: Glimmer and Genemark. Both call the start at 87. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The complementary ORF does not have any coding potential peaks or atypical coding potential. /note=SD (Final) Score: -2.305. It is the best final score on PECAAN. /note=Gap/overlap: 0. This is ideal for a phage. /note=Phamerator: pham:114715. Date 10/2/2023. It is conserved; found in VResidence (AZ1) and (AZ1). /note=Starterator: Start site in Starterator (start 40) was found in 55 of 249 ( 22.1% ) of genes in pham. It does not have the most manual annotations, however when this start site is present, it is called 100.0% of time when present. It has been manually annotated 39 of 214 phages. The start with the most MA’s was not found in MissSwiss. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 87. /note=Function call: Terminase small subunit. 
The top two phagesdb BLAST hits have the function of terminase, small subunit (E-values <2e-80 and 3e-77), and the top 2 NCBI BLAST hits also have the function of terminase, small subunit (99-100% coverage, 83%+ identity, and E-values of 3e-100 and 2e-84). HHpred had two significant hits for terminase, small subunit as well with 98.6% and 98.5% probabilities, 47-53% coverages, and E-values of 2e-7 and 4.4e-8. CDD had no relevant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. DeepTMHMM also did not predict any TMDs. /note=Secondary Annotator Name: Yang, Emma /note=Secondary Annotator QC: I agree with both the location and function call for this gene.
CDS 551 - 2257 /gene="2" /product="gp2" /function="terminase, large subunit" /locus tag="MissSwiss_2" /note=Original Glimmer call @bp 551 has strength 13.32; Genemark calls start at 551 /note=SSC: 551-2257 CP: yes SCS: both ST: SS BLAST-Start: [terminase large subunit [Arthrobacter phage VResidence]],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.163, -4.622300575606491, no F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Arthrobacter phage VResidence]],,UYL87607,97.7113,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,93.1338,100.0 SIF-Syn: Terminase large subunit, upstream gene is terminase small subunit & downstream gene is portal protein, as seen in phage DrManhattan /note=Primary Annotator Name: Peri, Jayasree /note= /note=Auto-annotation: Glimmer and GeneMark both call the start site at 551. ATG is the start codon that is called. /note= /note=Coding Potential: Coding potential in this ORF looks to be on the forward strand only, confirming that this is a forward gene; both Self and Host GeneMark indicate that there is coding potential throughout the ORF. 
/note= /note=SD (Final) Score: The SD (Final) Score is -4.622, which is the best final score on PECAAN (the least negative). The z-score is 2.163, which is a good score because it's greater than 2. /note= /note=Gap/overlap: There is an overlap of 4 bp, indicating that the gene might be part of an operon. /note= /note=Phamerator: The pham number as of October 4, 2023 is 166018. The gene is conserved in phages Adolin (in the same Cluster AZ) and Amyev (also in the same Cluster AZ). /note= /note=Starterator: Start site 63 was the most annotated start site, with 560 out of 1237 non-draft phages being called as having this start site. However, start site 69 at position 551 was called as the most annotated start site for my gene. The start position based on Starterator (551) agrees with the Glimmer and GeneMark calls. /note= /note=Location call: Based on the evidence, this is a real gene, and the most likely start site is 551. /note= /note=Function call: Terminase large subunit. Multiple PhagesDB BLAST hits, specifically hits for non-draft phages, show the function of terminase large subunit (e-value < 10^-109), and multiple NCBI BLAST hits show the same function of terminase large subunit (100% coverage, 65%+ identity, and e-value of 0). The top hit in HHpred indicated a function of terminase large subunit with 100% probability, 93.1338% coverage, and E-value of 5.2 x 10^-39. CDD had a hit for terminase large subunit with 44.0141% coverage and E-value of 3.26 x 10^-15. /note= /note=Transmembrane domains: DeepTMHMM doesn’t predict the presence of any transmembrane domains. /note= /note=Secondary Annotator Name: Emma Yang /note= /note=Secondary Annotator QC: After reviewing the PECAAN notes, I agree with both the location and function call for this gene. 
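The Gap/overlap notes can be cross-checked against the CDS coordinates. The following is a minimal sketch, assuming all genes are on the forward strand and listed in genome order; the names `CDS_COORDS`, `gap_to_upstream`, and `gaps` are hypothetical and not part of PECAAN or DNA Master. The coordinates are copied from the CDS lines for genes 1 to 11.

```python
# (start, end) pairs copied from the CDS lines for genes 1 to 11.
CDS_COORDS = [
    (87, 554), (551, 2257), (2285, 3661), (3683, 5740), (5797, 6156),
    (6276, 6818), (6842, 7783), (7857, 8261), (8258, 8365), (8349, 8699),
    (8710, 9018),
]


def gap_to_upstream(upstream_end: int, start: int) -> int:
    """Bases strictly between two same-strand genes; negative means overlap."""
    return start - upstream_end - 1


def gaps(coords):
    """Gap before each gene except the first, in genome order."""
    return [gap_to_upstream(prev[1], cur[0]) for prev, cur in zip(coords, coords[1:])]


if __name__ == "__main__":
    for gene_number, gap in enumerate(gaps(CDS_COORDS), start=2):
        kind = "overlap" if gap < 0 else "gap"
        print(f"gene {gene_number}: {abs(gap)} bp {kind}")
```

Under this convention a shared stretch of 4 bases (for example 551 to 554) is reported as -4, which matches the sign used in the `GAP:` summary fields.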
CDS 2285 - 3661 /gene="3" /product="gp3" /function="portal protein" /locus tag="MissSwiss_3" /note=Original Glimmer call @bp 2285 has strength 13.19; Genemark calls start at 2285 /note=SSC: 2285-3661 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 0.0 GAP: 27 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.28, -2.0162541296952132, yes F: portal protein SIF-BLAST: ,,[portal protein [Arthrobacter phage DrManhattan] ],,YP_009815346,97.8166,0.0 SIF-HHPRED: Portal protein; Bacteriophage, SPP1, Portal Protein, Head completion proteins, Connector Complex, DNA Channel, VIRAL PROTEIN; 2.7A {Bacillus subtilis},,,7Z4W_L,91.4847,100.0 SIF-Syn: Portal protein, upstream gene is terminase (large subunit), downstream is capsid maturation protease, just like in phage Adolin (AZ). /note=Primary Annotator Name: Shera, Simer /note= /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 2285. ATG is the start codon that is called. /note=Coding Potential: ORF only shows coding potential in the forward strand, implying this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: Best (least negative) score found on PECAAN: -2.016. The z-score is 3.28 which is good because it is greater than 2. /note=Gap/overlap: 27 base pairs, this is a reasonable gap. The gap is conserved in phage Adolin, which is also part of the AZ1 cluster. /note=Phamerator: 116016. Date 10/04/2023. It is conserved; found in Adolin (AZ), Adumb2043 (AZ). /note=Starterator: Start Site 73 is the most manually annotated start (298/1525 non-draft genes). MissSwiss does not have this start site. MissSwiss calls Start 102, which is called in 40/1525 non-draft genes. This Start Site is called 86.3% of the time when present. This evidence agrees with the site predicted by Glimmer and GeneMark. 
/note=Location call: Based on the above evidence, this is a real gene and the likely start site is 2285. /note= /note=Function call: Portal protein. The top three non-draft genes PhagesDB BLAST hits call for portal protein function (E-value=0). 16 of the top NCBI BLAST hits have a portal protein function (E-value=0, 100% coverage, >80% identity). HHPred has two hits calling for portal protein function (E-Value<10^-33). CDD has one hit for a portal protein function (E-value=1.17995e-37) /note= /note=Transmembrane domains: No evidence of a transmembrane protein by DeepTMHMM. /note= /note=Secondary Annotator Name: Yang, Emma /note= /note=Secondary Annotator QC: Even though it’s not one of the points, I would still mention the z-score in the SD score section since it gives additional evidence for your start site. For Function call, I would mention your evidence from CDD, even if there were no relevant hits so people know that you checked it. Other than these changes, I agree with both the location and function call for this gene. 
CDS 3683 - 5740 /gene="4" /product="gp4" /function="capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin" /locus tag="MissSwiss_4" /note=Original Glimmer call @bp 3683 has strength 11.82; Genemark calls start at 3683 /note=SSC: 3683-5740 CP: yes SCS: both ST: SS BLAST-Start: [capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin [Arthrobacter phage Adolin]],,NCBI, q1:s1 100.0% 0.0 GAP: 21 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.889, -2.827683592113848, yes F: capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin SIF-BLAST: ,,[capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin [Arthrobacter phage Adolin]],,QHB36586,97.5182,0.0 SIF-HHPRED: d.166.1.1 (A:265-550) automated matches {Anthrax bacillus (Bacillus anthracis) [TaxId: 1392]} | CLASS: Alpha and beta proteins (a+b), FOLD: ADP-ribosylation, SUPFAM: ADP-ribosylation, FAM: ADP-ribosylating toxins,,,SCOP_d4dv8a1,37.5182,99.6 SIF-Syn: MissSwiss's capsid maturation and VIP2-like ADP-ribosyltransferase toxin is in the same location as the capsid maturation and VIP2-like ADP-ribosyltransferase toxin in Adolin and DrManhattan. The protein before capsid maturation and VIP2-like ADP-ribosyltransferase toxin is a portal protein for both Adolin and DrManhattan, showing synteny. /note=Primary Annotator Name: Qin, Kaley /note=Auto-annotation: Both Glimmer and GeneMark call the same start site at 3683 with a start codon of ATG. /note=Coding Potential: Both host-trained and self-trained GeneMark agree that there is reasonable coding potential within the ORF and include all of the coding potential within the chosen start site. Both host-trained and self-trained GeneMark show coding potential in the complementary sequence; however, it is not possible to switch orientation because there are no gaps or stops in the gene. /note=SD (Final) Score: -2.828. It is the best final score on PECAAN. 
/note=Gap/overlap: The gap is 21 bp, which is reasonable and is the smallest gap on PECAAN. /note=Phamerator: pham: 2324. Date 10/03/2023. It is most well conserved in cluster AZ. Two examples of AZ phages where this pham is conserved are Cassia (AZ) and Crewmate (AZ). /note=Starterator: Start site 3 in Starterator was manually annotated in 32 of 35 non-draft genes in this pham. Start site 3 corresponds to position 3683 in MissSwiss. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 3683. /note=Function call: Capsid maturation and VIP2-like ADP-ribosyltransferase toxin. On phagesDB pBLAST, the top two hits have significant e-values < 10^-6 (the e-values for DrManhattan and Adolin are both 0). On PECAAN phagesDB pBLAST, these two phages don’t appear. These hits are capsid maturation and VIP2-like ADP-ribosyltransferase toxin. NCBI pBLAST confirms significant hits with both Adolin and DrManhattan (same e-values of 0). NCBI pBLAST confirms the same function for Adolin’s hit; however, for DrManhattan, it says head maturation protease function. On CDD, there is only one result for actin-ADP-ribosylating toxin in Bacillus bacteria with an e-value of 1.64e-19 which is less than our threshold of 10e-3. On HHPRED, the top hits are matches with proteins in Bacillus bacteria, including lethal toxins and ADP-ribosyltransferase. All top matches have a probability over 99%, coverage over 35%, and an e-value below 10e-3. Based on the evidence, we can conclude this is a capsid maturation and VIP2-like ADP-ribosyltransferase toxin. /note=Transmembrane domains: DeepTMHMM predicts one TMD with a length of 685. This aligns with the HHPred predictions because one hit with 99.34% confidence, coverage of 37.3%, and an e-value of 6.2e-10 is an anthrax toxin lethal factor with a middle domain. 
So, our protein with similar sequence and function most likely has a TMD as well. /note=Secondary Annotator Name: Santos, Elisha Anne /note=Secondary Annotator QC: I agree with this annotation for location and function call. All of the evidence categories have been considered. I would maybe mention that there is a little bit of coding potential in the reverse, but it’s not possible to switch orientations or add a gene, since there are no stops and the gap does not have enough bp.
CDS 5797 - 6156 /gene="5" /product="gp5" /function="hypothetical protein" /locus tag="MissSwiss_5" /note=Original Glimmer call @bp 5797 has strength 6.51; Genemark calls start at 5797 /note=SSC: 5797-6156 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU48_gp05 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 1.1214E-66 GAP: 56 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.125, -2.9284886490054283, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp05 [Arthrobacter phage DrManhattan] ],,YP_009815348,92.437,1.1214E-66 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Santaolaya, Cristal /note=Auto-annotation: Glimmer and GeneMark both listed and called the start site at 5797. The start codon is ATG. /note=Coding Potential: According to the data from the Host-Trained GeneMark and the Self-Trained GeneMark, this gene evidently has a good amount of coding potential and has no alternate coding potential reported / detected. Within this ORF there is coding potential on the forward strand so this suggests that this is a forward gene. /note=SD (Final) Score: -2.928. This is the best final score on PECAAN and it is therefore the best choice. Additionally, this start location also has the smallest gap, 56 bp, compared to the next smallest gap of 251 bp. It is important to note that there was a good Z-score of 3.125 as well and it was the best one. 
/note=Gap/overlap: 56 bp gap, this gap is greater than 50 bp so it is worth considering whether there is possibly another gene, but ultimately there seems to be no coding potential within the gap. The gap is conserved in the phages Adolin and DrManhattan as well. /note=Phamerator: pham: 116466. Date 10/02/2023. The pham is conserved in Adolin and DrManhattan which are both part of the cluster AZ. /note=Starterator: Start site number 6 in Starterator was manually annotated in 29 of 35 and was found in 43 of 51 genes within the pham. Start site 6 can be found at 5797 in MissSwiss. The evidence that has been collected and analyzed agrees with the predicted start sites given by both Glimmer and GeneMark. /note=Location call: After evaluating the gene based on the guidelines, the gene is real and does not need to be deleted. Furthermore, the gene contains coding potential and the start site is 5797. /note=Function call: Function unknown. As of right now, all of the members of the Pham 116466 that are shown in the Phagesdb BLAST have no called function, their functions remain unknown so no function can be called yet. The best two hits on Phagesdb BLAST were DrManhattan (4e-56) and Adolin (9e-56). Additionally, CDD and HHpred did not give any significant hits. There was an NCBI BLAST hit with an e-value of 1.1214e-66, 100% coverage and 82.3529% identity. /note=Transmembrane domains: TMHMM did not predict any TMDs. Since it did not detect any, this cannot be a membrane protein. /note=Secondary Annotator Name: Givan, Susanna /note=Secondary Annotator QC: Should note that the Z-score is also the best Z-score of all the possible start sites. Perhaps also mention NCBI BLAST on the function call as additional evidence for this gene. Did this gene have synteny to put in the box below? 
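The Starterator notes quote counts such as "29 of 35" and "43 of 51". A minimal sketch for turning such counts into the one-decimal percentages used elsewhere in these notes is shown here; the function name `percent` is hypothetical and this is not how Starterator itself reports its figures.

```python
def percent(count: int, total: int) -> float:
    """Share of `total` represented by `count`, as a percentage with one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= count <= total:
        raise ValueError("count must lie between 0 and total")
    return round(100.0 * count / total, 1)


if __name__ == "__main__":
    # Counts quoted in the gene 5 Starterator note.
    print(percent(29, 35))  # share of manual annotations using start site 6
    print(percent(43, 51))  # share of pham members in which start site 6 is found
```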
CDS 6276 - 6818 /gene="6" /product="gp6" /function="scaffolding protein" /locus tag="MissSwiss_6" /note=Original Glimmer call @bp 6276 has strength 17.38; Genemark calls start at 6276 /note=SSC: 6276-6818 CP: yes SCS: both ST: SS BLAST-Start: [head scaffolding protein [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 3.71209E-102 GAP: 119 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.044, -3.3503726477288405, yes F: scaffolding protein SIF-BLAST: ,,[head scaffolding protein [Arthrobacter phage DrManhattan] ],,YP_009815349,94.382,3.71209E-102 SIF-HHPRED: Scaffold protein; major capsid protein, HK97-like fold, scaffolding protein, procapsid, VIRUS; 3.72A {Staphylococcus phage 80alpha},,,6B0X_b,61.1111,97.9 SIF-Syn: The gene is a scaffolding protein, with the upstream gene as NKF and the downstream gene as a major capsid protein, just like in the phage DrManhattan, Adolin, and Asa16 /note=Primary Annotator Name: Santos, Elisha Anne /note=Auto-annotation: Glimmer and GeneMark both call the start at 6276. This start site has a starting codon of ATG, which is a common start codon. Thus, this agreement provides evidence that the start codon is likely to be correct. This start site does not give the longest ORF option. /note=Coding Potential: There is strong coding potential for this gene in GeneMark self and host. The start site covers all of the coding potential. There is coding potential in both forward and reverse strands, but the potential in the reverse does not indicate the presence of an orientation switch or new gene since there were no stop sites present. /note=SD (Final) Score: -3.350. This is the best final score on PECAAN. The Z-score is also the highest at 3.044. /note=Gap/overlap: 119bps. It is a larger gap; however, the gap upstream is conserved in other genomes in the same cluster and pham like DrManhattan. /note=Phamerator: Pham 1850. Date 10/03/2023. It is conserved; found in DrManhattan, Asa16, Adolin. 
Overall, there are 56 members in the same pham and same cluster (AZ). /note=Starterator: Start site 17 in Starterator was manually annotated in 47/50 non-draft genes in this pham. This site is called 95.6% of the time when present in the gene. Start is 6276 in MissSwiss. This agrees with GeneMark and Glimmer. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 6276. /note=Function call: Scaffolding Protein. The top 5 PhagesDB BLAST hits of non-draft phages have the function of Scaffolding Protein with e-values from 5e-85 to 1e-101. In NCBI Blastp, there were also multiple hits with the function of scaffolding protein (e-value < 4e-100). There was 1 CDD hit that had an e-value of 4.92e-3 where the function was also scaffold protein. Out of the 83 HHpred hits, the best hit also had a function of scaffold protein (e-value: 0.00086, 97.86% probability, 61.11% coverage). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Givan, Susanna /note=Secondary Annotator QC: Perhaps not mention Z-score/ SD score in the auto-annotation? Maybe save discussing them for the specific Z-score/SD Score section. Is the gap upstream or downstream of the gene? Or is it both upstream/downstream? What percentage of the time is the start site location for Starterator called when present? I really like your analysis of the function call, good job!! 
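E-values in these notes appear in several notations (for example `3.71209E-102`, `1 x 10^-159`, or `0.00086`), which makes them awkward to compare. The sketch below normalizes such strings to floats and compares them against a cutoff. The helper names `parse_evalue` and `is_below` are hypothetical, and the cutoff is left as an explicit argument because the notes do not state one unambiguously.

```python
import re

# Matches forms such as "1 x 10^-159" or "5.2x10^-39" after whitespace removal.
_TIMES_TEN = re.compile(r"(\d+(?:\.\d+)?)x10\^?(-?\d+)")


def parse_evalue(text: str) -> float:
    """Convert an e-value written in one of several common notations to a float."""
    s = text.strip().lower().replace(" ", "").replace("×", "x")
    match = _TIMES_TEN.fullmatch(s)
    if match:
        # Rebuild as standard scientific notation so float() parses it exactly.
        return float(f"{match.group(1)}e{match.group(2)}")
    # Tolerate the nonstandard "e^" spelling, e.g. "5e^-85".
    return float(s.replace("e^", "e"))


def is_below(text: str, cutoff: float) -> bool:
    """True when the parsed e-value is strictly smaller than `cutoff`."""
    return parse_evalue(text) < cutoff


if __name__ == "__main__":
    for raw in ["3.71209E-102", "1 x 10^-159", "0.00086", "2 e-84"]:
        print(raw, "->", parse_evalue(raw))
```

Keeping the cutoff as a parameter avoids hard-coding a threshold that different annotation guides may set differently.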
CDS 6842 - 7783 /gene="7" /product="gp7" /function="major capsid protein" /locus tag="MissSwiss_7" /note=Original Glimmer call @bp 6842 has strength 18.38; Genemark calls start at 6842 /note=SSC: 6842-7783 CP: yes SCS: both ST: SS BLAST-Start: [major head protein [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 0.0 GAP: 23 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.794, -3.3136498407041004, yes F: major capsid protein SIF-BLAST: ,,[major head protein [Arthrobacter phage DrManhattan] ],,YP_009815350,97.1246,0.0 SIF-HHPRED: Major capsid protein; P22 Bacteriophage, VIRUS; 3.3A {Salmonella phage P22},,,5UU5_A,92.6518,100.0 SIF-Syn: The gene is a major capsid protein, with the upstream gene as a scaffolding protein and the downstream gene as a head-to-tail adaptor, just like in the phage Asa16. This section of proteins and adaptors is conserved in Asa16. /note=Primary Annotator Name: Yang, Emma /note=Auto-annotation: Both Glimmer and GeneMark call the gene and note that 6842 is the start site. /note=Coding Potential: Coding potential is seen in both Host and Self-Trained GeneMark in the forward frame ORF2 only, suggesting that it is a forward gene. Furthermore, the start site covers all the coding potential as seen by the Self and Host GeneMark maps. /note=SD (Final) Score: The final score is -3.314, which is the best option compared to other start sites, and the z-value is 2.794, which is a good z-value because it’s greater than 2. Furthermore, the start codon is ATG, which is one of the more common start codons. /note=Gap/overlap: The gap is 23 bp, which is slightly larger, but it is still the smallest gap compared to the other start sites. The length of the gene (942 bp) was acceptable because it exceeded 120 bp and is the LORF. /note=Phamerator: This gene is part of pham 228 according to PhagesDB as of October 4, 2023. 
It is conserved in phage Adolin of the same cluster as MissSwiss as well as 39 other phages in Cluster AZ (288 total members in the pham). The function call for the gene in other phages is a major capsid protein and it’s consistent on PECAAN as well. /note=Starterator: Out of the 252 non-draft members in this Pham, 141 of them call the start site at start site 11, which doesn’t correlate to the auto-annotated start site 9 at 6842 with 53 manual annotations for MissSwiss by GeneMark and Glimmer, so Starterator is not relevant for the location call. However, 6842 is the most likely start site because it has the most favorable RBS final score, is the LORF, and the gap is the smallest out of all the start sites. /note=Location call: This evidence all points to the presence of a real gene with a start site confirmed by both Glimmer and GeneMark at nucleotide 6842. /note=Function call: Multiple PhagesDB BLAST had hits with the suggested function of a major capsid protein; these results had small e-values of 1 x 10^-159 and 1 x 10^-134 and alignment of 97% and 88%; the NCBI BLAST results had the small e-value of 0.0 and 7 x 10^-174. HHpred had hits that aligned with 5UU5_A major capsid protein with 99.96% probability, 92.65% coverage, and an e-value of 9.7 x 10^-27 and 8E16_C tail protein with 99.94% probability, 90.73% coverage, and an e-value of 7.7 x 10^-25. There was one relevant hit on CDD with an e-value of 8.97 x 10^-11 that listed the function as a P22 coat protein super family. /note=Transmembrane domains: There are no TMDs found by DeepTMHMM, so this gene is not a membrane protein. /note=Secondary Annotator Name: Shera, Simer /note=Secondary Annotator QC: After reviewing the PECAAN notes, I agree with both the location and function call for this gene. 
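The gene length quoted above (942 bp for coordinates 6842 to 7783) follows from the inclusive CDS coordinates. A minimal sketch of that arithmetic is given here, assuming inclusive 1-based coordinates; the function names are hypothetical and the 120 bp minimum is taken from the gene 7 Gap/overlap note rather than from any tool.

```python
def gene_length(start: int, end: int) -> int:
    """Length in bp of a feature with inclusive 1-based coordinates."""
    return end - start + 1


def check_length(start: int, end: int, min_bp: int = 120) -> dict:
    """Summarize basic length checks for one CDS."""
    length = gene_length(start, end)
    return {
        "length_bp": length,
        "whole_codons": length % 3 == 0,   # a CDS should be a multiple of 3
        "meets_minimum": length >= min_bp,  # min_bp is an assumed cutoff
    }


if __name__ == "__main__":
    print(check_length(6842, 7783))  # gene 7
    print(check_length(8258, 8365))  # gene 9
```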
CDS 7857 - 8261 /gene="8" /product="gp8" /function="head-to-tail adaptor" /locus tag="MissSwiss_8" /note=Original Glimmer call @bp 7857 has strength 11.0; Genemark calls start at 7857 /note=SSC: 7857-8261 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail adaptor [Arthrobacter phage Adolin]],,NCBI, q1:s1 99.2537% 4.95555E-74 GAP: 73 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.28, -2.0162541296952132, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail adaptor [Arthrobacter phage Adolin]],,QHB36590,96.2687,4.95555E-74 SIF-HHPRED: a.229.1.1 (A:) Hypothetical protein YqbG {Bacillus subtilis [TaxId: 1423]} | CLASS: All alpha proteins, FOLD: Hypothetical protein YqbG, SUPFAM: Hypothetical protein YqbG, FAM: Hypothetical protein YqbG,,,SCOP_d1xn8a_,82.0896,99.2 SIF-Syn: This gene is a head-to-tail adaptor and shares synteny with Adolin, which also has a major capsid protein upstream and pham 132640 downstream. /note=Primary Annotator Name: Givan, Susanna /note=Auto-annotation: Both Genemark and Glimmer call a start site of 7857 and agree on this start site. The auto-annotated start site of 7857 has a start codon of “ATG”, which is a common start codon. Both Genemark and Glimmer agree on the start codon being ATG, thus there is a high probability of ATG being the start codon. /note=Coding Potential: The coding potential for this ORF is on the forward strand, which indicates it is a forward gene. The host-trained Genemark does show strong coding potential, and this gene is predicted by both Genemark and Glimmer. The ORF that contains the start site and covers all of the coding potential is ORF 3. The start site of 7857 covers all of the coding potential for this gene. /note=SD (Final) Score: The final score is -2.016 and the Z-score is 3.28. Both the Z-score and final score are the highest respective scores listed on PECAAN. /note=Gap/overlap: There is a gap of 73 bp with the upstream gene and an overlap of 4 bp with the downstream gene. 
Although this is a larger gap, the upstream gap is conserved with a member of the same AZ1 subcluster, Adolin. The overlap with the downstream gene is also conserved with Adolin. Thus it has an overall conserved gene structure with Adolin. The start site of 7857 would give a total gene length of 405 bp, which is an acceptable gene length. /note=Phamerator: Gene was found in Pham number 116364 as of 10/3/23. This Pham is pretty well conserved in the AZ1 subcluster, with a total of 80 members. This gene’s Pham was also found in the EH subcluster. Not all of the gene’s Pham had a function listed; however, the majority had head-to-tail adaptor listed as the function on both the PhagesDB page and Phamerator. /note=Starterator: This gene’s Pham was found in the AZ1 and EH subclusters. In the AZ1 subcluster, subcluster members MissSwiss, AEgle, AGrandiflora, and Adolin all called the start location as site #8 which corresponds to bp 7857. Base pair 7857 is also the most annotated start site in the Starterator. Start site #8 was called in 32 / 60 non-draft genes in the Pham and it was called 96.1 % of the time it was present. /note=Location call: Given all the evidence, this is a real gene and the start site is at 7857. This is consistent with the fact that this start site is the most annotated site in the Starterator and that Glimmer/ Genemark both agree on the auto-annotated start site of 7857. In Genemark, 7857 also covers all of the coding potential and it is the start site with the longest gene product of the start site candidates. /note=Function call: Given all the evidence, the predicted function of the gene would be a head-to-tail adaptor. There were many genes matched to conserved genes in the same Pham for this gene, thus there is conserved synteny for gene function. NCBI BLAST also agrees with the head-to-tail adaptor function for all but 1 of the other Pham members. 
The top two NCBI BLAST matches also had a head-to-tail adaptor function labeled with e-values of 6.9e-13 and 1.5e-9 respectively. HHpred had a hit for Bacillus protein YqbG, which is indicated in SEA-PHAGES as the criterion for a head-to-tail adaptor function to be called. /note=Transmembrane domains: DeepTMHMM didn’t have any TMDs listed; thus, it is not a membrane protein. /note= /note=Secondary Annotator Name: Santaolaya, Cristal /note=Secondary Annotator QC: I have QC'ed this gene and I agree with the calls made by the primary annotator. Since a function call was made, it would be good to fill out the synteny box to provide supporting evidence. Additionally I would also check the boxes for good hits. Besides that everything looks good, great job!
CDS 8258 - 8365 /gene="9" /product="gp9" /function="hypothetical protein" /locus tag="MissSwiss_9" /note=Genemark calls start at 8258 /note=SSC: 8258-8365 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein HOU48_gp09 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s4 100.0% 4.36922E-11 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.984, -6.859370981496332, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp09 [Arthrobacter phage DrManhattan] ],,YP_009815352,86.8421,4.36922E-11 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: White, Logan /note=Auto-annotation: GeneMark calls the start site at 8258 bp with ATG start codon. Glimmer does not call a start site. /note=Coding Potential: There is coding potential on GeneMark self and host. The chosen start site does not cover all of the coding potential; however, this gene is likely part of an operon, which accounts for the start site cutting off some coding potential. /note=SD (Final) Score: The only final score is -6.859, and the z-score is 0.984. These scores are given less importance as this gene is likely part of an operon. 
/note=Gap/overlap: 4 bp overlap, which indicates this gene is part of an operon. /note=Phamerator: Pham 118122. Date: 10/09/23. This pham is conserved among DrManhattan (AZ1) and Adolin (AZ1). There is no function called in Phamerator or pham maps. /note=Starterator: Start site 2 was manually annotated in 3/6 non-draft genomes, including in DrManhattan and Adolin. Start site 2 in MissSwiss is 8258, which agrees with GeneMark. /note=Location call: This gene is real with a start site at 8258 bp. This is the LORF with a length of 108 bp and is conserved with DrManhattan and Adolin. /note=Function call: Unknown function. There are two hits in PhagesDB BLAST (e-value: 1e-12) and one hit in NCBI BLAST (100% coverage, 78.9% identity, e-value: 4.37e-11) for no known function. HHPred and CDD are not informative as there were no hits in CDD and HHPred had high e-values (e-values > 24). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, so this is not a membrane protein. /note=Secondary Annotator Name: Santaolaya, Cristal /note=Secondary Annotator QC: I have QC'ed this gene and agree with this annotation. All of the supporting evidence to make this call has been filled out and considered. Overall great job! 
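The z-score and final score discussed in the SD (Final) Score notes also appear in the `RBS:` part of each summary note, which can be extracted with a regular expression. This is a minimal sketch under stated assumptions: the dictionary keys are hypothetical labels chosen for illustration, the meaning of the trailing yes/no value is not stated in these notes and is kept only as a raw flag, and the "greater than 2" check mirrors the rule of thumb quoted by the annotators.

```python
import re

# Pattern for text such as:
#   "RBS: Kibler 6, Karlin Medium, 0.984, -6.859370981496332, yes"
_RBS = re.compile(
    r"RBS:\s*([^,]+),\s*([^,]+),\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?),\s*(yes|no)\b"
)


def parse_rbs(note: str):
    """Return the RBS summary fields from a note, or None if no RBS field is found."""
    match = _RBS.search(note)
    if match is None:
        return None
    return {
        "scoring_matrix": match.group(1).strip(),
        "spacing_matrix": match.group(2).strip(),
        "z_score": float(match.group(3)),
        "final_score": float(match.group(4)),
        "flag": match.group(5),  # meaning not documented here; kept verbatim
    }


if __name__ == "__main__":
    note = ("GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.984, "
            "-6.859370981496332, yes F: hypothetical protein")
    rbs = parse_rbs(note)
    print(rbs)
    print("z-score above 2:", rbs["z_score"] > 2)
```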
CDS 8349 - 8699 /gene="10" /product="gp10" /function="head-to-tail stopper" /locus tag="MissSwiss_10" /note=Original Glimmer call @bp 8349 has strength 13.04; Genemark calls start at 8349 /note=SSC: 8349-8699 CP: yes SCS: both ST: SS BLAST-Start: [head closure Hc1 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 89.6552% 1.23055E-63 GAP: -17 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.889, -3.4175091270247986, yes F: head-to-tail stopper SIF-BLAST: ,,[head closure Hc1 [Arthrobacter phage DrManhattan] ],,YP_009815353,85.3448,1.23055E-63 SIF-HHPRED: Head completion protein gp16; Bacteriophage, SPP1, Portal Protein, Head completion proteins, Connector Complex, DNA Channel, VIRAL PROTEIN; 2.7A {Bacillus subtilis},,,7Z4W_5,93.1034,99.6 SIF-Syn: Phages Adolin, DrManhattan and VResidence have synteny with MissSwiss, all phages have a head-to-tail adaptor gene upstream and a tail terminator gene downstream. /note=Primary Annotator Name: Esherick, Sophie /note=Auto-annotation: Glimmer and Genemark. Both call the start at 8349. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The complementary ORF does not have any coding potential peaks or atypical coding potential. /note=SD (Final) Score: -3.418. It is the best final score on PECAAN. /note=Gap/overlap: 17 bp overlap. This overlap is a bit large; however, it is conserved in another AZ1 phage, Adolin, that displays synteny. This start site also gives the LORF and has the best final score. /note=Phamerator: 10/4/2023 pham: 116108. It is conserved; found in Adolin (AZ1) and DrManhattan (AZ1). /note=Starterator: 10/4/2023. Start site in Starterator (start 32) was found in 5 of 295 ( 1.7% ) of genes in pham. It does not have the most manual annotations; however, when this start site is present, it is called 80.0% of the time. It has been manually annotated in 3 of 263 phages. 
The start with the most MA’s was not found in MissSwiss. This start site was called in other phages that display synteny (Adolin, DrManhattan). /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 8349. /note=Function call: head-to-tail stopper. The top two phagesdb BLAST hits have the function of head-to-tail stopper (E-values of 5e-58), and 2 top NCBI BLAST hits also have the function of head-to-tail stopper. (89-93% coverage, 67%+ identity, and E-values of 1.23055e-63 and 2.18014e-46). HHpred had significant hits for head-to-tail stopper as well with 99.6% probability, 93% coverage, and E-value of 5.6e-15. CDD had no relevant hits. SEA-PHAGES approved function list requirement of an HHPRED alignment to one of the following crystal structures: SPP1 16 (5A21 chain E or F in the macromolecular complex) was met. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs, therefore it is not a membrane protein. DeepTMHMM also did not predict any TMDs. /note=Secondary Annotator Name: Qin, Kaley /note=Secondary Annotator QC: I have QC’d this gene and agree with the primary annotator. CDS 8710 - 9018 /gene="11" /product="gp11" /function="hypothetical protein" /locus tag="MissSwiss_11" /note=Original Glimmer call @bp 8710 has strength 6.46; Genemark calls start at 8710 /note=SSC: 8710-9018 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU48_gp11 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 6.00554E-56 GAP: 10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.28, -2.4811409279978642, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp11 [Arthrobacter phage DrManhattan] ],,YP_009815354,97.0588,6.00554E-56 SIF-HHPRED: SIF-Syn: n/a /note=Primary Annotator Name: Peri, Jayasree /note= /note=Auto-annotation: Glimmer and GeneMark both call the start site at 8710. ATG is the start codon that is called. 
/note= /note=Coding Potential: Coding potential in this ORF looks to be on the forward strand only, confirming that this is a forward gene; Host GeneMark indicates that there is coding potential for about half of the ORF (but not all throughout), and Self GeneMark also indicates coding potential through most (although not all) of the ORF. /note= /note=SD (Final) Score: The SD (Final) Score is -2.481. Although this is not the best Final Score, it’s quite close; please see the "Gap/overlap" section for further explanation. /note= /note=Gap/overlap: There is a gap of 10 bp, which is below the 50 bp gap threshold. This indicates that there isn’t any excessive space in the genome around this gene that raises concern. The gap for the start site with the best Final Score exceeds this 50 bp gap threshold. /note= /note=Phamerator: The pham number as of October 4, 2023 is 117543. The gene is conserved in phages Adolin (in the same Cluster AZ) and DrManhattan (also in the same Cluster AZ). /note= /note=Starterator: Start site 26 was the most annotated start site, with 47 out of 93 non-draft phages being called as having this start site. However, start site 25 at position 8710 was called as the most annotated start site for my gene. The start position based on Starterator (8710) agrees with my Glimmer and GeneMark. /note= /note=Location call: Based on the evidence, this is a real gene, and the most likely start site is 8710. /note= /note=Function call: Unknown function. All PhagesDB Blast hits indicate an unknown function for this gene (e-value < 10^-30), and multiple NCBI Blast hits indicate a function of hypothetical protein. (≥96% coverage, 70%+ identity, and e-value < 10^-34). There were no notable hits in either HHpred or CDD. /note= /note=Transmembrane domains: DeepTMHMM doesn’t predict the presence of any transmembrane domains. 
/note= /note=Secondary Annotator Name: Santos, Elisha Anne /note= /note=Secondary Annotator QC: I do agree with the annotation for location and function call but I think the information has updated because some of the information is not matching. The final score and gap noted do not match the final score and gap for the start codon indicated on PECAAN. Similarly, the pham does not match either. You wrote pham 116309, but it is actually pham 117543. The Starterator data is also not right. It should be start site 25, which has 6 manual annotations, and this start site doesn’t have the phage Amyev. CDS 9018 - 9434 /gene="12" /product="gp12" /function="tail terminator" /locus tag="MissSwiss_12" /note=Original Glimmer call @bp 9018 has strength 8.44; Genemark calls start at 9018 /note=SSC: 9018-9434 CP: yes SCS: both ST: SS BLAST-Start: [tail terminator [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 2.52657E-90 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.378, -3.905685530420168, yes F: tail terminator SIF-BLAST: ,,[tail terminator [Arthrobacter phage DrManhattan] ],,YP_009815355,99.2754,2.52657E-90 SIF-HHPRED: Tail terminator protein Rcc01690; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_F,95.6522,99.4 SIF-Syn: Tail terminator protein, upstream gene is a protein with no known function, downstream is a major tail protein, just like in phage Adolin (AZ). /note=Primary Annotator Name: Shera, Simer /note= /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 9018. /note=Coding Potential: ORF only shows coding potential in the forward strand, implying this is a forward gene. Coding potential is found both in GeneMark Self and Host. GTG is the start codon that is called. /note=SD (Final) Score: Best score found on PECAAN: -3.906, but the score should not be weighted heavily because the -1 gap for this gene could indicate it is part of an operon. 
The z-score is 2.378 which is good because it is greater than 2. /note=Gap/overlap: -1 base pair overlap, this overlap is indicative of this gene being a part of an operon. The gap is conserved in phage Adolin, which is also part of the AZ1 cluster. /note=Phamerator: 117600. Date 10/09/2023. It is conserved; >20 non-draft members of the Pham are from cluster AZ. /note=Starterator: Start Site 8 is the most called for manually (34/66 non-draft genes). MissSwiss does not call for this start site. MissSwiss calls for Start 9 which is called for in 27/66 non-draft genes. Start 8 is 9018 in MissSwiss. This Start Site is called for 100.0% of the time when present. This evidence agrees with the site predicted by Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and the likely start site is 9018. /note= /note=Function call: Tail terminator. The top five non-draft genes BLAST hits have tail terminator function (E-value<10^-59) and top NCBI BLAST hit has a tail terminator protein function (E-value<10^-90, 100% coverage, >97% identity). HHPred has a hit for a Tail terminator with an E-value<10^-11. CDD has no relevant hits. /note= /note=Transmembrane domains: No evidence of a transmembrane protein by DeepTMHMM. /note=Secondary Annotator Name: Yang, Emma /note=Secondary Annotator QC: Even though it’s not one of the points, I would still mention the z-score in the SD score section since it gives additional evidence for your start site. For Starterator, there is a report as of 10/11/2023, so I would go back to find the information and add it to your PECAAN notes. Also, don’t forget to check an option in the Pham Starterator dropdown once you get your information. For Function call, I would mention your evidence from CDD, even if there were no relevant hits so people know that you checked it. Other than these changes, I agree with both the location and function call for this gene. 
CDS 9452 - 10000 /gene="13" /product="gp13" /function="major tail protein" /locus tag="MissSwiss_13" /note=Original Glimmer call @bp 9452 has strength 12.26; Genemark calls start at 9452 /note=SSC: 9452-10000 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 2.71681E-126 GAP: 17 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.28, -2.305049668942183, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Arthrobacter phage DrManhattan] ],,YP_009815356,99.4505,2.71681E-126 SIF-HHPRED: YSD1_22 major tail protein; Bacteriophage tail, helical assembly, VIRAL PROTEIN; 3.5A {Bacteriophage sp.},,,6XGR_L,93.4066,98.5 SIF-Syn: MissSwiss’s major tail protein is in the same location as the major tail protein in Adolin and DrManhattan. Before this protein, in all three phages, there is a tail terminator protein. And after this protein, there is a tail assembly chaperone. /note=Primary Annotator Name: Qin, Kaley /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 9452 with a start codon of ATG. /note=Coding Potential: Both host-trained and self-trained GeneMark agree that there is reasonable coding potential within the ORF and include all of the coding potential within the chosen start site. /note=SD (Final) Score: -2.305. It is the best final score on PECAAN. Z-score: 3.28 is above 2 which is good. /note=Gap/overlap: The gap is 17 which is reasonable and the smallest gap on PECAAN. /note=Phamerator: pham: 116263. Date 10/04/2023. The pham is cluster diverse including members from Z, BL, AZ, Q, DI, EH, DY, DW, & BA. Examples of AZ members in this pham are Adolin and Crewmate. /note=Starterator: Start site 8 in Starterator is the most called start site and manually annotated in 99 of 101 non-draft genes in the pham. Start site 8 correlates to start site 9452 in MissSwiss. This evidence agrees with the site predicted by Glimmer and GeneMark. 
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 9452. /note=Function call: Major Tail Protein. On phagesDB pBLAST, the top two hits have significant e-values < 10-6 (DrManhattan’s e-value and Adolin’s e-value is 1e-101). The function of these hits is major tail protein. NCBI pBLAST confirms significant hits with DrManhattan (98% alignment, 100% coverage, and an e-value of 2.7e-126) and VResidence (81% alignment, 100% coverage, and an e-value of 3.1e-107). NCBI pBLAST confirms the same function - major tail protein. HHPRED confirms the same function in another bacteriophage (probability 98.5, coverage 93%, and e-value 0.000011 < threshold 10e-3). There were no hits from CDD. Based on the evidence, we can conclude this is a major tail protein. /note=Transmembrane domains: DeepTMHMM doesn’t predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Yang, Emma /note=Secondary Annotator QC: After reviewing the PECAAN notes, I agree with both the location and function call for this gene. CDS 10094 - 10360 /gene="14" /product="gp14" /function="tail assembly chaperone" /locus tag="MissSwiss_14" /note=Original Glimmer call @bp 10094 has strength 17.4; Genemark calls start at 10094 /note=SSC: 10094-10360 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Arthrobacter phage Reedo] ],,NCBI, q1:s1 100.0% 3.23117E-46 GAP: 93 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.291, -2.2821867859826788, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Reedo] ],,YP_010678198,92.0455,3.23117E-46 SIF-HHPRED: Phage_TAC_8 ; Phage tail assembly chaperone protein Gp14 ()A118,,,PF10666.12,94.3182,94.5 SIF-Syn: tail assembly chaperone. The HHPRED data does not have good data as the e-values are quite low however the PhagesDB blast data supports the function to be called as a tail assembly chaperone. 
The Pham that gene 14 belongs to is 116385 which is the same Pham that shares the same gene in DrManhattan. The gene that is upstream is thought to be a major tail protein which is in Pham 116263. This is also the case for the same genes in DrManhattan. Downstream is the tape measure protein which is also seen in the phages DrManhattan and Reedo. /note=Primary Annotator Name: Santaolaya, Cristal /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 10094 and the start codon is ATG. /note=Coding Potential: After analyzing the data given by both the Host-Trained GeneMark and Self-Trained GeneMark, it is evident that this gene contains good coding potential and there was no alternate coding potential that was detected. In the ORF, there is coding potential on the forward strand therefore suggesting that this is a forward gene. /note=SD (Final) Score: -2.282. This is the best score given on PECAAN and it is the best option. Notably, this start location also has the smallest gap and a Z-score of 3.291. /note=Gap/overlap: This gene has a length of 267 and a gap of 93 which is the smallest gap out of the 3 gene candidates. The gap is quite large; however, it is conserved in phages Adolin and DrManhattan. Within the gap there is no coding potential detected. /note=Phamerator: pham: 116385 which has 72 other members. Date 10/09/2023. The pham is conserved in DrSierra and DrManhattan. /note=Starterator: Start site number 6 in Starterator was called the most often in published annotations, in 48 of the 53 non-draft genes, but this gene does not have the most annotated start. This gene’s start site is at 7 and has one other manually annotated phage with the same start site, Reedo. Start site 7 can be found at 10094 in MissSwiss. The evidence that has been collected and analyzed agrees with the predicted start sites given by both Glimmer and GeneMark. 
/note=Location call: After evaluating the gene based on the guidelines, the gene is real and does not need to be deleted. Furthermore, the gene contains coding potential and the start site is 10094. /note=Function call: According to the Phagesdb pham information and Phamerator, the majority of the members of the pham 116385 have called this gene to be a tail assembly chaperone. Phages Cassia and Reedo are both part of the AZ cluster along with MissSwiss. They both have the same length of 267 bp and have the function of a tail assembly chaperone. There was no data given when CDD was run, and HHpred gave only one significant hit, with an e-value of 0.5 and 94% coverage. It had the function of a phage tail assembly chaperone protein. From the Phagesdb BLAST, phages Adolin (e-value of 1e-40) and DrManhattan (e-value of 1e-40), which are part of the same pham and cluster as MissSwiss, called the function as a tail assembly chaperone. The NCBI BLAST hit obtained with 100% coverage, 84% identity and an e-value of 3e-46 aligned with the call made for the tail assembly chaperone. /note=Transmembrane domains: TMHMM did not predict any TMDs. Since it did not detect any, this cannot be a membrane protein. /note=Secondary Annotator Name: Peri, Jayasree /note=Secondary Annotator QC: HHPred doesn’t show the hit that you have checked here on PECAAN (the one mentioned in your notes). I would check the HHPred website directly! Other than that, I agree with location and function call. Great work! 
CDS join(10094..10354,10354..10716) /gene="15" /product="gp15" /function="tail assembly chaperone" /locus tag="MissSwiss_15" /note= /note=SSC: 10094-10716 CP: yes SCS: neither ST: NI BLAST-Start: [tail assembly chaperone [Arthrobacter phage DrManhattan] ],,NCBI, q2:s3 99.5169% 8.65688E-134 GAP: -267 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.291, -2.2821867859826788, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage DrManhattan] ],,YP_009815357,95.2153,8.65688E-134 SIF-HHPRED: SIF-Syn: The gene is a tail assembly chaperone, with the upstream gene as major tail protein and the downstream gene as a tape measure protein, just like in the phage DrManhattan and Adolin. In addition, the overlap in genes of two tail assembly chaperones is also conserved in DrManhattan and Adolin. /note=Primary Annotator Name: Santos, Elisha Anne /note=Auto-annotation: After looking at coding potential and synteny with other genes, it is clear that a translational frameshift occurred. Thus, we can refer to the previous gene auto-annotation for the start site of this new added gene. Glimmer and GeneMark agree on the start site 10094. /note=Coding Potential: Since there is a programmed translational frameshift, we must look at 2 different frames. Both host-trained and self-trained GeneMark show reasonable coding potential within the first and second gene ORFs. The start site of 10396 does cover all of the coding potential for both the first and second gene. /note=SD (Final) Score: -2.282. This is the best final score on PECAAN. The z-score is highest at 3.273. /note=Gap/overlap: The overlap with the upstream gene is -267. This is a large overlap, but is ultimately conserved as a frameshift in several other phages in the same cluster of AZ like DrManhattan and Adolin. /note=Phamerator: Because we added this new gene for the frameshift annotation, and we based the startsite on the slippery sequence. 
the phamerator data written on PECAAN is not helpful. The gene`s pham is based on the previous gene it overlaps (this is also where the start site begins), so referring to the previous gene, the start site of 10395 is in Pham 116385 which has 72 other members. Date 10/10/2023. The pham is conserved in DrSierra and DrManhattan. /note=Starterator: Similar to Phamerator, this analysis is not based on the written Starterator data on PECAAN for this gene. Instead, it is based on the analysis data from the overlap/ previous gene. Thus, Start site 7 in Starterator was manually annotated in 1/ 53 non-draft genes in this pham. This phage is called Reedo. Start site 7 corresponds to 10094 in MissSwiss. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 10095. /note=Function call: Tail Assembly Chaperone. The top hits in phagesdb BLAST have a function of tail assembly chaperone (e-value < e-100). Similarly, the top hits in NCBI Blast also have the function of tail assembly chaperone ( >98% coverage, >82% identity, and an E-value 87.6% identity, e-values of 0) for minor tail protein function. CDD has no hits and HHPred has hits (e-values < 1.1e-11) for tail proteins. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, not a membrane protein. /note=Secondary Annotator Name: Peri, Jayasree /note=Secondary Annotator QC: I agree with this location and function call. 
CDS 15610 - 16614 /gene="19" /product="gp19" /function="minor tail protein" /locus tag="MissSwiss_19" /note=Original Glimmer call @bp 15610 has strength 10.65; Genemark calls start at 15610 /note=SSC: 15610-16614 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage DrSierra] ],,NCBI, q1:s1 100.0% 0.0 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.947, -2.7863799983944713, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage DrSierra] ],,YP_010678344,90.4192,0.0 SIF-HHPRED: Endo-N-acetylneuraminidase; Chaperone, Glycosidase, Hydrolase; HET: TAM, PEG; 2.6A {Enterobacteria phage K1F},,,3GW6_F,35.9281,94.1 SIF-Syn: Phages Adolin, DrManhattan and VResidence have synteny with MissSwiss, all phages have a tape measure protein upstream and a minor tail protein gene downstream. /note=Primary Annotator Name: Esherick, Sophie /note=Auto-annotation: Glimmer and Genemark. Both call the start at 15610. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The complementary ORF does not have any coding potential peaks or atypical coding potential. /note=SD (Final) Score: -2.786. It is the best final score on PECAAN. Z-score is 2.947 which is above 2, which is preferred. /note=Gap/overlap: 0 gap/overlap. This is ideal in phage genomes. /note=Phamerator: 10/4/2023 pham: 85229. It is conserved in Adolin, DrManhattan and VResidence (all AZ1 phages). /note=Starterator: 10/4/2023. Suggested start site (start@15610) in Starterator (start 6) was found in 12 of 105 ( 11.4% ) of genes in pham. It does not have the most manual annotations; however, when this start site is present, it is called 100.0% of the time. It has been manually annotated in 10 of 99. The start with the most MA’s was not found in MissSwiss. This start site was called in other phages that display synteny (Adolin, DrManhattan). 
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 15610. /note=Function call: minor tail protein. The top two phagesdb BLAST hits have the function of minor tail protein (E-values of 1e-168 and 1e-167), and 2 top NCBI BLAST hits also have the function of minor tail protein. (100% coverage, 86%+ identity, and E-values of 0). HHpred had no significant hits. CDD had no significant hits. SEA-PHAGES approved function list requirement of being in the syntenic region of minor tail proteins was met. In the protein sequence, there was also evidence of glycine-rich regions. /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs, therefore it is not a membrane protein. DeepTMHMM also did not predict any TMDs. /note=Secondary Annotator Name: Shera, Simer /note=Secondary Annotator QC: /note=10/16/2023: I agree with the evidence presented in the PECAAN notes and the final location and function calls. /note=10/14/2023: Although it is not explicitly required, I would mention the z-score in discussion of the final score since it is greater than 2 (indicating it is good). It would also be helpful to mention what start site was called for in MissSwiss by Starterator and how many phages call for that start site. Since Starterator does not indicate the correct start site for MissSwiss, change the suggested start (SS) selection to not informative (NI). Otherwise, I agree with the evidence presented in the PECAAN notes and the final location and function calls. 
CDS 16611 - 17693 /gene="20" /product="gp20" /function="minor tail protein" /locus tag="MissSwiss_20" /note=Original Glimmer call @bp 16611 has strength 17.03; Genemark calls start at 16611 /note=SSC: 16611-17693 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage VResidence]],,NCBI, q1:s1 57.2222% 1.25662E-89 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.044, -2.5052746077145835, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage VResidence]],,UYL87625,47.7528,1.25662E-89 SIF-HHPRED: SIF-Syn: Minor tail protein, upstream gene is minor tail protein but downstream gene is NKF, as seen in phage DrManhattan /note=Primary Annotator Name: Peri, Jayasree /note= /note=Auto-annotation: Glimmer and GeneMark both call the start site at 16611. ATG is the start codon that is called. /note= /note=Coding Potential: Coding potential in this ORF looks to be on the forward strand only, confirming that this is a forward gene; both Self and Host GeneMark indicate that there is coding potential throughout the ORF. /note= /note=SD (Final) Score: The SD (Final) Score is -2.505, which is the best final score on PECAAN (the least negative). /note= /note=Gap/overlap: There is an overlap of 4bp, indicating that the gene might be part of an operon. /note= /note=Phamerator: The pham number as of October 4, 2023 is 88988. The gene is conserved in phages Adolin (in the same Cluster AZ) and Dr. Manhattan (also in the same Cluster AZ).  /note= /note=Starterator: Start site 1 was the most annotated start site, with 4 out of 4 non-draft phages being called as having this start site. This start site 1 at position 16611 was also called as the most annotated start site for my gene. The start position based on Starterator (16611) agrees with my Glimmer and GeneMark. /note= /note=Location call: Based on the evidence, this is a real gene, and the most likely start site is 16611. /note= /note=Function call: Minor tail protein. 
Many PhagesDB Blast hits indicate a function of minor tail protein (e-value < 10^-111). Most NCBI Blast hits indicate the function as virion structural protein, but one notable hit indicates a function of minor tail protein. (57% coverage, 70%+ identity, and e-value of 10^-89). There were no notable hits in either HHpred or CDD. In the Pham Maps, the syntenic region of this gene includes a tape measure protein, which is further evidence that this is most likely the gene for a minor tail protein. /note= /note=Transmembrane domains: DeepTMHMM doesn’t predict the presence of any transmembrane domains. /note= /note=Secondary Annotator Name: Santos, Elisha Anne /note= /note=Secondary Annotator QC: I have QC’ed this gene and I agree with this annotation for the location and function call. CDS 17769 - 18038 /gene="21" /product="gp21" /function="membrane protein" /locus tag="MissSwiss_21" /note=Original Glimmer call @bp 17769 has strength 10.07; Genemark calls start at 17769 /note=SSC: 17769-18038 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU48_gp21 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 3.76423E-52 GAP: 75 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.854, -3.9397787770235135, yes F: membrane protein SIF-BLAST: ,,[hypothetical protein HOU48_gp21 [Arthrobacter phage DrManhattan] ],,YP_009815364,96.6292,3.76423E-52 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shera, Simer /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 17769. /note=Coding Potential: ORF primarily shows coding potential in the forward strand, implying this is a forward gene. There is some activity in the complementary strand, but not significant compared to the forward. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: Best score found on PECAAN: -3.940. The z-score was 2.854 which is good because it is greater than 2. 
/note=Gap/overlap: 75 base pairs, this is an unusually large gap and there doesn’t appear to be any coding potential in between for both Glimmer and Genemark. Ultimately, this gap is reasonable because the gap is conserved in phage Adolin, which is also part of the AZ1 cluster. /note=Phamerator: 116413. Date 10/03/2023. It is conserved; found in Adolin (AZ), Adumb2043 (AZ). /note=Starterator: Start site 17 is called for manually in 9/47 non-draft genes. Start Site 17 is 17769 for MissSwiss. This evidence agrees with the site predicted by Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and most likely start site is 17769. /note=Function call: Function unknown. The top three non-draft genes (E-value<10^-41) BLAST hits have a function unknown and 2 out of top NCBI BLAST hits have a hypothetical function. (100% coverage, 92.13% identity, and E-Value<10^-52 for first and 98% coverage, 77.27% identity, and E-Value<10^-41 for second top hit). HHPred is uninformative because all e-values are >0.0024 and CDD has no relevant hits found. /note=Transmembrane domains: DeepTMHMM predicts 1 TMD with a length of 89 base pairs, indicating evidence of a membrane protein. /note=Secondary Annotator Name: Qin, Kaley /note=Secondary Annotator QC: I have QC’d this gene and agree with the primary annotator. For the function call notes, I would also include a sentence about how the HHPRED were uninformative due to high e-values and how there were no CDD hits. 
CDS 18050 - 18298 /gene="22" /product="gp22" /function="membrane protein" /locus tag="MissSwiss_22" /note=Original Glimmer call @bp 18050 has strength 15.31; Genemark calls start at 18050 /note=SSC: 18050-18298 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Crewmate] ],,NCBI, q1:s1 100.0% 7.23643E-41 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.291, -1.993391246735709, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Crewmate] ],,YP_010678275,92.6829,7.23643E-41 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qin, Kaley /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 18050 with a start codon of ATG. /note=Coding Potential: Both host-trained and self-trained GeneMark agree that there is reasonable coding potential within the ORF and include all of the coding potential within the chosen start site. /note=SD (Final) Score: -1.993. It is the best final score in PECAAN. /note=Gap/overlap: The gap is 11 which is reasonable and the smallest gap on PECAAN. /note=Phamerator: pham: 115057. Date 10/04/2023. The majority of members in this pham are in the AZ cluster. Examples of AZ members in this pham are Adolin and DrManhattan. /note=Starterator: Start site 7 in Starterator is the most called start site and manually annotated in 26 of 36 non-draft genes in the pham. It is also found in 77.4% of genes in the pham. Start site 7 correlates to start site 18050 in MissSwiss. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 18050. /note=Function call: NKF. On phagesDB pBLAST, the top two hits have significant e-values < 10e-6 (ObiToo’s e-value is 2e-35 and Crewmates’s e-value is 2e-35) for unknown function proteins. NCBI pBLAST confirms these unknown function protein hits for Crewmate and ObiToo (100% coverage, 92% alignment, and e-value of 7.2e-41). 
There were no hits from CDD and no informative hits from HHpred (all the e-values were well above a reasonable threshold of 10e-3). So, we can only call the function as NKF. /note=Transmembrane domains: DeepTMHMM predicts 2 TMHs with a length of 82. Since it has no known function, all we can say is that the protein is a membrane protein. /note=Secondary Annotator Name: Esherick, Sophie /note=Secondary Annotator QC: I have QC’ed this gene and I agree with the location and function call of this gene. Just a quick note, Adolin is spelled wrong. For Starterator, you might also include the percentage of when it is called (100% of the time when present, etc.). This will give you even stronger evidence for this start site. CDS 18449 - 19060 /gene="23" /product="gp23" /function="deoxynucleoside monophosphate kinase" /locus tag="MissSwiss_23" /note=Original Glimmer call @bp 18449 has strength 19.75; Genemark calls start at 18449 /note=SSC: 18449-19060 CP: yes SCS: both ST: SS BLAST-Start: [deoxynucleoside monophosphate kinase [Arthrobacter phage Adumb2043] ],,NCBI, q1:s1 100.0% 5.64503E-127 GAP: 150 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.044, -2.5052746077145835, yes F: deoxynucleoside monophosphate kinase SIF-BLAST: ,,[deoxynucleoside monophosphate kinase [Arthrobacter phage Adumb2043] ],,YP_010677934,94.0887,5.64503E-127 SIF-HHPRED: DEOXYNUCLEOSIDE MONOPHOSPHATE KINASE; TRANSFERASE, PHOSPHOTRANSFERASE; HET: OCS, DGP; 2.0A {Enterobacteria phage T4} SCOP: c.37.1.1,,,1DEK_A,93.5961,99.8 SIF-Syn: Deoxynucleoside monophosphate kinase, downstream of this gene at gene 25 is exonuclease and upstream at gene 20 there is a minor tail protein just like in phage Adolin. In both MissSwiss and Adolin there is synteny in the assigned functions of these genes. /note=Primary Annotator Name: Santaolaya, Cristal /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 18449 and the start codon is ATG. 
/note=Coding Potential: After analyzing the data given by both the Host-Trained GeneMark and Self-Trained GeneMark, it is evident that this gene contains good coding potential. In the ORF, there is coding potential on the forward strand therefore suggesting that this is a forward gene. /note=SD (Final) Score: -2.505. This is the best score given on PECAAN and it is the best option. Notably, this start location also has the smallest gap and a Z-score of 3.044. /note=Gap/overlap: This gene has a length of 612 bp and a gap of 150 bp. Although it is the smallest gap out of the gene candidates, it is large; however, it is conserved in phages Adolin and DrManhattan. Within the gap there is no coding potential detected. /note=Phamerator: pham: 97531 which has 222 other members. Date 10/02/2023. The pham is conserved in DrManhattan. /note=Starterator: Start site number 51 in Starterator was called the most often in published annotations, in 51 of the 190 non-draft genes, but this gene does not have the most annotated start. This gene’s start site is at 45 and has 39 other manually annotated phages with the same start site. Start site number 45 can be found at 18449 in MissSwiss. The evidence collected and analyzed agrees with the predicted start sites given by Glimmer and GeneMark. /note=Location call: After considering the guidelines, the gene is real and does not need to be deleted. Furthermore, the gene contains coding potential and the start site is called at 18449. /note=Function call: deoxynucleoside monophosphate kinase. According to the Phagesdb information and Phamerator, the majority of the members of the pham 97531 have called this gene to be a deoxynucleoside monophosphate kinase. Phagesdb BLAST gave 2 significant hits, with an e-value of 1e-102 for Nitro and an e-value of 1e-101 for Adumb2043. NCBI pBLAST provided a deoxynucleoside monophosphate kinase hit for Adumb2043 with 100% coverage and alignment of 94%. Its e-value was 5.6e-127. 
CDD gave a good hit with 88% coverage and an e-value of 6e-24. HHpred had two significant hits for deoxynucleoside monophosphate kinase. Hit 1DEK_A has an e-value of 6.2e-18 and SCOP-d1deka has an e-value of 5.4e-17. Taking this information into account, this gene function is deoxynucleoside monophosphate kinase. /note=Transmembrane domains: TMHMM did not predict any TMDs so therefore this cannot be a membrane protein. /note=Secondary Annotator Name: Peri, Jayasree /note=Secondary Annotator QC: I agree with these location and function calls. CDS 19159 - 19746 /gene="24" /product="gp24" /function="hypothetical protein" /locus tag="MissSwiss_24" /note=Original Glimmer call @bp 19159 has strength 16.14; Genemark calls start at 19159 /note=SSC: 19159-19746 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE12_gp25 [Arthrobacter phage Adumb2043] ],,NCBI, q1:s1 100.0% 1.33271E-117 GAP: 98 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.789, -7.2632802877541405, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE12_gp25 [Arthrobacter phage Adumb2043] ],,YP_010677935,91.2821,1.33271E-117 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Santos, Elisha Anne /note=Auto-annotation: Glimmer and GeneMark both call start site at 19159. This start site has a starting codon of ATG, which is a common start codon. This is the longest ORF out of the possible options, however the z-score is <2 and the final score is the most negative, which is not ideal. /note=Coding Potential: The gene has reasonable coding potential in the ORF. The start site covers all of the coding potential and there is only coding potential in the forward direction. /note=SD (Final) Score:-7.263. This is not the best final score in PECAAN; however, Genemark, Glimmer, and Starterator agree to call this the start. Additionally, this start site allows for the longest ORF and smallest gap. /note=Gap/overlap: 98bps. 
It is a larger gap; however, the gap is conserved in other genomes in the same cluster and pham like DrManhattan and Adolin. There is no coding potential in the gap to indicate the presence of another gene. /note=Phamerator: Pham 1819. Date 10/04/2023. It is conserved; found in 53 members like DrManhattan (AZ) and Adolin (AZ). /note=Starterator: Start site 19 in Starterator was manually annotated in 35/52 non-draft genes in this pham (called 92.5% of the time when present). Start is 19159 in MissSwiss. This agrees with GeneMark and Glimmer. /note=Location call: Based on evidence, this is a real gene and the most likely start site is 19159. /note=Function call: No Known Function. All hits on phagesdb Blast and NCBI with e-values better than e-10 had a function call of no known function (NKF). Since the e-values were 93.2% probability, e-values < 1.9e-86) for laglidadg endonuclease. CDD has a hit (55.2% coverage, e-value 3e-4) and HHPred ( >76.1% coverage, 99.5% identity, e-values < 2e-12). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, not a membrane protein. /note=Secondary Annotator Name: Esherick, Sophie /note=Secondary Annotator QC: I have QC`ed this gene and agree with the location and functional call made by the primary annotator. 
CDS 21691 - 22404 /gene="28" /product="gp28" /function="recombination directionality factor" /locus tag="MissSwiss_28" /note=Original Glimmer call @bp 21691 has strength 18.79; Genemark calls start at 21691 /note=SSC: 21691-22404 CP: yes SCS: both ST: SS BLAST-Start: [recombination directionality factor [Arthrobacter phage Lego]],,NCBI, q1:s1 100.0% 8.81322E-164 GAP: 142 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.558, -4.103700314387452, no F: recombination directionality factor SIF-BLAST: ,,[recombination directionality factor [Arthrobacter phage Lego]],,QIN94429,97.0464,8.81322E-164 SIF-HHPRED: Gp3-like ; Recombination directionality factor-like,,,PF18897.3,87.7637,100.0 SIF-Syn: Phages Adolin and DrManhattan have synteny with MissSwiss. All phages have a LAGLIDADG endonuclease upstream and a metallophosphoesterase gene downstream. /note=Primary Annotator Name: Esherick, Sophie /note=Auto-annotation: Glimmer and Genemark. Both call the start at 21691. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The complementary ORF does not have any coding potential peaks or atypical coding potential. /note=SD (Final) Score: -4.104. It is the best final score on PECAAN. /note=Gap/overlap: 142 bp gap. This is very large, however it is conserved among the other AZ1 phages that display synteny with MissSwiss: Adolin (144 bp gap) and VResidence (146 bp gap). /note=Phamerator: 10/6/2023 pham: 848. It is conserved in Adolin, DrManhattan and VResidence (all AZ1 phages). /note=Starterator: 10/6/2023 Start site in Starterator (start 37) was found in 78 of 144 ( 52.4% ) of genes in pham. This start has the most manual annotations, and when this start site is present, it is called 96.2% of time when present. It has been manually annotated 56 of 119. 
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 21691. /note=Function call: recombination directionality factor. The top two phagesdb BLAST hits have the function of recombination directionality factor (E-values of 1e-129 and 1e-123), and the 2 top NCBI BLAST hits also have the function of recombination directionality factor (100% coverage, 94%+ identity, and E-values of 8.81322e-164 and 8.83394e-162). HHpred had a significant hit for recombination directionality factor (100% probability, E-value of 8.1e-33 and 87.7% coverage). CDD had no significant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. DeepTMHMM also did not predict any TMDs. /note=Secondary Annotator Name: Santaolaya, Cristal /note=Secondary Annotator QC: I have QC'ed this annotation and agree with the location and function calls made by the primary annotator. Great job! CDS 22404 - 22532 /gene="29" /product="gp29" /function="membrane protein" /locus tag="MissSwiss_29" /note=Original Glimmer call @bp 22404 has strength 22.06; Genemark calls start at 22404 /note=SSC: 22404-22532 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Lizalica] ],,NCBI, q1:s1 97.619% 3.73218E-13 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.142, -4.455662434380964, no F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Lizalica] ],,YP_010677595,80.0,3.73218E-13 SIF-HHPRED: SIF-Syn: Membrane, upstream gene is recombination directionality factor & downstream gene is NKF, as seen in phage Adumb2043. /note=Primary Annotator Name: Peri, Jayasree /note= /note=Auto-annotation: Glimmer and GeneMark both call the start site at 22404. ATG is the start codon that is called. 
/note= /note=Coding Potential: Coding potential in this ORF looks to be on the forward strand only, confirming that this is a forward gene; both Self and Host GeneMark indicate that there is coding potential throughout the ORF. /note= /note=SD (Final) Score: The SD (Final) Score is -4.456, which is the best final score on PECAAN (the least negative). The Z-score is 2.142, which also supports this evidence as being the start site. /note= /note=Gap/overlap: There is an overlap of 1 bp, indicating that the gene might be part of an operon. /note= /note=Phamerator: The pham number as of October 9, 2023 is 117755. The gene is conserved in phages Adolin (in the same Cluster AZ) and DrManhattan (also in the same Cluster AZ). /note= /note=Starterator: Start site 3 was the most annotated start site, with 43 out of 43 non-draft phages being called as having this start site. This start site 3 at position 22404 was also called as the most annotated start site for my gene. The start position based on Starterator (22404) agrees with my Glimmer and GeneMark. /note= /note=Location call: Based on the evidence, this is a real gene, and the most likely start site is 22404. /note= /note=Function call: Membrane protein, suspected holin but cannot confirm. All PhagesDB Blast hits indicate an unknown function for this gene (most have e-value < 10^-6). Some NCBI Blast hits indicate a function of membrane protein (≥92% coverage, most are +65% identity, and most e-values are < 10^-10). There were no notable hits in either HHPred or CDD. Because of the presence of 2 TMDs and the absence of TMDs in genes in the syntenic area, this gene is called as a membrane protein based on the SEA-PHAGES approved function list criteria. /note= /note=Transmembrane domains: DeepTMHMM predicts the presence of 2 transmembrane domains. /note= /note=Secondary Annotator Name: Logan White /note= /note=Secondary Annotator QC: Starterator has updated and start site 3 in MissSwiss is consistent with Glimmer/GeneMark. 
Also there is no gap but a 1 bp overlap. I would also include the z-score as it is a good score that further supports the called start site. I agree on location and function call. Great, succinct notes! CDS 22553 - 22717 /gene="30" /product="gp30" /function="hypothetical protein" /locus tag="MissSwiss_30" /note=Original Glimmer call @bp 22553 has strength 16.22; Genemark calls start at 22553 /note=SSC: 22553-22717 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU48_gp30 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 1.58194E-24 GAP: 20 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.044, -2.523003374675015, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp30 [Arthrobacter phage DrManhattan] ],,YP_009815373,90.7407,1.58194E-24 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shera, Simer /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 22553. /note=Coding Potential: ORF only shows coding potential in the forward strand, implying this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: Best score found on PECAAN: -2.523 /note=Gap/overlap: 20 base pair gap, this is a reasonable gap. The gap is conserved in phage Adolin, which is also part of the AZ1 cluster. /note=Phamerator: 4977. Date 10/09/2023. It is conserved; All 13 non-draft members of the Pham are from cluster AZ. /note=Starterator: Start Site 2 is the most called for manually (13/13 non-draft genes). MissSwiss calls for Start Site 2 which is 22553. This evidence agrees with the site predicted by Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and the likely start site is 22553. /note=Function call: Function unknown. The top four non-draft genes BLAST hits have unknown function and 2 out of top NCBI BLAST hits have a hypothetical protein function with E-value<10^-23, 100% coverage and >83% identity. HHPred has no relevant hits because all E-values>38. 
/note=Transmembrane domains: No evidence of a transmembrane protein by DeepTMHMM. /note=Secondary Annotator Name: Esherick, Sophie /note=Secondary Annotator QC: The only note i can say is to include that for transmembrane you used DeepTMHMM. Other than that, I have QC`ed this gene and agree with the location and function call. CDS 22798 - 23148 /gene="31" /product="gp31" /function="hypothetical protein" /locus tag="MissSwiss_31" /note=Original Glimmer call @bp 22798 has strength 15.2; Genemark calls start at 22798 /note=SSC: 22798-23148 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU48_gp31 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 2.20424E-68 GAP: 80 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.947, -2.996490344739583, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp31 [Arthrobacter phage DrManhattan] ],,YP_009815374,92.2414,2.20424E-68 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qin, Kaley /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 22798 with a start codon of GTG. /note=Coding Potential: The host-trained GeneMark predicts reasonable coding potential within the ORF; however, the called start-site leaves out a small bit of coding potential. The self-trained GeneMark only predicts reasonable coding potential for half of the open-reading frame. The called start-site includes all reasonable coding potential. /note=SD (Final) Score: -2.996. This is the best final score in PECAAN. The Z-score is also above 2 which is good. /note=Gap/overlap: The gap is 80, which is on the larger end. However, when looking at pham maps, the gap is reasonable and conserved in other phages like Adolin. 80 is also the smallest gap of all proposed start sites on PECAAN. /note=Phamerator: pham: 85710. Date 10/05/2023. All of the members of this pham are in cluster AZ, including ObiToo and Crewmate. 
/note=Starterator: Start site 8 in Starterator is the most called start site and manually annotated in 32 of 35 non-draft genes in the pham. Start site 8 correlates to start site 22798 in MissSwiss. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 22798. /note=Function call: NKF. On phagesDB pBLAST, the top two hits have significant e-values < 10e-6 (Adolin’s e-value is 5e-54 and DrManhattan’s e-value is 5e-54) for unknown function proteins. NCBI pBLAST confirms these hypothetical protein hits for DrManhattan and Adolin (100% coverage, 92% alignment, and e-value of 2.2e-68). There were no hits from CDD and no informative hits from HHpred (all the e-values were well above a reasonable threshold of 10e-3). So, we can only call the function as NKF. /note=Transmembrane domains: DeepTMHMM predicts no TMDs. So, this is not a transmembrane protein. /note=Secondary Annotator Name: Yang, Emma /note=Secondary Annotator QC: Even though it’s not one of the points, I would still mention the z-score in the SD score section since it gives additional evidence for your start site. For the gap/overlap, I would also check other phages to see if the gap is conserved as extra evidence. Other than these changes, I agree with both the location and function call for this gene. 
CDS 23153 - 23398 /gene="32" /product="gp32" /function="NrdH-like glutaredoxin" /locus tag="MissSwiss_32" /note=Original Glimmer call @bp 23153 has strength 17.61; Genemark calls start at 23153 /note=SSC: 23153-23398 CP: yes SCS: both ST: SS BLAST-Start: [NrdH-like glutaredoxin [Arthrobacter phage Asa16]],,NCBI, q1:s2 93.8272% 2.81484E-42 GAP: 4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.104, -5.495956147230759, yes F: NrdH-like glutaredoxin SIF-BLAST: ,,[NrdH-like glutaredoxin [Arthrobacter phage Asa16]],,UAJ15395,83.5294,2.81484E-42 SIF-HHPRED: NrdH-redoxin; NRDH, THIOREDOXIN, GLUTAREDOXIN, REDOX PROTEIN, DOMAIN SWAPPING, ELECTRON TRANSPORT; 2.69A {Corynebacterium ammoniagenes} SCOP: c.47.1.1,,,1R7H_A,91.358,99.6 SIF-Syn: NrdH-like glutaredoxin. Downstream is the metallophosphoesterase protein and very good upstream synteny which is also seen in the phage Adolin. Upstream the genes share the same function in the same order. This gene shares the same pham and cluster as MissSwiss. /note=Primary Annotator Name: Santaolaya, Cristal /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 23153. The start codon is ATG. /note=Coding Potential: Host-Trained GeneMark and Self-Trained GeneMark both evidently had good coding potential. In the ORF, there is coding potential on the forward strand therefore suggesting that this is a forward gene. /note=SD (Final) Score: -5.496. This is the best final score given on PECAAN for this gene. It has a Z-score of 2.104. /note=Gap/overlap: There is a length of 246 and a gap of 4bp. This is an acceptable gap as it is no more than 50 bp. Additionally the gap is too small for the addition of another gene. /note=Phamerator:pham: 117332. 10/09/2023. Another phage within the same cluster (AZ), Adolin, has the same length of 246 bp. /note=Starterator: There are 972 members of the pham 117332 as given on Starterator. 
Start site number 112 was the most called in 189 out of 906 total non-draft genes but MissSwiss did not contain this start site. Its start site is 108 and there are 12 manual annotations of this start site. The start is called at 23153 in MissSwiss which is in agreement with the predicted start sites on both Glimmer and GeneMark. /note=Location call: This gene is real and does not need to be deleted as it has sufficient coding potential. Its start site is located at 23153. /note=Function call: NrdH-like glutaredoxin. Phagesdb information shows that the majority of phages within the same pham and cluster (AZ) call the function as being NrdH-like glutaredoxin. Phagesdb BLAST gave 2 significant hits, with e-values of 6e-34 for Asa16 and 5e-34 for Elezi. NCBI pBLAST provided an NrdH-like glutaredoxin hit for Asa16 with 93% coverage and alignment of 83%. Its e-value was 2.8e-42. CDD gave a good hit with 88% coverage and an e-value of 1.4e-22. HHpred had two significant hits for NrdH-like glutaredoxin. Hit 1R7H_A has an e-value of 3.6e-12 and 1H75_A has an e-value of 4.8e-12. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs so this cannot be a membrane protein. /note=Secondary Annotator Name: Yang, Emma /note=Secondary Annotator QC: For the synteny box, I would also include the upstream protein synteny as well as the downstream one you put. Other than that, I agree with both the location and function call for this gene. 
CDS 23388 - 23591 /gene="33" /product="gp33" /function="hypothetical protein" /locus tag="MissSwiss_33" /note=Original Glimmer call @bp 23388 has strength 8.97; Genemark calls start at 23388 /note=SSC: 23388-23591 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ADOLIN_33 [Arthrobacter phage Adolin]],,NCBI, q3:s2 97.0149% 4.55577E-37 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.889, -2.827683592113848, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ADOLIN_33 [Arthrobacter phage Adolin]],,QHB36615,96.9697,4.55577E-37 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Santos, Elisha Anne /note=Auto-annotation: Glimmer and GeneMark both call the start at 23388. This start site has a starting codon of ATG, which is a common start codon. Thus, this agreement provides evidence that the start codon is likely to be correct. /note=Coding Potential: The gene has reasonable coding potential in the ORF. The start site covers all of the coding potential. There is coding potential in both forward and reverse strands. However, the small spike of potential in the reverse does not indicate the presence of an orientation switch or new gene since there were no stop sites present. /note=SD (Final) Score: -2.828. This is the only candidate with the best final score present on PECAAN. The z-score is 2.889. /note=Gap/overlap: The gap/ overlap is -11bps. This is still a relatively small overlap and is the best and only option on PECAAN. /note=Phamerator: Pham 5867. Date 10/09/2023. It is conserved; found in DrManhattan and Adolin. Overall, there are 11 members in the same pham and same cluster (AZ). /note=Starterator: Start site 6 in Starterator was manually annotated in 8/8 non-draft genes in this pham. The start site is called 100% for the time when present in a gene. Start is 23388 in MissSwiss. This agrees with GeneMark and Glimmer. 
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 23388 bp. /note=Function call: No Known Function. Hits on Blastp had e-values less than e-10, coverage of greater than 50%, and had a function call of no known function (NKF) from phages in the same cluster and pham as MissSwiss. For example, phages like Adolin, DrManhattan, and Phives have synteny with this gene and also are NKF. There are no hits on CDD. HHpred returned multiple hits, but like CDD they were not informative since their e-values were too high to be considered as evidence. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Givan, Susanna /note=Secondary Annotator QC: Read through the coding potential segment--there is a conflicting statement about coding potential only in forward strand and a statement about coding potential in both forward/reverse strand. For Starterator, consider including the % that the start site is called when present. Your function call was beautifully done. Remember to also fill out the synteny box! 
CDS 23588 - 24193 /gene="34" /product="gp34" /function="metallophosphoesterase" /locus tag="MissSwiss_34" /note=Original Glimmer call @bp 23588 has strength 11.07; Genemark calls start at 23588 /note=SSC: 23588-24193 CP: yes SCS: both ST: SS BLAST-Start: [metallophosphoesterase [Arthrobacter phage Adolin]],,NCBI, q1:s1 97.0149% 3.93036E-127 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.137, -4.387988835334809, no F: metallophosphoesterase SIF-BLAST: ,,[metallophosphoesterase [Arthrobacter phage Adolin]],,QHB36616,94.3878,3.93036E-127 SIF-HHPRED: d.159.1.8 (A:1-186) Hypothetical protein aq_1666 {Aquifex aeolicus [TaxId: 63363]} | CLASS: Alpha and beta proteins (a+b), FOLD: Metallo-dependent phosphatases, SUPFAM: Metallo-dependent phosphatases, FAM: Hypothetical protein aq_1666,,,SCOP_d1xm7a1,89.0547,99.8 SIF-Syn: The gene is a metallophosphoesterase, with an upstream gene of no known function and a Holliday junction resolvase as the downstream gene, just like in the phage Adolin. This section is conserved in Adolin. /note=Primary Annotator Name: Yang, Emma /note=Auto-annotation: Both Glimmer and GeneMark call the gene and note that 23,588 is the start site. /note=Coding Potential: Coding potential is seen in both Host and Self-Trained GeneMark in the forward frame ORF2, suggesting that it is a forward gene. Furthermore, the start site covers all the coding potential as seen by the Self and Host GeneMark maps. /note=SD (Final) Score: The final score is -4.388, which is the best option compared to other start sites because it is the least negative, and the z-value is 2.137, which is a good z-value because it is greater than 2 and also the highest z-value. Furthermore, the start codon is GTG, which is one of the more common start codons. 
/note=Gap/overlap: The overlap is only 4 bp, which is a small overlap and also indicative of the possibility of this gene being part of an operon, where such an overlap lets the ribosome continue translation. The length of the gene (606 bp) is acceptable because it exceeds 120 bp, and even though it is not the LORF, it has the smallest overlap. /note=Phamerator: This gene is part of pham 85087 according to PhagesDB as of October 7, 2023. It is conserved in phages Adolin and DrManhattan of the same cluster as MissSwiss as well as 11 other phages in Cluster AZ (139 total members in the pham). The function call for the gene in other phages is a metallophosphoesterase and it is consistent on PECAAN as well. /note=Starterator: Out of the 123 non-draft members in this pham, 32 call the start site at start site 56, including MissSwiss, which calls the auto-annotated start site 56 at 23,588 with 32 manual annotations. This correlates with the start site for MissSwiss denoted by GeneMark and Glimmer, so Starterator is informative for the location call. Furthermore, 23,588 is the most likely start site because it has the most favorable RBS final score, the highest z-value, and the overlap is small and indicative of the presence of an operon. /note=Location call: This evidence all points to the presence of a real gene with a start site confirmed by both Glimmer and GeneMark at nucleotide 23,588. /note=Function call: Multiple PhagesDB BLAST hits had the suggested function of a metallophosphoesterase; these results had small e-values of 1 x 10^-106 and alignments of 89% and 88%; the NCBI BLAST results had small e-values of 4 x 10^-127 and 3 x 10^-126 with 97% coverage and 89.74% and 88.21% alignment. 
HHpred had hits that aligned with SCOP_d1xm7a1 hypothetical metallo-dependent phosphatase with 99.79% probability, 89.05% coverage, and an e-value of 4.3 x 10^-18 and 2AHD_A phosphodiesterase with 99.6% probability, 82.09% coverage, and an e-value of 6.5 x 10^-14. There was one relevant hit on CDD with an e-value of 8.57 x 10^-32 that listed the function as a COG4186 Calcineurin-like phosphoesterase superfamily protein. The Cluster AZ harmonization made this call for function that was supported by the literature. Even though it doesn't have the HEXXH motif determined by SEA-PHAGES, it has other motifs specific to metallophosphoesterases as seen in the literature, such as GDX. /note=Transmembrane domains: There are no TMDs found by DeepTMHMM, so this gene is not a membrane protein. /note=Secondary Annotator Name: White, Logan /note=Secondary Annotator QC: For the overlap, I would just say a 4 bp overlap, or a -4 bp gap. Also there are no hits on DeepTMHMM. CDD evidence boxes could also be checked off for function. For the function, the SEA-PHAGES approved functions list says that to be a metallophosphoesterase there must be a HEXXH motif but I do not see it in the sequence, so it may be a phosphoesterase, but there are many HHPred hits for metallo-dependent phosphatase. I need to double check w/ professor unless you already have then jk. Here is a link to a recent forum: https://seaphages.org/forums/topic/5557/. I agree with the location call. 
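The motif check discussed in the gp34 function call and QC notes can be sketched in a few lines of Python. This is only an illustration, assuming the HEXXH requirement is read as His-Glu-any-any-His and the GDX motif as Gly-Asp-any; the function name `find_motif` and the example sequence are hypothetical and are not the gp34 protein sequence.

```python
import re

def find_motif(protein, pattern):
    """Return the 1-based start position of every match of `pattern`
    (a regular expression over one-letter amino acid codes) in `protein`.
    A lookahead is used so that overlapping matches are also reported."""
    lookahead = re.compile("(?=(?:" + pattern + "))")
    return [m.start() + 1 for m in lookahead.finditer(protein.upper())]

# Hypothetical placeholder sequence, NOT the real gp34 product.
example = "MKTAHEAAHLLGDSRHEGYHV"

hexxh_hits = find_motif(example, "HE..H")  # HEXXH: H, E, any two residues, H
gdx_hits = find_motif(example, "GD.")      # GDX: G, D, any residue

print("HEXXH starts at:", hexxh_hits)
print("GDX starts at:", gdx_hits)
```

An empty list for `HE..H` on the real translated sequence would be consistent with the QC comment that the HEXXH motif is not present.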
CDS 24190 - 24540 /gene="35" /product="gp35" /function="Holliday junction resolvase" /locus tag="MissSwiss_35" /note=Original Glimmer call @bp 24190 has strength 17.21; Genemark calls start at 24190 /note=SSC: 24190-24540 CP: yes SCS: both ST: SS BLAST-Start: [holliday junction resolvase [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 6.44161E-78 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.056, -4.635604911695725, yes F: Holliday junction resolvase SIF-BLAST: ,,[holliday junction resolvase [Arthrobacter phage DrManhattan] ],,YP_009815378,99.1379,6.44161E-78 SIF-HHPRED: Holliday junction resolvase; archeal holliday junction resolvase helicase DNA binding enzyme phage 15-6 thermus thermophilus, RECOMBINATION; HET: SO4, MSE; 2.5A {Thermus thermophilus phage 15-6},,,7BGS_B,93.1034,99.7 SIF-Syn: The gene is a Holliday junction resolvase, with a metallophosphoesterase as the upstream gene and a gene of no known function downstream, just like in the phage Adumb2043. /note=Primary Annotator Name: Givan, Susanna /note=Auto-annotation: Both Glimmer and Genemark recognize this gene and they agree on a start site of 24190. The auto-annotated start site has a start codon of “ATG,” which is a common start codon with a high probability of being called. Since both Glimmer and Genemark call the start site of 24190, they both agree on the start codon being “ATG”. This indicates “ATG” is likely the correct start codon and lends strong credibility to the start site being 24190. /note=Coding Potential: The coding potential for this ORF is on the forward strand, which indicates this is a forward gene. The host-trained Genemark shows strong coding potential and this gene is called by both Genemark and Glimmer. The ORF which covers all of the potential is ORF #1 and the start site of 24190 covers all of the coding potential that the gene of interest is in. 
In fact, there is an up-hash directly near where the coding potential begins and a down-hash directly near the 24540 stop site. There is another potential start site in the host-trained Genemark; however, it is a lot farther away from where the coding potential begins (the up-hash is near 24000). /note=SD (Final) Score: The final score for this gene is -4.636 and the Z-score is 2.056. Both the Z-score and the final score are the highest respective scores listed in PECAAN. /note=Gap/overlap: There is an overlap of 4 bp between the upstream gene, which ends at 24193, and the 24190 start site of this gene. This indicates there is a possibility that this gene is part of an operon. Furthermore, there are 146 bp between the end of this gene (24540) and the 24686 start site of the next gene; that gene is on the complementary strand (24537-24686), so its coordinates overlap the end of this gene by 4 bp rather than leaving a gap. This arrangement, with a 4 bp overlap with the upstream gene and a reverse-strand gene downstream, is conserved in other members of the AZ1 subcluster; specifically, it displays perfect synteny with Adolin and Ascela. This indicates that the overall gene structure is present among other members of the subcluster. Additionally, the start site of 24190 would result in a gene length of 351 bp. This is a perfectly acceptable gene length. /note=Phamerator: Gene was found in Pham 116215 as of 10/11/2023. This Pham consists of 160 members, 31 of which are drafts. The common function call for this gene conserved among Pham members appears to be a Holliday junction resolvase. /note=Starterator: Start site #53 was called via manual annotation in 37 out of 129 manual annotations and was found in 55 out of the 160 total annotations. Start site #53 in Starterator corresponds to the start site 24190 for this gene and when it was present, it was called 100% of the time. Although this start site is not the most conserved in the pham of the annotated genes, it is called 100% of the time it is present. 
/note=Location call: Given all the evidence, this is a real gene and the start site is at 24190. This is consistent with the fact that Glimmer and Genemark both agree on the auto-annotated start site of 24190. In Genemark, 24190 also covers all of the coding potential and it has the strongest Z-score and Final Score. /note=Function call: In the phagesdb run, the top two hits were both other AZ1 subcluster members (DrManhattan, Adolin) that shared the function of Holliday junction resolvase. The e-values for these subcluster members were both 2e-62, which makes this compelling data for this gene’s function call. The function of Holliday junction resolvase also agrees with the NCBI Blast top two hits, which both had Holliday junction resolvase listed as the function with 97% and 86% identity, respectively. HHpred also listed the top two hits as having a Holliday junction resolvase function. The first hit had a probability of 99.7% and an e-value of 1.7e-15. The second hit had a probability of 99.3% and an e-value of 8.3e-11. The high probabilities, low e-values, and conservation of the function throughout members of the pham further support the function call of Holliday junction resolvase. CDD did not have any relevant hits. /note=Transmembrane domains: DeepTMHMM didn’t have any TMDs listed; thus, it is not a membrane protein. /note=Secondary Annotator Name: Santaolaya, Cristal /note=Secondary Annotator QC: I have QC'ed this annotation and agree with the location and function calls made by the primary annotator. Be sure to add information to the synteny box since a function call was made; besides that, everything looks good. Great job! 
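The gap/overlap and gene-length arithmetic used in these notes can be checked with a short Python sketch. It assumes the convention implied by the GAP fields in this file (gap = left coordinate of the downstream gene minus right coordinate of the upstream gene minus 1, with negative values meaning an overlap) and that length counts both end coordinates; the function names are hypothetical, and the coordinates are taken from the CDS lines for genes 34 to 37.

```python
def gene_length(left, right):
    """Length in bp of a gene spanning left..right, both ends inclusive."""
    return right - left + 1

def gap_between(upstream_right, downstream_left):
    """Bases between two neighbouring genes, using their left/right
    genome coordinates regardless of strand.
    Positive = gap in bp, negative = overlap in bp."""
    return downstream_left - upstream_right - 1

# (left, right) coordinates from the CDS lines in this file.
gp34 = (23588, 24193)
gp35 = (24190, 24540)
gp36 = (24537, 24686)   # annotated on the complementary strand
gp37 = (24879, 27389)

print("gp35 length:", gene_length(*gp35))
print("gp34 -> gp35:", gap_between(gp34[1], gp35[0]))
print("gp35 -> gp36:", gap_between(gp35[1], gp36[0]))
print("gp36 -> gp37:", gap_between(gp36[1], gp37[0]))
```

Because gene 36 is on the complementary strand, its start codon is at its right-hand coordinate (24686), so comparing that start with the gp35 stop does not give the intergenic distance; the left/right coordinates do.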
CDS complement (24537 - 24686) /gene="36" /product="gp36" /function="hypothetical protein" /locus tag="MissSwiss_36" /note=Original Glimmer call @bp 24686 has strength 5.02; Genemark calls start at 24686 /note=SSC: 24686-24537 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU48_gp36 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 5.485E-19 GAP: 192 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.77, -5.227163949899647, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp36 [Arthrobacter phage DrManhattan] ],,YP_009815379,78.9474,5.485E-19 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: White, Logan /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 24686. /note=Coding Potential: There is coding potential in both GeneMark self and host. The chosen start site does cover all of the coding potential. /note=SD (Final) Score: The best final score is -5.227 with a z-score of 1.77. These are not the best scores on PECAAN. /note=Gap/overlap: The gap is 192 bp, which is significant but conserved in Adolin and DrManhattan. /note=Phamerator: Pham 116533. Date 10/09/23. This pham is conserved among several members of the cluster including Adolin and DrManhattan. /note=Starterator: Start site 18 is not the most manually annotated start site. Start site 18 is manually annotated in 9/27 non-draft genomes, including Adolin and DrManhattan. Start site 18 is 24686 in MissSwiss. /note=Location call: This gene is real with a start site at 24686. This is not the LORF and has a length of 150 bp. The LORF shows a start site at 24716 but does not align with Glimmer, GeneMark, or Starterator. /note=Function call: Unknown Function. There are multiple hits in Phages DB with e-values of < 8e-15 and multiple hits on NCBI Blast ( > 87% coverage, > 59% identity, e-values <7.7e-16) for unknown function. HHPred and CDD are not informative as there were no hits in CDD and HHPred had high e-values (e-values > 20). 
/note=Transmembrane domains: DeepTMHMM does not predict any TMDs, not a membrane protein. /note=Secondary Annotator Name: Esherick, Sophie /note=Secondary Annotator QC: I have QC'ed this gene and agree with the location and functional call. CDS 24879 - 27389 /gene="37" /product="gp37" /function="DNA primase/helicase" /locus tag="MissSwiss_37" /note=Original Glimmer call @bp 24879 has strength 11.43; Genemark calls start at 24879 /note=SSC: 24879-27389 CP: yes SCS: both ST: SS BLAST-Start: [DNA primase [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 0.0 GAP: 192 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.044, -2.442961286954254, yes F: DNA primase/helicase SIF-BLAST: ,,[DNA primase [Arthrobacter phage DrManhattan] ],,YP_009815380,96.7742,0.0 SIF-HHPRED: Primase D5; DNA helicase, D5_N domain, DUF5906 domain, Pox_D5 domain, SF3 helicase, VIRAL PROTEIN;{Vaccinia virus Copenhagen},,,8APM_C,50.3589,100.0 SIF-Syn: Phages Adolin and DrManhattan have synteny with MissSwiss with a metallophosphoesterase upstream and a DNA ligase gene downstream. /note=Primary Annotator Name: Esherick, Sophie /note=Auto-annotation: Genemark and Glimmer. Both call the start at 24879. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The complementary ORF does not have any coding potential peaks or atypical coding potential. /note=SD (Final) Score: -2.443. It is the best final score on PECAAN. The z-score is 3.044, which is the best z-score out of all start sites and is greater than 2. /note=Gap/overlap: 192 bp. This is very large; however, it is conserved among the other AZ1 phages that display synteny with MissSwiss: Adolin (192 bp gap) and VResidence (194 bp gap). /note=Phamerator: 10/6/2023 pham: 85082. It is conserved in Adolin, DrManhattan and VResidence (all AZ1 phages). 
/note=Starterator: 10/6/2023 Start site in Starterator (start 49) was found in 78 of 142 (54.9%) of genes in pham. This start has the most manual annotations, and when this start site is present, it is called 100.0% of the time. It has been manually annotated 57 of 117. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 24879. /note=Function call: DNA primase/helicase. According to the SEA-PHAGES approved function list, the requirement for this function is to have hits for both parts (primase and helicase). This requirement is met by various hits. The top two phagesdb BLAST hits have the function of DNA primase and DNA helicase (E-values of 0), and the 2 top NCBI BLAST hits also have the function of DNA primase and DNA helicase (100% coverage, 84%+ identity, and E-values of 0). HHpred had significant hits for DNA primase and DNA helicase (100% probability, E-values of 2.1e-32 and 3e-31 with 43%+ coverage). CDD had significant hits for DNA primase and D5 N-terminal domains with e-values of 0 and 2.27521e-30. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. DeepTMHMM also did not predict any TMDs. /note=Secondary Annotator Name: Santaolaya, Cristal /note=Secondary Annotator QC: I have QC'ed this annotation and agree with the location and function calls made by the primary annotator. 
CDS 27401 - 27859 /gene="38" /product="gp38" /function="HNH endonuclease" /locus tag="MissSwiss_38" /note=Original Glimmer call @bp 27620 has strength 10.34; Genemark calls start at 27392 /note=SSC: 27401-27859 CP: no SCS: both-cs ST: NI BLAST-Start: [hypothetical protein C1H84_16205 [Glutamicibacter soli]],,NCBI, q2:s12 96.0526% 1.03825E-45 GAP: 11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.988, -2.6217801767860207, yes F: HNH endonuclease SIF-BLAST: ,,[hypothetical protein C1H84_16205 [Glutamicibacter soli]],,RBL99226,56.6474,1.03825E-45 SIF-HHPRED: restriction endonuclease PacI; HNH restriction endonuclease, beta-beta-alpha-metal active site, 8 base-pair rare cutter, HYDROLASE-DNA complex; HET: SO4; 1.92A {Pseudomonas alcaligenes},,,3M7K_A,96.7105,99.4 SIF-Syn: No synteny with any phages in AZ /note=Primary Annotator Name: Peri, Jayasree /note= /note=Auto-annotation: Glimmer calls the start site as 27620 (start codon ATG), and GeneMark calls the start site as 27392 (start codon GTG). /note= /note=Coding Potential: Coding potential in this ORF looks to be on the forward strand only, confirming that this is a forward gene. Both Self and Host GeneMark indicate coding potential in the ORF starting from around 27550, but there is no coding potential from 27300-27600. However, for both Self and Host GeneMark, the coding potential does not go through until the end of ORF. /note= /note=SD (Final) Score: The best SD (Final) Score is -2.622, which corresponds to a start site of 27401. The auto-annotated SD (Final) Score is -5.290, which corresponds to the start site of 27620; this score is much more negative and therefore not the best SD (Final) Score. The SD (Final) Score for start site 27557 (a start site that covers the coding potential in the ORF) is -5.179, which is a bit better than that of the auto-annotated start site. 
/note= /note=Gap/overlap: The most reasonable gap, which is 11bp, corresponds to the start site of 27401; this gap is below the 50bp gap threshold. The auto-annotated gap, however, is 230bp, which corresponds to the start site of 27620; this is well beyond the 50bp gap threshold and would indicate the necessity to possibly insert a new gene. The gap for start site 27557 is 167bp, which is more reasonable than that of the auto-annotated start site. /note= /note=Phamerator: The pham number as of October 11, 2023 is 117988. The gene is conserved in phages Fryberger (in the same Cluster DP) and Ziko (also in the same Cluster DP). /note= /note=Starterator: Start site 37 was the most annotated start site, with 5 out of 12 non-draft phages being called as having this start site. However, start site 28 at position 27620 was called as the most annotated start site for my gene. The start position based on Starterator (27620) agrees with my Glimmer, but not GeneMark. Start site 19 at position 27557 agrees with my GeneMark start site call. /note= /note=Location call: Based on the evidence, I do think this is a real gene, but I think that the start site is different than both of the auto-annotated start sites called by Glimmer and GeneMark. Because there is good potential for a part of the ORF and many non-draft phages in the gene's pham, I believe this is a real gene. Based on comparing the overall length of this gene in other non-draft phages and taking into account the coding potential, the best start site for this gene is 27557. /note= /note=Function call: Endonuclease VII. Many PhagesDB Blast hits indicate a function of endonuclease VII for this gene (e-value < 10^-15); some indicate a function of some other type of endonuclease, but the majority say endonuclease VII. 
Many NCBI Blast hits indicate a function of endonuclease VII (≥80% coverage, most are around 50% identity, and e-values < 10^-17); interestingly, the same phage that indicated a function of HNH endonuclease in PhagesDB Blast (phage Crewmate) was indicated for a function of endonuclease VII in NCBI Blast. The top hit in HHPred indicates a function of endonuclease VII (99.6% probability and e-value of 3.4 x 10^-15); there is also a hit for HNH endonuclease, but the e-value is much larger (about 10^-6). The only hit in CDD indicates a function of endonuclease VII as well (73% coverage and e-value of 4.3 x 10^-26). Given all this evidence, along with the fact that the protein sequence for the gene does not have any H-N-H repeats over a 30-amino acid sequence, I've chosen to call the function for this gene as endonuclease VII. /note= /note=Transmembrane domains: DeepTMHMM doesn’t predict the presence of any transmembrane domains. /note= /note=Secondary Annotator Name: Logan White /note= /note=Secondary Annotator QC: I would check evidence boxes on NCBI BLAST and another good hit has appeared on CDD as well for the function. HHPred also has more good hits for endonuclease VII. I would also uncheck the evidence box on Phagesdb BLAST for Crewmate since it says it's an HNH endonuclease. I agree with the location and function call, notes are great and descriptive. 
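The gap values quoted in the gene 38 notes above (11 bp, 167 bp and 230 bp) can be reproduced from the coordinates alone. The sketch below is only an illustration: it assumes 1-based inclusive coordinates, two forward-strand genes, and a hypothetical helper name `intergenic_gap` that is not part of PECAAN or any other tool named in these notes. It takes the stop of gene 37 (27389) from the CDS line above and compares it with each candidate start for gene 38.

```python
def intergenic_gap(prev_stop: int, start: int) -> int:
    """Number of bases strictly between the upstream gene's stop and this start.

    Assumes 1-based, inclusive coordinates on the forward strand
    (an assumption of this sketch). A negative result means an overlap.
    """
    return start - prev_stop - 1


# Stop coordinate of gene 37, taken from its CDS line (24879 - 27389).
GENE_37_STOP = 27389

# Candidate starts for gene 38 discussed in the notes.
for candidate_start in (27401, 27557, 27620):
    gap = intergenic_gap(GENE_37_STOP, candidate_start)
    print(f"start {candidate_start}: gap {gap} bp")
```

The same arithmetic gives a negative number for overlapping genes, for example `intergenic_gap(30193, 30190)` for the gene 41 / gene 42 boundary, which matches the -4 bp gap recorded for gene 42.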
CDS 27861 - 27974 /gene="39" /product="gp39" /function="hypothetical protein" /locus tag="MissSwiss_39" /note=Original Glimmer call @bp 27861 has strength 14.52; Genemark calls start at 27861 /note=SSC: 27861-27974 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE13_gp38 [Arthrobacter phage Elezi] ],,NCBI, q4:s4 91.8919% 1.22816E-13 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.291, -1.993391246735709, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE13_gp38 [Arthrobacter phage Elezi] ],,YP_010678016,89.1892,1.22816E-13 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shera, Simer /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 27861. /note=Coding Potential: ORF only shows coding potential in the forward strand, implying this is a forward gene. Coding potential is found both in GeneMark Self and Host and the entire coding region is covered. /note=SD (Final) Score: Best score found on PECAAN: -1.993. The z-score for this final value was 3.291, which is good because it is greater than 2. /note=Gap/overlap: 1 base pair gap; this is a reasonable gap. The gap is conserved in phage Adolin, which is also part of the AZ1 cluster. /note=Phamerator: 85720. Date 10/10/2023. It is conserved; all 36 non-draft members of the pham are from cluster AZ. /note=Starterator: Start Site 9 is the most called for manually (14/36 non-draft genes). MissSwiss calls for Start Site 9 which is 27861. This evidence agrees with the site predicted by Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and the likely start site is 27861. /note=Function call: Function unknown. The top three non-draft gene BLAST hits have unknown function (E-value<10^-14) and 3 of the top NCBI BLAST hits have a hypothetical protein function with E-value<10^-11, 91% coverage and >79% identity. HHPred has no relevant hits because all E-values>0.79. CDD has no relevant hits. 
/note=Transmembrane domains: No evidence of a transmembrane protein as DeepTMHMM found 0 TMHs. /note=Secondary Annotator Name: Qin, Kaley /note=Secondary Annotator QC: I have QC’d this gene and agree with the primary annotator. For the coding potential notes, I would state whether or not all coding potential is within the called start site. For the function call notes, I would also include a sentence about how there were no CDD hits. For the transmembrane domains, I would include what you mean by no evidence (e.g. “DeepTMHMM found 0 TMHs.) CDS 27977 - 28162 /gene="40" /product="gp40" /function="hypothetical protein" /locus tag="MissSwiss_40" /note=Original Glimmer call @bp 27977 has strength 7.4; Genemark calls start at 27977 /note=SSC: 27977-28162 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU48_gp39 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 80.3279% 4.22797E-15 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.647, -3.2669815101762634, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp39 [Arthrobacter phage DrManhattan] ],,YP_009815382,77.551,4.22797E-15 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qin, Kaley /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 27977 with a start codon of GTG which is common. /note=Coding Potential: Both host-trained and self-trained GeneMark agree that there is reasonable coding potential within the ORF and include all of the coding potential within the chosen start site. /note=SD (Final) Score: -3.267. This is the best final score in PECAAN. /note=Gap/overlap: The gap is 2 which is reasonable. It is not the smallest gap on PECAAN, but makes sense as opposed to the smallest gap of -52. This gap is conserved in Adolin and DrManhattan. /note=Phamerator: pham: 4064. Date 10/06/2023. All of the members of this pham are in cluster AZ, including Adolin and DrManhattan, and have unknown functions. 
/note=Starterator: Start site 6 in Starterator is the most called start site and manually annotated in 13 of 18 non-draft genes in the pham. Start site 8 correlates to start site 27977 in MissSwiss. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 27977. /note=Function call: NKF. On phagesDB pBLAST, the top two hits have significant e-values < 10e-6 (Adolin’s e-value is 1e-14 and VResidence’s e-value is 8e-15) for unknown function proteins. NCBI pBLAST confirms these hypothetical protein hits for DrManhattan and Adolin (80% coverage, 78% alignment, and e-value of 4.2e-15). There were no hits from CDD and no informative hits from HHpred (all the e-values were well above a reasonable threshold of 10e-3). So, we can only call the function as NKF. /note=Transmembrane domains: DeepTMHMM predicts no TMDs. So, this is not a transmembrane protein. /note=Secondary Annotator Name: Givan, Susanna /note=Secondary Annotator QC: Perhaps comment on the commonality of GTG start codon usage or why it is still a good candidate for start site given the unusualness of a GTG start codon? Remember to include the final score under the SD ( final) score. Do the phams in the same phamerator have a common function? Also remember to fill out the synteny box below. 
CDS 28328 - 30193 /gene="41" /product="gp41" /function="DNA polymerase I" /locus tag="MissSwiss_41" /note=Original Glimmer call @bp 28328 has strength 15.1; Genemark calls start at 28328 /note=SSC: 28328-30193 CP: yes SCS: both ST: SS BLAST-Start: [DNA polymerase I [Arthrobacter phage Adumb2043] ],,NCBI, q1:s1 100.0% 0.0 GAP: 165 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.291, -1.993391246735709, yes F: DNA polymerase I SIF-BLAST: ,,[DNA polymerase I [Arthrobacter phage Adumb2043] ],,YP_010677948,97.2625,0.0 SIF-HHPRED: Apicoplast DNA polymerase; DNA polymerase, exonulease, apicoplast, Plasmodium falciparum, REPLICATION, TRANSFERASE; HET: PEG, EDO; 2.5A {Plasmodium falciparum (isolate 3D7)},,,7SXQ_B,96.9404,100.0 SIF-Syn: Upstream and downstream of this gene there are NKF genes; however, the structure is conserved, as can be seen in the phage Adolin. Adolin shares synteny with MissSwiss and calls the function as DNA polymerase I. /note=Primary Annotator Name: Santaolaya, Cristal /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 28328 and the start codon is TTG. /note=Coding Potential: There is a good amount of coding potential that covers a good span given by Host-Trained GeneMark and Self-Trained GeneMark. There is no alternate coding potential found and the coding potential is only on the forward strand hence suggesting that this is a forward gene. There is coding potential on the reverse strand however the synteny in pham maps shows that there is no addition of any genes. /note=SD (Final) Score: -1.993. This is the best score given on PECAAN and it has a Z-score of 3.291. /note=Gap/overlap: This gene has a length of 1866 and contains a gap of 165 bp which is the smallest gap in the provided candidates. This gap is conserved in phages within the same pham (Adolin and Nitro). /note=Phamerator: 117330. 10/09/2023. This pham has 1621 other members and is conserved in phages Adumb2043 and Amyev. 
/note=Starterator: Start site number 227 in Starterator was called the most often in published annotations in 820 of the 1507 non-draft genes but this gene does not have the most annotated start. This gene's start site is at 225 and has 46 other manually annotated phages with the same start site. Start site 225 can be found at 28328 in MissSwiss. The evidence that has been collected and analyzed agrees with the predicted start sites given by both Glimmer and GeneMark. /note=Location call: Based on the guidelines and information evaluated, the gene is real and does not need to be deleted. Furthermore, the gene contains coding potential and the start site is 28328. /note=Function call: DNA Polymerase I. The information on Phagesdb states that the majority of the members within this pham have the function of DNA Polymerase I. CDD was run and gave 2 hits for DNA polymerase I with e-values of 0. Hit COG0749 had a coverage of 95% and hit PRK05755 had a coverage of 95% as well. /note=HHpred only gave multiple significant hits with an e-value of 0 and 96% coverage. It had the function of a phage tail assembly chaperone protein. From the Phagesdb BLAST, phages Nitro (e-value of 0) and Adumb2043 (e-value of 0) which are part of the same pham and cluster as MissSwiss called the function as a DNA polymerase I. /note=The NCBI BLAST hits obtained had 100% coverage, 94% identity and an e-value of 0 with the call made for the DNA polymerase I in phages Adumb2043 and Asa16. /note=Transmembrane domains: TMHMM did not predict any TMDs. Since it did not detect any, this cannot be a membrane protein. /note=Secondary Annotator Name: Peri, Jayasree /note=Secondary Annotator QC: I agree with this location and function call. 
CDS 30190 - 30375 /gene="42" /product="gp42" /function="hypothetical protein" /locus tag="MissSwiss_42" /note=Original Glimmer call @bp 30220 has strength 2.9; Genemark calls start at 30220 /note=SSC: 30190-30375 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein PQD88_gp41 [Arthrobacter phage Amyev] ],,NCBI, q1:s1 98.3607% 1.33641E-31 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.916, -4.863117164298828, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD88_gp41 [Arthrobacter phage Amyev] ],,YP_010677744,95.082,1.33641E-31 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Santos, Elisha Anne /note=Auto-annotation: Glimmer and GeneMark both call the start at 30220. However, this start site has the most negative final score, a z-score< 2, does not have the longest ORF, and most importantly, does not have any manual annotations in Starterator. Thus, the next best starting site is at start 30190. /note=Coding Potential: The gene has reasonable coding potential in the ORF. The start site covers all of the coding potential and there is only coding potential in the forward direction. /note=SD (Final) Score: -4.863. This is the best final score on PECAAN. This does not have the best z-score; however, it is very close to 2. Yet, since it is an orpham, this data is less informative. /note=Gap/overlap: The gap is -4 bps. This does create the longest ORF and suggests this gene is part of an operon. There is no coding potential in the gap to indicate the presence of another gene. Since it is an orpham, the final score and z-score become less relevant. /note=Phamerator: Pham 85757. Date 10/09/2023. It is conserved; found in DrManhattan and Adolin. Overall, there are 51 members in the same pham and same cluster (AZ). These phages do not have a known function in the phams database. /note=Starterator: The original start site listed was start site 8, but because there are no manual annotations, this start site is less likely. 
Start site 4 which corresponds to the changed start site of 30190 in Starterator has 23 manual annotations. Since this start calls this gene an orpham, the starterator data is NA. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 30190. /note=Function call: No Known Function. Based on hits from NCBI pBlast, the top 3 hits had a high coverage of >70%, high identity of low e-values of 60% identity, e-values <5.7e-103) for DNA binding protein. CDD has hits ( >68% coverage, e-values < 5.3e-7) and HHPred ( >96.7% coverage, 100% probability e-values< 1.5e-24) for an RNA polymerase sigma factor but the SEA-PHAGES Approved Functions List says to call it a DNA binding protein. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, not a membrane protein. /note=Secondary Annotator Name: Shera, Simer /note=Secondary Annotator QC: Since Starterator does not call for the suggested start site, make sure to change the selection under starterator from SS to NI (Not Informative). CDS 32050 - 32349 /gene="46" /product="gp46" /function="hypothetical protein" /locus tag="MissSwiss_46" /note=Original Glimmer call @bp 32050 has strength 18.09; Genemark calls start at 32050 /note=SSC: 32050-32349 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE12_gp43 [Arthrobacter phage Adumb2043] ],,NCBI, q1:s1 100.0% 1.08878E-55 GAP: 74 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.558, -3.8026703187234707, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE12_gp43 [Arthrobacter phage Adumb2043] ],,YP_010677953,93.9394,1.08878E-55 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Esherick, Sophie /note=Auto-annotation: Glimmer and GeneMark. Both call start at 32050. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. 
The complementary ORF does not have any coding potential peaks or atypical coding potential. /note=SD (Final) Score: -3.803. It is the best final score on PECAAN. The z-score is 2.558, which is the best z-score on PECAAN and it is greater than 2. /note=Gap/overlap: 74 bp gap. This is a bit large; however, it is conserved among other AZ1 phages. DrManhattan had a 75 bp gap and Adolin had a 74 bp gap. /note=Phamerator: 10/6/2023 pham: 4404. It is conserved in Adolin, DrManhattan and VResidence (all AZ1 phages). /note=Starterator: 10/6/2023. Start site in Starterator (start 11) was found in 24 of 24 (100%) of genes in pham. This start has the most manual annotations, and when this start site is present, it is called 100.0% of the time. It has been manually annotated 16 of 16. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 32050. /note=Function call: NKF. Various hits across phagesdb BLAST and NCBI BLAST with good e-values (1e-43, 1.08878e-55) are of unknown function or hypothetical protein. HHpred had no significant hits. CDD had no significant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. DeepTMHMM also did not predict any TMDs. /note=Secondary Annotator Name: Santaolaya, Cristal /note=Secondary Annotator QC: I have QC'ed this annotation and agree with the location and function calls made by the primary annotator. 
CDS 32479 - 33075 /gene="47" /product="gp47" /function="SprT-like protease" /locus tag="MissSwiss_47" /note=Original Glimmer call @bp 32479 has strength 14.39; Genemark calls start at 32479 /note=SSC: 32479-33075 CP: yes SCS: both ST: SS BLAST-Start: [SprT-like protease [Arthrobacter phage Lizalica] ],,NCBI, q1:s1 100.0% 9.79427E-143 GAP: 129 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.947, -2.707694805492614, yes F: SprT-like protease SIF-BLAST: ,,[SprT-like protease [Arthrobacter phage Lizalica] ],,YP_010677611,100.0,9.79427E-143 SIF-HHPRED: SprT-like domain-containing protein Spartan; DPC repair protease, DNA BINDING PROTEIN; HET: FLC, MLZ, ADP; 1.5A {Homo sapiens},,,6MDW_A,51.5152,99.6 SIF-Syn: SprT-like protease, upstream & downstream genes are NKF, as seen in Adolin /note=Primary Annotator Name: Peri, Jayasree /note= /note=Auto-annotation: Glimmer and GeneMark both call the start site at 32479. ATG is the start codon that is called. /note= /note=Coding Potential: Coding potential in this ORF looks to be on the forward strand only, confirming that this is a forward gene; both Self and Host GeneMark indicate that there is coding potential almost all throughout the ORF. /note= /note=SD (Final) Score: The SD (Final) Score is -2.708, which is the best final score on PECAAN (the least negative). /note= /note=Gap/overlap: There is a gap of 129bp, which is above the 50bp gap threshold. This indicates that there is excessive space in the genome around this gene that raises concern. /note= /note=Phamerator: The pham number as of October 11, 2023 is 1210. The gene is conserved in phages Adolin (in the same Cluster AZ) and Adumb2043 (also in the same Cluster AZ). /note= /note=Starterator: Start site 46 was the most annotated start site, with 43 out of 82 non-draft phages being called as having this start site. This start site 46 at position 32479 was also called as the most annotated start site for my gene. 
The start position based on Starterator (32479) agrees with my Glimmer and GeneMark. /note= /note=Location call: Based on the evidence, this is a real gene, and the most likely start site is 32479; I'm calling this despite the excessive gap. /note= /note=Function call: SprT-like protease. All non-draft PhagesDB Blast hits indicate a function of SprT-like protease for this gene (e-value < 10^-40). Most NCBI Blast hits indicate a function of SprT-like protease (≥90% coverage, most are 65%+ identity, e-values < 10^-46). One notable hit in HHPred predicted a function of SprT-like domain-containing protein (99.6% probability and e-value of 7.1 x 10^-15). The only hit in CDD was not notable because it said unknown function. /note= /note=Transmembrane domains: DeepTMHMM doesn’t predict the presence of any transmembrane domains. /note= /note=Secondary Annotator Name: Logan White /note= /note=Secondary Annotator QC: I agree with location and function call. CDD shows an OK hit for SprT-like protease (e-value: 0.0016) - could possibly be checked as evidence. I think HHPred might have updated, there is also another good hit in HHPred (e-value: 3.1e-10). CDS 33199 - 33531 /gene="48" /product="gp48" /function="hypothetical protein" /locus tag="MissSwiss_48" /note=Original Glimmer call @bp 33199 has strength 19.15; Genemark calls start at 33199 /note=SSC: 33199-33531 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE16_gp47 [Arthrobacter phage Reedo] ],,NCBI, q3:s2 96.3636% 1.63179E-41 GAP: 123 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.947, -2.707694805492614, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE16_gp47 [Arthrobacter phage Reedo] ],,YP_010678230,80.0,1.63179E-41 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shera, Simer /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 33199. /note=Coding Potential: ORF only shows coding potential in the forward strand, implying this is a forward gene. 
Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: Best score found on PECAAN: -2.708 /note=Gap/overlap: 123 base pair gap; this is a reasonable gap. The gap is conserved in phage Adolin, which is also part of the AZ1 cluster. /note=Phamerator: 115459. Date 10/10/2023. It is conserved; 3 of the 9 non-draft members of the pham are from cluster AZ. /note=Starterator: Start Site 8 is the most called for manually (7/9 non-draft genes). MissSwiss calls for Start Site 8 which is 33199. This evidence agrees with the site predicted by Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and the likely start site is 33199. /note=Function call: Function unknown. The top non-draft gene BLAST hit has an unknown function (E-value=10^-38) and 2 of the top NCBI BLAST hits have a hypothetical protein function with E-value<10^-40, 96% coverage and >72% identity. HHPred has no relevant hits because all E-values>8.1. /note=Transmembrane domains: No evidence of a transmembrane protein. /note=Secondary Annotator Name: White, Logan /note=Secondary Annotator QC: I agree with the location and function call. CDS 33605 - 33748 /gene="49" /product="gp49" /function="hypothetical protein" /locus tag="MissSwiss_49" /note=Original Glimmer call @bp 33605 has strength 12.77; Genemark calls start at 33605 /note=SSC: 33605-33748 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU48_gp48 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 3.97441E-15 GAP: 73 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.801, -5.023113690490372, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp48 [Arthrobacter phage DrManhattan] ],,YP_009815391,82.9787,3.97441E-15 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qin, Kaley /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 33605 with a start codon of ATG. 
/note=Coding Potential: The host-trained GeneMark predicts reasonable coding potential within the ORF and includes all the coding potential within the start site. The self-trained GeneMark predicts reasonable coding potential; however, the called start-site leaves out a small bit of coding potential. /note=SD (Final) Score: -5.023. This is the best final score in PECAAN. /note=Gap/overlap: The gap is 73. Although this is a large gap, it is the smallest gap of the start sites on PECAAN and retains the longest open reading frame. This gap is also conserved in Adolin and DrManhattan. /note=Phamerator: pham: 101869. Date 10/07/2023. All three members of this pham are in cluster AZ, including Adolin and DrManhattan. /note=Starterator: Start site 1 in Starterator is the most called start site and manually annotated in 2 of 2 non-draft genes in the pham. Start site 1 correlates to start site 33605 in MissSwiss. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 33605. /note=Function call: NKF. On phagesDB pBLAST, the top two hits have significant e-values < 10e-6 (Adolin’s e-value is 1e-12 and DrManhattan’s e-value is 1e-12) for unknown function proteins. NCBI pBLAST confirms these hypothetical protein hits for DrManhattan and Adolin (100% coverage, 82% alignment, and e-value of 3.97e-15). There were no hits from CDD and no informative hits from HHpred (all the e-values were well above a reasonable threshold of 10e-3). So, we can only call the function as NKF. /note=Transmembrane domains: DeepTMHMM predicts no TMDs. So, this is not a transmembrane protein. /note=Secondary Annotator Name: Santos, Elisha Anne /note=Secondary Annotator QC: I have QC’ed this gene and I agree with this annotation for location and function call. 
I do think you can possibly mention that gap is also okay because it is conserved in other AZ phages like Adolin or DrManhattan to make the case stronger. CDS 33782 - 34084 /gene="50" /product="gp50" /function="hypothetical protein" /locus tag="MissSwiss_50" /note=Original Glimmer call @bp 33782 has strength 19.56; Genemark calls start at 33782 /note=SSC: 33782-34084 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQD88_gp50 [Arthrobacter phage Amyev] ],,NCBI, q1:s1 100.0% 3.57833E-60 GAP: 33 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.182, -2.156361006712914, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD88_gp50 [Arthrobacter phage Amyev] ],,YP_010677753,97.0,3.57833E-60 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Santaolaya, Cristal /note=Auto-annotation: Glimmer and GeneMark both call the start at 33782 which has a start codon of ATG. /note=Coding Potential: Within this ORF, there is coding potential on only the forward strand which indicates that this is in fact a forward gene. Notably, the start position does not contain all of the coding potential, there is some coding potential right before the start site on the Host-Trained GeneMark but in the Self, all the coding potential is within the ORF. /note=SD (Final) Score: -2.156. This is the best and only final score on PECAAN and it has a Z-score of 3.182. /note=Gap/overlap: There is only one gene candidate given which has a length of 303 and a gap of 33. This gap is acceptable because it is less than 50 bp and contains no coding potential on the Host-Trained GeneMark however there is coding potential within the gap on GeneMark Self. /note=Phamerator: pham: 106361. 10/09/2023. Within this pham there are 54 members two of them being Adumb2043 and Phives. /note=Starterator: Starterator claims that start number 14 is called the most often in 19 out of 39 non-draft genes. MissSwiss has start number 15 which currently has no annotations. 
It can be found at start 33782 in MissSwiss. /note=Location call: With the consideration of the collected evidence and guidelines, this is a real gene that contains coding potential and the start site is located at 33782. This gene does not need to be deleted. /note=Function call: NKF. On Phagesdb BLAST, there was a significant hit of 4e-49 for Amyev and most of the other phages listed are drafts. It called the function as unknown. NCBI pBLAST provided a hypothetical protein hit for Amyev with 100% coverage and alignment of 97%. Its e-value was 3.5e-60. CDD gave no hits and HHpred was not informative because of high e-values. Taking all this into account, this gene function was called as NKF. /note=Transmembrane domains: No TMDs were predicted in DeepTMHMM so it is not a membrane protein. /note=Secondary Annotator Name: Esherick, Sophie /note=Secondary Annotator QC: I have QC’ed this gene and agree with the function and location call. CDS 34207 - 34434 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="MissSwiss_51" /note=Original Glimmer call @bp 34207 has strength 15.56; Genemark calls start at 34207 /note=SSC: 34207-34434 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ADOLIN_52 [Arthrobacter phage Adolin]],,NCBI, q1:s1 100.0% 3.78064E-47 GAP: 122 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.558, -4.279791573443133, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ADOLIN_52 [Arthrobacter phage Adolin]],,QHB36634,100.0,3.78064E-47 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Santos, Elisha Anne /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree on start site 34207. Although this is not the longest ORF out of the possible options, the z-score and final score are more favorable than those of the other options. /note=Coding Potential: The ORF has reasonable coding potential. 
The start site covers all of the coding potential and there is only coding potential in the forward direction. /note=SD (Final) Score: -4.280. This is the best final score and has the highest z-score at 2.558. /note=Gap/overlap: The gap is 122bps, which is a larger gap. However, this gene and gap are conserved in several other phages, such as DrManhattan and Adolin. /note=Phamerator: Pham 101947. Date 10/09/2023. It is conserved; found in DrManhattan and Adolin, which are both in the same cluster as MissSwiss. There is no known function call of this gene in the conserved phages. /note=Starterator: There are only 2 non-draft members in this pham. Start site 1 in Starterator was manually annotated in 2/2 non-draft genes in this pham. Start 1 is 34207 in MissSwiss. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this is a real gene and the most likely start site is 34207. /note=Function call: No Known Function. The 2 phagesdb BLAST hits have unknown functions (e-values < 1e-37). Similarly, the 2 NCBI Blastp hits were hypothetical proteins (100% coverage, 97% identity, and e-values of <2e-46). There were no relevant hits in HHpred. CDD had no hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Shera, Simer /note=Secondary Annotator QC: Based on the PECAAN notes, I agree with the start site and function call. 
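The length and gap figures quoted in these notes can be reproduced from the CDS coordinates alone. The following is a minimal Python sketch, not part of PECAAN or any SEA-PHAGES tool; the function names gene_length and gap_bp are hypothetical, and it assumes 1-based inclusive coordinates on the forward strand, which is consistent with the values reported for genes 50 and 51 above.

```python
def gene_length(start: int, stop: int) -> int:
    """Length in bp of a forward gene given 1-based inclusive coordinates."""
    return stop - start + 1


def gap_bp(prev_stop: int, start: int) -> int:
    """Bases strictly between the upstream stop and this start.

    A negative result means the two genes overlap by that many bases
    (assumed convention, matching the 'GAP: -4 bp gap' style in the notes).
    """
    return start - prev_stop - 1


if __name__ == "__main__":
    # Gene 50 (33782-34084) and the gap before gene 51 (start 34207).
    print(gene_length(33782, 34084))  # 303
    print(gap_bp(34084, 34207))       # 122
```

Under the same assumption, a start at 36480 following a stop at 36483 gives -4, that is, a 4 bp overlap.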
CDS 34520 - 35938 /gene="52" /product="gp52" /function="serine integrase" /locus tag="MissSwiss_52" /note=Original Glimmer call @bp 34517 has strength 9.61; Genemark calls start at 34517 /note=SSC: 34520-35938 CP: yes SCS: both-cs ST: SS BLAST-Start: [serine integrase [Arthrobacter phage Adolin]],,NCBI, q1:s1 100.0% 0.0 GAP: 85 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.561, -3.525474288708562, no F: serine integrase SIF-BLAST: ,,[serine integrase [Arthrobacter phage Adolin]],,QHB36635,94.7034,0.0 SIF-HHPRED: INTEGRASE; HYDROLASE, SERINE RECOMBINASE, UNIDIRECTIONAL, SITE-SPECIFIC RECOMBINATION; 2.15A {STREPTOMYCES PHAGE PHIC31},,,4BQQ_A,63.9831,100.0 SIF-Syn: The gene is a serine integrase, with the upstream gene having no known function and the downstream gene having no known function as well, just like in the phage Adolin. This section of the genome is conserved in Adolin. /note=Primary Annotator Name: Yang, Emma /note=Auto-annotation: Both Glimmer and GeneMark call the gene and note that 34,517 is the start site. /note=Coding Potential: Coding potential is seen in both Host and Self-Trained GeneMark in the forward frame ORF2, suggesting that it is a forward gene. Even though there is some coding potential in the reverse frame, there are no start or stop marks that indicate the presence of a reverse gene. Furthermore, the start site covers all the coding potential as seen by the Self and Host GeneMark maps. /note=SD (Final) Score: The final score for 34,517 is -4.496, but there is a better final score of -3.525 for the start site 34,520, which is the best option compared to other start sites because it is the least negative. The z-value is 2.561 for both of these start sites, which is a good z-value because it’s greater than 2 and also the highest z-value. The start codon is ATG for 34,517 and GTG for 34,520, which are both among the more common start codons. 
/note=Gap/overlap: The gap is 82bp for start site 34,517 and 85bp for start site 34,520, and while these aren’t the smallest gaps, they make the most sense when compared to the other indicators like final score and z-value. The gene lengths for start site 34,517 (1422bp) and start site 34,520 (1419bp) are both acceptable because they exceed 120bp. /note=Phamerator: This gene is part of pham 117373 according to PhagesDB as of October 9, 2023. It is conserved in phages Adolin and DrManhattan of the same cluster as MissSwiss as well as 34 other phages in Cluster AZ (409 total members in the pham). The function call for the gene in other phages is a serine integrase and it’s consistent on PECAAN as well. /note=Starterator: Out of the 384 non-draft members in this Pham, 123 of them call the start site at start site 72, which doesn’t correlate to the auto-annotated start site 63 at 34,517 with 15 manual annotations for MissSwiss by GeneMark and Glimmer, so Starterator is not relevant for the location call. However, 34,520 is the most likely start site because it has the most favorable RBS final score and is the second start site of two tandem start sites. /note=Location call: In this case, these two start sites are tandem start sites, and according to SEA-PHAGES, when there are tandem start sites, the second one is the right one to use. Therefore, the evidence all points to the presence of a real gene with a start site at nucleotide 34,520. /note=Function call: Multiple PhagesDB BLAST hits had the suggested function of a serine integrase; these results had small e-values of 0.0 and 0.0 and alignment of 91% and 90% for Adolin and Kaylissa, respectively; the NCBI BLAST results had the small e-values of 0.0 and 0.0 and alignment of 91.74% and 91.53% for Adolin and DrManhattan, respectively. 
HHpred had hits that aligned with 4BQQ_A, a serine recombinase integrase, with 100% probability, 63.98% coverage, and an e-value of 2.6 x 10^-35, and with 5UDO_G, the A118 serine integrase, with 100% probability, 67.80% coverage, and an e-value of 3.8 x 10^-32. There were three hits on CDD, and two of them had relevant e-values: 4.06 x 10^-33 for a resolvase in the serine recombinase superfamily and 3.29 x 10^-12 for a recombinase associated with an integrase. /note=Transmembrane domains: There are no TMDs found by DeepTMHMM, so this gene is not a membrane protein. /note=Secondary Annotator Name: White, Logan /note=Secondary Annotator QC: I think some CDD boxes could be checked as evidence as they have good e-values for that function; I believe an integrase is a type of recombinase. I agree with the location and function call. CDS 36235 - 36483 /gene="53" /product="gp53" /function="hypothetical protein" /locus tag="MissSwiss_53" /note=Original Glimmer call @bp 36235 has strength 10.52; Genemark calls start at 36235 /note=SSC: 36235-36483 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE16_gp50 [Arthrobacter phage Reedo] ],,NCBI, q1:s1 100.0% 2.45954E-21 GAP: 296 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.137, -4.387988835334809, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE16_gp50 [Arthrobacter phage Reedo] ],,YP_010678233,77.2152,2.45954E-21 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Givan, Susanna /note=Auto-annotation: Both Glimmer and Genemark recognize this gene and they agree on a start site of 36235. The auto-annotated start site has a start codon of “ATG,” which is a common start codon and is the start codon with a high probability of being called. Since both Glimmer and Genemark call the start site of 36235, they both agree on the start codon being “ATG”. 
This indicates “ATG” is likely the correct start codon and lends strong credibility to the start site being 36235 due to ATG being the start codon 60% of the time. /note=Coding Potential: The coding potential for this ORF is forward, which indicates this is a forward gene. The host-trained Genemark shows strong coding potential present and this gene has a coding potential called by both Genemark and Glimmer. The ORF which covers all of the potential is on the last open reading frame and the start site of 36235 covers all of the coding potential that the gene of interest is in. In fact, there is an up-hash directly near where the coding potential began and a downhash directly after the stop site. /note=SD (Final) Score: The final score for this gene is a -4.388 and the Z-score is a 2.137. /note=Both the Z-score and the final score are the highest respective scores listed in PECAAN. It should be noted here however that the start site of 36235 is the only available start site to choose for this gene. /note=Gap/overlap: There is an overlap between this gene and the downstream gene of 3 bps. Furthermore, there is a gap of 188 bps between the end of the upstream gene and this gene’s of 297 bps. Although this gap is pretty large, this section of the genome with a large gap between the upstream gene and an overlap of 3 bps with the downstream gene is conserved in other members of the AZ1 subcluster; specifically, it displays perfect synteny with Adolin, Adumb2043, and Ascela. This indicates that the overall gene structure is present among other members of the subcluster. Additionally, the start site of 30978 would result in a gene length of 311. This is a strong gene length. /note=Phamerator: Gene was found in Pham 117708 as of 10/11/2023. This Pham consists of 52 members, 16 of which are drafts. There appears to be no consistent function typically called among this gene’s pham. 
/note=Starterator: Start site #9 was called via manual annotation in 34 out of 36 total non-draft phage annotations. Start site #9 in Starterator corresponds to the start site 36483 for this gene and when it was present, it was called 98.1% of the time. Overall, since this start site was the most conserved in the pham, it supports the evidence that this is the best possible start site for this gene. /note=Location call: Given all the evidence, this is a real gene and the start site begins at 36483. This is consistent with the fact that this start site is the most annotated site in the Starterator and that Glimmer/Genemark both agree on the auto-annotated start site of 36483. In Genemark, 36483 also covers all of the coding potential and it has the longest ORF of all the possible start sites. Furthermore, the start site of 36483 has the highest respective Z-score and final score. /note=Function call: In the phagesdb run, the top two hits were an L2 subcluster phage “WiggleWiggle” and an L2 subcluster phage “LilDestine.” They both had no known function listed at e-values of 2e-08 and 7e-07 respectively. This makes the phagesdb results compelling evidence that this gene has no known function. Having no known function is also consistent with the NCBI Blast top two hits that both had identities listed as “hypothetical protein.” The first hypothetical protein, YP_010678233, had a high identity of 69.6, and an e-value of 2.459e^-21. The second hypothetical protein, YP_010677755, was less compelling evidence due to its low identity % of 59.7; however, its e-value was still strong at 3.24e^-16. HHpred only brought back one hit of an “uncharacterized hypothetical protein,” 4FM3_D. 4FM3_D had a % identity of 72.6, and a poor e-value of 34. This meant HHpred results could not be used as compelling evidence. 
However, there is still enough evidence for a function call of no known function due to the high % identity, low e-values and conservation of the unknown function/synteny of gene structure through the other members of the same subcluster. CDD did not bring back any further hits. /note=Transmembrane domains: DeepTMHMM didn’t have any TMDs listed; thus, it is not a membrane protein. /note=Secondary Annotator Name: White, Logan /note=Secondary Annotator QC: The gap info needs to be updated; there is a 296 bp gap. Under location call, the start site is 36235 (I think you put the stop site by mistake). The info for Phagesdb Blast needs to be updated, and for the evidence boxes I would uncheck the MissSwiss draft and check the ones under it (Reedo + Nitro). Also I would uncheck the HHPred evidence box because the e-value is very high. CDS 36480 - 36698 /gene="54" /product="gp54" /function="RNA binding protein" /locus tag="MissSwiss_54" /note=Original Glimmer call @bp 36480 has strength 16.25; Genemark calls start at 36480 /note=SSC: 36480-36698 CP: yes SCS: both ST: SS BLAST-Start: [RNA binding protein [Arthrobacter phage Cassia]],,NCBI, q1:s4 100.0% 1.20194E-24 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.28, -2.0949393225970705, yes F: RNA binding protein SIF-BLAST: ,,[RNA binding protein [Arthrobacter phage Cassia]],,WGH21123,77.6316,1.20194E-24 SIF-HHPRED: SIF-Syn: MissSwiss has synteny with Adolin, DrManhattan and various other phages. In these phages, the gene upstream of RNA binding proteins is serine integrase and downstream is NKF followed by an endolysin. /note=Primary Annotator Name: White, Logan /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 36480. /note=Coding Potential: There is coding potential in both GeneMark self and host. The chosen start site does cover all of the coding potential. /note=SD (Final) Score: The best final score is -2.095 with the best z-score of 3.28. 
/note=Gap/overlap: 4bp overlap, which suggests that this gene is part of an operon. /note=Phamerator: Pham 85659. Date: 10/09/23. This pham is conserved among several members of the cluster including DrSierra. /note=Starterator: Start site 6 is manually annotated in 18/38 non-draft genomes. Start site 6 in MissSwiss is 36480, which matches with Glimmer and GeneMark. /note=Location call: This gene is real with a start site at 36480. This is the LORF with a length of 219 bp and is conserved with several members of the pham. /note=Function call: RNA binding protein. There are multiple hits in PhagesDB BLAST with e-values < 5e-26 and on NCBI BLAST (91.7% coverage, 75.7% identity, and e-values < 4.41e-31) for RNA binding protein. CDD has no hits, and HHPred has some hits with somewhat high e-values of 0.57 for RNA binding protein and 0.014-0.015 for an unknown function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, so this is not a membrane protein. /note=Secondary Annotator Name: Peri, Jayasree /note=Secondary Annotator QC: I agree with this location and function call. CDS 36695 - 36886 /gene="55" /product="gp55" /function="RNA binding protein" /locus tag="MissSwiss_55" /note=Original Glimmer call @bp 36695 has strength 10.4; Genemark calls start at 36695 /note=SSC: 36695-36886 CP: yes SCS: both ST: SS BLAST-Start: [RNA binding protein [Arthrobacter phage London] ],,NCBI, q1:s1 100.0% 7.94537E-25 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.453, -3.8116689268127657, yes F: RNA binding protein SIF-BLAST: ,,[RNA binding protein [Arthrobacter phage London] ],,QOP64354,85.7143,7.94537E-25 SIF-HHPRED: Asl2047 protein; HFQ, SM, RNA-BINDING PROTEIN, SRNA, TRANSLATIONAL REGULATION, RNA BINDING PROTEIN; 2.31A {Nostoc sp.},,,3HFN_A,93.6508,97.4 SIF-Syn: AZ1 Phage Adolin shows synteny with MissSwiss with an upstream gene of RNA binding protein and a downstream endolysin gene. 
/note=Primary Annotator Name: Esherick, Sophie /note=Auto-annotation: Glimmer and GeneMark. Both call start at 36695. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The auto-annotated start site includes all coding potential. The complementary ORF does not have any coding potential peaks or atypical coding potential. /note=SD (Final) Score: -3.812. It is the best final score on PECAAN. /note=Gap/overlap: -4 overlap. This is common within an operon. This small overlap is also conserved in other AZ1 phages like Adolin and DrManhattan. /note=Phamerator: 10/6/2023. pham: 115273. It is conserved in Adolin and DrManhattan (AZ1 phages). /note=Starterator: 10/6/2023. Start site in Starterator (start 4) was found in 23 of 24 ( 95.8% ) of genes in pham. This start has the most manual annotations, and when this start site is present, it is called 100.0% of the time. It has been manually annotated in 15 of 16. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 36695. /note=Function call: RNA binding protein. Significant hits were found in Phagesdb (e-values of 1e-17 and 4e-23) for RNA binding protein. NCBI Blast had significant hits for RNA binding protein (e-value of 7.94537e-25 with 100% coverage and 77.7% identity). HHpred had hits for RNA binding protein, with moderately low e-values (0.0052 with 97.4% probability and 93.6% coverage). CDD had relevant hits to conserved areas of an RNA binding protein with a moderately low e-value (0.00944974). /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. DeepTMHMM also did not predict any TMDs. /note=Secondary Annotator Name: Qin, Kaley /note=Secondary Annotator QC: I have QC’d this gene and agree with the primary annotator. 
CDS 36962 - 37186 /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="MissSwiss_56" /note=Original Glimmer call @bp 36962 has strength 16.4; Genemark calls start at 36962 /note=SSC: 36962-37186 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TUCK_57 [Arthrobacter phage Tuck]],,NCBI, q8:s6 86.4865% 6.97642E-29 GAP: 75 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.127, -4.408951082954879, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TUCK_57 [Arthrobacter phage Tuck]],,WAB10831,42.8571,6.97642E-29 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Peri, Jayasree /note= /note=Auto-annotation: Glimmer and GeneMark both call the start site at 36962. ATG is the start codon that is called. /note= /note=Coding Potential: Coding potential in this ORF looks to be on the forward strand only, confirming that this is a forward gene; both Self and Host GeneMark indicate that there is coding potential throughout the ORF. /note= /note=SD (Final) Score: The SD (Final) Score is -4.409, which is the best and only final score on PECAAN (the least negative). /note= /note=Gap/overlap: There is a gap of 75bp, which is above the 50bp gap threshold. This indicates that there is excessive space in the genome around this gene that raises concern. /note= /note=Phamerator: The pham number as of October 11, 2023 is 89039. The gene is conserved in phages Adolin (in the same Cluster AZ) and DrManhattan (also in the same Cluster AZ). /note= /note=Starterator: Start site 1 was the most annotated start site, with 3 out of 3 non-draft phages being called as having this start site. This start site 1 at position 36962 was also called as the most annotated start site for my gene. The start position based on Starterator (36962) agrees with my Glimmer and GeneMark. /note= /note=Location call: Based on the evidence, this is a real gene, and the most likely start site is 36962; I’m calling this despite the excessive gap. 
/note= /note=Function call: NKF. All non-draft PhagesDB Blast hits indicate unknown function for this gene (most have e-value < 10^-6). All NCBI Blast hits indicate hypothetical protein (≥80% coverage, most are +65% identity, and most e-values < 10^-6). There were no notable hits in both HHPred and CDD. /note= /note=Transmembrane domains: DeepTMHMM doesn’t predict the presence of any transmembrane domains. /note= /note=Secondary Annotator Name: Givan, Susanna /note= /note=Secondary Annotator QC: CDS 37199 - 37351 /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="MissSwiss_57" /note=Original Glimmer call @bp 37199 has strength 19.86; Genemark calls start at 37199 /note=SSC: 37199-37351 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU48_gp58 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 3.804E-24 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.291, -2.0720764396375664, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp58 [Arthrobacter phage DrManhattan] ],,YP_009815401,96.0784,3.804E-24 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shera, Simer /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 37199. /note=Coding Potential: ORF only shows coding potential in the forward strand, implying this is a forward gene. Coding potential is found both in GeneMark Self and Host across the entire gene. /note=SD (Final) Score: Best score found on PECAAN: -2.072 with a z-score of 3.291 (good because it is greater than 2). /note=Gap/overlap: 12 base pair gap, this is a reasonable gap. The gap is conserved in phage Adolin, which is also part of the AZ1 cluster. /note=Phamerator: 118919. Date 10/14/2023. /note=117431. Date 10/10/2023. It is conserved; 3 non-draft members of the Pham are from cluster AZ. /note=Starterator: As of 10/14/2023: No Starterator report found /note=10/10/2023: Start Site 39 is the most called for manually (114/214 non-draft genes). 
MissSwiss calls for Start Site 8 which is called for in 8/214 non-draft genes but is present 100% of the time it is called for. Start Site 8 in MissSwiss is 37199. This evidence agrees with the site predicted by Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and the likely start site is 37199. /note=Function call: Function unknown. The top non-draft gene BLAST hits has an unknown function (E-value=10^-20) and top NCBI BLAST hit has a hypothetical protein function with E-value<10^-24, 100% coverage and 94.7% identity. HHPred has no relevant hits because all E-values>110. CDD has no relevant hits. /note=Transmembrane domains: No evidence of a transmembrane protein. /note=Secondary Annotator Name: White, Logan /note=Secondary Annotator QC: In Starterator, start site 40 is called in MissSwiss. Starterator evidence box and functions box should be filled. I agree with location and function call. Great, succinct notes. CDS 37351 - 37551 /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="MissSwiss_58" /note=Original Glimmer call @bp 37351 has strength 12.08; Genemark calls start at 37351 /note=SSC: 37351-37551 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein PQE14_gp58 [Arthrobacter phage Kaylissa] ],,NCBI, q1:s1 80.303% 3.40221E-21 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.314, -5.059653572764363, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE14_gp58 [Arthrobacter phage Kaylissa] ],,YP_010678104,69.5652,3.40221E-21 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qin, Kaley /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 37351 with a start codon of ATG. /note=Coding Potential: Both host-trained and self-trained GeneMark agree that there is reasonable coding potential within the ORF; however, the called start-sites leave out a small bit of coding potential. /note=SD (Final) Score: -5.060. It is the second best score in PECAAN. 
The Z-score is 2.314 which is good. /note=Gap/overlap: The gap is -1 which is indicative that this gene belongs to an operon. So, this start site trumps all other start sites. /note=Phamerator: pham: 3368. Date 10/07/2023. All members of this pham are in cluster AZ, including Cassia and Crewmate. /note=Starterator: Start site 6 in Starterator is the most called start site and manually annotated in 18 of 20 non-draft genes in the pham. When present, it is called 78.8% of the time. Start site 6 correlates to start site 37351 in MissSwiss. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 37351. /note=Function call: NKF. On phagesDB pBLAST, the top two hits have significant e-values < 10e-6 (Adumb2043’s e-value is 6e-18 and Asa16’s e-value is 2e-17) for unknown function proteins. NCBI pBLAST confirms hypothetical protein hits for Kaylissa (80% coverage, 69% alignment, and e-value of 3.4e-21). There were no hits from CDD and no informative hits from HHpred (all the e-values were well above a reasonable threshold of 10e-3). So, we can only call the function as NKF. /note=Transmembrane domains: DeepTMHMM predicts no TMDs. So, this is not a transmembrane protein. /note=Secondary Annotator Name: Givan, Susanna /note=Secondary Annotator QC: I have QC’d this gene and recommend the following changes. Remember to include the final score along with the SD Score under the SD(final) score section. For Phamerator, does the pham have a typical function or one that is fairly conserved throughout? For Starterator, perhaps consider adding what % of the time the start site is called when it is present. Function call and transmembrane domain section are great! 
CDS 37548 - 37763 /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="MissSwiss_59" /note=Original Glimmer call @bp 37548 has strength 11.67; Genemark calls start at 37548 /note=SSC: 37548-37763 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE12_gp55 [Arthrobacter phage Adumb2043] ],,NCBI, q1:s1 98.5916% 2.02504E-31 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.889, -2.9063687850157054, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE12_gp55 [Arthrobacter phage Adumb2043] ],,YP_010677965,78.481,2.02504E-31 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Santaolaya, Cristal /note=Auto-annotation: Both Glimmer and GeneMark call the start at 37548 which has a start codon of ATG. /note=Coding Potential: All the coding potential is contained within the ORF on the forward strand which indicates that this is in fact a forward gene. This coding potential was present in both GeneMark Self and Host. /note=SD (Final) Score: -2.906. This is the best final score given on PECAAN. Additionally the Z-score is 2.889. The only other final score given is -5.882. /note=Gap/overlap: -4 overlap. This overlap is commonly seen within operons. Additionally this overlap is conserved in other AZ1 phages such as Adolin. /note=Phamerator: pham: 4773. Date 10/09/2023. In this pham only the AZ1 cluster is represented and has 17 members. Two of the members are Adolin and DrManhattan. /note=Starterator: According to Starterator, start site 3 was called the most often and was manually annotated in 11 of the 11 non-draft genes within the pham. This gene was called in MissSwiss and it can be found at start 37548. Overall, the evidence supports the calls made by both GeneMark and Glimmer. /note=Location call: After considering the guidelines and the information above, this is a real gene with a start site at 37548. /note=Function call: NKF. On Phagesdb BLAST, there was a significant hit of 2e-25 for Adumb2043. 
It called the function as unknown. NCBI pBLAST provided a hypothetical protein hit for Adumb2043 with 98.5916% coverage and alignment of 78.481%. Its e-value was 2.02504e-31. CDD gave no hits and HHpred was not informative since the e-values were far too high, the lowest being 12. Taking all this into account, this gene function was called as NKF. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs so ultimately it is not a membrane protein. /note=Secondary Annotator Name: Peri, Jayasree /note=Secondary Annotator QC: I agree with this function and location call. CDS 37830 - 39293 /gene="60" /product="gp60" /function="endolysin" /locus tag="MissSwiss_60" /note=Original Glimmer call @bp 37830 has strength 12.16; Genemark calls start at 37926 /note=SSC: 37830-39293 CP: yes SCS: both-gl ST: SS BLAST-Start: [endolysin [Arthrobacter phage Adolin]],,NCBI, q1:s1 100.0% 0.0 GAP: 66 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.28, -2.033982896655645, yes F: endolysin SIF-BLAST: ,,[endolysin [Arthrobacter phage Adolin]],,QHB36642,90.1841,0.0 SIF-HHPRED: Endolysin; amidase-2 domain, HYDROLASE; HET: ZN; 2.27A {Staphylococcus phage GH15},,,4OLS_B,40.2464,99.2 SIF-Syn: The gene is an endolysin, with the upstream gene as NKF and the downstream gene as NKF, just like in the phage DrManhattan and Adolin /note=Primary Annotator Name: Santos, Elisha Anne /note=Auto-annotation: Glimmer and GeneMark do not call the same start codon. Glimmer calls at 37830 which has a start codon of TTG. GeneMark calls at 37926 which has a start codon of ATG. Although ATG is much more common, Glimmer’s call allows for a smaller and more reasonable gap. GeneMark is not able to call TTG start sites, so this may be a reason why their calls don’t match. Additionally, there is a better z-score and final score with Glimmer’s call of 37830. Neither of these start sites is the LORF, but that start site might not have been called because of the uncommon TTG start site. 
/note=Coding Potential: There is reasonable coding potential in this ORF. This start site does not include all of the coding potential as it excludes a small portion of the beginning coding potential. There is a potential start site that can lessen the gap and include all of the coding potential; however, after further evaluation of Starterator data and conservation in other phages like DrManhattan and Adolin, we see that with start site 37830, a slightly larger gap is conserved. There is coding potential in both forward and reverse strands, but this is reasonable for a gene over 1000bps long, and the potential in the reverse strand does not indicate the presence of an orientation switch or new gene since there were no stop sites present. /note=SD (Final) Score: -2.034. This is not the best final score on PECAAN. However, Glimmer and Starterator data agree on this start site. Additionally, this site has the highest z-score of 3.28. /note=Gap/overlap: 66 bps. Somewhat large gap, but ultimately reasonable because the gap is conserved in other phages like DrManhattan and Adolin. There is also no coding potential in the gap to indicate that a new gene should be added in the gap. /note=Phamerator: Pham 103558. Date 10/09/2023. It is conserved; found in Asa16, DrManhattan, and Adolin which are all in the same cluster AZ. /note=Starterator: Start site 13 in Starterator was manually annotated in 16/42 non-draft genes in this pham. Start site 13 corresponds to 37830 in MissSwiss. Because this start site has 16 manual annotations and the start site 8 (corresponds to 37791 in MissSwiss) does not have any manual annotations, this provides evidence that start site @37830 is most likely correct and agrees with Glimmer. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 37830. /note=Function call: Endolysin. 
There were multiple hits on phagesdb BLAST that had the function of endolysin (e-value=0.0). In the top 5 hits in NCBI Blastp, 3 hits had a function of endolysin (100% coverage, 77%+ identity, and e-values = 0). HHpred had a hit for endolysin as well (99.24% probability, 40% coverage, and an e-value= 4.3e-9). CDD had no relevant hits /note=Transmembrane domains: Neither TMHMM or TOPCONS predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Qin, Kaley /note=Secondary Annotator QC: I have QC’d this gene and agree with the primary annotator. For the coding potential notes, I specify host-trained and self-trained if there is a difference between which start site excludes coding potential. CDS 39406 - 39600 /gene="61" /product="gp61" /function="hypothetical protein" /locus tag="MissSwiss_61" /note=Original Glimmer call @bp 39406 has strength 20.93; Genemark calls start at 39406 /note=SSC: 39406-39600 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_CASSIA_60 [Arthrobacter phage Cassia]],,NCBI, q3:s5 95.3125% 1.26421E-28 GAP: 112 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.182, -2.2186743274732437, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CASSIA_60 [Arthrobacter phage Cassia]],,WGH21133,84.6154,1.26421E-28 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Yang, Emma /note=Auto-annotation: Both Glimmer and GeneMark call the gene and note that 39,406 is the start site. /note=Coding Potential: Coding potential is seen in both Host and Self-Trained GeneMark in the forward frame ORF1, suggesting that it is a forward gene. Furthermore, the start site covers all the coding potential in the Self and Host GeneMark maps. /note=SD (Final) Score: The final score is -2.219, which is the best option compared to other start sites because it is the least negative, and the z-value is 3.182, which is a good z-value because it’s greater than 2 and also the highest z-value. 
Furthermore, the start codon is ATG, which is one of the more common start codons. /note=Gap/overlap: The gap is 112bp, and even though this is a larger gap and there is a start site with a smaller gap, this gap is conserved in phage Asa16. The length of the gene (195bp) is acceptable because it exceeded 120bp, but it isn’t the LORF. /note=Phamerator: This gene is part of pham 86088 according to PhagesDB as of October 9, 2023. It is conserved in phages Adolin and DrManhattan of the same cluster as MissSwiss as well as 21 other phages in Cluster AZ (35 total members in the pham). The function call for the gene in other phages is no known function and it’s consistent on PECAAN as well. /note=Starterator: Out of the 23 non-draft members in this Pham, 15 of them call the start site at start site 8, including MissSwiss which calls the auto-annotated start site 8 at 39,406 with 15 manual annotations. This correlates with the start site for MissSwiss denoted by GeneMark and Glimmer, so Starterator is informative for the location call. Furthermore, 39,406 is the most likely start site because it has the most favorable RBS final score and a high z-value. /note=Location call: This evidence all points to the presence of a real gene with a start site confirmed by both Glimmer and GeneMark at nucleotide 39,406. /note=Function call: Multiple PhagesDB BLAST hits had no known function; these results had small e-values of 2 x 10^-24 and 1 x 10^-22 and alignment of 81% and 78% for Cassia and Nitro, respectively; the NCBI BLAST results had the small e-values of 1 x 10^-28 and 2 x 10^-26 and alignment of 81.97% and 78.69% for Cassia and Nitro, respectively. There were no hits on HHpred that had acceptable e-values because they were all significantly greater than 0. There was one hit on CDD for the DUF3989 super family with a protein of unknown function, but the e-value is only 4.68 x 10^-3. 
/note=Transmembrane domains: There are no TMDs found by DeepTMHMM, so this gene is not a membrane protein. /note=Secondary Annotator Name: Santos, Elisha Anne /note=Secondary Annotator QC: I have QC’ed this gene and I agree with this annotation for location and function call. Great job! CDS 39726 - 40088 /gene="62" /product="gp62" /function="hypothetical protein" /locus tag="MissSwiss_62" /note=Original Glimmer call @bp 39726 has strength 17.76; Genemark calls start at 39726 /note=SSC: 39726-40088 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_ADOLIN_63 [Arthrobacter phage Adolin]],,NCBI, q1:s1 99.1667% 3.02595E-57 GAP: 125 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.947, -2.996490344739583, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ADOLIN_63 [Arthrobacter phage Adolin]],,QHB36645,84.2975,3.02595E-57 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Givan, Susanna /note=Auto-annotation:Both Glimmer and Genemark recognize this gene and they agree on a start site of 39726. The auto-annotated start site has a start codon of “ATG,” which is a common start site and is the start codon with a high probability of being called. Since both Glimmer and Genemark call the start site of 39726, they both agree on the start codon being “ATG”. This indicates “ATG” is likely the correct start codon and leads strong credibility to the start site being 39726 due to ATG being the start codon 60% of the time. /note=Coding Potential:The coding potential for this ORF is forward, which indicates this is a forward gene. The host-trained Genemark shows strong coding potential present and this gene has a coding potential called by both Genemark and Glimmer. The ORF which covers all of the potential is on the 3rd reading frame and the start site of 39726 covers all of the coding potential that the gene of interest is in. In fact, there is an up-hash directly near where the coding potential began and a downhash directly after the stop site. 
/note=SD (Final) Score:The final score for this gene is -2.996 and the Z-score is 2.947. /note=Both the Z-score and the final score are the highest respective scores listed in PECAAN. However, the scores for start site 39726 are not associated with the longest open reading frame. /note=Gap/overlap:There is a gap of 125 bp between the upstream gene, which ends at 39600, and the 39726 start site. This is a rather large gap between genes; however, it is conserved in other AZ subcluster members, specifically in Adolin and Adumb2043. Furthermore, there is a gap of 93 bp between the end of this gene and the 40182 start site of the next gene. This gap is also conserved in Adolin and Adumb2043. This gene has synteny with Adolin and Adumb2043. This indicates that the overall gene structure is present among other members of the subcluster. Additionally, the start site of 39726 would result in a gene length of 363 bp, which is perfectly acceptable. /note=Phamerator: Gene was found in Pham 117716 as of 10/11/2023. This Pham consists of 50 members, 17 of which are drafts. There appears to be no consistent function typically called among this gene’s pham. /note=Starterator: Start site #17 was called via manual annotation in 2 out of 33 total non-draft phage annotations. Start site #17 in Starterator corresponds to the start site 39726 for this gene and when it was present, it was called 83.3% of the time. The most annotated start site was not present in this gene. Overall, this is adequate evidence for the start site being #17, although the low number of start site calls via manual annotation is cause for concern. /note=Location call: Given all the evidence, this is a real gene and the start site is at 39726. This is consistent with the fact that this start site was called 83.3% of the time when present in Starterator and that Glimmer and Genemark both agree on the auto-annotated start site of 39726. 
In Genemark, 39726 also covers all of the coding potential. Furthermore, the start site of 39726 has the highest respective Z-score and final score. /note=Function call:No known function. In the phagesdb BLAST run, AZ subcluster phages “Adolin” and “DrManhattan” had no known function listed at e-values of 7e-65 and 4e-48 respectively. This establishes phagesdb hits as compelling evidence that this gene has no known function. Having no known function is also consistent with the NCBI Blast top two hits, both of which were listed as “hypothetical protein.” The first hypothetical protein, QHB36645, had a high identity of 75% and an e-value of 3.02e-57. The second hypothetical protein, YP_009815407, had an identity of 61% and an e-value of 3.56e-56. HHpred was not informative as the e-values were too high. However, there is still enough evidence for a function call of no known function due to the high % identity, low e-values and conservation of the unknown function/synteny of gene structure through the other members of the same subcluster. CDD was not informative either; it had no data to display. /note=Transmembrane domains:DeepTMHMM and TOPCONS didn’t have any TMDs listed; thus, it is not a membrane protein. /note=Secondary Annotator Name: Shera, Simer /note=Secondary Annotator QC: Based on the PECAAN notes, I agree with the start site and function call. 
CDS 40182 - 40493 /gene="63" /product="gp63" /function="hypothetical protein" /locus tag="MissSwiss_63" /note=Original Glimmer call @bp 40182 has strength 11.93; Genemark calls start at 40182 /note=SSC: 40182-40493 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE12_gp61 [Arthrobacter phage Adumb2043] ],,NCBI, q1:s1 100.0% 1.32934E-56 GAP: 93 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.044, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE12_gp61 [Arthrobacter phage Adumb2043] ],,YP_010677971,89.3204,1.32934E-56 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: White, Logan /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 40182. /note=Coding Potential: There is coding potential in both GeneMark self and host. The chosen start site does cover all of the coding potential. /note=SD (Final) Score: The only final score is -2.505 with a z-score of 3.044. /note=Gap/overlap: There is a 93 bp gap, but it appears to be conserved with phage KeAlii. /note=Phamerator: Pham 117844. Date: 10/09/23. This pham is conserved in one other phage of cluster AZ1, KeAlii, and several others of different clusters. /note=Starterator: Start site 7 is manually annotated in 7 of 26 non-draft genomes. Start site 7 in MissSwiss is at 40182, which agrees with Glimmer and GeneMark. /note=Location call: This gene is real with a start site at 40182. This is the LORF with a length of 312 bp and is conserved with several members of the pham. /note=Function call: Unknown Function. There are multiple hits in PhagesDB BLAST with e-values < 1e-46 and on NCBI BLAST (100% coverage, 83.5% identity, and e-values < 4.36e-55) for unknown function. HHPred and CDD are not informative as there were no hits in CDD and HHPred had high e-values (e-values >1.8). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, not a membrane protein. 
/note=Secondary Annotator Name: Yang, Emma /note=Secondary Annotator QC: After reviewing the PECAAN notes, I agree with both the location and function call for this gene. CDS 40483 - 40842 /gene="64" /product="gp64" /function="hypothetical protein" /locus tag="MissSwiss_64" /note=Original Glimmer call @bp 40483 has strength 16.48; Genemark calls start at 40483 /note=SSC: 40483-40842 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE14_gp65 [Arthrobacter phage Kaylissa] ],,NCBI, q1:s1 100.0% 1.36447E-67 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.044, -2.5052746077145835, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE14_gp65 [Arthrobacter phage Kaylissa] ],,YP_010678111,94.9153,1.36447E-67 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Esherick, Sophie /note=Auto-annotation: Glimmer and GeneMark. Both call start at 40483. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The complementary ORF does not have any coding potential peaks or atypical coding potential. /note=SD (Final) Score: -2.505. It is the best final score on PECAAN. /note=Gap/overlap: -11 bp overlap. This is a bit of a large overlap, however it is conserved among other AZ1 phages Adolin and DrManhattan. /note=Phamerator: 10/6/2023. pham: 116447. It is conserved in Adolin, VResidence and DrManhattan (AZ1 phages). /note=Starterator: 10/6/2023. Start site in Starterator (start 7) was found in 7 of 56 ( 12.5% ) of genes in pham. It does not have the most manual annotations, however when this start site is present, it is called 100.0% of time when present. It has been manually annotated 4 of 40. The start with the most MA’s was not found in MissSwiss. This start site was called in others phages that display synteny (Adolin, DrManhattan). 
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 40483. /note=Function call: NKF. The top two phagesdb BLAST hits have the function of DUF (E-values of 3e-58), and the top 2 NCBI BLAST hits also have unknown function (100% coverage, 87%+ identity, and E-values of 1.36447e-67). HHpred had no significant hits. CDD had no significant hits. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts any TMDs; therefore, it is not a membrane protein. DeepTMHMM also did not predict any TMDs. /note=Secondary Annotator Name: Santos, Elisha Anne /note=Secondary Annotator QC: I have QC’ed this gene and I agree with this annotation for location and function call. CDS 40842 - 41120 /gene="65" /product="gp65" /function="hypothetical protein" /locus tag="MissSwiss_65" /note=Original Glimmer call @bp 40842 has strength 14.35; Genemark calls start at 40842 /note=SSC: 40842-41120 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_JANEEMI_63 [Arthrobacter phage Janeemi]],,NCBI, q1:s1 100.0% 9.66189E-55 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.608, -4.255672789525901, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_JANEEMI_63 [Arthrobacter phage Janeemi]],,UVK63583,98.913,9.66189E-55 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Peri, Jayasree /note= /note=Auto-annotation: Glimmer and GeneMark both call the start site at 40842. ATG is the start codon that is called. /note= /note=Coding Potential: Coding potential in this ORF looks to be on the forward strand only, confirming that this is a forward gene; both Self and Host GeneMark indicate that there is coding potential almost all throughout the ORF. /note= /note=SD (Final) Score: The SD (Final) Score is -4.256, which is the best and only final score on PECAAN (the least negative). /note= /note=Gap/overlap: There is an overlap of 1bp, indicating that the gene might be part of an operon. 
/note= /note=Phamerator: The pham number as of October 11, 2023 is 3945. The gene is conserved in phages Adolin (in the same Cluster AZ) and DrManhattan (also in the same Cluster AZ). /note= /note=Starterator: Start site 13 was the most annotated start site, with 11 out of 19 non-draft phages being called as having this start site. However, start site 14 at position 40842 was called as the most annotated start site for my gene. The start position based on Starterator (40842) agrees with my Glimmer and GeneMark. /note= /note=Location call: Based on the evidence, this is a real gene, and the most likely start site is 40842. /note= /note=Function call: NKF. All non-draft PhagesDB Blast hits that have e-values < 10^-6 indicate unknown function for this gene; there are hits that indicate a function of tape measure protein, but the e-values are > 4. All NCBI Blast hits indicate hypothetical protein (≥90% coverage, most are +50% identity, and most e-values < 10^-6). There were no notable hits in either HHPred or CDD. /note= /note=Transmembrane domains: DeepTMHMM doesn’t predict the presence of any transmembrane domains. /note= /note=Secondary Annotator Name: Givan, Susanna /note= /note=Secondary Annotator QC: ATG is a common start codon to be called and this is good evidence for this start site as well so maybe mention this in the auto-annotation. Remember to mention the final score under SD (final) score section. For Phamerator, is there a conserved function among the pham more broadly? Starterator, Location Call and Function Call are great! Remember to fill out synteny box below. 
CDS 41111 - 41281 /gene="66" /product="gp66" /function="membrane protein" /locus tag="MissSwiss_66" /note=Original Glimmer call @bp 41111 has strength 14.53; Genemark calls start at 41126 /note=SSC: 41111-41281 CP: yes SCS: both-gl ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Adolin]],,NCBI, q1:s1 100.0% 6.23301E-25 GAP: -10 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.291, -2.2821867859826788, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Adolin]],,QHB36649,89.2857,6.23301E-25 SIF-HHPRED: Heme exporter protein D; ATP-binding exporter, Heme transmembrane transporter, Cytochrome c biogenesis protein., MEMBRANE PROTEIN; HET: PO4, 3PE; 3.24A {Escherichia coli BL21(DE3)},,,7F02_D,92.8571,96.0 SIF-Syn: /note=Primary Annotator Name: Shera, Simer /note=Auto-annotation: Glimmer and GeneMark call two different start sites. Glimmer calls for 41111 and Genemark calls for 41126. /note=Coding Potential: ORF only shows coding potential in the forward strand, implying this is a forward gene. Coding potential is found both in GenMark Self and Host and there is potential across the entire length of the gene. It is difficult to distinguish from the coding potential alone whether 41111 or 41126 is the right starting site because of how close together the two potential sites are. /note=SD (Final) Score: Best score found on PECAAN : -2.282 for start site 41111. The final score for 41126 is significantly lower and more unfavorable. /note=Gap/overlap: -10 base pair gap, although this is an odd overlap, it seems to be conserved in Adolin, which is also part of the AZ1 cluster. /note=Phamerator: 105058. Date 10/11/2023. It is conserved; all 31 non-draft members of the Pham are from cluster AZ. /note=Starterator: Start Site 3 is the most called for manually (27/31 non-draft genes). MissSwiss calls for Start Site 3 which is 41111. This evidence agrees with the site predicted by Glimmer. 
/note=Location call: Based on the above evidence, this is a real gene and the likely start site is 41111. /note=Function call: Membrane protein. Two non-draft gene BLAST hits have an unknown function (E-value=10^-19), and two NCBI BLAST hits call for membrane protein function with E-value<10^-21, 100% coverage and >80.70% identity. HHPred has no relevant hits because all E-values>0.16. /note=Transmembrane domains: There is evidence of one predicted TMR with a length of 56. This indicates it to be a transmembrane protein. /note=Secondary Annotator Name: Qin, Kaley /note=Secondary Annotator QC: I have QC’d this gene and agree with the primary annotator. For the coding potential notes, I would state whether or not all coding potential is within the called start sites. Also, make sure to check the phagesDB pBLAST evidence you mention in your PECAAN notes. CDS 41278 - 41436 /gene="67" /product="gp67" /function="hypothetical protein" /locus tag="MissSwiss_67" /note=Original Glimmer call @bp 41278 has strength 11.1; Genemark calls start at 41278 /note=SSC: 41278-41436 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU48_gp69 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 1.555E-23 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.044, -2.583959800616441, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp69 [Arthrobacter phage DrManhattan] ],,YP_009815412,90.566,1.555E-23 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qin, Kaley /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 41278 with a start codon of ATG. /note=Coding Potential: All the coding potential is included in the start site for the host-trained GeneMark; however, coding potential is not reasonable for half the gene. All the coding potential is included in the start site for the self-trained GeneMark and all coding potential is reasonable. /note=SD (Final) Score: -2.584. This is the best final score in PECAAN. 
/note=Gap/overlap: The gap is -4, which indicates that this gene belongs to an operon. So, this start site trumps all other start sites. /note=Phamerator: pham: 2353. Date 10/09/2023. All members of this pham are in cluster AZ, including Adolin and DrManhattan. /note=Starterator: Start site 9 in Starterator is the most called start site and manually annotated in 32 of 33 non-draft genes in the pham. Start site 9 correlates to start site 41278 in MissSwiss. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is at 41278. /note=Function call: NKF. On phagesDB pBLAST, the top two hits have significant e-values < 10e-6 (Adolin’s e-value is 4e-21 and DrManhattan’s e-value is 4e-21) for unknown function proteins. NCBI pBLAST confirms these hypothetical protein hits for Adolin and DrManhattan (100% coverage, 90% alignment, and e-value of 1.6e-23). There were no hits from CDD and no informative hits from HHpred (all the e-values were well above a reasonable threshold of 10e-3). So, we can only call the function as NKF. /note=Transmembrane domains: DeepTMHMM predicts no TMDs. So, this is not a transmembrane protein. /note=Secondary Annotator Name: Santaolaya, Cristal /note=Secondary Annotator QC: I have QC’ed this annotation and agree with the location and function calls made by the primary annotator. Good job! 
CDS 41433 - 41789 /gene="68" /product="gp68" /function="hypothetical protein" /locus tag="MissSwiss_68" /note=Original Glimmer call @bp 41433 has strength 9.88; Genemark calls start at 41433 /note=SSC: 41433-41789 CP: yes SCS: both ST: SS BLAST-Start: [HNH endonuclease [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 96.6102% 3.19362E-59 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.628, -3.3861058366776207, no F: hypothetical protein SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage DrManhattan] ],,YP_009815413,84.4828,3.19362E-59 SIF-HHPRED: SIF-Syn: According to the SEAPHAGES criteria, in order to call the function as HNH endonuclease there must be a HNH amino acid sequence over a span of 30 amino acids. In the HHpred that was run, there was a resulting HNH sequence therefore this gene functions as HNH endonuclease. In pham maps there was synteny with phage DrManhattan however there was no function called. HNH endonuclease was called downstream in another gene however the SEAPHAGES criteria supports this call. /note=Primary Annotator Name: Santaolaya, Cristal /note=Auto-annotation: The start site is called at 41433 by both Glimmer and GeneMark and the start codon is GTG. /note=Coding Potential: All of the coding potential is within the ORF on the forward strand so this is a forward gene. In both GeneMark Self and Host, coding potential was present. /note=SD (Final) Score: -3.386. This is the second best final score on PECAAN however the candidate containing the best final score had a large overlap of 187 whereas the chosen candidate had one of 4 bp which made it a better selection. The given Z-score is 2.628. /note=Gap/overlap: -4 overlap. This overlap is seen within operons often and is conserved in other AZ1 phages like Adolin and DrManhattan. /note=Phamerator: pham: 117889. Date 10/09/2023. This pham contains only 24 AZ cluster members. 
Some of the phages within this pham are Adolin, Nitro, and DrManhattan /note=Starterator: Starterator states that start site number 13 was called the most often in 8 out of the 16 non-draft genes within the pham. MissSwiss was found to have start site 14 which has 2 manual annotations in Adolin and DrManhattan which are both in cluster AZ1 as well. This start site can be found at 41433 in MissSwiss which supports the calls made by GeneMark and Glimmer. /note=Location call: This gene can be deemed as real based on the data collected and analyzed based on the guidelines. The start site is at 41433 bp. /note=Function call: HNH endonuclease. On Phagesdb BLAST, there were good hits obtained of 5e-52 for DrManhattan and 3e-49 for Adolin both of which listed the function as unknown. NCBI pBLAST gave a hit for DrManhattan for hypothetical protein HNH endonuclease with 96.6102% coverage and alignment of 84.4828%. Its e-value was 3e-59. CDD gave no hits and HHpred had one hit with an e-value of 0.047. It described the function as a restriction endonuclease. /note=Transmembrane domains: No TMDs were predicted by DeepTMHMM so this cannot be a membrane protein. /note=Secondary Annotator Name: Shera, Simer /note=Secondary Annotator QC: Based on the PECAAN notes, I agree with the location and function call for this gene. CDS 42073 - 42246 /gene="69" /product="gp69" /function="hypothetical protein" /locus tag="MissSwiss_69" /note= /note=SSC: 42073-42246 CP: yes SCS: neither ST: NI BLAST-Start: [hypothetical protein SEA_ADOLIN_70 [Arthrobacter phage Adolin]],,NCBI, q1:s1 100.0% 7.54243E-31 GAP: 283 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.125, -2.338663114094478, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ADOLIN_70 [Arthrobacter phage Adolin]],,QHB36652,100.0,7.54243E-31 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Trinh, Uyen /note=Auto-annotation: No call sites for Glimmer and Genemark yet. 
Start codon is TTG, which is not the most common start codon, but it is used in about 7% of all genes, providing evidence for the location call. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.339. It is the best final score on PECAAN. Z-score of 3.125. /note=Gap/overlap: Gap of 283 bp with the upstream gene. This is a relatively large gap, and large non-coding gaps between genes are unusual, which is evidence against the location call. /note=Phamerator: This gene could potentially be an orpham. /note=Starterator: N/A /note=Location call: Considering coding potential generally trumps all other evidence, this gene is a real gene and has a start site at 42073 bp. No information was available for Starterator, Glimmer, and Genemark. /note=Function call: NKF. The top results for phagesDB BLAST (e-value: 1e-25) and NCBI (e-value: 7e-31) indicated either function unknown or hypothetical protein. Neither HHpred nor CDD provided any relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
/note=AF - looks good CDS 42246 - 42548 /gene="70" /product="gp70" /function="HNH endonuclease" /locus tag="MissSwiss_70" /note= /note=SSC: 42246-42548 CP: yes SCS: neither ST: NA BLAST-Start: [HNH endonuclease [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 5.69308E-53 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.136, -3.303565774717962, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage DrManhattan] ],,YP_009815415,98.0,5.69308E-53 SIF-HHPRED: HNH endonuclease; Thermophilic bacteriophage, HNH Endonuclease, DNA nicking, HYDROLASE; 1.52A {Geobacillus virus E2},,,5H0M_A,66.0,97.7 SIF-Syn: HNH endonuclease, this is the last gene in the genome, and the function is conserved in the same position, as seen in phage DrManhattan /note=Primary Annotator Name: Peri, Jayasree /note= /note=Auto-annotation: Because this gene was added after, there is no data for auto-annotation. ATG is the start codon. /note= /note=Coding Potential: Coding potential in this ORF looks to be on the forward strand only, confirming that this is a forward gene; both Self and Host GeneMark indicate that there is coding potential for the last half of the ORF. /note= /note=SD (Final) Score: The SD (Final) Score is -3.304, which is the best final score on PECAAN (the least negative). /note= /note=Gap/overlap: There is an overlap of -1bp, indicating that the gene might be part of an operon. /note= /note=Phamerator: Because this gene was added in, there is no data for Phamerator. /note= /note=Starterator: Because this gene was added in, there is no data for Starterator. /note= /note=Location call: Based on the evidence, this is a real gene, and the most likely start site is 42246. /note= /note=Function call: HNH Endonuclease. All non-draft hits on PhagesDB Blast show the function of HNH endonuclease (e-value < 10^-22), and several NCBI Blast hits show the same function of HNH endonuclease (≥99% coverage, +65% identity, and e-value < 10^-26). 
The top hit in HHPred indicated function of HNH endonuclease with 97.72% probability, 66% coverage, and E-value of 1.5 x 10^-5. CDD had a hit for HNH endonuclease with 42% coverage and E-value of 5.98 x 10^-4. /note= /note=Transmembrane domains: DeepTMHMM doesn’t predict the presence of any transmembrane domains.