CDS 140 - 571 /gene="1" /product="gp1" /function="terminase, small subunit" /locus tag="Soondubu_1" /note=Original Glimmer call @bp 140 has strength 13.57; Genemark calls start at 140 /note=SSC: 140-571 CP: yes SCS: both ST: SS BLAST-Start: [terminase small subunit [Arthrobacter phage Liebe] ],,NCBI, q1:s1 99.3007% 4.23064E-58 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.077, -2.583959800616441, yes F: terminase, small subunit SIF-BLAST: ,,[terminase small subunit [Arthrobacter phage Liebe] ],,YP_009817033,76.1589,4.23064E-58 SIF-HHPRED: Terminase_4 ; Phage terminase, small subunit,,,PF05119.15,46.1538,98.7 SIF-Syn: There is synteny of this gene in other AZ cluster final draft phage genomes (Adolin and Amyev) as the terminase small subunit. /note=Primary Annotator Name: Bonthala, Praneel /note=Auto-annotation start source: Glimmer and GeneMark both call the start site at 140. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential in both GeneMark Self and Host does not cover the entire gene though, only covering between base pairs 225 and 525. There is synteny of this gene in other AZ cluster final draft phage genomes (Adolin and Amyev) as the terminase small subunit. /note=SD (Final) Score: -2.584. This is the highest Final Score on PECAAN. /note=Gap/overlap: This is the first gene in the genome, so no gap is shown. /note=Phamerator: Pham: 116163. Date 10/2/2023. The pham is shared by 208 members and by many AZ cluster members. /note=Starterator: Soondubu has start site 27@140 with 37 manual annotations. This start site was found in 53 of 208 genes in the pham. However, it was not the most commonly annotated start site, which is 31@109 found in 58 of 173 non-draft genomes. /note=Location call: Based on the above evidence, this is most likely a real gene. However, the start site of the gene is ambiguous. /note=Function call: Terminase small subunit. 
The top 3 PhagesDB blast hits have the function of a terminase small subunit with an e-value less than 10^-44. All of the nonhypothetical proteins on the NCBI blast hits with an e-value less than 10^-44 are also designated as terminase small subunits with 100% coverage and >55% identity. HHPred had 2 relevant hits with >46% coverage and e-values lower than 2.4e-7, both of which are terminase small subunits. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Yao, Jiayu /note=Secondary Annotator QC: I agree with this annotation. CDS 568 - 2274 /gene="2" /product="gp2" /function="terminase, large subunit" /locus tag="Soondubu_2" /note=Original Glimmer call @bp 568 has strength 13.85; Genemark calls start at 568 /note=SSC: 568-2274 CP: yes SCS: both ST: SS BLAST-Start: [terminase large subunit [Arthrobacter phage Adumb2043] ],,NCBI, q1:s1 100.0% 0.0 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.185, -4.271191715599192, yes F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Arthrobacter phage Adumb2043] ],,YP_010677912,90.1408,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,93.4859,100.0 SIF-Syn: This gene is a terminase (large subunit) . Upstream gene is a terminase (small subunit) , and the downstream gene is a portal protein. This shows synteny with Adolin( AZ) /note=Primary Annotator Name: Faulkner, Cheyenne /note=Auto-annotation: Glimmer and Genemark were used to predict the start site. Both call the start at 568. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.271. This is the highest final (least negative) score on PECAAN. /note=Gap/overlap: -4 bp. 
This is a small overlap, which could mean that this gene is part of an operon. It also means that there is no room for a gene upstream. This overlap is also conserved in other AZ phages such as Adolin & Amyev. /note=Phamerator: pham:116018 as of 10/3/2023. It is conserved, found in both Adolin (AZ) & Amyev (AZ). /note=Starterator: Start site 66 in Starterator was manually annotated in 31/1237 non-draft genomes. Start 66 is not the most annotated start site & Soondubu does not have the most annotated start site. Start 66 is also found in Amyev. This evidence agrees with the start site called by Glimmer & GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the start site is most likely 568. /note=Function call: Terminase. The top 3 PhagesDB BLAST hits have the function of terminase (e-values = 0), and 5/5 of the NCBI BLAST hits also have a function of terminase (coverage 99%+, identity 81%+, & e-value = 0). HHpred had a hit for terminase with a probability of 100%, 93% coverage, and an e-value of 6.2*10^-40. CDD did not have any relevant hits. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs; thus, this is not a membrane protein. /note=Secondary Annotator Name: Yao, Jiayu /note=Secondary Annotator QC: I agree with this annotation. CDS 2296 - 3684 /gene="3" /product="gp3" /function="portal protein" /locus tag="Soondubu_3" /note=Original Glimmer call @bp 2296 has strength 18.08; Genemark calls start at 2296 /note=SSC: 2296-3684 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Arthrobacter phage Liebe] ],,NCBI, q1:s1 98.2684% 0.0 GAP: 21 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.315, -2.305049668942183, yes F: portal protein SIF-BLAST: ,,[portal protein [Arthrobacter phage Liebe] ],,YP_009817035,77.9093,0.0 SIF-HHPRED: Portal protein; Bacteriophage, SPP1, Portal Protein, Head completion proteins, Connector Complex, DNA Channel, VIRAL PROTEIN; 2.7A {Bacillus subtilis},,,7Z4W_L,90.4762,100.0 SIF-Syn: This gene is a portal protein. 
Upstream gene is terminase (large subunit), and the downstream gene is NKF. No synteny with any other phages. /note=Primary Annotator Name: Wong, Michael /note=Auto-annotation: Glimmer and GeneMark agree on the same start site (2296); an ATG start codon is called. /note=Coding Potential: Reasonable coding potential predicted within putative ORF. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: SD score is the best (i.e. -2.305) as it is the least negative out of all other options. /note=Gap/overlap: The gap with the upstream gene is reasonable (21 bp). There are no alternative start candidates that are reasonable. The length of the gene is acceptable given the auto-annotated start (chosen start site as well). /note=Phamerator: On 10/4/23, gene was found in pham #116016. The pham in which the gene is conserved is in other members of the cluster/subcluster to which the phage belongs; Liebe and Maureen were used for comparison. Phamerator called for the function of the gene to be a portal protein. Functions called were consistent and found in the approved function list. /note=Starterator: The reasonable start site choice that is conserved among members of the pham to which the gene belongs is 99. Start: 99 @2296 has 13 MAs. Found in 17 of 1642 (1.0%) genes in the pham. Start 99 is also found in Liebe and Maureen. This evidence agrees with the start site called by Glimmer & GeneMark. The start number called the most often in the published annotations is 73; it was called in 298 of the 1525 non-draft genes in the pham. /note=Location call: The gathered evidence (z-score is the best (highest), codon is common (ATG), final score is the best (least negative)) suggests that the original start site of 2296 is the best possible start site. 
The gene is a real gene (conserved in Phamerator and good coding potential) and the potential start site of 2296 is the most likely potential start site candidate. The potential start site candidate of 2296 seems the most likely as it covers all coding potential and is called by both Glimmer and GeneMark. /note=Function call: Predicted function is portal protein, based on hits from PhagesDB BLASTp and NCBI BLASTp, both of which had 1 hit with high query coverage (>98%), high identity (>75%), and a low e-value (0.0). Within HHPred’s best Pfam hit, there was strong evidence that matched the portal protein function (100% probability, ~90% coverage, e-value 2.3e-34). There are no specific requirements listed on the approved functions list. CDD had 1 hit for a portal protein with high coverage (>85%) and low e-value (6.75008e-35). /note=Transmembrane domains: There is an absence of TMDs as predicted by DeepTMHMM. This is because the function of portal protein is a well-known membrane protein that doesn’t need TMHMM to identify that it is a membrane protein (and it won’t show), but has sufficient evidence in HHPred and BLASTp for its function (see function call). /note=Secondary Annotator Name: Gallagher, Hannah /note=Secondary Annotator QC: The only thing I would change is that you checked a box for Soondubu as evidence, but everything else looks great! I agree with this annotation. 
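The gap/overlap values reported for genes 2 and 3 above can be reproduced from the CDS coordinates. The following is a minimal Python sketch, assuming the gap is the start of a gene minus the stop of the upstream gene minus 1. That formula is an assumption about how the reported figure is derived, although it matches the values in these notes. The function name `gap_bp` is hypothetical.

```python
def gap_bp(upstream_stop, start):
    """Bases between two same-strand genes; a negative value is an overlap.

    Assumed formula (not confirmed by the notes): start - upstream_stop - 1.
    """
    return start - upstream_stop - 1


# (gene number, start, stop) copied from the CDS lines for genes 1-3 above.
genes = [(1, 140, 571), (2, 568, 2274), (3, 2296, 3684)]

# Pair each gene with the one immediately upstream of it.
gaps = {
    gene: gap_bp(prev_stop, start)
    for (_, _, prev_stop), (gene, start, _) in zip(genes, genes[1:])
}

for gene, gap in gaps.items():
    kind = "overlap" if gap < 0 else "gap"
    print(f"gene {gene}: {gap} bp {kind}")
```

Under this assumption the sketch gives -4 bp for gene 2 and 21 bp for gene 3, in agreement with the GAP fields in the two records.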
CDS 3698 - 5719 /gene="4" /product="gp4" /function="capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin" /locus tag="Soondubu_4" /note=Original Glimmer call @bp 3698 has strength 15.56; Genemark calls start at 3698 /note=SSC: 3698-5719 CP: yes SCS: both ST: SS BLAST-Start: [capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin [Arthrobacter phage Ascela]],,NCBI, q1:s1 99.2571% 0.0 GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.227, -2.484606983760709, yes F: capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin SIF-BLAST: ,,[capsid maturation protease and VIP2-like ADP-ribosyltransferase toxin [Arthrobacter phage Ascela]],,WGH21527,73.3038,0.0 SIF-HHPRED: d.166.1.1 (A:265-550) automated matches {Anthrax bacillus (Bacillus anthracis) [TaxId: 1392]} | CLASS: Alpha and beta proteins (a+b), FOLD: ADP-ribosylation, SUPFAM: ADP-ribosylation, FAM: ADP-ribosylating toxins,,,SCOP_d4dv8a1,37.4443,99.5 SIF-Syn: The upstream gene is a portal protein, while the downstream gene is NKF (as in VResidence and Janeemi). /note=Primary Annotator Name: Yao, Jiayu /note=Auto-annotation: Glimmer, GeneMark (host) and GeneMark (self) all agreed on the start site 3698. /note=Coding Potential: The chosen start site covers all coding potential, and no coding potential is shown on the reverse strand. /note=SD (Final) Score: -2.485, the least negative score among all the start sites. /note=Gap/overlap: 13 bp gap, which is relatively small. /note=Phamerator: 10/4/2023, it was found in pham 2324. There were 51 members in this pham, and 16 of them were drafts. Phages MissSwiss_Draft and Tallboi_Draft from the AZ cluster also have the same pham. /note=Starterator: 10/2/23, the most often called start site number was 3; it was called in 32 of the 35 non-draft genes in the pham. 
The auto-annotated start is 3, and it should be the start site for this gene because it is the most conserved among all the genes and the most annotated start site. /note=Location call: Based on the prediction of GeneMark and Glimmer, together with other evidence like the z-score (3.227), final score (-2.485) and gap value, the gene should be real and the start site is 3698. /note=Function call: VIP2-like ADP-ribosyltransferase toxin. The PhagesDB BLAST hits VResidence_4 and Iter_4 have scores (810 and 809) relatively close to that of Soondubu_4 (1432), and the e-values are 0. NCBI BLAST shows similar scores with high coverage (99%) and identity (60%) and e-values of 0. The coverage of the VIP2-like ADP-ribosyltransferase toxin hit from HHpred (37%) is above the 35% threshold, with a low e-value (2.1e-11). CDD also has a hit for VIP2-like ADP-ribosyltransferase toxin with a low e-value (3.15e-21). Since this function is in the approved function list, the function of this gene should be VIP2-like ADP-ribosyltransferase toxin. /note=Transmembrane domains: No transmembrane domains are predicted. /note=Secondary Annotator Name: Faulkner, Cheyenne /note=Secondary Annotator QC: I agree with this annotation. The primary annotator needs to make the following changes: /note=1. Make sure not to use draft genomes as evidence for PhagesDB BLAST. CDS 5790 - 6143 /gene="5" /product="gp5" /function="hypothetical protein" /locus tag="Soondubu_5" /note=Original Glimmer call @bp 5790 has strength 12.24; Genemark calls start at 5811 /note=SSC: 5790-6143 CP: yes SCS: both-gl ST: NI BLAST-Start: [hypothetical protein [Arthrobacter sp. AQ5-05] ],,NCBI, q1:s1 100.0% 2.55699E-63 GAP: 70 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.326, -1.993391246735709, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Arthrobacter sp. 
AQ5-05] ],,WP_113763083,88.8889,2.55699E-63 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mathkour, Yusef /note=Auto-annotation: The Glimmer and GeneMark starts do not agree with one another: the Glimmer start is 5790 and the GeneMark start is 5811. The start codon is ATG, so there is a higher probability that this codon is used. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The chosen start site covers all coding potential. /note=SD (Final) Score: -1.993 is the best final score on PECAAN. The z-score is 3.326, which is the best z-score on PECAAN. /note=Gap/overlap: 70 bp. Somewhat large, but ultimately reasonable because the gap is conserved in other phages (DrManhattan, Adolin) and there is no coding potential in the gap that might be a new gene. /note=Phamerator: Pham 116466, date 10/3/23. It is conserved, found in AZ1 subcluster phages such as Adolin (AZ1) and DrManhattan (AZ1). There are 51 members in the pham and 18 are drafts. The assigned function for both genes is NKF. The gene length appears to be conserved in the pham. /note=Starterator: Start: 14 @5790 has 41/47 MAs. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 5790. /note=Function call: NKF; there is no evidence of a function in comparable phages. On HHpred, the most similar hit has a probability of 75.9%, which is below the 80% threshold, a sign that it is not closely related in function. Its coverage is 39.3162%, which is above the required 35% threshold, but its e-value is 30, far higher than the required value of below 1*10^-6, showing that it has no correlation to a known function. The PhagesDB BLAST hit for Adolin has a low e-value (9e-35) and a high score of 144, but it has no known function. 
/note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Shao, Sarah /note=Secondary Annotator QC: /note=1. For phamerator, how many phages are in the pham? Does the gene length appear to be conserved/is it similar in this gene? /note=2. For starterator, I would change the starterator box to say not informative, and change PECAAN notes to reflect this since none of the start sites in starterator have MAs. /note=3. For function call - discuss individual tools, currently notes only discuss HHPRED. For example, currently no mention of phagesdb BLAST which has relevant hits of function unknown (low e-values). Check off relevant evidence. CDS 6283 - 6819 /gene="6" /product="gp6" /function="scaffolding protein" /locus tag="Soondubu_6" /note=Original Glimmer call @bp 6283 has strength 13.82; Genemark calls start at 6283 /note=SSC: 6283-6819 CP: yes SCS: both ST: SS BLAST-Start: [head scaffolding protein [Arthrobacter phage Tweety19] ],,NCBI, q9:s21 94.9438% 4.36816E-46 GAP: 139 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.999, -3.5088110057482766, yes F: scaffolding protein SIF-BLAST: ,,[head scaffolding protein [Arthrobacter phage Tweety19] ],,YP_010678397,64.4444,4.36816E-46 SIF-HHPRED: Scaffold protein; major capsid protein, HK97-like fold, scaffolding protein, procapsid, VIRUS; 3.72A {Staphylococcus phage 80alpha},,,6B0X_b,60.1124,97.2 SIF-Syn: This gene exhibits synteny with DrManhattan (AZ) and Eraser (AZ). /note=Gene (stop@6283 F) /note=PECAAN Notes /note=Primary Annotator Name: Moore, Joshua /note=Auto-annotation: Both GeneMark and Glimmer call the start at 6283. /note=Coding Potential: The ORF has significant coding potential for this gene only on the forward strand. Potential is found in GeneMark Host and GeneMark Self. /note=SD (Final) Score: -3.509. This is the best final score on PECAAN. The z-score is also the highest at 2.999. /note=Gap/overlap: The gap is large at 139 bp. 
However, a large gap before this gene is conserved in other phages such as DrManhattan (AZ) and Eraser (AZ), and there is no coding potential in the gap. /note=Phamerator: Pham 1850. 10/4/2023. It is conserved, found in DrManhattan (AZ) and Eraser (AZ). /note=Starterator: There are 50 non-draft phages in this pham, and 47 of them call start site 17. Soondubu calls start site 18, and is the only phage in this pham to do so. However, the final score, z-score, length of gene, and coding potential all favor start site 18, and start site 17 is not present in Soondubu. /note=Location call: Based on the above evidence, this gene is a real gene, and has a start site at 6283. /note=Function call: Scaffolding protein. Many PhagesDB BLAST hits agree with the suggested function of a scaffolding protein, with e-values clustered between e-46 and e-36. The top 22 NCBI BLAST hits also have a function of either scaffolding or head scaffolding protein (89%+ coverage, e-40 to e-46). CDD had no relevant hits. HHPred had a hit that agreed with scaffolding protein with 97.19% probability, 60.1124% coverage, and an e-value of 0.021. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Mathkour, Yusef /note=Secondary Annotator QC: I agree with this annotation. Add the functions of the upstream and downstream genes. 
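The gene 5 function call above weighs an HHpred hit against three cutoffs: probability of at least 80%, coverage of at least 35%, and an e-value below 1*10^-6. The following is a minimal Python sketch of that check. The cutoffs are taken from the gene 5 note and should be read as that annotator's working values rather than an official standard. The names `HHpredHit` and `failed_criteria` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HHpredHit:
    probability: float  # percent
    coverage: float     # percent
    e_value: float


# Cutoffs as quoted in the gene 5 function call (assumed, not official).
MIN_PROBABILITY = 80.0
MIN_COVERAGE = 35.0
MAX_E_VALUE = 1e-6


def failed_criteria(hit):
    """Return the names of the cutoffs that a hit does not meet."""
    failures = []
    if hit.probability < MIN_PROBABILITY:
        failures.append("probability")
    if hit.coverage < MIN_COVERAGE:
        failures.append("coverage")
    if hit.e_value >= MAX_E_VALUE:
        failures.append("e-value")
    return failures


# Values reported for the best gene 5 HHpred hit.
gene5_hit = HHpredHit(probability=75.9, coverage=39.3162, e_value=30.0)
print(failed_criteria(gene5_hit))
```

For the gene 5 hit the sketch reports the probability and e-value cutoffs as unmet, which is consistent with the NKF call. The gene 6 HHpred hit (97.19% probability, 60.1124% coverage, e-value 0.021) would not meet the e-value cutoff either, so a check like this is best treated as one line of evidence alongside the BLAST results.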
CDS 6844 - 7797 /gene="7" /product="gp7" /function="major capsid protein" /locus tag="Soondubu_7" /note=Original Glimmer call @bp 6844 has strength 15.87; Genemark calls start at 6844 /note=SSC: 6844-7797 CP: yes SCS: both ST: SS BLAST-Start: [major head protein [Arthrobacter phage Phives] ],,NCBI, q5:s3 98.7382% 4.46102E-164 GAP: 24 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.315, -2.305049668942183, yes F: major capsid protein SIF-BLAST: ,,[major head protein [Arthrobacter phage Phives] ],,YP_010677642,85.7595,4.46102E-164 SIF-HHPRED: Major capsid protein; P22 Bacteriophage, VIRUS; 3.3A {Salmonella phage P22},,,5UU5_A,90.2208,100.0 SIF-Syn: Major capsid protein, upstream gene is scaffolding protein, downstream is head-to-tail adaptor, just like in phages Phives and Yang. /note=Primary Annotator Name: Shao, Sarah /note=Auto-annotation: Glimmer and GeneMark, agreed start site (6844), ATG /note=Coding Potential: Reasonable coding potential predicted, start site covers all coding potential /note=SD (Final) Score: -2.305. It is the best final score on PECAAN. /note=Gap/overlap: 24bp gap (upstream). Reasonable gap, no coding potential shown in the gap (GeneMark), no other potential start codons that would minimize the gap. /note=Phamerator: pham: 228. Date 10/3/23. It is conserved, found in many other phages including Phives (AZ) and Yang (AZ). 288 phages are members of the pham, almost all have function of major capsid protein and have similar gene length (900-1,200bp). /note=Starterator: Starterator did not provide any helpful information to support a specific start site, none of the called start sites had any manual annotations. Date 10/3/23. /note=Location call: Based on the above evidence, this is a real gene with a likely start site at 6844. /note=Function call: Major capsid protein. The top non-draft phagesdb BLAST hits all have the function of major capsid protein [(Phives, 1e-127), (Yang, 1e-126), (JohnDoe, 1e-126), (Elezi, 1e-125), (Nitro, 1e-125), etc.]. 
The 2 top NCBI BLAST hits have the function of major head protein (98% coverage, 74.20% identity, e-value 5e-164) and several other top NCBI BLAST hits have the function of major capsid protein. CDD showed one hit for a coat protein (P22_CoatProtein, e-value of 6.62e-08, pfam11651). HHpred showed several relevant hits, most of which shared the function of major capsid protein (99.95% probability, 90.2% coverage, e-value of 2.5e-25). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Faulkner, Cheyenne /note=Secondary Annotator QC: I agree with this annotation. No changes needed at this time. CDS 7865 - 8263 /gene="8" /product="gp8" /function="head-to-tail adaptor" /locus tag="Soondubu_8" /note=Original Glimmer call @bp 7865 has strength 11.92; Genemark calls start at 7865 /note=SSC: 7865-8263 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail adaptor [Arthrobacter phage Janeemi]],,NCBI, q1:s1 97.7273% 3.59898E-62 GAP: 67 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.315, -2.0162541296952132, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail adaptor [Arthrobacter phage Janeemi]],,UVK63529,76.2238,3.59898E-62 SIF-HHPRED: Yqbg; Putative Head-Tail Connector Protein Yqbg from Bacillus subtilis and similar proteins.,,,cd08053,83.3333,99.0 SIF-Syn: Head-to-tail adaptor; upstream is major capsid protein and downstream is head-to-tail stopper, just like in YesChef (AZ). /note=Primary Annotator Name: Chang, Amanda /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 7865. The start codon is a TTG start, which has a low frequency amongst other start codons (ATG, GTG) for protein-coding genes. However, this start covers the entire coding region unlike the other start site candidates. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. 
Coding potential is found at high intensity on GeneMark Host and Self. /note=SD (Final) Score: -2.016. This is the best SD (final) score and it is associated with the largest z-score (3.315), which is above the 1.8 to 2 range generally considered a good z-score. The other start candidates also do not include the entire coding region and have more negative final scores and smaller z-scores (<1.8), which makes them unfavorable in comparison to the selected start site. /note=Gap/overlap: There is a 67 bp gap. This is quite large for a gap (>-1/+4); however, there is little to no coding potential in the gap that would indicate a gene being present in that region. A similar gap is conserved in other phages in the AZ cluster such as YesChef and Yang. /note=Phamerator: pham: 11634 as of 10/3/23. It is conserved in 80 phages, including YesChef and Yang (AZ). /note=Starterator: Start site 7 in Starterator was the most annotated start, manually annotated in 32 out of 60 non-draft genes in this pham. Start 7 is 7865 in Soondubu. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 7865. /note=Function call: Head-to-tail adaptor. The top 5 non-draft PhagesDB BLAST hits have the function of head-to-tail adaptor with e-values all smaller than 10^-50. 4 out of the top 5 NCBI BLAST hits call the function of this gene to be a head-to-tail adaptor (all of which had e-values under 10^-60, coverage +97%, and identity +69%). HHpred had two hits for head-tail connector, which was the previous name for this function as noted on the SEA-PHAGES official function list (name was changed 3/6/2019). The two HHpred hits had high probability (+98%), high coverage (+75%), and low e-values (<10^-5). CDD had no hits. Function is on the SEA-PHAGES official function list. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Shao, Sarah /note=Secondary Annotator QC: I agree with the function and location calls. CDS 8264 - 8614 /gene="9" /product="gp9" /function="head-to-tail stopper" /locus tag="Soondubu_9" /note=Original Glimmer call @bp 8264 has strength 11.34; Genemark calls start at 8264 /note=SSC: 8264-8614 CP: yes SCS: both ST: NI BLAST-Start: [head closure Hc1 [Arthrobacter phage KeAlii] ],,NCBI, q1:s1 97.4138% 3.39501E-31 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.919, -2.7653702713535186, yes F: head-to-tail stopper SIF-BLAST: ,,[head closure Hc1 [Arthrobacter phage KeAlii] ],,YP_010678127,68.6957,3.39501E-31 SIF-HHPRED: Head completion protein gp16; Bacteriophage, SPP1, Portal Protein, Head completion proteins, Connector Complex, DNA Channel, VIRAL PROTEIN; 2.7A {Bacillus subtilis},,,7Z4W_5,91.3793,99.7 SIF-Syn: Head-to-tail stopper; upstream is head-to-tail adaptor and downstream is NKF, just like in KeAlii (AZ1). /note=Primary Annotator Name: Gallagher, Hannah /note=Auto-annotation: GeneMark and Glimmer. Both call the start site at 8264. The start codon is ATG, which has a high frequency among start codons. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. GeneMark Self and Host found coding potential. Start site 8264 covers the entire coding potential. /note=SD (Final) Score: The final score is -2.765, which is the highest of the two entries. This start site also has the highest z-score of 2.919, which suggests the presence of a credible ribosome binding site. /note=Gap/overlap: There is a gap of 0 base pairs between the start site of 8264 and the gene upstream, which is very reasonable. The gap is conserved in other phages (Janeemi, KeAlii, and Wildwest). The length of the gene with no gap is 351 base pairs, which is a valid gene length. /note=Phamerator: Pham 116108. Date 10/03/2023. 
It is conserved and found in other phages (Janeemi, KeAlii, and Wildwest) that belong to the AZ1 subcluster. /note=Starterator: Start 31 was manually annotated in 0/263 non-draft genes in pham 116108. Start 31 is 8264 in Soondubu. Start 31 is called in 0% of the non-draft phages in pham 116108. The other potential start site, 50, is not manually annotated in any other phages despite its presence in other phages. The gene does not have the most annotated start site in its sequence (start 35). This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene, and the likely start site is 8264. /note=Function call: Head-to-tail stopper. The top non-draft PhagesDB BLAST hits have a function of head-to-tail stopper [(Janeemi, 1e-27), (KeAlii, 1e-27), (Wildwest, 9e-28), (Adolin, 3e-27), (DrManhattan, 3e-27), etc.]. 5 of the top 5 NCBI BLASTp hits call the function to be a head-to-tail stopper (all of which had an e-value under 5e-29, coverage >97%, and identity >61%). CDD has no hits for this sequence. HHpred had two hits for head-to-tail stopper (coverage >91%, probability >99.7%, and e-values <2.1e-15). This protein function is on the SEA-PHAGES approved function list and has the HHpred alignment to SPP1 gp16 required for this function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Faulkner, Cheyenne /note=Secondary Annotator QC: I agree with this annotation. No changes needed at this time. 
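The SD (Final) Score and z-score quoted in the gene 9 notes are rounded forms of the values in the record's RBS summary field. The following is a minimal Python sketch that extracts and rounds them so the two can be compared. It assumes the third comma-separated value is the z-score and the fourth is the final score, which is how the gene 9 notes read them. The function name `parse_rbs` is hypothetical.

```python
def parse_rbs(field):
    """Return (z_score, final_score) from an RBS summary field, rounded to 3 places.

    Assumed layout: "RBS: <scoring matrix>, <spacing matrix>, <z>, <final>, <flag>".
    """
    values = field.partition(":")[2].split(",")
    z_score = float(values[2])
    final_score = float(values[3])
    return round(z_score, 3), round(final_score, 3)


# RBS field copied from the gene 9 record above.
gene9_rbs = "RBS: Kibler 6, Karlin Medium, 2.919, -2.7653702713535186, yes"

z_score, final_score = parse_rbs(gene9_rbs)
print(z_score, final_score)
```

For gene 9 this gives a z-score of 2.919 and a final score of -2.765, the same figures given in the SD (Final) Score note.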
CDS 8626 - 8931 /gene="10" /product="gp10" /function="hypothetical protein" /locus tag="Soondubu_10" /note=Original Glimmer call @bp 8626 has strength 12.4; Genemark calls start at 8662 /note=SSC: 8626-8931 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein HOU48_gp11 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 99.0099% 1.18155E-29 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.315, -2.033982896655645, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp11 [Arthrobacter phage DrManhattan] ],,YP_009815354,74.5098,1.18155E-29 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Bonthala, Praneel /note=Auto-annotation: Glimmer calls the start site at 8626 whereas GeneMark calls the start site at 8662. /note=Coding Potential: Coding potential in the ORF is in the forward strand only, indicating that this is a forward gene. The coding potential covers the entire strand. /note=SD (Final) Score: -2.034 for start 8626. This is the highest Final Score on PECAAN. /note=Gap/overlap: The gap is 11 for start 8626. This is the smallest gap on PECAAN. /note=Phamerator: Pham: 116309. Date 10/2/2023. The pham is shared by 100 members and by many AZ cluster members. /note=Starterator: Soondubu has a start site 23@8626 with 6 manual annotations. However, it was not the most commonly annotated start site, which is 24 found in 48 of 77 non-draft genomes. This start site also has the highest Z-score, which is 3.315. Additionally, when this start site is present, it is called 90% of the time. /note=Location call: Based on the above evidence, this is most likely a real gene with a start site at 8626. /note=Function call: No known function. There are no significant PhagesDB blast or NCBI blast hits (e-value < 10e-11) with a known function. HHPred had 1 relevant hit as a family of unknown function with >76% coverage and e-value equal to 0.053. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. 
/note=Secondary Annotator Name: Moore, Joshua /note=Secondary Annotator QC: Mention the z-score. Start 8626 has a strong z-score and would help your argument. In the Starterator section, mention that Soondubu doesn’t have the Most Annotated start site, and that the start site you chose is called 90% of the time it is present. This makes it okay that it isn’t in the Most Annotated start site (since it can’t be). Typo in your Function call (“with a known function” -> “with an unknown function”). You could also ignore the HHPred hit since the e-value is really high, but I suppose it doesn’t hurt to mention. Everything else looks good. You also don’t need to have the synteny box filled out. CDS 8928 - 9338 /gene="11" /product="gp11" /function="tail terminator" /locus tag="Soondubu_11" /note=Original Glimmer call @bp 8928 has strength 7.78; Genemark calls start at 8928 /note=SSC: 8928-9338 CP: no SCS: both ST: SS BLAST-Start: [tail terminator [Arthrobacter phage Phives] ],,NCBI, q5:s6 97.0588% 1.89931E-47 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.568, -5.092387591382226, yes F: tail terminator SIF-BLAST: ,,[tail terminator [Arthrobacter phage Phives] ],,YP_010677647,73.7226,1.89931E-47 SIF-HHPRED: Tail terminator protein Rcc01690; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_F,96.3235,99.4 SIF-Syn: This is a Tail terminator. The gene upstream has NKF and the gene downstream is a major tail protein. This shows synteny with Adolin(AZ). /note=Primary Annotator Name: Faulkner, Cheyenne /note=Auto-annotation: Glimmer and Genemark are both used to call the start at 8928. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host, but the chosen start site is not fully included in the Host-trained GeneMark, while it is fully included in the Self-trained GeneMark. /note=SD (Final) Score: -5.092. 
This is not the best score, but it is extremely close to the best score, which is -5.030. Due to the overlap information below, this information may not be necessary. /note=Gap/overlap: -4 bp. This is a small overlap, which could mean that this gene is part of an operon. This means that there is no room for a gene upstream. This overlap is not conserved in other AZ phages such as Adolin & Amyev, which could mean that the start site of this gene is incorrectly called. /note=Phamerator: pham:116194 as of 10/4/2023. It is conserved, found in both Adolin (AZ) & Wildwest (AZ1). /note=Starterator: Start site 7 in Starterator was manually annotated in 35/153 non-draft genomes and it’s called 100% of the time when present in the genome. Start 7 is not the most annotated start site & Soondubu does not have the most annotated start site. Start 7 is also found in Wildwest (AZ1) and Adolin (AZ1). This evidence agrees with the start site called by Glimmer & GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the start site is most likely 8928 as called by Glimmer and GeneMark. /note=Function call: Tail terminator. The top 3 PhagesDB BLAST hits have the function of tail terminator (e-values < 2*10^-38), and 3/5 of the NCBI BLAST hits also have a function of tail terminator (coverage 73%+, identity 59%+, & e-value < 8.8*10^-47). HHpred had a hit for a tail terminator with a probability of 99.4%, 96.3% coverage, and an e-value of 3.5*10^-10. CDD did not have any hits. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs; thus, this is not a membrane protein. /note=Secondary Annotator Name: Mathkour, Yusef /note=Secondary Annotator QC: I agree with this annotation. No changes needed at this time. 
CDS 9369 - 9917 /gene="12" /product="gp12" /function="major tail protein" /locus tag="Soondubu_12" /note=Original Glimmer call @bp 9369 has strength 19.24; Genemark calls start at 9369 /note=SSC: 9369-9917 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Arthrobacter phage Emotion]],,NCBI, q1:s1 98.3516% 1.61008E-92 GAP: 30 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.315, -2.0162541296952132, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Arthrobacter phage Emotion]],,WGH21361,82.6087,1.61008E-92 SIF-HHPRED: YSD1_22 major tail protein; Bacteriophage tail, helical assembly, VIRAL PROTEIN; 3.5A {Bacteriophage sp.},,,6XGR_L,92.8571,98.6 SIF-Syn: This gene is a major tail protein. Upstream gene is tail terminator, and the downstream gene is tail assembly chaperone. Gene has synteny with Emotion. /note=Primary Annotator Name: Wong, Michael /note=Auto-annotation: Both Glimmer & GeneMark agree on the same start site (9369); ATG start codon called. /note=Coding Potential: Reasonable coding potential predicted within putative ORF. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: SD score is the best (i.e. -2.016) as it is the least negative out of all other options. /note=Gap/overlap: The gap with the upstream gene is reasonable (30 bp). There are no alternative start candidates that are reasonable. The length of the gene is acceptable given the auto-annotated start (chosen start site as well). /note=Phamerator: On 10/4/23, gene was found in pham #116263. The pham in which the gene is conserved is in other members of the cluster/subcluster to which the phage belongs; Emotion was used for comparison. Phamerator called for the function of the gene to be a major tail protein. Functions called were consistent and found in the approved function list. 
/note=Starterator: The reasonable start site choice that is conserved among members of the pham to which the gene belongs is 8. Start: 8 @9369 has 99 MAs. Found in 99 of 101 genes in pham. Start 8 is also found in Emotion. This evidence agrees with the start site called by Glimmer & GeneMark. The start number called the most often in the published annotations is 8; it was called in 99 of 101 non-draft genes in pham. /note=Location call: The gathered evidence (z-score is the best (highest), codon is common (ATG), final score is the best (least negative)) suggests that the original start site of 9369 is the best possible start site. The gene is a real gene (conserved in phamerator and good coding potential) and the potential start site of 9369 is the most likely potential start site candidate. The potential start site candidate of 9369 seems the most likely as it covers all coding potentials and is called by both Glimmer and GeneMark. /note=Function call: Predicted function is major tail protein, based on hits from PhagesDB BLASTp and NCBI BLASTp, both of which had 1 hit with high query coverage (>92%), high identity (>75%), and a low e-value (3e-75). Within HHPred’s best hit, there was strong evidence that matched the major tail protein function (98.6 probability, ~93% coverage, 2.6e-6). There are no specific requirements listed on the approved functions list. CDD had no hits. /note=Transmembrane domains: There is an absence of TMDs as predicted by DeepTMHMM. /note=Secondary Annotator Name: Chang, Amanda /note=Secondary Annotator QC: I agree with this annotation. All of the evidence provided supports the function and location calls made. I would also check once more for transmembrane domains using DeepTMHMM as the TMD tools on PECAAN are out of date. 
CDS 10020 - 10292 /gene="13" /product="gp13" /function="tail assembly chaperone" /locus tag="Soondubu_13" /note=Original Glimmer call @bp 10020 has strength 17.86; Genemark calls start at 10020 /note=SSC: 10020-10292 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Arthrobacter phage Liebe] ],,NCBI, q3:s2 97.7778% 1.05839E-28 GAP: 102 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.077, -3.544192673744953, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage Liebe] ],,YP_009817046,74.1573,1.05839E-28 SIF-HHPRED: Phage_TAC_10 ; Phage tail assembly chaperone,,,PF10963.11,80.0,95.1 SIF-Syn: The gene upstream is major tail protein, and the gene downstream is tail assembly chaperone (like in phages Liebe and Maureen). /note=Primary Annotator Name: Yao, Jiayu /note=Auto-annotation: Glimmer, GeneMark (host) and GeneMark (self) all agreed on the start site 10020. /note=Coding Potential: Coding potential is covered within the entire ORF, and no reverse coding potential is shown. /note=SD (Final) Score: -3.544, within the threshold, and only one candidate is shown. /note=Gap/overlap: 102. This is a large gap (>-1/+4) but there is little coding potential in the gap that would indicate a gene in the region, and the gap is conserved in more than two phages. /note=Phamerator: 10/8/2023, it was found in pham 116385. There were 72 members in this pham, and 19 of them were drafts. Phages JasmineDragon_Draft and MiniMommy_Draft from the AZ cluster also have the same pham. /note=Starterator: 10/2/23, the most often called start site number was 6; it was called in 48 of the 53 non-draft genes in the pham. The auto-annotated start is 6, and it should be the start site for this gene because it is the most conserved among all the genes and the most annotated start site, which is location 10020. 
/note=Location call: Based on the prediction of GeneMark and Glimmer, together with other evidence like Z-value (3.077), final score (-3.544) and gap, the gene should be real and the start site is 10020. /note=Function call: Tail assembly chaperone, because the genes have the same function, Maureen_13 and Liebe_13 have relatively close scores (107) to Soondubu_draft_13 (185), and the e-values are close to 0. Maureen and Liebe were found using Phagesdb BLAST. NCBI BLAST shows that the two hits with the same function, gene from phage VResidence and gene from phage Liebe have highest scores (108 and 109, except those of the hypothetical proteins) with high coverage (97% and 94%) and identity (around 58%) while the e-value is close to 0. The coverage percentage of tail assembly chaperone from HHpred is around 80% with a relatively low e-value (0.82). No CDD hits are found. Since the function is in the approved function list, the function of this gene should be tail assembly chaperone. /note=Transmembrane domains: No transmembrane domains are predicted. /note=Secondary Annotator Name: Moore, Joshua /note=Secondary Annotator QC: Mention that coding potential covers the entire ORF, not just the start site. When talking about the gap size being big, also mention that this gap appears to be conserved in at least 2 other phages (I quickly looked and it looked conserved in London). Don’t mention drafts in the Phamerator section. You can also mention that this is the only start site for Soondubu on Starterator. For the function call, you should mention that Maureen and Liebe were found using Phagesdb BLAST. You should also fully capitalize BLAST for NCBI BLast (“BLast” -> “BLAST”). For NCBI BLAST, checkmark the boxes on PECAAN for VResidence and Liebe that you mentioned. Your synteny box appears unfinished. 
CDS join(10020..10286,10286..10639) /gene="14" /product="gp14" /function="tail assembly chaperone" /locus tag="Soondubu_14" /note= /note=SSC: 10020-10639 CP: yes SCS: neither ST: SS BLAST-Start: [tail assembly chaperone [Arthrobacter phage VResidence]],,NCBI, q6:s4 95.1456% 4.05588E-80 GAP: -273 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.077, -3.544192673744953, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage VResidence]],,UYL87620,77.0732,4.05588E-80 SIF-HHPRED: SIF-Syn: The gene upstream is a tape measure protein and the gene downstream is a tail assembly protein (like in phages Asa16 and Ascela). /note=Primary Annotator Name: Mathkour, Yusef /note=Auto-annotation: The Glimmer and GeneMark starts agree with one another: the Glimmer start = 10361 and the GeneMark start = 10361. The start codon is GTG; therefore there is a medium probability of using this codon. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The chosen start site covers all coding potential. /note=SD (Final) Score: -6.994 is the second best Final score on PECAAN. The Z score is 0.888; this is the second best Z score value on PECAAN. The gene with the best z score and final score are /note=Gap/overlap: 68 bp. Somewhat large, but ultimately reasonable because the gap is conserved in other phages (Adumb2043, Phives) and there is no coding potential in the gap that might be a new gene. /note=Phamerator: pham: 107294, date 10/4/23. It's conserved, found in the AZ1 cluster with phages Adumb2043 (AZ1) and Phives (AZ1). The assigned function for both genes is tail assembly chaperone. /note=Starterator: Start site 5 was found in Starterator in 3 of 5 (60.0%) of genes in the pham; there were no manual annotations of this start. Start site 5 was called 100.0% of the time when present. 
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 10,361. /note=Function call: Tail assembly chaperone; there is evidence of a tail assembly chaperone protein in comparable phages. On PhagesDB BLAST, phage Adumb2043 has an e-value of 1e-28 and a score of 123; the gene is a tail assembly chaperone. This shows evidence that this gene is also a tail assembly chaperone. NCBI BLAST has 97.8261% coverage, an e-value of 6.79913e-33, and an identity of 28.7805% to a tail assembly chaperone. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Moore, Joshua /note=Secondary Annotator QC: All the appropriate boxes appear to be checked, but there are no PECAAN notes. This might have to do with the edits made to this gene as a translational frameshift. CDS 10652 - 13690 /gene="15" /product="gp15" /function="tape measure protein" /locus tag="Soondubu_15" /note=Original Glimmer call @bp 10652 has strength 12.94; Genemark calls start at 10652 /note=SSC: 10652-13690 CP: yes SCS: both ST: NA BLAST-Start: [tail length tape measure protein [Arthrobacter phage Adaia] ],,NCBI, q71:s54 44.8617% 1.48331E-159 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.326, -2.0720764396375664, yes F: tape measure protein SIF-BLAST: ,,[tail length tape measure protein [Arthrobacter phage Adaia] ],,YP_010649368,51.2539,1.48331E-159 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,86.4624,99.7 SIF-Syn: The tape measure protein of Soondubu exhibits synteny with the tape measure protein in VResidence (AZ) and KeAlii (AZ). The flanking genes are similar and all 3 genomes exhibit an overlapping tail assembly chaperone directly before the tape measure protein. 
/note=Gene (stop@13690 F) /note=PECAAN Notes /note=Primary Annotator Name: Moore, Joshua /note=Auto-annotation: Both GeneMark and Glimmer call the start at 10652. /note=Coding Potential: The ORF has significant coding potential for this gene only on the forward strand. Potential is found in GeneMark Host and GeneMark Self. /note=SD (Final) Score: -2.072. This is the best final score on PECAAN. The z-score is also the highest at 3.326. The start codon is GTG. /note=Gap/overlap: The gap is only 12bp. This is small enough to not be of note. /note=Phamerator: Pham 111337. 10/04/23. This gene is an orpham. /note=Starterator: This gene is an orpham. /note=Location call: Based on the above evidence, this gene is a real gene, and has a start site at 10652. /note=Function call: Tape measure protein. Many PhagesDB BLAST hits agree with the suggested function of a tape measure protein with e-values clustered between e-144 and e-83. There are many NCBI BLAST hits that also have a function of tape measure protein (identity 40%+, coverage 40%+, e-value < e-150). CDD had no relevant hits. HHPred had a hit that agreed with tape measure protein with 99.7% probability, 86.4624% coverage, and an e-value of 4.5e-8. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Chang, Amanda /note=Secondary Annotator QC: I agree with this annotation. The evidence supports the location and function call sufficiently. I would add a brief explanation for the 12bp gap as it falls out of the preferred gap/overlap threshold of +4/-1 bp. Be sure to add in what the function is in the synteny box as well. 
CDS 13702 - 14559 /gene="16" /product="gp16" /function="minor tail protein" /locus tag="Soondubu_16" /note=Original Glimmer call @bp 13702 has strength 12.93; Genemark calls start at 13702 /note=SSC: 13702-14559 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 99.2982% 8.19306E-99 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.315, -1.953940808934884, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage VroomVroom]],,WIC90166,68.8811,8.19306E-99 SIF-HHPRED: HYPOTHETICAL PROTEIN 19.1; VIRAL PROTEIN, DISTAL TAIL PROTEIN; 2.95A {BACILLUS PHAGE SPP1},,,2X8K_A,98.9474,100.0 SIF-Syn: Minor tail protein; upstream gene is tape measure protein and downstream gene is minor tail protein, just like in phages Emotion and VroomVroom. This is conserved in most phages. /note=Primary Annotator Name: Shao, Sarah /note=Auto-annotation: Glimmer and GeneMark, agreed start site (13702), ATG /note=Coding Potential: Reasonable coding potential predicted, start site covers all coding potential in GeneMark Self and Host. /note=SD (Final) Score: -1.954. It is the best final score on PECAAN. /note=Gap/overlap: 11bp gap (upstream). Reasonable gap, no coding potential shown in the gap (GeneMark), no other potential start codons that would minimize the gap. /note=Phamerator: pham 116299. Date 10/4/23. It is conserved, found in many other phages including Phives (AZ), Yang (AZ). 104 phages are members of the pham; almost all that list a function have the function of minor tail protein and have similar gene length (850-900bp). /note=Starterator: Start site 10 in Starterator was the most annotated of the candidate start sites called for Soondubu. Start site 10 was found in 14/104 genes in pham and manually annotated in 3/80 genes in this pham. Start 10 is 13702 in Soondubu. This agrees with the start site predicted by Glimmer and GeneMark. Date 10/4/23. 
/note=Location call: Based on the above evidence, this is a real gene with a likely start site at 13702. /note=Function call: Minor tail protein. The top non-draft phagesdb BLAST hits all have the function of minor tail protein [(Emotion, 4e-90), (VroomVroom, 2e-86)]. The 2 top NCBI BLAST hits have the function of minor tail protein [(VroomVroom, 99% coverage, 54.93% identity, e-value 8e-99), (Emotion, 100% coverage, 55.75% identity, e-value 1e-98)]. There were no hits from CDD. HHpred showed several relevant hits, most of which had the function of distal tail protein (99.97% probability, 98% coverage, e-value of 2.9e-28). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Mathkour, Yusef /note=Secondary Annotator QC: I agree with this annotation. Add the downstream gene in the synteny box. CDS 14575 - 15576 /gene="17" /product="gp17" /function="minor tail protein" /locus tag="Soondubu_17" /note=Original Glimmer call @bp 14575 has strength 14.53; Genemark calls start at 14575 /note=SSC: 14575-15576 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Emotion]],,NCBI, q1:s1 99.6997% 2.45468E-108 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.823, -2.9625409806968013, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Emotion]],,WGH21366,66.4615,2.45468E-108 SIF-HHPRED: Receptor Binding Protein; beta sandwich domain, phage receptor binding protein, Lactococcus lactis pellicle cell wall polyphosphosaccharide, VIRAL PROTEIN; 1.75A {Lactococcus phage 1358},,,4L9B_A,49.5495,99.0 SIF-Syn: minor tail protein; upstream is a minor tail protein and downstream is a minor tail protein, just like in VroomVroom (AZ). /note=Primary Annotator Name: Chang, Amanda /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 14575. The start codon is an ATG start. 
/note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found at high intensity on GeneMark Host and Self. /note=SD (Final) Score: -2.963. It is the best final score on PECAAN. /note=Gap/overlap: 15bp. This gap is bigger than the recommended gap size (+4bp) but is conserved in Emotion and VroomVroom (AZ4). There is also no coding potential in this gap which would indicate a gene being added. /note=Phamerator: pham: 116990 as of 10/4/23. It is conserved; found in Emotion and VroomVroom. There are 6 members in this pham, 3 of which are drafts. /note=Starterator: Start site 1 in Starterator was manually annotated in 3/3 non-draft genes in this pham. Start site 1 is at 14575 in Soondubu. This evidence agrees with the suggested start sites from Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is most likely a real gene and most likely starts at 14575. /note=Function call: Minor tail protein. The top 3 non-draft Phagesdb BLAST hits have the function of minor tail protein with e-values all smaller than 10^-50. There were no HHpred hits for minor tail protein functions. However, there are HHpred hits that are related to the functions of phage tails such as “Receptor-binding protein of phage tail.” SEA-PHAGES also states that function calls can be based on synteny, specifically with structural genes being seen in a certain order, which for this case would be a minor tail protein following a tape measure protein. This sequence is seen in Soondubu for this gene, which is additional evidence of this gene being a minor tail protein. The top 2 NCBI BLAST hits call a minor tail protein function for this gene and both have e-values under 10^-100, coverage 99%+, and identity 50%+. CDD had no hits. The SEA-PHAGES official function list states that if the protein is glycine-rich, in addition to syntenic evidence, it could be evidence to call the gene a minor tail protein. 
/note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Shao, Sarah /note=Secondary Annotator QC: /note=1. For phamerator, how many other phages in pham? /note=2. Minor fix, try to make sure only 2-3 pieces from evidence are selected for each tool. CDS 15573 - 16658 /gene="18" /product="gp18" /function="minor tail protein" /locus tag="Soondubu_18" /note=Original Glimmer call @bp 15573 has strength 15.65; Genemark calls start at 15573 /note=SSC: 15573-16658 CP: yes SCS: both ST: SS BLAST-Start: [Tail protein [Brevibacterium phage Rousseau]],,NCBI, q3:s4 98.0609% 1.16444E-96 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.999, -3.5088110057482766, yes F: minor tail protein SIF-BLAST: ,,[Tail protein [Brevibacterium phage Rousseau]],,CAH1193710,59.2493,1.16444E-96 SIF-HHPRED: Prophage MuSo2, 43 kDa tail protein; MuSo2, Shewanella oneidensis MR-1, Structural Genomics, PSI-2, Protein Structure Initiative, Midwest Center for Structural Genomics, MCSG; HET: MSE; 2.1A {Shewanella oneidensis} SCOP: l.1.1.1, b.106.1.1,,,3CDD_E,95.0138,99.8 SIF-Syn: Minor tail protein. Upstream is a minor tail protein and downstream is a minor tail protein, just like in VroomVroom (AZ) and Emotion (AZ). /note=Primary Annotator Name: Gallagher, Hannah /note=Auto-annotation: Genemark and Glimmer. Both call the start site at 15573. The start codon is GTG which has a high frequency in start codons. /note=Coding Potential: Coding potential is in the forward ORF strand only, indicating that this is a forward gene. GeneMark Self and Host found coding potential. Start site 15573 covers the entire coding potential. /note=SD (Final) Score: The Final score is -3.509, the highest of all start sites. This start site also has the highest Z-score of 2.999 which suggests the presence of a credible ribosome binding site. /note=Gap/overlap: -4 overlap with the upstream gene, indicating the gene is in an operon. 
This is the longest gene length of the start site candidates at 1086 bp, which is greater than the minimum bp to be a gene. /note=Phamerator: Pham 114910. Date 10/04/2023. It is conserved and found in other phages (Aoka, BirthdayBoy, Candle, and Eraser, etc.) that belong to the AZ, AE, DV, FO, DG, R, or AC clusters. /note=Starterator: Start 9 was manually annotated in 11/77 non-draft genes in Pham 114910 and called 100% of the time when present. Start 9 is also present in cluster AC and cluster R. Start 9 is 15573 in Soondubu. The most manually annotated start, 14, is called in 32/77 non-draft genes, but Soondubu does not have this start site. /note=Location call: Based on the above evidence, this is a real gene, and the likely start site is 15573. /note=Function call: Minor tail protein. The top non-draft Phagesdb BLAST hits have a function of a minor tail protein ([VroomVroom (AZ4), 8e-88], [LuckyBarnes (unknown), 1e-80], [MaGuCo (AZ2), 4e-73], [Liebe (AZ2), 6e-72], and [Maureen (AZ2), 6e-72]). 5 of the top NCBI BLAST hits that had a function were minor tail protein with an e-value <1.31e-82, coverage>98%, and identity>39%. HHPred has no hits related to collagen-like proteins, but more than three hits are related to tail proteins with a probability>99.7, coverage>95%, and e-value<2.5e-14. CDD has no hits for this sequence. The sequence is glycine-rich and in the syntenic region of minor tail proteins, allowing it to correspond to the SEA-PHAGES approved function list entry for a minor tail protein. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. It is likely an outside protein. /note=Secondary Annotator Name: Moore, Joshua /note=Secondary Annotator QC: When referencing other phages, list their cluster right after their name (e.g. BirthdayBoy (DV)). On Starterator, when referencing other clusters (AC, R), explain why they are significant. 
For Phagesdb and NCBI BLAST, you should not check more than 3 boxes of evidence from each tool. CDS 16663 - 20409 /gene="19" /product="gp19" /function="minor tail protein" /locus tag="Soondubu_19" /note=Original Glimmer call @bp 16663 has strength 16.05; Genemark calls start at 16663 /note=SSC: 16663-20409 CP: yes SCS: both ST: NA BLAST-Start: [minor tail protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 24.6795% 2.20015E-43 GAP: 4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.874, -3.5113798064032244, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage VroomVroom]],,WIC90169,16.2512,2.20015E-43 SIF-HHPRED: Probable central straight fiber; Bacteriophage, Siphophage, T5, baseplate, VIRAL PROTEIN; 3.88A {Escherichia phage T5},,,7ZQB_i,14.4231,97.6 SIF-Syn: While this gene is an orpham, it is syntenic with genes 20 in Amyev and Asa16, which are minor tail proteins. /note=Primary Annotator Name: Bonthala, Praneel /note=Auto-annotation: Glimmer and GeneMark call the start site at 16663. /note=Coding Potential: Coding potential in the ORF is in the forward strand only, indicating that this is a forward gene. The coding potential covers the entire strand. /note=SD (Final) Score: -3.511 for start 16663. This is the highest Final Score on PECAAN. /note=Gap/overlap: The gap is 4 for start 16663. This is the smallest gap on PECAAN. /note=Phamerator: Pham: 109857. The pham is an orpham. /note=Starterator: There is no starterator report for this gene, indicating that it may be an orpham. It also shares no synteny with other phages in PECAAN. /note=Location call: Based on the above evidence, this is most likely a real gene with a start site at 16663. /note=Function call: Minor tail protein. PhagesDB blast has 2 significant hits of a minor tail protein with e-value < 1e-65. There are no significant hits with NCBI blast or HHPred, as all hits have a coverage <15% despite low e-values. 
However, because the gene is syntenic with minor tail proteins in other phages such as Amyev and Asa16, there is evidence it is a minor tail protein. /note=Secondary annotator: Yao, Jiayu /note=Secondary annotator QC: I agree with this annotation. CDS 20423 - 20572 /gene="20" /product="gp20" /function="hypothetical protein" /locus tag="Soondubu_20" /note=Original Glimmer call @bp 20423 has strength 22.75; Genemark calls start at 20423 /note=SSC: 20423-20572 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Salipaludibacillus agaradhaerens] ],,NCBI, q1:s1 89.7959% 0.00865075 GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.227, -2.484606983760709, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Salipaludibacillus agaradhaerens] ],,WP_257821332,69.5652,0.00865075 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Faulkner, Cheyenne /note=Auto-annotation: Glimmer and Genemark are both used to call the start at 20423. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host, but the chosen start site is not fully included in the GeneMark Self and Host, it is very close. /note=SD (Final) Score: -2.485. This is the best score found. /note=Gap/overlap: 13. This is a small gap, and due to the lack of synteny Soondubu has with other AZ phages in this region of the genome, it is unlikely there is a gene upstream of this gene. /note=Phamerator: pham:112301 as of 10/4/2023. It is an orpham /note=Starterator: No starterator profile because this gene is an orpham /note=Location call: Based on the data above /note=Function call: NKF. The top 3 phagesdb BLAST hits have the function of Unknown (E-Values=0.23) and only one of the hits calls a function of Tape measure protein (E-value=2.6) None of these E-values are compelling evidence. 
3/3 of the NCBI BLAST hits also have an unknown function (Coverage 77%+, 39%+ identity, & E-Values<0.018). These scores do not provide compelling evidence of a function. HHpred and CDD did not have any hits. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs; thus, this is not a membrane protein. /note=Secondary Annotator Name: Mathkour, Yusef /note=Secondary Annotator QC: I agree with this annotation. No changes needed at this time. CDS 20650 - 21585 /gene="21" /product="gp21" /function="endolysin" /locus tag="Soondubu_21" /note=Original Glimmer call @bp 20650 has strength 13.58; Genemark calls start at 20743 /note=SSC: 20650-21585 CP: yes SCS: both-gl ST: NI BLAST-Start: [N-acetylmuramoyl-L-alanine amidase [Zhihengliuella halotolerans]],,NCBI, q1:s1 99.3569% 6.26071E-128 GAP: 77 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.823, -3.4897410997597818, yes F: endolysin SIF-BLAST: ,,[N-acetylmuramoyl-L-alanine amidase [Zhihengliuella halotolerans]],,WP_102157715,71.246,6.26071E-128 SIF-HHPRED: N-acetylmuramoyl-L-alanine amidase amiD; ZINC AMIDASE, PGRP, Peptidoglycan Recognizing Protein, AmpD, N-ACETYLMURAMYL-L-ALANINE AMIDASE, Cell wall biogenesis/degradation, Hydrolase, Lipoprotein, Membrane, Metal-binding; HET: GOL, AH0; 1.75A {Escherichia coli},,,3D2Y_A,93.5691,99.9 SIF-Syn: Upstream is NKF and downstream is NKF. No synteny with other phages. /note=Primary Annotator Name: Wong, Michael /note=Auto-annotation: Glimmer (20650) and GeneMark (20743) do not agree on the same start site; ATG start codon called for 20650 /note=Coding Potential: Reasonable coding potential predicted within putative ORF (weak coding potential before 20900). Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: SD score is the best (i.e. -3.490) as it is the least negative out of all other options. 
/note=Gap/overlap: The gap with the upstream gene is 77 bp (the smallest gap compared to other start sites). There are no alternative start candidates that are reasonable. The length of the gene is acceptable given the auto-annotated start (chosen start site as well). /note=Phamerator: On 10/4/23, gene was found in pham #115854. The pham in which the gene is conserved is in other members of the cluster/subcluster to which the phage belongs; CallinAllBarbz and BaileyBlu were used for comparison. Phamerator called for the function of the gene to be an endolysin. Functions called were consistent and found in the approved function list. /note=Starterator: No currently available starterator! /note=Location call: The gathered evidence (z-score is the best (highest), codon is common (ATG), final score is the best (least negative)) suggests that the original start site of 20650 is the best possible start site. The gene is a real gene (conserved in phamerator and good coding potential) and the potential start site of 20650 is the most likely potential start site candidate. The potential start site candidate of 20650 seems the most likely as it covers all coding potentials, although it is called by Glimmer only. /note=Function call: Predicted function is endolysin, N-acetylmuramoyl-L-alanine amidase, based on hits from PhagesDB BLASTp and NCBI BLAST. For PhagesDB, there was a low e-value (1e-35). For NCBI BLAST, there was high query coverage (>99%), decent identity (>63%), and a low e-value (6.26e-128). Within HHPred’s best hit, there was strong evidence that matched the N-acetylmuramoyl-L-alanine amidase function (99.9 probability, ~94% coverage, 1.6e-20). There are no specific requirements listed on the approved functions list. CDD had no hits. /note=Transmembrane domains: There is an absence of TMDs as predicted by DeepTMHMM. /note=Secondary Annotator Name: Faulkner, Cheyenne /note=Secondary Annotator QC: I agree with this annotation. No changes needed at this time. 
CDS 21582 - 21845 /gene="22" /product="gp22" /function="hypothetical protein" /locus tag="Soondubu_22" /note=Original Glimmer call @bp 21582 has strength 12.88; Genemark calls start at 21582 /note=SSC: 21582-21845 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Pseudarthrobacter siccitolerans]],,NCBI, q14:s19 85.0575% 5.22541E-26 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.668, -5.412417138717724, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Pseudarthrobacter siccitolerans]],,WP_050053703,66.3043,5.22541E-26 SIF-HHPRED: SIF-Syn: The gene upstream is endolysin, the gene downstream is NKF. /note=Primary Annotator Name: Yao, Jiayu /note=Auto-annotation: Glimmer, GeneMark (host) and GeneMark (self) all agreed on the start site 21582. The gene displays synteny with other genes. /note=Coding Potential: coding potential is covered within the start site, and no reverse coding potential is shown /note=SD (Final) Score: -5.412, the second least negative score among all the start sites /note=Gap/overlap: -4, relatively small overlap and gap in between and much smaller gap than other start sites. /note=Phamerator: 10/4/2023, it was found in pham 117690. There were 57 members in this pham, and 16 of them were drafts. Phages Adolin and DrManhattan from the AZ cluster also have the same pham. /note=Starterator: 10/9/23, the most often called start site number was 13, it was called in 10 of the 40 non-draft genes in the pham. It does not have auto-annotated start. Since start number 13 does not exist in this gene, Starterator fails to provide useful information. /note=Location call: Based on the prediction of GeneMark and Glimmer, together with other evidence like Z-value (1.668), final score (-5.412) and gap value, the gene should be real and the start site is 21582. 
/note=Function call: Function unknown (NKF). The closest PhagesDB BLAST hits, VroomVroom_22 and DrManhattan_21, both have no known function; their scores (99) are the closest to that of Soondubu_22 (171), and the e-values are close to 0. NCBI BLAST shows that the hit with the highest coverage and identity is a hypothetical protein. HHpred gave no useful results. No CDD hits were found. This gene should be called as no known function. /note=Transmembrane domains: No transmembrane domains are predicted. /note=Secondary Annotator Name: Shao, Sarah /note=Secondary Annotator QC: /note=1. Be careful writing notes for final score, there is one more negative score. /note=2. Gap/overlap - is overlap conserved? /note=3. Starterator - now has a starterator report, add this section. /note=4. Function call - for phagesdb no need to mention scores, just the e-value CDS 21859 - 22137 /gene="23" /product="gp23" /function="membrane protein" /locus tag="Soondubu_23" /note=Original Glimmer call @bp 21859 has strength 17.28; Genemark calls start at 21859 /note=SSC: 21859-22137 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Emotion]],,NCBI, q1:s1 89.1304% 4.19667E-26 GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.326, -2.2821867859826788, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Emotion]],,WGH21372,75.0,4.19667E-26 SIF-HHPRED: SIF-Syn: The upstream gene is NKF. The downstream gene is deoxynucleoside monophosphate kinase. /note=Primary Annotator Name: Mathkour, Yusef /note=Auto-annotation: Glimmer and GeneMark agree with one another: both call the start at 21859. The start codon is ATG, so there is a higher probability that this codon is used. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The chosen start site covers all coding potential.
/note=SD (Final) Score: -2.282 is the best final score on PECAAN. The z-score is 3.326, which is the best z-score on PECAAN. /note=Gap/overlap: 13 bp. The gap is reasonable because it is conserved in other phages (Phives) and there is no coding potential in the gap that might indicate a new gene. This gene is also an orpham. /note=Phamerator: pham 118590, date 10/9/23. It is conserved and found in the AZ1 cluster phage Phives (AZ1) and the AZ4 cluster phage Emotion. The assigned function for both genes is NKF. /note=Starterator: No starterator found /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 21859. /note=Function call: NKF; there is no evidence of a function in comparable phages. On HHpred, the most similar hit has a probability of 53.5%, which is below the 80% threshold, indicating that it is not closely related in function. Its coverage is 61.9565%, which is above the required 35% threshold, but its e-value is 140, far higher than the required value of below 1*10^-6, showing no correlation to a known function. On PhagesDB BLAST, the hit in phage Emotion has no known function, an e-value of 4e-24 and a score of 108, suggesting similarity between the genes. However, DeepTMHMM predicts 2 transmembrane domains; therefore, according to the SEA-PHAGES function call list, it is a membrane protein. /note=Transmembrane domains: DeepTMHMM predicts 2 TMDs, both 19 amino acids long. Because this protein has no other known function, it is called a membrane protein. /note=Secondary Annotator Name: Chang, Amanda /note=Secondary Annotator QC: For the phamerator notes, it is stated that the gene is conserved in other phages within the AZ cluster however according to pham maps, this is not true as the gene is an orpham as indicated by the starterator line. I would fix the wording of that because it's a bit confusing/contradicting.
I would include the evidence from PhagesDB Blastp and NCBI Blast as well. It's a bit weird that PhagesDB and NCBI contradict, but for both, the e-values are quite good. I think this gene could be a membrane protein according to the evidence provided by NCBI and the fact that you received two TMDs from DeepTMHMM, which is a criterion listed on the SEA-PHAGES official function list. There are also grammatical/spelling errors. CDS 22265 - 22837 /gene="24" /product="gp24" /function="deoxynucleoside monophosphate kinase" /locus tag="Soondubu_24" /note=Original Glimmer call @bp 22265 has strength 20.19; Genemark calls start at 22265 /note=SSC: 22265-22837 CP: yes SCS: both ST: SS BLAST-Start: [deoxynucleoside monophosphate kinase [Arthrobacter phage KeAlii] ],,NCBI, q1:s1 97.3684% 4.75686E-63 GAP: 127 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.919, -2.7653702713535186, yes F: deoxynucleoside monophosphate kinase SIF-BLAST: ,,[deoxynucleoside monophosphate kinase [Arthrobacter phage KeAlii] ],,YP_010678142,65.0246,4.75686E-63 SIF-HHPRED: DEOXYNUCLEOSIDE MONOPHOSPHATE KINASE; TRANSFERASE, PHOSPHOTRANSFERASE; HET: OCS, DGP; 2.0A {Enterobacteria phage T4} SCOP: c.37.1.1,,,1DEK_A,96.8421,99.8 SIF-Syn: /note=Gene (stop@22837 F) /note=PECAAN Notes /note=Primary Annotator Name: Moore, Joshua /note=Auto-annotation: Both GeneMark and Glimmer call the start at 22265. /note=Coding Potential: The ORF has significant coding potential for this gene only on the forward strand. Potential is found in GeneMark Host and GeneMark Self. /note=SD (Final) Score: -2.765. This is the best final score on PECAAN. The z-score is also the highest at 2.919. The start codon is ATG. /note=Gap/overlap: The gap is 127bp. This is generally conserved in KeAlii (AZ) and Nitro (AZ). /note=Phamerator: Pham 97531. 10/04/23. It is conserved, and found in KeAlii (AZ) and Nitro (AZ) /note=Starterator: Soondubu does not have the most annotated start in Starterator.
However, its start is called 98.3% of the time it is present, and agrees with GeneMark and Glimmer. It has 39/190 manual annotations. /note=Location call: Based on the above evidence, this gene is a real gene, and has a start site at 22265. /note=Function call: Deoxynucleoside monophosphate kinase. Many PhagesDB BLAST hits agree with the suggested function of deoxynucleoside monophosphate kinase, with e-values clustered between e-51 and e-44. There are many NCBI BLAST hits that also have a function of deoxynucleoside monophosphate kinase (identity 51%+, coverage 97%+, e-value < e-59). CDD had a relevant hit for deoxynucleoside monophosphate kinase (20.26% identity, 81% coverage, e-value 5.46e-17). HHPred had 2 hits that agreed with deoxynucleoside monophosphate kinase (99.8% probability, 96%+ coverage, e-value < e-15). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Yao, Jiayu /note=Secondary Annotator QC: I agree with this annotation. CDS 22948 - 23559 /gene="25" /product="gp25" /function="hypothetical protein" /locus tag="Soondubu_25" /note=Original Glimmer call @bp 22948 has strength 13.99; Genemark calls start at 22948 /note=SSC: 22948-23559 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Arthrobacter sp. EPSL27] ],,NCBI, q7:s10 91.133% 1.53006E-67 GAP: 110 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.192, -5.358093050197135, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Arthrobacter sp. EPSL27] ],,WP_066431321,69.7436,1.53006E-67 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shao, Sarah /note=Auto-annotation: Glimmer and GeneMark, agreed start site (22948), ATG /note=Coding Potential: Reasonable coding potential predicted, start site covers all coding potential in GeneMark Self and Host. /note=SD (Final) Score: -5.358. It is the best final score on PECAAN. /note=Gap/overlap: 110bp gap (upstream).
Somewhat large, but reasonable because gap is also shown in other phages (Nitro, Adumb2043) and there is no coding potential in this gap (GeneMark Host/Self). /note=Phamerator: pham 1819. Date 10/4/23. It is conserved, found in many other phages including Nitro (AZ), Adumb2043 (AZ), CallinAllBarbz (FP). 71 phages are members of the pham, almost none show a function and all have similar gene length (575-615bp). /note=Starterator: Start site 19 is the “Most Annotated” start, called in 35/52 non-draft genes in the pham (92.5% of time when present). Start 19 is 22948 in Soondubu. This evidence agrees with the start site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene with a likely start site at 22948. /note=Function call: Function unknown (NKF). The top non-draft phagesdb BLAST hits all have function unknown [(Nitro, 6e-56), (Adumb2043, 1e-55)]. The top NCBI BLAST hits also are hypothetical proteins [(EPSL27, coverage 91%, identity 60.54%, e-value 2e-67), (Adumb2043, coverage 94%, identity 55.21%, e-value 3e-67)]. There were no hits from CDD. No significant HHpred hits (all hits have large e-values >50). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Mathkour, Yusef /note=Secondary Annotator QC: I agree with this annotation. Add the synteny box.
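The Gap/overlap values in these notes can be cross-checked against the CDS coordinates. The sketch below is a minimal illustration, assuming the gap between two adjacent forward genes is the downstream start minus the upstream stop minus 1, with negative values meaning overlap. That formula is inferred from the coordinates in this file and is not taken from PECAAN documentation; the function name gap_bp is hypothetical.

```python
def gap_bp(upstream_stop: int, downstream_start: int) -> int:
    """Bases strictly between two adjacent forward genes (assumed formula).

    A negative result means the two genes overlap.
    """
    return downstream_start - upstream_stop - 1


# (gene number, start, stop) copied from the CDS lines for forward genes 23 to 30.
cds = [
    (23, 21859, 22137),
    (24, 22265, 22837),
    (25, 22948, 23559),
    (26, 23790, 24626),
    (27, 24626, 24961),
    (28, 24961, 25359),
    (29, 25556, 26260),
    (30, 26260, 26451),
]

# Gap of each gene relative to the gene immediately upstream of it.
gaps = {cur[0]: gap_bp(prev[2], cur[1]) for prev, cur in zip(cds, cds[1:])}

for gene, gap in gaps.items():
    print(f"gp{gene}: {gap} bp")
```

Under this assumption the computed values agree with the GAP fields recorded for genes 24 to 30, which makes it a quick way to spot a coordinate that was mistyped in a note.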
CDS 23790 - 24626 /gene="26" /product="gp26" /function="exonuclease" /locus tag="Soondubu_26" /note=Original Glimmer call @bp 23790 has strength 15.05; Genemark calls start at 23790 /note=SSC: 23790-24626 CP: yes SCS: both ST: SS BLAST-Start: [exonuclease [Arthrobacter phage Phives] ],,NCBI, q1:s1 98.5611% 2.55725E-174 GAP: 230 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.315, -2.305049668942183, yes F: exonuclease SIF-BLAST: ,,[exonuclease [Arthrobacter phage Phives] ],,YP_010677662,93.1408,2.55725E-174 SIF-HHPRED: Mitochondrial genome maintenance exonuclease 1; human MGME1, DNA complex, DNA exonuclease, DNA BINDING PROTEIN; 2.702A {Homo sapiens},,,5ZYT_B,83.0935,99.8 SIF-Syn: Exonuclease; upstream is pham 1819 and downstream is LAGLIDADG endonuclease like in Phives (AZ) [note the LAGLIDADG is not directly downstream but similar in distance]. /note=Primary Annotator Name: Chang, Amanda /note=Auto-annotation: Glimmer and GeneMark. Both agree on the start at 23790. The start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found at high intensity on GeneMark Host and Self. /note=SD (Final) Score: -2.305. This is the best final score on PECAAN. The RBS score is good evident of the z-score being greater than 2. /note=Gap/overlap: There is a gap of 230bp. This gap is quite large but it is conserved in other phages in this cluster such as Phives (AZ) and Ascela (AZ) and there is very little coding potential in this gap. /note=Phamerator: pham: 106209 as of 10/4/23. It is conserved in Phives (AZ) and Ascela (AZ). /note=Starterator: Start site 38 is the most manually annotated start with 53 of 125 non-draft genes in this pham. Start site 38 is at 23790 in Soondubu. This start site agrees with the start sites from GeneMark and Glimmer. /note=Location call: Based on the evidence above, this is most likely a real gene and most likely starts at 23790. 
/note=Function call: Exonuclease. 3 of the top 5 non-draft Phagesdb Blast hits call the function to be exonuclease with the other two being specified exonucleases (Cas4 family exonuclease and RecB-like exonuclease). Each have e-values smaller than 10^-130. The top 2 HHpred hits call the genes function as an exonuclease specifically for mitochondrial genome maintenance. These HHpred hits have high probability (+99%), high coverage (+80%), and low e-values (<10^-15). 3 out of the top 5 NCBI Blast hits call the function of this gene to be an exonuclease. These NCBI Blast hits had low e-values (<10^-150), high coverage (+98%), and high identity (+80%). There were no CDD hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Gallagher, Hannah /note=Secondary Annotator QC: Talk about how the final score relates to ribosome binding. Everything else is good, I agree with the primary annotator! CDS 24626 - 24961 /gene="27" /product="gp27" /function="hypothetical protein" /locus tag="Soondubu_27" /note=Original Glimmer call @bp 24626 has strength 14.07; Genemark calls start at 24626 /note=SSC: 24626-24961 CP: yes SCS: both ST: NI BLAST-Start: [secreted protein [Mycobacterium phage Theia] ],,NCBI, q1:s1 98.1982% 2.96995E-19 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.752, -4.015954512931413, yes F: hypothetical protein SIF-BLAST: ,,[secreted protein [Mycobacterium phage Theia] ],,YP_009214314,60.5505,2.96995E-19 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gallagher, Hannah /note=Auto-annotation: Genemark and Glimmer. Both call the start site at 24626. The start codon is ATG which has a high frequency in start codons. /note=Coding Potential: Coding potential is in the forward ORF strand only, indicating that this is a forward gene. GeneMark Self and Host found coding potential. 
Start site 24626 covers the entire coding potential with a very minimal amount of the earliest coding potential cut-off. /note=SD (Final) Score: The Final Score is -4.016 the highest score of the three start sites. This start site has the highest Z-score of 2.752, which suggests the presence of a credible ribosome binding site. /note=Gap/overlap: -1 overlap which is suggestive of this gene belonging to an operon system. The start site 24626 has the smallest gap/overlap and changing the start site to 24452 would create a very large gap with no coding potential and a lower final score/z-score. This overlap is conserved in Dublin (A5). /note=Phamerator: Pham 116093 found on 10/5/2023. It is conserved and found in other phages like Dublin (A5), Cen1621 (EH), and Hirko (FL). Found in several other varying clusters. /note=Starterator: Start #45 is the most annotated with 45/305 non-draft genes in Pham 116093 while present in 127/305 non-draft genes. This gene does not have this start site. Start #56 has no manual annotations (0/21) but is called for in 21/305 non-draft genes. Its two other starts, 1 and 43 have no manual annotations and are not called by GeneMark and Glimmer. /note=Location call: Based on the above evidence, this is a real gene, and the likely start site is 24626. /note=Function call: NKF. All Phagesdb non-draft hits with an e-value91%, identity>43%, and alignment>60%. CDD returned no hits for this sequence. HHPred has no protein hits with a defined function that has a significant probability, coverage, or e-value. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. This gene does have a signal peptide. /note=Secondary Annotator Name: Chang, Amanda /note=Secondary Annotator QC: I believe you missed the NCBI hits that you selected. 
In addition, the hits you selected had functions whereas other genes from the NCBI blast that were unselected were hypothetical proteins and fulfilled the requirements to be viable for selection. This may be optional, but I would also add that starterator suggested 2 other potential starts however they also have 0 MAs and do not agree with Glimmer/GeneMark, unlike the one you called. CDS 24961 - 25359 /gene="28" /product="gp28" /function="LAGLIDADG endonuclease" /locus tag="Soondubu_28" /note=Original Glimmer call @bp 24961 has strength 11.51; Genemark calls start at 24961 /note=SSC: 24961-25359 CP: yes SCS: both ST: SS BLAST-Start: [LAGLIDADG endonuclease [Arthrobacter phage Emotion]],,NCBI, q2:s5 99.2424% 4.01823E-71 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.919, -3.6727816321281046, yes F: LAGLIDADG endonuclease SIF-BLAST: ,,[LAGLIDADG endonuclease [Arthrobacter phage Emotion]],,WGH21383,86.5672,4.01823E-71 SIF-HHPRED: d.95.2.0 (A:257-389) automated matches {Baker's yeast (Saccharomyces cerevisiae) [TaxId: 4932]} | CLASS: Alpha and beta proteins (a+b), FOLD: Homing endonuclease-like, SUPFAM: Homing endonucleases, FAM: automated matches,,,SCOP_d2ab5a1,78.7879,99.6 SIF-Syn: This gene has synteny with endonuclease genes in phages Adolin and Amyev. /note=Primary Annotator Name: Bonthala, Praneel /note=Auto-annotation: Glimmer and GeneMark call the start site at 24961. /note=Coding Potential: Coding potential in the ORF is in the forward strand only, indicating that this is a forward gene. The coding potential covers the entire strand. /note=SD (Final) Score: -3.673 for start 24961. This is the highest Final Score on PECAAN. /note=Gap/overlap: The gap is -1 for start 24961. This is the smallest gap on PECAAN, and the gene could be part of an operon. /note=Phamerator: Date: 10/9/2023. Pham: 116388. The pham is shared by 71 phages and many members of the AZ cluster. /note=Starterator: Date: 10/9/2023. Soondubu has a start site 22@24961 with 38 manual annotations.
It was the most commonly annotated start site, which is found in 38 of the 52 non-draft genomes in the pham. /note=Location call: Based on the above evidence, this is most likely a real gene with a start site at 24961. /note=Function call: LAGLIDADG endonuclease. The top 3 hits of Phagesdb Blast call a LAGLIDADG endonuclease with e-value < 1e-50. The top 3 hits of HHPred all call endonuclease proteins with coverage > 78% and e-value < 1e-12. The top 3 hits of NCBI Blast all call LAGLIDADG endonuclease proteins with identity > 79%, coverage > 99%, and e-values < 1e-69. CDD also calls a LAGLIDADG endonuclease protein, but with a high e-value and low identity and coverage. The gene is also syntenic with endonuclease genes in other AZ cluster phages such as Adolin and Amyev. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Shao, Sarah /note=Secondary Annotator QC: /note=1. Add phamerator and starterator dates CDS 25556 - 26260 /gene="29" /product="gp29" /function="recombination directionality factor" /locus tag="Soondubu_29" /note=Original Glimmer call @bp 25556 has strength 18.68; Genemark calls start at 25556 /note=SSC: 25556-26260 CP: yes SCS: both ST: SS BLAST-Start: [recombination directionality factor [Arthrobacter phage MaGuCo]],,NCBI, q1:s1 100.0% 1.81346E-126 GAP: 196 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.884, -2.979545903895001, yes F: recombination directionality factor SIF-BLAST: ,,[recombination directionality factor [Arthrobacter phage MaGuCo]],,WGH20322,82.8452,1.81346E-126 SIF-HHPRED: Gp3-like ; Recombination directionality factor-like,,,PF18897.3,87.1795,100.0 SIF-Syn: This gene is a recombination directionality factor. The gene upstream is LAGLIDADG endonuclease and the gene downstream is NKF. 
/note=Primary Annotator Name: Faulkner, Cheyenne /note=Auto-annotation: Glimmer and GeneMark were both used to call the start at 25556 /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host, but the chosen start site is not fully included in the GeneMark Self and Host coding potential, though it is very close. /note=SD (Final) Score: -2.980. This is the highest final (least negative) score on PECAAN. /note=Gap/overlap: 196. This is a large gap which could mean that there is a gene upstream of the gene or the start site called by Glimmer and GeneMark is incorrect. This gap is not conserved in Adolin, but is conserved in Amyev. /note=Phamerator: pham 848 as of 10/4/2023. It is conserved and found in Wildwest (AZ1) and Amyev (AZ). /note=Starterator: Start site 37 in starterator was manually annotated in 56/119 non-draft genomes. Start 37 is the most annotated start site, and Soondubu has this start site and calls it. Start 37 is also found in Adolin (AZ) and Amyev (AZ). This evidence agrees with the start site called by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the start site is most likely 25556. /note=Function call: Recombination directionality factor. 2/3 of the top phagesdb BLAST hits have the function of recombination directionality factor (E-values < 1*10^-102), and 2/6 of the top NCBI BLAST hits also have a function of recombination directionality factor (coverage 99%+, identity 75%+, and E-values < 1.4*10^-123). HHpred had a hit for recombination directionality factor with a probability of 100%, 87% coverage, and an E-value of 6.2*10^-32. CDD did not have any relevant hits. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs; thus, this is not a membrane protein. /note=Secondary Annotator Name: Shao, Sarah /note=Secondary Annotator QC: I agree with location and function call.
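The HHpred reasoning used in these function calls can be written as a small check. The sketch below is illustrative only: the cutoffs (probability of at least 80%, coverage of at least 35%, e-value below 1*10^-6) are the ones quoted in the gp23 function call note, and whether they are inclusive is an assumption. The function name hhpred_hit_is_informative is hypothetical and does not belong to any HHpred or PECAAN tool.

```python
# Cutoffs as quoted in the gp23 function call note; inclusivity is assumed.
MIN_PROBABILITY = 80.0  # percent
MIN_COVERAGE = 35.0     # percent
MAX_EVALUE = 1e-6       # a hit must be strictly below this


def hhpred_hit_is_informative(probability: float, coverage: float, evalue: float) -> bool:
    """Return True only if the hit passes all three quoted cutoffs."""
    return (
        probability >= MIN_PROBABILITY
        and coverage >= MIN_COVERAGE
        and evalue < MAX_EVALUE
    )


# (probability %, coverage %, e-value) as reported in the notes for each gene.
hits = {
    "gp23": (53.5, 61.9565, 140.0),   # best HHpred hit, called NKF on this evidence
    "gp29": (100.0, 87.0, 6.2e-32),   # recombination directionality factor hit
}

results = {gene: hhpred_hit_is_informative(*values) for gene, values in hits.items()}
print(results)
```

The gp23 hit passes the coverage cutoff but fails on probability and e-value, which matches the note's conclusion that HHpred gives no functional evidence for that gene, while the gp29 hit passes all three.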
CDS 26260 - 26451 /gene="30" /product="gp30" /function="hypothetical protein" /locus tag="Soondubu_30" /note=Original Glimmer call @bp 26260 has strength 14.24; Genemark calls start at 26260 /note=SSC: 26260-26451 CP: yes SCS: both ST: SS BLAST-Start: GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.656, -4.213475109731446, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Wong, Michael /note=Auto-annotation: Both Glimmer & Genemark that agree on same start site (26260); ATG start codon called /note=Coding Potential: Reasonable coding potential predicted within putative ORF. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: SD score is the best (i.e. -4.213) as it is the least negative out of all other options. /note=Gap/overlap: The overlap with the upstream gene is reasonable (1bp) - operon. There are no alternative start candidates that are reasonable. The length of the gene is acceptable given the auto annotated start (chosen start site as well). /note=Phamerator: On 10/4/23, gene was found in pham #118699. It is an orphan. For phamerator, VroomVroom was used for comparison and no known function could be called. /note=Starterator: No currently available starterator - orpham! /note=Location call: The gathered evidence (z-score is the best (highest), codon is common (ATG), final score is the best (least negative)) suggests that the original start site of 26260 is the best possible start site. The gene is a real gene (conserved in phamerator and good coding potential) and the potential start site of 26260 is the most likely potential start site candidate. The potential start site candidate of 26260 seems the most likely as it covers all coding potentials and is called by both Glimmer and Genemark. 
/note=Function call: Predicted function is NKF based on hits from PhagesDB, BLASTp, and NCBI BLAST. For PhagesDB, there was a relatively low e-value (3e-5). For NCBI BLAST, there were no hits. Within HHPred’s best Pfam hit, there was moderate evidence that matched the unknown function (72.3 probability, ~94% coverage). CDD had no hits. /note=Transmembrane domains: There are 2 TMDs predicted by DeepTMHMM meaning that the gene translates into a transmembrane protein. /note=Secondary Annotator Name: Shao, Sarah /note=Secondary Annotator QC: I agree with location and function call. CDS 26525 - 26788 /gene="31" /product="gp31" /function="NrdH-like glutaredoxin" /locus tag="Soondubu_31" /note=Original Glimmer call @bp 26525 has strength 17.14; Genemark calls start at 26525 /note=SSC: 26525-26788 CP: yes SCS: both ST: SS BLAST-Start: [thioredoxin domain [Arthrobacter phage BaileyBlu] ],,NCBI, q2:s4 90.8046% 5.45562E-26 GAP: 73 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.418, -6.945489798766537, no F: NrdH-like glutaredoxin SIF-BLAST: ,,[thioredoxin domain [Arthrobacter phage BaileyBlu] ],,YP_010677538,71.7647,5.45562E-26 SIF-HHPRED: GLUTAREDOXIN-LIKE PROTEIN NRDH; ELECTRON TRANSPORT, NRDH, THIOREDOXIN, GLUTAREDOXIN, REDOX PROTEIN; 1.7A {ESCHERICHIA COLI} SCOP: c.47.1.1,,,1H75_A,89.6552,99.2 SIF-Syn: The gene upstream is NrdH-like protein, the gene downstream is metallophosphatase. /note=Primary Annotator Name: Yao, Jiayu /note=Auto-annotation: Glimmer, GeneMark (host) and GeneMark (self) all agreed on the start site 26525. /note=Coding Potential: Host-Trained GeneMark coding potential is covered within the start site, and no reverse coding potential is shown /note=SD (Final) Score: -6.945, the least negative score among all the start sites /note=Gap/overlap: 73 bp, a relatively large gap but the smallest among all the options /note=Phamerator: 10/9/2023, it was found in pham 116399. There were 84 members in this pham, and 6 of them were drafts.
Phages Adolin and DrManhattan from the AZ cluster also have the same pham. /note=Starterator: 10/2/23, the most often called start site number was 63, it was called in 46 of the 83 non-draft genes in the pham. It does not have auto-annotated start. Start site number 63 should be the start site for this gene because it is the most often called site number. /note=Location call: Based on the prediction of GeneMark and Glimmer, together with other evidence like Z-value (1.418), final score (-6.945) and gap value, the gene should be real and the start site is 26525. /note=Function call: The function should be NrdH-like glutaredoxin, because CallinAllBarbz and BaileyBlue genes with the same function have the scores closest to Soondubu (98 and 77), and the e-values are close to 0. NCBI Blast shows that CallinAllBarbz and BaileyBlue genes with the same function have the highest scores (102 and 67.8) with high coverage (90% and 72%) and e-values close to 0. The CDD hit shows it belongs to the NrdH redoxin family. The HHpred shows that NrdH-like glutaredoxin has the highest scores with a probability close to 100% and e-value close to 0. Therefore, the function should be NrdH-like glutaredoxin. /note=Transmembrane domains: No transmembrane domains are predicted. /note=Secondary Annotator Name: Wong, Michael /note=Secondary Annotator QC: For auto-annotation, what are the other species that display synteny? For coding potential, which sources did you pull from? Everything else looks good - great job! 
CDS 26785 - 27357 /gene="32" /product="gp32" /function="metallophosphoesterase" /locus tag="Soondubu_32" /note=Original Glimmer call @bp 26785 has strength 13.57; Genemark calls start at 26785 /note=SSC: 26785-27357 CP: yes SCS: both ST: SS BLAST-Start: [phosphoesterase [Arthrobacter phage Adumb2043] ],,NCBI, q1:s1 100.0% 2.20866E-89 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.077, -2.583959800616441, yes F: metallophosphoesterase SIF-BLAST: ,,[phosphoesterase [Arthrobacter phage Adumb2043] ],,YP_010677943,80.203,2.20866E-89 SIF-HHPRED: MPP_AQ1575; Aquifex aeolicus AQ1575 and related proteins, metallophosphatase domain. This family includes bacterial and archeal proteins homologous to AQ1575, an uncharacterized Aquifex aeolicus protein.,,,cd07390,94.2105,99.8 SIF-Syn: The gene upstream is NrdH-like glutaredoxin, and the gene downstream is Holliday junction resolvase. /note=Primary Annotator Name: Mathkour, Yusef /note=Auto-annotation: Glimmer and GeneMark agree with one another: both call the start at 26785. The start codon is ATG, so there is a higher probability that this codon is used. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The chosen start site covers all coding potential. /note=SD (Final) Score: -2.584 is the best final score on PECAAN. The z-score is 3.077, which is the best z-score on PECAAN. /note=Gap/overlap: -4 bp. This indicates that the gene is part of an operon. /note=Phamerator: pham 85087, date 10/10/23. It is conserved and found in the FP cluster phages BaileyBlu (FP) and CallinAllBarbz (FP). The assigned function for both genes is phosphoesterase. /note=Starterator: Start site 56 was found in Starterator 42 of 51 ( 2.0% )of 139 ( 30.2% ) of genes in pham; there were 32 of 123 manual annotations of this start. Start site 56 was called 100.0% of time when present.
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 26785. /note=Function call: metallophosphatase; there is evidence of a metallophosphatase function in comparable phages. On HHpred, the most similar hit has a probability of 99.8%, which is above the 80% threshold, indicating that it is closely related in function. Its coverage is 94.2105%, which is above the required 35% threshold, and its e-value is 2.1e-19, well below the required 1*10^-6, showing a correlation to the known function. NCBI BLAST has 68.5279% identity and 100% coverage, with an e-value of 2.20866e-89, to a metallophosphoesterase, suggesting similar functions. Finally, CDD has 34.9462% identity, 91.5789% coverage and an e-value of 3.52634e-31, supporting the metallophosphoesterase function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Faulkner, Cheyenne /note=Secondary Annotator QC: I agree with this annotation. The primary annotator needs to make the following changes: /note=1. Mark phages as evidence on Phagesdb BLAST, HHPRED, NCBI Blast, and CDD /note=2. Under function call, make sure to mention the hits found NCBI Blast and CDD /note=3.
Make sure to fill out the synteny box CDS 27354 - 27797 /gene="33" /product="gp33" /function="Holliday junction resolvase" /locus tag="Soondubu_33" /note=Original Glimmer call @bp 27354 has strength 13.06; Genemark calls start at 27354 /note=SSC: 27354-27797 CP: yes SCS: both ST: SS BLAST-Start: [holliday junction resolvase [Arthrobacter phage KeAlii] ],,NCBI, q1:s1 100.0% 1.55211E-74 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.568, -3.627004739933808, yes F: Holliday junction resolvase SIF-BLAST: ,,[holliday junction resolvase [Arthrobacter phage KeAlii] ],,YP_010678153,85.034,1.55211E-74 SIF-HHPRED: Holliday junction resolvase; archeal holliday junction resolvase helicase DNA binding enzyme phage 15-6 thermus thermophilus, RECOMBINATION; HET: SO4, MSE; 2.5A {Thermus thermophilus phage 15-6},,,7BGS_B,74.1497,99.6 SIF-Syn: This gene has synteny with other Holliday junction resolvases in KeAlii (AZ) and Nitro (AZ), with their metallophosphatase genes coming just upstream of this gene and another gene with an identical pham coming just downstream. /note=Gene (stop@27797 F) /note=PECAAN Notes /note=Primary Annotator Name: Moore, Joshua /note=Auto-annotation: Both GeneMark and Glimmer call the start at 27354. /note=Coding Potential: The ORF has significant coding potential for this gene only on the forward strand. Potential is found in GeneMark Host and GeneMark Self. /note=SD (Final) Score: -3.627. This is the best final score on PECAAN. The z-score is also the highest at 2.568. The start codon is GTG. /note=Gap/overlap: The gap is -4. This indicates that this gene is likely part of an operon. /note=Phamerator: Pham 116215. 10/04/23. It is conserved, and found in KeAlii (AZ) and Nitro (AZ) /note=Starterator: Soondubu does not have the most annotated start in Starterator. However, its start is called 100% of the time it is present, and agrees with GeneMark and Glimmer. It has 37/129 manual annotations. This is the second most annotated start site. 
/note=Location call: Based on the above evidence, this gene is a real gene, and has a start site at 27354. /note=Function call: Holliday junction resolvase. Many PhagesDB BLAST hits agree with the suggested function of Holliday junction resolvase with e-values clustered between e-60 to e-43. There are many NCBI BLAST hits that also have a function of Holliday junction resolvase (identity 70%+, coverage 99.3%+, e-value < e-66). CDD had no relevant hits. HHPred had 2 hits that agreed with Holliday junction resolvase (with no reference to RusA or RuvC) (99.1%+ probability, 70.7%+ coverage, e-value < e-8). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Chang, Amanda /note=Secondary Annotator QC: I agree with this location and function call. [optional] Adding that the start chosen from starterator was the 2nd most annotated start site. Be more specific in synteny box (what pham/functions are the genes upstream/downstream) and add the function of the gene. CDS complement (27794 - 27961) /gene="34" /product="gp34" /function="hypothetical protein" /locus tag="Soondubu_34" /note=Original Glimmer call @bp 27961 has strength 1.91; Genemark calls start at 27961 /note=SSC: 27961-27794 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_KEALII_68 [Arthrobacter phage KeAlii]],,NCBI, q1:s3 96.3636% 1.5362E-6 GAP: 198 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.821, -3.0301037898918786, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KEALII_68 [Arthrobacter phage KeAlii]],,WEM34621,51.5152,1.5362E-6 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shao, Sarah /note=Auto-annotation: Glimmer and GeneMark, agreed start site (27961), ATG /note=Coding Potential: Limited coding potential, but there is one medium peak of coding potential in GeneMark Host. Reasonable coding potential predicted in GeneMark Self. Start site covers all coding potential in GeneMark Self and Host. 
/note=SD (Final) Score: -3.030. It is not the best final score on PECAAN, but is still a good final score. /note=Gap/overlap: 198bp gap. Somewhat large, but reasonable because gap is also shown in other phages (KeAlii, Phives) and there is no coding potential in this gap (GeneMark Host/Self). /note=Phamerator: pham 116533. Date 10/9/23. It is conserved, found in many other phages including KeAlii (AZ), Phives (AZ). 39 phages are members of the pham, almost none show a function and all have similar gene length (150-180bp). /note=Starterator: pham 116533. Split information, only two start sites have MA’s and both have 1 (start site 22 and 24). Start 22 (27961) is closer to most annotated start site, but another phage chose start 24 (27949). Starterator ultimately did not provide helpful information to support a specific start site. Date 10/9/23. /note=Location call: Two likely start sites, start sites produce two genes of very different lengths. Chosen start site produces gene length similar to other genes in the pham. Based on the above evidence, this is a real gene with a likely start site at 27961. /note=Function call: Function unknown (NKF). One non-draft phagesdb BLAST hit has a low e-value (KeAlii, 4e-7) and shows function unknown. Other non-draft phagesdb BLAST hits call the same function, but show slightly larger e-values [(Phives, 3e-6), (Tuck, 3e-6)]. The top NCBI BLAST hits also are hypothetical proteins [(KeAlii, coverage 96%, identity 42.42%, e-value 1.5e-6), (Pseudoarthrobacter siccitolerans, coverage 96%, identity 50.88%, e-value 2.33e-6)]. There were no hits from CDD. No significant HHpred hits (all hits have large e-values >10, and low probability <70%). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Bonthala, Praneel /note=Secondary Annotator QC: I agree with the chosen start site and designation of function as NKF.
However, the chosen start site does not have the best final score - just a note. CDS 28160 - 30655 /gene="35" /product="gp35" /function="DNA primase/helicase" /locus tag="Soondubu_35" /note=Original Glimmer call @bp 28160 has strength 16.5; Genemark calls start at 28160 /note=SSC: 28160-30655 CP: yes SCS: both ST: SS BLAST-Start: [phage/plasmid primase, P4 family [Arthrobacter sp. EPSL27] ],,NCBI, q1:s1 100.0% 0.0 GAP: 198 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.919, -2.8454123590742793, yes F: DNA primase/helicase SIF-BLAST: ,,[phage/plasmid primase, P4 family [Arthrobacter sp. EPSL27] ],,WP_066431284,90.8323,0.0 SIF-HHPRED: Primase D5; DNA helicase, D5_N domain, DUF5906 domain, Pox_D5 domain, SF3 helicase, VIRAL PROTEIN;{Vaccinia virus Copenhagen},,,8APM_C,50.5415,100.0 SIF-Syn: DNA primase/helicase; upstream is pham 116533 and downstream is DNA polymerase I just like in Phives (AZ) [note: DNA pol is not directly after DNA primase/helicase in Phives but rather two genes downstream]. /note=Primary Annotator Name: Chang, Amanda /note=Auto-annotation: Glimmer and GeneMark. Both call the start site at 28160. The start site is a TTG, which is typically seen in a smaller frequency compared to ATG and GTG however this start is conserved and the final score is the best available. /note=Coding Potential: Coding potential in this ORF is only on the forward strand, indicating this is a forward gene. Coding potential is found at high intensity on GeneMark Host and Self. /note=SD (Final) Score: -2.845. This is the best final score (least negative) on PECAAN. /note=Gap/overlap: +198. This gap is quite large however there is no coding potential as seen on GeneMark Host and Self. This gap is also conserved in other phages of the same cluster (AZ) such as VResidence and DrManhattan. /note=Phamerator: pham: 85082 as of 10/8/23. It is conserved in DrManhattan (AZ) and YesChef (AZ). 
/note=Starterator: Start site 49 is the most manually annotated start with 57 out of 117 non-draft genes in the pham. Start site 49 is at 28160 in Soondubu. This agrees with the start site suggested by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is most likely a real gene and most likely starts at 28160. /note=Function call: DNA primase/helicase. The top 4 PhagesDB hits of non-draft genes call the function of this gene to be DNA primase/helicase and each have e-values of 0. The top 3 HHpred hits call the function of this gene to be DNA helicase and primase each with high coverage (+35%), high probability (+99%), and low e-values (<10^-25). The top 3 NCBI Blast hits call the gene’s function to be DNA primases and helicases with high probability (+80%), high coverage (100%), and low e-values (0). There are 2 CDD hits that call this function with high coverage (+35%) and low e-values (0). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Gallagher, Hannah /note=Secondary Annotator QC: Possibly could discuss how TTG is a lower probability. Otherwise, I agree with the primary annotation! CDS 30664 - 30771 /gene="36" /product="gp36" /function="hypothetical protein" /locus tag="Soondubu_36" /note= /note=SSC: 30664-30771 CP: yes SCS: neither ST: NA BLAST-Start: [hypothetical protein SEA_EMOTION_44 [Arthrobacter phage Emotion]],,NCBI, q6:s9 85.7143% 2.26528E-8 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.159, -4.4057176022952405, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_EMOTION_44 [Arthrobacter phage Emotion]],,WGH21393,65.0,2.26528E-8 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gallagher, Hannah /note=Auto-annotation: Neither Genemark nor Glimmer are available for this gene as it was added through manual annotation. 
/note=Coding Potential: Coding potential, while small, is in the forward ORF strand only, indicating that this is a forward gene. GeneMark Self and Host found coding potential. Start site 30664 covers the entire coding potential and is the only start site available. /note=SD (Final) Score: The Final Score is -4.406 and it is the only final score since this is the only start site. This start site has a Z-score of 2.159 which suggests the presence of a credible ribosome binding site. /note=Gap/overlap: 8 bp gap which is highly probable and allows the gene to be 108 bp which is smaller than 120 bp. This gene is conserved in some other phages like Emotion (AZ4) and CallinAllBarbz (FP). The gap is not very conserved in other phages as there is usually an overlap between this gene and the upstream gene. /note=Phamerator: Phamerator is not available as of 10/11/2023 since this gene was added to the phage genome. It could potentially belong to Pham 89866 or 90454 which some FP and AZ phages belong to. /note=Starterator: Starterator is not available as of 10/11/2023 since this gene was added to the phage genome. /note=Location call: Based on the above evidence, this is a real gene, and the likely start site is 30664. /note=Function call: NKF. Phagesdb BLAST's top hits are Emotion (AZ4) with an e-value of 1e-9 and CallinAllBarbz (FP) with an e-value of 7e-8. NCBI Blast had two significant hits that were hypothetical proteins for Emotion (e-value of 2.265e-8) and CallinAllBarbz (1.2e-6). Neither of these hits had a function. There were no significant hits for HHPred. No hits were returned by CDD. All of the above evidence means that this gene has no known function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein.
/note=Secondary Annotator Name: Freise, Amanda /note=Secondary Annotator QC: CDS 30879 - 32723 /gene="37" /product="gp37" /function="DNA polymerase I" /locus tag="Soondubu_37" /note=Original Glimmer call @bp 30879 has strength 17.49; Genemark calls start at 30879 /note=SSC: 30879-32723 CP: yes SCS: both ST: SS BLAST-Start: [DNA polymerase [Arthrobacter phage SWEP2]],,NCBI, q1:s1 100.0% 0.0 GAP: 107 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.077, -2.5052746077145835, yes F: DNA polymerase I SIF-BLAST: ,,[DNA polymerase [Arthrobacter phage SWEP2]],,USL85053,87.9479,0.0 SIF-HHPRED: Apicoplast DNA polymerase; DNA polymerase, exonulease, apicoplast, Plasmodium falciparum, REPLICATION, TRANSFERASE; HET: PEG, EDO; 2.5A {Plasmodium falciparum (isolate 3D7)},,,7SXQ_B,96.7427,100.0 SIF-Syn: DNA helicase/primase upstream; ligase downstream which is also found in Elezi (AZ1) and Eraser (AZ1). /note=Primary Annotator Name: Gallagher, Hannah /note=Auto-annotation: Genemark and Glimmer. Both call the start site at 30879. The start codon is TTG which has a lower frequency in start codons but is still probable. /note=Coding Potential: Coding potential is in the forward ORF strand only, indicating that this is a forward gene. GeneMark Self and Host found coding potential. Start site 30879 covers the entire coding potential. /note=SD (Final) Score: Start 30879 has a Final Score of -2.505 which is the highest of the entries. The Z-score is 3.077 which is also the highest of the entries and suggests the presence of a credible ribosome binding site. /note=Gap/overlap: 223 gap with the upstream gene that is conserved in Cassia (AZ1), Adolin (AZ1), and Dr. Sierra (AZ1). Start 30879 is the LORF and the length of the gene is 1845 which is very reasonable for a gene that encodes DNA polymerase I. /note=Phamerator: Pham 117330 as of 10/5/2023.
The pham is conserved in over 100 Phagesdb BLAST entries that mostly encode DNA polymerase I which is consistent with SEA-Phages approved function list. Non-draft phages Cassia (AZ1), MaGuCo (AZ3), and BaileyBlu (FP) belong to pham 117330. /note=Starterator: Start 72 @30879 has 52 MA's and is called for in 73/900 genes. Start 72 is called 97.3% of the time when present. Start 72 has phages in AZ1, FP, EH, A2, AZ4, A9, and AZ3. The most common start site (not available in this gene) is start 73 which has 820/900 non-draft manual annotations and is called 98.6% of the time when present. /note=Location call: Based on the above evidence, this is a real gene, and the likely start site is 30879. /note=Function call: DNA Polymerase I. The top non-draft Phagesdb BLAST hits have a function of DNA Polymerase I ([Cassia, e-value of 0], [Adumb2043, 0], [Asa16, 0], [Elezi, 0], [Eraser, 0] etc.). The top 3 HHPred hits are related to DNA polymerase I and have probability of 100, coverage>95%, and e-value of 0. The top non-draft phages NCBI BLAST hits have a function of DNA Polymerase I ([SWEP2, 79.64% identity, 87.95% aligned,100% coverage, and an e-value of 0], [Adumb2043, 79.71% identity, 86.80% aligned, 100% coverage, and an e-value of 0], etc.). CDD returned several hits for DNA Polymerase I with identity >29%, 45.6% alignment, coverage>46%, and an e-value of 0. Phagesdb BLAST, NCBI BLAST, HHPred, and CDD are strongly suggestive of this gene being a DNA Polymerase I. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Yao, Jiayu /note=Secondary Annotator QC: I agree with this annotation.
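The gene lengths and GAP values quoted in the records above follow directly from the inclusive CDS coordinates. As a minimal sketch (the helper names `gene_length` and `gap_to_upstream` are hypothetical, and only forward-strand genes are handled), the arithmetic can be checked like this:

```python
def gene_length(start: int, stop: int) -> int:
    """Length in bp of a CDS given inclusive, 1-based coordinates."""
    return stop - start + 1


def gap_to_upstream(upstream_stop: int, start: int) -> int:
    """Number of bases strictly between two genes; negative means overlap."""
    return start - upstream_stop - 1


# (gene, start, stop) copied from the CDS lines of genes 35-38 (all forward strand).
genes = [
    ("35", 28160, 30655),
    ("36", 30664, 30771),
    ("37", 30879, 32723),
    ("38", 32720, 33019),
]

# Compare each gene with the one immediately upstream of it.
for (_, _, prev_stop), (name, start, stop) in zip(genes, genes[1:]):
    length = gene_length(start, stop)
    gap = gap_to_upstream(prev_stop, start)
    print(f"gene {name}: length {length} bp, gap {gap} bp")
```

With these coordinates the sketch reproduces the 108 bp length and 8 bp gap of gene 36, the 1845 bp length and 107 bp gap of gene 37, and the -4 bp overlap of gene 38, matching the GAP fields in the SSC summary lines.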
CDS 32720 - 33019 /gene="38" /product="gp38" /function="DNA ligase" /locus tag="Soondubu_38" /note=Original Glimmer call @bp 32720 has strength 7.0; Genemark calls start at 32720 /note=SSC: 32720-33019 CP: yes SCS: both ST: NI BLAST-Start: [DNA ligase [Arthrobacter phage BaileyBlu] ],,NCBI, q6:s2 92.9293% 3.67838E-43 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.568, -3.627004739933808, yes F: DNA ligase SIF-BLAST: ,,[DNA ligase [Arthrobacter phage BaileyBlu] ],,YP_010677546,81.0526,3.67838E-43 SIF-HHPRED: d.142.2.2 (A:) Adenylation domain of NAD+-dependent DNA ligase {Enterococcus faecalis [TaxId: 1351]} | CLASS: Alpha and beta proteins (a+b), FOLD: ATP-grasp, SUPFAM: DNA ligase/mRNA capping enzyme, catalytic domain, FAM: Adenylation domain of NAD+-dependent DNA ligase,,,SCOP_d3ba9a_,63.6364,99.1 SIF-Syn: The gene has synteny with DNA ligase genes in AZ phages such as Adolin and Amyev. /note=Primary Annotator Name: Bonthala, Praneel /note=Auto-annotation: Glimmer and GeneMark call the start site at 32720. /note=Coding Potential: Coding potential in the ORF is in the forward strand only, indicating that this is a forward gene. The coding potential covers the entire strand. /note=SD (Final) Score: -3.627 for start 32720. This is the highest Final Score on PECAAN. This start site also has the highest Z-score at 2.568. /note=Gap/overlap: The gap is -4 for start 32720. This is the smallest gap on PECAAN and could be part of an operon. /note=Phamerator: Pham: 117720. The pham is shared by 49 phages and many members of the AZ cluster. /note=Starterator: No available starterator report. /note=Location call: Based on the above evidence, this is most likely a real gene with a start site at 32720. /note=Function call: DNA ligase. The top 3 non-draft genome hits of Phagesdb Blast call a DNA ligase with e-value < 1e-29. The top 3 hits of HHPred all call DNA ligase proteins with coverage > 62%, e-value < 1e-8, and probabilities > 99%. 
The top 3 hits of NCBI Blast all call DNA ligase proteins with identity > 61%, coverage > 92%, and e-values < 1e-35. The top 3 hits of CDD also call DNA ligase proteins with low coverage and identity but e-values of < 1e-9. The gene is also syntenic with DNA ligase genes in other AZ cluster phages such as Adolin and Amyev. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Moore, Joshua /note=Secondary Annotator QC: For the coding potential, you could mention the slight dropoff at the end after 32900. Also, don’t forget the All GM Coding Capacity dropdown box. For the SD (Final) Score, don’t forget to mention the Z-score, which would help your argument. Since the gene is part of a large pham, there should be a Starterator report. You are likely between database updates (as I am too). For HHPred, don’t forget to mention the probability (which is quite high for your selected pieces of evidence). CDS 33016 - 33327 /gene="39" /product="gp39" /function="hypothetical protein" /locus tag="Soondubu_39" /note=Original Glimmer call @bp 33016 has strength 11.27; Genemark calls start at 33016 /note=SSC: 33016-33327 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Pseudarthrobacter siccitolerans] ],,NCBI, q1:s1 98.0583% 5.95427E-19 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.216, -2.156361006712914, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Pseudarthrobacter siccitolerans] ],,WP_050053686,57.6923,5.95427E-19 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Faulkner, Cheyenne /note=Auto-annotation: Glimmer and GeneMark were both used to call the same start as 33016. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host, but the chosen start site is not fully included in the GeneMark Self and Host, it is very close. /note=SD (Final) Score: -2.156. 
This is the highest final (least negative) score on PECAAN. /note=Gap/overlap: -4. This overlap could mean that this gene is part of an operon. This overlap is conserved with Amyev (AZ) and Adolin (AZ). /note=Phamerator: pham: 965 as of 10/4/2023. It is conserved, found in both Adolin (AZ) & Amyev (AZ). /note=Starterator: Start site 37 in Starterator was manually annotated in 87/104 non-draft genomes. Start 37 is the most annotated start site; Soondubu has this start site and calls it. Start 37 is also found in Adolin (AZ) & Amyev (AZ). This evidence agrees with the start site called by Glimmer & GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the start site is most likely 33016. /note=Function call: NKF. The top 3 phagesdb BLAST hits have the function of Unknown (E-values < 2*10^-16), and 5/5 of the NCBI BLAST hits also have a function of hypothetical protein (coverage 91%+, 39%+ identity, & E-value < 6*10^-16). HHpred did not have any significant hits with high coverage (nothing >34%) or high probability (nothing >51%). CDD did not have any relevant hits. This is also the case for phages Adolin (AZ) and Amyev (AZ). /note=Transmembrane domains: DeepTMHMM did not predict any TMDs; thus, it is not a membrane protein. /note=Secondary Annotator Name: Mathkour, Yusef /note=Secondary Annotator QC: Make sure to fill in the () in the synteny box!
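The SD (Final) Score notes can be cross-checked against the RBS field of each SSC summary line, where the two numbers after "Karlin Medium" appear to be the Z-score followed by the final score (an inference from records where both are quoted in the notes, not a documented format). A minimal sketch, with a hypothetical `parse_rbs` helper and a pattern that assumes exactly this field layout:

```python
import re

# Assumed layout: "RBS: <matrix>, <spacing>, <z-score>, <final score>, <yes|no>".
# The trailing yes/no flag is matched but deliberately not interpreted.
RBS_PATTERN = re.compile(
    r"RBS:\s*[^,]+,\s*[^,]+,\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?),\s*(?:yes|no)"
)


def parse_rbs(summary: str):
    """Return (z_score, final_score) from a summary line, or None if absent."""
    match = RBS_PATTERN.search(summary)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# Fragment of the gene 39 summary line quoted in this file.
summary = (
    "GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, "
    "3.216, -2.156361006712914, yes F: hypothetical protein"
)
z_score, final_score = parse_rbs(summary)
print(z_score, round(final_score, 3))
```

For gene 39 this yields a Z-score of 3.216 and a final score that rounds to -2.156, which agrees with the value given in the SD (Final) Score note.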
CDS 33487 - 34296 /gene="40" /product="gp40" /function="DNA binding protein" /locus tag="Soondubu_40" /note=Original Glimmer call @bp 33487 has strength 18.98; Genemark calls start at 33487 /note=SSC: 33487-34296 CP: yes SCS: both ST: SS BLAST-Start: [DNA binding protein [Arthrobacter phage Lizalica] ],,NCBI, q3:s2 98.1413% 3.93293E-96 GAP: 159 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.402, -4.176752302706706, yes F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Arthrobacter phage Lizalica] ],,YP_010677608,70.0,3.93293E-96 SIF-HHPRED: RNA polymerase sigma factor RpoS; Transcription-activator, DNA/RNA, SigmaS, beta`, TRANSCRIPTION, Transferase-DNA complex; 3.26A {Escherichia coli},,,6OMF_F,97.3978,100.0 SIF-Syn: Upstream is NKF and downstream is NKF. No synteny with other species. /note=Primary Annotator Name: Wong, Michael /note=Auto-annotation: Both Glimmer & Genemark that agree on same start site (33487); ATG start codon called /note=Coding Potential: Reasonable coding potential predicted within putative ORF. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: SD score is the best (i.e. -4.177) as it is the least negative out of all other options. /note=Gap/overlap: The gap with the upstream gene is large (159 bp); however, after analyzing the coding potentials, there does not seem to be evident coding potential within the 159 bp gap (in both forward and reverse). There are no alternative start candidates that are reasonable. The length of the gene is acceptable given the auto annotated start (chosen start site as well). /note=Phamerator: On 10/9/23, gene was found in pham #102451. The pham in which the gene is conserved is in other members of the cluster/subcluster to which the phage belongs; JohnDoe and Lizalica was used for comparison. Phamerator called for the function of the gene to be a DNA Binding Protein. 
Functions called were consistent and found in the approved function list. /note=Starterator: The reasonable start site choice that is conserved among members of the pham to which the gene belongs is 35. Start: 35 @33487 has 0 MA`s. Found in 1 of 133 (0.8%) of genes in pham. The start number called the most often in the published annotations is 42, it was called in 42 of 107 of non-draft genes in pham. /note=Location call: The gathered evidence (z-score is the best (highest), codon is common (ATG), final score is the best (least negative)) suggests that the original start site of 33487 is the best possible start site. The gene is a real gene (conserved in phamerator and good coding potential) and the potential start site of 33487 is the most likely potential start site candidate. The potential start site candidate of 33487 seems the most likely as it covers all coding potentials and is called by both Glimmer and Genemark. /note=Function call: Predicted function is a DNA binding protein based on hits from PhagesDB, BLASTp, and NCBI BLAST. For PhagesDB, there was a low e-value (1e-76). For NCBI BLAST, there was query coverage (>98%), decent identity (>56%), and a low e-value (3.93e-96). Within HHPred’s best Pfam hit, there was strong evidence that matched the DNA binding protein function (100 probability, ~97% coverage, 7.1e-26). There are no specific requirements listed on the approved functions list. CDD had 1 hit for a DNA binding protein with high coverage (>92%) and low e-value (1.2e-8). /note=Transmembrane domains: There is an absence of TMDs as predicted by DeepTMHMM. /note=Secondary Annotator Name: Faulkner, Cheyenne /note=Secondary Annotator QC: I agree with this annotation. No changes needed at this time. 
CDS 34415 - 34756 /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="Soondubu_41" /note=Original Glimmer call @bp 34415 has strength 18.22; Genemark calls start at 34415 /note=SSC: 34415-34756 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU48_gp47 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s2 94.6903% 3.46121E-41 GAP: 118 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.999, -2.6013996449736907, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp47 [Arthrobacter phage DrManhattan] ],,YP_009815390,80.1802,3.46121E-41 SIF-HHPRED: SIF-Syn: The gene upstream is DNA-binding protein, the gene downstream is SprT-like protease. /note=Primary Annotator Name: Yao, Jiayu /note=Auto-annotation: Glimmer, GeneMark (host) and GeneMark (self) all agreed on the start site 34415. The gene displays synteny with other genes. /note=Coding Potential: coding potential is covered within the start site, and no reverse coding potential is shown /note=SD (Final) Score: -2.601, the least negative score among all the start sites /note=Gap/overlap: 118, relatively large gap but the smallest among all the options, and there is little coding potential. /note=Phamerator: 10/9/2023, it was found in pham 115459. There were 12 members in this pham, and 3 of them were drafts. Phages Adolin and DrManhattan from the AZ cluster also have the same pham. /note=Starterator: 10/2/23, the most often called start site number was 8, it was called in 7 of the 9 non-draft genes in the pham. The auto-annotated start site is 8. Start site number 8 should be the start site for this gene because it is both the most often called site and the most conserved start site. /note=Location call: Based on the prediction of GeneMark and Glimmer, together with other evidence like Z-value (2.999), final score (-2.601) and gap value, the gene should be real and the start site is 34415.
/note=Function call: The function should be no known function, because CallinAllBarbz and BaileyBlu genes with the same function have the scores closest to Soondubu (174 and 166), and the e-values are close to 0. NCBI Blast shows that CallinAllBarbz and BaileyBlu genes with the same function have the highest scores (169 and 150) and the coverage is 94% and 84% while the identity is around 75%, which is pretty high. No CDD hits are found. The HHpred shows that the protein of unknown function has the highest score and an acceptable coverage (38% and 30%). Therefore, the function should be no known function. /note=Transmembrane domains: No transmembrane domains are predicted. /note=Secondary Annotator Name: Moore, Joshua /note=Secondary Annotator QC: Don’t forget the All GM Coding Capacity box. When mentioning synteny, give an example of 1 or 2 genes that exhibit synteny. For the SD (Final) Score, don’t forget to mention the Z-score, which would help your argument. For the gap of 118, include arguments about the lack of coding potential in the gap as well as the conservation of the gap among other genes that exhibit synteny. For HHPred, don’t check the two pieces of evidence checked, because their e-values are so high. It is better to consider HHPred as having had no relevant hits. Mention that CDD has no relevant hits. For the transmembrane domains, mention the use of TMHMM.
CDS 34899 - 35489 /gene="42" /product="gp42" /function="SprT-like protease" /locus tag="Soondubu_42" /note=Original Glimmer call @bp 34899 has strength 17.7; Genemark calls start at 34899 /note=SSC: 34899-35489 CP: yes SCS: both ST: SS BLAST-Start: [SprT-like domain-containing protein [Pseudarthrobacter siccitolerans] ],,NCBI, q1:s1 100.0% 1.31321E-121 GAP: 142 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.823, -2.9625409806968013, yes F: SprT-like protease SIF-BLAST: ,,[SprT-like domain-containing protein [Pseudarthrobacter siccitolerans] ],,WP_050053680,91.8367,1.31321E-121 SIF-HHPRED: SprT-like domain-containing protein Spartan; DPC repair protease, DNA BINDING PROTEIN; HET: FLC, MLZ, ADP; 1.5A {Homo sapiens},,,6MDW_A,51.5306,99.5 SIF-Syn: /note=Primary Annotator Name: Mathkour, Yusef /note=Auto-annotation: Glimmer and GeneMark agree with one another; both call the start at 34899. The start codon is ATG, which is a high-probability start codon. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The chosen start site covers all coding potential. /note=SD (Final) Score: -2.963 is the best final score on PECAAN. The Z-score is 2.823, which is the best Z-score on PECAAN. /note=Gap/overlap: 142 bp. The gap is reasonably large, but it is conserved in other phages (VResidence) and there is no coding potential in the gap that might indicate a new gene. /note=Phamerator: pham: 1210 as of 10/9/23. It is conserved, found in the AZ1 cluster in phages VResidence (AZ1) and Berrie_Draft (AZ1). The assigned function for Berrie_Draft is NKF, but the function of VResidence is SprT-like protease. /note=Starterator: Start site 46 was found in 61 of 102 (59.8%) of genes in the pham. There are 43 of 82 manual annotations of this start, and the start site is called 98.4% of the time when present.
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 34899. /note=Function call: SprT-like protease; there is evidence of this function in comparable phages. On HHpred, the most similar hit has a probability of 99.5%, which is above the 80% threshold. The coverage is 51.5306%, which is above the required 35% threshold, and the e-value is 9.6e-13, which is below the required maximum of 1*10^-6, showing a correlation to a known function. Phagesdb BLAST shows a hit with phage VResidence that has a score of 340 and an e-value of 1e-93 for a SprT-like protease. NCBI BLAST has an identity of 83.6735%, 100% coverage, and an e-value of 1.31321e-121 for a SprT-like protease. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Gallagher, Hannah /note=Secondary Annotator QC: Just need to check boxes for evidence since you have strong hits on NCBI, Phagesdb Blast, and CDD. These hits should also be discussed when talking about the function.
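The gene 42 function call applies three cutoffs to its HHpred hit: probability above 80%, coverage above 35%, and an e-value below 1*10^-6. A minimal sketch of that check follows; the `HHpredHit` class and `passes_thresholds` function are hypothetical names, and the default cutoffs are simply the ones quoted in that note rather than an official standard:

```python
from dataclasses import dataclass


@dataclass
class HHpredHit:
    probability: float  # percent
    coverage: float     # percent of the query covered
    e_value: float


def passes_thresholds(
    hit: HHpredHit,
    min_probability: float = 80.0,  # cutoff quoted in the gene 42 note
    min_coverage: float = 35.0,     # cutoff quoted in the gene 42 note
    max_e_value: float = 1e-6,      # cutoff quoted in the gene 42 note
) -> bool:
    """True when the hit meets all three cutoffs."""
    return (
        hit.probability >= min_probability
        and hit.coverage >= min_coverage
        and hit.e_value <= max_e_value
    )


# Values reported for the gene 42 HHpred hit (6MDW_A).
gp42_hit = HHpredHit(probability=99.5, coverage=51.5306, e_value=9.6e-13)
print(passes_thresholds(gp42_hit))
```

Keeping the cutoffs as keyword arguments makes it easy to rerun the same check if an annotator works from different guidance.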
CDS 35605 - 36405 /gene="43" /product="gp43" /function="hypothetical protein" /locus tag="Soondubu_43" /note=Original Glimmer call @bp 35605 has strength 6.98; Genemark calls start at 36088 /note=SSC: 35605-36405 CP: no SCS: both-gl ST: NA BLAST-Start: [HNH endonuclease signature motif containing protein [Actinoallomurus iriomotensis] ],,NCBI, q9:s12 94.7368% 8.53609E-93 GAP: 115 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.16, -5.373853743578363, no F: hypothetical protein SIF-BLAST: ,,[HNH endonuclease signature motif containing protein [Actinoallomurus iriomotensis] ],,WP_285574563,66.4179,8.53609E-93 SIF-HHPRED: d.4.1.8 (A:513-673) CRISPR-associated endonuclease Cas9/Csn1, HNH domain {Actinomyces naeslundii [TaxId: 1115803]} | CLASS: Alpha and beta proteins (a+b), FOLD: His-Me finger endonucleases, SUPFAM: His-Me finger endonucleases, FAM: HNH domain from CRISPR-associated protein Cas9,,,SCOP_d4ogca2,26.6917,98.4 SIF-Syn: /note=AF: Hits to HNH endonucleases, but I did not see the requisite H-N-H in a 20-30 aa span. /note=--- /note=Primary Annotator Name: Moore, Joshua /note=Auto-annotation: Glimmer calls the start site at 35605 while GeneMark calls the start site at 36088. Glimmer appears to be correct. /note=Coding Potential: The ORF only has coding potential on the forward strand, but is missing coding potential in significant portions of the ORF. Potential is found in GeneMark Host and GeneMark Self. /note=SD (Final) Score: -5.374 for 35605. This is the best final score on PECAAN. The z-score is the second highest at 2.16 with the highest at 2.168. The start codon is ATG. The scores are much worse on start site 36088 at -7.144 and 1.296, respectively. /note=Gap/overlap: The gap is 115 bp. There is no coding potential in this gap. This is not generally conserved in similar phages such as KeAlii (AZ), Nitro (AZ), B22 (BW), or Doucette (BW). 
However, if this were not a real gene, then the gap between the gene prior to this one and the gene after would be about 1000 bp. /note=Phamerator: Pham 109741. 10/04/23. This is an orpham. /note=Starterator: This gene is an orpham. /note=Location call: Based on the extremely significant gap that would be left if this gene were removed, this gene is a real gene. Based on the above evidence, this gene has a start site of 35605. /note=Function call: HNH Endonuclease. Some PhagesDB BLAST hits agree with this function, but their e-values are only e-5. There are many NCBI BLAST hits that agree (identity 51%+, coverage 92.48%+, e-value < e-86). CDD had a relevant hit for HNH Endonuclease with some identity (37.5%), poor coverage (19.17%), but a decent e-value (3.57e-7). HHPred had a relevant hit for HNH Endonuclease (98.4% probability, 26.69% coverage, e-value 9.8 e-6). HNN motif found within 30 aa’s, which is suggested as an alternative to HNH motif. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Bonthala, Praneel /note=Secondary Annotator QC: I agree with both the location and function calls. I would check with an instructor to see if the coding potential in the ORF is suspect or not, because it looks acceptable to me! 
CDS 36476 - 37435 /gene="44" /product="gp44" /function="DNA methyltransferase" /locus tag="Soondubu_44" /note=Original Glimmer call @bp 36476 has strength 9.66; Genemark calls start at 36542 /note=SSC: 36476-37435 CP: yes SCS: both-gl ST: NI BLAST-Start: [DNA cytosine methyltransferase [Pseudoclavibacter terrae]],,NCBI, q5:s3 98.7461% 0.0 GAP: 70 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.821, -3.494990588194529, no F: DNA methyltransferase SIF-BLAST: ,,[DNA cytosine methyltransferase [Pseudoclavibacter terrae]],,WP_271986570,87.0662,0.0 SIF-HHPRED: Cytosine-specific methyltransferase; STRUCTURAL GENOMICS, PROTEIN STRUCTURE INITIATIVE, NEW YORK STRUCTURAL GENOMIX RESEARCH CONSORTIUM, NYSGXRC, cytosine methylase, PSI-2, New; HET: MSE; 1.75A {Shigella flexneri 2a},,,3ME5_A,97.8056,100.0 SIF-Syn: /note=Primary Annotator Name: Shao, Sarah /note=Auto-annotation: Glimmer (36476, TTG) and GeneMark (36542, ATG), different start sites. /note=Coding Potential: Reasonable coding potential predicted, in GeneMark Self and Host Glimmer start site covers all coding potential, GeneMark start site misses some coding potential. There is coding potential in one of the reverse frames. /note=SD (Final) Score: -3.495 for Glimmer start site (36476). It is the best final score on PECAAN. /note=Gap/overlap: 70bp (upstream). Somewhat large, there is no coding potential in this gap (GeneMark Host/Self). /note=Phamerator: pham 118329. Date 10/11/23. Found in phages Omnicron (K), Rando14 (K). 3 phages are members of the pham, both show function of DNA methylase and have gene length 963bp. Start site 36476 gives most similar gene length (960bp). /note=Starterator: Starterator did not provide any helpful information to support a specific start site, none of the called start sites had any manual annotations. Start site 2 (36476) is closest to most annotated start site (start site 1, called in 2/2 of non-draft genes). Date 10/11/23. 
/note=Location call: Based on the above evidence, this is a real gene with a likely start site at 36476. /note=Function call: DNA methyltransferase. The top non-draft phagesdb BLAST hits have the function of DNA methyltransferase (written as DNA methylase/methyltransferase) [(Cueylyss, 3e-9), (TurkishDelight, 1e-18), (Float294, 1e-16)]. The 2 top NCBI BLAST hits have the function of DNA cytosine methyltransferase [(Pseudoclavibacter terrae, 99% coverage, 81% identity, e-value 0), (Rathayibacter tritici, 100% coverage, 80% identity, e-value 0)]. CDD also had relevant hits showing function of DNA methyltransferase [(C-5 cytosine-specific DNA methylase, coverage 95%, identity 35%, e-value 0), (cytosine-C5 specific DNA methylases, coverage 95%, identity 41%, e-value 0)]. HHpred showed several relevant hits indicating a function of DNA methyltransferase [(Cytosine-specific methyltransferase, Shigella flexneri 2a str. 2457T, 100% probability, 98% coverage, e-value of 2.6e-41), (Modification methylase, Haemophilus parahaemolyticus, 100% probability, 98% coverage, e-value of 6.4e-41)]. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Chang, Amanda /note=Secondary Annotator QC: I agree with this location and functional call. Remove line about gene being an orpham from location call. CDS complement (37484 - 37639) /gene="45" /product="gp45" /function="hypothetical protein" /locus tag="Soondubu_45" /note=Genemark calls start at 37639 /note=SSC: 37639-37484 CP: yes SCS: genemark ST: NA BLAST-Start: GAP: 116 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.618, -4.036208988580519, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chang, Amanda /note=Auto-annotation: Just GeneMark. The start site called is at 37639 and is an ATG start.
/note=Coding Potential: Coding potential of this ORF is only on the reverse strand and is decently strong on GeneMark Self and Host. The length of this gene is 156 bp, which fulfills the length requirement for protein-coding genes. The gene is not found in other phages within the AZ1 cluster. /note=SD (Final) Score: The final score is -4.036. This is the best final score on PECAAN. /note=Gap/overlap: There is a 92 bp gap. This gap is quite large but not large enough for a protein-coding gene (gap is less than 120 bp) and there is no indication of coding potential from GeneMark Host/Self. Also, considering the gene downstream is a forward gene, the gap is justified, as a 50 bp gap is required between forward and reverse genes. /note=Phamerator: pham: 11138 as of 10/9/23. The gene is an orpham. /note=Starterator: Starterator was not available as of 10/9/23 because this gene is an orpham. /note=Location call: Based on the evidence above, this gene is most likely real and most likely starts at 37639. /note=Function call: No known function. There were no hits from the PhagesDB Blast, NCBI Blast or CDD. All HHpred hits have large e-values (>1) and are not consistent amongst all calls. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Wong, Michael /note=Secondary Annotator QC: Looks good! CDS 37756 - 39171 /gene="46" /product="gp46" /function="serine integrase" /locus tag="Soondubu_46" /note=Original Glimmer call @bp 37732 has strength 9.96; Genemark calls start at 37720 /note=SSC: 37756-39171 CP: no SCS: both-cs ST: SS BLAST-Start: [recombinase family protein [Arthrobacter sp. EPSL27] ],,NCBI, q2:s9 98.7261% 0.0 GAP: 116 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.796, -4.119336091842039, no F: serine integrase SIF-BLAST: ,,[recombinase family protein [Arthrobacter sp.
EPSL27] ],,WP_066436046,72.479,0.0 SIF-HHPRED: INTEGRASE; HYDROLASE, SERINE RECOMBINASE, UNIDIRECTIONAL, SITE-SPECIFIC RECOMBINATION; 2.15A {STREPTOMYCES PHAGE PHIC31},,,4BQQ_A,58.1741,100.0 SIF-Syn: Serine Integrase. Upstream is NKF; downstream gene is NKF. This gene has no synteny with other phages. /note=Primary Annotator Name: Gallagher, Hannah /note=Auto-annotation: Genemark and Glimmer. Genemark calls the start at 37720 which is an ATG (highly probable start codon) while Glimmer calls the start at 37732 which is an ATG as well. /note=Coding Potential: Coding potential is in the forward ORF strand only, indicating that this is a forward gene. GeneMark Self and Host found coding potential. Start site 37732 covers the entire coding potential. /note=SD (Final) Score: Start @37732 has a Final Score of -5.803 which is the 4th highest final score. The Z-score is 1.469 which is also 4th highest and suggests the presence of a credible ribosome binding site. /note=Gap/overlap: 92 gap with the upstream gene that is conserved in Wocket (CV) and Barnstormer (EH). Start @37732 is the second longest ORF and has a length of 1440 bp. The LORF has a lower Final Score and Z-score and has a larger region of no coding potential. /note=Phamerator: Pham 117714 as of 10/8/2023. The pham is conserved in over 50 phages and conserved. Found in phages Barnstormer (EH), Wocket (CV), Itza (BD), and REQ1 (CF). /note=Starterator: Start 35 @37732 is found in 20/50 genes in pham and manually annotated in 2/46 non-draft genes. It is called 15% of the time when present. Start 39 is the most annotated start site with 31/46 non-draft manual annotations when called for in 39/50 total phages. Soondubu does not have start 39. /note=Location call: Based on the above evidence, this is a real gene, and the likely start site is 37732. /note=Function call: Serine integrase. The top Phagesdb BLAST hits are for serine integrase ([UtzChips, e-value of 8e-86], [Barnstormer, 1e-85], and [Caron, 1e-85]). 
The top NCBI BLAST hits are related to recombinase family proteins and serine integrase with identity > 38.9%, alignment > 54.29%, coverage > 96%, and e-value <3.5e-92. HHPred has two good hits related to integrase, serine recombinase, and serine integrase with probability of 100, coverage >58%, and e-values<2.7e-32. CDD has several hits related to serine integrase and recombinase with the best identity of 32.85%, alignment of 46.7%, coverage of 26.9%, and e-value of 2.669e-21. All of this evidence is in favor of serine integrase being the function of this gene. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Faulkner, Cheyenne /note=Secondary Annotator QC: I agree with this annotation. No changes needed at this time. CDS 39482 - 39739 /gene="47" /product="gp47" /function="hypothetical protein" /locus tag="Soondubu_47" /note=Original Glimmer call @bp 39482 has strength 15.08; Genemark calls start at 39482 /note=SSC: 39482-39739 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein PQE15_gp48 [Arthrobacter phage KeAlii] ],,NCBI, q10:s7 48.2353% 0.00108169 GAP: 310 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.216, -2.156361006712914, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE15_gp48 [Arthrobacter phage KeAlii] ],,YP_010678165,31.3131,0.00108169 SIF-HHPRED: SIF-Syn: The gene has synteny with pham 117708 in AZ cluster phages. /note=Primary Annotator Name: Bonthala, Praneel /note=Auto-annotation: Glimmer and GeneMark call the start site at 39482. /note=Coding Potential: Coding potential in the ORF is in the forward strand only, indicating that this is a forward gene. The coding potential covers the entire strand. /note=SD (Final) Score: -2.156 for start 39482. This is the highest Final Score on PECAAN. /note=Gap/overlap: The gap is 310 for start 39482. 
This is the smallest gap on PECAAN even though it is very large, suggesting there may be another gene in the gap. /note=Phamerator: Pham: 118747. The gene is an orpham. /note=Starterator: No available starterator report because it is an orpham. /note=Location call: Based on the above evidence, this is most likely a real gene with a start site at 39482. /note=Function call: NKF. PhagesDB blast had no known function hits. HHPred had no significant hits. The top 2 hits on NCBI Blast are hypothetical proteins with no known function. CDD had no hits either. However, the gene is syntenic with genes in pham 117708 when compared to other AZ cluster phages in pham maps. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Yao, Jiayu /note=Secondary Annotator QC: I agree with this annotation. CDS 39736 - 39963 /gene="48" /product="gp48" /function="RNA binding protein" /locus tag="Soondubu_48" /note=Original Glimmer call @bp 39736 has strength 10.59; Genemark calls start at 39736 /note=SSC: 39736-39963 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Herbiconiux moechotypicola] ],,NCBI, q12:s6 85.3333% 5.6705E-7 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.514, -4.5049749909507835, yes F: RNA binding protein SIF-BLAST: ,,[hypothetical protein [Herbiconiux moechotypicola] ],,WP_259478843,53.7313,5.6705E-7 SIF-HHPRED: Ssr3341 protein; HFQ, SM, RNA-BINDING PROTEIN, SRNA, TRANSLATIONAL REGULATION, RNA BINDING PROTEIN; 1.3A {Synechocystis sp. PCC 6803},,,3HFO_A,69.3333,95.9 SIF-Syn: This gene is a rna binding protein. The gene upstream is a NKF and the gene downstream is a NKF. /note=Primary Annotator Name: Faulkner, Cheyenne /note=Auto-annotation: Glimmer and GeneMark were both used to call the same start as 39736. /note=Coding Potential: Coding potential in this ORF is on the forward and the reverse strand, indicating that this could be a reverse or forward gene. 
Coding potential is found in both GeneMark Self and Host; it does not fully extend to the chosen start site in either, but it comes very close. /note=SD (Final) Score: -4.505. This is the highest final (least negative) score on PECAAN. /note=Gap/overlap: -4. This overlap could mean that this gene is part of an operon. This overlap is conserved with Amyev (AZ) and Adolin (AZ). /note=Phamerator: pham: 85659 as of 10/4/2023. This is conserved with Amyev (AZ). /note=Starterator: Start site 7 in Starterator was manually annotated in 12/38 non-draft genomes. Start 7 is not the most annotated start site, and Soondubu does not have the most annotated start site. Start 7 is also found in Crewmate (AZ1). This evidence agrees with the start site called by Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the start site is most likely 39736. /note=Function call: RNA binding protein. Phagesdb BLAST hits have the function of RNA binding protein (e-values <0.021), and NCBI BLAST did not have any hits with substantial coverage or probability. HHpred did have some significant hits with high coverage (>69%) and high probability (>95%) for RNA binding protein. CDD did not have any relevant hits. This is also the case for phages Adolin (AZ) and Amyev (AZ). /note=Transmembrane domains: DeepTMHMM did not predict any TMDs; thus, it is not a membrane protein. /note=Secondary Annotator Name: Bonthala, Praneel /note=Secondary Annotator QC: I agree with the location and function call for the gene, even though the e-values for the functions are very high.
CDS 39960 - 40541 /gene="49" /product="gp49" /function="hypothetical protein" /locus tag="Soondubu_49" /note=Original Glimmer call @bp 39960 has strength 21.4; Genemark calls start at 39960 /note=SSC: 39960-40541 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE11_gp54 [Arthrobacter phage Warda] ],,NCBI, q5:s2 32.1244% 4.25478E-4 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.421, -6.189238014254766, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE11_gp54 [Arthrobacter phage Warda] ],,YP_010677894,30.597,4.25478E-4 SIF-HHPRED: SIF-Syn: Upstream is RNA binding protein and downstream is endolysin. No direct synteny in other species. /note=Primary Annotator Name: Wong, Michael /note=Auto-annotation: Both Glimmer and GeneMark agree on the same start site (39960); an ATG start codon is called. /note=Coding Potential: Reasonable coding potential predicted within putative ORF. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: The final score (-6.189) is the best, as it is the least negative of all the options, and the Z-score (1.421) is moderately high compared to the other Z-scores. /note=Gap/overlap: The overlap with the upstream gene is small (-4 bp), which suggests the gene is part of an operon. There are no alternative start candidates that are reasonable. The length of the gene is acceptable given the auto-annotated start (which is also the chosen start site). /note=Phamerator: On 10/9/23, gene was found in pham #109858 and is an orpham. Warda was used for comparison. Phamerator lists the function of the gene as unknown. /note=Starterator: No currently available starterator!
/note=Location call: The Z-score is moderately high compared to other start sites (it is not the highest: 40404 has the highest Z-score, but that start has a large gap of 440 and does not have the most favorable coding potential), the start codon is common (ATG), and the final score is the best (least negative). Together, this evidence suggests that the original start site of 39960 is the best possible start site. The gene is a real gene (conserved in Phamerator and good coding potential). Start site 39960 is the most likely candidate, as it covers all coding potential and is called by both Glimmer and GeneMark. /note=Function call: Predicted function is unknown, based on weak (but best available) hits from PhagesDB, BLASTp, and NCBI BLAST. For PhagesDB, there was a moderate e-value (6e-8). For NCBI BLAST, there was query coverage (>32%), decent identity (>20%), and a weak e-value (0.000425478). There were no relevant hits for HHPred. /note=Transmembrane domains: There is an absence of TMDs as predicted by DeepTMHMM. /note=Secondary Annotator Name: Moore, Joshua /note=Secondary Annotator QC: For the SD (Final) Score, you could mention the Z-score, although it isn’t that high. It is, however, comparable to the other z-scores. You could also mention that start 40404 appears favorable (comparable final score, better z-score), but also mention reasons why it isn’t the best choice (coding potential most likely, and especially gap). For the Starterator dropdown, choose NA. The z-score is not the highest. Typo in “(least negative))” -> “(least negative)”. When describing the start site, your wording is a little redundant and could be made more brief. For HHPred, there is no need to check the boxes since the e-value is so high.
You could instead say that there are no relevant hits, although what you wrote is also okay since you seemed to say it wasn't relevant for the most part. However, it would probably be best to just say no relevant hits. CDS 40553 - 40912 /gene="50" /product="gp50" /function="hypothetical protein" /locus tag="Soondubu_50" /note=Original Glimmer call @bp 40553 has strength 5.83 /note=SSC: 40553-40912 CP: yes SCS: glimmer ST: SS BLAST-Start: GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.159, -4.466674028236667, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: The gene upstream is NKF, the gene downstream is NKF. /note=Primary Annotator Name: Yao, Jiayu /note=Auto-annotation: Glimmer, GeneMark (host) and GeneMark (self) all agreed on the start site 40553. The gene displays synteny with other genes. /note=Coding Potential: Coding potential is covered by the chosen start site, and no reverse coding potential is shown. /note=SD (Final) Score: -4.467, the least negative score among all the start sites. /note=Gap/overlap: 11, the smallest gap among all the options. /note=Phamerator: 10/9/2023, it was found in pham 112288. There is only 1 member in this pham. /note=Starterator: Not found. /note=Location call: Based on the prediction of GeneMark and Glimmer, together with other evidence like Z-value (2.159), final score (-4.467) and gap value, the gene should be real and the start site is 40553. /note= /note=Transmembrane domains: No transmembrane domains are predicted. /note=Secondary Annotator Name: Mathkour, Yusef /note=Secondary Annotator QC: Looks good! No edits needed!
CDS 41018 - 41170 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="Soondubu_51" /note=Original Glimmer call @bp 41018 has strength 14.93; Genemark calls start at 41018 /note=SSC: 41018-41170 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomycetaceae bacterium]],,NCBI, q2:s6 92.0% 9.20702E-11 GAP: 105 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.326, -2.2821867859826788, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomycetaceae bacterium]],,NUP32907,62.963,9.20702E-11 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Moore, Joshua /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 41018. /note=Coding Potential: There is significant coding potential in the ORF of this gene. /note=SD (Final) Score: -2.282. This is the best final score on PECAAN. The z-score is 3.326 and is also the best score. The start codon is ATG. /note=Gap/overlap: The gap is 105 bp. There is no coding potential in this gap. This gene doesn't have synteny with other genes, so the gap is not comparable to other genomes. However, the gap isn't quite large enough to host a new gene and it has no coding potential, so it is acceptable. /note=Phamerator: Pham 117431. 10/09/23. /note=Starterator: This gene is the only gene in its pham to call this start site. However, Soondubu only has 3 start sites, and this one produces the longest gene with the shortest gap. Its other start sites do not have any MA’s either. /note=Location call: Based on the above evidence, it is likely that this is a real gene and has a start site at 41018. /note=Function call: No known function (NKF). The only relevant hits (low e-value) on PhagesDB BLAST and NCBI BLAST are also NKF, and HHPRED and CDD have no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMRs. /note=Secondary Annotator Name: Bonthala, Praneel /note=Secondary Annotator QC: I agree with the location and function call.
The gap is 105 with the chosen start site, however, not 13. CDS 41173 - 41388 /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="Soondubu_52" /note=Original Glimmer call @bp 41173 has strength 17.85; Genemark calls start at 41173 /note=SSC: 41173-41388 CP: yes SCS: both ST: NA BLAST-Start: GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.282, -4.213314289702982, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shao, Sarah /note=Auto-annotation: Glimmer and GeneMark agree on the suggested start site (41173), ATG. /note=Coding Potential: Reasonable coding potential predicted, start site covers all coding potential in GeneMark Self and Host. /note=SD (Final) Score: -4.213. It is the best final score on PECAAN. /note=Gap/overlap: 2bp gap (upstream). Very small gap (<7bp gap). /note=Phamerator: pham 113086. Date 11/9/23. There are no other members in this pham. This gene is an orpham. /note=Starterator: The gene is an orpham. Date 11/9/23. /note=Location call: Based on the above evidence, this is a real gene (an orpham) with a likely start site at 41173. /note=Function call: Function unknown (NKF). No relevant phagesdb BLAST hits (all high e-values, >1.9). No NCBI BLAST hits. No relevant HHPRED hits (all high e-values, >3). No relevant CDD hits (only one hit, relatively high e-value of 0.004). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Wong, Michael /note=Secondary Annotator QC: Agree with all other evidence given!
CDS 41388 - 41612 /gene="53" /product="gp53" /function="hypothetical protein" /locus tag="Soondubu_53" /note=Original Glimmer call @bp 41388 has strength 13.99; Genemark calls start at 41388 /note=SSC: 41388-41612 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein PQE13_gp57 [Arthrobacter phage Elezi] ],,NCBI, q4:s10 60.8108% 2.79481E-5 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.823, -3.3136498407041004, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE13_gp57 [Arthrobacter phage Elezi] ],,YP_010678035,43.662,2.79481E-5 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chang, Amanda /note=Auto-annotation: Glimmer and GeneMark both call the start site at 41388. The start site is a GTG start codon. /note=Coding Potential: The coding potential of this ORF is only on the forward strand, making this a forward gene. Coding potential is high on both GeneMark Self and Host. /note=SD (Final) Score: -3.314. This is the best final score on PECAAN. /note=Gap/overlap: -1. There is a 1 bp overlap which is favorable. /note=Phamerator: pham: 113125 as of 10/8/23. /note=Starterator: Starterator is not available as of 10/9/23 because this gene is an orpham. /note=Location call: Based on the evidence above, this gene is most likely real and most likely starts at 41388. /note=Function call: No Known Function (NKF). The top 2 phagesDB hits of non-draft genes label the function as unknown with e-values under 10^-6. There are no HHpred hits that meet the criteria to be used as evidence (e-values are too large). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Gallagher, Hannah /note=Secondary Annotator QC: No comments, I agree with the primary annotator!
CDS 41605 - 41763 /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="Soondubu_54" /note=Original Glimmer call @bp 41605 has strength 5.11; Genemark calls start at 41605 /note=SSC: 41605-41763 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQE16_gp56 [Arthrobacter phage Reedo] ],,NCBI, q4:s18 80.7692% 2.38388E-6 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 0.92, -7.007307380299076, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE16_gp56 [Arthrobacter phage Reedo] ],,YP_010678239,39.1304,2.38388E-6 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gallagher, Hannah /note=Auto-annotation: Genemark and Glimmer. Both predict the start site at 41605. This is an ATG start site which is highly probable. /note=Coding Potential: Coding potential is in the forward ORF strand only, indicating that this is a forward gene. GeneMark Self and Host found coding potential. Start site 41605 covers the entire coding potential. /note=SD (Final) Score: Start 41605 has a final score of -7.007 and Z-score of 0.92. The final score and z-score are 4th lowest out of the start sites but still reasonable to suggest the presence of a credible ribosome binding site. /note=Gap/overlap: -8 overlap which is the smallest of the start sites. Only three phages in this pham and they have not been manually annotated. This overlap is not conserved in Chickaboom and Abidatro (drafts). This gene is 159 bp which is long enough to be considered. The longest open reading frame is 312 bp but this start site has a very large overlap with the upstream gene (161 bp) and has very little coding potential. /note=Phamerator: Pham 107977 as of 10/8/2023. Pham 107977 only contains 2 other phages: Chickaboom (AS1) and Abidatro (AS1). Overall, not very conserved. /note=Starterator: Starterator as of 10/8/2023. The most annotated start site is start 4 with 1/1 non-draft genes and is called 100% when present. 
This gene does not have that start site, but instead has start 6 that is present in 3/3 total genes in pham (2 are drafts). There are no manual annotations of start 6 but starterator calls it 55.7% of the time when present. The other phage with start 6 is Chickaboom (AS1). /note=Location call: Based on the above evidence, this is a real gene, and the likely start site is 41605. /note=Function call: NKF. The top three hits for Phagesdb Blast are function unknown with e-values <2e-7. HHPred has no hits with good e-values (the lowest is 19). NCBI Blast has a hypothetical protein with an e-value greater than 2e-6. No CDD hits were returned. These collective observations can allow this gene to be classified as having no known function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Wong, Michael /note=Secondary Annotator QC: Insert date of starterator check. Update synteny box with upstream and downstream genes. Everything else looks good, even with a tricky start site - good job! CDS 41883 - 42125 /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="Soondubu_55" /note=Original Glimmer call @bp 41883 has strength 18.2; Genemark calls start at 41802 /note=SSC: 41883-42125 CP: yes SCS: both-gl ST: NA BLAST-Start: GAP: 119 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.216, -2.156361006712914, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Bonthala, Praneel /note=Auto-annotation: Glimmer calls the start site at 41883 and GeneMark calls the start site at 41802. /note=Coding Potential: Coding potential in the ORF is in the forward strand only, indicating that this is a forward gene. The coding potential covers the entire strand. /note=SD (Final) Score: -2.156 for start 41883. This is the highest Final Score on PECAAN. /note=Gap/overlap: The gap is 119 for start 41883. 
This is not the smallest gap on PECAAN, but the start site is still the best identified start site for the gene. Additionally, in other AZ phage genomes such as Amyev, there are also gaps around the size of 119. /note=Phamerator: Pham: 110067. The gene is an orpham. /note=Starterator: No available starterator report because it is an orpham. /note=Location call: Based on the above evidence, this is most likely a real gene with a start site at 41883. /note=Function call: NKF. PhagesDB blast had no known function hits. HHPred had no significant hits. NCBI Blast had no known function hits. CDD had no hits either. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: Wong, Michael /note=Secondary Annotator QC: Looks great! CDS 42197 - 42469 /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="Soondubu_56" /note=Original Glimmer call @bp 42197 has strength 20.91; Genemark calls start at 42197 /note=SSC: 42197-42469 CP: yes SCS: both ST: NA BLAST-Start: GAP: 71 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.077, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Faulkner, Cheyenne /note=Auto-annotation: Glimmer and Genemark were used to predict the start site. Both call the start at 42197. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host; it does not fully extend to the chosen start site in either the Host-trained or self-trained GeneMark, but it comes extremely close. /note=SD (Final) Score: -2.443. This is the best score recorded by PECAAN. /note=Gap/overlap: 71bp. This is a large gap, possibly enough to be able to fit a new gene downstream. Due to the lack of synteny this area has with other AZ phages, the conservation of this gap is unknown. There is no coding potential in the gap, so more than likely there is no gene to be inserted.
/note=Phamerator: pham: 111668 as of 10/6/23. This is an orpham. /note=Starterator: This is an orpham, so it doesn’t have a Starterator report. /note=Location call: Based on the known data above, the start site is likely 42197 as called by Glimmer and GeneMark. The most compelling evidence is the z-value for this start site, as it’s significantly higher than the rest (least negative). /note=Function call: NKF. Phagesdb BLAST has no significant hits, and neither do NCBI BLAST and CDD. There are also no significant hits on HHPRED that have a low enough e-value. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs; thus, it is not a membrane protein. /note=Secondary Annotator Name: Wong, Michael /note=Secondary Annotator QC: I agree with the annotations! Missing a selection for suggestive start site and check coding potential for 71bp gap to see if there is coding potential there - if so add it as evidence. CDS 42688 - 42990 /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="Soondubu_57" /note=Original Glimmer call @bp 42688 has strength 18.31; Genemark calls start at 42688 /note=SSC: 42688-42990 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VRESIDENCE_61 [Arthrobacter phage VResidence]],,NCBI, q1:s18 75.0% 1.06973E-7 GAP: 218 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.184, -4.337049294579155, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VRESIDENCE_61 [Arthrobacter phage VResidence]],,UYL87665,36.2903,1.06973E-7 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Wong, Michael /note=Auto-annotation: Both Glimmer and GeneMark agree on the same start site (42688); an ATG start codon is called. /note=Coding Potential: Reasonable coding potential predicted within putative ORF. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: SD score is the best (i.e.
-4.337) as it is the least negative out of all other options. /note=Gap/overlap: The gap with the upstream gene is 218 bp; however, after analyzing the coding potential, there does not seem to be evident coding potential within the 218 bp gap (in both forward and reverse). There are no alternative start candidates that are reasonable. The length of the gene is acceptable given the auto-annotated start (which is also the chosen start site). /note=Phamerator: On 10/9/23, gene was found in pham #117716. The pham is conserved in other members of the cluster/subcluster to which the phage belongs; VResidence used for comparison. Phamerator lists the gene as having no known function. Functions called were consistent and found in the approved function list. /note=Starterator: The reasonable start site choice that is conserved among members of the pham to which the gene belongs is 17. Start: 17 @33487 has 2 MA's. Found in 6 of 50 (12%) of genes in pham. The start number called the most often in the published annotations is 18; it was called in 25 of 33 non-draft genes in pham. /note=Location call: The gathered evidence (z-score is the best (highest), codon is common (ATG), final score is the best (least negative)) suggests that the original start site of 42688 is the best possible start site. The gene is a real gene (conserved in Phamerator and good coding potential). Start site 42688 is the most likely candidate, as it covers all coding potential and is called by both Glimmer and GeneMark. /note=Function call: Predicted function is no known function based on hits from PhagesDB, BLASTp, and NCBI BLAST. For PhagesDB, there was a moderate e-value (9e-9). For NCBI BLAST, there was no significant evidence due to low identity and high e-values. Within HHPred, there was also no significant evidence due to high e-values. CDD had no hits.
/note=Transmembrane domains: There is an absence of TMDs as predicted by DeepTMHMM. /note=Secondary Annotator Name: Bonthala, Praneel /note=Secondary Annotator QC: I agree with the location and function call. However, the gap is 218 bp, not 146 bp. CDS 42991 - 43305 /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="Soondubu_58" /note=Original Glimmer call @bp 43006 has strength 6.76; Genemark calls start at 43006 /note=SSC: 42991-43305 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein PQD78_gp54 [Arthrobacter phage BaileyBlu] ],,NCBI, q1:s1 100.0% 4.97939E-40 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.338, -4.038464273694425, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD78_gp54 [Arthrobacter phage BaileyBlu] ],,YP_010677558,75.0,4.97939E-40 SIF-HHPRED: SIF-Syn: The gene upstream is , the gene downstream is NKF. /note=Primary Annotator Name: Yao, Jiayu /note=Auto-annotation: Glimmer, GeneMark (host) and GeneMark (self) all agreed on the start site 42991. The gene displays synteny with other genes. /note=Coding Potential: Coding potential is covered by the chosen start site, and no reverse coding potential is shown. /note=SD (Final) Score: -4.038, the least negative score among all the start sites. /note=Gap/overlap: 0, smallest among all the options. The synteny suggests lots of overlaps in this area, which indicates that this gene might be part of an operon. /note=Phamerator: 10/9/2023, it was found in pham 117844. There were 29 members in this pham, and 3 of them were drafts. Phages CallinAllBarbz and BaileyBlu from the AZ cluster also have the same pham. /note=Starterator: 10/2/23, the most often called start site number was 5 (42991), it was called in 11 of the 26 non-draft genes in the pham. The auto-annotated start site is 7. Start site number 5, which is 42991, should be the start site for this gene because it is both the most often called site and the most conserved start site.
/note=Location call: Based on the prediction of GeneMark and Glimmer, together with other evidence like Z-value (2.338), final score (-4.038) and gap value, the gene should be real and the start site is 42991. /note=Function call: The function should be no known function, because the CallinAllBarbz and BaileyBlu genes with the same function have the scores closest to Soondubu (174 and 166), and the e-values are close to 0 in Phagesdb Blast. NCBI Blast shows hypothetical protein hits with 100% coverage and around 67% identity, which is pretty high. No CDD hits are found. HHpred shows that a protein of unknown function has the highest probability (85%) and a high coverage (51%). Therefore, the function should be no known function. /note=Transmembrane domains: No transmembrane domains are predicted. /note=Secondary Annotator Name: Wong, Michael /note=Secondary Annotator QC: Glimmer and GeneMark start do not agree with what you stated. What genes does it display synteny with? For coding potential, specify what you used. Other than that, I agree with other annotations - good job!
CDS 43289 - 43648 /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="Soondubu_59" /note=Original Glimmer call @bp 43289 has strength 12.96; Genemark calls start at 43289 /note=SSC: 43289-43648 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PQD78_gp55 [Arthrobacter phage BaileyBlu] ],,NCBI, q1:s1 97.479% 9.98727E-55 GAP: -17 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.315, -2.0162541296952132, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD78_gp55 [Arthrobacter phage BaileyBlu] ],,YP_010677559,81.0345,9.98727E-55 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Mathkour, Yusef /note=Auto-annotation: Glimmer and GeneMark agree with one another; both call the start at 43289. The start codon is GTG, a commonly used start codon, which supports this start. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. The chosen start site covers all coding potential. /note=SD (Final) Score: -2.016 is the best Final Score on PECAAN. The Z-score is 3.315, which is the best Z-score on PECAAN. /note=Gap/overlap: -17 bp. The overlap is conserved in other phages, CallinAllBarbz (fp) and BaileyBlu (fp). /note=Phamerator: pham 106834, date 10/9/23. It is conserved, found in the fp cluster with phages CallinAllBarbz (fp) and BaileyBlu (fp). The assigned function for both genes is NKF. /note=Starterator: Start site 3 was found in 5 of 10 (50.0%) of genes in the pham. Manual annotations of this start: 4 of 7. Start site 3 was called 100.0% of the time when present. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 43289. /note=Function call: NKF; there is no evidence of a function in comparable phages. 
On HHPred, the most similar hit has a probability of 83.9, which is above the 80% threshold, a sign that it may be closely related in function. The coverage is 64.7059%, which is above the required 35% threshold; however, the e-value is 34, far higher than the required cutoff of below 1*10^-6, so the hit shows no reliable correlation to a known function. Phagesdb BLAST also shows NKF, with a score of 186 and an e-value of 2e-47 for the phage CallinAllBarbz. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Gallagher, Hannah /note=Secondary Annotator QC: You need to check boxes for evidence even if it is no known function. Also, cite evidence for a function other than HHPred. CDS 43780 - 44127 /gene="60" /product="gp60" /function="hypothetical protein" /locus tag="Soondubu_60" /note=Original Glimmer call @bp 43780 has strength 11.42; Genemark calls start at 43780 /note=SSC: 43780-44127 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Arthrobacter sp. B2a2-09] ],,NCBI, q13:s4 89.5652% 1.7041E-31 GAP: 131 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.077, -3.748312656400878, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Arthrobacter sp. B2a2-09] ],,WP_269998159,67.2897,1.7041E-31 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Shao, Sarah /note=Auto-annotation: Glimmer and GeneMark agree on the start site (43780), an ATG codon. /note=Coding Potential: Reasonable coding potential is predicted; the start site covers all coding potential in GeneMark Self and Host. /note=SD (Final) Score: -3.748. It is the best Final Score on PECAAN. /note=Gap/overlap: -26 bp. This is a fairly large overlap; other genes show a larger conserved overlap, but the overlap is generally 10 bp (Adolin, DrManhattan). /note=Phamerator: pham 116447. Date 10/10/23. It is conserved, found in many other phages including Adolin (AZ) and DrManhattan (AZ). 
56 phages are members of the pham; almost none show a function, and all have similar gene lengths (Soondubu is 348 bp, others in the pham are 350-360 bp). /note=Starterator: Starterator did not provide any helpful information to support a specific start site; none of the called start sites had any manual annotations. Start site 10 (43780) is relatively close to the most annotated start (start site 9, called in 24/40 non-draft genes, 100% of genes that have an annotated start), whereas the other 2 start site options are much further away. Date 10/10/23. /note=Location call: Based on the above evidence, this is a real gene with a likely start site at 43780. This start site is closest to the most annotated start site in Starterator. It leads to a gene length of 348 bp, which is similar to the other genes in the pham. Other start site options would result in gene lengths of 69 bp and 18 bp. /note=Function call: Function unknown (NKF). The top non-draft Phagesdb BLAST hits have low e-values [(Adolin, 1e-16), (DrManhattan, 1e-16)] and show function unknown. The top NCBI BLAST hits are also hypothetical proteins [(B2a2, coverage 89%, identity 54%, e-value 1.7e-31), (Pseudoarthrobacter siccitolerans, coverage 89%, identity 48%, e-value 9.1e-28)]. There were no hits from CDD. There were no significant HHpred hits (all hits have large e-values >10). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Gallagher, Hannah /note=Secondary Annotator QC: What phages is the gap conserved in? Everything else looks good, I agree with the primary annotator! 
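The GAP fields in these records are consistent with counting the bases strictly between one gene's stop coordinate and the next gene's start coordinate, with a negative value meaning an overlap. A minimal Python sketch under that assumption follows; the function names and the tuple layout are illustrative and are not part of PECAAN, Glimmer, or GeneMark. The coordinates are the ones listed for genes 58 through 60.

```python
# Sketch: gap/overlap between consecutive forward genes from their coordinates.
# Assumption: gap = next_start - prev_stop - 1, so a negative value is an overlap.

def gap_bp(prev_stop, next_start):
    """Bases between the previous stop and the next start (negative = overlap)."""
    return next_start - prev_stop - 1


def consecutive_gaps(cds_list):
    """Return (gene_name, gap) for every gene after the first in cds_list.

    cds_list holds (name, start, stop) tuples sorted by start coordinate.
    """
    return [
        (nxt[0], gap_bp(prev[2], nxt[1]))
        for prev, nxt in zip(cds_list, cds_list[1:])
    ]


# Coordinates copied from the CDS lines for genes 58, 59, and 60.
CDS = [
    ("gp58", 42991, 43305),
    ("gp59", 43289, 43648),
    ("gp60", 43780, 44127),
]

for name, gap in consecutive_gaps(CDS):
    label = "overlap" if gap < 0 else "gap"
    print(f"{name}: {gap} bp ({label})")
# gp59: -17 bp (overlap)
# gp60: 131 bp (gap)
```

Both values match the GAP fields reported for genes 59 and 60, which makes this a quick way to cross-check a gap stated in a note against the coordinates.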
CDS 44124 - 44414 /gene="61" /product="gp61" /function="hypothetical protein" /locus tag="Soondubu_61" /note=Original Glimmer call @bp 44124 has strength 6.43; Genemark calls start at 44124 /note=SSC: 44124-44414 CP: yes SCS: both ST: NI BLAST-Start: [HNH endonuclease [Arthrobacter phage Tbone] ],,NCBI, q10:s4 80.2083% 1.71028E-13 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.488, -3.790975366313636, yes F: hypothetical protein SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage Tbone] ],,YP_010677838,54.7619,1.71028E-13 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chang, Amanda /note=Auto-annotation: Glimmer and GeneMark both call the start of this gene at 44124. /note=Coding Potential: The coding potential of this ORF is only on the forward strand, making this a forward gene. The coding potential is high on GeneMark Host and Self. /note=SD (Final) Score: -3.791. This is the best Final Score on PECAAN. /note=Gap/overlap: -4 bp. An overlap of 4 bp is acceptable (overlaps should not exceed 7 bp). This overlap could indicate that the gene is part of an operon. /note=Phamerator: pham 117800 as of 10/8/23. This gene is conserved in VResidence (AZ) and VroomVroom (AZ). /note=Starterator: Start site 16 is the most manually annotated start, with 8 of 23 non-draft genes in this pham. Soondubu does not have start 16. Soondubu only has start candidates 14 and 27, neither of which was called in other manual annotations of this gene. Start site 14 is at 44124, which is the start site that GeneMark and Glimmer called for this gene. /note=Location call: Based on the evidence above, this is most likely a real gene and most likely starts at 44124. /note=Function call: No known function (NKF). The top 3 Phagesdb BLAST hits of non-draft genes in the same cluster (AZ) call this gene as having no known function with low e-values (<10^-15). There were no HHpred hits with acceptable e-values (all were >0.01). 
Two NCBI non-draft hits named this gene’s function as HNH endonuclease with low e-values (<10^-13); however, those hits were also labeled as hypothetical proteins. Another NCBI BLAST hit labels this gene as a hypothetical protein with a low e-value (<10^-13). There are no CDD hits. According to the SEA-PHAGES official function list and forum, in order for a gene to be called an HNH endonuclease, it “must have H-N-H over a 30 aa span.” This gene’s sequence does not contain an H-N-H sequence and therefore cannot be called an HNH endonuclease, contrary to what is seen in other phages. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Bonthala, Praneel /note=Secondary Annotator QC: I agree with the location and function calls! CDS 44779 - 45078 /gene="62" /product="gp62" /function="HNH endonuclease" /locus tag="Soondubu_62" /note=Original Glimmer call @bp 44779 has strength 4.55 /note=SSC: 44779-45078 CP: yes SCS: glimmer ST: SS BLAST-Start: [HNH endonuclease [Microbacterium phage IAmGroot]],,NCBI, q1:s1 98.9899% 7.47214E-47 GAP: 364 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.315, -2.0949393225970705, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Microbacterium phage IAmGroot]],,QDF14248,81.8182,7.47214E-47 SIF-HHPRED: HNH endonuclease; Thermophilic bacteriophage, HNH Endonuclease, DNA nicking, HYDROLASE; 1.52A {Geobacillus virus E2},,,5H0M_A,71.7172,97.6 SIF-Syn: Gene upstream is NKF. No gene downstream. There is synteny with Reedo's (AZ1) upstream gene in that they are both Pham 117800 and NKF. /note=Primary Annotator Name: Gallagher, Hannah /note=Auto-annotation: Glimmer only; a GeneMark call is not available on PECAAN. Glimmer calls the start site at 44779. This is an ATG start codon, which is highly probable. /note=Coding Potential: Coding potential is on the forward strand, with a small amount of coding potential on the reverse strand. 
This should still indicate that this is a forward gene. GeneMark Self and Host found coding potential. Start site 44779 covers the entire coding potential. /note=SD (Final) Score: Start 44779 has a Final Score of -2.095, which is the highest of the available start sites and suggests the presence of a credible ribosome binding site. The Z-score for this start is 3.315, which is greater than 2 and is a good score. /note=Gap/overlap: There is a 364 bp gap with the upstream gene, which is very large but conserved in phages Reedo (AZ1) and KeAlli (AZ1). An alternative start site would give a shorter gap of 226 bp, but that start site has a much lower Final Score and Z-score. At 300 bp, the gene is long enough to be considered a gene. /note=Phamerator: Pham 116021 as of 10/8/2023. The pham is conserved in phages belonging to the AZ1 cluster (Reedo, Phives, and Tuck), BD2, EH, BD3, BD6, etc. The phams database has the HNH endonuclease function called for most of the phages in this pham. This function is found in the approved function list. /note=Starterator: Start 147, which corresponds to 44779, is the most annotated start (509/926 non-draft genes) in the pham. This start is present in phages in clusters A1, A11, A12, A14, A15, A19, A2, A20, A3, A6, A9, AZ1, AZ2, AZ3, AZ4, BD2, BD3, BD4, BD6, and EH. /note=Location call: Based on the above evidence, this is a real gene, and the likely start site is 44779. /note=Function call: HNH endonuclease. The top non-draft Phagesdb BLAST hits have a function of HNH endonuclease ([Reedo, e-value of 2e-48], [Phives, 3e-48], [Tuck, 3e-48], [Amyev, 1e-47], etc.). The top HHPred hit (5H0M_A) is an HNH endonuclease with a probability of 97.6, coverage of 71.7172%, and e-value of 0.0012. The top non-draft phage NCBI BLAST hits have a function of HNH endonuclease ([IAmGroot, 74.75% identity, 81.81% aligned, 98.9899% coverage, and an e-value of 7.472e-47], [Dr.Sierra, 82% identity, 89% aligned, 100% coverage, and an e-value of 1.5e-46], etc.). 
CDD returned an HNH endonuclease hit with 35.08% identity, 45.6% alignment, 42.42% coverage, and an e-value of 4e-4. The SEA-PHAGES approved function list requires an H-N-H pattern across 30 aa, and this amino acid sequence has this pattern. The CDD hit is not the strongest evidence of an HNH endonuclease, but Phagesdb BLAST, NCBI BLAST, and HHPred are suggestive of this gene being an HNH endonuclease. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Chang, Amanda /note=Secondary Annotator QC: Well done! I agree with this location and function call.
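The function calls for genes 61 and 62 both rest on the SEA-PHAGES requirement quoted in the gene 61 notes, that an HNH endonuclease "must have H-N-H over a 30 aa span." One possible reading of that rule is sketched below in Python: a His, then an Asn, then a His, in that order, all inside a window of 30 residues. That interpretation, the function name, and the toy sequences are assumptions for illustration only; the actual gp61 and gp62 protein sequences are not reproduced here, and the official function list remains the authority on what counts as the motif.

```python
# Sketch: look for an ordered H ... N ... H within a fixed residue window.
# Assumption: "H-N-H over a 30 aa span" means the first H and the last H
# are at most 30 residues apart (inclusive), with an N between them.

def has_hnh(protein, max_span=30):
    """Return True if some H, later N, later H all fit in max_span residues."""
    seq = protein.upper()
    for i, residue in enumerate(seq):
        if residue != "H":
            continue
        window = seq[i:i + max_span]      # window begins at this His
        n_pos = window.find("N", 1)       # earliest Asn after the first His
        if n_pos != -1 and window.find("H", n_pos + 1) != -1:
            return True
    return False


# Toy sequences (hypothetical, not taken from Soondubu).
print(has_hnh("MKAHGGSNLLKHAA"))                 # H, N, H close together
print(has_hnh("H" + "A" * 40 + "N" + "AAH"))     # residues too far apart
```

A check like this only screens for the residue pattern. It does not replace the BLAST, HHpred, and CDD evidence cited in the function calls.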