CDS complement (348 - 1070) /gene="1" /product="gp1" /function="hypothetical protein" /locus tag="Phroglets_1" /note=Original Glimmer call @bp 1070 has strength 8.42; Genemark calls start at 1070 /note=SSC: 1070-348 CP: yes SCS: both ST: SS BLAST-Start: GAP: 444 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.997, -2.644643059745525, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gonzalez Hernandez, Yalit /note=Auto-annotation: Genemark and Glimmer Start: 1070 at ATG start codon. Glimmer score of 8.42 /note=Coding Potential: The gene has a reasonable coding potential predicted within the putative ORF and is prominently found in the Self-Trained Genemark, while the Host-trained Genemark doesn’t indicate strong, sustained peaks of coding potential. Downward tick observed near the called 348 bp and an upward tick observed near the 1070 bp. Coding potential found in the reverse strand only. /note=SD (Final) Score: -2.645 (acceptably low) , Z-score is higher than 2 (2.997) /note=Gap/overlap: 444; gap is very big, since this is the first gene in the genome, and Host-Trained and Self-Trained Genemarks don’t indicate a presence of any other high peaks that might indicate another gene in front. /note=Phamerator: The gene on 01/08/25 was found to be in pham number, 199094. The gene is not found in any other cluster and no function has been given. /note=Starterator: Starterator seems to state that the most conserved start site for the gene is 3, which is found in 2 of 2 genes in the pham. No manual annotations of this start, but for seen as (Phroglets_1 Start: 1070; stop 348, Start Num: 3, (3, 1070)). Also, the length of the gene is 723 bp. /note=Location call: Yes, my notes show that this is a real gene and that it has good coding potential at the start site 3: at around 1070 bp. This start site covers the coding potential for this gene. 
Starterator was not too useful in confirming that this would be the correct “start site”, since Phroglets is the only phage genome with that specific start site, but it aligns with the start site that Genemark and Glimmer have called. /note=Function call: The predicted function of the gene remains uncharacterized (NKF) based on the results from multiple prediction tools. In NCBI BLASTp, the only hit, with an E-value of 0.022, is to a histone-like nucleoid-structuring protein Lsr2 with a low sequence identity of 42.44%. In Phagesdb BLASTp, there’s a hit with an E-value of 0.005 to a capsid maturation protease with a low 26% sequence identity. CDD showed no significant results, but HHpred suggests that it’s a Protein Lsr2, DNA-binding protein, with a 90.45% probability and an E-value of 0.73. HHpred also indicated that it has many coiled-coil segments, which help facilitate protein-protein interactions and allow protein domains to interlock. This information can support the call for it being a histone-like / DNA-binding protein, as suggested by HHpred and NCBI BLASTp. /note=Transmembrane domains: No transmembrane domains detected, but it’s seen to be an inside, globular protein, which is in line with the prediction of it being a histone-like / nucleoid-structuring protein. /note=Secondary Annotator Name: Baugh, Alex /note=Secondary Annotator QC: I agree with location and function call. /note=Primary Annotator Review: I agree with overall location and function call of the gene being NKF. 
- YGH CDS 1515 - 1994 /gene="2" /product="gp2" /function="hypothetical protein" /locus tag="Phroglets_2" /note=Original Glimmer call @bp 1515 has strength 9.08; Genemark calls start at 1515 /note=SSC: 1515-1994 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein GURGLEFERB_3 [Arthrobacter phage GurgleFerb] ],,NCBI, q5:s10 95.5975% 4.41705E-9 GAP: 444 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.812, -3.3136498407041004, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein GURGLEFERB_3 [Arthrobacter phage GurgleFerb] ],,ASZ73158,45.2381,4.41705E-9 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gonzalez Hernandez, Yalit /note=Auto-annotation: GeneMark and Glimmer start at 1515 with a glimmer score of 9.08 and a start codon of ATG /note=Coding Potential: Coding potential only observed in the Self-Trained Genemark in the forward direction, upward tick observed at the 1515 start and a downward tick observed at 1994 bp with good coding peaks encapsulated in the start and stop. In the Host-Trained Genemark, upward and downward ticks are observed in the same places (1515 - 1994 bp), but no coding peaks seen inside (at all). /note=SD (Final) Score: -3.314 (acceptable); Z-score (2.812) is above 2 /note=Gap/overlap: 444; gap is really big, but no coding potential or good start and stop ticks seen in between the gene before it and this gene. Also, other predicted start sites have longer gaps. Length of the gene is 480 bp. /note=Phamerator: The gene on 01/12/2025 was found to be in pham number 199789. The gene is the only gene in pham 199789 and it has no known function. /note=Starterator: There’s no Starterator report (01/13/2025) for it, so it’s not really helpful for determining if 1515 is the correct start site for this gene. But this gene seems to be unique to only Phroglets, and much still needs to be determined. /note=Location call: Yes, my gene is a real gene. 
Evidence supports that position 1515 is the start site, with confirmation from both the Self-Trained Genemark and the Genemark and Glimmer scores. However, the absence of a Starterator and the lack of coding potential observed in the Host-Trained Genemark make it difficult to definitively confirm. The gap between this gene and the preceding one is justifiable, as a 50 bp or greater separation is required whenever there is a switch from the reverse to forward direction, which is the case here, to accommodate space for promoters. /note=Function call: NKF; In both Phagesdb and NCBI BLASTp, there are no significant matches for the gene’s function. The closest match is a helix-turn-helix transcriptional regulator, with a 43.48% identity and an E-value of 0.024 (NCBI BLAST). However, this gene is associated with hypothetical proteins in GurgleFerb_3 (E-value 2e-10), Nellie_3 (E-value 2e-10), Adat_3 (E-value 2e-10), and several others, all belonging to phage cluster AV. CDD analysis reveals a single domain hit to the XRE superfamily, which is involved in DNA-binding transcriptional regulation, with a favorable E-value of 2.86e-05. Additionally, HHpred provides two notable hits. The first is a Methyl Phosphate Synthase (Iron Oxidoreductase), with an E-value of 7.5e-9, a high probability of 99.15%, and a score of 79.36, though this does not align with other findings. The second hit is a DNA-binding repressor protein, with an E-value of 6.4e-7, a 98.89% probability, and a score of 60.84, which aligns more closely with the gene’s potential role as a transcriptional regulator. Based on these results, the function of this gene remains uncharacterized, and further evidence is required to confirm any potential function. /note=Transmembrane domains: The DeepTMHMM results indicate that the protein lacks transmembrane domains and is an inside protein that is globular, which falls in line with it being a transcription repressor. 
/note=Secondary Annotator Name: Baugh, Alex /note=Secondary Annotator QC: I agree with location and function call. /note=Primary Annotator Review: I agree with overall location and function call of the gene being NKF. - YGH CDS 1987 - 2640 /gene="3" /product="gp3" /function="hypothetical protein" /locus tag="Phroglets_3" /note=Original Glimmer call @bp 1987 has strength 12.73; Genemark calls start at 1987 /note=SSC: 1987-2640 CP: yes SCS: both ST: SS BLAST-Start: GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.971, -4.841703219753294, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Gonzalez Hernandez, Yalit /note=Auto-annotation: GeneMark and Glimmer start at 1987 with a glimmer score of 12.73 and a start codon of ATG /note=Coding Potential: Coding potential only observed in the Self-Trained Genemark in the forward direction, upward tick observed at the 1987 start and a downward tick observed at 2640 bp with good coding peaks encapsulated in the start and stop. In the Host-Trained Genemark, upward and downward ticks are observed in the same places (1987 - 2640 bp), but no coding peaks seen inside (at all). /note=SD (Final) Score: -4.842 (a bit high, but one of the lowest; acceptable score); Z-score is less than 2, but it is close enough to be acceptable (1.971). Other starts seem to have too large of a gap or don’t encompass all the coding potential. /note=Gap/overlap: -8; not too much overlap between the last gene and the next, so acceptable. Length of the gene is 654 bp. /note=Phamerator: The gene on 01/13/2025 was found to be in pham number 199420. There’s currently no other gene that is part of the pham, and it has no known function. /note=Starterator: There’s no Starterator report (01/13/2025) for it, so it’s not really helpful for determining if 1987 is the correct start site for this gene. But this gene seems to be unique to only Phroglets, and much still needs to be determined. 
/note=Location call: Yes, my gene is a real gene with a start site of 1987. Good coding potential found in the Self-Trained Genemark within this start, and other factors such as gap, SD and Z score are acceptable and within range compared to other ones. /note=Function call: NKF; In Phagesdb BLAST, there are many matches, each with an E-value of 0.064, to tape measure proteins (e.g. WideWale_28, Updawg_28, Equmioh13_28). HHpred doesn’t have any significant hits, with E-values being too high. No results show up for CDD or NCBI BLASTp. Not enough information to make a call on the function. /note=Transmembrane domains: The DeepTMHMM results suggest that the protein contains one alpha transmembrane domain, with the majority of the protein located inside the membrane and a small portion extending outside near the end of the sequence. While tape measure proteins are typically not classified as true transmembrane proteins, as they do not completely span the membrane, they still interact with the membrane. In this case, the protein likely plays a role in pore formation, facilitating the injection of viral DNA into the bacterial cell. /note=Secondary Annotator Name: Baugh, Alex /note=Secondary Annotator QC: I agree with location and function call. /note=Primary Annotator Review: I agree with overall location and function call of the gene being NKF. 
- YGH CDS 2625 - 3494 /gene="4" /product="gp4" /function="helix-turn-helix DNA binding domain" /locus tag="Phroglets_4" /note=Original Glimmer call @bp 2625 has strength 11.02; Genemark calls start at 2625 /note=SSC: 2625-3494 CP: yes SCS: both ST: SS BLAST-Start: [helix-turn-helix DNA binding domain protein [Microbacterium phage Shocker] ],,NCBI, q124:s65 54.3253% 7.51283E-9 GAP: -16 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.619, -3.441249337670496, no F: helix-turn-helix DNA binding domain SIF-BLAST: ,,[helix-turn-helix DNA binding domain protein [Microbacterium phage Shocker] ],,YP_010755422,32.9114,7.51283E-9 SIF-HHPRED: HTH_Tnp_1_2 ; Helix-turn-helix of insertion element transposase,,,PF13022.9,44.9827,97.8 SIF-Syn: helix-turn-helix DNA binding domain, downstream gene is NFK, and upstream gene is a cyclic nucleotide sequestering protein /note=Primary Annotator Name: Gonzalez Hernandez, Yalit /note=Auto-annotation: Genemark and Glimmer Start: 2625 with a glimmer score 11.02 with a start codon GTG. /note=Coding Potential: Coding potential only observed in the Self-Trained Genemark in the forward direction, upward tick observed in 2625 start and a downwards tick observed at 3494 bp with good coding peaks encapsulated in the start and stop. In the Host-Trained Genemark, upward and downward ticks are observed in the same places (2625 - 3494 bp), but no coding peaks seen inside (at all). /note=SD (Final) Score: -3.441 (lowest score observed, acceptable); Z-score (2.619) is above 2, which is good. /note=Gap/overlap: -16; overlap is a bit big, but acceptable. Other suggested start sites have a larger gap and don’t seem to encompass the whole coding potential peaks seen in the self-trained Genemark. Length of the gene is 870 bp. /note=Phamerator: The gene on 01/13/25 was found to be in pham number- 199365. There’s currently no other gene that is part of the pham, and it has no known function. 
/note=Starterator: There’s no starterator (01/13/2025) for it, so it’s not really helpful for determining if 2625 is the correct start site for this gene. But this gene seems to be unique to only Phroglets, and much still needs to be determined. /note=Location call: Yes, my gene is a real gene with a start site of 2625. Good coding potential found in the Self-Trained Genemark within this start, and other factors such as gap, SD (final) score, and Z-score are acceptable and within range compared to other ones. /note=Function call: The gene in question encodes a protein with a helix-turn-helix DNA-binding domain. It shows a significant hit in Phagesdb with an E-value of 1e-12. This protein is associated with hypothetical or unknown proteins in various phages, including GurgleFerb_4 (E-value 1e-19), Nellie_4 (E-value 1e-19), Adat_4 (E-value 1e-19), and others within phage cluster AV. The closest match in NCBI BLASTp is a helix-turn-helix protein, which shares 25% identity, 50% query coverage, and an E-value of 0.003. No conserved domains are identified in CDD. HHpred identifies two notable hits: one with a phage structural protein (2AO9_8), showing a 99.36% probability, an E-value of 6.7e-11, and a score of 99.94, and another with a helix-turn-helix domain from an insertion element transposase (PF13022.11), showing a 99.31% probability, an E-value of 1.8e-10, and a score of 96.98. The second hit aligns more closely with the helix-turn-helix protein prediction from NCBI BLASTp. /note=Transmembrane domains: The DeepTMHMM results suggest that the protein does not contain transmembrane domains and is an intracellular, globular protein. Indeed, a helix-turn-helix (HTH) DNA-binding domain is classified as globular, as it consists of a compact, folded structure formed by several key secondary elements (typically three alpha helices) arranged in a specific manner to create a defined three-dimensional shape. 
While the HTH is a motif, when integrated with other structural components, it contributes to the formation of a globular domain within the protein. /note=Secondary Annotator Name: Baugh, Alex /note=Secondary Annotator QC: I agree with location and function call. Make sure to fill out synteny box. /note=Primary Annotator Review: I agree with overall location and function call of the gene being a Helix-turn-helix DNA binding protein. Filled out the synteny box. - YGH CDS 3497 - 3736 /gene="5" /product="gp5" /function="cyclic nucleotide sequestering protein" /locus tag="Phroglets_5" /note=Original Glimmer call @bp 3497 has strength 9.46; Genemark calls start at 3497 /note=SSC: 3497-3736 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Nitrososphaera sp.]],,NCBI, q14:s13 82.2785% 4.1506E-10 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.378, -4.001086723917044, yes F: cyclic nucleotide sequestering protein SIF-BLAST: ,,[hypothetical protein [Nitrososphaera sp.]],,MCI0563110,51.8519,4.1506E-10 SIF-HHPRED: Vs.4; Binds 3', 3'-cGAMP, VIRAL PROTEIN; HET: 4BW; 2.0A {Tequatrovirus},,,7UQ2_A,79.7468,99.5 SIF-Syn: cyclic nucleotide sequestering protein, downstream gene is a helix-turn-helix DNA binding protein, and upstream is NKF /note=Primary Annotator Name: Gonzalez Hernandez, Yalit /note=Auto-annotation: Genemark and Glimmer Start: 3497 with a glimmer score of 9.46 and a start codon of GTG /note=Coding Potential: Coding potential only observed in the Self-Trained Genemark in the forward direction, upward tick observed at the 3497 start and a downward tick observed at 3736 bp with good coding peaks encapsulated in the start and stop. In the Host-Trained Genemark, upward and downward ticks are observed in the same places (3497 - 3736 bp), but no coding peaks seen inside (at all). /note=SD (Final) Score: -4.001 (lowest score observed, acceptable). Z-score (2.378) is above 2 and deemed acceptable. /note=Gap/overlap: 2; gap is minuscule and acceptable. 
Length of the gene is 240 bp. /note=Phamerator: The gene identified on 01/13/25 is part of pham number 199963, which contains a total of 251 members. The pham includes subclusters related to the A cluster (2, 5, 15, 9, 20), Cluster DQ, and Cluster E, among others. Three members have been annotated with specific functions: Balomaji_102, which is classified as a cyclic nucleotide inhibitor from Cluster E, 198 bp in length; StellaBean_101, which is a cyclic nucleotide sequestering protein from Cluster E, also 198 bp; and Sunfish_10, a cyclic nucleotide sequestering protein, with no assigned cluster and a length of 246 bp. These functions are consistent with those called in the SEA-PHAGES database and are accepted as valid. /note=Starterator: Starterator was run on 01/10/25; the pham has 251 members with 37 drafts. The start number with the most published annotations is 70, called in 91 of the 214 non-draft genes in the pham. Start 55 is called in Phroglets_5 and is found in 1 of 251 genes with no manual annotations (55, 3497). Phroglets does not have the 70 start site, and other start candidates don’t encompass all the coding potential. /note=Location call: Yes, my notes confirm that this is a valid gene with strong coding potential at the start site, which is located at position 3,497 bp (start 55). Starterator provided some assistance in pinpointing the start site for Phroglets, although it is important to note that no other genome shares this specific start site; Phroglets is a singleton and does not belong to any cluster. This start site is well-supported by the gene’s coding potential, and additional factors such as the gap, SD (final) score, and Z-score all fall within acceptable ranges. /note=Function call: Cyclic Nucleotide Sequestering Protein: In Phagesdb BLAST, a significant match was found with Barebow_4, identified as a cyclic nucleotide sequestering protein, with an E-value of 1e-04. 
However, NCBI BLAST did not provide any functional annotations for the hits, listing them only as hypothetical proteins. No conserved domains were detected in CDD. On the other hand, HHpred identified two notable hits: one to a viral protein from Tequatrovirus (7UQ2_A), which binds 3',3'-cGAMP, with a 99.75% probability and an E-value of 1.3e-17, and another to a viral protein from Pseudomonas phage PaP2 (8H39_C), identified as an inhibitor complex, with a 99.59% probability and an E-value of 9.8e-15. These HHpred hits, along with other evidence, were used to classify Balomoji_102 as a cyclic nucleotide sequestering protein. For further details, see the document: https://docs.google.com/document/d/1fSkgaFkh0M1hqmrTow60vIwDzzEEmHhOm2yhtxau9a8/edit?tab=t.0 Additional modeling using AlphaFold predicts that it’s a homohexamer. /note=Transmembrane domains: The DeepTMHMM results indicate that the protein lacks transmembrane domains and is an inside protein that is globular; this doesn’t contradict the claim of it being a cyclic nucleotide sequestering protein, but doesn’t add to it. /note=Secondary Annotator Name: Baugh, Alex /note=Secondary Annotator QC: I agree with location and function call. Make sure to fill out the synteny box. /note=Primary Annotator Review: I agree with overall location and function call of the gene being a cyclic nucleotide sequestering protein. 
Synteny Box was filled out. - YGH CDS 3739 - 4023 /gene="6" /product="gp6" /function="hypothetical protein" /locus tag="Phroglets_6" /note=Original Glimmer call @bp 3739 has strength 8.61; Genemark calls start at 3736 /note=SSC: 3739-4023 CP: no SCS: both-gl ST: NI BLAST-Start: [DUF6275 family protein [Paenarthrobacter nicotinovorans] ],,NCBI, q2:s3 87.234% 6.12191E-25 GAP: 2 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.49, -3.628472733356124, yes F: hypothetical protein SIF-BLAST: ,,[DUF6275 family protein [Paenarthrobacter nicotinovorans] ],,WP_400456318,76.8293,6.12191E-25 SIF-HHPRED: DUF6275 ; Family of unknown function (DUF6275),,,PF19791.2,82.9787,100.0 SIF-Syn: /note=Primary Annotator Name: Pimentel, Patricia /note=Auto-annotation: Glimmer calls the gene to start at 3739, GeneMark calls the gene to start at 3736. /note=Coding Potential: In the 3700-4200 range, there are no peaks present, therefore no conclusion can be reached on the coding potential based on this region alone. Downstream of this region, in the nucleotide range 5200-6000, there is some good coding potential, as when comparing the direct and complementary sequences, there are two peaks that do not overlap. However, there are also some peaks present upstream of this gene in the range 400-1000, which overlap between direct and complementary sequences in this region, suggesting the presence of bad coding potential. Coding potential is inconclusive at this point in analysis. Q/C recommended. There is some synteny present with Sunfish (gene 5 Phroglets, gene 10 Sunfish). /note=SD (Final) Score: The Z-score is 2.49 and the final score is -3.628 for the gene starting at 3739. The Z-score is above the 1.8 threshold and supports the starting call site of 3739. /note=Gap/overlap: The gap for this gene is +2 which is good since it is below the 50 threshold limit. Supports call site of 3739 /note=Phamerator: As of 11/11/2025, the pham is 182449. 
This pham has 43 members, 18 of which are drafts, which are in different clusters. There are 6 clusters represented in this pham. /note=Starterator: The auto-annotated start site is start 20, which corresponds to the position of 3739. There is no “most-annotated start site.” /note=Location call: Starts at 3739. Based on the evidence, my gene is a real gene as it has some synteny and has good coding potential. 3739 seems like the most likely location call start site as it has the best Z and final score. /note=Function call: unknown function call. /note=Transmembrane domains: There are 0 transmembrane domains. /note=Secondary Annotator Name: Iasi, Matilde /note=Secondary Annotator QC: Regarding coding potential, I agree on the inconclusive call, as there are no observable peaks in this region. I see a conserved gene in Cantare, which is not at the identical position but is of similar length and surrounded by similar genes. I agree on the gap but the lack of coding potential could suggest ambiguities when determining if this is a real gene. Z-score and final score however are in an optimal range suggesting this is a real gene but with some coding potential. I agree on the start site being called at 3739, due to the gap length, the optimal score values and the synteny with other phages’ genomes. No function can be established with the present information. Low BLAST hits and no HHpred hits. /note=Primary Annotator’s Response to Secondary: Comments reviewed and corrections reflected. I agree with the secondary’s comments. 
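The gap/overlap and length values quoted for genes 2 through 6 can be recomputed from the CDS coordinates. The sketch below is illustrative only: the formulas (gap = start - previous stop - 1, length = stop - start + 1) are an assumption inferred from the fact that they reproduce the values reported in these notes, and the helper names are hypothetical rather than part of any annotation tool.

```python
from typing import Dict, List, Tuple

# (gene number, start, stop) for forward-strand genes 2-6,
# copied from the CDS coordinates in the records above.
GENES: List[Tuple[int, int, int]] = [
    (2, 1515, 1994),
    (3, 1987, 2640),
    (4, 2625, 3494),
    (5, 3497, 3736),
    (6, 3739, 4023),
]


def gene_length(start: int, stop: int) -> int:
    """Length in bp, counting both end coordinates (works for either strand)."""
    return abs(stop - start) + 1


def gap_to_previous(prev_stop: int, start: int) -> int:
    """Bases between two forward genes; a negative value means overlap.

    Assumed convention: adjacent genes with no bases between them give 0.
    """
    return start - prev_stop - 1


def gaps(genes: List[Tuple[int, int, int]]) -> Dict[int, int]:
    """Map each gene number (except the first) to its gap from the gene before it."""
    result: Dict[int, int] = {}
    for (_, _, prev_stop), (num, start, _) in zip(genes, genes[1:]):
        result[num] = gap_to_previous(prev_stop, start)
    return result


if __name__ == "__main__":
    for num, start, stop in GENES:
        print(f"gene {num}: length {gene_length(start, stop)} bp")
    for num, gap in gaps(GENES).items():
        print(f"gene {num}: gap {gap} bp")
```

Under these assumptions the recomputed gaps for genes 3, 4, 5, and 6 match the GAP fields in their summary notes, and the same length formula applied to the complement gene 1 (1070 down to 348) gives the 723 bp stated for it.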
CDS 4016 - 4558 /gene="7" /product="gp7" /function="minor tail protein" /locus tag="Phroglets_7" /note=Original Glimmer call @bp 4016 has strength 13.73; Genemark calls start at 4016 /note=SSC: 4016-4558 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein FDI47_gp05 [Arthrobacter phage Adat] ],,NCBI, q2:s6 98.3333% 7.60135E-34 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.806, -2.9754816289569264, yes F: minor tail protein SIF-BLAST: ,,[hypothetical protein FDI47_gp05 [Arthrobacter phage Adat] ],,YP_009613229,57.4586,7.60135E-34 SIF-HHPRED: CD70 antigen; Complex, TNF, Costimulation, Trimer, IMMUNE SYSTEM; HET: MAN, BMA, NAG, TRS, FUC; 2.69A {Homo sapiens},,,7KX0_A,77.2222,97.7 SIF-Syn: Minor tail protein: Allows virus entry into cells, localisation of viral components to the nucleus, in DNA binding, capsid formation and stability. /note=Primary Annotator Name: Pimentel, Patricia /note=Auto-annotation: Both Glimmer and GeneMark call the gene to start at 4016. /note=Coding Potential: There is good coding potential in this gene, as there are two peaks present in the direct sequence region which do not appear and therefore do not overlap in the complementary sequence within the 5200-6000 range. /note=SD (Final) Score: The Z-score is 2.806 while the final score is -2.975 for the call site of 4016. This score is optimal and above the 1.8 threshold. /note=Gap/overlap: There is an overlap of 8. There may be coding potential present within this gap, which requires further investigation to confirm. /note=Phamerator: As of 11/10/2025, the pham is 88246. This pham has 7 members, 1 of which are drafts. There are 6 genes that call start number 2 the most annotated start site which translates to 87.5% of the genes in pham. /note=Starterator: The auto-annotated start site is start number 1, which corresponds to the start position of 4016. There is no “Most annotated” start site. /note=Location call: Starts at 4016. Based on the evidence, my gene is a real gene. 
There is synteny present between genes 8 and 9 from Phroglets and 16 and 17 from Shocker Phage. /note=Function call: There are hits present on Phagesdb BLAST, HHPred, and NCBI with high probability and coverage that suggest this gene has the function of a minor tail protein. HHpred offers insightful elaboration on this gene’s function, as there is evidence of multiple hits present in immune system stimulations which could be investigated further when developing a research proposal that utilizes the unique function of this gene to inform pharmaceutical and/or ecological applications. /note=Transmembrane domains: According to DeepTMHMM, there are 0 total transmembrane domains. /note=Secondary Annotator Name: Iasi, Matilde /note=Secondary Annotator QC: I have reviewed the information and based on the optimal scores, the good coding potential and the PhagesDB and NCBI BLAST hits, I agree that this gene encodes the minor tail protein with the start site at 4016. /note=Primary Annotator`s Response to Secondary Annotator: I agree with the secondary`s comments and have reflected all comments. CDS 4561 - 4836 /gene="8" /product="gp8" /function="hypothetical protein" /locus tag="Phroglets_8" /note=Original Glimmer call @bp 4561 has strength 5.97; Genemark calls start at 4561 /note=SSC: 4561-4836 CP: no SCS: both ST: NI BLAST-Start: [hypothetical protein FDI47_gp07 [Arthrobacter phage Adat] ],,NCBI, q1:s1 100.0% 2.19002E-24 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.883, -2.895726594994071, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDI47_gp07 [Arthrobacter phage Adat] ],,YP_009613231,71.4286,2.19002E-24 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Pimentel, Patricia /note=Auto-annotation: Both Glimmer and GenMark call the gene to start at 4561. /note=Coding Potential: There is good coding potential present in the forward direction of the ORF, as supported by the pham maps and host-trained genemark. 
/note=SD (Final) Score: The Z-score is 2.883 while the final score is -2.896. The Z-score is above the 1.8 threshold which indicates that this is a real gene. /note=Gap/overlap: There is no data provided for the gap in this gene, which is concerning and requires further investigation. /note=Phamerator: As of 01/10/2025, the pham is 6815. This pham has 8 members, 1 of which are drafts. /note=Starterator: The auto-annotated start site is found at start number 4. There is no “most-annotated” start site in this gene. /note=Location call: Starts at 4561, if it were included and considered a real gene. /note=Function call: There are a few hits on HHPred suggestive of minor tail protein. NCBI blast also indicates the presence of hypothetical protein function in sequences with high percentage identity, coverage, and alignments. /note=Transmembrane domains: There are 0 transmembrane present in this gene, according to DeepTMHMM. /note=Secondary Annotator Name: Iasi, Matilde /note=Secondary Annotator QC: After reviewing the information, I agree on the unknown function call, as there are no significant hits on NCBI BLAST and HHPred which suggest a potential function. I am confused about the lack of gap information, but I still think this is a real gene with the start site at 4561. I am not sure why this gene is not included. /note=Primary Annotator`s response to Secondary Annotator: I would assert that the gene is a minor tail protein due to the provided evidence listed above such as phamerator, phages db, and hhpred. 
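The summary note that opens each record (the one beginning with SSC:) carries the coordinates, the gap, and the RBS scores in a single string, so those values can be read directly from it. The following is a minimal sketch using the gene 8 summary note as input. The field interpretation is an assumption based on how the notes themselves quote the numbers (third and fourth RBS values as Z-score and final score); the meaning of the trailing yes/no RBS flag is not stated in these notes, and `parse_summary` is a hypothetical helper, not a PECAAN function.

```python
import re
from typing import Dict, Optional, Union

# Summary note for gene 8, copied from its record.
GENE8_NOTE = (
    "SSC: 4561-4836 CP: no SCS: both ST: NI BLAST-Start: "
    "[hypothetical protein FDI47_gp07 [Arthrobacter phage Adat] ],,NCBI, "
    "q1:s1 100.0% 2.19002E-24 GAP: 2 bp gap LO: yes RBS: Kibler 6, "
    "Karlin Medium, 2.883, -2.895726594994071, yes F: hypothetical protein"
)

Value = Optional[Union[int, float, str]]


def parse_summary(note: str) -> Dict[str, Value]:
    """Extract start, stop, gap, Z-score, and final score from a summary note.

    Fields that cannot be found are returned as None.
    """
    ssc = re.search(r"SSC:\s*(\d+)-(\d+)", note)
    gap = re.search(r"GAP:\s*(-?\d+)\s*bp", note)
    # Assumed layout: "RBS: <matrix>, <spacing>, <z-score>, <final score>, <flag>"
    rbs = re.search(
        r"RBS:\s*[^,]+,\s*[^,]+,\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?),\s*(yes|no)",
        note,
    )
    return {
        "start": int(ssc.group(1)) if ssc else None,
        "stop": int(ssc.group(2)) if ssc else None,
        "gap": int(gap.group(1)) if gap else None,
        "z_score": float(rbs.group(1)) if rbs else None,
        "final_score": float(rbs.group(2)) if rbs else None,
        "rbs_flag": rbs.group(3) if rbs else None,
    }


if __name__ == "__main__":
    for key, value in parse_summary(GENE8_NOTE).items():
        print(f"{key}: {value}")
```

For gene 8 this reads a GAP field of 2 bp from the summary note, alongside the Z-score and final score quoted in the SD note.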
CDS 4845 - 6437 /gene="9" /product="gp9" /function="terminase" /locus tag="Phroglets_9" /note=Original Glimmer call @bp 4845 has strength 9.5; Genemark calls start at 4845 /note=SSC: 4845-6437 CP: no SCS: both ST: NI BLAST-Start: [terminase large subunit domain-containing protein [Streptomyces rochei] ],,NCBI, q1:s14 98.8679% 6.86402E-113 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.402, -4.6382689799084815, no F: terminase SIF-BLAST: ,,[terminase large subunit domain-containing protein [Streptomyces rochei] ],,WP_386783807,54.5627,6.86402E-113 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,83.0189,99.9 SIF-Syn: /note=Primary Annotator Name: Pimentel, Patricia /note=Auto-annotation: Both Glimmer and GeneMark call the gene to start at 4845. /note=Coding Potential: There is some coding potential present within the forward strand of the direct sequence, however since this is a singleton there are no complementary sequences in the other strand. /note=SD (Final) Score: The Z-score of this gene is 2.402 and the final score is -4.638. The Z-score is above the 1.8 threshold. /note=Gap/overlap: There is a large of 286, however when looking at the Host-Trained Genemark, there are little to no hits present between the gaps which suggests that this may not be a real gene. /note=Phamerator: As of 01/13/2025, the pham is 7230 which has 8 members, 1 of which is a draft. There are two clusters represented in this pham: singleton, AV. /note=Starterator: The start number called the most often in the annotations is 2, genes /note=Location call: The gene starts at 4845. /note=Function call: terminase, as supported by the start call at 4845 and Phagesdb BLAST results which display high e value scores that correlate with terminase function and synteny with other genes who contain the same function. 
HHPRED also displays several hits for large terminanse subunits with high % coverage. NCBI blast also reports several hits for terminase protein, evident through the high % coverage and small gaps which are indicative of the accuracy and reliability of the function call. /note=Transmembrane domains: /note=Secondary Annotator Name: Iasi, Matilde /note=Secondary Annotator QC: I have reviewed the information and agree with the function and location call oof the first annotator. CDS 6452 - 7240 /gene="10" /product="gp10" /function="NFK" /locus tag="Phroglets_10" /note=Original Glimmer call @bp 6452 has strength 10.21; Genemark calls start at 6452 /note=SSC: 6452-7240 CP: no SCS: both ST: NI BLAST-Start: GAP: 14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.49, -4.678551597699442, yes F: NFK SIF-BLAST: SIF-HHPRED: BssS ; BssS protein family,,,PF13991.9,24.0458,30.4 SIF-Syn: /note=Primary Annotator Name: Pimentel, Patricia /note=Auto-annotation: Both Glimmer and GeneMark call the gene to start at 6452. /note=Coding Potential: There is some coding potential present within the forward strand of the direct sequence, however since this is a singleton there are no complementary sequences in the other strand. /note=SD (Final) Score: The Z-score of this gene is 2.49 and the final score is -4.679. The Z-score is above the 1.8 threshold, which suggests that this is a real gene. /note=Gap/overlap: There is a small gap of 14 which is reputable since it is below the 50 threshold limit. Supports call site of 6462. /note=Phamerator: This is an orpham, therefore no sequence can be generated. There is 1 member present and no clusters present. /note=Starterator: This is an orpham, therefore no sequence can be generated. /note=Function call: NKF due to the noncontributory sequence data provided. There is synteny present across Jasmine, Adat, Brad however the function calls are NKF. 
HHPRED shows some significant hits for BssS protein family as well as for a subunit of N-acylethanolamine-hydrolyzing acid amidase; however, the evidence is not supported by other calls from other programs, and the % probability and coverage are both too low. /note=Transmembrane domains: /note=Secondary Annotator Name: Iasi, Matilde /note=Secondary Annotator QC: I have reviewed the primary annotator's comments and I agree on their location and function calls. CDS 7306 - 9471 /gene="11" /product="gp11" /function="portal protein" /locus tag="Phroglets_11" /note=Original Glimmer call @bp 7306 has strength 8.12; Genemark calls start at 7306 /note=SSC: 7306-9471 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Arthrobacter phage Jasmine] ],,NCBI, q54:s39 92.3717% 0.0 GAP: 65 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.727, -3.139452255336754, no F: portal protein SIF-BLAST: ,,[portal protein [Arthrobacter phage Jasmine] ],,YP_009594303,61.5922,0.0 SIF-HHPRED: Portal protein; Complex, VIRAL PROTEIN; 3.8A {Ralstonia phage GP4},,,8JOV_r,71.2899,100.0 SIF-Syn: Phroglets shares possible functions with the related phages Jasmine_15 and Casserole_13. Both of these are non-draft phages and have the function of a portal protein. The two best hits on HHpred also call a portal protein function. For Jasmine and Casserole phages, NCBI BLAST calls the function as a portal protein as well. CDD did not provide any significant hits. /note=Primary Annotator Name: Khachaturov, Allyson /note=Auto-annotation: Both Glimmer and GeneMark declare the start site as 7306. GeneMarkS has a consistent open reading frame from the start site to the stop. PECAAN also calls the start site to be at 7306 with a start codon of ATG. Since Phroglets is a singleton, it does not share any gene alignment on PhamMaps. Overall, this gene is a real gene due to the agreement on the start site on Glimmer, GeneMark, and PhagesDB BLAST. 
/note=Coding Potential: High coding potential due to the agreement of the start site on Glimmer, GeneMark, and PhagesDB BLAST. /note=SD (Final) Score: -3.139. Falls within the range needed to identify a viable start site. Shows more evidence towards this gene being a real gene. /note=Gap/overlap: 65. This gene has a gap of 65 base pairs from the upstream gene, meaning that this start site is viable. This information is consistent with the information presented on GeneMarkS. Furthermore, the Z-score, which is 2.727, falls within the range for a viable start site. /note=Phamerator: This gene has a pham number of 7115. There are 8 other phages within this pham. 6 of them are in cluster AV, while 2 of them (including Phroglets) are considered singletons. /note=Starterator: According to Starterator, Phroglets does not share the same start as any other member of its pham. This indicates that the gene may not be conserved and lacks synteny. This evidence is consistent with the fact that Phroglets is a singleton. /note=Location call: Overall, based on the information above, I would agree with the location call at 7306. /note=Function call: According to PhagesDB BLAST, Phroglets shares possible functions with the related phages Jasmine_15 and Casserole_13. Both of these are non-draft phages and have the function of a portal protein. The two best hits on HHpred also call a portal protein function. For Jasmine and Casserole phages, NCBI BLAST calls the function as a portal protein as well. CDD did not provide any significant hits. Based on all the evidence provided, I would call the function as a portal protein. /note=Transmembrane domains: DeepTMHMM states that this gene's protein sequence does not yield any transmembrane domains (TMRs=0). This is consistent with the other data due to the fact that portal proteins are responsible for packaging viral DNA. 
/note=Secondary Annotator Name: Mathew, Taniya /note=Secondary Annotator QC: I have QC'ed this location and function call and agree with the primary annotator. Remember to fill out the synteny box for this gene though! /note=-Comments addressed: AK CDS 9527 - 10417 /gene="12" /product="gp12" /function="hypothetical protein" /locus tag="Phroglets_12" /note=Original Glimmer call @bp 9527 has strength 8.49; Genemark calls start at 9527 /note=SSC: 9527-10417 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Streptomyces griseosporeus] ],,NCBI, q1:s1 97.973% 3.44119E-22 GAP: 55 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.848, -5.095062958286978, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces griseosporeus] ],,WP_190221170,51.9231,3.44119E-22 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Khachaturov, Allyson /note=Auto-annotation: Both Glimmer and GeneMark agree on start site at 9527 /note=Coding Potential: GeneMarkS shows consistent coding potential throughout the open reading frame. There is little visible gapping/overlap. This gene does not share significant synteny with other phages; however, it has slight overlap with the phage Attoomi, which is also a singleton. This phage is not mentioned within BLAST search results from PhagesDB; therefore, this is not conclusive evidence towards synteny with this phage. Overall, because Glimmer and GeneMark agree on the start site at 9527 with a start codon of ATG, this is a real gene. /note=SD (Final) Score: The final score for this gene is -5.095, which falls within the range to call the start site. Furthermore, the Z-score, 1.848, falls within the range to confirm that the suggested start site is viable. /note=Gap/overlap: Has a gap of 55 base pairs, which falls within the acceptable range for gaps between genes. The length of the gene is 891 base pairs. /note=Phamerator: This gene falls within pham 199179. 
According to PhagesDB, there are no other phages within this pham other than Phroglets. /note=Starterator: Starterator did not yield any results due to the fact that Phroglets is an orpham. /note=Location call: Overall, based on the information above, I would call the location at 9527. /note=Function call: According to PhagesDB, many of the phages within the AV cluster, which have shown synteny with other annotated genes, have an unknown function starting at this gene's start site. HHPRED did not provide any significant hits. NCBI BLAST also showed results for no known function or a hypothetical protein, but not for Arthrobacter, only Streptomyces. This evidence, therefore, cannot be considered. CDD also did not provide any significant hits. /note=Based on the data shown by PhagesDB, the function is unknown. /note=Transmembrane domains: According to DeepTMHMM, this protein is inside and not a transmembrane protein. /note=Secondary Annotator Name: Mathew, Taniya /note=Secondary Annotator QC: I have QC'ed this location and function call and agree with the primary annotator. Remember to check the "All GM Coding capacity Box" above! /note=-Comments addressed: AK CDS 10449 - 11672 /gene="13" /product="gp13" /function="major capsid protein" /locus tag="Phroglets_13" /note=Original Glimmer call @bp 10449 has strength 7.46; Genemark calls start at 10449 /note=SSC: 10449-11672 CP: yes SCS: both ST: SS BLAST-Start: [major head protein [Arthrobacter phage Jasmine] ],,NCBI, q1:s1 100.0% 2.66017E-125 GAP: 31 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.312, -2.0111200136961407, yes F: major capsid protein SIF-BLAST: ,,[major head protein [Arthrobacter phage Jasmine] ],,YP_009594305,65.8416,2.66017E-125 SIF-HHPRED: Putative major capsid protein; Major capsid proteins, VIRUS; 3.35A {Nostoc phage A1},,,7F38_A,99.0172,99.9 SIF-Syn: This gene is a part of the pham 7207. According to PhagesDB, there are 8 other members of this pham. 
These are the same members from Starterator, which include Adat, Brad, Casserole, GurgleFerb, Jasmine, Nellie, and Shocker. All of these are in the AV cluster and have a gene length range between 1215-1224 basepairs. Shocker is also a singleton. All of these genes are listed to have function of a major capsid protein. This gene is a part of the pham 7207. The start of this gene is shared with many other non-draft phages including Adat_15, Brad_15, Casserole_15, GurgleFerb_15, Jasmine_17, Nellie_15, and Shocker_23. Despite Phroglets being a singleton, the variety of phages that share this start site indicates that there is synteny amongst these phages and further justifies this start site as correct. This also indicates that this gene is highly conserved. /note=Primary Annotator Name: Khachaturov, Allyson /note=Auto-annotation: Both Glimmer and GeneMark agree on the start site at 10449. The length of the gene is 1224, which is an acceptable length for the gene and indicates that it has coding potential. Furthermore, GeneMarkS shows little gapping between genes with ample amounts of peaks to indicate coding potential in both the reverse and direct sequences. According to Pham maps, there are no finalized phage genomes that align with Phroglets, indicating no synteny with other phages. This makes sense due to the fact that Phroglets is a singleton. It should be noted that the start codon is ATG. /note=Coding Potential: Overall, this gene has high coding potential. /note=SD (Final) Score: The final (SD) score is -2.011. The Z-score is 3.312. Both the final score and Z-score fall within the ranges required to call this particular start site. /note=Gap/overlap: The gap for this gene is 31 basepairs, which is low. This means that no further genes can be added at the start of this gene and also provides more evidence for 10449 to be the start site. Overall, based on the evidence, this is in fact a real gene. 
Glimmer and GeneMark both call this gene and the data provided for possible gene candidates all fall within an acceptable range to call the start site and show that it is a real gene. There is also no gene overlap. /note=Phamerator: This gene is a part of the pham 7207. According to PhagesDB, there are 8 other members of this pham. These are the same members from Starterator, which include Adat, Brad, Casserole, GurgleFerb, Jasmine, Nellie, and Shocker. All of these are in the AV cluster and have a gene length range between 1215-1224 basepairs. Shocker is also a singleton. All of these genes are listed to have function of a major capsid protein. /note=Starterator: This gene is a part of the pham 7207. The start of this gene is shared with many other non-draft phages including Adat_15, Brad_15, Casserole_15, GurgleFerb_15, Jasmine_17, Nellie_15, and Shocker_23. Despite Phroglets being a singleton, the variety of phages that share this start site indicates that there is synteny amongst these phages and further justifies this start site as correct. This also indicates that this gene is highly conserved. /note=Location call: Due to the high agreement between many of the data in addition to the high conservation amongst other phages, I would agree with the location call. /note=Function call: According to PhagesDB BLAST, all of the phages that are in the AV cluster are yielding hits with highly significant e-values (1e-10^4-1e10^3). Furthermore, they have high scores with values between 372-276. HHPRED provides two high hits (4.7e-32 and 1.1e-29) that both call the function as a major capsid protein. NCBI BLAST also calls the function to be a major capsid protein (e-values of the two best hits 2.6e-125 and 3.8e-125) in phages Jasmine and Casserole, which further displays synteny. Lastly, CDD also calls the function as a major capsid protein. Overall, based on all this evidence and the low e-values, I would call the function as a major capsid protein. 
/note=Transmembrane domains: DeepTMHMM states that this protein is most likely an outside protein and does not yield any TMR hits. This is consistent with the function call because the major capsid protein is responsible for housing the phage's viral genome. /note=Secondary Annotator Name: Mathew, Taniya /note=Secondary Annotator QC: I have QC'ed this location and function call and agree with the primary annotator. /note=-Comments addressed: AK CDS 11754 - 13895 /gene="14" /product="gp14" /function="hypothetical protein" /locus tag="Phroglets_14" /note=Original Glimmer call @bp 11754 has strength 8.72; Genemark calls start at 11787 /note=SSC: 11754-13895 CP: yes SCS: both-gl ST: SS BLAST-Start: [right-handed parallel beta-helix repeat-containing protein [Kocuria palustris] ],,NCBI, q66:s43 90.4628% 5.62376E-114 GAP: 81 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.812, -3.0425830684175623, yes F: hypothetical protein SIF-BLAST: ,,[right-handed parallel beta-helix repeat-containing protein [Kocuria palustris] ],,WP_281196951,49.492,5.62376E-114 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Khachaturov, Allyson /note=Auto-annotation: Glimmer and GeneMark do not call the same start site. Glimmer calls the start site at 11754 while GeneMark calls the start site at 11787. PECAAN calls the start site at 11754. Host-trained GeneMark overall does not have many peaks that indicate high coding potential, but some coding potential. However, GeneMarkS displays many peaks in both direct and reverse strands. There are no gaps in the ORF, which shows that this is potentially a real gene for both Host-Trained GeneMark and Self-trained GeneMark. Pham maps does not align Phroglets with any other non-draft phage due to the fact that it is a singleton. /note=Coding Potential: Despite Glimmer and GeneMark not aligning in start sites, I would say that this gene has coding potential. /note=SD (Final) Score: SD score is -3.043, within an acceptable range to be considered a good match. 
Furthermore, the Z-score is greater than 2 (2.812), which provides further evidence that this is the suggested start site. /note=Gap/overlap: There is a gap of 81 base pairs for the start called by PECAAN, which is 11754. /note=Phamerator: According to PhagesDB, this phage is the only known member of this pham. The length of this gene is 2,142 base pairs. Due to the lack of information, this gene does not appear to be conserved with other phages. /note=Starterator: This gene is found in the pham 199547. Since Phroglets is a singleton, not every gene shares called start sites with other phages. This is the case with this gene. Due to the lack of information, this gene does not appear to be conserved with other phages. /note=Location call: Based on all the evidence from PECAAN and GeneMarkS, I would agree with the location call at 11754. /note=Function call: PhagesDB does not have many significant hits that agree on one function. These hits have low e-values and disagree with one another. Some state that the function is unknown while others state that this gene is a minor tail protein. NCBI BLAST does not yield any hits. CDD disagrees with the above calls and states that the highest hits are nitrous oxide accessory proteins; however, there is not much evidence to support this call. Overall, I would call this as NKF due to the lack of strong evidence. /note=Transmembrane domains: DeepTMHMM does not yield any transmembrane hits. The graphs this program provides show that this is likely either an outside protein or a signaling protein. /note=Secondary Annotator Name: Mathew, Taniya /note=Secondary Annotator QC: I have QC'ed this location and function call and agree with the primary annotator. 
/note=-Comments addressed: AK CDS 13928 - 14314 /gene="15" /product="gp15" /function="hypothetical protein" /locus tag="Phroglets_15" /note=Original Glimmer call @bp 13928 has strength 8.64; Genemark calls start at 13919 /note=SSC: 13928-14314 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein FDG92_gp20 [Arthrobacter phage Jasmine] ],,NCBI, q12:s9 90.625% 6.87755E-15 GAP: 32 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.389, -4.489364182966632, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDG92_gp20 [Arthrobacter phage Jasmine] ],,YP_009594309,57.3913,6.87755E-15 SIF-HHPRED: DUF2951 ; Protein of unknown function (DUF2951),,,PF11166.11,74.2188,99.1 SIF-Syn: /note=Primary Annotator Name: Lieu, Jadelien /note=Auto-annotation: Glimmer and GeneMark. Glimmer calls the start site 13928, while GeneMark calls the start site 13919. /note=Coding Potential: The forward strand has no coding potential on the Host-Trained GeneMark, but has coding potential on the Self-Trained GeneMark. /note=SD (Final) Score: The final score is likely -4.489. There are 2 potential final scores with the same high Z-score (2.389). Start site 13919 has a final score of -5.143 and start site 13928 has a final score of -4.489. The former has a true LORF, but the latter is the auto-selected gene. /note=Gap/overlap: The gap is 32 with a spacer of 14. This is an acceptable gap. /note=Phamerator: Pham number of 8073 as of 1/8/25. Gene seems to be conserved within the AV pham in phages like Adat, Brad, and Casserole. There are no notes or function calls. /note=Starterator: Pham number 8073 has 6 non-draft phages. Start site 3 is found in 100% of the genes in the pham, indicating 13928 in Phroglets, and is likely the most accurate start site. /note=Location call: This is likely a real gene with a start site of 13928. This is supported by the auto-annotation call of Glimmer and the Starterator call of start site 3. /note=Function call: Function call is unknown. 
The best BLAST hits with E values of 3e-66, 3e-15, and 4e-15 call the function unknown. Poorer E values call the function a tape measure protein. The top and only 3 hits on NCBI report a hypothetical protein. There are no hits on CDD. The best hit on HHpred (probability of 98.76, E value of about 2e-6) calls an unknown function, followed by calls of a conserved membrane protein and a haemolysin. /note=Transmembrane domains: There is likely one transmembrane domain. DeepTMHMM predicts one transmembrane domain (TMhelix) 18 amino acids long. /note=Secondary Annotator Name: Reynolds, Noah /note=Secondary Annotator QC: Based on the above evidence, I agree with the primary annotator's locational call and functional call. /note=Primary Annotation Review: Secondary annotator comments recognized and addressed. JL CDS 14323 - 15105 /gene="16" /product="gp16" /function="lysin A, N-acetylmuramoyl-L-alanine amidase domain" /locus tag="Phroglets_16" /note=Original Glimmer call @bp 14323 has strength 7.45; Genemark calls start at 14323 /note=SSC: 14323-15105 CP: yes SCS: both ST: SS BLAST-Start: [lysin A N-acetylmuramoyl-L-alanine amidase domain [Arthrobacter phage Casserole]],,NCBI, q4:s3 88.0769% 1.15679E-85 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.142, -4.487636275856737, no F: lysin A, N-acetylmuramoyl-L-alanine amidase domain SIF-BLAST: ,,[lysin A N-acetylmuramoyl-L-alanine amidase domain [Arthrobacter phage Casserole]],,USL89102,63.6015,1.15679E-85 SIF-HHPRED: d.118.1.1 (A:1-157) N-acetylmuramoyl-L-alanine amidase PlyG {Bacillus anthracis [TaxId: 1392]},,,d1yb0a1,62.3077,99.8 SIF-Syn: /note=Primary Annotator Name: Lieu, Jadelien /note=Auto-annotation: Glimmer and GeneMark. Both call the start site as 14323. /note=Coding Potential: The forward strand has no coding potential on the Host-Trained GeneMark, but has coding potential on the Self-Trained GeneMark. /note=SD (Final) Score: The final score is -4.488, corresponding to a Z score of 2.142. 
This is not the best final score and Z score from the list, but is still high and more than 2, respectively. /note=Gap/overlap: There is a gap of 8 and a spacer of 12. This is an acceptable gap. /note=Phamerator: The pham number is 88322 as of 1/9/25. This is conserved in phages comprising the AV cluster such as Adat, Brad, and Casserole. All attribute the function to a lysin A, N-acetylmuramoyl-L-alanine amidase domain. /note=Starterator: Pham 88322 has 6 non-draft phages. Start site 1 was called in one non-draft phage and in Phroglets with a 100% call when present, while start site 2 was called in five non-draft phages with an 83.3% call when present. Despite higher conservation in start site 2, Phroglets only contains start site 1, which corresponds to 14323. /note=Location call: This is likely a real gene with a start site of 14323. This is supported by the auto-annotation calls of Glimmer and GeneMark, as well as Starterator evidence (the presence of start site 1 but not 2, and call frequency). /note=Function call: Function call is likely a lysin A, N-acetylmuramoyl-L-alanine amidase domain. All top E values in BLAST hits (4e-70 to 6e-67, notably) and HHpred (E value < e-10, Probability > 99) call this function. Top NCBI hits also report this function, along with some calls for an endolysin; but lysin A, N-acetylmuramoyl-L-alanine amidase domain is preferred considering other evidence sources. No hits on CDD. /note=Transmembrane domains: DeepTMHMM reports no TMDs; only an outside domain. /note=Secondary Annotator Name: Reynolds, Noah /note=Secondary Annotator QC: Based on the above evidence, I agree with the primary annotator's locational call and functional call. /note=Primary Annotation Review: Secondary annotator comments recognized and addressed. 
JL CDS 15184 - 15639 /gene="17" /product="gp17" /function="hypothetical protein" /locus tag="Phroglets_17" /note=Original Glimmer call @bp 15184 has strength 10.91; Genemark calls start at 15184 /note=SSC: 15184-15639 CP: yes SCS: both ST: SS BLAST-Start: GAP: 78 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.312, -2.0720764396375664, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lieu, Jadelien /note=Auto-annotation: Glimmer and GeneMark both call the start site 15184. /note=Coding Potential: The forward strand has no coding potential on the Host-Trained GeneMark, but has coding potential on the Self-Trained GeneMark. /note=SD (Final) Score: Best final score of -2.072 and a Z score of 3.312. /note=Gap/overlap: Gap of 78 and a spacer of 12. This is relatively large, but the gene is conserved in members of cluster AV. /note=Phamerator: Pham number of 7377 as of 1/10/25. Gene seems to be conserved within the AV cluster in phages like Adat, Brad, and Casserole. There are no notes or function calls. /note=Starterator: Pham number 7377 has 6 non-draft phages. Start site 3 is found and called in 100% of the genes in the pham, but is not present in Phroglets. Instead, the phage contains start site 1, corresponding to 15184. /note=Location call: This is likely a real gene with a start site of 15184. This is supported by the auto-annotation calls of Glimmer and GeneMark, as well as Starterator evidence. /note=Function call: Function call is unknown. The best BLAST hits (E values of 7e-84 to 6e-09) call the function unknown. Poorer E values call the function a RecA-like DNA recombinase. There are no hits on NCBI or CDD. No significant hits on HHpred (E values > 38, Probability < 80). /note=Transmembrane domains: DeepTMHMM reports no TMDs; only an inside domain. 
/note=Secondary Annotator Name: Reynolds, Noah /note=Secondary Annotator QC: Based on the above evidence, I agree with the primary annotator's locational call and functional call. /note=Primary Annotation Review: Secondary annotator comments recognized and addressed. JL CDS 15639 - 16295 /gene="18" /product="gp18" /function="minor tail protein" /locus tag="Phroglets_18" /note=Original Glimmer call @bp 15639 has strength 10.0; Genemark calls start at 15639 /note=SSC: 15639-16295 CP: yes SCS: both ST: SS BLAST-Start: [structural protein [Arthrobacter phage Jasmine] ],,NCBI, q1:s1 99.5413% 2.61065E-38 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.278, -4.593456635492119, yes F: minor tail protein SIF-BLAST: ,,[structural protein [Arthrobacter phage Jasmine] ],,YP_009594312,54.7511,2.61065E-38 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lieu, Jadelien /note=Auto-annotation: Glimmer and GeneMark both call the start site as 15639. /note=Coding Potential: The forward strand has no coding potential on the Host-Trained GeneMark, but has coding potential on the Self-Trained GeneMark. /note=SD (Final) Score: Final score of -4.593 corresponding to a Z score of 2.278. These are good scores. /note=Gap/overlap: There is an overlap of -1 and a spacer of 8. This is preferred. /note=Phamerator: Pham number 7028 as of 1/13/25. Gene seems to be conserved within the AV cluster in phages like Adat, Brad, and Casserole, and one other singleton, Shocker. Notes of a function call for a minor tail protein. /note=Starterator: Pham number 7028 has 7 non-draft phages. Start site 1 was found and called in 100% of all genes in the pham, including Phroglets. This corresponds to start site 15639, supporting previous evidence. /note=Location call: This is likely a real gene with a start site of 15639. This is supported by the auto-annotation calls of Glimmer and GeneMark, as well as conserved Pham and Starterator evidence. 
/note=Function call: It seems the function is a minor tail protein. Although top NCBI, CDD, and HHpred calls report hypothetical proteins, top BLAST hits call for a minor tail protein. CDD has a hit for a family of unknown function (DUF6682), which also appears as the third hit on HHpred. But the function call for a minor tail protein is supported by strong evidence from BLAST and cluster AV. /note=Transmembrane domains: DeepTMHMM reports no TMDs; only an outside domain. /note=Secondary Annotator Name: Reynolds, Noah /note=Secondary Annotator QC: Based on the above evidence, I agree with the primary annotator's locational call and functional call. /note=Make sure that any genes you used as evidence in the PhagesDB BLAST are checked off /note=Primary Annotation Review: Secondary annotator comments recognized and addressed. JL CDS 16304 - 18379 /gene="19" /product="gp19" /function="minor tail protein" /locus tag="Phroglets_19" /note=Original Glimmer call @bp 16304 has strength 9.65; Genemark calls start at 16304 /note=SSC: 16304-18379 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein FDG92_gp24 [Arthrobacter phage Jasmine] ],,NCBI, q1:s1 99.7106% 0.0 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.008, -3.0866669750886713, yes F: minor tail protein SIF-BLAST: ,,[hypothetical protein FDG92_gp24 [Arthrobacter phage Jasmine] ],,YP_009594313,62.8655,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Lieu, Jadelien /note=Auto-annotation: Glimmer and GeneMark both call the start site 16304. /note=Coding Potential: The forward strand has no coding potential on the Host-Trained GeneMark, but has coding potential on the Self-Trained GeneMark. /note=SD (Final) Score: Final score of -3.087 and Z score of 3.008, relatively good scores and the best on the list. /note=Gap/overlap: There is a gap of 8 and spacer of 8. This is questionable, but all evidence from the SD Score and the Starterator supports this start site. 
/note=Phamerator: Pham number 88088 as of 1/13/25. Gene seems to be conserved within the AV cluster in phages like Adat, Brad, and Casserole, and one other singleton, Shocker. Notes of a function call for a minor tail protein. /note=Starterator: Pham number 88088 has 7 non-draft phages. Start site 1 is found in 87.5% of the phages and called 100% of the time. In Phroglets, this corresponds to start site 16304. /note=Location call: It seems the gene has a start site of 16304. This is supported by the auto-annotation calls of Glimmer and GeneMark, as well as conserved Pham and Starterator evidence. /note=Function call: It seems the function is a minor tail protein. All top BLAST hits call for a minor tail protein, and the third NCBI hit calls for a minor tail protein in Casserole. CDD contains no hits and all HHpred top hits are different; however, conservation of genes in the AV pham and evidence from BLAST and NCBI support this call. /note=Transmembrane domains: DeepTMHMM reports no TMDs; only an outside domain. /note=Secondary Annotator Name: Reynolds, Noah /note=Secondary Annotator QC: Based on the above evidence, I agree with the primary annotator's locational call and functional call. /note=Make sure to fill out the drop downs for gene coding potential and suggested start site /note=Also please check off any box you are using to support your analysis, PhagesDB blast, NCBI and Blasts. /note=Primary Annotation Review: Secondary annotator comments recognized and addressed. 
JL CDS 18394 - 18624 /gene="20" /product="gp20" /function="hypothetical protein" /locus tag="Phroglets_20" /note=Original Glimmer call @bp 18394 has strength 6.05; Genemark calls start at 18394 /note=SSC: 18394-18624 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein FDG92_gp25 [Arthrobacter phage Jasmine] ],,NCBI, q27:s38 59.2105% 4.65698E-4 GAP: 14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.997, -2.5823297389851954, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDG92_gp25 [Arthrobacter phage Jasmine] ],,YP_009594314,38.3721,4.65698E-4 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Liu, Carissa /note=Auto-annotation: Gene (stop@18624 F) Both Glimmer and GeneMark call the gene. They agree on the same start site, which was called at 18394 bp with a start codon of ATG. /note=Coding Potential: [start site 18394] Host-Trained GeneMark shows no coding potential. Comparatively, in Self-Trained GeneMark, the gene has reasonable (very good coverage) coding potential predicted within the putative ORF. The chosen start site covers all this coding potential. The gene does not display synteny with other non-draft phage genomes (even in cluster AV). There are 6 PhagesDB BLAST hits that are non-draft phage genomes but they all do not have an e-value less than 10^-6. There is coding potential predicted by Glimmer and GeneMark Host. The gene is at least 120 bp long (231 bp). There are no switches in gene orientation. Only one frame in one strand is used for a protein-coding gene. /note=SD (Final) Score: [start site 18394] The SD score -2.582 is the best (least negative). The Z-value 2.997 is the best (closest to/higher than 2). The SD score is reasonable to suggest the presence of a credible RBS. /note=Gap/overlap: [start site 18394] 14 bp gap with the upstream gene is reasonable. The length of the gene is acceptable (231 bp). It is the longest reasonable ORF for this gene call. There are no large non-coding gaps before the gene. 
/note=Phamerator: Pham 199303 as of 1/12/25 has no other members. No function called. /note=Starterator: No available report as of 1/12/2025 since the gene is an orpham. /note=Location call: [start site 18394] The gathered evidence suggests that this is a real gene since it has good coding potential. The most likely start site is 18394 because the 14 bp gap is the most favorable and covers all coding potential. /note=Function call: NKF. No program returned any informative results. PhagesDB BLASTp: 6 hits of unknown function (poor e-value 10^-5). NCBI BLASTp: top hit is hypothetical protein in phage Jasmine AV cluster (coverage ~59%, identity ~24%, e-value ~4x10^-4). CDD: no hits. HHpred: top Pfam hit is interacting nuclear orphan protein (poor coverage ~15%, poor e-value 36). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Laughton, Alette /note=Secondary Annotator QC: Consider adding the poor evidence to the function call (i.e., top HHPred Pfam hit was interacting nuclear orphan protein, having poor coverage ~15% and poor e-value = 36). Otherwise, I agree with the locational and functional calls of this gene. (CL: agreed with all suggestions and made changes accordingly.) Based on the above evidence, I agree with the primary annotator's locational call and functional call. 
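The gap and gene-length figures quoted in these notes can be rechecked from the SSC coordinates alone. The following is a minimal Python sketch, assuming forward-strand genes with 1-based inclusive coordinates; the function names `gene_length` and `gap_to_upstream` are hypothetical helpers and are not part of PECAAN, DNA Master, or any other annotation tool.

```python
# Minimal sketch: recompute gene length and gap/overlap from SSC coordinates.
# Assumption: 1-based inclusive coordinates, forward-strand genes only.

def gene_length(start: int, stop: int) -> int:
    """Length in bp of a gene spanning start..stop inclusive."""
    return abs(stop - start) + 1

def gap_to_upstream(prev_stop: int, start: int) -> int:
    """Bases between the upstream stop and this start.

    A negative value is an overlap (e.g. -1 when the start coordinate
    equals the upstream stop coordinate).
    """
    return start - prev_stop - 1

# SSC coordinates copied from the gene 19-22 records in this file.
ssc = {19: (16304, 18379), 20: (18394, 18624), 21: (18624, 19331), 22: (19341, 20660)}

for gene in (20, 21, 22):
    start, stop = ssc[gene]
    prev_stop = ssc[gene - 1][1]
    print(gene, gene_length(start, stop), gap_to_upstream(prev_stop, start))
```

Under these assumptions the loop reproduces the values stated in the notes: gene 20 is 231 bp with a 14 bp gap, gene 21 is 708 bp with a 1 bp overlap, and gene 22 is 1320 bp with a 9 bp gap.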
CDS 18624 - 19331 /gene="21" /product="gp21" /function="hypothetical protein" /locus tag="Phroglets_21" /note=Original Glimmer call @bp 18624 has strength 9.46; Genemark calls start at 18624 /note=SSC: 18624-19331 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein FDI47_gp24 [Arthrobacter phage Adat] ],,NCBI, q50:s35 79.1489% 2.58938E-12 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.199, -4.291708691156382, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDI47_gp24 [Arthrobacter phage Adat] ],,YP_009613248,37.5,2.58938E-12 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Laughton, Alette /note=Auto-annotation: Gene (stop@19331 F), Glimmer start called at 18624 bp, GeneMark start called at 18624 bp. The start at 18624 has a start codon of ATG. /note=Coding Potential: For the start site at 18624, the gene has reasonable coding potential in the Self-Trained system, but no reasonable coding potential in the Host-Trained GeneMark system, and the site covers all of the coding potential. The longest start in Host-Trained and Self-Trained GeneMark appears to be between 18600 and 18700 in the forward direction. The gene is 708 bp long, shows no switch in gene orientation from forward to reverse, and shows coding potential in only one frame of the forward GeneMark system. /note=SD (Final) Score: -4.292. This final score is the highest of the listed final scores for suggested start sites. The Z score is 2.199. This Z score is the highest of the listed Z scores for suggested start sites. /note=Gap/overlap: For the start site at 18624, 1 bp overlap. This is the smallest of the gaps/overlaps for listed start sites. This is the longest ORF for the gene and there is no coding potential suggesting the presence of another gene before the start site. /note=Phamerator: Pham 154274, run on 1/8/2025. It is not conserved, and the function is not called. /note=Starterator: Report not generated. 
/note=Location call: Based on the above evidence, this is a real gene. The gene is conserved and has reasonable coding potential. The likely start site for this gene is 18624. 18624 seems to be the correct start based on a smaller gap, better Z score, better final score, and a common start codon. /note=Function call: No known function (NKF). PhagesDB BLASTp returned the top 8 hits of No Known Function (E values < 7e-17). NCBI BLASTp returned the top 2 hits of a hypothetical protein, found in AV cluster phages Adat and GurgleFerb (E values < 4e-12, coverage = 79%, identity 28%). CDD returned no hits. HHpred returned the top Pfam hit as FtsH ternary system domain (E value = 41) and top PDB hit as Maltose/maltodextrin-binding periplasmic protein (E value = 180). /note=Transmembrane domains: DeepTMHMM does not call any trans-membrane domains (TMDs), and this is therefore not a transmembrane protein. /note=Secondary Annotator Name: Aguilar, Xeleste /note=Secondary Annotator QC: I agree with the location and functional call of the primary annotator. Note that the synteny box is not filled out and the box below the phamerator has no suggested start site selected and should be checked off (the drop down box). Besides this I agree with the call. /note=COMMENTS TO SECONDARY ANNOTATOR: Hi Xeleste! The drop down box for Starterator was selected as "NA" because no report was generated. The synteny box was not filled out because this gene is NKF, and therefore synteny is invalid. 
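The gene lengths quoted in these notes follow directly from the CDS coordinates. A minimal sketch, assuming the coordinates are 1-based and inclusive (as the 708 bp figure for gene 21 implies); the helper name `gene_length` is hypothetical and not part of any annotation tool:

```python
def gene_length(left: int, right: int) -> int:
    """Length in bp of a feature given 1-based, inclusive coordinates."""
    if right < left:
        raise ValueError("right coordinate must not be smaller than left")
    return right - left + 1


# Coordinates of gene 21 as listed in its CDS line (18624 - 19331).
length_bp = gene_length(18624, 19331)
print(length_bp)            # 708
print(length_bp % 3 == 0)   # True: a whole number of codons, stop codon included
```

The same calculation applies to reverse-strand (complement) features, since only the left and right coordinates matter for length.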
CDS 19341 - 20660 /gene="22" /product="gp22" /function="hypothetical protein" /locus tag="Phroglets_22" /note=Original Glimmer call @bp 19341 has strength 7.86; Genemark calls start at 19341 /note=SSC: 19341-20660 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein FDI47_gp25 [Arthrobacter phage Adat] ],,NCBI, q98:s74 51.4806% 8.63202E-8 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.008, -2.6395089437464523, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDI47_gp25 [Arthrobacter phage Adat] ],,YP_009613249,27.2517,8.63202E-8 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Laughton, Alette /note=Auto-annotation: Gene (stop@20660 F), Glimmer start called at 19341 bp, GeneMark start called at 19341 bp. The start at 19341 has a start codon of ATG. /note=Coding Potential: For the start site at 19341, the gene has reasonable coding potential in the Self-Trained system, but no reasonable coding potential in the Host-Trained GeneMark system, and the site covers all of the coding potential. The longest start in Host-Trained and Self-Trained GeneMark appears to be between 19300 and 19400 in the forward direction. The gene is 1320 bp long, shows no switch in gene orientation from forward to reverse, and shows coding potential indicative of the true reading frame in only one frame of the forward GeneMark system. While four carets indicating slips are present, the coding potential for those slips is very low, so they will not be considered as gene additions. /note=SD (Final) Score: -2.640. This final score is the highest of the listed final scores for suggested start sites. The Z score is 3.008. This Z score is the highest of the listed Z scores for suggested start sites. /note=Gap/overlap: For the start site at 19341, 9 bp gap. This is the smallest of the gaps for listed start sites. This is the longest ORF for the gene and there is no coding potential suggesting the presence of another gene before the start site.
/note=Phamerator: Pham 199488, run on 1/8/2025. It is not conserved, and the function is not called. /note=Starterator: Report not generated. /note=Location call: Based on the above evidence, this is a real gene. The gene is conserved and has reasonable coding potential. The likely start site for this gene is 19341. 19341 seems to be the correct start based on a smaller gap, better Z score, better final score, and a common start codon. /note=Function call: No known function (NKF). PhagesDB BLASTp returned the top 7 hits of No Known Function (E values < 4e-8). NCBI BLASTp returned the top hit of a hypothetical protein, found in AV cluster phage Adat (E value < 8e-8, coverage = 51%, identity 16%). CDD returned no hits. HHpred returned the top Pfam hit as Family of unknown function (E value = 24) and top PDB hit as Putative outer membrane chaperone (E value = 400). /note=Transmembrane domains: DeepTMHMM does not call any trans-membrane domains (TMDs), and this is therefore not a transmembrane protein. /note=Secondary Annotator Name: Aguilar, Xeleste /note=Secondary Annotator QC: I agree with the location and functional call of the primary annotator. Note that the box below the phamerator has no suggested start site selected and should be checked off (the drop down box). Besides this I agree with the call. /note=COMMENTS TO SECONDARY ANNOTATOR: Hi Xeleste! The drop down box for Starterator was selected as "NA" because no report was generated. 
CDS 20669 - 26077 /gene="23" /product="gp23" /function="hypothetical protein" /locus tag="Phroglets_23" /note=Original Glimmer call @bp 20669 has strength 6.16; Genemark calls start at 20669 /note=SSC: 20669-26077 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein FDG92_gp28 [Arthrobacter phage Jasmine] ],,NCBI, q889:s949 50.5549% 6.94725E-96 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.302, -4.0962544390943965, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDG92_gp28 [Arthrobacter phage Jasmine] ],,YP_009594317,25.2912,6.94725E-96 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Laughton, Alette /note=Auto-annotation: Gene (stop@26077 F), Glimmer start called at 20669 bp, GeneMark start called at 20669 bp. The start at 20669 has a start codon of ATG. /note=Coding Potential: For the start site at 20669, the gene has reasonable coding potential in the Self-Trained system, but no reasonable coding potential in the Host-Trained GeneMark system, and the site covers all of the coding potential. The longest start in Host-Trained and Self-Trained GeneMark appears to be between 20600 and 20700 in the forward direction. The gene is 5409 bp long, shows no switch in gene orientation from forward to reverse, and shows coding potential in only one frame of the forward GeneMark system. /note=SD (Final) Score: -4.096. This final score is the highest of the listed final scores for suggested start sites. The Z score is 2.302. This Z score is the highest of the listed Z scores for suggested start sites. /note=Gap/overlap: For the start site at 20669, 8 bp gap. This is the smallest of the gaps for listed start sites. This is the longest ORF for the gene and there is no coding potential suggesting the presence of another gene before the start site. /note=Phamerator: Pham 199404, run on 1/8/2025. It is not conserved, and the function is not called. /note=Starterator: Report not generated. 
/note=Location call: Based on the above evidence, this is a real gene. The gene is conserved and has reasonable coding potential. The likely start site for this gene is 20669. 20669 seems to be the correct start based on a smaller gap, better Z score, better final score, and a common start codon. /note=Function call: No known function (NKF). PhagesDB BLASTp returned the top 7 hits of No Known Function (E values < 3e-41). NCBI BLASTp returned the top 2 hits of a hypothetical protein, found in AV cluster phages Jasmine and Adat (E values < 8e-95, coverage = 39-51%, identity 13-17%). CDD returned no hits. HHpred returned the top Pfam hit as Large polyvalent protein associated domain (E value = 96) and top PDB hit as Ribosome, Disome (E value = 510). /note=Transmembrane domains: DeepTMHMM does not call any trans-membrane domains (TMDs), and this is therefore not a transmembrane protein. /note=Secondary Annotator Name: Aguilar, Xeleste /note=Secondary Annotator QC: I agree with the location and functional call of the primary annotator. Note that the box below the phamerator has no suggested start site selected and should be checked off (the drop down box). Besides this I agree with the call. /note=COMMENTS TO SECONDARY ANNOTATOR: Hi Xeleste! The drop down box for Starterator was selected as "NA" because no report was generated. 
CDS 26074 - 27255 /gene="24" /product="gp24" /function="lysin A, L-Ala-D-Glu peptidase domain" /locus tag="Phroglets_24" /note=Original Glimmer call @bp 26074 has strength 8.35; Genemark calls start at 26074 /note=SSC: 26074-27255 CP: yes SCS: both ST: NA BLAST-Start: [endolysin [Arthrobacter phage Jasmine] ],,NCBI, q113:s157 71.2468% 4.04506E-116 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.505, -3.738562143838384, no F: lysin A, L-Ala-D-Glu peptidase domain SIF-BLAST: ,,[endolysin [Arthrobacter phage Jasmine] ],,YP_009594318,48.8532,4.04506E-116 SIF-HHPRED: PlyCA; Lysin, bacteriophage, Antimicrobial protein, viral protein; 3.3A {Streptococcus phage C1},,,4F88_2,33.3333,98.3 SIF-Syn: lysin A, L-Ala-D-Glu peptidase domain, downstream gene is holin, just like in phages Jasmine and Adat. /note=Primary Annotator Name: Laughton, Alette /note=Auto-annotation: Gene (stop@27255 F), Glimmer start called at 26074 bp, GeneMark start called at 26074 bp. The start at 26074 has a start codon of ATG. /note=Coding Potential: For the start site at 26074, the gene has reasonable coding potential in the Self-Trained system, but no reasonable coding potential in the Host-Trained GeneMark system, and the site covers all of the coding potential. The longest start in Host-Trained and Self-Trained GeneMark appears to be between 26000 and 26100 in the forward direction. The gene is 1182 bp long, shows no switch in gene orientation from forward to reverse, and shows coding potential in only one frame of the forward GeneMark system. /note=SD (Final) Score: -3.739. This final score is the highest of the listed final scores for suggested start sites. The Z score is 2.505. This Z score is tied for the highest of the listed Z scores for suggested start sites. /note=Gap/overlap: For the start site at 26074, 4 bp overlap. This is the smallest of the gaps/overlaps for listed start sites. This is not the longest ORF for the gene. The longest ORF is the start at 26065. 
There is no coding potential suggesting the presence of another gene before the start site. /note=Phamerator: Pham 199723, run on 1/9/2025. It is not conserved, and the function is not called. /note=Starterator: Report not generated. /note=Location call: Based on the above evidence, this is a real gene. The gene is conserved and has reasonable coding potential. The likely start site for this gene is 26074. 26074 seems to be the correct start based on a smaller gap, better Z score, better final score, and a common start codon. /note=Function call: lysin A, L-Ala-D-Glu peptidase domain. PhagesDB BLASTp returned the top 6 hits of lysin A, L-Ala-D-Glu peptidase domain (E values < 2e-100). NCBI BLASTp returned the top 2 hits of lysin A, L-Ala-D-Glu peptidase domain, found in AV cluster phages Jasmine and Adat (E values < 2e-113, coverage = %, identity 61-65%). CDD returned the top hit of a CHAP domain (E value < 7e-6). HHpred returned the top Pfam hit as CHAP domain (E value = 0.011) and top PDB hit as Lysin (E value = 1.8e-9). /note=Transmembrane domains: DeepTMHMM does not call any trans-membrane domains (TMDs), and this is therefore not a transmembrane protein. /note=Secondary Annotator Name: Aguilar, Xeleste /note=Secondary Annotator QC: I agree with the location and functional call of the primary annotator. Note that the synteny box does not mention the upstream gene; make sure to include this in the description. Also, the box below the Phamerator has no suggested start site selected and should be checked off (the drop-down box). Besides this, I agree with the call. /note=COMMENTS TO SECONDARY ANNOTATOR: Hi Xeleste! The synteny box did not mention the upstream gene because it is NKF, and Dr. Daniel confirmed I should not include it. The drop-down box for Starterator was selected as "NA" because no report was generated.
CDS 27257 - 27436 /gene="25" /product="gp25" /function="holin" /locus tag="Phroglets_25" /note=Original Glimmer call @bp 27257 has strength 5.01; Genemark calls start at 27257 /note=SSC: 27257-27436 CP: yes SCS: both ST: SS BLAST-Start: [holin [Arthrobacter phage Adat] ],,NCBI, q1:s1 93.2203% 5.08936E-24 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.807, -5.69021729366323, yes F: holin SIF-BLAST: ,,[holin [Arthrobacter phage Adat] ],,YP_009613252,83.6066,5.08936E-24 SIF-HHPRED: Phage_holin_7_1 ; Mycobacterial 2 TMS Phage Holin (M2 Hol) Family,,,PF16081.8,88.1356,96.6 SIF-Syn: holin, upstream gene is lysin A, L-Ala-D-Glu peptidase domain, just like in phages Jasmine and Adat. Downstream gene is NKF, unlike WhiB family transcription factor in phages Jasmine and Adat. /note=Primary Annotator Name: Laughton, Alette /note=Auto-annotation: Gene (stop@27436 F), Glimmer start called at 27257 bp, GeneMark start called at 27257 bp. The start at 27257 has a start codon of ATG. /note=Coding Potential: For the start site at 27257, the gene has reasonable coding potential in the Self-Trained system, but no reasonable coding potential in the Host-Trained GeneMark system, and the site covers all of the coding potential. The longest start in Host-Trained and Self-Trained GeneMark appears to be between 27200 and 27300 in the forward direction. The gene is 180 bp long, shows no switch in gene orientation from forward to reverse, and shows coding potential in only one frame of the forward GeneMark system. /note=SD (Final) Score: -5.690. This final score is the highest of the listed final scores for suggested start sites. The Z score is 1.807. This Z score is tied for the highest of the listed Z scores for suggested start sites. /note=Gap/overlap: For the start site at 27257, 1 bp gap. This is the smallest of the gaps for listed start sites. This is the longest ORF for the gene. There is no coding potential suggesting the presence of another gene before the start site. 
/note=Phamerator: Pham 194158, run on 1/9/2025. It is conserved, displaying synteny with genes in Jasmine (AV) and Adat (AV). The function is called as holin. /note=Starterator: 13/25 non-draft members call start site 14, which does not correlate to the chosen start site 15 at 27257 for this phage. Glimmer and GeneMark’s auto-annotated start at 27257 still seems to be the most likely start due to ModicumRichard not containing the start at 14. /note=Location call: Based on the above evidence, this is a real gene. The gene is conserved and has reasonable coding potential. The likely start site for this gene is 27257. 27257 seems to be the correct start based on a smaller gap, better Z score, better final score, and a common start codon. /note=Function call: holin. PhagesDB BLASTp returned the top 8 hits of holin (E values < 9e-9). NCBI BLASTp returned the top 2 hits of holin found in AV cluster phages Jasmine and Adat (E values < 4e-10, coverage = 93-100%, identity = 52-86%). CDD returned no hits. HHpred returned the top Pfam hit as holin (E value = 8.3e-8) and top PDB hit as hypothetical membrane protein (E value = 19). /note=Transmembrane domains: DeepTMHMM calls 2 trans-membrane domains (TMDs), and this is therefore a transmembrane protein, consistent with a holin. /note=Secondary Annotator Name: Aguilar, Xeleste /note=Secondary Annotator QC: I agree with the location and functional call of the primary annotator. Note that the box below the Phamerator has no suggested start site selected and should be checked off (the drop-down box). Besides this, I agree with the call. /note=COMMENTS TO SECONDARY ANNOTATOR: Hi Xeleste! The drop-down box for Starterator was selected as "SS" because I agreed with the suggested start site. I am not sure what other box you might be referring to.
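The GAP values reported for the forward-strand genes 22 to 25 can be reproduced from the CDS coordinates alone. A minimal sketch, assuming 1-based inclusive coordinates and that a negative result denotes an overlap; the function name `forward_gap` and the tuple layout are hypothetical, chosen only for illustration:

```python
def forward_gap(prev_stop: int, start: int) -> int:
    """Bases between the previous gene's last base and this gene's first base.

    Positive values are gaps; negative values are overlaps of that many bp.
    """
    return start - prev_stop - 1


# (gene, start, stop) for consecutive forward-strand genes, from the CDS lines.
forward_genes = [
    ("gp21", 18624, 19331),
    ("gp22", 19341, 20660),
    ("gp23", 20669, 26077),
    ("gp24", 26074, 27255),
    ("gp25", 27257, 27436),
]

gaps = {}
for (_, _, prev_stop), (name, start, _) in zip(forward_genes, forward_genes[1:]):
    gaps[name] = forward_gap(prev_stop, start)

print(gaps)  # {'gp22': 9, 'gp23': 8, 'gp24': -4, 'gp25': 1}
```

These agree with the "9 bp gap", "8 bp gap", "-4 bp gap", and "1 bp gap" entries in the corresponding SSC notes.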
CDS complement (27887 - 28480) /gene="26" /product="gp26" /function="hypothetical protein" /locus tag="Phroglets_26" /note=Original Glimmer call @bp 28480 has strength 4.46; Genemark calls start at 28480 /note=SSC: 28480-27887 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein FDI47_gp30 [Arthrobacter phage Adat] ],,NCBI, q47:s46 53.8071% 2.49002E-6 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.214, -2.274496637415597, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein FDI47_gp30 [Arthrobacter phage Adat] ],,YP_009613254,23.5556,2.49002E-6 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Ea, Emily /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 28480. The start codon at this start site is ATG which is a common start site. /note=Coding Potential: There is no coding potential seen on the host-trained GeneMark, however there is good coding potential in the self-trained GeneMark in the reverse direction. For the self-trained GeneMark, the chosen start site covers all of the coding potential. There is suspicious coding potential in the forward direction in the complementary frame. This coding potential is large and overlaps with the coding potential in the reverse direction by over 100 bp. /note=SD (Final) Score: The final score is -2.274 and is the best score. The Z score is the best and above 2, at 3.214. /note=Gap/overlap: There is an overlap of 1 which may be evidence of an operon. There is a large gap upstream of the gene (predicted gene stop@ 27436) but that can be attributed to the switch from forward to reverse orientation which needs a gap of at least 50 bp. /note=Phamerator: There is no Phamerator report because there are no other phage genes in its Pham. /note=Starterator: There is no Starterator report because there are no other phage genes in Pham. 
/note=Location call: Although it is uncertain whether this gene is real, given the coding potential in the complementary forward strand and the lack of any coding potential in the host-trained GeneMark, the gene seems to be real based on its length (594 bp), an overlap of 1 bp with the downstream gene, and good coding potential in the reverse direction in self-trained GeneMark. The start site is likely 28480 because Glimmer and GeneMark call the same start site, it covers all of the coding potential in self-trained GeneMark, and it has the best final and Z scores. /note=Function call: No known function. The PhagesDB BLAST hits have no known function, low BLAST scores, and low percent identity (below 35%), and the NCBI BLAST hit has an insufficient e-value and percent identity. There were no CDD hits and all HHpred hits had insufficient e-values. /note=Transmembrane domains: This gene is not a transmembrane protein according to DeepTMHMM and is likely on the inside of the membrane. /note=Secondary Annotator Name: Jasso, Sarahi /note=Secondary Annotator QC: Based on the above evidence, I agree with the primary annotator's locational call and functional call.
CDS complement (28480 - 28701) /gene="27" /product="gp27" /function="hypothetical protein" /locus tag="Phroglets_27" /note=Original Glimmer call @bp 28701 has strength 4.71; Genemark calls start at 28701 /note=SSC: 28701-28480 CP: yes SCS: both ST: NA BLAST-Start: GAP: -17 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.142, -4.487636275856737, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Ea, Emily /note=Auto-annotation: Glimmer and GeneMark both call the start site at 28701 with a start codon of GTG. /note=Coding Potential: There is no coding potential seen in the host-trained GeneMark but there is good coding potential in the self-trained GeneMark in the reverse direction.
The coding potential in the self-trained GeneMark is covered within the auto-annotated start site and stop site. There is suspicious coding potential in the forward direction in the complementary strand. /note=SD (Final) Score: The final score is the best at -4.488. The Z score is 2.142 which is sufficient to pass the cutoff of 2. /note=Gap/overlap: There is a 17 bp overlap which is somewhat large but is not greater than 30 bp which is more uncommon. /note=Phamerator: There is no available Phamerator report as of 1/10/25. /note=Starterator: There is no available Starterator report as of 1/10/25. /note=Location call: The gene seems real but there is some small, suspicious coding potential in the forward direction in the complementary strand. The start site is likely at 28701 as called by both Glimmer and GeneMark. All of the coding potential is covered by this start site and the start site has the best final score (-4.488) and Z score (2.142). This start site is more reasonable than the only other one listed that had a gap of 151 bp as opposed to an overlap of 17 bp. /note=Function call: There are no significant hits from PhagesDB BLAST, NCBI BLAST, CDD or HHpred (lack of hits or insufficient e-values, etc.). The gene is noted to be on the inside of the membrane. /note=Transmembrane domains: This gene does not contain any transmembrane domains and is therefore not a transmembrane protein. /note=Secondary Annotator Name: Jasso, Sarahi /note=Secondary Annotator QC: Based on the above evidence, I agree with the primary annotator`s locational call and functional call. 
CDS complement (28685 - 29140) /gene="28" /product="gp28" /function="RuvC-like resolvase" /locus tag="Phroglets_28" /note=Original Glimmer call @bp 29140 has strength 8.96; Genemark calls start at 29140 /note=SSC: 29140-28685 CP: yes SCS: both ST: NI BLAST-Start: [Holliday junction resolvase [Arthrobacter phage Jasmine] ],,NCBI, q4:s3 96.6887% 1.64806E-13 GAP: 131 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.044, -4.690056473634766, no F: RuvC-like resolvase SIF-BLAST: ,,[Holliday junction resolvase [Arthrobacter phage Jasmine] ],,YP_009594324,52.8302,1.64806E-13 SIF-HHPRED: RuvC endonuclease; endonuclease, DNA junctions, replication, recombination, phage packaging, Holliday junction, RNase-H fold, DNA junction endonuclease, Holliday junctions; 1.68A {Lactococcus phage bIL67},,,4KTW_A,86.0927,99.4 SIF-Syn: /note=Primary Annotator Name: Ea, Emily /note=Auto-annotation: Glimmer and GeneMark call the start site at 29140 which has a start codon of ATG. /note=Coding Potential: All of the coding potential is covered by this start site. There is only one frame with coding potential. /note=SD (Final) Score: The final score is the best of all the listed start sites at -4.690. The Z score is second best at 2.044. /note=Gap/overlap: There is a 131 bp gap and it is the smallest gap from the listed start sites. /note=Phamerator: As of 1/10/25, this gene is in pham 7564 which contains 6 non-draft genes (cluster AV) and 1 draft gene (no cluster; Phroglets). The gene is conserved within cluster AV and is found in all members (7/7; 100%) such as Jasmine and Adat. /note=Starterator: As of 1/10/25, the most called start site is start number 3 for 6/6 (100%) of non-draft genes in this pham with 6 manual annotations. Although this is the most called start site, Phroglets does not contain this start site and is called at start number 2 instead with no manual annotations. 
/note=Location call: This gene is likely real due to the good coding potential in the reverse direction, decent length, synteny with cluster AV genes, and matching hits in PhagesDB BLAST. The start site is likely 29140 as auto-annotated by Glimmer and GeneMark because it has the best final score and the second-best Z score of all start sites, and has the smallest gap. Although the other non-draft genes (6/6; 100%) call a different start site, this gene does not contain that start site. Given the small number of genes within this pham for comparison, the start site at 29140 is reasonable for the other listed reasons. /note=Function call: Based on hits from PhagesDB BLAST, NCBI BLAST, and HHpred (no CDD hits), there is enough evidence that this gene functions as a RuvC-like resolvase. The top PhagesDB BLAST hits (E-values 2e-14) have RuvC-like resolvase as their function. The significant NCBI BLAST hits (E-values 2e-13, Identity>35%, 97% coverage) list their functions as a Holliday junction resolvase. The function as a Holliday junction resolvase seems to be synonymous with the RuvC-like resolvase function. The top HHpred hits (E-values e-17, coverage>99%) listed these functions: RuvC resolvase, RuvC endonuclease, and Holliday junction resolvase, which affirm the functions from the BLAST hits. /note=Transmembrane domains: There are no transmembrane domains within this gene, and it is therefore not a transmembrane protein. /note=Secondary Annotator Name: Jasso, Sarahi /note=Secondary Annotator QC: Based on the above evidence, I agree with the primary annotator's locational call and functional call. I would ask the instructor about the gap though, as it is quite large. DV - relatively large gap, but there is no coding potential.
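For reverse-strand (complement) genes the start is the right-hand coordinate, so the gap to the preceding gene in transcription order is measured against the left-hand coordinate of the neighbor to the right. A minimal sketch under that assumption, with 1-based inclusive coordinates and negative values meaning overlap; the function name `reverse_gap` is hypothetical, and the left coordinate 29272 is taken from the gene 29 CDS line:

```python
def reverse_gap(start_right: int, neighbor_left: int) -> int:
    """Gap (positive) or overlap (negative) in bp for a complement-strand gene.

    start_right:   right coordinate of this gene (its start on the reverse strand)
    neighbor_left: left coordinate of the adjacent gene to the right
    """
    return neighbor_left - start_right - 1


# (gene, left, right) from the complement CDS lines, ordered left to right.
reverse_genes = [
    ("gp26", 27887, 28480),
    ("gp27", 28480, 28701),
    ("gp28", 28685, 29140),
    ("gp29", 29272, 29430),
]

rev_gaps = {}
for (name, _, right), (_, next_left, _) in zip(reverse_genes, reverse_genes[1:]):
    rev_gaps[name] = reverse_gap(right, next_left)

print(rev_gaps)  # {'gp26': -1, 'gp27': -17, 'gp28': 131}
```

The results match the "-1 bp gap", "-17 bp gap", and "131 bp gap" entries in the SSC notes for genes 26, 27, and 28.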
CDS complement (29272 - 29430) /gene="29" /product="gp29" /function="hypothetical protein" /locus tag="Phroglets_29" /note=Original Glimmer call @bp 29430 has strength 1.82 /note=SSC: 29430-29272 CP: yes SCS: glimmer ST: NA BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.739, -5.319675865613894, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Ea, Emily /note=Auto-annotation: Glimmer calls the start site at 29430 with a start codon of GTG. GeneMark does not call a start site. /note=Coding Potential: The start site that Glimmer calls covers all of the coding potential in the self-trained GeneMark. There is no significant suspicious coding potential in any other frame. /note=SD (Final) Score: The Final (-5.320) and Z-scores (1.739) are the lowest of the possible start sites. Despite having the lowest scores, this start site is the only one with a feasible length. The start site @29430 results in a gene that is 159 bp long (>120 bp cutoff) as opposed to a length of 51 bp or lower. /note=Gap/overlap: There is a 4 bp overlap, which is indicative of being part of an operon and serves as evidence that this is a real gene. /note=Phamerator: There is no available Phamerator report as of 1/13/25. /note=Starterator: There is no available Starterator report as of 1/13/25. /note=Location call: There is coding potential in the self-trained GeneMark and the length is sufficient (159 bp>120 bp). There is not much other data available to use to determine whether it is real or not (synteny or BLAST results), however poor final and Z-scores may be due to the 4 bp overlap that is favorable and suggests that it is a real gene and is part of an operon. The lack of a start-site call from GeneMark is somewhat suspicious but the 4 bp overlap does support presence of an operon and is highly favorable. /note=Function call: There are no significant hits from PhagesDB BLAST and no NCBI BLAST hits. 
There are no CDD hits and no HHpred hits with significant values. /note=Transmembrane domains: There are no transmembrane domains found on DeepTMHMM and this gene is therefore not a transmembrane protein. /note=Secondary Annotator Name: Jasso, Sarahi /note=Secondary Annotator QC: Based on the above evidence, I agree with the primary annotator's locational call and functional call. Make sure to indicate in the box below the Starterator if it is at its suggested start site. The Starterator is filled out as NA since there was no Starterator report - EE 1/26/25
CDS complement (29427 - 31286) /gene="30" /product="gp30" /function="DNA polymerase I" /locus tag="Phroglets_30" /note=Original Glimmer call @bp 31082 has strength 3.74; Genemark calls start at 31229 /note=SSC: 31286-29427 CP: yes SCS: both-cs ST: NI BLAST-Start: [DNA polymerase I [Arthrobacter phage Adat] ],,NCBI, q30:s18 94.5073% 1.06251E-157 GAP: -74 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.203, -2.2186743274732437, yes F: DNA polymerase I SIF-BLAST: ,,[DNA polymerase I [Arthrobacter phage Adat] ],,YP_009613260,62.3064,1.06251E-157 SIF-HHPRED: SIF-Syn: DNA polymerase I, upstream gene is a DNA primase/helicase, just like in phage Adat (cluster AV). /note=Primary Annotator Name: Ea, Emily /note=Auto-annotation: Glimmer calls the start site at 31082 with a start codon of GTG. GeneMark calls the start site at 31229 with a start codon of TTG (uncommon; about 7% of all genes). Glimmer has the more likely start site based on start codon. There is one start site @31196 that results in a 16 bp gap that could also be feasible with a start codon of TTG (not called by either Glimmer or GeneMark). Other start sites result in gaps or overlaps that are at or above 41 bp, making them very unlikely. /note=Coding Potential: There is good coding potential in self-trained GeneMark in the reverse direction.
Neither of the auto-annotated start sites covers all of the coding potential, but the start site called by GeneMark (@31229) does cover most of it. There are some spikes of coding potential in other frames but this stretch of coding potential is longer, which would indicate more/better coding potential for a functional gene. The start site @31286 would cover all of the coding potential. /note=SD (Final) Score: For start site @31082 (Glimmer), it has a Z-score of 1.762 and a final score of -6.039. For start site @31229 (GeneMark), it has a Z-score of 1.807 and a final score of -6.088. These final scores are comparatively a lot more negative than the rest (peaking at -2.219). The Z-scores are also not good because they are below the cutoff of 2, but the start site @31229 (GeneMark) is one of the only start sites with reasonable gaps/overlaps. Start site @31286 has a Z score above 2 and the least negative final score of those mentioned (-2.219). /note=Gap/overlap: For start site @31082 (Glimmer), there is a gap of 130 bp. For start site @31229 (GeneMark), there is an overlap of 17 bp. The start site @31082 called by Glimmer is not reasonable since it is such a large gap and would miss a lot of coding potential. This makes the start site called by GeneMark @31229 much more reasonable. Another start site @31196 has a gap of 16 bp which could also be reasonable in comparison to the Glimmer start site. Start site @31286 has an overlap of 74 bp which is quite large but does cover all of the coding potential. /note=Phamerator: As of 1/13/25, this gene is in Pham 200932 with 7 non-draft phages and 1 draft phage (Phroglets). It is mainly conserved, found in 6/7 (86%) non-draft AV phages (includes Jasmine and Adat) and 1/7 (14%) non-draft singletons. Function is conserved and called as DNA polymerase I in 7/7 (100%) non-draft phages. /note=Starterator: As of 1/13/25, start number 5 is manually annotated in 6/7 (86%) non-draft phages (cluster AV) in this pham.
Phroglets does not contain this start site (#5) and starts at start number 13 @31082. Start number 4 is manually annotated in 1/7 (14%) non-draft phages, which was the other singleton (Shocker) in this pham. Phroglets also does not contain start number 4. Since Phroglets does not contain either of these start sites with manual annotations, Starterator is not informative. /note=Location call: This gene is likely real due to strong coding potential in the reverse direction, sufficient length, synteny with members of the pham (AV phages and Shocker), and strong hits in PhagesDB and NCBI BLAST recognizing this sequence (e-values < e-126). The real start site of this gene does not have much conclusive evidence due to a lack of strong final/Z-scores and Starterator report that was not informative (lack of manually annotated start sites). Of the auto-annotated start sites, the GeneMark start site @31229 is more likely to be the actual start site due to better coverage of coding potential (more likely to contain necessary structures) and smaller overlap (17 bp). However, after consulting with the instructor, it was agreed upon that start site @31286 is the real start site since it covers all of the coding potential that is important for function and has the highest Z- and final scores. /note=Function call: There are strong hits from PhagesDB BLAST (e-126), NCBI BLAST (e-151, ~45% identity, 97% query coverage), and CDD (~e-65 and above) that all call function as DNA Polymerase I. The top three relevant HHpred hits (~e-61 and above, 100% probability, ~55% coverage) listed the following functions: DNA polymerase I and apicoplast DNA polymerase (specific to apicoplasts). Due to the large amount of strong hits from all of the above programs, this gene is a DNA polymerase I. **issue with HHpred hits showing on PECAAN /note=Transmembrane domains: There are no transmembrane domains in this gene according to DeepTMHMM and is therefore not a transmembrane protein. 
/note=Secondary Annotator Name: Jasso, Sarahi /note=Secondary Annotator QC: Based on the above evidence, I agree with the primary annotator`s locational call and functional call. Make sure to put that it does have GM coding capacity in the box above. GM coding capacity is now marked - EE 1/26/25 CDS complement (31213 - 32982) /gene="31" /product="gp31" /function="DNA primase/helicase" /locus tag="Phroglets_31" /note=Original Glimmer call @bp 32982 has strength 3.69; Genemark calls start at 32982 /note=SSC: 32982-31213 CP: yes SCS: both ST: NI BLAST-Start: [recA-like recombinase [Arthrobacter phage Adat] ],,NCBI, q2:s24 95.9253% 1.87483E-134 GAP: 17 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.631, -5.464279489298128, no F: DNA primase/helicase SIF-BLAST: ,,[recA-like recombinase [Arthrobacter phage Adat] ],,YP_009613261,54.9114,1.87483E-134 SIF-HHPRED: c.37.1.11 (A:) Hexameric replicative helicase repA {Escherichia coli [TaxId: 562]},,,d1nlfa_,40.4075,99.8 SIF-Syn: /note=Primary Annotator Name: Venkatraman, Rakshaa /note=Auto-annotation: Both Glimmer and GeneMark call start site 32982, with start codon ATG. /note=Coding Potential: Reasonable coding potential shown on self-trained GeneMark (above 0.5 for most of the region), no coding potential shown on host-trained GeneMark. The chosen start site does cover all of the coding potential. On the self-trained GeneMark, the coding potential is only shown in one frame, and there are no switches in gene orientation from forward to reverse. /note=SD (Final) Score: The final score of this start site is -5.464 and Z-score is 1.631. Although there are options with better final scores and z-scores, they present large gaps and poor start sites (TTG). Of those with low gaps, this is the best final and z-score. /note=Gap/overlap: There is a 17bp gap by this start site, which is the second lowest gap. The site with a gap of only 2bp starts with a start codon of TTG, which is a lot rarer, and has a worse final score. 
Gene has an acceptable length of 1770bp with this start/stop site. /note=Phamerator: Pham 6707 is suggested by a report run on 1/8/25. Given that Phroglets is a singleton, unable to conduct conservation analysis by cluster. Phamerator calls function DNA primase/helicase - there is one other singleton present in this pham with this function, and 6 other cluster AV phages. Interestingly, all these AV phage genes have lengths of 1866 bp and the singleton Shocker has a length of 1833 bp; the 1770 bp length of this Phroglets gene is therefore shorter in comparison (second largest length option with more favorable z/final scores though). /note=Starterator: Per starterator analysis run on 1/8/25, there are 7 non-draft members in pham 6707, none of a common cluster due to Phroglets being a singleton. Starterator is uninformative as Phroglets does not have the Most annotated start site and no other phages to compare the chosen start site with. /note=Location call: Gene seems to be a real gene as evidenced by the same start site of 32982 being called by Glimmer and GeneMark, with good length, common start codon, decent final score, and coding potential on self-trained GeneMark. /note=Function call: Suggested function of DNA Primase/Helicase. Six phagesDB hits with strong e-values (e-118 and e-117) suggesting function DNA Primase/Helicase. NCBI blast hits with strong e-values also support this function. HHPred results have high probability, decent coverage, and strong e-values, suggesting a replicative DNA helicase. I ran comparative HHPred results to confirm whether it has both Primase and Helicase components by comparing it with phage Schubert_31 from the approved function list, which showed a probability of 100, strong e-value, and high score. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. 
/note=Secondary Annotator Name: Mathew, Mallika /note=Secondary Annotator QC: I agree with the primary annotation - the gene appears to be real, has coding potential on self-trained GeneMark, and common start codon. I agree with the function and location calls made which are supported by strong evidence. CDS complement (33000 - 33410) /gene="32" /product="gp32" /function="hypothetical protein" /locus tag="Phroglets_32" /note=Original Glimmer call @bp 33410 has strength 6.56; Genemark calls start at 33410 /note=SSC: 33410-33000 CP: yes SCS: both ST: NA BLAST-Start: GAP: 82 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.402, -4.3372389842445, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Venkatraman, Rakshaa /note=Auto-annotation: Glimmer and GeneMark both suggest start site 33410 with start codon ATG, which is the most preferable start codon and with the smallest gap. /note=Coding Potential: Self-trained GeneMark has strong coding potential which is entirely covered by the chosen start site. Minimal to no coding potential shown on host-trained GeneMark. /note=SD (Final) Score: The chosen start site has both the best SD score and z-score from all possible options. /note=Gap/overlap: The chosen start site has a gap of 82 base pairs, which is the smallest gap possible, while still yielding the largest gene length of 411 bp. /note=Phamerator: Pham 199827 suggested by report on 1/10/25. The only phage in this pham is this one, and there are no others to compare it to. /note=Starterator: No starterator report was generated for this gene, as checked on 1/10/25. /note=Location call: Gene seems to be a real gene as evidenced by the same start site of 33410 being called by Glimmer and GeneMark, with good length, common start codon, decent final score, and coding potential on self-trained GeneMark. /note=Function call: Most likely function is no known function. No phagesDB hits with strong e-values or scores. 
No strong hits found on NCBI Blast, CDD, or HHPred as well. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Mathew, Mallika /note=Secondary Annotator QC: I agree with the location call based on Glimmer and GeneMark since starterator did not provide any information and this was the only phage in this pham. I agree with the function call due to no strong hits found on any databases. CDS complement (33493 - 33987) /gene="33" /product="gp33" /function="hypothetical protein" /locus tag="Phroglets_33" /note=Original Glimmer call @bp 33987 has strength 7.8; Genemark calls start at 33987 /note=SSC: 33987-33493 CP: yes SCS: both ST: NA BLAST-Start: GAP: 267 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.203, -2.5074698667202133, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Venkatraman, Rakshaa /note=Auto-annotation: Glimmer and GeneMark both call start site 33987, with start codon ATG. /note=Coding Potential: Significant coding potential in the self-trained GeneMark, and a small amount of coding potential in the host-trained GeneMark. Chosen start site covers the entire coding potential. /note=SD (Final) Score: Chosen start site has the best SD score and z-score, and also the greatest length. /note=Gap/overlap: This start site has a gap of 267bp, which is pretty high, but is the one with the least gap when compared to the other options. /note=Phamerator: Pham 199831 suggested by report on 1/10/25. The only phage in this pham is this one, and there are no others to compare it to. /note=Starterator: No starterator report was generated for this gene, as checked on 1/10/25. /note=Location call: Gene seems to be a real gene as evidenced by the same start site of 33987 being called by Glimmer and GeneMark, with good length, common start codon, decent final score, and coding potential on self-trained GeneMark. 
/note=Function call: Most likely a protein of unknown function. 1 PhagesDB Blast with strong e-value suggesting unknown function. No significant hits found on NCBI Blast, CDD, and HHPred. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Mathew, Mallika /note=Secondary Annotator QC: I agree with the function and location calls made as there is strong evidence for the start site at 33987 and the lack of significant hits on CDD, HHPred, NCBI Blast and PhagesDB suggest it is a protein with no known function. CDS complement (34255 - 35115) /gene="34" /product="gp34" /function="AAA-ATPase" /locus tag="Phroglets_34" /note=Original Glimmer call @bp 35115 has strength 6.5; Genemark calls start at 35163 /note=SSC: 35115-34255 CP: yes SCS: both-gl ST: NI BLAST-Start: [ATPase [Arthrobacter phage Adat] ],,NCBI, q2:s9 93.007% 2.09618E-46 GAP: 28 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.913, -4.899101348025862, no F: AAA-ATPase SIF-BLAST: ,,[ATPase [Arthrobacter phage Adat] ],,YP_009613263,43.3333,2.09618E-46 SIF-HHPRED: AAA_24 ; AAA domain,,,PF13479.9,72.3776,99.1 SIF-Syn: /note=Primary Annotator Name: Venkatraman, Rakshaa /note=Auto-annotation: Glimmer calls start site 35115 with start codon ATG while GeneMark calls start site 35163 with start codon GTG. /note=Coding Potential: No coding potential on the host-trained GeneMark. Significant coding potential on the self-trained GeneMark. There is a small amount of coding potential on another frame, but it does not seem significant since the chosen start site covers the entire ORF and has significant potential throughout its entire length. /note=SD (Final) Score: Chosen start site of 35115 has final score -4.899 and z-score of 1.913, which is the second best final score. The one with a better final score corresponds to start 35163 which has too large of an overlap to be present in a phage genome. 
There is also an option with a z-score of 2.15 but this has a suboptimal gap and final score. /note=Gap/overlap: The GeneMark start site has an overlap of 20 base pairs and is therefore not likely to be present in the phage genome. The Glimmer start site 35115 has a gap of 28bp, which is the smallest gap available. /note=Phamerator: Pham 8024 suggested by report on 1/13/25. There are 6 cluster AV phages in this pham, all with function of AAA-ATPase. Interestingly, all these have lengths of 1000bp+ while Phroglets’ gene only has about 860bp length. /note=Starterator: Per starterator analysis run on 1/13/25, there are 6 non-draft members in pham 6707, none of them a common cluster due to Phroglets being a singleton. Starterator is uninformative as Phroglets does not have the Most annotated start site, no other phages to compare the chosen start site with, and no manual annotations of the start site. /note=Location call: Gene seems to be a real gene as evidenced by the start site of 35115 being called by Glimmer, with good length, common start codon, decent final score, and coding potential on self-trained GeneMark. /note=Function call: AAA-ATPase. PhagesDB and NCBI BLAST return multiple results with high scores and strong e-values (e-46) suggesting the function of AAA-ATPase. CDD also had a specific hit showing an AAA domain (8.96e-05 e-value) with high percent coverage. HHPred also returned hits suggesting AAA domain and ATPase domain with strong e-values and high probability and score. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Mathew, Mallika /note=Secondary Annotator QC: I agree with the location and function calls made. The evidence from NCBI BLAST, PhagesDB and CDD all provide strong hits for AAA-ATPase. The location call is backed by evidence from Glimmer and good coding potential. 
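The gap/overlap and length values quoted in these notes are consistent with simple coordinate arithmetic. The following is a minimal sketch, assuming 1-based inclusive coordinates and assuming that the upstream neighbor of a reverse-strand gene is the next CDS at higher coordinates. The function names `reverse_gap` and `gene_length` are hypothetical and are not part of PECAAN, DNA Master, or any other tool mentioned here.

```python
def reverse_gap(start: int, upstream_stop: int) -> int:
    """Bases between a reverse-strand start and the stop of the next
    CDS at higher coordinates. A negative value means an overlap."""
    return upstream_stop - start - 1


def gene_length(start: int, stop: int) -> int:
    """Length in bp of a gene given 1-based inclusive coordinates."""
    return abs(start - stop) + 1


# gp34: Glimmer start 35115 and GeneMark start 35163, stop 34255.
# The next CDS at higher coordinates (gp35) has its stop at 35144.
glimmer_gap = reverse_gap(35115, 35144)    # 28 bp gap
genemark_gap = reverse_gap(35163, 35144)   # -20, i.e. a 20 bp overlap
gp34_length = gene_length(35115, 34255)    # 861 bp

print(glimmer_gap, genemark_gap, gp34_length)
```

Under these assumptions the sketch reproduces the 28 bp gap and 20 bp overlap recorded for the two candidate starts of gp34, and a length of 861 bp, which matches the "about 860bp" in the Phamerator note.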
CDS complement (35144 - 35413) /gene="35" /product="gp35" /function="hypothetical protein" /locus tag="Phroglets_35" /note=Original Glimmer call @bp 35413 has strength 9.79; Genemark calls start at 35440 /note=SSC: 35413-35144 CP: yes SCS: both-gl ST: NA BLAST-Start: GAP: 1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.047, -3.1309088467834587, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Venkatraman, Rakshaa /note=Auto-annotation: Glimmer calls start site 35413 with start codon ATG while GeneMark calls start 35440 with start ATG. The Glimmer start site of 35413 seems more likely based on coding potential and gap/overlap evidence. /note=Coding Potential: Start site 35413 covers the entirety of coding potential. There is good coding potential on the self-trained GeneMark. Little to no coding potential is seen on the host-trained GeneMark. There are no switches in gene orientation. /note=SD (Final) Score: Start 35413 has the best/least negative final score and high z-score (-3.131, 3.047). It has an acceptable length of 270bp. /note=Gap/overlap: There is a gap of only 1bp which is the smallest gap possible of all the options. /note=Phamerator: Pham 199351 suggested by report on 1/13/25. The only phage in this pham is this one, and there are no others to compare it to. /note=Starterator: No starterator report was generated for this gene, as checked on 1/13/25. /note=Location call: Gene seems to be a real gene as evidenced by the start site of 35413 being called by Glimmer, with good length, minimal gap, common start codon, good final and z-score, and coding potential on self-trained GeneMark. /note=Function call: PhagesDB Blast does not have any functions listed with strong e-values, all results turn up as function unknown with e-values of 1.4 and higher. CDD, NCBI Blast, and HHPred returned no significant hits. Most likely a protein of unknown function. 
/note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Mathew, Mallika /note=Secondary Annotator QC: I agree with the function and location calls made as the start site called by Glimmer seems more likely based on coding potential and the lack of significant hits on CDD, NCBI BLAST and HHPred suggests there is no known function. CDS complement (35415 - 36257) /gene="36" /product="gp36" /function="RecB-like exonuclease/helicase" /locus tag="Phroglets_36" /note=Original Glimmer call @bp 36257 has strength 6.62; Genemark calls start at 36257 /note=SSC: 36257-35415 CP: yes SCS: both ST: SS BLAST-Start: [exonuclease [Arthrobacter phage Adat] ],,NCBI, q6:s17 96.4286% 1.42356E-64 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.001, -6.842772001587514, no F: RecB-like exonuclease/helicase SIF-BLAST: ,,[exonuclease [Arthrobacter phage Adat] ],,YP_009613266,59.7122,1.42356E-64 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Allataifih, Bashar /note=Auto-annotation: Glimmer and GeneMark both call this gene in agreement on the start codon 36257. This start codon is an “ATG” /note=Coding Potential: The host-trained GeneMark and self-trained GeneMark give the 1st ORF of the reverse strand very high coding potential. The host-trained Genemark shows all coding potential within the start site at 36257. /note=SD (Final) Score: -6.843 is the final score (Z-score 1.001) at the auto-annotated start of 36257, and it is the best score and the LORF. /note=Gap/overlap: Start codon at 36257 has the most reasonable and minimal gap at only -11 base pairs (an 11 bp overlap), while the rest show gaps of more than 100 base pairs, making them less reasonable because of the coding potential they miss and the extra base pairs of gap. /note=Phamerator: This gene is very conserved in the pham. None of the pham phages share the same auto-annotated start site length. 
This one has the most starts sitting in at 5 while most of the rest have 2-3 /note=Starterator: As of 1/10/25 this gene is in the pham 198672. The starterator is uninformative because all the members are split between the start sites. The function is agreed to be a DNA-binding protein. /note=Location call: The gene is a real gene that starts very likely at the original start site pointed by the auto-annotated Glimmer start codon. Shortest gap, Complete coverage of coding potential, and good final score. /note=Functional call: RecB-like exonuclease/helicase /note=Transmembrane domains: DeepTMHMM predicts no TMDs; only one outside domain. /note=Secondary Annotator Name: Sridharan, Ananya M /note=Secondary Annotator QC: Looked over all the notes. Agree with the function call alongside the rest of the information provided. CDS complement (36247 - 36660) /gene="37" /product="gp37" /function="hypothetical protein" /locus tag="Phroglets_37" /note=Original Glimmer call @bp 36660 has strength 9.85; Genemark calls start at 36660 /note=SSC: 36660-36247 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein PP636_gp18 [Arthrobacter phage Hestia] ],,NCBI, q26:s67 79.562% 7.27639E-32 GAP: 91 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.569, -3.4645021069762962, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PP636_gp18 [Arthrobacter phage Hestia] ],,YP_010655988,43.2432,7.27639E-32 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Allataifih, Bashar /note=Auto-annotation: Glimmer and GeneMark both call this gene in agreement on the start codon 36660. This start codon is a “ATG” /note=Coding Potential: The host-trained GeneMark and self-trained GeneMark gives the 1st ORF of the reverse strand very high coding potential. The host-trained Genemark shows all coding potential within the start site at 36660. /note=SD (Final) Score: -3.465 is the final score at the auto-annotated start of 36660 and is the best score and the LORF. 
/note=Gap/overlap: Start codon at 36660 has the most reasonable and minimal gap at only 91 base pairs while the rest show a gap of more than 300+ base pairs making them not reasonable for the coding potential they miss and extra base pair gaps. /note=Phamerator: As of 1/10/25 the Pham is 198665. There are 7 other members in the Pham like Floof and HotPotato. The function is still unclear until this point. /note=Starterator: As of 1/10/25 this gene is in the pham 198665. The starterator is uninformative because all the members are split between the start sites. The start number that was agreed on the most is 12, with only 2 out the 6 agreeing. Too unclear to call function yet. /note=Location call: The gene is a real gene that starts very likely at the original start site pointed by the auto-annotated Glimmer start codon. Shortest gap, not full coverage of coding potential, but good enough for us to call. Lastly, a good final score. /note=Functional call: Unknown function /note=Transmembrane domains: DeepTMHMM predicts no TMDs; only one outside domain. /note=Secondary Annotator Name: Sridharan, Ananya M /note=Secondary Annotator QC: Looked over all the notes. Agree with the function call alongside the rest of the information provided. CDS complement (36752 - 36943) /gene="38" /product="gp38" /function="hypothetical protein" /locus tag="Phroglets_38" /note=Original Glimmer call @bp 36943 has strength 2.0; Genemark calls start at 36943 /note=SSC: 36943-36752 CP: yes SCS: both ST: SS BLAST-Start: GAP: -11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.044, -5.8544093294192034, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Allataifih, Bashar /note=Auto-annotation: Glimmer and GeneMark both call this gene in agreement on the start codon 36943. This start codon is a “TTG” /note=Coding Potential: The host-trained GeneMark and self-trained GeneMark gives the 2nd ORF of the reverse strand very high coding potential. 
The host-trained Genemark shows all coding potential within the start site at 36943. /note=SD (Final) Score: -5.854 is the final score at the auto-annotated start of 36943. With that being said, it is not the best score and the LORF. It comes in second, but it has the highest coding potential in my opinion. /note=Gap/overlap: Start codon at 36943 has the most reasonable and minimal gap at only -11 base pairs (an 11 bp overlap), while the rest show either a larger overlap at -23 or a big gap of 93 bp. /note=Phamerator: The Pham number is 199533 and as of 1/11/25 there is only 1 member in this Pham, which is not a draft (no other phages in the Pham). /note=Starterator: As of 1/11/25, there is no available report (possibly because the phage is a singleton and this gene is an orpham). /note=Location call: Based on the Glimmer start site encompassing all coding potential and having the best SD, Z-Score, and gap size in PECAAN (nothing informative from phamerator/starterator), this is a real gene that most likely starts at 36943 with starting codon TTG. /note=Functional call: NKF, both Phagesdb BLAST and CDD have no hits. HHPRED has multiple hits with some significant e-values, yet they're not informative, and NCBI BLAST has no hits. /note=Transmembrane domains: DeepTMHMM predicts no TMDs; therefore it is not a membrane protein /note=Secondary Annotator Name: Sridharan, Ananya M /note=Secondary Annotator QC: Looked over all the notes. Agree with the function call alongside the rest of the information provided. 
CDS complement (36933 - 37517) /gene="39" /product="gp39" /function="hypothetical protein" /locus tag="Phroglets_39" /note=Original Glimmer call @bp 37517 has strength 10.55; Genemark calls start at 37517 /note=SSC: 37517-36933 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Betaproteobacteria bacterium]],,NCBI, q7:s1 87.6289% 6.94312E-43 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.085, -5.56627535339612, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Betaproteobacteria bacterium]],,MBK6790893,63.0058,6.94312E-43 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Allataifih, Bashar /note=Auto-annotation: Glimmer and GeneMark both call this gene in agreement on the start codon 37517. This start codon is an “ATG” /note=Coding Potential: The host-trained GeneMark and self-trained GeneMark give the 1st ORF of the reverse strand decent coding potential. The host-trained Genemark shows all coding potential within the start site at 37517. /note=SD (Final) Score: -5.566 is the final score at the auto-annotated start of 37517. With that being said, it is the best score and the LORF. It has the highest coding potential out of the bunch available in my opinion. /note=Gap/overlap: Start codon at 37517 has the most reasonable and minimal gap at only -4 base pairs (a 4 bp overlap), while the rest show a large gap of at least 47 bp. /note=Phamerator: The Pham number is 199454 and as of 1/11/25 there is only 1 member in this Pham, which is not a draft (no other phages in the Pham). /note=Starterator: As of 1/11/25, there is no available report (possibly because the phage is a singleton and this gene is an orpham). /note=Location call: Based on the Glimmer start site encompassing all coding potential and having the best SD, Z-Score, and gap size in PECAAN (nothing informative from phamerator/starterator), this is a real gene that most likely starts at 37517 with starting codon ATG. 
/note=Functional call: NKF, Phagesdb BLAST has some uninformative hits whereas CDD has no hits. HHPRED has multiple hits with some high e-values that are not informative nor helpful for this case, and NCBI BLAST has some hits that are also not helpful. /note=Transmembrane domains: DeepTMHMM predicts no TMDs with one outside domain; therefore it is not a membrane protein /note=Secondary Annotator Name: Sridharan, Ananya M /note=Secondary Annotator QC: Looked over all the notes. Agree with the function call alongside the rest of the information provided. CDS complement (37514 - 37858) /gene="40" /product="gp40" /function="DNA polymerase I" /locus tag="Phroglets_40" /note=Original Glimmer call @bp 37858 has strength 8.46; Genemark calls start at 37858 /note=SSC: 37858-37514 CP: yes SCS: both ST: SS BLAST-Start: GAP: 36 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.389, -4.188334187302651, no F: DNA polymerase I SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Allataifih, Bashar /note=Auto-annotation: Glimmer and GeneMark both call this gene in agreement on the start codon 37858. This start codon is a “ATG” /note=Coding Potential: The host-trained GeneMark and self-trained GeneMark gives the 1st ORF of the reverse strand very decent coding potential. The host-trained Genemark shows all coding potential within the start site at 37858. /note=SD (Final) Score: -4.188 is the final score at the auto-annotated start of 37858. With that being said, it is one of the best scores and the LORF. It has a high coding potential out of the bunch available in my opinion. /note=Gap/overlap: Start codon at 37858 has the most reasonable and minimal gap at 36 base pairs while the rest show a huge gap of at least 42 or more. /note=Phamerator: The Pham number is 199767 and as of 1/11/25 there is only 1 member in this Pham, which is not a draft ( no other phages in the Pham). 
/note=Starterator: As of 1/11/25, there is no available report (possibly because the phage is a singleton and this gene is an orpham). /note=Location call: Based on the Glimmer start site encompassing all coding potential and having the best SD, Z-Score, and gap size in PECAAN (nothing informative from phamerator/starterator), this is a real gene that most likely starts at 37858 with starting codon ATG. /note=Functional call: Based on the Phagesdb BLAST, it should be a DNA polymerase I. Phagesdb BLAST has one hit, which turns out to be somewhat informative, whereas CDD has no hits. HHPRED has multiple hits with some high e-values that are not informative nor helpful for this case, and NCBI BLAST has no hits. /note=Transmembrane domains: DeepTMHMM predicts no TMDs with one outside domain; therefore it is not a membrane protein /note=Secondary Annotator Name: Sridharan, Ananya M /note=Secondary Annotator QC: Looked over all the notes. Agree with the function call alongside the rest of the information provided. CDS complement (37895 - 38392) /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="Phroglets_41" /note=Original Glimmer call @bp 38392 has strength 5.69; Genemark calls start at 38392 /note=SSC: 38392-37895 CP: yes SCS: both ST: NA BLAST-Start: GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.698, -4.106817306804867, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: d.61.1.2 (A:) 2`-5` RNA ligase LigT {Thermus thermophilus [TaxId: 274]},,,d1iuha_,72.7273,81.9 SIF-Syn: /note=Primary Annotator Name: Qian, Audrey /note=Auto-annotation: Gene (stop @37895 R). Both Glimmer and GeneMark call the start at 38392, and the start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the reverse strand (fourth frame) only, indicating that this is a reverse gene. The ORF is the longest (498 bp), and it does have reasonable coding potential, which is only found on GeneMark Self. 
GeneMark Host does not show any coding potential in any of the frames corresponding to the reverse strand. However, since coding potential is shown at least on one GeneMark, this gene could still be real, though not well-supported. The chosen start site includes all of the coding potential. /note=SD (Final) Score: -4.107. It is the best final score on PECAAN. The z score is also the highest at 2.698. /note=Gap/overlap: -8 bp gap, or 8 bp overlap. This overlap is within a reasonable range; there are no other phages to compare to determine whether this gap is conserved. However, this gap is the best one, since the next smallest gap is 91 bp. /note=Phamerator: The pham number as of 1/8/2025 is 199180. There is no other information regarding the gene’s conservation in other phages or clusters. The function call for the gene is unknown. /note=Starterator: As of 1/8/2025, Starterator report for this Pham is still processing. This error is also due to the fact that this phage is a singleton or the gene is an orpham. /note=Location call: Since Starterator is unavailable, the rest of the above evidence indicates that this may be a real gene and the most likely start site is 38392, especially given that Glimmer Start and GeneMark Start agree. This gene with start site 38392 has the best Z-score and final score, and it has the smallest gap. It is also the longest gene (498 bp). There are no other phages or homologues to which Phroglets may be compared to determine whether this start site is generally conserved. Starterator does not provide any information. /note=Function call: NKF. All PhagesDB BLAST hits have no known function (e-value < 2e-94), and NCBI BLAST called the hit with unknown function. There were no results, since there was no significant similarity found with other phages or clusters. There were no hits on CDD, but HHpred called numerous hits, with the highest probability being 87.15% (identities 18%). 
However, the e-value is high at 1.8, and this hit (SCOP_d1iuha_) calls the function of RNA ligase LigT, class: alpha and beta proteins. However, since all three other platforms do not call any hits with known function, it is unlikely that this ORF has a functional call. There were no similar phages to determine conservation of function /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Carrillo, Daniel /note=Secondary Annotator QC: I agree with your functional and start location call. BLASTp and CDD did not produce any hits with functions. Your analysis is sufficient and covered all necessary points. A suggestion: /note=-Include coverage in your analysis of HHpred /note= /note=Note (AQ): I agree with the secondary annotator`s comments. Changes have been made (unsure of coverage, but included identity percentage). CDS complement (38385 - 38735) /gene="42" /product="gp42" /function="hypothetical protein" /locus tag="Phroglets_42" /note=Original Glimmer call @bp 38735 has strength 7.53; Genemark calls start at 38660 /note=SSC: 38735-38385 CP: yes SCS: both-gl ST: NA BLAST-Start: [MAG: hypothetical protein VM34scaffold347_44 [Phage 66_12]],,NCBI, q15:s8 68.1034% 8.20343E-15 GAP: -26 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.81, -3.494990588194529, no F: hypothetical protein SIF-BLAST: ,,[MAG: hypothetical protein VM34scaffold347_44 [Phage 66_12]],,QOR55633,49.4624,8.20343E-15 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Qian, Audrey /note=Auto-annotation: Gene (stop@38385 R). Glimmer and GeneMark call different start sites at 38735 and 38660, respectively. The start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the reverse strand (fourth frame) only, indicating that this is a reverse gene. The ORF is the longest (351 bp), and it does have reasonable coding potential, which is only found on GeneMark Self. 
GeneMark Host does not show any coding potential between the stop and suggested start sites. However, since coding potential is shown at least on one GeneMark, this gene could still be real, though not well-supported. The chosen start site includes all of the coding potential, though it seems that the coding potential in this ORF clearly stops before start site 38735. /note=SD (Final) Score: -3.495. It is the best final score on PECAAN. The z score is the second highest at 2.81, with the highest score being 2.883 (final score -4.422). /note=Gap/overlap: -26 bp gap, or 26 bp overlap. This overlap is somewhat large, but it is the best one since the next lowest gap is 49 bp. There are no other phages to compare to determine whether this gap is conserved. /note=Phamerator: The pham number as of 1/8/2025 is 199463. There is no other information regarding the gene’s conservation in other phages or clusters. The function call for the gene is unknown. /note=Starterator: As of 1/8/2025, Starterator report for this Pham is still processing. This error is also due to the fact that this phage is a singleton or the gene is an orpham. /note=Location call: Since Starterator is unavailable, the rest of the above evidence indicates that this may be a real gene and the most likely start site is 38735. This ORF has the best final score and the second-best Z-score, and it has the smallest gap. It is also the longest gene (351 bp), which is longer than 120 bp, and there is coding potential. There are no other phages or homologues to which Phroglets may be compared to determine whether this start site is generally conserved. Starterator does not provide any information. /note=Function call: NKF. The top PhagesDB BLAST hits have no known function (e-value < 1e-07). The next known function is a minor tail protein, but the score is relatively low (30) and the e-value relatively high (2.3). NCBI BLAST called all hits with unknown function (68% coverage, < 50.82% identities, and e-value > 8e-15). 
The top two hits were QOR55633 and WP_358160304. There were no hits on CDD. Although HHpred called many hits, none were good hits (low probability of < 75.8% and high e-value of > 5.2, identities <26%). Thus, it is unlikely that this ORF has a functional call. There were no similar phages to determine conservation of function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Carrillo, Daniel /note=Secondary Annotator QC: I agree with your function and start location call. Your analysis was sufficient and appropriate. I would make one addition: /note= - your HHpred analysis should include coverage. /note= /note=Note (AQ): I agree with the secondary annotator's comments. Changes have been made (unsure of coverage, but included identity percentage). CDS complement (38710 - 39189) /gene="43" /product="gp43" /function="HNH endonuclease" /locus tag="Phroglets_43" /note=Original Glimmer call @bp 39189 has strength 7.42; Genemark calls start at 39189 /note=SSC: 39189-38710 CP: yes SCS: both ST: NI BLAST-Start: [HNH endonuclease [Arthrobacter phage Brad]],,NCBI, q24:s6 74.8428% 5.68315E-33 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.392, -3.8308929311341546, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage Brad]],,AXH43741,62.8788,5.68315E-33 SIF-HHPRED: d.4.1.3 (M:1-105) Intron-encoded homing endonuclease I-HmuI {Bacteriophage SP01 [TaxId: 10685]},,,d1u3em1,61.0063,99.6 SIF-Syn: HNH endonuclease, upstream gene is no known function, downstream is also NKF. Although HNH endonuclease is conserved in phages Adat, Brad, Casserole, Jasmine, and Nellie, there is no synteny. /note=Primary Annotator Name: Qian, Audrey /note=Auto-annotation: Gene (stop@38710 R). Both Glimmer and GeneMark call the start at 39189, and the start codon is GTG.
/note=Coding Potential: Coding potential in this ORF is on the reverse strand (sixth frame) only, indicating that this is a reverse gene. The ORF is the longest (480 bp), and it does have reasonable coding potential, which is only found on GeneMark Self. GeneMark Host does not show any coding potential in any of the frames corresponding to the reverse strand or forward strand. However, since coding potential is shown at least on one GeneMark, this gene could still be real, though not well-supported. The chosen start site includes all of the coding potential. /note=SD (Final) Score: -3.831. It is not the best final score on PECAAN; it is the third best (the best is -3.319). The z score is also third highest at 2.392, but it is still above 2, which is a good z score. /note=Gap/overlap: -4 bp gap, or 4 bp overlap. This overlap is within a reasonable range and could indicate that this ORF is part of an operon. There are no other phages to compare to determine whether this gap is conserved. However, this gap is the best one because the next smallest gap is 89 bp, which is too large. /note=Phamerator: The pham number as of 1/9/2025 is 194245. The gene is conserved in 31 other phages, split among clusters AV, AM, FI, and AW. The function call for the gene in Phroglets is unknown, but the genes in other phages in this pham code for HNH endonuclease. /note=Starterator: There are 26 non-draft genes in the pham. 19/26 non-draft members call start site 16. However, the called start site for Phroglets is 11 at 39189 bp, which has an unknown number of MA’s. Despite this, the best annotated start site for this phage aligns with the auto-annotated start site, and Phroglets does not have an annotation for start site 16. Phroglets is the only phage whose gene contains start site 11, which is reasonable because Phroglets is a singleton. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 39189 bp.
Starterator states that the start number called the most often in the published annotations is 16 (19/26 non-draft genes in the pham), but Phroglets does not have this “Most Annotated” start. Instead, the auto-annotated start for Phroglets is 11 at 39189 bp. This start site, rather than start site 16, agrees with both Glimmer and GeneMark. Furthermore, the start site in Starterator across all homologues is mostly conserved, just in different locations. Starterator is informative but does not provide any new information. /note=Functional call: HNH endonuclease. Most PhagesDB BLAST hits call function HNH endonuclease (e-value < 1e-25, score < 114), while other calls are unknown functions. The majority of NCBI BLAST hits call the function of HNH endonuclease (<85% coverage, 52.509% identities, and e-value > 6e-33). There were no hits on CDD, but HHpred called many hits with high probability (red sequences, top two >98.28% and e-value < 3.3e-7, identities < 22%). The two hits, 1U3E_M and SCOP_d1u3em1, call functions of HNH homing endonuclease and intron-encoded homing endonuclease I-HmuI, respectively. Based on the above evidence, it is likely that this ORF has a functional call of HNH endonuclease. Since Phroglets is a singleton, there are no other phages in the same cluster that can be compared for conservation of function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Carrillo, Daniel /note=Secondary Annotator QC: I agree with your functional call, start call, and your analysis on synteny. Although there is conservation in genes, the order is not conserved. Your analysis was sufficient and adequate in all sections. I would recommend adding coverage in your analysis of HHpred. /note= /note=Note (AQ): I agree with the secondary annotator's comments. Changes have been made (unsure of coverage, but included identity percentage).
CDS complement (39186 - 39401) /gene="44" /product="gp44" /function="hypothetical protein" /locus tag="Phroglets_44" /note=Original Glimmer call @bp 39401 has strength 1.97; Genemark calls start at 39401 /note=SSC: 39401-39186 CP: yes SCS: both ST: SS BLAST-Start: GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.142, -4.408951082954879, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: CsoZ; metallochaperone, copper, copper homeostasis, staphylococcus aureus, chaperone; 1.3A {Staphylococcus aureus (strain NCTC 8325)},,,6FF1_A,29.5775,57.6 SIF-Syn: /note=Primary Annotator Name: Qian, Audrey /note=Auto-annotation: Gene (stop@39186 R). Both Glimmer and GeneMark call the start at 39401, and the start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the reverse strand (fifth frame) only, indicating that this is a reverse gene. The ORF is the second longest (216 bp) but is still over 120 bp in length; it does have reasonable coding potential, which is only found on GeneMark Self. GeneMark Host does not show any coding potential in any of the frames corresponding to the reverse strand or forward strand. However, since coding potential is shown at least on one GeneMark, this gene could still be real, though not well-supported. The chosen start site includes all of the coding potential. /note=SD (Final) Score: -4.409. It is not necessarily the best final score on PECAAN, but it is second best to -3.550. The z score is also second highest at 2.142, but it still reaches to around 2, which is a good z score. /note=Gap/overlap: -1 bp gap, or 1 bp overlap. This overlap is the closest to almost no overlap or gap. There are no other phages to compare to determine whether this gap is conserved. The next smallest gap is -4 bp at start site 39404, which is also within reasonable range and could indicate that this ORF is an operon. /note=Phamerator: The pham number as of 1/9/2025 is 86214. 
The gene is conserved in 40 other phages, split among clusters AV, AU2, AU4, FK, FI, and AW. The function call for half of these genes is unknown, while the other half is DNA binding protein. /note=Starterator: There are 29 non-draft genes in the pham. 29/29 non-draft members call start site 10, which correlates with start site 39401 for Phroglets (Start: 10 @39401 has 19 MA’s). This start site agrees with the called start site from both Glimmer and GeneMark. This start site displays synteny among all homologues on Starterator. Starterator is informative but does not provide any new information. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 39401 bp. Starterator agrees with both Glimmer and Genemark. /note=Functional call: NKF. All PhagesDB BLAST hits call function unknown, and the results did not have good scores or e-values (e-value < 0.008, score < 38). NCBI BLAST results had no hits, as there was no significant similarity found. There were no hits on CDD, but HHpred called many hits with low probability (green and blue sequences, top two < 78.38% and e-value > 7.2, identities 12%). The top result has uncharacterized protein conserved in bacteria while the second result has metallochaperone, aligned with Staphylococcus aureus. Based on the above evidence, it is likely that this ORF has no known function. Since Phroglets is a singleton, there are no other phages in the same cluster that can be compared for conservation of function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Carrillo, Daniel /note=Secondary Annotator QC: I agree with your function and start site location call. Your analysis is appropriate and thorough. One suggestion would be to include coverage in your disclosure of HHpred results. /note= /note=Note (AQ): I agree with the secondary annotator`s comments. 
Changes have been made (unsure of coverage, but included identity percentage). CDS complement (39401 - 39751) /gene="45" /product="gp45" /function="hypothetical protein" /locus tag="Phroglets_45" /note=Original Glimmer call @bp 39751 has strength 8.12; Genemark calls start at 39751 /note=SSC: 39751-39401 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein PQB82_gp62 [Arthrobacter phage Dynamite] ],,NCBI, q2:s8 49.1379% 1.29364E-10 GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.301, -2.033982896655645, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQB82_gp62 [Arthrobacter phage Dynamite] ],,YP_010666841,54.4118,1.29364E-10 SIF-HHPRED: PHD finger protein 7,Ubiquitin-conjugating enzyme E2 D2; Ubiquitin, RING ligase, PHD, linker, LIGASE; HET: ZN; 3.58A {Mus musculus},,,8JWU_C,37.931,52.1 SIF-Syn: /note=Primary Annotator Name: Qian, Audrey /note=Auto-annotation: Gene (stop@39401 R). Both Glimmer and GeneMark call the start at 39751, and the start codon is ATG. /note=Coding Potential: Coding potential in this ORF is on the reverse strand (fourth frame) only, indicating that this is a reverse gene. The ORF is the second longest (351 bp) but is still over 120 bp in length; it does have reasonable coding potential, which is only found on GeneMark Self. GeneMark Host does not show any coding potential in any of the frames corresponding to the reverse strand or forward strand. However, since coding potential is shown at least on one GeneMark, this gene could still be real, though not well-supported. The chosen start site includes all of the coding potential. /note=SD (Final) Score: -2.034. This is best final score on PECAAN. The z score is also the highest at 3.301. /note=Gap/overlap: -8 bp gap, or 8 bp overlap. This overlap is within a reasonable range; there are no other phages to compare to determine whether this gap is conserved. However, this gap is the best one, since the next smallest gap is -56 bp. 
/note=Phamerator: The pham number as of 1/9/2025 is 199552. There is no other information regarding the gene’s conservation in other phages or clusters. The function call for the gene is unknown. /note=Starterator: As of 1/9/2025, Starterator report for this Pham is still processing. This is likely because the phage is a singleton and the gene is an orpham. /note=Location call: Since Starterator is unavailable, the rest of the above evidence indicates that this may be a real gene and the most likely start site is 39751, especially given that Glimmer Start and GeneMark Start agree. This gene with start site 39751 has the best Z-score and the best final score, and it has the smallest gap. There are no other phages or homologues with which Phroglets may be compared to determine whether this start site is generally conserved. Starterator does not provide any information. /note=Function call: NKF. All relatively reasonable PhagesDB BLAST hits have no known function (e-value < 3e-06), and NCBI BLAST called all hits with unknown function (49% coverage, e-value 1e-10, and 45.61% identity). There were no hits on CDD, but HHpred called a few hits, all with low probability of < 52.69% and high e-value of >17. Since all three other platforms do not call any hits with known function, it is unlikely that this ORF has a functional call. There were no similar phages to determine conservation of function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs. Therefore, it is not a membrane protein. /note=Secondary Annotator Name: Carrillo, Daniel /note=Secondary Annotator QC: I agree with your start site location and start site call. Your analysis was complete and sufficient. I would suggest including coverage in your HHpred results. /note= /note=Note (AQ): I agree with the secondary annotator's comments. Changes have been made (unsure of coverage, but included identity percentage).
CDS complement (39744 - 39977) /gene="46" /product="gp46" /function="hypothetical protein" /locus tag="Phroglets_46" /note=Original Glimmer call @bp 39923 has strength 2.96; Genemark calls start at 39977 /note=SSC: 39977-39744 CP: yes SCS: both-gm ST: NI BLAST-Start: [hypothetical protein HOU70_gp47 [Arthrobacter phage Liebe] ],,NCBI, q12:s9 84.4156% 8.59076E-6 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.569, -4.116640962647576, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU70_gp47 [Arthrobacter phage Liebe] ],,YP_009817079,48.6842,8.59076E-6 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Liu, Carissa /note=Auto-annotation: Gene (stop@39744 R) Both Glimmer and GeneMark call the gene, but they do not agree on the same start site. Glimmer called it at 39923 bp with a start codon of ATG. GeneMark called it at 39977 bp with a start codon of ATG. /note=Coding Potential: [start site 39977] Host-Trained GeneMark shows no coding potential. Comparatively, in Self-Trained GeneMark, the gene has reasonable (very good coverage) coding potential predicted within the putative ORF. The chosen start site covers all this coding potential. The gene does not display synteny with other non-draft phage genomes (even in cluster AV). There are 2 PhagesDB BLAST hits that are non-draft phage genomes with an e-value less than 10^-6. There is coding potential predicted by Glimmer and GeneMark Host. The gene is at least 120 bp long (234 bp). There are no switches in gene orientation. Only one frame in one strand is used for a protein-coding gene. /note=SD (Final) Score: [start site 39977] The SD score -4.117 is the third best. The best (least negative) is -3.187. The Z-value 2.569 is the third best. The best (closest to/higher than 2) is 2.873. The SD score is still reasonable to suggest the presence of a credible RBS. /note=Gap/overlap: [start site 39977] 8 bp overlap with the upstream gene is reasonable. The length of the gene is acceptable (234 bp). 
It is the longest reasonable ORF for this gene call. There are no large non-coding gaps before the gene. /note=Phamerator: Pham 200609 as of 1/12/25 has 21 members, which belong to clusters (including CR and V) other than that of Phroglets (singleton). No function called. /note=Starterator: There is not a reasonable start site choice that is conserved among the members of the pham to which this gene belongs. Start site #14 @39923 called in Phroglets. 7/20 non-draft genes call site #7 (which is not found in Phroglets). The autoannotated start site call agrees with the site predicted by Glimmer. Starterator was not informative since Phroglets is a singleton. /note=Location call: [start site 39977] The gathered evidence suggests that this is a real gene since it has good coding potential. The most likely start site is 39977 because the 8 bp overlap is the most favorable and covers all coding potential. The start site 39923 with the best SD and Z-value scores was not chosen because the ORF does not cover all of the coding potential. The start site 39815 with the second best scores was not chosen because the resulting gene length (72 bp) is below the minimum of around 120 bp. Starterator was not informative since Phroglets does not have a MA start site. /note=Function call: NKF. No program returned any informative results. Two hits in PhagesDB BLASTp have an unknown function and an e-value of 7x10^-7. No significant hits in NCBI BLASTp and HHpred. CDD: no hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Gutierrez, Michelle /note=Secondary Annotator QC: This looks so good, honestly perfect PECAAN notes and something that could be an example! Good job and I am sorry for the delay.
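The length and gap/overlap figures quoted in these notes follow directly from the CDS coordinates. A minimal sketch of that arithmetic, assuming 1-based inclusive coordinates as written in the CDS lines; the helper names `orf_length` and `gap_between` are hypothetical and not part of PECAAN:

```python
# Hypothetical helpers; coordinates are copied from the CDS lines of gp45-gp47.

def orf_length(left, right):
    """Length in bp of a feature spanning left..right (1-based, inclusive)."""
    return right - left + 1


def gap_between(this_right, next_left):
    """Bases between two neighbouring features; a negative value is an overlap."""
    return next_left - this_right - 1


gp45 = (39401, 39751)
gp46 = (39744, 39977)
gp47 = (39970, 40182)

print(orf_length(*gp46))              # 234, the ORF length quoted for gp46
print(gap_between(gp45[1], gp46[0]))  # -8, i.e. an 8 bp overlap
print(gap_between(gp46[1], gp47[0]))  # -8, i.e. an 8 bp overlap
print(orf_length(39744, 39815))       # 72, the rejected short ORF for gp46
```

The same subtraction gives the 16 bp gap reported for gp52 (right end 43441, next feature starting at 43458).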
CDS complement (39970 - 40182) /gene="47" /product="gp47" /function="hypothetical protein" /locus tag="Phroglets_47" /note=Original Glimmer call @bp 40182 has strength 6.74; Genemark calls start at 40182 /note=SSC: 40182-39970 CP: yes SCS: both ST: NA BLAST-Start: GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.064, -3.544192673744953, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Liu, Carissa /note=Auto-annotation: Gene (stop@39970 R) Both Glimmer and GeneMark call the gene. They agree on the same start site, which was called at 40182 bp with a start codon of ATG. /note=Coding Potential: [start site 40182] Host-Trained GeneMark shows no coding potential. Comparatively, in Self-Trained GeneMark, the gene has reasonable (very good coverage) coding potential predicted within the putative ORF. The chosen start site covers all this coding potential. The gene does not display synteny with other non-draft phage genomes (even in cluster AV). PhagesDB BLAST: 0 hits. There is coding potential predicted by Glimmer and GeneMark Host. The gene is at least 120 bp long (213 bp). There are no switches in gene orientation. Only one frame in one strand is used for a protein-coding gene. /note=SD (Final) Score: [start site 40182] The SD score -3.544 is the second best. The best (least negative) is -2.443. The Z-value 3.064 is the best (closest to/higher than 2). The SD score is still reasonable to suggest the presence of a credible RBS. /note=Gap/overlap: [start site 40182] 8 bp overlap with the upstream gene is reasonable. The length of the gene is acceptable (213 bp). It is the longest reasonable ORF for this gene call. There are no large non-coding gaps before the gene. /note=Phamerator: Pham 199561 as of 1/9/25 has no other members. No function called. /note=Starterator: No available report as of 1/9/2025 since the gene is an orpham. 
/note=Location call: [start site 40182] The gathered evidence suggests that this is a real gene since it has good coding potential. The most likely start site is 40182 because the 8 bp overlap is the most favorable and covers all coding potential. /note=Function call: NKF. No program returned any informative results. No hits in PhagesDB BLASTp, NCBI BLASTp, and CDD. HHpred: no significant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Gutierrez, Michelle /note=Secondary Annotator QC: Great notes and due to Phroglets being a singleton, it is much like another phage that had no phamerator so good job proving everything regardless. CDS complement (40175 - 42178) /gene="48" /product="gp48" /function="DNA helicase" /locus tag="Phroglets_48" /note=Original Glimmer call @bp 42178 has strength 8.02; Genemark calls start at 42178 /note=SSC: 42178-40175 CP: yes SCS: both ST: NI BLAST-Start: [helicase [Arthrobacter phage Adat] ],,NCBI, q108:s208 79.4603% 1.68732E-145 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.291, -4.101388555093469, no F: DNA helicase SIF-BLAST: ,,[helicase [Arthrobacter phage Adat] ],,YP_009613271,45.6609,1.68732E-145 SIF-HHPRED: Chromatin-remodeling ATPase INO80; Chromatin Remodeler, hexasome, DNA BINDING PROTEIN, DNA BINDING PROTEIN-Hydrolase complex; HET: ADP; 3.41A {Saccharomyces cerevisiae S288C},,,8EUF_Q,74.9625,100.0 SIF-Syn: Phroglets: DNA helicase, upstream gene is Pham 199304, downstream is Pham 199561. Cluster AV phages (Adat, Brad, GurgleFerb, Jasmine, Nellie): DNA helicase, upstream gene is Pham 7499, downstream is Pham 86214. /note=Primary Annotator Name: Liu, Carissa /note=Auto-annotation: Gene (stop@40175 R) Both Glimmer and GeneMark call the gene. They agree on the same start site, which was called at 42178 bp with a start codon of ATG.
/note=Coding Potential: [start site 42178] Host-Trained GeneMark shows very sparse coding potential. Comparatively, in Self-Trained GeneMark, the gene has reasonable (very good coverage) coding potential predicted within the putative ORF. The chosen start site covers all this coding potential. The gene does not display synteny with other non-draft phage genomes, but it does display the same pattern with phages Nellie, GurgleFerb, Brad, and Adat (all in Cluster AV) and all share the same Pham 184301. There are 91 PhagesDB BLAST hits that are non-draft phage genomes with an e-value less than 10^-6. There is coding potential predicted by Glimmer, GeneMark Self, and GeneMark Host. The gene is at least 120 bp long (2004 bp). There are no switches in gene orientation. Only one frame in one strand is used for a protein-coding gene. /note=SD (Final) Score: [start site 42178] The SD score -4.101 is the second best. The best (least negative) is -3.966. The Z-value 2.291 is the best (closest to/higher than 2). Since the gene may be part of an operon, the RBS score is irrelevant for the start call. /note=Gap/overlap: [start site 42178] 1 bp overlap with the upstream gene is reasonable. This gene may be part of an operon. The length of the gene is acceptable (2004 bp). It is the longest reasonable ORF for this gene call. There are no large non-coding gaps before the gene. /note=Phamerator: Pham 184301 as of 1/8/25 is conserved in Adat, Brad, Casserole, GurgleFerb, Jasmine, and Nellie, but they belong to a different cluster (all AV) than Phroglets (singleton). Phamerator called the gene function DNA helicase. All of the functions called were consistent and found in the approved function list. /note=Starterator: There is not a reasonable start site choice that is conserved among the members of the pham to which this gene belongs. Start site #10 @42178 called in Phroglets. 4/7 non-draft genes call site #1. 
The start site call agrees with the site predicted by both Glimmer and GeneMark. Starterator was not informative since Phroglets is a singleton. /note=Location call: [start site 42178] The gathered evidence suggests that this is a real gene since it is conserved in phamerator and has good coding potential. The most likely start site is 42178 because the 1 bp overlap is favorable and covers all coding potential. /note=Function call: DNA helicase. The top five PhagesDB BLASTp hits have the function DNA helicase and an e-value of 10^-128. The top three NCBI BLASTp hits have the function DNA helicase (>79% coverage, >32% identity, and e-value <10^-144). The top CDD hit had the function DNA helicase (>93% coverage and e-value 0). HHpred had more than three hits with the function chromatin remodeling ATPase INO80 (100% probability, >74% coverage, and e-value <10^-43). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Gutierrez, Michelle /note=Secondary Annotator QC: I believe this was the only gene in your assigned genes that had a known function, the data well supports it and looks truly beautiful. CDS complement (42178 - 42651) /gene="49" /product="gp49" /function="hypothetical protein" /locus tag="Phroglets_49" /note=Original Glimmer call @bp 42651 has strength 10.1; Genemark calls start at 42651 /note=SSC: 42651-42178 CP: yes SCS: both ST: NA BLAST-Start: GAP: -26 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.312, -2.58321678164666, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Liu, Carissa /note=Auto-annotation: Gene (stop@42178 R) Both Glimmer and GeneMark call the gene. They agree on the same start site, which was called at 42651 bp with a start codon of GTG. /note=Coding Potential: [start site 42651] Host-Trained GeneMark shows no coding potential. 
Comparatively, in Self-Trained GeneMark, the gene has reasonable (very good coverage) coding potential predicted within the putative ORF. The chosen start site covers all this coding potential. The gene does not display synteny with other non-draft phage genomes (even in cluster AV). There are 16 PhagesDB BLAST hits that are non-draft phage genomes but they all do not have an e-value less than 10^-6. There is coding potential predicted by Glimmer and GeneMark Host. The gene is at least 120 bp long (474 bp). There are no switches in gene orientation. Only one frame in one strand is used for a protein-coding gene. /note=SD (Final) Score: [start site 42651] The SD score -2.583 is the best (least negative). The Z-value 3.312 is the best (closest to/higher than 2). The SD score is reasonable to suggest the presence of a credible RBS. /note=Gap/overlap: [start site 42651] 26 bp overlap with the upstream gene is reasonable. The length of the gene is acceptable (474 bp). It is the longest reasonable ORF for this gene call. There are no large non-coding gaps before the gene. /note=Phamerator: Pham 199304 as of 1/10/25 has no other members. No function called. /note=Starterator: No available report as of 1/10/2025 since the gene is an orpham. /note=Location call: [start site 42651] The gathered evidence suggests that this is a real gene since it has good coding potential. The most likely start site is 42651 because the 26 bp overlap is the most favorable and covers all coding potential. /note=Function call: NKF. No program returned any informative results. No significant hits in PhagesDB BLASTp and HHpred. No hits in NCBI BLASTp and CDD. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Gutierrez, Michelle /note=Secondary Annotator QC: Looks good! 
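The Z-value and SD (final) score cited in each record's notes can be read from the `RBS:` field of the summary note. A rough sketch, assuming the field is laid out as scoring matrix, spacing matrix, Z-score, final score, and a yes/no flag whose meaning is not stated in these notes; the function `parse_rbs` and the regular expression are illustrative and not part of PECAAN:

```python
import re

# RBS field copied from the gp49 summary note.
note = "RBS: Kibler 6, Karlin Medium, 3.312, -2.58321678164666, yes"

# Assumed layout: scoring matrix, spacing matrix, Z-score, final score, flag.
RBS_PATTERN = re.compile(
    r"RBS:\s*([^,]+),\s*([^,]+),\s*(-?[\d.]+),\s*(-?[\d.]+),\s*(yes|no)"
)


def parse_rbs(text):
    """Return the Z-score and the final score rounded to 3 decimals, or None."""
    match = RBS_PATTERN.search(text)
    if match is None:
        return None
    return {
        "z_score": float(match.group(3)),
        "final_score": round(float(match.group(4)), 3),
    }


print(parse_rbs(note))  # {'z_score': 3.312, 'final_score': -2.583}
```

Rounding to three decimals reproduces the values quoted in the gp49 notes (Z-value 3.312, SD score -2.583).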
CDS complement (42626 - 42838) /gene="50" /product="gp50" /function="hypothetical protein" /locus tag="Phroglets_50" /note=Original Glimmer call @bp 42838 has strength 3.44; Genemark calls start at 42838 /note=SSC: 42838-42626 CP: yes SCS: both ST: NA BLAST-Start: GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.312, -1.993391246735709, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Liu, Carissa /note=Auto-annotation: Gene (stop@42626 R) Both Glimmer and GeneMark call the gene. They agree on the same start site, which was called at 42838 bp with a start codon of ATG. /note=Coding Potential: [start site 42838] Host-Trained GeneMark shows no coding potential. Comparatively, in Self-Trained GeneMark, the gene has reasonable (very good coverage) coding potential predicted within the putative ORF. The chosen start site covers all this coding potential. The gene does not display synteny with other non-draft phage genomes (even in cluster AV). PhagesDB BLAST: 0 hits. There is coding potential predicted by Glimmer and GeneMark Host. The gene is at least 120 bp long (213 bp). There are no switches in gene orientation. Only one frame in one strand is used for a protein-coding gene. /note=SD (Final) Score: [start site 42838] The SD score -1.993 is the best (least negative). The Z-value 3.312 is the best (closest to/higher than 2). The SD score is reasonable to suggest the presence of a credible RBS. /note=Gap/overlap: [start site 42838] 8 bp overlap with the upstream gene is reasonable. The length of the gene is acceptable (213 bp). It is the longest reasonable ORF for this gene call. There are no large non-coding gaps before the gene. /note=Phamerator: Pham 199575 as of 1/10/25 has no other members. No function called. /note=Starterator: No available report as of 1/10/2025 since the gene is an orpham. 
/note=Location call: [start site 42838] The gathered evidence suggests that this is a real gene since it has good coding potential. The most likely start site is 42838 because the 8 bp overlap is the most favorable and covers all coding potential. /note=Function call: NKF. No program returned any informative results. No hits in PhagesDB BLASTp, NCBI BLASTp, and CDD. HHpred: no significant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Gutierrez, Michelle /note=Secondary Annotator QC: All good notes, amazing job! CDS complement (42831 - 43163) /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="Phroglets_51" /note=Original Glimmer call @bp 43163 has strength 7.2; Genemark calls start at 43163 /note=SSC: 43163-42831 CP: no SCS: both ST: NI BLAST-Start: GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.774, -3.0409675880108447, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Streva, Amanda /note=Auto-annotation: Both Glimmer Start and GeneMark Start call the start at 43163 bp with codon ATG /note=Coding Potential: High coding potential is found only in the GeneMark Self-Trained reverse strand. The coding potential is within the range of the agreed start site (43163) to the stop site (42831) with insignificant amounts of coding potential found in the forward strand of the gene. GeneMark Host-Trained had no coding potential. /note=SD (Final) Score: -3.041. This is the highest (least negative) final score and has the highest Z-score at 2.774 /note=Gap/overlap: -1 bp gap, or 1 bp overlap, which can be indicative of an operon and is the smallest gap value in PECAAN /note=Phamerator: The Pham number is 199489 and as of 1/8/25 there is only 1 member in this Pham, which is not a draft (no other phages in the Pham).
/note=Starterator: As of 1/8/25, there is no available report (possibly because the phage is a singleton and the gene is an orpham). /note=Location call: Based on Glimmer and GeneMark agreeing on the start, high coding potential, as well as having the best SD, Z-Score, and gap size (nothing informative from phamerator/ starterator), this is a real gene that most likely starts at 43163 /note=Function call: NKF. No program returned informative results. Phagesdb BLAST, CDD and NCBI BLAST have no hits. HHpred has no significant hits, with e-values ranging from 35-130 /note=Transmembrane domains: There are no predicted TMDs, therefore there is no sufficient evidence that this is a membrane protein /note=Secondary Annotator Name: Scheithauer, Julia /note=Secondary Annotator QC: This looks good! /note=AS: Thank you! CDS complement (43163 - 43441) /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="Phroglets_52" /note=Original Glimmer call @bp 43441 has strength 4.32; Genemark calls start at 43441 /note=SSC: 43441-43163 CP: no SCS: both ST: NI BLAST-Start: [hypothetical protein [Clostridia bacterium]],,NCBI, q17:s5 65.2174% 1.12843E-5 GAP: 16 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.843, -4.964641208403463, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Clostridia bacterium]],,MBP3801556,57.8125,1.12843E-5 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Streva, Amanda /note=Auto-annotation: Both Glimmer Start and GeneMark Start call the start site of this gene at 43441 bp with start codon ATG /note=Coding Potential: High coding potential is found in GeneMark Self-Trained reverse strand within the range of the agreed start site (43441) to the stop site (43163). There are insignificant amounts of coding potential in the forward direction in the Self-Trained and no coding potential in GeneMark Host-Trained.
/note=SD (Final) Score: -4.965, which is the highest (least negative) Final score. The Z-score is <2 (1.843) but is the best value listed on PECAAN /note=Gap/overlap: 16 bp gap, which is a bit large, but still the smallest value on PECAAN, with other start sites having gaps from 133-217 bp. /note=Phamerator: The Pham number is 199405 and as of 1/10/25 there is only 1 member in this Pham, which is not a draft (no other phages in the Pham). /note=Starterator: As of 1/10/25, there is no available report (possibly because the phage is a singleton and the gene is an orpham). /note=Location call: Based on Glimmer and GeneMark agreeing on the start, high coding potential in GeneMark Self-Trained, as well as having the best SD, Z-Score, and gap size in PECAAN (nothing informative from Phamerator/Starterator), this is a real gene that most likely starts at 43441 with starting codon ATG. /note=Function call: NKF. Phagesdb BLAST and CDD had only one hit, with an insufficient e-value and an unknown function. HHPRED had multiple hits, none with sufficient e-values. NCBI BLAST had a few hits closer to a significant e-value, yet all are hypothetical proteins /note=Transmembrane domains: There are no predicted TMDs, so there is no evidence that this is a membrane protein /note=Secondary Annotator Name: Scheithauer, Julia /note=Secondary Annotator QC: The lack of coding potential is odd and the Z-scores/Final scores are also fairly poor, so maybe check with the professor because it might not be a real gene? Everything else looks good though /note=AS: I checked with the instructor; they said it was a real gene and agreed with my location call because it has the best (least negative) Final score and the highest Z-score out of all start sites listed!
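The shorthand /note=SSC line in each record carries the same numbers that the written notes quote (start, stop, gap, Z-score, Final score). The following is a minimal Python sketch of pulling those values out of one such note. The field layout is inferred only from the records shown here, so the regular expression, the name parse_note, and the returned keys are illustrative assumptions, not part of PECAAN.

```python
import re

# gp52 shorthand note, copied from the record above.
GP52_NOTE = (
    "SSC: 43441-43163 CP: no SCS: both ST: NI BLAST-Start: "
    "[hypothetical protein [Clostridia bacterium]],,NCBI, q17:s5 65.2174% 1.12843E-5 "
    "GAP: 16 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.843, -4.964641208403463, yes "
    "F: hypothetical protein"
)

# Assumed layout: "SSC: start-stop ... GAP: N bp gap ... RBS: name, name, z, final, ..."
NOTE_RE = re.compile(
    r"SSC:\s*(?P<start>\d+)-(?P<stop>\d+).*?"
    r"GAP:\s*(?P<gap>-?\d+) bp gap.*?"
    r"RBS:\s*[^,]+,[^,]+,\s*(?P<z>-?[\d.]+),\s*(?P<final>-?[\d.]+),"
)


def parse_note(note: str) -> dict:
    """Extract start, stop, gap, Z-score and Final score from one shorthand note."""
    m = NOTE_RE.search(note)
    if m is None:
        raise ValueError("note does not match the assumed layout")
    return {
        "start": int(m["start"]),
        "stop": int(m["stop"]),
        "gap": int(m["gap"]),
        "z_score": float(m["z"]),
        # The written notes quote the Final score to three decimals.
        "final_score": round(float(m["final"]), 3),
    }


print(parse_note(GP52_NOTE))
```

Parsed this way, the gp52 values can be compared directly with the figures quoted in its SD (Final) Score and Gap/overlap notes.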
CDS complement (43458 - 43901) /gene="53" /product="gp53" /function="hypothetical protein" /locus tag="Phroglets_53" /note=Original Glimmer call @bp 43901 has strength 7.16; Genemark calls start at 43811 /note=SSC: 43901-43458 CP: no SCS: both-gl ST: NI BLAST-Start: GAP: 4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.047, -3.1309088467834587, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Streva, Amanda /note=Auto-annotation: Glimmer Start and GeneMark Start did not agree on the start site. Glimmer called 43901 and GeneMark called 43811. The start codon for the Glimmer call is TTG and for the GeneMark call is ATG. /note=Coding Potential: High coding potential is found in the GeneMark Self-Trained reverse strand within the range from the GeneMark start site (43811) to the stop site (43458) but is better encompassed by the range from the Glimmer start (43901) to the stop site. There was no coding potential in GeneMark Host-Trained. /note=SD (Final) Score: -3.131, which is the highest Final Score and the highest Z score (3.047) in PECAAN /note=Gap/overlap: 4 bp gap, which is the smallest gap listed on PECAAN and is indicative of an operon /note=Phamerator: The Pham number is 199724 and as of 1/10/25 there is only 1 member in this Pham, which is not a draft (no other phages in the Pham). /note=Starterator: As of 1/10/25, there is no available report (possibly because the phage is a singleton and the gene is an orpham). /note=Location call: Based on Glimmer better encompassing the coding potential with the best SD, Z-Score, and gap size in PECAAN (nothing informative from Phamerator/Starterator), this is a real gene that most likely starts at 43901 with starting codon TTG. /note=Function call: NKF. NCBI BLAST and CDD have no hits. Phagesdb BLAST has many hits that call the function to be ribonucleotide reductase; however, none have significant e-values (ranging from 0.26-0.76).
HHPRED also had many hits with insignificant E-values (ranging from 28-110) /note=Transmembrane domains: There are no predicted TMDs, so there is no evidence that this is a membrane protein /note=Secondary Annotator Name: Scheithauer, Julia /note=Secondary Annotator QC: this looks good! /note=AS: Thank you! CDS complement (43906 - 44085) /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="Phroglets_54" /note=Original Glimmer call @bp 44085 has strength 7.69; Genemark calls start at 44085 /note=SSC: 44085-43906 CP: no SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_BRAD_49 [Arthrobacter phage Brad]],,NCBI, q3:s2 94.9153% 8.45141E-8 GAP: 75 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.558, -4.1397687364535365, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_BRAD_49 [Arthrobacter phage Brad]],,AXH43737,65.5738,8.45141E-8 SIF-HHPRED: SIF-Syn: /note=PECAAN Notes /note=Primary Annotator Name: Streva, Amanda /note=Auto-annotation: Both Glimmer and Genemark agree and call the start site to be at 44085 bp with start codon ATG /note=Coding Potential: High coding potential found in the GeneMark Self-Trained reverse strand from the agreed start site (44085) to the stop site (43906), with insignificant amounts of coding potential in the forward strand. There was no coding potential in GeneMark Host-Trained. /note=SD (Final) Score: -4.140, which is the best Final Score listed on PECAAN and has the highest Z-Score of 2.558 /note=Gap/overlap: 75 bp gap. This is a relatively large gap but is the smallest listed on PECAAN. It is not big enough to hold a missing protein-coding gene but could hold a promoter at the start of an operon. Also, both Glimmer and GeneMark call this start, and it uses the most common start codon (ATG) /note=Phamerator: The Pham number is 8113 and was run on 1/12/25. It is conserved and found in 6 other genes, which are all in cluster AV (Adat, Brad, Casserole, GurgleFerb, Jasmine, and Nellie).
None are drafts. /note=Starterator: Start site 4 was the most frequently manually annotated start (6 of the non-draft genes), yet it is not present in Phroglets’ track. Because of this, the auto-annotated start site 3 (44085) was kept: both Glimmer and GeneMark call 44085 as the start, and it has the best gap size and Final/Z-scores /note=Location call: Based on the above evidence, this is a real gene that most likely starts at 44085 /note=Function call: NKF. Phagesdb BLAST has multiple hits that call the gene “function unknown”, NCBI BLAST also has a few hits that call the function to be a hypothetical protein, HHPRED has multiple hits but none with significant e-values, and CDD has no hits. Because of this evidence, the function call of this gene is NKF /note=Transmembrane domains: There are no predicted TMDs by DeepTMHMM, therefore this is not a membrane protein /note=Secondary Annotator Name: Scheithauer, Julia /note=Secondary Annotator QC: this looks good! /note=AS: Thank you! CDS complement (44161 - 44301) /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="Phroglets_55" /note=Original Glimmer call @bp 44301 has strength 5.21 /note=SSC: 44301-44161 CP: no SCS: glimmer ST: NI BLAST-Start: [hypothetical protein [Candidatus Methylomirabilis sp.]],,NCBI, q3:s7 60.8696% 0.0225862 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.966, -2.6453814847322845, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Candidatus Methylomirabilis sp.]],,HEU5395328,35.5932,0.0225862 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Streva, Amanda /note=Auto-annotation: Glimmer Start calls the start site to be at 44301 with start codon ATG. GeneMark does not call any start. /note=Coding Potential: Small amounts of coding potential in GeneMark Self-Trained, lying between the Glimmer start site (44301) and the stop site (44161). There was no coding potential in GeneMark Host-Trained.
/note=SD (Final) Score: -2.645, which is the best Final Score listed on PECAAN with the highest Z-Score of 2.966 /note=Gap/overlap: 1 bp overlap (gap of -1), which is indicative of an operon and is the smallest gap size listed on PECAAN /note=Phamerator: The Pham number is 199583 and as of 1/11/25 there is only 1 member in this Pham, which is not a draft (no other phages in the Pham). /note=Starterator: As of 1/11/25, there is no available report (possibly because the phage is a singleton and the gene is an orpham). /note=Location call: Based on the Glimmer start site encompassing all coding potential and having the best SD, Z-Score, and gap size in PECAAN (nothing informative from Phamerator/Starterator), this is a real gene that most likely starts at 44301 with starting codon ATG /note=Function call: NKF. Both Phagesdb BLAST and CDD have no hits. HHPRED has multiple hits, none with significant e-values (they range from 1.7-17), and NCBI BLAST has one hit that calls the function to be a hypothetical protein /note=Transmembrane domains: There are no predicted TMDs, therefore this is not a membrane protein /note=Secondary Annotator Name: Scheithauer, Julia /note=Secondary Annotator QC: I would also check with the professors on whether this gene is real or not because of the lack of coding potential. /note=AS: I checked with the instructor. There is still a significant amount of coding potential and the gene should be considered real! CDS complement (44301 - 44753) /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="Phroglets_56" /note=Original Glimmer call @bp 44753 has strength 9.29; Genemark calls start at 44753 /note=SSC: 44753-44301 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Citricoccus sp. I39-566] ],,NCBI, q66:s4 56.0% 1.70311E-16 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.558, -3.838738740789555, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Citricoccus sp.
I39-566] ],,WP_309817961,59.5238,1.70311E-16 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Torosian, Isabella /note=Auto-annotation: Both Glimmer and GeneMark call the gene, and they agree on the start site at 44753 bp. /note=Coding Potential: No coding potential was found in Genemark Host. Genemark Self has high coding potential in the reverse strand. /note=SD (Final) Score: The final score is -3.839, and the z-score is 2.558. This start site is the only one that has a z-score higher than 2 and has the highest final score. Additionally, this start site produces the largest gene and has the shortest gap, which is most ideal. /note=Gap/overlap: There is a 1 bp overlap between this gene’s start site and the previous gene’s stop site. This is a very small overlap, which is ideal and may suggest the gene is part of an operon. /note=Phamerator: As of January 8, 2025, the Pham of this gene is 199654. This Pham has no other phages. /note=Starterator: As of January 8, 2025, there was no Starterator report. /note=Location call: Considering the evidence above, this gene is a real gene and has a start site at 44753 bp. Starterator can neither support nor contradict Glimmer and Genemark since there was no report. We cannot say the start site is conserved within its cluster because the phage is a singleton. Glimmer and Genemark agree on the start site, and the start site covers all of the coding potential in Genemark Self. /note=Function call: NKF. The top hits from PhagesDB BLAST had no known function. All of the top hits on NCBI are hypothetical proteins. CDD had no significant hits. On HHpred, all of the hits had high e-values (lowest was around 76); therefore, there were no significant hits found on HHpred. This evidence leads me to believe that the function of this gene is unknown. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein.
/note=Secondary Annotator Name: Grunski, Lilly /note=Secondary Annotator QC: All evidence was considered. The gene is most likely real and the start site is well placed. CDS complement (44753 - 44944) /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="Phroglets_57" /note=Original Glimmer call @bp 44944 has strength 4.33; Genemark calls start at 44944 /note=SSC: 44944-44753 CP: yes SCS: both ST: SS BLAST-Start: GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.371, -3.935403931688939, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Torosian, Isabella /note=Auto-annotation: Both Glimmer and GeneMark call the gene, and they agree on the start site at 44944 bp. /note=Coding Potential: No coding potential was found in Genemark Host. Genemark Self has high coding potential in the reverse strand, and the chosen start site covers all of the coding potential. /note=SD (Final) Score: The final score is -3.935, and the z-score is 2.371. This start site produces the highest final score and is the only one with a z-score higher than 2. Additionally, this start site produces the largest gene and has the shortest gap (1 bp), which is most ideal. /note=Gap/overlap: There is a 1 bp gap between this gene’s start site and the previous gene’s stop site. This is normal as the density of phage genomes is very high, so genes tend to be tightly packed. /note=Phamerator: As of January 10, 2025, the Pham of this gene is 199235. This Pham has no other phages. /note=Starterator: As of January 10, 2025, there was no Starterator report. /note=Location call: Considering the evidence above, this gene is a real gene and has a start site at 44944 bp. Starterator can neither support nor contradict Glimmer and Genemark since there was no report. We cannot say the start site is conserved within its cluster because the phage is a singleton.
Glimmer and Genemark agree on the start site, and the start site covers all of the coding potential in Genemark Self. /note=Function call: NKF. There were no significant hits on PhagesDB (scores were too low and no hits had e-values less than 10^-6; the lowest e-value was 0.12). NCBI BLAST found no significant similarity. CDD had no significant hits. Although HHpred returned many hits, none were significant (all e-values were above 0.0065). Thus, no function can be assigned to this ORF. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Grunski, Lilly /note=Secondary Annotator QC: The start call and function call appear to be correct as all available evidence was considered. When the Phamerator is published, this should be consulted. /note=IT: Phamerator is still not published. CDS complement (44946 - 45224) /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="Phroglets_58" /note=Genemark calls start at 45224 /note=SSC: 45224-44946 CP: yes SCS: genemark ST: SS BLAST-Start: GAP: 1030 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.967, -5.535726432718978, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Torosian, Isabella /note=Auto-annotation: Glimmer does not call the gene, which raises doubts about whether this is a real gene. Genemark predicts that the start site is 45224. /note=Coding Potential: No coding potential was found in Genemark Host. Genemark Self has high coding potential in the reverse strand within the ORF, and the chosen start site covers all of the coding potential. There is some overlapping coding potential on the forward strand, but it spans only around 100 bp, which makes a real gene on the forward strand unlikely. There is a large 89 bp overlap in the reverse strand with the upstream gene. /note=SD (Final) Score: The chosen start site has a z-score of 1.967 and a final score of -5.536.
This start site produces a very large overlap (-89). There is another start site (45134) with a better z-score (2.527) and a better final score (-3.551), as well as a much smaller gap (1); however, this start site does not cover all of the coding potential. /note=Gap/overlap: The original overlap was 89 bp; however, the gene upstream was found not to be a real gene. The new gap is 1030 bp, a substantial gap that may suggest that genes need to be added. /note=Phamerator: As of January 10, 2025, the Pham number of this gene is 199421. Phroglets is the only member of the Pham; therefore, it cannot be assessed whether this gene is conserved. /note=Starterator: As of January 10, 2025, there was no Starterator report. /note=Location call: This gene is a real gene, and the start site is 45224. /note=Function call: NKF. PhagesDB BLASTp had no significant hits. No significant results were found on NCBI BLAST. CDD had no hits. HHpred has some functional hits; however, the e-values are high (lowest value is 0.83), and the probabilities are low. This indicates that HHpred has no significant hits. Combining all the evidence, we cannot confidently claim the function of this gene. /note=Transmembrane domains: DeepTMHMM predicts no transmembrane regions in this gene’s protein sequence (TMRs=0). /note=Secondary Annotator Name: Grunski, Lilly /note=Secondary Annotator QC: The start site and function calls made seem correct as they considered all available evidence. But it seems too early to rule out start 4 @ 45134 because it has a much better z-score and final score; the coding potential seems to be high in half of this region and pretty bad in the last half of the gene. The selected start 1 does include more of the coding potential, but it runs into the same problem as start 4 in that there is significantly less coding potential in the later part of the gene.
/note=IT: After consulting the professor, it was decided that the chosen start site was the best option as it covers all of the coding potential. CDS complement (46255 - 46977) /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="Phroglets_59" /note=Original Glimmer call @bp 46977 has strength 8.42; Genemark calls start at 46977 /note=SSC: 46977-46255 CP: yes SCS: both ST: SS BLAST-Start: GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.997, -2.644643059745525, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Torosian, Isabella /note=Auto-annotation: Both Glimmer and GeneMark call the gene, and they agree on the start site at 46977 bp. /note=Coding Potential: Genemark Host displays some coding potential in the ORF, but the peaks are discontinuous, with many gaps. Genemark Self displays a large volume of coding potential in the reverse strand of the ORF. There are some peaks of coding potential in the forward strand; however, they are not significant enough to be considered real genes. /note=SD (Final) Score: The start site has a z-score of 2.997 and a final score of -2.645. This start site has the best z-score and final score out of all the start site options. /note=Gap/overlap: As of January 10, 2025, PECAAN displayed no information on gap/overlap. /note=Phamerator: As of January 10, 2025, the Pham number of this gene is 199094. Phroglets is the only member of the Pham; therefore, it cannot be assessed whether this gene is conserved. /note=Starterator: This analysis was run on 01/08/25. Start site 3 in Starterator was found in 2/2 of the draft genes in this pham (the pham contains no non-draft phages). No manual annotations. Start 3 is 46977 in Phroglets. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering the evidence above, this gene is a real gene and has a start site at 46977 bp.
Glimmer and Genemark agree on the start site, and the start site covers all of the coding potential in Genemark Self. Starterator agrees with Glimmer and Genemark. We cannot say the start site is conserved within its cluster because the phage is a singleton. /note=Function call: NKF. PhagesDB had a couple of hits with function calls; however, none were significant (high e-values > 0.005). NCBI only produced one hit, which was not significant (e-value was 0.022). CDD had no significant hits. HHpred also produced some functional calls, but none of these hits were significant (the lowest e-value was 1.2). Given this evidence, I cannot confidently claim the function of this gene. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, therefore it is not a membrane protein. /note=Secondary Annotator Name: Grunski, Lilly /note=Secondary Annotator QC: The function call seems correct as all available evidence was consulted. Since neither synteny maps nor a non-draft, manually annotated start site exist, the start site should be placed with caution and is subject to change if new evidence arises. The gene does seem real as there is significant coding potential in the self-trained Genemark, but it is strange that the host-trained Genemark gives no data.
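The Gap/overlap values quoted for genes 51 through 58 can be checked from the CDS coordinates alone. The Python sketch below assumes that, for these complement-strand genes, the gap is measured from a gene's start (its right-hand coordinate) to the left-hand coordinate of the next feature at higher coordinates, minus one, with negative values meaning shared bases. This convention and the name gaps_to_neighbor are assumptions chosen because they reproduce the GAP fields in the records above; they are not taken from PECAAN documentation.

```python
# (gene number, left coordinate, right coordinate), copied from the
# "CDS complement (left - right)" lines of genes 51-59 above.
CDS = [
    (51, 42831, 43163),
    (52, 43163, 43441),
    (53, 43458, 43901),
    (54, 43906, 44085),
    (55, 44161, 44301),
    (56, 44301, 44753),
    (57, 44753, 44944),
    (58, 44946, 45224),
    (59, 46255, 46977),
]


def gaps_to_neighbor(features):
    """Gap in bp between each gene's start and the next feature at higher coordinates.

    Assumed convention: gap = neighbor_left - own_right - 1.
    A negative result means the two features share that many bases.
    """
    gaps = {}
    for (gene, _left, right), (_next_gene, next_left, _next_right) in zip(features, features[1:]):
        gaps[gene] = next_left - right - 1
    return gaps


for gene, gap in gaps_to_neighbor(CDS).items():
    label = "overlap" if gap < 0 else "gap"
    print(f"gene {gene}: {abs(gap)} bp {label}")
```

Under this convention the computed values match the recorded GAP fields, including the 1030 bp gap for gene 58. Gene 59 is omitted from the output because its higher-coordinate neighbor is not part of this list.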