CDS 74 - 427 /note=First start may not have the best Z-score (1.802) and final score (-5.132), but it is the longest ORF and closes a significant gap. (Glimmer/GeneMark agreement at 74) /note=Synteny with Khuang_Draft for this gene. /note= /note=Coding potential on GeneMark is high and begins at about 100 bp. /note=Monomer, Dimer, and Trimer /note=Function could be Hypothetical, Terminase, or Helix Turn Helix, but the score is not high enough to find a definitive function CDS 390 - 1988 /note=This gene has a start site at 390 as called by Glimmer and GeneMark. It is not the longest ORF and it has a z-score of 2.618 and final score of -3.329. It has a 110 overlap. It includes all coding potential as evidenced by GeneMark. There is synteny with Khuang for this gene. /note= /note=Monomer. CDS 1985 - 3505 /note=This gene has a strong coding potential with close blast results with phage Khuang: e-score of 1.209e+04. 94%. Both Glimmer and GeneMark call the start at 1,979. While it doesn't have the highest z-score (1.159) and final score (-6.431), it has the longest ORF and an 11 bp overlap. /note= /note=There is synteny with Khuang /note= /note=monomer, dimer /note= /note=There is a strong probability of it being a portal protein due to the HHpred data. CDS 3505 - 4272 /note=768 bp long gene with major coding potential predicted by GeneMarkS and GeneMark w/ Glimmer; the call is reported as "Original Glimmer call @bp 3505 has strength 15.05." Both local and library BLASTp state capsid maturation protease and prohead protease with HK family, with e-values of 1e-54 and 9e-168, respectively. Starterator states that bp 3505 has 126 MA's, and there is only a 1 bp overlap upstream; the RBS sequence has a Z-score of 2.753 and a final score of -3.138. Phamerator conserved domains state COG3740 with a phage head maturation similarity and Putative_prohead_protease, which processes capsid protein for T4 phage, as a similarity. 
HHpred data states 5JBL_D as a similarity, which has a Prohead core protein protease pentamer/Hydrolase with a probability value of 97.72, but an e-value of 0.0077. Foldseek also states a similarity match with Klebsiella pneumoniae subsp. with Phage prohead protease of the HK97 family. /note= /note=monomer CDS 4290 - 5516 /note=includes all strong coding potential, called by Glimmer and GeneMark, 3.144 z-score and final score of -3.258, high function call with HHPRED data. monomer PTM is >0.5 but homodimer and homotrimer are not above 0.8. CDS 5584 - 5841 /note=This gene has a start site at 5584 (as called by Glimmer and GeneMark). It is the longest ORF and it has a z-score of 3.098 and final score of -2.334. It includes all coding potential as evidenced by GeneMark. Its closest blast match is with Khuang (e-value of e-101 and coverage of 93%), and it also has synteny with this phage. NCBI called a 63.41% alignment with a hypothetical protein, and BLAST found this gene similar to many other genes with unknown functions; insufficient evidence for functional call. PTM >0.5 for monomer; homodimer and homotrimer iPTM are not above 0.8 CDS 5860 - 6384 /note=This gene likely starts at this site (good Z score of 2.504, lowest final score of -3.860, LORF, smallest gap, good length). It includes all coding potential and is called by Glimmer and GeneMark. PhagesDB points to a function of "Head completion protein/Adaptor protein" with high coverage and probability, making this the likely function. Confident function call with HHPRED data. PTM > 0.5 for monomer and trimer; homodimer and homotrimer iPTM are not above 0.8. CDS 6384 - 6734 /note=It is likely that this gene starts at 6384 because of LORF, Z-score of 2.349, final score of -3.837 (not great, but okay with the other aspects), and small gap length. Overlap of 1. Includes all coding potential and is called by both GeneMark and Glimmer. 
Phagesdb function frequency suggests this gene to be a head-to-tail stopper/connector protein, as does HHPRED. NCBI on the other hand states function as unknown/potential gene. PTM >0.5 for monomer and dimer. Homodimer and homotrimer iPTM are not above 0.8. CDS 6734 - 7009 /note=This is a gene called by Glimmer and GeneMark. It has a start site of 6,734 with a Z score of 2.248 and a final score of -4.190. It has the longest ORF with overlap of -1. The function is a minor capsid protein. The PTM score is 0.82. IPTM score with downstream gene is greater than .80. CDS 7002 - 7427 /note=This is a gene called by Glimmer and GeneMark. It has a start site of 7002 with a Z score of 2.747 and a final score of -3.057. It has the longest ORF. The function is a tail terminator. The monomer PTM score is 0.91. IPTM when monomer folded with upstream is >.80. CDS 7485 - 8312 /note=This is a gene called by Glimmer and GeneMark. It has a start site of 7485 with a Z score of 2.382 and a final score of -4.419. It has the longest ORF. The function is a minor tail protein. The PTM score is 0.72. CDS 8409 - 8735 /note=This is a gene called by Glimmer and GeneMark. It has a start site of 8409 with a Z score of 3.008 and a final score of -2.505. It has the longest ORF. The function is a tail assembly chaperone. The PTM score is 0.77. When monomer is folded with downstream gene IPTM is >.80. CDS join(8409..8708,8708..9115) /note=Larger version of TAC due to frameshift /note=Glimmer and GeneMark call start at 8828 with strength of 5.64 /note=Z-score and final score are the best for this start site. /note=When monomer is folded with upstream gene IPTM is >.80. CDS 9122 - 12901 /note=The length is 3780 bp. The closest matches were phages Rootbeer, Kovu, and Sonali with an e-value of 0.0 in the same pham. It does not have the most annotated start, and Glimmer (9122 start) and GeneMark (9200 start) don't agree. Tape measure protein; z-value is 2.862 and final score is -3.116. 
I believe that the start is 9122 as it has the strongest evidence: no gap on either side and high coding potential with some spikes. CDS 12901 - 13848 /note=Start site supported with coding potential, good z-score and final score, and has a -1 overlap. Function is minor tail protein as supported on PhagesDB, HHPred, and NCBI. /note=Alphafold PTM scores for Monomer/Dimer/Trimer > 0.5 CDS 13849 - 14961 /note=This gene is a forward gene with a stop site of 14961. Although this gene is long, it is a structural gene, which aligns with the function of being a tail protein. There is no gap upstream, which is reasonable because the gene promoters are not switching gene strands. The z score is 2.365 while the final score is -3.865; these scores fit within the range, further confirming that this is a gene and the start site is at 13849. /note=Alphafold PTM scores for Monomer/Dimer/Trimer > 0.8 CDS 14961 - 15839 /note=Suggested start aligns with prior gene, and although it may not have the best Z Score, both Glimmer and GeneMark agree on this start placement. /note=Has spotty coding potential in some areas, but for the most part it is quite high throughout the gene. /note= /note=According to PhagesDB, this is likely to be a minor tail protein. According to BLAST, annotated phages that have similar genes list it as a Minor Tail Protein as well. 
/note= /note=- coding potential on Genemark & DNAMaster calls it with both Glimmer and GM /note=- highest z-score @ 15057 but other starts cause gaps that are too wide to be acceptable /note=- z-score: 1.603 final score: -5.474 -1 gap /note=- all Phagesdb Function Frequency calls it a *minor tail protein* /note=- Phagesdb BLAST: insufficient evidence for functional call /note=- HHPRED: 99.2 probability /note=- NCBI BLAST: insufficient evidence for functional call /note=- PTM >0.5 monomer, dimer, & trimer /note=- homodimer iPTM >0.8 /note=- homotrimer iPTM >0.8 CDS 15836 - 16270 /note=High coding potential according to Genemark; Glimmer and Genemark disagree on the start site. /note=Not enough evidence to call a function yet, as all similar phages have unknown functions. /note= /note=- coding potential on Genemark /note=- Glimmer and GM have different calls: Genemark: 15842 Glimmer: 15836 /note=- highest z-score @ 15989 but Start 15836 is most likely because it perfectly aligns with the stop of prior gene, is the Longest ORF, and has a marginally better Final Score than the Start at 15842 /note=- z-score: 1.881 final score: -4.905 -4 gap /note=- Phagesdb Function Frequency lists several different assigned functions /note=- Phagesdb BLAST: insufficient evidence for functional call /note=- HHPRED: insufficient evidence for functional call /note=- NCBI BLAST: insufficient evidence for functional call CDS 16267 - 16602 /note=Perfectly aligns with previous gene (No overlap, or gap). No known functions at this time, not enough evidence to suggest a specific purpose. /note= /note=Not enough evidence to call a function yet, as all similar phages have unknown functions. 
/note= /note=- coding potential on Genemark /note=- Glimmer and GM have different calls /note=- amazing Z Score for Longest ORF /note=- z-score: 3.008 final score: -2.523 -4 gap /note=- Phagesdb Function Frequency: insufficient evidence for functional call /note=- Phagesdb BLAST: insufficient evidence for functional call (only drafts available in same cluster) /note=- HHPRED: 93.7 probability 96.3964 coverage /note=- NCBI BLAST: insufficient evidence for functional call /note=- PTM >0.5 monomer, dimer, & trimer CDS 16612 - 17556 /note=Most likely to be endolysin, as not capable of being Lysin A, and all annotated genomes list it as such. /note= /note=- coding potential on Genemark /note=- Glimmer and GM have different calls: Glimmer: 16612 Genemark: 16621 /note=- Start site is likely at 16612 because it leaves no gaps with prior gene, creates the longest ORF, and has a slightly better final score than other options. /note=- z-score: 2.238 final score: -4.151 9 gap /note=- Phagesdb Function Frequency: calls for either endolysin or lysin A /note=- Phagesdb BLAST: insufficient evidence for functional call /note=- HHPRED: 97.2 probability 50.3185 coverage /note=- NCBI BLAST: 89.8089 identity 93.949 aligned /note=- PTM >0.5 monomer only CDS 17562 - 17909 /note=Glimmer and Genemark agree on this start position, and it shows high coding potential in Genemark. Small gap with prior gene, but that is to be expected as the other start sites have low Z-Scores/too much overlap. Due to being adjacent to a potential Endolysin, it is highly probable this gene is a Holin, or Membrane protein. For now, I am listing it as a Holin since I have no other evidence to suggest it may be a membrane protein. Alphafold shows PTM > 0.5 for monomer/trimer. CDS 17919 - 18521 /note=Start site at 17919 is by far the best choice, due to Glimmer and Genemark agreement, having the longest ORF, and the best Z Score+Final Score. High coding potential in selected area. 
Unfortunately, function cannot be determined at this time, due to insufficient evidence. Alphafold shows PTM > 0.5 for monomer. CDS 18535 - 18786 /note=Z Value and Final score are great, but doesn't have great coding potential. Additionally, does not closely match any other genes from other phages. /note=Evidence suggesting it may NOT be a gene: Low coding potential towards end of gene /note=Evidence suggesting it may be a gene: Great Z Score, absence of gene would leave sizable gap that could not be filled in genome. CDS 18787 - 20274 /note=Gene with a stop site of 20274 has a start site of 18787. Although it doesn't have the best Z score, it minimizes the gap (0 bp gap). Both Glimmer and GeneMark agreed on this site and it is the longest ORF. There was strong coding potential for this region; no Starterator data because it is an orpham. Alphafold shows PTM with ATP > 0.5 when monomer alone < 0.5 CDS complement (20275 - 21036) /note=HHPRED coverage insufficient to make any call. /note= /note=This gene has strong coding potential and includes all of it from the second start and on. There were no significant blast results. Glimmer (21,036) and GeneMark (21,069) disagreed on the start. /note= /note=Gene with a start site of 21,036. Based on the RBS z-score and final score (2.106 and -4.429) in comparison to the start called by GeneMark (21,069), it can be deduced that the second start is more likely. /note= /note=This gene is an Orpham and Alphafold data shows a PTM > 0.05 for the monomer, and dimer. CDS 21094 - 21246 /note=To start, there are no significant blast hits for this gene. As far as coding potential goes, there seems to be strong potential in this range, however there is no bottom black line indicating a gene that has been called by GeneMark. There is no synteny in other phages. No Starterator data (orpham). /note= /note=As far as start sites: GeneMark: no call, Glimmer: 21094. 
It cannot be the first start site (21049) because a 50 bp gap is needed when gene promoters switch DNA strands between forward and reverse genes. /note= /note=Because the coding potential is lower at the earliest starts, the call had to rely on the RBS data. Given the strong need for a promoter, the base pair gap is needed. /note= /note=Deleting this gene then creates a gap of 270 bp which isn't ideal. There is coding potential in between this gap. /note= /note=According to Alphafold, PTM > 0.05 for monomer. CDS 21307 - 21687 /note=start site altered. /note= /note=Gene displays strong coding potential including the start site of 21307. Based on the very similar data below, looking at z-score is the next step. The most likely starts based on z-score and final score are 21307 or 21331. Start 21307 best minimizes the gap. This is the highest z-score of 1.901 and best overall score of -4.483. /note= /note=After re-blasting, with the start site of 21289, the blast results are similar to start site of 21331. (Khuang 88% e-120; Constance 2e-49, 85% ) /note=After re-blasting, with the start site of 21292, the blast results are same as 21289. /note=After re-blasting, with the start site of 21307, the blast results slightly increase. (Khuang 89% e-117; Constance 2e-49, 85% ) /note=After re-blasting, with the start site of 21322, the blast results are (Khuang 88% e-120; Constance 2e-49, 85% ) /note=After re-blasting, with the start site of 21331, the blast results are (Khuang 88% e-120; Constance 2e-49, 85% ). /note= /note=According to Alphafold, PTM > 0.05 for monomer CDS complement (21991 - 22209) /note=Short, and isolated gene, with large gap upstream and downstream; gene length of 219. Coding potential is good for most of the gene, but on both GeneMarkS and GeneMark with Glimmer, it has a premature end to the coding potential; the image has about 95% of the coding potential before ending. 
Local and library BLASTp have a top match of Phrank15_37 with a function of helix-turn-helix DNA binding protein with an e-value of 4e-26 and 1e-32, respectively. The gene has the longest ORF at -2, and the RBS sequence has a Z-score of 2.039 and Final Score of -5.333. HHpred data has no match above 90%, but the top match is 3RMR_A, named Avirulence protein; effector, RPP1-recognized, alpha-helical, W-motif, seahorse, virulence, RPP1, R-protein, PROTEIN BIN, with a probability value of 85.29, an e-value of 2.2, a score of 29.59, a SS of 3.6, an aligned cols of 42, and target length of 260. FoldSeek has a BFVD match with Arthrobacter phage TripleJ with a probability value of 0.00, Seq. Id. of 50, and an e-value of 3.14e-4. /note= /note=According to Alphafold, PTM > 0.05 for monomer CDS complement (22299 - 22433) /note=Gene added. Not called by Glimmer or GM. Weak CP but this has a 0 bp gap with upstream gene that does have moderate CP and fills a gap. Z-score is poor but 0 bp gap with upstream gene. Insufficient evidence for functional call. Need to run AlphaFold on this added gene. CDS complement (22716 - 22805) /note=Short but fills a gap. Not called by Glimmer or GM. There is a spike of CP in the middle. While both BLAST and HHPRED hits are weak, the fact that there are some phagesdb BLAST hits to other small ORFs supports keeping this gene. CDS complement (22798 - 23118) /note=Gene length is 321 bp, with 4 bp overlap downstream with gene 30. Coding Potential is around the 0.9 to 1.0 range. Local and Library BLASTp reveal unknown functions for the protein, matching with Kumotta_48 (e-value of 0.003), and Arthrobacter sp. StoSoilB13 (e-value of 2e-19). Glimmer and GeneMark agree with the start; (Original Glimmer call @bp 23118 has strength 7.45); the gene has the longest ORF (-2); and the RBS sequence has a Z-Score of 2.919 and Final Score of -2.725. 
HHpred data has only one hit (at 16.01% rather than above 90%) named 2LXE_A, with a Histone-lysine N-methyltransferase SUVR4 Ubiquitin binding domain function, with an e-value of 650. Foldseek's Top Hit was an unclassified Caudoviricetes with a probability value of 0.00, an e-value of 5.64e-3. /note= /note=PTM > 0.05 for monomer and dimer CDS complement (23115 - 23255) /note=Short gene of length 141 bp; slight overlap of 4 bp upstream to gene 29. Coding Potential is fairly good; on GeneMarkS, it is nearly maximized, though in GeneMark w/ Glimmer, there is a split in the coding potential. Both Local and Library BLASTp state Arthrobacter phage Maja as the closest known (non-draft) phage, with an e-value of 3e-09 and 2e-07, respectively. Original Glimmer call @bp 23255 has strength 4.26 but not called by GeneMark. Starterator report has 6 MA's for this start; the gene also has the longest ORF. The RBS sequence has a good Z-Score of 3.246 and Final Score of -1.954. HHpred data states a relatively close match for the protein with a DNA-directed RNA polymerase II, IV, and V subunit 12, with a probability value of 86.95%, but the e-value is 0.44. /note= /note=According to Alphafold, the PTM ATP and PTM NAD are > 0.08 when the monomer is < 0.05. /note=PTM of monomer is 0.46 and 0.59 and 0.62 for ATP and NAD respectively. CDS complement (23272 - 23676) /note=405 bp long gene; Coding Potential is fairly good with some dips in the Glimmer GeneMark pair. For both the Local and Library BLASTp, EvePickles_45 was the top hit, with unknown function, and with a respective e-value of 5e-46 and 1e-50. Glimmer and GeneMark call start at @bp 23676 with a strength of 7.48; Starterator has 3 MA's at 23676. There is a 4 bp overlap downstream with gene 32. The gene has the longest ORF at -2; and the RBS sequence has a Z-Score of 2.763 and Final Score of -3.115. 
HHpred data has no hit above 90%; the top hit was a mitochondrial ribosome Assembly intermediate for translation, which has a probability value of 39.09 and e-value of 8.6. Foldseek has a top hit in the BFVD database–unclassified Tinytimothyvirus, with a probability value of 0.00, a Seq. Id. of 13.5, and an e-value of 5.61e-1. /note= /note=PTM > 0.05 for monomer. CDS complement (23673 - 24188) /note=This is likely a gene due to the coding potential and overlap of 4 bp with the downstream gene. Glimmer and genemark agree on the start. Data not strong enough to call a function. /note= /note=PTM > 0.05 for dimer. CDS complement (24185 - 24379) /note=Gene evidence: /note=- Glimmer and Genemark both call it as a gene /note=- Genemark shows coding potential /note= /note=Start Evidence: /note=- Z-score: 1.879 /note=- Final score: -4.977 /note=- Overlap of 4 bp with the downstream gene /note= /note=Functional call: /note=- Insufficient evidence for functional call /note= /note=Alphafold: /note=PTM >0.5 monomer CDS complement (24376 - 24819) /note=Gene evidence: /note=- Glimmer/Genemark calls /note=- Coding potential /note= /note=Start Evidence: /note=- z-score: 2.085 /note=- Final score: -4.534 /note=- 4 bp overlap /note= /note=Functional call: /note=- insufficient evidence for function call /note= /note=Alphafold: /note=PTM >0.5 CDS complement (24816 - 25235) /note=Gene evidence: /note=- Genemark/Glimmer call gene /note=- Shows coding potential /note= /note=Start evidence: /note=- Z-score: 2.991 /note=- Final score: -2.620 /note=- 4 bp overlap /note= /note=Functional call: /note=- insufficient evidence for functional call /note= /note=Alphafold: /note=- PTM >0.5 monomer/dimer CDS complement (25232 - 25423) /note=Gene evidence /note=- Glimmer/Genemark calls /note=- coding potential /note= /note=Start site: /note=-4 bp overlap and /note=- z-score (2.618) and final score (-3.408) are much better than if the start site was 25408 (as called by Genemark) /note= /note=Functional 
call: /note=- insufficient evidence for functional call CDS complement (25420 - 25752) /note=Glimmer and GeneMark both agree on the start site. /note=A small amount of coding potential lies outside of the called start and stop sites. /note=Z-score and final score are not ideal, but there is a -4 bp overlap. /note=There is insufficient evidence for a functional call. /note=AlphaFold: Monomer PTM > 0.5, homodimer iPTM < 0.8, homotrimer iPTM < 0.8 CDS complement (25749 - 25895) /note=Glimmer and GeneMark agree on start site. /note=There is a good z-score and final score, as well as a -4 bp gap. /note=There is insufficient evidence for a functional call. /note=AlphaFold: Monomer PTM < 0.5, homodimer iPTM < 0.8, homotrimer iPTM < 0.8 /note=PTM w/ ATP > 0.5 while monomer alone PTM <0.5 CDS complement (25892 - 26248) /note=Glimmer and GeneMark agree on start site. /note=There is a small amount of coding potential before the called start site. /note=There is a good z-score and final score as well as a -1 bp gap. /note=There is insufficient evidence for a functional call. /note=AlphaFold: Monomer PTM >0.5, homodimer iPTM < 0.8, homotrimer iPTM < 0.8 CDS complement (26248 - 26433) /note=Glimmer and GeneMark agree on the start site. /note=There is strong coding potential. /note=There is a good z-score and final score, as well as a -4 bp gap. /note=There is insufficient evidence for a functional call. /note=AlphaFold: Monomer PTM > 0.5, homodimer iPTM > 0.8, homotrimer iPTM < 0.8. CDS complement (26430 - 26630) /note=I assign hypothetical protein because the probability/coverage is not great and Khuang_Draft also has an unknown function. /note=NEEDS ADDITIONAL REVIEW - /note=This is a gene because of Glimmer/GeneMark start agreement and coding potential. However, it does not have the most annotated start and it doesn't have the longest ORF. Although the Z-score and final score are good, the potential start (longest ORF) also has a similar Z-score and final score. 
So start could be 26633 instead of 26630. /note=-Insufficient Evidence for Functional Call /note=-AlphaFold Data: Monomer and Dimer PTM > 0.5 CDS complement (26630 - 27097) /note=The evidence is not strong enough to call a function (channel protein only has 90% probability and 32 coverage) /note=NEEDS ADDITIONAL REVIEW - /note=This is a gene because of Glimmer/GeneMark start agreement and coding potential. However, similar to gene 41, this called start doesn't have the longest ORF, and the potential start with the longest ORF also has a good Z score but a slightly lower final score. /note=-Insufficient evidence for functional call /note=AlphaFold: /note=-Monomer PTM > 0.5 CDS complement (27227 - 28384) /note=This is a gene because of Glimmer/GeneMark start agreement, coding potential, longest ORF, and highest Z-score/lowest final score. Added HHPRED Evidence Accession: 6EMY_B Description: Int protein; transposase protein-DNA complex, tyrosine recombinase, Y-transposase, Tn916-like conjugative transposon, antibiotic resistance transfer, RECOMBINATION; 2.5A {Enterococcus faecalis} Query Range: (76-373) Target Range: (1-306) 99.9% probability with 77.6% coverage /note=AlphaFold: /note=-Monomer and Trimer PTM > 0.5 CDS complement (28385 - 28789) /note=This is a gene. The start is 28789 because it includes more coding potential and has a z-value of 2.774 and a final score of -3.860. There is not strong enough data to classify a function. I called the longer start site because it would allow for gene 45 to be removed because it is a forward gene in the middle of reverse genes. The closest blast match is with Khuang_Draft_46 with an e-value of 5e-73 with 96% coverage. /note= /note=AlphaFold: /note=-Monomer and Dimer PTM > 0.5 CDS complement (28770 - 29156) /note=This is a gene. It has a long enough bp length and some synteny. 
Most of the coding potential is there and has a blast match with 90% coverage. The gaps present are needed for promoters. The z-score is 1.96 and the final score is -5.709 /note=Immunity repressor identified, along with integrase. Position of this HTH DNA binding domain protein supports CRO functional call. /note=Monomer PTM > 0.5 CDS 29222 - 29503 /note=I believe this is a gene since it has sufficient base pairs (>120bp), has coding potential, /note=and has synteny with gene 48 on Khuang_Draft in Phamerator. I believe it starts at the site agreed upon by Glimmer and GeneMark, since it includes all coding potential, and has the longest ORF. I also don’t believe there is enough data to assign a function since the NCBI Blast data has low percentages, PhagesDB blast e-values are too high to see anything, and HHpred functions do not come up for this gene. /note= /note=Monomer PTM > 0.5 /note= /note=Monomer PTM < Monomer with ATP PTM /note= /note=Monomer with ATP iPTM > 0.8 CDS 29569 - 29766 /note=I believe this is a gene since it has sufficient base pairs (>120bp), has coding potential, /note=and has synteny with gene 49 on Khuang_Draft in Phamerator. However, I don’t believe it starts at the site Glimmer calls, since there is a gap of 119 bp which could be brought down, the start codon of ATG is more likely, and the z-value for the start called by GeneMark at 29569 is better. I picked an unknown function since the PhagesDB data points to it, as well as the low percentages in both HHpred and NCBI Blast. /note= /note=Monomer with NAD PTM> 0.5 CDS 29763 - 29885 /note=I believe this is a gene since it has sufficient base pairs (>120bp), includes all coding potential, /note=has synteny with gene 50 on Khuang_Draft in Phamerator. I believe it starts at the site agreed upon by Glimmer and GeneMark, since it lines up in Starterator, has a common start of GTG (26%), has a decent Z-score (1.529) and final score (-5.650), AND it has a 4 base pair overlap. 
I also don’t believe there is enough data to assign a function since the HHpred data has low percentages, and blast claims unknown function /note= /note=Monomer PTM > 0.5 /note= /note=Monomer with ATP PTM score is greater than the Monomer PTM score /note= /note=Monomer with NAD PTM score is greater than the Monomer PTM score CDS 29882 - 30517 /note=I believe this is a gene since it has sufficient base pairs (>120bp), includes all coding potential, matches with an E-value of 5e-68 with Hum25_48, and has synteny with gene 53 on Khuang_Draft in Phamerator. I believe it starts at the site agreed upon by Glimmer and GeneMark since there is a 4 base pair overlap, Z-score of 2.14, and a final score of -4.629. The starterator is not informative. I believe it is an erf family DNA binding protein because that lines up with Hum25 and has a high probability and coverage on HHpred. Alphafold data shows PTM>0.5 for monomer and trimer. Homodimer and homotrimer iPTM<0.8. PTM monomer=PTM monomer ATP=PTM monomer NAD, all greater than 0.5. 
CDS 30510 - 31055 /note=Gene evidence: /note=- Glimmer and GeneMark agree on a start /note=- Includes coding potential /note=Start evidence: /note=- Alignment on starterator /note=- Z-score of 2.679, final score of -4.445 /note=- 8 base pair overlap /note=Functional call: /note=- insufficient evidence for functional call /note=Alphafold: /note=- Dimer/trimer PTM >0.5 /note=- Dimer/trimer iPTM<0.8 /note=- Monomer ATP PTM >0.5 when monomer PTM <0.5 /note=- Monomer NAD PTM >0.5 when monomer PTM <0.5 CDS 31059 - 31415 /note=Gene Evidence: /note=- Same Glimmer and GeneMark start called /note=- All coding potential /note=Start Evidence: /note=- Alignment on starterator /note=- Low Z-score 0.8, final score -7.251 /note=Functional Call Evidence: /note=- Insufficient evidence for functional call /note=Alphafold: /note=- Monomer, dimer, trimer PTM > 0.5 /note=- Trimer iPTM > 0.8 CDS 31416 - 31790 /note=Gene Evidence: /note=- Glimmer and GeneMark align on a start /note=- All coding potential /note=Start Evidence: /note=- Alignment on starterator /note=- Z-score of 2.6, final score of -3.368 /note=- No gap or overlap /note=Functional Call Evidence: /note=- Evidence for functional call /note=AlphaFold: /note=- Monomer/dimer PTM > 0.5 CDS 31787 - 32077 /note=The length of this gene is 279 base pairs and it has high coding potential according to GeneMark. It includes all coding potential and has the LORF. Glimmer and GeneMark call different start sites. Starterator data indicated that the start site of 31799 was the most called start site; however, given that phage genomes are compact, a -4 base pair overlap is optimal. The phagesdb function frequency cannot determine the function. Phagesdb BLAST points to an unknown function and the HHpred data probabilities are all below 40%. This gene codes for a protein, but its function is unknown, therefore the hypothetical protein is called. 
/note= /note=Alphafold: /note=PTM > 0.5 monomer CDS 32137 - 33381 /note=This gene is very long, with 1245 base pairs, and has high coding potential as indicated by GeneMark. Glimmer and GeneMark agree on the same start site of 32137. /note=The phagesdb function frequency points to this coding for DNA Methylase; this is also supported by the Phagesdb BLAST and the HHpred data. With all data gathered, the DNA methyltransferase call can be made. /note= /note=Alphafold: /note=PTM> 0.5 monomer/dimer CDS 33378 - 33551 /note=This gene is very short, being 174 base pairs long, and has high coding potential as indicated by GeneMark. Glimmer and GeneMark agree on the same start site of 33378. /note=The phagesdb function frequency points to this gene coding for either endonuclease or exonuclease. Phagesdb BLAST points to an unknown function as well as endonuclease and exonuclease. The HHpred data suggests that this may code for Scavenger receptor class A member 5, although this result is only around 80% probable. This gene codes for a protein, but the function of this protein cannot currently be determined due to conflicting evidence, therefore the hypothetical protein is called. /note= /note=Alphafold: /note=PTM w/ ATP > 0.5 when monomer alone <0.5 CDS 33551 - 34300 /note=Start site is agreed on by both Glimmer and GeneMark. The length of the gene is 750 bp and GeneMark supports that this has a high coding potential. HHpred data and NCBI blast data support that this codes for a RepA-like replication initiator protein. /note= /note=Alphafold: /note=PTM > 0.5 dimer CDS 34297 - 35538 /note="DnaB-like dsDNA helicase" appears in Phagesdb and HHPred BLASTs nearly uniformly. Glimmer and GeneMark agree on the start site. Z-score and final score are good, with a -4 base pair overlap. GeneMark shows good coding potential for this area. Alphafold PTM > 0.5 for monomer. 
CDS 35532 - 35810 /note=Evidence of Gene: Both GeneMark and Glimmer pick up this as a gene; all the coding potential is included in that region of the gene /note=Start Evidence: Some synteny with Khuang, but the Starterator is not informative. Z-score: 3.069 and final score: -3.222. Gene overlap of 7 /note=Functional call: NrdH-like glutaredoxin; correlates with HHPRED and NCBI BLAST, enough evidence for functional call /note=Alphafold: monomer PTM > 0.5 (0.87) CDS 35810 - 36169 /note=This gene has a start site of 35810 as called by GeneMark and Glimmer. It has the longest ORF with a z-score of 2.933, final score of -2.742, and a -1 bp overlap. Coding potential determined by GeneMark is also relatively low compared to other genes. There is insufficient data from BLAST to determine the function of this gene. /note=Alphafold: monomer PTM > 0.5 (0.71) CDS 36172 - 36696 /note=Gene evidence: /note=- This gene has a start site at 36172 as called by Glimmer and GeneMark. /note=- It includes all coding potential as evidenced by GeneMark. /note= /note=Start evidence: /note=- it has a z-score of 2.237 /note=- final score of -4.213 /note=- 2 bp gap /note= /note=Functional call: /note=- PhagesDB function frequency, BLAST, and HHPred /note= /note=Alphafold: /note=PTM >0.5 monomer/trimer CDS 36693 - 37115 /note=Glimmer and GeneMark agree on start site. /note=There is strong coding potential. /note=There is a good z-score and final score, as well as a -4 bp gap. /note=Strong RusA-like resolvase function call from PhagesDB database and BLAST, as well as HHPRED. /note=AlphaFold: Monomer PTM > 0.5, homodimer iPTM > 0.8, homotrimer iPTM < 0.8. CDS 37108 - 37269 /note=The length is 162 bp. No significant similarities when blasted. It has the longest ORF, Glimmer and GeneMark agree, there is no Starterator data, it has high coding potential, the z-value is 3.171 and the final score is -2.112. 
It has no large gaps and a slight overlap with the next gene CDS 37257 - 37904 /note=Called start site at 37257 because this is where Starterator data called the gene. Additionally, it has a better final score. /note= /note=Blast data, NCBI, and HHpred only suggest a hypothetical protein /note= /note=AlphaFold: /note=-Monomer and Dimer PTM > 0.5 CDS 38175 - 38486 /note=Gene was added because of strong HHPRED hits on the C-terminus of this hypothetical protein. Deleted reverse genes with weak HHPRED hits and no blast hits. CP is weak for most of this gene except the far 3' end. Reverse gene in the same area has better CP, but selected this based on HHPRED again. Start was selected to be the LORF even though start 38298 has slightly better RBS scores. CDS complement (38483 - 38824) /note=This is a gene called by Glimmer and GeneMark. It has a start site of 38824 with a Z score of 2.600 and a final score of -3.368. It has the longest ORF. Insufficient evidence for functional call. The PTM score is 0.69. CDS 38971 - 39267 /note=This gene is a forward gene that has strong coding potential on GeneMark with a length of 297 and a gap of 146. The z score is 1.84 with a final score of -5.961, further showing that this is a gene with the start site of 38971. Glimmer and GeneMark agree on the start site at 38971. CDS 39278 - 39388 /note=Gene added. Not called by Glimmer or GM. CP present on 5' end. While small, this fills a gap. Insufficient evidence for function. While below the threshold, HHPRED hits are in the upper 70%. CDS 39399 - 39659 /note=This is a forward gene with a length of 261 and a gap of 131. This gene has very high coding potential with a z score of 3.098 and a final score of -2.253, further supporting that it is a gene with the start site of 39399 bp. /note=FUNCTION - determining if it is hypothetical or if there is a specific function