CDS 1 - 534 /note=*Possibly look at the ending gene to see if they connect somehow? /note= /note=This gene includes all coding potential with the start site at bp 1. Both Glimmer and GeneMark agree on this call, it includes the longest open reading frame, and the Z-score and final score for start site 1 are the strongest of the potential start sites. BLAST shows a similar sequence in multiple phages, Sonali and Mufasa8, and both have the same alignment. /note=HHpred and both BLAST results return numerous HTH DNA binding proteins. HHpred identifies numerous proteins that function through DNA interactions: response regulators, transcription factors, sigma factors, etc. CDS 649 - 2124 /note=Start site /note=Glimmer 649 bp /note=GeneMark 655 bp /note=After further investigation, the most likely start site is at 649 bp. /note=This start site has a Z-score of 2.098 and a final score of -4.299. This Z-score is similar to that of GeneMark's predicted start site, but the final score is better. /note= /note=Additionally, this region shows high coding potential on GeneMark S and GeneMark Arthrobacter sp. /note=Highly similar to terminase genes in all databases. CDS 2144 - 5155 /note=Although a lot of the evidence (Phamerator and GeneMark) points to start #4 as the actual start, I think the start is actually at #1 because the Z-value is highest and the final score is closest to 0 at #1. Also, a big gap is a red flag. Placing the start at #1 decreases the gap, which allows for more coding potential. In conclusion, my argument is that the start is at #1. BLAST data from DNA Master supports that the start is earlier than what DNA Master called.
/note= /note=- Start Site Base Pair: 2144 /note=- Stop Site Base Pair: 5155 /note=- 3012 bp /note=- closest match to Arthrobacter phage Sonali; portal protein /note=- 19 bp gap upstream; 8 bp gap downstream /note=- Z-value = 3.117 /note=- Final Score = -2.253 CDS 5164 - 5541 /note=I believe this is a gene since it has sufficient base pairs (>120 bp), includes all coding potential, matches Sonali with an E-value of 2e-65, and has synteny with gene 4 on Mufasa8 (a capsid maturation protease) in Phamerator. I believe it starts at the site agreed upon by Glimmer and GeneMark; this start uses GTG, a common start codon (26%), and has the best Z-score (3.201) and final score (-2.458) of all the candidate starts. CDS 5538 - 6623 /note=Gene 5 is likely a gene. It runs from 5538 (5’ end) to 6623 (3’ end), a total of 1086 bp, which is long enough to be a gene, and has an ATG start codon. The gene has coding potential and the closest BLAST match is Sonali (E-value: 1e-172). Glimmer and GeneMark agree on the start. Most common start. Small overlap of 4 bp. Longest open reading frame. RBS sequence has a good Z-score (2.786) and final score (-2.906). /note=Not enough data to support functional call CDS 6655 - 8118 /note=I think this is a gene because Glimmer and GeneMark agree on its start, it has a high Z-value, it is the longest ORF, and it has high coding potential. CDS 8232 - 8552 /note=Confirmed as a gene based on strong coding potential. Glimmer calls the start of gene 7 at 8232, and this is further confirmed by the GeneMark data. There is no data comparing this gene to a gene in another phage (PhagesDB lists it as an orpham). The gene has no overlap, and the gaps on either side are within accepted limits. The Z-score is strong (2.69), further supporting that this is a gene. CDS 8568 - 9440 /note=Matches to HK97 gp6 and SPP1 15 support the functional call. /note=Is it a gene?
/note=– What is the start and stop? /note=o Start Site Base Pair: 8568 /note=o Stop Site Base Pair: 9440 /note=– Is it long enough to be a gene? (number of bp) /note=o 873 bp (over the expected 120 bp minimum for phage genes, so good) /note=– Does this gene have coding potential? /note=o GeneMark S /note= /note=o GeneMark w/ Glimmer /note= /note=– Does it include all the coding potential? /note=o Unclear what the question is asking; assuming it refers to maximum coding potential, there is some doubt because there is “streaking” (constant ups and downs of the coding potential in the area where gene 8 is located on the GeneMark output) /note=– What is the closest BLAST match? What is the E-value and coverage? /note=o Local BLAST gives Sonali_9, head-to-tail adaptor protein, as the closest match with an E-value of e-147. /note= /note=o Library BLAST (Total Network, Main Website) leads to a similar result: Arthrobacter phage Sonali, head-to-tail adaptor protein, with an E-value of 3e-174 and a percent identity of 90.34%. /note=– Does this gene have synteny in other phages? /note=o Since synteny means collinearity of genes, Maruru gene 8 has synteny with Sonali: the BLASTn search of the gene 8 nucleotides produced 780/877 (89%) identical nucleotides and 8/877 gaps (rounded to 0%), with an expected value of 0.0. /note=Where does it start? (5’ -> 3’ = 8568 bp to 9440 bp) /note=– Do Glimmer and GeneMark agree on the start? /note=o Original Glimmer call @bp 8568 has strength 14.08 /note=– Starterator- Is it the most common start? /note=o No other phage in the Starterator report has that particular start site; only Maruru gene 8 is listed for that start site. /note=– Is there a gap or overlap? /note=o There is a gap upstream of 16 bp, which is more than enough for there to be a reasonable promoter region. /note=– Does this gene have the longest ORF?
/note=o Yes (+1 ORF) /note=– Does the RBS sequence have a good Z-score and Final score? /note=o Z-score: 3.040 /note=o Final Score: -2.339 /note=What is its function? /note=– Looking at its closest non-draft phage relative, what function is expected in this region? /note=o Using the Phamerator Cluster Map (FG) and the closest related non-draft phages, Sonali and Mufasa, Maruru’s gene 8 is predicted to be a head-to-tail adaptor protein. /note=– BLAST the gene on phagesdb. What is the sequence alignment with its other phage genes with predicted function? /note=o The closest-related phage on BLASTp (Local Library) predicts gene 8 to be a head-to-tail adaptor. /note=– BLAST the gene on NCBI. What is the sequence alignment with its other phage genes with predicted function? /note=o The closest-related phage on BLASTp predicts gene 8 to be a head-to-tail adaptor. /note=– HHpred data: /note=o Q: What alignment matches have a probability above 90%? /note= #1 Hit 8HQO_Q: Escherichia phage DT57C – 99.84% /note= #2 Hit 6TE9_C: Rhodobacter capsulatus – 99.83% /note= #3 Hit 8FXR_p: Agrobacterium phage Milano – 99.82% /note= #4 Hit 8QEK_b: Staphylococcus phage 812 – 99.7% /note= #5 Hit 3JVO_M: Enterobacteria phage HK97 – 98.57% /note= #6 Hit 7Z4W_a: DNA Chan – 98.56% /note= #7 Hit 8VD8_B: Dubowvirus dv80alpha – 98.27% /note= #8 Hit 1ZTS_A: Unknown name (Hypothetical Protein yqbG) – 98.2% /note=o Q: What is the description of these matches? /note= Putative neck protein, connector, head-completion, adaptor protein (each hit has one or more of these descriptions). /note=o Q: How much does this protein align with the genes? (Assuming Aligned cols) /note= #1: 167 /note= #2: 175 /note= #3: 185 /note= #4: 197 /note= #5: 99 /note= #6: 102 /note= #7: 98 /note= #8: 113 /note=– Other Evidence? AlphaFold and Foldseek /note=o AlphaFold returned no results, for unknown reasons.
/note=o Foldseek: Top hit is an uncultured Caudovirales phage with an E-value of 8.17e-11 in the BFVD /note= Foldseek also reports, for AFDB50, Subtercola vilae with an E-value of 9.06e-22 CDS 9437 - 9955 /note=Gene 9 has a start point of 9437, and the Glimmer and GeneMark starts agree with each other. It is 519 base pairs, making it long enough to be a gene. It has coding potential. Closest BLAST match is Sunshine23 with a coverage of likely 100%, and it has synteny with Mufasa8. 7 base pair overlap with gene 10. Starterator pham 12785. Z-value of 2.579 and final score of -3.267. /note=Function assigned as minor tail protein according to HHpred data. CDS 9948 - 10103 /note=Yes, this is a gene. Glimmer and GeneMark agree, with a strength of 1.34. It has a BLAST match with Sunshine23_Draft_11 with an E-value of 2e-17. The Z-score is 1.661 and the final score is -6.288. GeneMark shows that all coding potential is included. /note= /note=Not enough evidence for functional call CDS 10113 - 10406 /note=Yes. This is a gene with a start site of 10113 and stop site of 10406. Both Glimmer and GeneMark called the start at 10113. It is the longest ORF. Includes coding potential. The Z-score (3.169) and final score (-2.011) are both good. CDS 10406 - 11389 /note=Glimmer/GeneMark agree on start site. Good coding potential, synteny, and is the most common start. Good Z-score as well.
/note=BLAST and HHpred results support the major tail protein call, along with synteny with Sonali and Mufasa8 CDS 11473 - 11970 /note=Start: 11473 (called by Glimmer and GeneMark) /note=Stop: 11970 (called by Glimmer and GeneMark) /note=bp length: 498 bp /note=Coding potential: gene has coding potential (supported by GeneMark) /note=BLAST match: 100% protein identity match with Sonali, and 98% nucleotide similarity with Sonali /note=Synteny: Sonali_13 and Sunshine23_16 /note=Starterator: Maruru and 2 other phages called the same start /note=Gap: 85 bp gap after gene 12, and 40 bp gap before gene 14 /note=RBS: Z-score of 3.095 and final score of -2.441 CDS 12009 - 12170 /note=This gene has a start at 12009 bp and a stop at 12170 bp (length of 162 bp) with agreement from GeneMark and Glimmer. There is a gap of 38 bp with the upstream gene and 11 bp with the downstream gene, and the gene has coding potential. This gene does not have the longest ORF; it has a Z-score of 2.857 and a final score of -2.953. The closest BLAST match is Sonali, with an E-value of 7e-24 and 100% coverage. CDS 12167 - 12610 /note=The gene starts at 12182 and ends at 12610; GeneMark and Glimmer are in agreement about this. This gene is also longer than 150 base pairs. GeneMark shows that this gene has high coding potential. The closest BLAST match is Sonali, with an E-value of 5e-74. This gene has some synteny with Sunshine23 and with Sonali 15. The Phamerator report shows that there is similarity between the phages. Starterator states that 12182 is the most annotated start. This gene has a gap upstream of 12 base pairs. According to DNA Master, this gene does not have an overlap, as there are gaps to both the upstream and downstream genes. This gene is also not the longest ORF. Finally, the Z-score of the start site of this gene is 3.008 and the final score is -2.593.
CDS 12620 - 17245 /note=Yes. It is 4626 bp in length with a start site of 12620 and a stop site of 17245; Glimmer and GeneMark are in agreement. The closest match had an E-value of 0.0. It also had synteny with other phages on the phage map. CDS 17270 - 18190 /note=Coding potential, synteny, and BLAST all support that this is a gene with a start at 17270. Synteny and BLAST results also support that this is a minor tail protein. Additional data from HHpred and CDD also support this functional call. CDS 18201 - 19316 /note=This gene's synteny, HHpred, and NCBI BLAST results point to the function being a minor tail protein with high coverage (above 90%). The selected start is the most likely start. CDS 19327 - 20364 /note=Z-score and final score support start @ 19327 CDS complement (20369 - 20977) /note=Original Glimmer call @bp 20977. Phamerator agrees with this data and so does DNA Master when looking at Z-value and final score. BLAST data at the beginning of DNA Master does not match up. /note= /note=Gaps upstream and downstream. Gap downstream much smaller than upstream.
4 < 60 /note= /note=start @ 20977 /note=stop @ 20369 CDS 21038 - 21928 /note=All data/evidence supports that the start is @ 21038 and the end is @ 21928 /note=- 891 bp /note=- 60 bp gap upstream; 33 bp gap downstream /note=- Z-value = 2.963; Final Score = -2.794 /note= /note=MAY BE A VIRION STRUCTURAL PROTEIN BUT NOT ON LIST OR MINOR TAIL PROTEIN CDS 21962 - 22369 /note=Glimmer and GeneMark call start @ 21962 and stop @ 22369 /note=- 408 bp /note=- 33 bp gap upstream; 4 bp overlap downstream /note=- Z-value = 3.022; Final Score = -2.315 CDS 22366 - 23457 /note=- Glimmer call @bp 22366; GeneMark calls start at 22372 /note=- 1092 bp /note=- 4 bp overlap upstream; 13 bp gap downstream /note=- Z-value = 2.963; Final Score = -2.505 /note= /note=FUNCTION COULD BE LYSIN A CDS 23469 - 23924 /note=- Original Glimmer call @bp 23469 /note=- 456 bp /note=- 13 bp gap upstream; 69 bp gap downstream /note=- Z-value = 3.201; Final Score = -1.993 /note=- Synteny with Sonali gene 24, but the function of Sonali gene 24 is unknown, as are the functions in the rest of the FG cluster. Pham 212443 has some genes that share a known function but are in a different cluster from FG (Jemerald, Aesir, and Anakin). CDS 23992 - 25752 /note=The gene start site is 23992 and the stop site is 25752. This is supported by GeneMark and Glimmer. This is the most common start, as it is found in 53 genomes that were manually annotated. In total, 66 genomes have this pham and called the start site. The closest BLAST results are with Prairie. This gene has a 67 base pair gap upstream. In the phagesdb BLAST results, many of the functions are unknown. The NCBI BLAST data points to a minor tail protein, but the coverage is very low. HHpred points to a variety of different proteins; no definitive function is found. CDS 25762 - 26697 /note=This is a gene. The start and stop are: Start: 25762, Stop: 26697. This is supported by Glimmer and DNA Master. This region has high coding potential as called by GeneMark.
The closest BLAST results are with a phage named Inked24, where the gene is called a minor tail protein. Glimmer and GeneMark agree on the start site. The start number called most often in the published annotations is 8; it was called in 49 of the 99 non-draft genes in the pham according to the Starterator data. The function of the gene is not definitive; phagesdb BLAST results state that the function is unknown. HHpred data points to a variety of different proteins, but no definitive function can be derived. CDS 26708 - 27541 /note=This is a gene, as supported by the GeneMark data indicating that this region has a lot of coding potential. The start site is 26708 and the stop site is 27541, as indicated by Glimmer and Starterator data. It is also long enough to be a gene, with a length of 834 base pairs. The start number called most often in the published annotations is 8; it was called in 49 of the 99 non-draft genes in the pham. It is only found in 7 of the 136 genes in the pham. Called in 5 manual annotations. The phagesdb BLAST states that the function is unknown for the proteins; the highest alignment was 57.971%. The NCBI BLAST states that the function ranges from a minor tail protein to a hypothetical protein. HHpred data indicates a variety of different functions, from major structural proteins to putative baseplate proteins; no definitive function can be called. CDS 27558 - 27689 /note=This region has high coding potential as indicated by the host-trained GeneMark. Its start site is at 27558 and its stop site is at 27689, as indicated by the Starterator data and Glimmer. It is also long enough to be a gene, at 132 base pairs. The closest BLAST match is with a phage named Nightmare25. This is the most common start, being the start called by 11 annotated phages. From the phagesdb BLAST, I cannot determine a function, as the data suggests that the function is unknown.
The NCBI BLAST results state that it is most likely a hypothetical protein, but there is variability in that data. CDS 27682 - 28050 /note=This is a region of high coding potential as indicated by the host-trained GeneMark. It includes all the coding potential. The closest BLAST match is Sonali_28. The function cannot be completely determined, as the NCBI BLAST, phagesdb BLAST, and HHpred data all indicate different functions. CDS 28338 - 29489 /note=Glimmer called the start site at 28515, but that start has a weak RBS score (Z-score: 1.654 and final score: -5.332). GeneMark called a start site at 28338 with an RBS score of (1.67 and -6.764). Since the reverse gene with a stop site of 28385 shows evidence of not being a gene, it could be beneficial to move the start site further upstream to close the gap (as GeneMark calls it). It has complete coding potential as well as synteny with Sonali, Sunshine23, and Mufasa8. HHpred gives a 100% probability that it is an integrase, and BLAST results support this. CDS complement (29486 - 29698) /note=Glimmer and GeneMark call the start site at 29698 with complete coding potential. It has an RBS score of (Z-score: 3.201 and final score: -1.931), which is good. It has synteny with Sonali and Sunshine23, and the closest BLAST match was Sunshine23 (E-value: e-112 and coverage 99%). There is a high probability that the function is either excisionase or helix-turn-helix DNA binding domain. According to HHpred, the highest probability is 98.3% for the excisionase function with an E-value of 2.8e-9, and the next highest is 98.2%, also for the excisionase function, with an E-value of 3.4e-9. CDS complement (29709 - 29915) /note=There is only one potential start site for this gene, and it is at 29915. Its RBS score of (2.857 and -2.601) also supports this being its start site. No function for this gene is clearly supported by the evidence.
It has a 51% alignment with a ribonucleotide reductase on PhagesDB, but I didn't find any similarities on NCBI BLAST. Its highest alignment according to HHpred is with a protein of unknown function. CDS complement (29912 - 30178) /note=This gene has a start site at 30178 as called by Glimmer. GeneMark also calls the start at 30178, and this start includes all coding potential. It also has a good RBS score (Z-score: 3.169 and final score: -1.931). Its closest BLAST match was Sonali (with an E-value of 2e-43). There were no alignment matches with a probability of at least 90%, but the best match, at 84.5%, is a transcriptional regulator according to HHpred. It does show high alignment with Sonali on PhagesDB. CDS complement (30189 - 30410) /note=Confirmed that the gene start is 30410, as it is called by both GeneMark and Glimmer; GeneMark shows high coding potential, which supports that it is a gene. There do not seem to be any unusual gaps or overlaps, although one thing to note is that this is a reverse gene. Although the Z-value is low, the coding potential (GeneMark) gives good reason to believe that the gene does start there. CDS complement (30461 - 30784) /note=Start site at 30784 bp for this reverse gene. Both GeneMark and Glimmer called the start here. There are no unusual gaps or overlaps, and the Z-score is 2.93, further supporting that it is a gene. CDS complement (30860 - 31228) /note=This gene has high coding potential at the start that both Glimmer and GeneMark called at 31228. This is a reverse gene and has a Z-score within the acceptable range (2.963). GeneMark shows high coding potential, further supporting the gene's start site. /note= /note=3 CDS complement (31306 - 32121) /note=Both GeneMark and Glimmer call the start of this gene at 32121 bp. There is no unusual gap or overlap. The Z-score is within the acceptable range (2.931).
GeneMark further shows that there is strong coding potential. The start codon is also ATG, further indicating that this is a gene. However, there is no function frequency data that can indicate the function of the gene. /note= /note=NEEDS A FUNCTION: hypothetical protein, not enough proof CDS complement (32264 - 32707) /note=Start 32707 bp. GeneMark picked up this start site. Although the Z-score (2.436 compared to 2.85) is a little smaller than for other start sites, the gap is only 16 bp, which makes a start at 32707 bp more promising. There is also high coding potential at the called start site, as seen in GeneMark. Conclusion: the start site is at 32707 bp due to the high coding potential, the reasonable Z-score (2.436), and a reasonable gap downstream. /note= /note=Peptidase: Pham map with Sonali and Mufasa (both called peptidase; Sonali and Mufasa have consistently been similar to this phage). HHpred 99.6% probability AND HIGH COVERAGES. CDS complement (32724 - 33215) /note=This is likely a gene since it has sufficient length (492 bp), coding potential, a reasonable RBS, and synteny with other phages (Sunshine23_Draft_40 (1e-56), Sonali_39 (1e-56)). It also has a high probability of having the same function as another gene (helix-turn-helix DNA binding domain, probability 98.1). CDS 33545 - 33682 /note=This potential gene needs to be looked at carefully since it is only 138 bp long, has a large gap (476 bp) with the upstream gene, and doesn’t have the longest ORF. The large gap arises because the upstream gene is a reverse gene. There is evidence to suggest that this is a gene, such as synteny with similar genes (Sonali_40 (4e-19), Sunshine23_Draft_41 (5e-19)) and coding potential. CDS 33679 - 33957 /note=This is likely a gene since it has sufficient length (279 bp), coding potential, synteny with other phages (Sonali_41 (9e-45), Sunshine23_Draft_42 (2e-44)), an overlap of 4 bp, the longest ORF, and a supported RBS.
It also has a high probability of having the same function as another gene (Recombination Directionality Factor, probability 97.4). CDS 34066 - 34416 /note=This is likely a gene since it has sufficient length (351 bp), coding potential, synteny with other phages (Sunshine23_Draft_43 (1e-62), Sonali_42 (1e-62)), and a supported RBS. The top function match (Stage VI sporulation protein F) has a probability of only 41, which is too low to support a functional call. CDS 34416 - 35648 /note=This is likely a gene since it has sufficient length (1233 bp), coding potential, synteny with other phages (Sonali_43 (0), Sunshine23_Draft_44 (0)), the longest ORF, and a supported RBS. It also has a high probability of having the same function as another gene (AAA-ATPase, probability 99.9). CDS 35645 - 35929 /note=Start Site: 35645; Stop Site: 35929; 285 bp gene. Coding potential in both GeneMark files indicates a high likelihood of a gene. For BLAST, the local and library BLASTp both give Sonali gene 44 as the most similar gene, with the library BLAST reporting an E-value of 5e-60. Starterator lists start site 35645 with 4 “Most Annotated Starts,” with a 3 bp overlap upstream. The RBS values are Z-score: 2.467 and Final Score: -3.587. BLAST (Local and Library) and HHpred do not indicate a particular function, though there is good evidence that this is a gene, as stated above. CDS 35926 - 36084 /note=Start Site: 35926; Stop Site: 36084; 159 bp gene; both GeneMark reports show coding potential. Local and Library BLAST give Sonali gene 45 as the most similar gene, with E-values of 4e-22 and 2e-23, respectively; Percent Identical: 96.15%. Starterator: only 1 “Most Annotated Start,” with a 4 bp overlap upstream. The RBS scores are Z-score: 1.882 and Final Score: -4.905. Phamerator and BLAST (Local) state unknown function, while the Library BLAST states a hypothetical protein HOV09_gp45. In HHpred, there are no alignment matches above 90%.
In Foldseek, the closest match to the polypeptide product is from Arthrobacter phage Mufasa8, with a Probability value of 0.00, so this gene is present within Maruru but the function is still unknown. CDS 36167 - 37042 /note=Start Site: 36167; Stop Site: 37042; 876 bp gene; both GeneMark reports indicate a gene. Local BLAST gives two major similar genes: Sunshine_23 Draft 48 with unknown function and an E-value of e-169, and Sonali_46, an exonuclease, with an E-value of e-168. Library BLAST gives Sonali with the exonuclease function and 98.97% identity. HHpred Data: 17 hits above 97% for Probability, with the first hit being: 5ZYT_C; Name: Mitochondrial genome maintenance exonuclease (human MGME1); Probability: 99.89; E-value: 5e-21. On Foldseek, the top result is from the BFVD library: an uncultured Caudovirales phage; Probability: 0.00; Seq. Id.: 20.3; E-value: 8.06e-15. CDS 37039 - 38160 /note=Start Site: 37039; Stop Site: 38160; 1122 bp gene. Both GeneMark reports show that the Start Potential is excellent (goes straight to 1.0), although there is a dubious “dip” in coding potential around the 37700 bp region. Local BLAST has three top hits: Sunshine_23 Draft_49 and Sonali_47, both proteins of unknown function with an E-value of 0.00, and Mufasa8_47, a RecT-like ssDNA binding protein, with an E-value of e-131. Library BLAST gives phage Sonali with RecT-like ssDNA annealing protein function, an E-value of 0.00, and a percent identity of 92.27%. Starterator: 4 bp overlap upstream; 12 MA’s. The RBS values are Z-score: 2.963 and Final Score: -2.584. HHpred Data: 3 hits above 97%, with the top hit being 7UJL_A; Name: Recombination protein bet; Annealase, Synaptase, SSAP, Single-strand annealing protein, DNA annealing intermediate, Reco; Probability: 99.77; E-value: 1.9e-17.
For Foldseek, the top hit is the uncultured Caudovirales phage; Probability: 0.00; Seq Id.: 20; E-value: 3.24e-14. CDS 38164 - 38760 /note=Start Site: 38164; Stop Site: 38760; 597 bp gene. Both GeneMark reports show dubious results, with constant fluctuation around the halfway point of the gene. Local BLAST reports Sunshine23_Draft_50 with an unknown function and an E-value of e-177, and Sonali_48 with the ssDNA binding protein function and an E-value of e-115. Library BLAST gives ssDNA binding protein with an E-value of 7e-130 and a percent identity of 95.96%. Starterator: 230 MA’s at this start site, and 4 bp overlap upstream. The RBS values are Z-score: 2.493 and Final Score: -4.359. Phamerator indicates a similar gene in Sonali, gene 48, which is an ssDNA binding protein. BLAST data listed above; main function detected: ssDNA binding protein with an E-value of 7e-130. HHpred Data: 48 hits above 90%; the majority of these hits indicate ssDNA binding protein, with the top 21 hits being ssDNA binding protein with 99.8% similarity. The top hit is #1, 3EIV_B, Single-stranded DNA-binding protein, with a probability value of 100.0 and an E-value of 8.5E-28. CDS 38776 - 39165 /note=It is a gene. With a start at 38776, the gene includes all coding potential, with a gap at the start. This start has a Z-score of 2.362 and a final score of -3.729 and gives the longest ORF, which strengthens this call. It has an E-value of 5e-69 with Sonali. There is no predicted function. There is a large gap before the downstream gene. CDS 39355 - 39804 /note=GeneMark and Glimmer both agree. It is long enough to be considered a gene and has strong coding potential from GeneMark. There is a match with 100% coverage and an E-value of 1e-84. The start of the gene has a strength of 11.7, but the gene has a 190 bp gap upstream, with a higher Z-score and final score. If the gene were called at the earlier start site, with the longer ORF, the gap would be smaller, but the longer ORF does not show strong coding potential.
CDS 39808 - 40479 /note=This is a gene and has strong coding potential. It has a BLAST match with Sonali with 100% coverage and an E-value of e-120. It has synteny with phage Mufasa and has a Z-value of 2.795 and a final score of -3.905. There is not strong evidence for a protein prediction. CDS 40561 - 41109 /note=This gene has strong coding potential and synteny with phage Mufasa. Glimmer and GeneMark both agree on the start, which has a strength of 12.39. It has a 124 bp gap upstream; if the start site with the longer ORF were used, the gap would be smaller, but this call has all coding potential and a strong Glimmer score. The Z-score is 1.579 and the final score is -5.474. Strong similarity to genes in other phages indicates that the function is a RusA-like resolvase, with probability above 90%. CDS 41106 - 41309 /note=This gene has strong coding potential. Both Glimmer and GeneMark agree on the start, with a strength of 8.08. There are slight overlaps, but they are only 3 bp. The Z-score is 2.131 and the final score is -4.229, which is a little high but compares well with the scores of the other starts. Phage Sonali calls the same most common start in the Starterator report. CDS 41306 - 41461 /note=Although the ORF length is not the longest, coding potential, Z-value/Final score, synteny, and BLAST data support the conclusion that this is a gene with a start site of 41306. CDS 41451 - 41663 /note=Although the ORF length is not the longest, coding potential, Z-value/Final score, synteny, and BLAST data support the conclusion that this is a gene with a start site of 41451. There is not strong enough evidence for a protein prediction. CDS 41678 - 42187 /note=Coding potential, ORF length, Z-value/Final score, synteny, and BLAST data support the conclusion that this is a gene with a start site of 41678. /note=There is a 99.2 probability that this is a regulatory protein, and Mufasa has a helix-turn-helix DNA binding protein.
CDS 42255 - 42449 /note=Although the ORF length is not the longest, coding potential, Z-value/Final score, synteny, and BLAST data support the conclusion that this is a gene with a start site of 42255. /note= /note=Function has 98.2-98.6 probability with 89.0-93.75% coverage. CDS 42521 - 42763 /note=Coding potential, ORF length, Z-value/Final score, synteny, and BLAST data support the conclusion that this is a gene with a start site of 42521. There is not strong evidence for a protein prediction. CDS 42869 - 43210 /note=Coding potential, ORF length, Z-value/Final score, synteny, and BLAST data support the conclusion that this is a gene with a start site of 42869. /note= /note=Base pair length is 342, and coding potential extends throughout the whole gene. The closest match is Sonali with an E-value of 5e-169 and 99% coverage. Glimmer and GeneMark are in agreement, it has the longest ORF, and the Z-value is 0.999 and the final score is -6.733. CDS 43210 - 43599 /note=High coding potential according to GeneMark, longest ORF length, best Z-value/Final score, high synteny, and BLAST data support the conclusion that this is a gene with a start site of 43210. CDS 43596 - 43826 /note=Coding potential, ORF length, Z-value/Final score, synteny, and BLAST data support the conclusion that this is a gene with a start site of 43596. /note= /note=The gene is 231 bp long and has coding potential; the closest match is Sonali with an E-value of 4e-104 and a coverage of 98%. GeneMark and Glimmer are in agreement; the Z-value is 1.548 and the final score is -5.543. CDS 43819 - 44166 /note=Changed the start site to 43819. Longest ORF with an 8 bp overlap. The RBS does not matter that much for this gene since it has an overlap of 8 bp with the previous gene, so it is a continuation of the previous gene. An 8 bp overlap is better than a 52 bp gap.
CDS 44163 - 44456 /note=Coding potential, ORF length, Z-value/Final score, synteny, and BLAST data support the conclusion that this is a gene with a start site of 44163. /note= /note=The length is 294 bp and it includes all coding potential; the closest match was Sonali with an E-value of 8e-137 and 100% coverage. The Z-value is 0.690 and the final score was -7.672. CDS 44446 - 44643 /note=CHANGE NEEDED /note=Start called by GeneMark is most accurate according to coding potential. /note=Function could not be determined through HHpred; all of the pham members have it labelled as "function unknown" and NCBI BLAST labels matching genes "hypothetical proteins." CDS 44640 - 45053 /note=POSSIBLE CHANGE NEEDED /note=The LORF starts at 44490; changing to that start would eliminate gene 64, which is a hypothetical and comparatively improbable gene. The RBS scores, however, are much worse at start site 44490. Start 44640 is also NOT the most annotated start. /note=The function can be determined from unanimous results in NCBI BLAST, as well as "helix-turn-helix DNA binding domain" showing up in the PhagesDB Function Frequency and PhagesDB BLAST results. CDS 45050 - 45310 /note=Function cannot be determined from HHpred results, and PhagesDB BLAST results show shared genes as "hypothetical protein." CDS 45307 - 45795 /note=Function is unclear; however, "queuine tRNA-ribosyltransferase" appears repeatedly in PhagesDB BLAST and has the highest percentage in PhagesDB Function Frequency. This was not an option in the function drop-down, so "hypothetical protein" was chosen. CDS 45792 - 46145 /note=Coding potential, most annotated start site, good overlap, LORF, and good RBS scores all point to this being a gene; however, it is hard to determine the function based on BLAST info. CDS 46142 - 46291 /note=The start site is at 46142, as called by Glimmer and GeneMark.
It has a 4 bp overlap with its upstream gene, which means its RBS scores don't have to be great. The other potential start sites overlap the upstream gene by too many bp to be credible. It also includes all coding potential, as shown by GeneMark. There are strong BLAST results with phage Sonali (e-value 1e-25, 94% identities, 95% positives). CDS 46288 - 46503 /note=Start site supported. Shows coding potential and has a good z-score and final score. 11 "MAs" at start 46288 (pham data). NCBI BLAST shows 67.74% aligned and 81.69% coverage. CDS 46503 - 46709 /note=Glimmer and GeneMark agree. The LORF has too much overlap to be considered. Coding potential in areas of interest, and a great Z-score/final score. It can be concluded that this is a gene and that the start site is at 46503. It has synteny with gene 71 in Sonali, and the start site lines up. /note= /note=Hypothetical protein, because the Sonali gene is listed as hypothetical and the two are in the same cluster and pham. Also, the NCBI BLAST hits have % identity and alignment above 90% and are of unknown function. CDS 46706 - 47275 /note=Start at 46706, with a Glimmer score of 11.8, z-score of 2.857, and final score of -2.681. It has all coding potential; it is a gene. /note=PhagesDB BLAST: unknown function; the data are not strong enough to make another functional call /note=NCBI: hypothetical protein, coverage of 91.5344% /note=PhagesDB lists the gene with this stop (47275) as #73, not #72 /note=PhagesDB and NCBI BLAST agree /note=HHpred: the e-values are poor, coverage and probability are low, and none of the hits has a listed function. The evidence is not strong enough to make a clear call: hypothetical protein. CDS 47275 - 47673 /note=Glimmer and GeneMark both agree on the start. It is the most annotated start and includes all of the coding potential. The gene was similar to Sonali in PhagesDB BLAST. CDS 47670 - 47918 /note=This is a gene based on the coding potential and length of the gene.
The only issue is that there wasn't any significant BLAST data in either PhagesDB or NCBI; however, there is strong coding potential. This gene is an orpham (a gene that lacks a detectable homologue in any other species or lineage, and so appears to be unique to this organism; such genes are also called orphan genes, ORFans, or taxonomically restricted genes (TRGs). Most genes have known homologues). /note=The gene has a 4 bp overlap with the upstream gene (which is highly favorable) and a 112 bp gap with the downstream gene (which isn't great; however, DNA Master shows this to be a legitimate gap). /note= /note=Gene start site is 47670. Glimmer and GeneMark both called it. It has the longest ORF, a z-score of 2.931, and a final score of -2.584. /note= /note=Based on the HHpred data (very low probability) and the lack of a DeepTMHMM (transmembrane) prediction, there is reason to believe this gene encodes a protein of unknown function. CDS 48029 - 48274 /note=Gene with a start site of 48029. /note=This start has strong coding potential, the longest ORF, a high z-score (2.046), and a final score of -4.411. While there is a large gap of 290 bp between genes 75 and 76, there are no other possible starts because of the stop codons displayed on GeneMark. PhagesDB BLAST showed high similarity to Sonali, Sunshine, and Mufasa. CDS 48335 - 48679 /note=CHANGE MADE AND NEEDS REVIEW! /note= /note=Start site was changed to 48335. While strong coding evidence suggests the second start, there is further evidence favoring the first start. Starterator data compared against another known phage, Sonali, and Phamerator map data comparing Sonali and Sunshine both show similarity.
After changing the gene length and re-BLASTing the sequence, Sonali had 69% identity and a low e-value of 2e-37, and while the Sonali annotation could be wrong, it did make it through the annotation process. This new start site also gives the longest ORF and reduces the gap from 138 bp to 60 bp. Because phage genomes are highly compact, the start site was changed to minimize the gap. CDS 48783 - 49256 /note=This is a 474 bp gene with a start of 48783. This is the LORF, with a z-score of 2.362 and a final score of -3.747. There is also a 4 bp overlap with the downstream gene. Based on comparative BLAST data, there is high identity with Sunshine (96%, e-value 3e-84) and Sonali (95%, e-value 2e-83). /note= /note= /note=Start: 48783 CDS 49253 - 49453 /note=ADDITIONAL REVIEW /note= /note=For starters, this is a gene. There are 201 bp and strong coding potential. Both Sunshine (96%; 3e-31) and Sonali (67%; 4e-17) had BLAST similarities. /note=GeneMark and Glimmer both call the start at 49253. I don't think the longest ORF here is the start site. While there is strong coding potential, the overlap with the upstream gene (#78) would be far too great. Based on the guiding principles, gene overlap has a "maximum" of 30 bp but typically ranges from 1-4 bp. The final score for the selected start is more favorable (-2.505). CDS 49486 - 50049 /note=Coding potential, synteny, and BLAST support a start of 49486. LORF. Most common start. RBS sequence has a good Z-score (3.158) and final score (-2.861). Need to look further into function because synteny suggests unknown function and HHpred, NCBI, and CDD do not agree on function. NCBI suggests a "hypothetical protein" and HHpred suggests "Paratox; prophage, natural competence, Streptococcus, protein binding, VIRAL PROTEIN". There is not enough evidence to make a functional call. Still needs further investigation.
CDS 50046 - 50195 /note=This gene is likely to start at this site because of the coding potential, LORF, and small gap. There is very little data about function, but the most likely call is hypothetical protein, given the 100% coverage on NCBI BLAST for that function and the fact that the genes it has synteny with are also hypothetical proteins. CDS 50216 - 51058 /note=Coding potential, synteny, and BLAST support this gene with a start of 50216. Synteny suggests that this is a helix-turn-helix DNA binding domain protein, and PhagesDB, NCBI, and HHpred also support this functional call. CDS 51055 - 51561 /note=Changed start. This gene needs more investigation because the data on its function are inconclusive. Based on synteny, HHpred calls it hypothetical protein HOV09_gp82 [Arthrobacter phage Sonali], while other databases suggest tyrosine integrase, minor tail protein, or unknown function based on synteny. Phamerator shows other genes in synteny to be helix-turn-helix DNA binding domain proteins. CDS 51562 - 51864 /note=This gene is likely to start and stop at these sites because of the Z-score, lowest final score, LORF, and coding potential. Good coverage and probability for putative partitioning protein and hypothetical protein. However, I do not see putative partitioning protein on the approved function list. CDS 52182 - 52631 /note=HHpred: Disulfide bond isomerase, DsbC, C-terminal domain /note=NCBI: hypothetical protein /note= /note=I believe this is a gene since it has sufficient base pairs (>120 bp), has coding potential, matches Sonali with an E-value of 2e-50, and has synteny with gene 84 on Sonali in Phamerator.
However, I don’t believe it starts at the site agreed upon by Glimmer and GeneMark, since that start does not include all coding potential and does not give the longest ORF, Sonali calls an earlier start, and the z-value for the start at 52164 is still decent, which is why I changed it to start at 52164. Function is unknown since it has <90% probability in HHpred with <80% coverage. Also, it has 100% coverage and a good e-value (1.13597e-51) with the Sonali gene on NCBI BLAST, which is also a hypothetical protein. CDS 52619 - 52834 /note=I believe this is a gene since it has sufficient base pairs (>120 bp), has coding potential, matches Sonali with an E-value of 4e-35, and has synteny with gene 85 on Sonali in Phamerator. However, I don’t believe it starts at the site agreed upon by Glimmer and GeneMark, since that start does not include all coding potential and does not give the longest ORF, Sonali calls an earlier start, and the z-value for the start at 52619 is better. This is why I changed the start site to 52619. Function is unknown since it has <90% probability in HHpred with <80% coverage. Also, it has 100% coverage and a good e-value (8.50912e-42) with the Sonali gene on NCBI BLAST, which is also a hypothetical protein. CDS 52831 - 53424 /note=I believe this is a gene since it has sufficient base pairs (>120 bp), has coding potential, matches Sonali with an E-value of e-105, and has synteny with gene 86 on Sonali in Phamerator. However, I don’t believe it starts at the site agreed upon by Glimmer and GeneMark, since Sonali calls a later start, the called start creates a 25 bp overlap, the start codon of ATG is more likely, and the z-value for the start at 52831 is better. This is why I changed the start site to 52831.
Function is likely a ParB-like dsDNA partitioning protein, since it has 100% coverage and a good e-value (2.90386e-125) with the Sonali gene on NCBI BLAST, which is also a ParB-like dsDNA partitioning protein. HHpred gives 99.5% probability for a ParB family protein; DNA-binding protein, but the coverage is below 80%, so I find the NCBI BLAST result more convincing. CDS 53421 - 54311 /note=I believe this is a gene since it has sufficient base pairs (>120 bp), includes all coding potential, matches Sonali with an E-value of e-171, and has synteny with gene 87 on Sonali in Phamerator. I believe it starts at the site agreed upon by Glimmer and GeneMark, since it lines up with Sonali, has a common start of GTG (26%), and has a decent Z-score (1.630) and final score (-5.303). Function is likely a queuosine tRNA-guanine transglycosylase protein since it has a probability above 90% and a coverage above 80% on HHpred. Also, it is well aligned and covered by the same type of gene in Sonali. /note= /note=A lot of evidence points to the function being queuine tRNA-ribosyltransferase, but it is not on the list. CDS 54334 - 55038 /note=I believe this is a gene since it has sufficient base pairs (>120 bp), includes all coding potential, matches Sonali with an E-value of e-118, and has synteny with gene 88 on Sonali in Phamerator. I believe it starts at the site agreed upon by Glimmer and GeneMark, since it lines up with Sonali, has a common start of ATG (68%), and has the best Z-score (2.868) and final score (-3.173). Function is likely a queuosine biosynthesis protein since it has a probability above 90% and a coverage above 80% on HHpred. Also, it is well aligned and covered by the same type of gene in Sonali. CDS 55035 - 55436 /note=***DNA Master calls this Gene 90*** ***Phamerator calls this Gene 91*** /note= /note=The start site of 55035 contains all coding potential, is called by both Glimmer and GeneMark, is the LORF, and has good z-score numbers.
It is long enough to be a gene at 402 bp. BLAST results show the best similarity with Sonali. /note= /note=REVIEW FUNCTION /note=Likely function: QueD-like queuosine biosynthesis protein or PreQ0 pathway queD-like protein CDS 55433 - 56188 /note=***DNA Master calls this Gene 91*** /note= /note=The called start site seems to be correct. Both Glimmer and GeneMark agree, it is the LORF, and it has strong z-score numbers. It contains all coding potential, and choosing the next best start site would cut off a significant portion of that potential. Shows BLAST similarity with Sonali and Sunshine23 (though the latter is still a draft). /note= /note=Likely function: QueE-like queuosine biosynthesis protein CDS 56188 - 56856 /note=***DNA Master Gene 92*** /note= /note=REVIEW START SITE /note=I believe the called start site is correct, as it has the best z-scores, is the LORF, and has other positives. However, the GeneMark data are a little questionable as to whether it contains all coding potential, and GeneMark shows the start as part of the previous gene. /note= /note=Function called because almost every BLAST result returns the exact same function: GTP cyclohydrolase I. CDS complement (56922 - 57083) /note=NEEDS REVIEW /note= /note=I don't believe GeneMark is showing coding potential, though it's a little hard for me to read. The called start site also does not have strong z-score numbers. Additionally, BLAST results only show similarity with Sonali. Is it possible this is not a gene? /note= /note=Ending decision: keep it as a gene! Because it overlaps with the upstream gene, it would close a gap and increase gene density. There isn't as much evidence pointing to it not being a gene. CDS complement (57080 - 57274) /note=CHANGE MADE -- REVIEW? /note= /note=Start site moved to 57274. This still contains all the coding potential, but has stronger z-score numbers. I would still appreciate it if someone reviewed this information.
/note=Shares BLAST data with Sonali, but the alignment is not 1:1 /note= /note=(SECOND REVIEW - P2) /note=I also agree that the start site should be moved, and I agree that 57274 is the most likely start site, but it does leave a large gap with subsequent genes. /note= /note=Potential function: helix-turn-helix binding protein? CDS complement (57389 - 58384) /note=***DNA Master Gene 95*** /note=Start site confirmed with a strong z-score of 2.963 and a final score of -2.505. Contains high coding potential and is the LORF. Phamerator shows site 62, which is the "most annotated" start among similar phages. Strong BLAST similarities with Mufasa8 and Sonali. /note= /note=According to GeneMark, there is high coding potential in this region. /note= /note=Function could be hypothetical protein or a domain-containing protein. CDS complement (58455 - 58715) /note=Only two potential start sites, with 58715 being the most likely due to its Z-score and final score. Additionally, as it is the LORF, it closes a gap with subsequent genes. /note=The second start site is not possible because the resulting gene would be only 33 bp long /note=High coding potential according to GeneMark. /note=This sequence is shared with only one other listed phage, but it shares a start site with that phage. /note=No clear function, as function has not been identified even in phages with synteny. CDS complement (58788 - 59000) /note=Although this start site does not have a better Z-score/final score than the predicted start site at 58940, this position has a very small overlap with the subsequent gene, unlike the other proposed start site. Additionally, when this gene was BLASTed, this start site was conserved with Sonali/Sunshine23_Draft. /note= /note=According to GeneMark, this gene has high coding potential, but there is also slight coding potential on the forward strand.
CDS complement (58997 - 59257) /note=Start site of 59257 is most likely, as it has the highest Z-score, lowest final score, Glimmer and GeneMark agreement, and the longest ORF /note= /note=GeneMark shows high coding potential for the area, and BLAST data suggest a high level of similarity with Sonali at 93% /note= /note=Similar phages also share a start site. CDS complement (59316 - 60098) /note=Glimmer and GeneMark agree on start site 60098. Since this gene connects to Gene 1 (a forward gene), the gap caused by this start placement is necessary because this gene is on the reverse strand and Maruru is circularly permuted. /note=Most similar genes share this start site, with the exception of one. /note= /note=GeneMark shows high coding potential for this region. /note=Z-score and final score are sufficiently high.