CDS complement (210 - 293) /note=This gene was created because the original gene was the only gene on the forward strand in the region. This gene is similar to genes in other phages in the family, such as DustyDino. HHpred has one significant result, for a membrane signaling protein. DeepTMHMM indicates a membrane protein. Same gene as 113.
CDS complement (371 - 529) /note=The start of the gene is the start suggested by Starterator, which was the best candidate of all start options. When running BLAST on the amino acid sequence in PhagesDB and NCBI, I got 100% coverage with phage Fork. HHpred did not give any conclusive results for the specific function of the protein. DeepTMHMM suggested a potential membrane hypothetical protein. NOTE: No transmembrane domains, so changed to NKF.
CDS complement (526 - 684) /note=The start of the gene is the start suggested by Starterator, which was the best candidate of all start options. In NCBI and PhagesDB BLAST, the gene was very similar to genes in phages Welcome, ASegato, and Musetta. HHpred returned some high-probability hits, but none of the matches were good enough to assign a function. DeepTMHMM predicts the protein is located inside the host cell.
CDS complement (819 - 1262) /note=The start of the gene is the start suggested by Starterator, which was the best candidate of all start options. In NCBI and PhagesDB BLAST, the gene was very similar to genes in phages Welcome, ASegato, and Musetta. HHpred returned some high-probability hits, but none of the matches were good enough to assign a function. DeepTMHMM predicts the protein is located inside the host cell.
CDS complement (1414 - 1710) /note=The start of the gene is the start suggested by Starterator, Glimmer, and GeneMark, which was the best candidate of all start options. In NCBI and PhagesDB BLAST, the gene was very similar to genes in phages DustyDino, RunningBrook, and Erenyeager. HHpred returned some low-probability hits, but none of the matches were good enough to assign a function. DeepTMHMM predicts the protein is located inside the host cell.
CDS complement (1714 - 2187) /note=The start of the gene is the start suggested by Starterator, Glimmer, and GeneMark, which was the best candidate of all start options. In NCBI and PhagesDB BLAST, the gene was very similar to genes in phages ASegato, StevieWelch, and Necrophoxinus. HHpred returned one low-probability hit, but it was not good enough to assign a function. DeepTMHMM predicts the protein is located inside the host cell.
CDS complement (2268 - 2654) /note=The start of the gene is the start suggested by Starterator, Glimmer, and GeneMark, which was the best candidate of all start options. In NCBI and PhagesDB BLAST, the gene was very similar to genes in phages Musetta, StevieWelch, and Welcome. HHpred returned some high-probability hits, but none of the matches were good enough to assign a function. DeepTMHMM predicts the protein is located inside the host cell.
CDS complement (2682 - 3029) /note=The start of the gene is the start suggested by Starterator, Glimmer, and GeneMark, which was the best candidate of all start options. In NCBI and PhagesDB BLAST, the gene was very similar to genes in phages Musetta, Yuma, and DustyDino. HHpred returned some high-probability hits, one being "DUF3574 ; Protein of unknown function" with a probability of 99.9%. DeepTMHMM predicts the protein is located inside the host cell. NOTE: This is a new function, but based on literature searches and HHpred hits (plus a conversation with SEA faculty), it appears to be a good annotation.
CDS complement (3096 - 3374) /note=The start of the gene is the start suggested by Starterator, Glimmer, and GeneMark, which was the best candidate of all start options. In NCBI and PhagesDB BLAST, the gene was very similar to genes in phages ASegato, Yuma, and Erenyeager. HHpred returned some high-probability hits, one being "DUF6926 ; Domain of unknown function" with a probability of 85.2%, but none high enough to assign a function. DeepTMHMM predicts the protein is located inside the host cell.
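Every entry above locates a gene by an inclusive coordinate range, for example complement (210 - 293). The following is a minimal Python sketch, not part of PECAAN, DNA Master, or any SEA-PHAGES tool (the function names are hypothetical), that converts such a range into a length in base pairs and a protein length, assuming the range includes the stop codon. A length that is not a multiple of three would point to a coordinate error.

```python
def cds_length(start: int, end: int) -> int:
    """Length in bp of an inclusive coordinate range; strand does not matter."""
    lo, hi = sorted((start, end))
    return hi - lo + 1


def protein_length(start: int, end: int) -> int:
    """Number of amino acids, assuming the range includes the stop codon."""
    length = cds_length(start, end)
    if length % 3 != 0:
        raise ValueError(f"{length} bp is not a whole number of codons")
    return length // 3 - 1  # the stop codon is not translated


# Coordinates taken from the entries above
for start, end in [(210, 293), (371, 529), (2682, 3029)]:
    print(f"{start}-{end}: {cds_length(start, end)} bp, {protein_length(start, end)} aa")
```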
CDS complement (3385 - 3540) /note=The start of the gene is the start suggested by Starterator, Glimmer, and GeneMark, which was the best candidate of all start options. In NCBI and PhagesDB BLAST, the gene was very similar to genes in phages ASegato, Welcome, and Erenyeager. HHpred returned some low-probability hits, but none high enough to assign a function. DeepTMHMM predicts a membrane protein. NOTE: Prof changed to hypothetical because there is no transmembrane domain.
CDS 4179 - 4331 /note=The start of the gene is the start suggested by Starterator, Glimmer, and GeneMark, which was the best candidate of all start options. In NCBI and PhagesDB BLAST, the gene was very similar to genes in phages Fork, Musetta, and Erenyeager. HHpred returned some high-probability hits, but none high enough to assign a function. It appears to have receptor-like characteristics. DeepTMHMM predicts the protein is located inside the host cell.
CDS 4334 - 4639 /note=The start of the gene is the start suggested by Starterator, Glimmer, and GeneMark, which was the best candidate of all start options. In NCBI and PhagesDB BLAST, the gene was very similar to genes in phages Musetta, Necrophoxinus, and Welcome. HHpred returned some high-probability hits, but none high enough to assign a function. DeepTMHMM predicts the protein is located inside the host cell.
CDS 4639 - 4785 /note=The start of the gene is the start suggested by Starterator, Glimmer, and GeneMark, which was the best candidate of all start options. In NCBI and PhagesDB BLAST, the gene was very similar to genes in phages Wolfstar, Hubbs, and Lupine. HHpred returned some high-probability hits, one being a Zn-ribbon at 97.9% probability with good coverage and E-value. DeepTMHMM predicts the protein is located inside the host cell. NOTE: This could be a DNA binding protein, but there is not much evidence beyond HHpred.
CDS 4785 - 4985 /note=This start site includes the full coding potential for the region. Starterator showed 5 manual annotations for the 4803 start site and 3 for the chosen start site. The 4785 start site was preferred because of the longer reading frame, higher Z-score, and less negative final score. HHpred did not show any hits above 90% probability for function. Both BLAST searches indicate no known function for the gene.
CDS 4982 - 5230 /note=I selected this start site against the consensus from Starterator: there are 11 manual annotations for the 4985 start site and only 1 for the 4982 start site. With all values as similar as they are, I am preferring the slightly longer reading frame for this entry. Because both start codons are in the same reading frame next to each other, I expect that translation sometimes begins at the first and sometimes at the second. HHpred provided no hits above 90% probability or with a reasonably low E-value to consider a function. Both NCBI and PhagesDB BLAST list the gene as a hypothetical protein. There are no transmembrane domains.
CDS 5227 - 5469 /note=This start site is supported by two other manual annotations on Starterator, includes all the coding potential, and has the LORF. HHpred has no hits with over 90% probability or 50% coverage to suggest a function. Both NCBI BLAST and PhagesDB BLAST show no known function.
CDS 5603 - 5989 /note=This start site has 11 manual annotations. HHpred returns 8 different hits, all above 90% probability and over 60% coverage, for a ribosomal protein. Similar genes are expressed in the mitochondria of protists and the chloroplasts of spinach. The highest coverage occurs in the bacterial species Cutibacterium acnes, a Gram-positive, rod-shaped actinobacterium somewhat similar to HollowPurple's host, Microbacterium foliorum. While HHpred gives a large number of high-probability hits, each has a relatively high E-value. Both NCBI BLAST and PhagesDB BLAST suggest no known function, but they do not align perfectly with this phage's gene. Based on the large number of high-probability, high-coverage HHpred hits for 50S ribosomal protein L24 across a variety of species, we conclude that this gene is probably 50S ribosomal protein L24 despite the E-values. The phage encodes 4 tRNAs, so it is possible that this ribosomal subunit is better suited to these tRNAs. NOTE: Prof changed function to "ribosomal protein, uL24" as per the Official Functions List, based on HHpred results.
CDS 5989 - 6228 /note=This start site is confirmed by 8 manual annotations and includes all of the coding potential. It has a slightly better final score than the 5986 start and a more preferred start codon. Because the two starts are in the same reading frame, this gene may reasonably start at either, but it likely often starts at the selected start site. HHpred provides no hits for function over 90% probability or 50% coverage. NCBI BLAST and PhagesDB BLAST indicate no known function.
CDS 6225 - 6425 /note=This start site is confirmed by 20 manual annotations, includes all the coding potential, and has the best scores. HHpred provides no hits for function over 90% probability, though it does provide a relatively high-coverage hit (78.8% coverage) for a Walker A-type ATPase; ATPase, Chromosome segregation, Bacterial cell division, Hydrolase, DNA BINDING PROTEIN. NCBI BLAST and PhagesDB BLAST indicate no known function.
CDS 6418 - 6582 /note=This start site avoids a large gap or overlap, has the best Z-score and final score, and is supported by 11 manual annotations. HHpred provides no hits for function over 90% probability. It includes two related high-coverage hits, Spo0A_C ; Sporulation initiation factor Spo0A C terminal and SPO0A; RESPONSE REGULATOR, SIGNALING PROTEIN; 2.0A, with 83% coverage and about 50% probability. It has no transmembrane domains, and both BLAST searches indicate no known function.
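Several of the start-site notes above weigh gaps and overlaps with neighbouring genes and observe that two candidate starts (for example 4982 and 4985, or 5986 and 5989) lie in the same reading frame. A minimal sketch of both calculations, assuming the common convention that the gap between forward-strand neighbours is the next start minus the previous stop minus 1, with negative values meaning overlap; the helper names are illustrative only.

```python
def gap(prev_stop: int, next_start: int) -> int:
    """Gap in bp between adjacent forward-strand genes (negative = overlap)."""
    return next_start - prev_stop - 1


def same_frame(start_a: int, start_b: int) -> bool:
    """Two candidate start codons share a reading frame if they differ by a multiple of 3."""
    return (start_a - start_b) % 3 == 0


# CDS 4785 - 4985 followed by CDS 4982 - 5230
print(gap(4985, 4982))  # -4, i.e. a 4 bp overlap
# CDS 5227 - 5469 followed by CDS 5603 - 5989
print(gap(5469, 5603))  # 133
# Alternative starts discussed in the notes
print(same_frame(4982, 4985), same_frame(5986, 5989), same_frame(4785, 4803))  # True True True
```

Same-frame starts encode the same protein apart from a few extra N-terminal residues on the longer version, which is why the notes settle the choice with Starterator support and scores rather than by frame.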
CDS 6591 - 6863 /note=This start site has the best scores all around and is confirmed by 11 manual annotations. HHpred provides no hits for function above 90% probability or above 50% coverage. Both NCBI BLAST and PhagesDB BLAST suggest no known function.
CDS 6863 - 7039 /note=This start site is confirmed by 6 manual annotations. It is preferred over the 6860 start because it uses ATG and has a marginally better final score. HHpred provides no hits for function above 90% probability and 50% coverage. Both NCBI BLAST and PhagesDB BLAST indicate no known function.
CDS 7039 - 7254 /note=This gene was not originally included by PhagesDB, but it fills a large gap between the previous gene and the next gene. It has a region of coding potential and appropriate overlaps with adjacent genes. PhagesDB BLAST and NCBI BLAST do not suggest any functions with E-values low enough to be considered. HHpred provides 10 hits with high probability (between 87 and 93%) and varying coverage (between 14 and 71.8%) for a Thioredoxin domain-containing oxidoreductase radical SAM enzyme.
CDS 7271 - 7435 /note=This start is confirmed by 10 manual annotations and has the best scores. After adding a gene before it, the gaps and overlaps are appropriate. NCBI and PhagesDB BLAST both yield no indication of function. HHpred has no hits for function above the necessary thresholds. DeepTMHMM indicates a transmembrane region in this protein.
CDS 7435 - 7758 /note=This start site is confirmed by 9 manual annotations. HHpred provides no hits for function over 90% probability or 50% coverage. Both NCBI BLAST and PhagesDB BLAST indicate no known function.
CDS 7760 - 7963 /note=This region has very low coding potential in the chosen reading frame, but it is also where a gene makes the most sense. The other reading frame, which has more coding potential, would cause a significant overlap with the previous gene and a significant gap to the next gene. HHpred gives no hits for function over 90% probability or 50% coverage. Both NCBI BLAST and PhagesDB BLAST indicate no known function.
CDS 7960 - 8160 /note=This start site was confirmed by 8 manual annotations. While HHpred provides many borderline functions with near 90% probability and over 70% coverage, there is no consistent trend in the suggested functions, and the high associated E-values indicate that these sequences likely aligned by chance rather than because of real similarity. Both NCBI BLAST and PhagesDB BLAST indicate no known function.
CDS 8717 - 9022 /note=The start site matches in Glimmer and GeneMark. It also gives the best Z-score, final score, and spacer. A start site at 8606 would give a longer gene, but the start codon at 8717 is ATG, which is more common than TTG. There is no known function other than hypothetical protein. There wasn't enough coding potential or coverage in HHpred to assign a function. There is no transmembrane domain.
CDS 9019 - 9279 /note=The chosen start site gives the best gap, Z-score, and final score. The start codon GTG is less common than ATG; however, selecting a different start site would leave a shorter gene and a much larger gap. PhagesDB BLAST brings up hits for DNA methylase/DNA methyltransferase. However, none of them are close enough to assign a function, and they are from different clusters. On closer inspection, DNA methyltransferases tend to be much longer than 261 bp. No ED2 phage has had this function assigned. HHpred also brings up hits for aminopeptidase and apoptosis alternative splicing membrane mitochondrion. There is not enough probability to assign a function. At this time, this is a hypothetical protein.
CDS 9310 - 9600 /note=The GeneMark and Glimmer start sites match. The chosen start site gives the best length, gap, spacer, Z-score, and final score. There is no known function, and this is a hypothetical protein. There are no transmembrane domains. All of the coding potential is included between the start and stop.
CDS 9597 - 9824 /note=The Glimmer and GeneMark start sites match. The chosen start site has the best length, gap, spacer, Z-score, and final score. The entire coding potential is included between the start and stop sites. There are no transmembrane domains. HHpred brought up some close hits, but they did not have enough probability. No function could be identified, but this gene may play a role in protein transport.
CDS 9821 - 10072 /note=The start site was changed because length, gap, spacer, Z-score, and final score were better at 9821. Wolfstar 30 had synteny with this gene and was called a cyclic oligonucleotide sequestering protein. There were also HHpred hits for this gene with high probability and coverage, and the same hits appeared for Wolfstar 30. The forum https://seaphages.org/forums/topic/5687/?page=1#post-11125 gives good evidence as to which HHpred hits are important. An AlphaFold prediction was run, and its confidence was mostly very high. HHpred indicated Vs.4 and an inhibitor complex viral protein from a Pseudomonas phage. NCBI BLAST has good alignment with the cyclic oligonucleotide sequestering protein.
CDS 10186 - 10440 /note=The start site matches in Glimmer and GeneMark. The chosen start site gives the best Z-score, final score, and start codon. PhagesDB BLAST indicates an NrdH-like glutaredoxin. HHpred and NCBI BLAST bring up good hits for this as well. There are no transmembrane domains.
CDS 10596 - 11933 /note=The chosen start site gives the longest gene and the best gap, spacer, Z-score, and final score. There is a large gap, but no start codon would allow this gene to start sooner. On Pham maps, this was left as a gap. PhagesDB shows this could be either a metallophosphoesterase or a phosphoesterase. The forum https://seaphages.org/forums/topic/5557/ goes over the requirements for a metallophosphoesterase. The amino acid sequence of this gene does not contain the HEXXH motif or the ExnHxHx7Sx2D motif described in this forum: https://seaphages.org/forums/topic/5672/. Based on this information, this gene is most likely a phosphoesterase. NOTE: Prof changed to hypothetical (no HHpred hits to phosphoesterase).
CDS 11938 - 12423 /note=Glimmer and GeneMark give different start sites. The selected start site gives a longer gene and a smaller gap. PhagesDB indicates an HNH endonuclease. The forum https://seaphages.org/forums/topic/5505/?page=1#post-10206 goes over the requirements for this function: there must be an H-N-H within 30 aa, and the forum suggests that a variation such as H-N-N is accepted. This gene has an H-N-N within 30 amino acids, starting at the first H in the amino acid sequence. HHpred has good probability for an H-N-H sequence, and NCBI BLAST has good alignment.
CDS 12528 - 13058 /note=The Glimmer and GeneMark start sites match. This start gives the best length, gap, spacer, Z-score, and final score, and it is the suggested start site on Starterator. There is no evidence that this gene has a known function. There are some HHpred hits for RNA polymerase, but there is not enough coverage to assign a function. There are no transmembrane domains.
CDS 13055 - 14800 /note=The selected start site gives the longest gene and the best gap, spacer, Z-score, and final score. The Glimmer and GeneMark start sites match. PhagesDB indicates a terminase. Since no other terminase has been found, the large and small subunits are not differentiated.
CDS 14862 - 16391 /note=Starterator (Pham) shows 19 manual annotations for the 14862 start site. This gene is supported by GeneMark coding potential and a gap of around 60 bp from the previous gene. GeneMark and Glimmer agree on this start site. PhagesDB BLAST, NCBI BLAST, and HHpred indicate with 100% probability that this is a portal protein, shown by comparison to at least four other phages in HHpred.
CDS 16459 - 17400 /note=There are 30 manual annotations on Starterator for the start site 16459. GeneMark and Glimmer also confirm this start site. NCBI BLAST, PhagesDB BLAST, and HHpred confirm that the function is capsid maturation protease; HHpred has a 99.5% probability.
CDS 17403 - 17855 /note=GeneMark and Glimmer agree on the start site of 17403. There are 25 manual annotations on Starterator for the start site 17403. PhagesDB and NCBI BLAST agree that this gene is a hypothetical protein. HHpred has a 99% probability for an uncharacterized conserved protein or a capsid stabilization protein.
CDS 17848 - 19050 /note=Starterator shows 25 MAs for the start site of 17848. GeneMark and HHpred also agree on this start site. This is identified as a gene based on its length in base pairs and its identification in GeneMark. NCBI BLAST agrees that the function of this gene is major capsid protein. HHpred also presented these results, with a 99% probability of this gene being a major capsid protein, so we can be confident of its function.
CDS 19236 - 19763 /note=There are 15 MAs for the start site 19236 on Starterator. GeneMark and Glimmer agree with this. This is identified as a gene based on the number of base pairs present. The gap seems rather large. NCBI BLAST identifies this as a hypothetical protein with 98% coverage. HHpred does not have any significant hits to identify this gene.
CDS 19763 - 20110 /note=The start site of 19763 is confirmed by Starterator with 18 MAs. Glimmer and GeneMark confirm this start site. HHpred and NCBI BLAST have different results for the function of the gene. **Take a closer look. NOTE: Prof review changed this to hypothetical; the hyaluronidase is a cool function, supported by HHpred (but with low coverage), but it is not on our functions list. It could be some other kind of lyase or maybe a depolymerase of some kind.
CDS 20110 - 20721 /note=There are 25 manual annotations of the start site 20110 in Starterator. GeneMark and Glimmer both agree with this. This is identified as a gene based on its length in base pairs and its identification in GeneMark. HHpred did not have any significant hits above 76%. NCBI BLAST had a 100% probability that the gene is a hypothetical protein.
CDS 20718 - 22478 /note=There are 17 manual annotations of the start site 20718 in Starterator. GeneMark and Glimmer both agree with this. This is identified as a gene based on its length in base pairs and its identification in GeneMark. NCBI BLAST identifies this gene's function as glycoside hydrolase with a probability of 99.82%. HHpred also presented these results, with a 99.8% probability of this gene being a glycoside hydrolase.
CDS 22591 - 23040 /note=There are 23 manual annotations of the start site 22591 in Starterator. GeneMark and Glimmer both agree with this. This is identified as a gene based on its length in base pairs and its identification in GeneMark. NCBI BLAST identifies this gene's function as a head-to-tail stopper with a probability of 100%. HHpred also presented these results, with a 97% probability of this gene being a tail attachment protein. Additionally, a significant number of HHpred hits show no known function. One hit shows that this is a minor tail protein, which supports this identification because the head-to-tail stopper is considered a minor tail protein. Based on the results from both databases, we can determine that this is a head-to-tail stopper.
CDS 23033 - 23482 /note=There are 25 manual annotations of the start site 23033 in Starterator. GeneMark and Glimmer both agree with this. This is identified as a gene based on its length in base pairs and its identification in GeneMark. NCBI BLAST identifies this gene's function as NKF with a probability of 100%. HHpred presented results with a 98% probability of this gene being NKF.
CDS 23482 - 23949 /note=There are 25 manual annotations of the start site 23482 in Starterator. GeneMark and Glimmer both agree with this. This is identified as a gene based on its length in base pairs and its identification in GeneMark. NCBI BLAST identifies this gene's function as a tail terminator protein with a probability of 96%. HHpred also presented these results, with a 97% probability of this gene being a phage tail protein. This supports the conclusion that this gene is a tail terminator protein.
CDS 23973 - 24188 /note=There are 12 manual annotations of the start site 23973 in Starterator. GeneMark and Glimmer both agree with this. This is identified as a gene based on its length in base pairs and its identification in GeneMark. NCBI BLAST identifies this gene's function as a hypothetical protein with a probability of 100%. HHpred lacks any significant hits. This indicates that this gene is a hypothetical protein.
CDS 24188 - 24748 /note=There are 25 manual annotations of the start site 24188 in Starterator. GeneMark and Glimmer both agree with this. This is identified as a gene based on its length in base pairs and its identification in GeneMark. NCBI BLAST identifies this gene's function as major tail protein with a probability of 75%, and as hypothetical protein with a higher probability. HHpred presented an 82% probability of this gene being a major tail protein. Based on the evidence from each database, we can determine that this gene is a major tail protein.
CDS 24767 - 25552 /note=BLAST search showed a possible major tail protein, while HHpred showed a lower probability of 85%. In PECAAN, Glimmer and GeneMark agree on the 24767 start site. Comparing HollowPurple with DustyDino in Phamerator showed similarities to the DustyDino gene annotated as a major tail protein. There is a small gap between genes 52 and 53.
CDS 25617 - 26228 /note=BLAST search shows it to be a tail assembly chaperone with a good E-value. Comparison with DustyDino in Phamerator shows it to be similar to a major tail protein. HHpred shows about a 93% probability of it being a tail assembly chaperone. In PECAAN, Glimmer and GeneMark both agree. There's a small gap between genes 53 and 54.
CDS join(25617..26213,26213..26764) /note=Ribosomal slippage /note=https://seaphages.org/forums/forum/146/
CDS 26788 - 30639 /note=HHpred shows many hits over 90% probability for a tape measure protein. Phamerator shows it to be a tape measure protein when compared with genes 56 and 59 of DustyDino. In PECAAN, Glimmer and GeneMark both agree. GeneMark shows coding potential. There are 4 manual annotations for the start site at 26788. There's a small gap between genes 55 and 56.
CDS 30639 - 31517 /note=BLAST search shows it to potentially be a minor tail protein. HHpred shows multiple hits over 90% probability, but none with significant probability for a minor tail protein. Glimmer and GeneMark both agree. GeneMark shows coding potential. Phamerator shows similarity to DustyDino, supporting a minor tail protein. There are 25 manual annotations for the start site at 30639.
CDS 31508 - 33133 /note=BLAST search shows it to potentially be a minor tail protein. GeneMark and Glimmer both agree. GeneMark shows coding potential. The gene is similar to DustyDino gene 61. HHpred shows multiple hits over 90% probability. There are 24 manual annotations for the start site at 31508.
CDS 33130 - 33702 /note=BLAST search shows NKF. Glimmer and GeneMark both agree. There are 25 manual annotations for the start site at 33226. HHpred shows no significant hits. GeneMark shows coding potential. NOTE: Prof changed start to the Starterator-suggested start with better gap and scores.
CDS 33699 - 34316 /note=GeneMark shows coding potential. There are 25 manual annotations for the start site at 33699. HHpred shows no significant hits. BLAST search showed NKF. Glimmer and GeneMark both agree.
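The HNH note above applies a simple rule from the linked forum thread: an H, an N, and a further H (or, as a tolerated variant, N) within 30 amino acids. The sketch below is one literal reading of that rule, written as a hypothetical helper; it assumes the 30-residue window spans the whole motif from the first H to the final residue. Because such a loose pattern also occurs by chance in many proteins, a match is only a screen and would still need HHpred or BLAST support, as the note provides.

```python
from typing import Optional, Tuple


def find_hnh(protein: str, window: int = 30) -> Optional[Tuple[int, int, int]]:
    """Return 0-based positions (i, j, k) of the first H ... N ... H/N triple
    whose span (k - i + 1) is at most `window` residues, or None."""
    seq = protein.upper()
    for i, residue in enumerate(seq):
        if residue != "H":
            continue
        last = min(len(seq), i + window)  # residues i .. i + window - 1
        for j in range(i + 1, last):
            if seq[j] != "N":
                continue
            for k in range(j + 1, last):
                if seq[k] in "HN":  # H-N-H, or the H-N-N variant
                    return i, j, k
    return None


# Toy sequences for illustration only, not HollowPurple data
print(find_hnh("MAHKKNLLH"))
print(find_hnh("H" + "A" * 40 + "NH"))
```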
CDS 34354 - 34920 /note=GeneMark shows coding potential. Glimmer and GeneMark both agree. There are 11 manual annotations for the start site at 34354. BLAST search shows that the protein has NKF. HHpred shows no significant hits.
CDS 34917 - 35849 /note=GeneMark shows coding potential. Glimmer and GeneMark both agree. There are 17 manual annotations for the start site at 34917. BLAST search showed potential for endolysin. HHpred shows significant hits over 90% probability. In Phamerator, DustyDino shows similarities with HollowPurple for genes 62 and 65. NOTE: Prof accepted endolysin but noted lack of HHpred hit.
CDS 35849 - 36127 /note=GeneMark shows coding potential. Glimmer and GeneMark both agree. There are 11 manual annotations for the start site at 35849. BLAST search shows potential for a tail needle protein. HHpred shows many significant hits over 90% probability. In Phamerator, DustyDino shows similarities with HollowPurple for genes 63 and 66.
CDS 36139 - 36378 /note=GeneMark shows coding potential. Glimmer and GeneMark both agree on the start site. There are 19 manual annotations for the start site at 36139. BLAST search shows NKF. HHpred shows no significant hits. DeepTMHMM predicts that it may be a membrane protein.
CDS complement (36430 - 36855) /note=GeneMark shows coding potential. Glimmer and GeneMark do not agree. There are 2 manual annotations for the start site at 36855. BLAST search shows it to potentially be a RuvC-like resolvase. HHpred shows many significant hits. In Phamerator, DustyDino shows similarities with HollowPurple for genes 65 and 68.
CDS complement (36863 - 37444) /note=The start site has the smallest gap without overlap. It is not the LORF and has minimal coding potential, but it gives the most likely gene combination, and together these make this start point the most likely. Nothing from HHpred has high enough probability or coverage, and every comparable phage in NCBI has a hypothetical protein with up to 100% probability and coverage. There is no membrane possibility.
CDS complement (37470 - 37844) /note=The Glimmer start point has a slight overlap but is the LORF, has all of the coding potential, and gives a good length. None of the HHpred matches has enough probability to suggest a function. Every gene in the NCBI BLAST results is a hypothetical protein, with up to 100% coverage, identity, and alignment. There are no transmembrane results.
CDS complement (37834 - 38043) /note=The Glimmer start point has a small overlap but has the most coding potential of the possible start points. The start codon is also the most likely codon, and the gene is a good length. HHpred shows a high probability (97%) with high coverage (95%) for a YorP protein, plus two additional YorP proteins with high probability. All phages in NCBI BLAST are hypothetical proteins with up to 100% match and coverage. DeepTMHMM showed no evidence of a membrane protein. No evidence was found for a YorP protein function in phages, so we went with hypothetical protein.
CDS complement (38040 - 38438) /note=The start suggested by Glimmer has the most coding capacity available for the reading frame. It has a small overlap, but any other start gives too big a gap. The length is good for a probable gene. There are multiple YorP hits on HHpred with high probability but low coverage. All genes from NCBI have hypothetical protein functions. DeepTMHMM had no results. Coverage on HHpred seemed too low to be viable.
CDS complement (38435 - 40966) /note=The Glimmer start point has the smallest gap, the most coding potential, and a good size for a gene, but it is not the LORF and has a less likely start codon. I believe that the Starterator start point (10) is the best start point, with 20 MAs. In the NCBI BLAST results, the function DNA primase/polymerase had a 99% probability and very high coverage, making it the most compelling option. NOTE for reviewers: Poor HHpred coverage.
CDS complement (41355 - 41972) /note=The Glimmer start site has the lowest gap, a good number of base pairs, all of the possible coding potential, and a likely start codon. It does not have the LORF. Hypothetical proteins are the most frequent matches on NCBI. HHpred shows it could be DNA binding with a 9.6% probability, with coverage at 65%. There is no transmembrane possibility. I went with hypothetical protein because of the more frequent matches and the higher combined coverage and probability.
CDS complement (41986 - 42171) /note=The start suggested by Glimmer has the smallest comfortable gap without severe overlap, a decent gene length, and all the coding potential. It does not have the LORF, and it has an uncommon start codon. No HHpred matches had high enough coverage or probability to be strongly considered, but NCBI BLAST showed many hypothetical protein matches with probability and coverage up to 100%. There is no transmembrane potential.
CDS complement (42149 - 43009) /note=The start suggested by Glimmer has the smallest possible gap, all the coding potential, the LORF, and a good length for a gene. NCBI shows a 100% match for exonuclease protein function. NOTE: Prof review... it also contains helicase domain, so changing to
CDS complement (43012 - 43716) /note=The Glimmer start point has the LORF, all of the coding potential, a small amount of overlap, and a decent length for a gene. In NCBI BLAST, there are high-probability matches for hypothetical proteins and ERF family ssDNA binding protein. We went with the ssDNA binding protein because, compared with phage Welcome, the genes lined up well enough that we considered it good evidence. Others are calling this an ERF family ssDNA binding protein, but that is not an available function. NOTE: Prof changed (from SSB) due to strong ERF family hit and similarities to other phages.
CDS complement (43713 - 44042) /note=The Glimmer start has the shortest gap, a decent length for a gene, the LORF, all of the coding potential, and a high-probability start codon. Every comparison in NCBI BLAST is a hypothetical protein, with high coverage and probability. Nothing in HHpred has high enough coverage or probability to be of note. DeepTMHMM shows no membrane protein possibility.
CDS complement (44039 - 44770) /note=The Glimmer start has all of the coding potential, the most likely start codon, and a decent gene length, but does not have the smallest gap or the LORF. The Glimmer start is the most likely start point, but 44785 is another potential start. NCBI BLAST shows 99% probability and coverage for an exonuclease function. HHpred suggests a possible CRISPR gene, but the NCBI result seems more likely and more reliable. There are other possible functions on NCBI, but none has high enough probability to be considered.
CDS complement (44831 - 44992) /note=The Glimmer start has the smallest gap, the most probable gene length, the LORF, a common start codon, and all the coding potential. None of the HHpred possibilities has high enough coverage or probability to be viable. There are many likely matches from NCBI that read as hypothetical proteins. No membrane domain was found.
CDS complement (44985 - 45308) /note=The start of the gene is the start suggested by Starterator, Glimmer, and GeneMark, which was the best candidate of all start options. In NCBI and PhagesDB BLAST, the gene was very similar to genes in phages Fork, Musetta, and Necrophoxinus. HHpred returned some low-probability hits, but none high enough to assign a function. DeepTMHMM predicts the protein is located inside the host cell.
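Many notes reject HHpred hits that fall below roughly 90% probability or 50% coverage. A minimal sketch of that screen follows; the cut-offs come from the notes, while treating them as inclusive (>=) and the `HHpredHit` structure are assumptions for illustration, not an HHpred output format. The example values are hits quoted in this annotation.

```python
from dataclasses import dataclass


@dataclass
class HHpredHit:
    description: str
    probability: float  # HHpred probability, in percent
    coverage: float     # percent of the query covered by the alignment


def passes_thresholds(hit: HHpredHit, min_prob: float = 90.0, min_cov: float = 50.0) -> bool:
    """Keep a hit only if it meets both the probability and coverage cut-offs."""
    return hit.probability >= min_prob and hit.coverage >= min_cov


hits = [
    HHpredHit("RNA ligase", 100.0, 94.0),
    HHpredHit("polynucleotide kinase", 100.0, 98.69),
    HHpredHit("Ectoine hydroxylase", 94.7, 72.4),
    HHpredHit("YorP", 97.0, 95.0),
    HHpredHit("DNA binding", 9.6, 65.0),
]
for hit in hits:
    verdict = "candidate" if passes_thresholds(hit) else "below threshold"
    print(f"{hit.description}: {verdict}")
```

Passing the screen is necessary but not sufficient: the YorP hit for complement (37834 - 38043) passes both cut-offs, yet the gene was kept as a hypothetical protein because no YorP function has been established for phages.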
CDS complement (45305 - 46612) /note=The start of the gene is the start suggested by Starterator/Glimmer/GeneMark, which was the best candidate of all the start options. On NCBI/PhagesDB blast, the gene was very similar to genes in phages Fork, Musetta, and StevieWelch, with e-values of 0. HHpred had very high-likelihood hits, with RNA ligase at 100% probability and 94% coverage. DeepTMHMM predicts the protein is located inside the host. CDS complement (46629 - 47549) /note=The start of the gene is the start suggested by Starterator/Glimmer/GeneMark, which was the best candidate of all the start options. On NCBI/PhagesDB blast, the gene was very similar to genes in phages ASegato, Musetta, and Jacko, with e-values of essentially 0. HHpred had very high-likelihood hits, with polynucleotide kinase at 100% probability and 98.69% coverage. DeepTMHMM predicts the protein is located inside the host. CDS complement (47546 - 48211) /note=The start of the gene is the start suggested by Starterator/Glimmer/GeneMark, which was the best candidate of all the start options. On NCBI/PhagesDB blast, the gene was very similar to genes in phages ASegato, Musetta, and Lyell, with e-values of essentially 0. HHpred had very high-likelihood hits, with ectoine hydroxylase at 94.7% probability and 72.4% coverage, and many more high-probability hits for oxidoreductase. Because the HHpred probabilities were high for oxidoreductase, I decided to call this gene an oxidoreductase. DeepTMHMM predicts the protein is located inside the host. NOTE from prof to the reviewer: This function is flagged on the Official Functions List – please review. CDS complement (48208 - 48567) /note=The start of the gene is the start suggested by Starterator/Glimmer/GeneMark, which was the best candidate of all the start options. On NCBI/PhagesDB blast, the gene was very similar to genes in phages Lyell, DustyDino, and StevieWelch. HHpred had some low-likelihood hits, but none high enough to declare a function.
DeepTMHMM predicts the protein is located inside the host. CDS complement (48567 - 50087) /note=There are 42 manual annotations of the start site at 50087 in Starterator. GeneMark and Glimmer both agree with this start. This is identified as a gene based on its length in base pairs and its identification in GeneMark. NCBI blast identifies this gene's function as DNA helicase with a probability of 99%. HHPred gave results with a 100% probability of this gene being a DNA binding protein and transcription regulatory protein. This evidence supports the function of this gene as DNA helicase. CDS complement (50084 - 50335) /note=There are 25 manual annotations of the start site at 50335 in Starterator. GeneMark and Glimmer both agree with this start. This is identified as a gene based on its length in base pairs and its identification in GeneMark. NCBI blast identifies this gene as a hypothetical protein with a probability of 100%. HHPred was consistent with this, giving a 97% probability, with high coverage, of this gene having no known function. CDS complement (50381 - 50560) /note=There are 25 manual annotations of the start site at 50560 in Starterator. GeneMark and Glimmer both agree with this start. This is identified as a gene based on its length in base pairs and its identification in GeneMark. NCBI blast identifies this gene as a hypothetical protein with 100% probability and 100% coverage. HHPred did not have any significant hits, but it presented one result with an 87% probability of this gene being a hypothetical protein. Based on the evidence from these databases, we conclude that this gene encodes a hypothetical protein. CDS complement (50576 - 50773) /note=There are 24 manual annotations of the start site at 50773 in Starterator. GeneMark and Glimmer both agree with this start. This is identified as a gene based on its length in base pairs and its identification in GeneMark.
NCBI blast identifies this gene as a hypothetical protein with a probability of 100% and high coverage. HHPred did not have any significant hits. CDS complement (50773 - 50934) /note=The suggested starts from Glimmer and GeneMark were the same as the Starterator start. It has the LORF, all of the coding potential, and the smallest gap with little overlap. The length is also a good gene length. Every similar phage has called this a hypothetical protein, and there are no DeepTMHMM membrane indications. HHPred has fairly high-probability, high-coverage hits for DNA ligase, but none of them has coverage or probability above 85%. CDS complement (50931 - 51146) /note=This start is confirmed by 25 manual annotations and has the best scores. PhagesDB and NCBI blast both indicate no known function. HHPred provides high-probability, high-coverage hits for three different functions: HEPN toxins, a structural regulatory protein, or a nucleotidyltransferase substrate binding protein. CDS complement (51221 - 51568) /note=The start sites on Glimmer and GeneMark match. The start site on Starterator was also 51568, with 53 manual annotations. There are no hits for a known function in PhagesDB. HHpred gives some evidence for a secreted protein; however, there is no secreted protein on the known functions list, and no further evidence could be found to establish this function. There is, however, a transmembrane domain. NOTE: Prof does not see a transmembrane domain, so changed to hypothetical. CDS complement (51565 - 51801) /note=The GeneMark and Glimmer start sites match. The chosen start site has the shortest gap and the best spacer, Z-score, and final score. There is no known function. There are some hits on HHpred, but none that indicate a function. There is a transmembrane domain. CDS complement (51786 - 51866) /note=Three other manual annotations selected the 51908 start, but the 51866 start has a much more reasonable overlap and better final and Z-scores.
While HHPred does not provide any hits for a function above the necessary probability, there are many hits above the 50% coverage limit with relatively high probability that indicate a membrane protein function. DeepTMHMM indicates that it is a membrane-bound globular+SP protein. CDS complement (51863 - 52066) /note=This start is confirmed by 11 manual annotations and has the best scores. NCBI blast, PhagesDB, and HHpred all fail to provide evidence for a function. CDS complement (52084 - 52290) /note=There is only one other manual annotation, and it is for the 52290 start, but the 52254 start has much better Z and final scores. NCBI blast and PhagesDB blast both fail to indicate a function. HHpred does not have any hits above 90% probability, but it has multiple hits with relatively high probability for two functions: tetrahydrobiopterin biosynthesis enzymes-like proteins and 50S ribosomal protein L1. NOTE: Prof changed start site to 52290 (smaller gap). CDS complement (52290 - 52430) /note=This start is confirmed by 17 manual annotations and has the best scores. NCBI blast, PhagesDB blast, and HHpred all fail to provide an indication of function. CDS complement (52515 - 52685) /note=This start is confirmed by 11 manual annotations and has the best scores. NCBI blast, PhagesDB blast, and HHpred all fail to provide an indication of function. CDS complement (52719 - 53048) /note=The Glimmer and GeneMark suggested starts had the smallest gap with no overlap, a good gene length, and all of the coding potential. This start also had the most manual annotations; even though other starts have many manual annotations, I think this one is the most probable. HHPred has no matches with high enough probability and coverage to be considered. Every good match from NCBI is a hypothetical protein. DeepTMHMM shows no membrane protein probability.
CDS complement (53051 - 53173) /note=The suggested start has the LORF, the smallest gap with only a little overlap, and a good gene length. Coding potential is low, but the region does include all of it. HHPred did not have any matches with high enough probability and coverage, and all NCBI blast hits were hypothetical proteins. DeepTMHMM shows no potential for membrane function. CDS complement (53170 - 53376) /note=The GeneMark and Glimmer suggested start has the LORF, the smallest gap with little overlap, a good gene length, and all of the coding potential. Every similar phage from NCBI blast classifies this as a hypothetical protein, and no HHPred match is reasonable enough for a potential call. There is no membrane evidence from DeepTMHMM. CDS complement (53373 - 53633) /note=The GeneMark and Glimmer suggested start has all of the coding potential, the smallest gap, a good gene length, and a common start codon. No HHPred hit has both high enough coverage and high enough probability to be considered. Every possible match on NCBI blast is labeled as a hypothetical protein. DeepTMHMM shows no membrane possibility. CDS complement (53642 - 54202) /note=The start site at 54202 has the most manual annotations. The blast on PhagesDB shows that this is a helix-turn-helix binding protein. Fork, Welcome, and Musetta show synteny with hollow purple for this function, and they all have 186 amino acids. This gene has the most synteny with ASegato, whose gene is a DNA binding domain protein. There are more hits for a helix-turn-helix binding protein. NOTE: Prof changed to hypothetical b/c lack of HHPred support CDS complement (54202 - 54390) /note=Based on the information shown, this gene has a good start and stop and a good gap and Z-score.
However, based on the NCBI blast results, this is a hypothetical protein, with 100 percent coverage, and the highest HHpred probability is 63% for Pre-mRNA-processing factor 17, which is not high enough to call it that, so I decided on hypothetical protein. CDS complement (54387 - 54710) /note=Based on the information, the gene has a good start site and stop site, a good Z-score, and a good gap. There isn't any defining information to determine what it is; from what I looked at, it is a hypothetical protein because it has 100 percent coverage on NCBI blast, and HHpred's highest probability is 79 percent, which is not high enough to call a function, so I decided on hypothetical protein. CDS complement (54743 - 54997) /note=The start and stop sites are both good, as is the Z-score, and the gap is not very large. There isn't good enough information to call it anything in particular: the highest probability is 60 percent, for a domain of unknown function, while NCBI blast shows 98 percent for hypothetical protein. With that information, I decided to call it a hypothetical protein. CDS complement (55088 - 55906) /note=Based on the information from Blast and HHpred, there was no function. The highest probability was 69% for a structural protein, but I wasn't confident enough to call that function based on the information provided. NCBI blast gives a 97% chance of it being a hypothetical protein, so that is why I am calling it a hypothetical protein. CDS complement (55926 - 56141) /note=Based on the information, I determined that this is a hypothetical protein because NCBI blast gave a 97% chance of it being a hypothetical protein.
However, one piece of evidence I noted suggested it could be a nickel-responsive protein, but at 80 percent it was close, though not close enough to call it that with confidence. CDS complement (56138 - 56575) /note=Based on the information from HHpred and Blast, there was no known function. It could be a hypothetical protein, but I decided to call it NKF because Blast's highest percentage was 30%, so I felt confident enough to say NKF. CDS complement (56626 - 57018) /note=Based on the information from Blast, HHpred, and NCBI blast, I was unsure what to call this: HHpred gives a 92.5 percent chance of it being an SH3-like barrel translational protein, but I couldn't find any information about that, and everything else says hypothetical protein. It could use further review, but since I wasn't sure, I called it a hypothetical protein. CDS complement (57052 - 57312) /note=Based on the information from HHpred and Blast, there was no function present; the highest probability was 33 percent for an uncharacterized protein, which isn't high enough, so I decided to say no known function. CDS complement (57309 - 57683) /note=Based on the information from HHpred and the blasts, I was fairly confident calling this NKF. It had a 73 percent chance of being a tail sheath protein, but I wasn't confident enough to make that assumption, so I called it NKF. CDS complement (57716 - 58171) /note=Based on the information from Blast, NCBI blast, and HHpred, this is a hypothetical protein. HHpred doesn't show anything promising enough to decide on a function, so I called it a hypothetical protein because NCBI blast says it is a hypothetical protein, and I felt confident in that. CDS complement (58227 - 58715) /note=The information for this gene is very interesting.
Blast, NCBI blast, and HHpred showed peculiar information: HHpred showed a 93 percent chance of it being a YorP protein, but the coverage was too low, so I wasn't confident enough to call it that. Instead, I decided to call it a hypothetical protein because Blast showed hypothetical protein, and I was not confident enough to call it anything else based on the information presented. CDS complement (58822 - 59148) /note=Based on the information in Blast, NCBI blast, and HHpred, I decided that this gene is a hypothetical protein. HHpred gives a 70 percent chance of it being a de novo protein, with 40% coverage, but I wasn't confident enough to call that function, so I decided to call it a hypothetical protein. CDS complement (59145 - 59324) /note=This gene is interesting because it is an orpham (orphan) gene, so there wasn't much information to go on. However, one HHpred hit was interesting: a 69% probability of being an EcoliA0-Like inhibitor or immunity protein. I bring this up because it has 78% coverage, so it might be worth considering, but there wasn't enough information to call it that with certainty. CDS complement (59321 - 59662) /note=Based on the information from HHpred and both blasts, this gene doesn't have any promising information at all, which is why I was fairly confident calling it NKF; the highest probability was 30 percent, so I decided on NKF because there wasn't much information to go on. CDS complement (60283 - 60366) /note=This gene was created because the original gene was the only gene on the forward strand in the region. This gene is similar to genes in other phages in the family like DustyDino. HHPRED has one significant result for a membrane signaling protein. DEEP TMHMM indicates a membrane protein. Same gene as gene 1.
The next 10 genes are a repeat of genes 1-10. CDS complement (60444 - 60602) /note=Based on the information provided by both blasts and HHpred, there wasn't much information about this gene, so I decided to call it a hypothetical protein based on NCBI blast, because HHpred showed only a 50 percent chance of it being KBTB_W-LIR, which is not high enough to call it that. CDS complement (60599 - 60757) /note=Glimmer and GeneMark had different suggested start sites; the Glimmer start site was chosen because it has the best Z-score and was also the most annotated, as shown on Pham Starterator. /note=HHPred showed convincing evidence for this to be a ribbon-helix-helix DNA binding domain, and there was synteny with phage "Welcome", which also called this gene a ribbon-helix-helix DNA binding domain. NOTE: Prof changed to hypothetical due to lack of HHPred support (see also gp3) CDS complement (60892 - 61272) /note=Glimmer and GeneMark disagreed on the start site. I decided to go with the GeneMark start site because it has an ATG start codon and is the only one that has been annotated in this cluster. It doesn't have the best Z-score, but I believe it makes the most sense. /note=There was no evidence pointing toward any particular function for this gene, so it was called a hypothetical protein. DeepTMHMM showed that this protein most likely operates inside, so it could not be called a membrane protein. CDS complement (61487 - 61783) /note=The recommended start site was the same for Glimmer, GeneMark, and Pham Starterator. It has the highest Z-score, the longest open reading frame, a reasonable spacer, and an ATG start codon. /note=There was no evidence pointing toward any known function for this protein, so it was called a hypothetical protein. DeepTMHMM showed that this protein most likely operates inside, so it could not be called a membrane protein.
CDS complement (61787 - 62260) /note=The suggested start site was the same for Glimmer and GeneMark, and it was also the most annotated start. GeneMark shows that the coding potential is very high near the start site. However, it doesn't have the best Z-score compared with the other start options. I still chose the recommended start site to include all of the high coding potential. /note=There were no hits on HHPred, so it is safe to call this a hypothetical protein. The protein has synteny with ASegato, Necrophoxinus, and StevieWelch, which all list this as hypothetical. DeepTMHMM showed that this protein most likely operates inside, so it cannot be listed as a membrane protein. CDS complement (62341 - 62727) /note=Glimmer and GeneMark had the same recommended start site, and it is also the most annotated start. It doesn't have the best available Z-score, but there is coding potential right at the start, so I went with the suggested start. /note=There was one hit on HHPred for YajC ; Preprotein translocase subunit that was almost statistically significant, but not enough to call this function. CDS complement (62755 - 63102) /note=Glimmer and GeneMark agreed on the start site, there is coding potential throughout the gene, and the recommended start site was the most annotated, so I went with that one. HHpred provides two hits with high probability and coverage for SAM lyase, a protein involved in epigenetic regulation. CDS complement (63169 - 63447) /note=Glimmer and GeneMark had the same suggested start site, which was also the most annotated start site. It also had the best Z-score, so I chose the recommended start. /note=There was no evidence for any known function for this protein. No other phages called it anything, HHPred didn't have any significant hits, and DeepTMHMM showed that this protein most likely operates inside.
CDS complement (63458 - 63613) /note=Glimmer and GeneMark gave the same suggested start, which was also the most annotated start. It has the best Z-score as well, so I went with the suggested start. /note=There was no evidence for a specific function, although DeepTMHMM gave some interesting results. However, there was no significant evidence in HHPred, and all other phages with synteny called this a hypothetical protein.