CDS 147 - 362 /note=This is a gene due to its length and coding potential. This start site was chosen because it gives a good length for the gene, encompasses the full coding potential, has good gap, spacer, z, and final scores, and is the manually entered start site for 45 manual entries. No other start sites were chosen for manual entry. This gene has no known function because it does not have a good match on HHPred, and no phage in PhagesDB has a listed function for this gene. DeepTMHMM indicates that this is not a membrane protein. CDS 362 - 685 /note=Both Glimmer and GeneMark agree that the start is at 362. This gene has coding potential and, based on the GeneMark results, is long enough to be a gene. I chose the suggested start because the length is more than 120 base pairs and the gap is less than 50. Similar genes are found in many phages on PhagesDB; the main ones I chose from the results are Crozenni, Albright, Abigail, Johnathan, and Limabean. It is the only gene in this region, and its predicted function is terminase, small subunit. CDS 689 - 2317 /note=Both Glimmer and GeneMark place the start at 689 base pairs (bp), and Starterator has 30 manual annotations for this start. I consider it a gene because it has high coding potential in GeneMark, it is over 120 bp (1629 bp), the gap, spacer, z-score, and final score all support it, and it is the longest open reading frame (LORF). Both PhagesDB and HHpred give very high percentages indicating that this gene functions as a terminase, large subunit. It aligns best with other bacteriophages in the EB cluster such as Abigail, LimaBean, and Albright, which also call this gene a terminase, large subunit. CDS 2329 - 4353 /note=I believe this is a gene. 
Bengal Phage Gene 4 has a Glimmer start position at 2329, that is also supported by GeneMark. This gene goes in the forward direction and it`s length is 2025 BP. Gene 4 has an ATG start codon.. There is a gap of 11 BP, which is smaller than 50BP. On Phages DB Blast and Phamerator it matched up by genes with LimaBean_4, SansAfet_4, Arroyo_5. HHPred showed a 92% probability. Also the function of Gene 4 is most likely a portal protein, Phagesdb shows a 64% frequency match. CDS 4353 - 5105 /note=Gene 5 of the Bengal phage has a Glimmer-predicted start at position 4353, which is also supported by GeneMark. The gene is not reversed, and it spans a total length of 753 base pairs, with an ATG start codon and a stop codon at position 5105. There is a small gap of -1, indicating a slight overlap with the previous gene. The gene is present in other phages, including Hitchhiker, and shares homology with numerous genes in the EB cluster on protein number 5. HHPRED analysis shows a 99.8% probability, strongly suggesting that the gene encodes a capsid maturation protease, which is an essential protein involved in viral assembly. CDS 5142 - 6323 /note=Gene: Bengal_6 Start: 5142, Stop: 6323, Start Num: 44 /note=Candidate Starts for Bengal_6: /note=(35, 5112), (Start: 44 @5142 has 153 MA`s), (59, 5202), (81, 5319), (105, 5403), (135, 5496), (160, /note=5586), (166, 5613), (191, 5748), (199, 5805), (219, 5955), (246, 6102), (249, 6114), (279, 6300), /note= /note=Start 5142 has the most MAs. The 36 bp gap lacks coding potential and is <50 bp. Length, spacer, z-score all within range. HHPRED confirm major capsid protein function , similar to cluster members. DeepTHMM suggest inside and outside membrane topology. CDS 6389 - 6730 /note=This is a gene because it is the correct length, and it has good coding potential. This is the selected start site because it encompasses all of the coding potential, gives the longest coding length, and has the best gap, spacer, z, and final scores. 
The gap is a bit larger than we would like, but all other start sites give larger gaps. It is also an ATG start site, which is the most likely start codon. This is the start site chosen for other phages (41 MAs). This gene has no known function: it does not match anything on HHPred, and it is not a membrane protein according to DeepTMHMM. No other phages have this gene listed with a function. CDS 6765 - 7325 /note=Glimmer and GeneMark agree that 6765 is the start, and I chose the suggested start after confirming it with Starterator. I accepted this gene candidate because it has an ATG start codon, a good length above 120 bp, a gap greater than 30 bp, and a good Z-score. According to PhagesDB and HHpred, similar genes are found in Crozenni, Albright, Johnathan, Arroyo, LimaBean, and Avocadoman. It is the only gene in that region, and its function is head-to-tail adaptor. CDS 7325 - 7654 /note=Both Glimmer and GeneMark give the same start at 7,325 base pairs (bp). It is long enough to be a gene since it is 330 bp in length, which is over 120 bp. In GeneMark, this gene shows strong coding potential from about 7300 to 7600 bp. This gene has been identified in other EB cluster bacteriophages such as Albright, CroZenni, DickRichards, TukTuk, and Eula. Based on the GeneMark output, I believe this is the only gene in the area. Based on evidence from PhagesDB, HHPRED, and NCBI BLAST, this gene`s predicted function is head-to-tail stopper. CDS 7654 - 7911 /note=I believe this is a gene. Bengal Phage Gene 10 has a Glimmer start position at 7654 that is also supported by GeneMark. Gene 10 has an ATG start codon and its total length is 258 bp. The previous gene stops at 7654, so there is a gap of -1 bp, which means a 1 bp overlap with the previous gene, but I think we are ok because it is less than 30 bp. 
This gene is present in other phages, including SansAfet, LimaBean, and Arroyo. CDS 7911 - 8306 /note=Phage gene 11 starts at position 7911, as predicted by both Glimmer and GeneMark, and terminates at position 8306, resulting in a total length of 396 base pairs. It is a forward-oriented gene with an ATG start codon and has a gap of -1 relative to the preceding gene. HHPRED analysis identifies this gene with a 99.8% probability as encoding a tail terminator protein. Comparative analysis in PhagesDB reveals numerous homologous genes within the same cluster and at the same genomic location, consistently corresponding to protein number 11 in other phages. CDS 8320 - 9117 /note=Gene 12 in the Bengal phage has both Glimmer and GeneMark predicting its start at position 8320. It is a forward gene with a total length of 798 base pairs, ending at position 9117. The gene has a gap of 13 base pairs before it, and it uses an ATG start codon. HHPRED analysis gives a 100% probability, identifying the gene as encoding a major tail protein. This function is further supported by PhagesDB, which shows a large number of homologous genes across different phages with the same function. The major tail protein plays a crucial role in the structure and infection process of the phage. CDS 9243 - 9551 /note=This is a gene because it is the right length and it has good coding potential. Although this start site has a very large gap, it was chosen because it gives the smallest gap and the greatest coding potential. There also needs to be a gene here because otherwise an enormous gap would be left between the previous gene and the next gene. This is the start site manually entered for 86 other phages. This is a tail assembly chaperone gene because it matched highly on HHPred for this function and because many other phages, including the closely related phages Albright and Crozenni, call this gene a tail assembly chaperone. 
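The lengths and gaps quoted in these notes follow directly from the CDS coordinates. A minimal Python sketch of that arithmetic, assuming 1-based inclusive coordinates and two neighboring forward-strand genes (the function names `cds_length` and `gap_to_previous` are illustrative and not part of any annotation tool):

```python
def cds_length(start: int, stop: int) -> int:
    """Length in bp of a CDS given 1-based, inclusive coordinates."""
    return abs(stop - start) + 1


def gap_to_previous(prev_stop: int, start: int) -> int:
    """Bases between the previous gene's last base and this gene's first base.

    A negative value means the two genes overlap by that many bases.
    Assumes both genes are on the forward strand.
    """
    return start - prev_stop - 1


# Gene 11 (7911 - 8306) follows a gene that stops at 7911.
print(cds_length(7911, 8306))       # 396
print(gap_to_previous(7911, 7911))  # -1, a 1 bp overlap

# Gene 12 (8320 - 9117) follows gene 11, which stops at 8306.
print(cds_length(8320, 9117))       # 798
print(gap_to_previous(8306, 8320))  # 13
```

These values match the 396 bp and 798 bp lengths and the -1 and 13 bp gaps reported for genes 11 and 12 above. A length that is not a multiple of 3 would be a sign that a coordinate was mistyped.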
CDS join(9243..9521,9521..9934) /note=Neither Glimmer nor GeneMark shows the start site; the gene has coding potential and a length greater than 120 bp /note=a slippery nucleotide: 9520; CDS complement (9958 - 10086) /note=Based on Glimmer, GeneMark, and Starterator (5 manual annotations), this gene begins at 10086 base pairs (bp). It fits all the characteristics of a gene except that the z-score is not over 2. GeneMark shows some coding potential, and the bacteriophages in the EB cluster that align most closely with this gene are Albright, CroZenni, and SansAfet. This gene has not been assigned a function. HHpred gives a high probability for an early endosome antigen, but the coverage is not high enough. I therefore call this gene`s function hypothetical protein. CDS 10285 - 13044 /note=Bengal gene 16 has a Glimmer-predicted start at position 10285, which is also supported by GeneMark. The gene is transcribed in the forward direction and spans a total length of 2760 base pairs, with an ATG start codon and a stop codon at position 13044. My concern is that the gap from the previous gene is 198 bp. The typical gap should be less than 50 bp. There was a match with LimaBean, AvocadoMan, and Abigail. HHPred does say that it believes this is a gene. /note= /note=This start site encompasses the full coding potential and gives the smallest gap possible. Other scores are good for this. This is a tape measure protein because it matches well with tape measure protein on HHPred and other phages. It is also fairly long, which is expected for a tape measure protein. CDS 13041 - 13895 /note=The gene located at position 13041 was identified by both Glimmer and GeneMark. This gene has a length of 855 base pairs and is transcribed in the forward direction. The gap of -4 indicates an overlap with the preceding gene. 
The start codon is ATG, and the stop site is at 13895. HHPRED analysis shows a 100% probability for a distal(minor) tail protein function. PhageDB confirms the presence of homologous genes in the same location in other phage with minor tail protein function. CDS 13895 - 15865 /note=Gene: Bengal_18 Start: 13895, Stop: 15865, Start Num: 21 /note=Candidate Starts for Bengal_18: /note=(Start: 21 @13895 has 24 MA`s), (33, 14141), (35, 14195), (37, 14258), (39, 14315), (40, 14327), (51, /note=14633), (52, 14666), (56, 14828), (59, 14894), (62, 14936), (67, 14993), (73, 15155), (74, 15167), (78, /note=15254), (79, 15269), (88, 15539), (89, 15569), (91, 15614), (94, 15644), (99, 15701), (101, 15734), /note=(109, 15833), (110, 15839) /note= /note=Start 13895 has most MAs. Length, gap, spacer, z-score, final score and start codon all within range. Longest ORF. HHPRED and NCBI Blast suggest Minor tail protein function, similar to other cluster members. DeepTMHMM suggest outside topology. CDS 15858 - 16028 /note=There were some concerns as to whether or not this is a gene. It has a small amount of coding potential, and it is long enough to be a gene. If we didn`t include this as a gene then there would be a large gap between the gene before and after. It seems that other phages include this as a gene, so we called it. The start site chosen is 15858 instead of the glimmer and genemark start of 15846 because it doesn`t make the gene much shorter, and it gives better gap, spacer, z, and final scores. This is also the chosen start site for all other annotations (26). This protein has NKF because it doesn`t match with anything on HHPred and no other phage gives this protein a function. There is one phage in cluster EB that calls this a minor tail protein but their gene for it is much longer than ours, so I do not believe it is the same one. This is not a membrane protein according to DeepTMHMM. 
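The Starterator summaries in these notes list candidate starts as (start number, position) pairs, with a manual annotation (MA) count attached to each start that other annotators have used. A hedged Python sketch of how such a note could be parsed to find the most-annotated start; the regular expressions are assumptions based only on the note text shown here, not on any Starterator output specification, and `parse_candidate_starts` and `most_annotated_start` are hypothetical helper names:

```python
import re

# "(Start: 21 @13895 has 24 MA`s)" -> start number, position, MA count
ANNOTATED = re.compile(r"\(Start:\s*(\d+)\s*@\s*(\d+)\s+has\s+(\d+)\s+MA`s\)")
# "(33, 14141)" -> start number, position (no manual annotations listed)
PLAIN = re.compile(r"\((\d+),\s*(\d+)\)")


def parse_candidate_starts(note: str) -> dict:
    """Map start number -> (position, manual annotation count)."""
    # The export wraps long lists with "/note=", sometimes inside a pair.
    text = note.replace("/note=", " ")
    starts = {}
    for num, pos, mas in ANNOTATED.findall(text):
        starts[int(num)] = (int(pos), int(mas))
    for num, pos in PLAIN.findall(text):
        starts.setdefault(int(num), (int(pos), 0))
    return starts


def most_annotated_start(starts: dict) -> int:
    """Return the start number with the highest MA count."""
    return max(starts, key=lambda num: starts[num][1])


# Shortened version of the Bengal_18 candidate list.
note = "(Start: 21 @13895 has 24 MA`s), (33, 14141), (35, 14195), (51, /note=14633)"
starts = parse_candidate_starts(note)
best = most_annotated_start(starts)
print(best, starts[best])
```

The MA count is only one line of evidence. The notes above also weigh coding potential, gap, spacer, z-score, and final score before accepting a start.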
CDS 16028 - 17116 /note=Glimmer and GeneMark both agree that the start is at 16028. I selected the suggested start because the length is more than 120 bp, and GeneMark shows that this gene has coding potential and is long enough to be a gene. It is not the only gene with this function. PhagesDB shows similar genes in Crozenni, Albright, Abigail, Burritobowl, Johnathan, and Limabean, among many other phages. This is the only gene in this region and has a predicted function, minor tail protein. CDS 17126 - 17929 /note=Based on Glimmer, GeneMark, and Starterator (90 manual annotations), this gene begins at 17,126 base pairs (bp). The suggested start has all the indications of a gene: it is the longest open reading frame, the final score, z-score, spacer, and gap are good, and it is over 120 bp (804 bp). This gene has strong coding potential in GeneMark and shows evidence of a function. Based on the matching genes in Abigail, CroZenni, and DickRichards, we believe this is another minor tail protein. CDS 17941 - 18690 /note=Bengal gene 22 has a Glimmer-predicted start at position 17941, which is also supported by GeneMark. The gene is transcribed in the forward direction and spans a total length of 750 base pairs, with an ATG start codon and a stop codon at position 18690. The gap from the previous gene is 11 bp, which is less than 50 bp. There was a match with LimaBean, AvocadoMan, and Abigail. HHPred says that this is a gene. There is a 27% frequency of this gene`s function being endolysin. CDS 18696 - 19178 /note=Gene 23 of the Bengal phage has a Glimmer and GeneMark-predicted start at position 18696. This gene is a forward gene with a length of 483 base pairs, stopping at position 19178. It has a small gap of 5 base pairs preceding it and uses a GTG start codon. 
HHPRED analysis provides a 67.7% probability, though the function remains unknown. PhagesDB indicates that corresponding genes are found in other phages. Except one with the function holin within the same location. NCBI BLAST suggest that it could be a membrane protein particularly a holin protein. DeepTMHMM prediction provides evidence that this is a membrane protein. We decided since HHPred did not call this a holin protein but it is the only holin protein in the genome, it has more than two trans-membrane areas, and it follows an endolysin. CDS 19189 - 19530 /note=Gene: Bengal_24 Start: 19189, Stop: 19530, Start Num: 34 /note=Candidate Starts for Bengal_24: /note=(Start: 34 @19189 has 62 MA`s), (43, 19219), (49, 19252), (50, 19255), (59, 19297), (79, 19462), /note= /note=Start 19189 has the most MAs. Length, gap, spacer, z-score, final score and start codon all within range. Longest ORF. HHPRED and NCBI Blast suggest hypothetical protein function, similar to other cluster members. DeepTMHMM suggests membrane protein. CDS 19534 - 19848 /note=This is a gene because it is the correct length, and has good coding potential. This start site is the correct site because it encompasses all of the coding potential, and has good gap, spacer, and z scores. The final score is also acceptable. This is also the chosen start site for multiple other phages. This is a membrane protein because while this protein does not match anything well on HHPred and most other phages call this as a NKF protein, DeepTMHMM shows that this protein crosses the membrane twice. CDS 19955 - 20551 /note=I looked at Glimmer Start and Genemark and they both agree that this is a gene, it does have coding potential even though it is kinda hard to tell looking at the Genemark and yes the coding is long enough to be a gene. 
It has a large gap which is questioning.This gene does have similar phages, Crozenni, Burritibowl, and Albedo are the phages I selected because of the high identity percentage that was obtained from HHpred though Albright was not part of the similar phages. It is the only gene in this region, and it does not have a function similar to that of similar phages. CDS 20562 - 21218 /note=Glimmer, GeneMark, and Starterator (45 manual annotations) have this gene starting at 20,562 base pairs (bp). Based on the characteristics of this gene, it does fit the standards even though it isn`t the longest open reading frame. From GeneMark, this gene does have a ton of coding potential and is closely related to that of Avacadoman, Doobus, LimaBean, and Abigail. This gene and the ones in their Bacteriophage have not classified it to have a function. So, it will stay as a hypothetical protein. CDS 21215 - 21361 /note=I am not entirely sure. Bengal Phage Gene 28 has a Glimmer start position at 21215, that is also supported by GeneMark. Gene 28 has a GTG start codon and it`s total length is 147 BP, barely above the 120 BP minimum. There is a gap of -4 BP, which means that it might have some overlap with the previous gene, but I think we are ok because it is less than 30 BP. This gene is present in other genes including: SansAfet, LimaBean, and Arroyo. Phagedb blast says that the function is Unknown. It seems like it has some coding potential. CDS 21427 - 22305 /note=Gene 29 has a Glimmer and GeneMark-predicted start at position 21427 and extends to position 22305, making it 879 base pairs long. This forward gene has a gap of 65 base pairs before it and utilizes a GTG start codon. HHPRED analysis assigns a 99.9% probability to the gene, suggesting it encodes a DNA-binding protein from the Cas4 family of exonucleases. PhagesDB confirms that homologous genes exist in other phages within the same location, indicating a function. 
Cas4 exonucleases are involved in DNA repair and recombination, which may play a role in maintaining the stability of the phage genome. NOTE: Prof changed to RecB-like exo/helicase because it has a helicase domain. CDS 22305 - 22469 /note=Gene: Bengal_30 Start: 22305, Stop: 22469, Start Num: 1 /note=Candidate Starts for Bengal_30: /note=(Start: 1 @22305 has 19 MA`s), (6, 22389), /note= /note=Start 22305 has the most MAs. Length, gap, spacer, z-score, final score and start codon all within range. Longest ORF. NCBI Blast suggests hypothetical protein function, similar to other cluster members. DeepTMHMM suggests inside topology. CDS 22466 - 22849 /note=This is a gene due to its length and coding potential. This is the correct start site because it encompasses the full coding potential and gives the longest coding sequence. This is also the start site for 40 other manual annotations. This gives the best gap and spacer scores, though the z and final scores could be better. This gene has no known function because it does not match well to anything in HHPred, and DeepTMHMM shows that this is not a membrane protein. Other phages also call this NKF. CDS 22963 - 23553 /note=Glimmer agrees with the start site and calls it a gene. I selected the suggested start because the length is above 120 bp and the Z-score is 2, which is great, though the gap is over 30 bp. It has coding potential and is long enough to be a gene. PhagesDB shows similar genes with the same function, though some phages list different functions. HHpred gives a high probability for deoxyuridine triphosphatase but a low coverage percentage. The predicted function is deoxyuridine triphosphatase. NOTE: Prof changed to dUTPase (new name for same function). CDS 23550 - 24059 /note=Glimmer, GeneMark, and Starterator (with 63 manual annotations) agree on the start at 23,550 base pairs (bp). 
Based on the characteristics given by the suggested starter, it fits all the standards of what a gene is supposed to be, plus it`s the longest open reading frame. From GeneMark, it does look like there is a ton of coding potential, but there is no chance another gene could be in this specific area. When looking at PhagesDB, the best Bacteriophages that correlate with this gene are Abigail, Arroyo, CroZenni, LimaBean, Albright, Jovita, TukTuk, QMacho, SarBear, Icarian, SanaSana, and BabyDaisy. In HHpred and PhagesDB Blast indicated this gene does have a function and it would be the thymidylate kinase. CDS 24075 - 24824 /note=Bengal gene 34 has a Glimmer-predicted start at position 24075, which is also supported by GeneMark. The gene is going in the forward position, and it spans a total length of 750 base pairs, with an ATG start codon and a stop codon at position 24824. The gap between the last gene is 15, which is less than 50bp. On Phamerator`s gene map LimaBean, AvocadoMan, and Abigail do not match up with Bengal 34. HHPred says that this is a gene, I think. There is 78% frequency of this genes function being recombination directionality factor, from Phagesdb Function Frequency. CDS complement (24981 - 25070) /note=Gene: Bengal_35 Start: 25070, Stop: 24981, Start Num: 2 /note=Candidate Starts for Bengal_35: /note=(1, 25073), (Start: 2 @25070 has 14 MA`s), (3, 25064), (5, 25037) /note= /note=Start 25070 with MAs. Length, gap, spacer, z-score, final score and start codon all within range. NCBI Blast suggests hypothetical protein function, similar to other cluster members. DeepTMHMM suggests signal protein. CDS complement (25070 - 25369) /note=Gene 36 in the Bengal phage starts at position 25369 according to both Glimmer and GeneMark predictions. This gene is reversed, with a total length of 300 base pairs, stopping at position 25070. The gene has a gap of 73 base pairs before it and starts with an ATG codon. 
HHPRED analysis provides a 76.9% probability for a potential function, but no definitive role has been determined. PhagesDB data indicates some homologous entries with the same protein number, but no known function has been assigned. CDS 25443 - 25673 /note=This is a gene because it is the correct length, has good coding potential, and prevents a huge gap between genes. This is the only available start site, though there is a large gap between this gene and the previous gene. The previous gene is a reverse gene and this one is a forward gene; given the coding potential and the next gene, this is the smallest gap possible. For the gene function there are high matches for both glutaredoxin/glutaredoxin-like and thioredoxin. Multiple other phages call this gene one or the other, including Albright and Crozenni, which are phages close to our phage. We chose to call the function thioredoxin since NCBI BLAST and HHPred show this as the closest match to our gene by a slight percentage. Crozenni also calls this thioredoxin. CDS 26029 - 26322 /note=This gene has a start site at 26029 base pairs (bp) and ends at 26322. It meets all expectations except that the gap is over 30 bp (355 bp) and the z-score isn`t over 2, but it`s the best fit. There is coding potential for this gene and it best aligns with the bacteriophages Abigail, CroZenni, and Avacadoman. Neither those bacteriophages nor ours have a function for this gene at the moment, so it will be classified as a hypothetical protein. CDS 26408 - 26773 /note=Glimmer, GeneMark, and Starterator (with 43 manual annotations) give a starting position at 26,408 base pairs (bp). For this to be a gene, it had to meet all the expectations regarding its length, gap, spacer, z-score, final score, and coding potential (from GeneMark). This specific gene met the stated conditions and is considered a gene. 
It`s also found in other EB-class Bacteriophages such as Abigail, Albright, CroZenni, Doobus, LimaBean, and Avacadoman. When I looked to see if this gene had a classified function, there was none, stating this gene as a hypothetical protein based on NCBI Blast`s and PhagesDB Blast`s findings. CDS 26773 - 26958 /note=Phage gene 41 starts at position 26,773, as predicted by both Glimmer and GeneMark, and terminates at position 26,958, resulting in a total length of 186 base pairs. It is a forward-oriented gene with an ATG start codon and has a gap of -1 relative to the preceding gene. HHPRED analysis provides a 66.1% probability for a given function, which is considered insufficient for a confident annotation. Additionally, PhagesDB identifies only one phage with a homologous gene at the same location, but its function remains unknown. Due to the lack of strong functional evidence, this gene remains uncharacterized. CDS 27078 - 29552 /note=Gene 41 in the Bengal phage has both Glimmer and GeneMark predicting a start at position 27078. This forward gene spans a significant length of 2475 base pairs and terminates at position 29552. There is a 119 base pair gap before the gene, and it begins with an ATG start codon. HHPRED analysis confirms a 100% probability for four different proteins, with additional matches above 90%, identifying it as encoding a DNA primase/helicase. PhagesDB also shows numerous homologous genes at protein number 42 across other phages with the same function. DNA primase/helicase is essential for DNA replication, further supporting its likely function in phage genome. CDS 29549 - 31423 /note=This is a gene because it is long, has excellent coding potential, and is a required gene. The start site was chosen because this is the only start site that contains the full coding potential and gives the best gap, z, and final scores. 
Many other phages also chose this start site, though not all of them which is why we relied mostly on the coding potential to make this decision. The function of this gene is DNA polymerase I because it matches really well with DNA Polymerase/DNA Polymerase I on HHPred and NCBI blast. This is also what many other phages have called as the function of this gene, including Crozenni and Albright, phages Bengal closely resembles. CDS 31423 - 31623 /note=I looked at Glimmer Start and Genemark and both agree that this is a gene, coding potential was hard to tell because there were no waves but it did have a start codon. It is the only gene in the region and it does have similar phages based on the results from phagesdb, HHpred has a high probability percentage but the coverage percentage is low, and with that, I did not select any. It does not have a depicted function. CDS 31620 - 31922 /note=Glimmer, GeneMark, and Starterator (with 79 manual annotations) state this gene begins at 31,620 base pairs (bp). While looking at the conditions to see if this is a gene or not, it fits all the categories: has a length of over 120 bp, does not have an overlap or gap of more than 30 bp, is between the spacers of 9-14, the z-score is over 2, the final score is the smallest number, (closest to zero for this score) and has coding potential on GeneMark. While looking at other EB-class Bacteriophages, Abigail, Albright, CroZenni, LimaBean, Didgeridoo, and BabyDaisy have a similar or almost exact gene as the one in Bengal. Searching for a gene function within HHpred showed no significant numbers to show it has a definite function. The other bacteriophages say this has unknown functions, so I am calling its function a hypothetical protein. CDS 31922 - 32707 /note=Bengal gene 46 has a Glimmer-predicted start at position of 31922, which is also supported by GeneMark. 
The gene is transcribed in the forward direction and spans a total length of 786 base pairs, with an ATG start codon and a stop codon at position 32707. The previous gene stops at 31922, so the gap is -1 base pair, which means a 1 bp overlap with the previous gene; this is well within the recommended 30 base pairs. On Phamerator`s gene map LimaBean, AvocadoMan, and Abigail match with Bengal 46. HHPred confirms that this is a gene. HHpred has a 99% probability and a 92% coverage for RNA polymerase sigma factor. Phagesdb Function Frequency says that it is 72% DNA binding protein. /note= /note=(New name for RNA Polymerase Sigma factor is DNA binding Protein) /note=Gene candidates (first 10 of 18 entries):
/note=Direction Start Stop Length Gap Spacer Z-score Final Score LORF Start Codon All GM Coding Capacity Selected Gene
/note=Forward 31706 32707 1002 -217 11 1.774 -5.054 TRUE GTG
/note=Forward 31910 32707 798 -13 11 1.006 -6.681 GTG
/note=Forward 31922 32707 786 -1 10 1.543 -5.482 ATG Yes
/note=Forward 31946 32707 762 23 8 1.538 -6.018 GTG
/note=Forward 32030 32707 678 107 18 2.005 -6.109 TTG
/note=Forward 32075 32707 633 152 16 1.773 -6.095 GTG
/note=Forward 32096 32707 612 173 5 2.191 -5.413 ATG
/note=Forward 32183 32707 525 260 10 1.355 -5.879 GTG
/note=Forward 32228 32707 480 305 11 0.967 -6.765 ATG
/note=Forward 32234 32707 474 311 9 0.909 -6.906 ATG
CDS complement (32639 - 32857) /note=This gene starts at 32857 and is transcribed in the reverse direction. Glimmer did not identify a start site because it did not call this a gene, but GeneMark supports its presence as a gene. We agree that this is a gene but without a known function. It has a length of 219 base pairs, starting with an ATG codon and stopping at 32639, with a gap of 11 from the previous gene. HHPRED analysis shows a 94.4% probability for an automated match, but no known function is associated with it. PhagesDB indicates the presence of homologous genes in the same location in other phages, but these genes also lack a known function. CDS complement (32869 - 33033) /note=Gene 47 has a start of 33069 and stop of 32869 with a length of 201 bp, gap of 47. We determined the start codon to be GTG (reverse strand). NOTE: Prof T changed start site to 33033 to conform w/ GM predictions and other related phages. CDS 33144 - 33302 /note=This is a gene because it is the right length and has good coding potential. The 33117 start site was chosen because, although the GeneMark/Glimmer suggested start site was chosen by most other phages, it was not as good as the chosen start site. This start site gives a slightly smaller gap (though it is still larger than we would like), gives longer coding potential, and better z and final scores. This gene has no known function because it does not match well with anything on HHPred, is not a membrane protein according to DeepTMHMM, and most other phages call it NKF. NOTE: Prof T changed start site to 33144 to conform with GM prediction and other related phages. CDS 33302 - 33703 /note=Both Glimmer and GeneMark agree that it is a gene, and its length is above 120 bp. 
I selected the suggested start because the length, Z-score, final score, and ATG start codon are all good, even though the gap is -1. I took a look at GeneMark, and it has coding potential. According to the PhagesDB results, this gene has similar genes in other phages and has a predicted function, which is ParB-like nuclease domain protein. The Pham map shows that it is the only gene in this region and that it does not overlap with any other genes. CDS 33691 - 34134 /note=Even though Glimmer and It fits all the standards to be a gene based on the length, gap/overlap, spacer, z-score, final score, and coding potential (which it has). There seems to be only this gene in GeneMark around this area. Looking at the other EB cluster bacteriophages, the best matches were Avacadoman, CroZenni, SansAfet, and Doobus. Based on HHpred, there isn`t enough coverage or probability to indicate a function. NCBI Blast, PhagesDB, and the bacteriophages I`ve named don`t have a specific function either. So, this will be classified as a hypothetical protein. CDS 34139 - 34624 /note=Bengal gene 52 has a Glimmer-predicted start at position 34139, which is also supported by GeneMark. The gene is transcribed in the forward direction and spans a total length of 486 base pairs, with an ATG start codon and a stop codon at position 34624. The gap from the previous gene is 4 base pairs. This gap is within the recommended 30 base pairs. On Phamerator`s gene map Bengal 52 matches with other genes from LimaBean, AvocadoMan, and Abigail. HHPred confirms that this is a gene, and it says 93.9% probability and 88% Type IV secretion system apparatus protein. There is a 100% frequency of this gene`s function being a "minor tail protein," from Phagesdb Function Frequency. CDS 34800 - 35081 /note=The gene at position 34,800–35,081 is a forward-oriented, 282 bp-long gene with an ATG start codon. 
It has strong HHpred support, showing 97.9% probability for ICEA protein and 97.8% for HNH endonuclease. PhagesDB analysis indicates homologous genes in the same location, corresponding to protein 53 in other phages within the same cluster, suggesting it likely encodes an HNH endonuclease. CDS 35240 - 35557 /note=Gene: Bengal_54 Start: 35240, Stop: 35557, Start Num: 10 /note=Candidate Starts for Bengal_54: /note=(1, 34748), (2, 34763), (3, 34766), (4, 34868), (5, 34982), (Start: 6 @35141 has 7 MA`s), (Start: 10 @35240 has 44 MA`s), (12, 35270) /note= /note=Start 35240 has the most MAs and is closest to the start tick mark on the coding potential map. Hypothetical protein based on other cluster members; HHpred has low probability. DeepTMHMM suggests inside topology. CDS 35554 - 35712 /note=This is a gene because it is the correct length and has good coding potential. This is the chosen start site because it encompasses the full coding potential and gives the best gap score and decent z and final scores. Choosing the start site before this one causes far too much overlap with the previous gene. This is also the chosen start site for 61 other phages. This gene is NKF because it doesn`t match well with anything in HHpred, and it is not a membrane protein according to DeepTMHMM. Other phages also call this NKF. CDS 35765 - 36265 /note=I picked the suggested start because both Glimmer and GeneMark agree that this is a gene, based on its long length and a great gap, even though the z-score is below 2. Coding potential was hard to judge from the GeneMark output, but I assume it is present even though there was no curve. This gene has matches in similar phages, and a high percentage of them call it a hypothetical protein. CDS 36329 - 36493 /note=Based on Glimmer, GeneMark, and Starterator, this gene starts at 36,329 base pairs (bp) and ends at 36,493 bp. 
Among the candidates, the length is over 120 bp (165 bp), the spacer is between 4 and 12, the final score is the smallest (closest to zero), and the gene has coding potential. The only issues are that the gap is over 30 bp and the z-score isn`t over 2, but this is the best choice among the candidates. No non-draft bacteriophages align with this gene, and HHpred doesn`t offer any hit with good probability or coverage for any of its possible functions. So, this gene will be a hypothetical protein until further notice. CDS 36486 - 36746 /note=Bengal gene 57 has a Glimmer-predicted start at position 36486, which is also supported by GeneMark. The gene runs in the forward direction and spans a total length of 261 base pairs, with a GTG start codon and a stop codon at position 36746. The gap from the previous gene is -8 base pairs, which implies a small overlap with that gene. On Phamerator`s gene map, AvocadoMan 55 and Abigail 55 match Bengal 57. HHpred confirms that this is a gene. NKF. CDS 36736 - 37308 /note=This forward-transcribed gene is located at position 36736, with both Glimmer and GeneMark identifying it at the same site. It spans 573 base pairs, starting with ATG and stopping at 37308. A gap of -11 indicates a slight overlap with the preceding gene. HHpred analysis indicates a 99.9% probability for dihydrofolate reductase. CDS 37305 - 37829 /note=Gene: Bengal_60 Start: 37305, Stop: 37829, Start Num: 3 /note=Candidate Starts for Bengal_60: /note=(Start: 3 @37305 has 20 MA`s), (7, 37509), (8, 37512) /note= /note=Start 37305 has the most MAs. Longest ORF. Unknown function from other cluster members. DeepTMHMM suggests inside topology. CDS 37829 - 38212 /note=This is a gene due to its coding potential and its length. This is the correct start site because it encompasses the full coding potential and gives the best gap, spacer, z, and final scores. This is also the chosen start site for 23 other phages. 
This gene has no known function because it does not match anything on HHpred with a high enough probability, no other phages call a function for this gene, and it is not a membrane protein according to DeepTMHMM. CDS 38209 - 39114 /note=This is a gene because both GeneMark and Glimmer agree on the start site. I selected the suggested start based on the length and gap, even though the z-score is below 2. It has coding potential and is long enough to be called a gene. It has matches in similar phages based on the NCBI BLAST results, and the PhagesDB results show a high percentage of this gene having the function thymidylate synthase. CDS 39129 - 39353 /note=Based on GeneMark, Glimmer, and Starterator (with 23 manual annotations), this gene begins at 39,129 base pairs (bp) and ends at 39,353 bp. It meets the standards: it is 225 bp long (over 120 bp), the gap is shorter than 30 bp, the spacer is between 9 and 14 bp, the z-score is over 2, the final score is the smallest number (closest to zero in this case), there is high coding potential, and it is the longest open reading frame (LORF). The EB-cluster bacteriophages with a similar gene to the one in Bengal are Albright, Burritobowl, and CroZenni. These bacteriophages, PhagesDB BLAST, and NCBI BLAST don`t list a function for this gene. In addition, HHpred doesn`t give a high probability for any of its suggestions, so the gene will be classified as a hypothetical protein. CDS 39353 - 39547 /note=Bengal gene 63 has a Glimmer-predicted start at position 39,353, which is also supported by GeneMark. The gene runs in the forward direction and spans a total length of 195 base pairs, with an ATG start codon and a stop codon at position 39547. The gap from the previous gene is -442 base pairs, which implies a large overlap with that gene. This overlap is much bigger than the recommended 30 base pairs. 
On Phamerator`s gene map, AvocadoMan 63 and Abigail 63 match Bengal 63. HHpred confirms that this is a gene. PhagesDB Function Frequency has a 75% hit on tape measure protein. NOTE: Prof changed the function: there can be only 1 tape measure protein, and this is not it (see gp16); DeepTMHMM indicates that this has 2 TMRs, so changed to membrane protein. CDS 39625 - 39828 /note=This gene candidate starts at position 39691, with GeneMark predicting a slightly earlier start at 39625. It spans 138 base pairs in the forward direction, with an ATG start codon and a stop site at 39828. The gene has a gap of 143 from the previous gene. HHpred analysis yields low probabilities, all below 60%, indicating weak functional predictions. PhagesDB confirms the presence of homologous genes in the same location in other phages, but none have a known function. CDS 39825 - 40214 /note=Gene: Bengal_66 Start: 39825, Stop: 40214, Start Num: 5 /note=Candidate Starts for Bengal_66: /note=(4, 39633), (Start: 5 @39825 has 14 MA`s), (8, 39906), (11, 39954), (22, 40197) /note= /note=Start 39825 has the most MAs. Length, gap, spacer and Z-score within range. Hypothetical function based on other cluster members. DeepTMHMM suggests inside topology. CDS 40195 - 40389 /note=This is a gene due to its length and coding potential. This is the chosen start site because of the coding potential. While the gap, z, final, and spacer scores are not as good for this start site as they are for a later start site, this is the only start site to encompass the full coding potential. Other phages do not choose this start site, but again we would be cutting off coding potential if we chose another start site. This is a no known function protein (possibly a membrane protein) because it does not match anything on HHpred with a high probability. Other phages call this NKF. There is an interesting result on DeepTMHMM, which is noted, though it doesn`t really seem to show a membrane protein, so the gene was called NKF. 
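The Starterator-style reasoning used in several notes ("Start ... has the most MAs") can be sketched in a few lines. This is only an illustration, not the Starterator algorithm itself: the function name `pick_start` is hypothetical, the candidate list is copied from the Bengal_66 note, and treating candidates without a stated MA count as having zero manual annotations is an assumption.

```python
# Candidate starts for Bengal_66 as listed in the note: (start number, coordinate).
candidates = [(4, 39633), (5, 39825), (8, 39906), (11, 39954), (22, 40197)]

# Manual-annotation (MA) counts. Only start 5 has a count in the note;
# the other starts are ASSUMED to have zero MAs for this illustration.
manual_annotations = {5: 14}


def pick_start(candidates, ma_counts):
    """Return the (start number, coordinate) pair with the most manual annotations."""
    return max(candidates, key=lambda c: ma_counts.get(c[0], 0))


best = pick_start(candidates, manual_annotations)
print(best)  # the start number and coordinate with the highest MA count
```

In the notes, the MA count is only one line of evidence; coding potential, gap, spacer, and z-score are weighed alongside it.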
CDS 40402 - 40632 /note=Glimmer and GeneMark both call this a gene. I selected the suggested start because it gives a great length and gap, a z-score above 2, and an ATG start codon. It is 231 bp, which is long enough to be a gene, it has matches in similar phages such as Crozenni, it is the only gene in this region, and it has coding potential, but it does not have a predicted function. CDS 40632 - 40982 /note=Glimmer, GeneMark, and Starterator (with 58 manual annotations) all place this gene from 40,632 base pairs (bp) to 40,982 bp. This gene meets most of the selection standards: it is 351 bp in length (over 120 bp), it overlaps the previous gene by a single base pair, it has high coding potential, and it is the longest open reading frame (LORF). The spacer is one bp off, but I don`t think that makes a significant difference. The z-score is not over 2 and the final score is only the second best, but this is still the best option. PhagesDB shows that Avacadoman, Abigail, and TukTuk are the top matches by e-value to Bengal`s gene, but none of these bacteriophages, PhagesDB BLAST, or NCBI BLAST shows a function for this gene. HHpred was of no help either, since none of its suggestions had a probability above 90% or coverage above 50%. So, this gene will be considered a hypothetical protein. CDS 41483 - 41611 /note=This forward-transcribed gene candidate starts at position 41483, identified only by GeneMark, as Glimmer did not predict a start site. It spans 129 base pairs, starting with ATG and stopping at 41611, with a gap of -1 from the preceding gene. HHpred analysis shows a 90.1% probability for a TyxH-like protein. PhagesDB indicates that some phages contain homologous genes in the same location, but no known function has been assigned to them. Due to the lack of functional confirmation, this gene is not considered required.
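The length and gap figures quoted throughout these notes can be checked from the coordinates alone. The sketch below assumes inclusive 1-based coordinates and forward-strand genes; the helper names `cds_length` and `gap` are hypothetical, and the gap convention (adjacent genes have gap 0, a one-base overlap gives -1) is inferred from how the notes use the term.

```python
def cds_length(start, stop):
    """Length in bp of a feature with inclusive 1-based coordinates."""
    return stop - start + 1


def gap(prev_stop, start):
    """Bases between two forward-strand genes; a negative value means overlap."""
    return start - prev_stop - 1


# Coordinates taken from the last three CDS entries above.
genes = [(40402, 40632), (40632, 40982), (41483, 41611)]

for start, stop in genes:
    length = cds_length(start, stop)
    # A complete coding sequence should be a whole number of codons.
    print(start, stop, length, length % 3 == 0)

# Gap between the 40402-40632 gene and the 40632-40982 gene.
print(gap(40632, 40632))
```

Running the same two helpers over any entry is a quick way to catch a mistyped coordinate or an off-by-one length before the notes are finalized.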