CDS 132 - 563 /gene="1" /product="gp1" /function="terminase, small subunit" /locus tag="VroomVroom_1" /note=Original Glimmer call @bp 132 has strength 9.68; Genemark calls start at 132 /note=SSC: 132-563 CP: yes SCS: both ST: SS BLAST-Start: [terminase small subunit [Arthrobacter phage Elezi] ],,NCBI, q1:s1 100.0% 2.95337E-67 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.274, -1.953940808934884, yes F: terminase, small subunit SIF-BLAST: ,,[terminase small subunit [Arthrobacter phage Elezi] ],,QNJ56502,80.6667,2.95337E-67 SIF-HHPRED: Terminase_4 ; Phage terminase, small subunit,,,PF05119.15,47.5524,98.9 SIF-Syn: Terminase, small subunit. No upstream genes & downstream gene is Terminase, large subunit, just like in phage Elezi. This gene meets the requirements on the Approved functions list. /note=Primary Annotator Name: Barden, Sophia /note=Auto-annotation: Glimmer and GeneMark agree on the designation for the start at 132. The start codon is ATG, which is a highly conserved and common start codon for this gene. /note=Coding Potential: Prominent and notable coding potential on both Host- and Self-trained GeneMark maps, in the forward direction, on this gene starting at 132 and persisting until stop 563. This indicates presence of a real gene with a start codon at 132 and a stop at 563. /note=SD (Final) Score: The RBS final score is -1.954, indicating that this gene candidate is the best supported of all the potential candidates provided. /note=Gap/overlap: There is no notable gap because this gene is the first of the genome. There is a noted 11bp overlap of G(stop@563 F) and the next G(stop@2283). /note=Phamerator: As of 2/7/2023, this gene was found in Pham 68548. This Pham represents 945 members, and 67 are denoted as draft. We see multiple phages of the AZ cluster containing the gene present in Pham 68548. /note=Starterator: This analysis was run 01/27/23 on database version 501. 
The “Most Annotated” start for this gene in Pham 68548 was start #55, called in 348 of the 877 (39.7%) non-draft genes in the pham. However, the auto-annotated start for VroomVroom_1, G(stop@563 F) was start #59, called in 223 of 944 (23.6%) of genes in pham. We decided to go with start number 55 at position 132 due to the high conservation of this start site amongst phage genomes within the AZ cluster. The coding maps and pham maps exhibit high potential and conserved synteny across the AZ cluster for this start site. /note=Location call: Considering all the evidence collected above, we conclude the start site for the gene G(stop@563 F) is at 132. We conclude that this gene is real. /note=Function call: Based on the ample conserved data provided from both databases, the BLAST hits indicate that the gene is a terminase, small subunit. We see no CDD hits, but have noted significant HHpred hits in PECAAN that support the hypothesis that this gene functions as a terminase, small subunit. /note=Transmembrane domains: No evidence of Transmembrane Domains present from TMHMM prediction. We conclude that this is not a membrane protein. /note=Secondary Annotator Name: Martinez, Daniela /note=Secondary Annotator QC: /note= /note=Primary Annotator #2 Name: Martinez, Daniela /note=Auto-annotation: Both Glimmer and Genemark called the gene at 132. The start codon is ATG, which is a reasonably common codon. The region between 132 and 563 has good coding potential in the forward direction. This information coincides with the characteristics of a real gene. /note=Coding Potential: The gene has substantial coding potential between the start codon at 132 and the stop codon at 563. This region covers the entirety of the putative ORF. Since the start codon covers all of the coding potential, it is reasonable to conclude that a start at 132bp is the correct start. /note=SD (Final) Score: -1.954. This is the best final score on PECAAN. 
/note=Gap/overlap: There are no gaps or overlaps prior to this gene because it is the first gene in the genome. However, there is an 11bp overlap between this gene and the next. The length of this gene is reasonable, as it exceeds the 120bp threshold. /note=Phamerator: This gene is found in the 68548 pham as of February 6, 2023. This pham has 945 members. The pham this gene belongs to is present in other members of the cluster AZ. The phages used for comparison are phage Asa and phage Eraser. /note=Starterator: The auto-annotated start site #59 was found in 223 (23.6%) of 944 genes in pham and was called 100% of the time when it was present. 223 non-draft phage annotations called it. Start site #59 was manually annotated 195/877 times according to Starterator. Start site #59 is the start at 132bp in VroomVroom, which is what Glimmer and GeneMark both predicted. /note=Location call: A protein-coding gene (start 132, stop 563) was predicted by Glimmer, GeneMarkS, and GeneMarkHost. There is sufficient evidence to call this a “real” gene. The most likely start candidate is at 132bp since both Glimmer and GeneMark agreed on this start. /note=Function call: Both BLAST hits gave results for a small terminase subunit with good e-values and alignment. Based on this, there is sufficient evidence to support a small terminase subunit function call. /note=Transmembrane domains: There are no TMDs as observed in DeepTMHMM. This is not a membrane protein. The terminase small subunit of a phage does not interact with the cell membrane, so it makes sense that there is an absence of TMDs. CDS 553 - 2283 /gene="2" /product="gp2" /function="terminase, large subunit" /locus tag="VroomVroom_2" /note=Original Glimmer call @bp 553 has strength 12.93; Genemark calls start at 553 /note=SSC: 553-2283 CP: no SCS: both ST: SS BLAST-Start: [terminase large subunit [Arthrobacter sp. 
B2a2-09]],,NCBI, q12:s1 98.0903% 0.0 GAP: -11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.248, -4.134629096801125, no F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Arthrobacter sp. B2a2-09]],,WP_269998154,85.8657,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,92.0139,100.0 SIF-Syn: Nguyen, Angelynn: [terminase, large subunit, upstream gene is terminase, small subunit, downstream is portal protein, just like in phage Maureen.] /note=Primary Annotator Name: Berber-Pulido, Rodrigo /note=Auto-annotation: Both Glimmer and Genemark agree on the start site of 553 /note=Coding Potential: Both Host-trained and phagesdb Genemark confirm that there is high coding potential throughout the entire gene /note=SD (Final) Score: -4.135, lowest score out of all potential start sites /note=Gap/overlap: -11 (indicates a base overlap with previous gene, but this start site encompasses all of the coding potential.) /note=Phamerator: 68546, a part of the AZ cluster, this pham has 1291 total members with 97 drafts. Other members such as Adumb2043_2, Yang_2, and Reedo_2 all share this gene and are in the same cluster. /note=Starterator: This Starterator analysis was run on 1/27/23, and it can be concluded that the start site 47 (553) is the correct start site. This is, however, not the most annotated one, as start site 61 was called most often. However, this gene did not have this start site. /note=Location call: Start 47 /note=Function call: Terminase, PhageDB Blastp and NCBI Blastp all share the common function of a large subunit terminase. There is a lot of study on terminases, thus I would imagine the reason that it is very conserved and studied. /note=Transmembrane domains: No transmembrane domain, makes sense since this function enzyme would occur inside the cell. 
/note= /note=Primary Annotator 2 Name: Nguyen, Angelynn /note=Auto-annotation: Both Glimmer and Genemark agree that the start site is 553. /note=Coding Potential: There is good coding potential throughout the ORF in the forward direction based on the host and self-trained genemark graphs. /note=SD (Final) Score: -4.135, this is the score that is closest to 0 meaning that this is the best final score. /note=Gap/overlap: There is an 11bp overlap with the previous gene and 21 base gap with the following gene. But this is acceptable since there is coding potential in this region and the gap is very small. /note=Phamerator: 68546. Date: 2/6/23. This was found in the AZ cluster. The pham has 1291 total with 97 drafts. This is conserved and found in Reedo_2 and Yang_2 which are also members of the AZ cluster. /note=Starterator: 68546 This analysis was run on 2/6/23 and it was confirmed that start number 47 is correct. However, this was far from the most common start number. The most common start site was 61, but this gene did not contain that start number. /note=Location call: All the evidence supports that the gene is real and the start site is at 553. Although this region is not the most conserved from the Starterator data, the auto annotation and manual annotations agree that this should be the start site. /note=Function call: Terminase protein, large subunit. PhageDB BLASTp has this function for most of its top hits with an e-value of 0. NCBI BLAST also agrees on this function, based on its top 5 hits. But to elaborate, there is 98% coverage, 77% identity, and 0 e-value. HHpred's top hit was also for the terminase protein with 100% probability, 92% coverage, and 5e-40 e-value. Lastly, the CDD indicated that there were two domains for a terminase protein with a very small e-value in the 10e-7 range. /note=Transmembrane domains: There are no transmembrane domains based on the DeepTMHMM. 
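The GAP values quoted in the notes can be reproduced from the CDS coordinates alone. The following is a minimal sketch, assuming all seven genes are on the forward strand and that coordinates are 1-based and inclusive, as they appear on the CDS lines. The names GENES, gap_to_previous and gaps are illustrative only and are not part of PECAAN, DNA Master, or any other annotation tool.

```python
# Coordinates (locus tag, start, stop) copied from the CDS lines for genes 1-7.
# Assumption: forward strand, 1-based inclusive positions.
GENES = [
    ("VroomVroom_1", 132, 563),
    ("VroomVroom_2", 553, 2283),
    ("VroomVroom_3", 2305, 3672),
    ("VroomVroom_4", 3676, 4473),
    ("VroomVroom_5", 4551, 5138),
    ("VroomVroom_6", 5167, 6108),
    ("VroomVroom_7", 6184, 6585),
]


def gap_to_previous(prev_stop: int, start: int) -> int:
    """Number of bases strictly between two genes; negative means overlap."""
    return start - prev_stop - 1


def gaps(genes):
    """Map each gene (except the first) to its gap from the upstream gene."""
    result = {}
    for (_, _, prev_stop), (name, start, _) in zip(genes, genes[1:]):
        result[name] = gap_to_previous(prev_stop, start)
    return result


def orf_length(start: int, stop: int) -> int:
    """Length in bp of an ORF with inclusive coordinates."""
    return stop - start + 1


if __name__ == "__main__":
    for name, gap in gaps(GENES).items():
        label = "overlap" if gap < 0 else "gap"
        print(f"{name}: {abs(gap)} bp {label}")
```

Under these assumptions, a negative value such as -11 for VroomVroom_2 corresponds to the 11 bp overlap with gene 1 described above, and a positive value is an intergenic gap.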
CDS 2305 - 3672 /gene="3" /product="gp3" /function="portal protein" /locus tag="VroomVroom_3" /note=Original Glimmer call @bp 2305 has strength 14.05; Genemark calls start at 2305 /note=SSC: 2305-3672 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Arthrobacter phage Tweety19] ],,NCBI, q4:s3 96.4835% 0.0 GAP: 21 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.274, -2.033982896655645, yes F: portal protein SIF-BLAST: ,,[portal protein [Arthrobacter phage Tweety19] ],,YP_010678395,86.5639,0.0 SIF-HHPRED: Portal protein; Bacteriophage, SPP1, Portal Protein, Head completion proteins, Connector Complex, DNA Channel, VIRAL PROTEIN; 2.7A {Bacillus subtilis},,,7Z4W_D,91.2088,100.0 SIF-Syn: Portal protein. Upstream is terminase, large subunit; downstream is NKF. Phages Maureen and Tweety19 have the most conserved synteny, with both upstream genes listed as terminase, large subunit, and both downstream genes listed as MuF-like minor capsid protein. However, for VroomVroom, the downstream gene is NKF, so this should be investigated further. /note=Primary Annotator #1 Name: Bursulaya, Bella /note=Auto-annotation: Glimmer and GeneMark were used, and both call the start at 2305. /note=Coding Potential: Coding potential is found in both the forward and reverse direction, but the forward direction is far more prominent and therefore it is a forward gene. The coding potential was similar in both the Host and Self Trained GeneMark. /note=SD (Final) Score: The final score is -2.034 which is also the best score on PECAAN /note=Gap/overlap: The gap is 21, which is very reasonable since it is not that large. /note=Phamerator: pham is 68544, which is conserved in other cluster AZ phages such as Adolin. Date was 2/5/2023 /note=Starterator: Start site 76 was called most often, in 295 out of 1436 non-draft genes. 
However, the most manually annotated start site is 92 (manual annotations of 7/1436), which corresponds to start 2305, which agrees with Glimmer and GeneMark /note=Location call: Based on all of the evidence compiled, this gene is real with a start site at 2305. Even though this start site is not the most conserved, it seems to have the most manual annotations. /note=Function call: The predicted function is a portal protein. E values from both NCBI BLASTp and PhagesDB BLASTp were very low (0) and all identities were higher than 75% (Tweety19 and VResidence). The query cover on the NCBI BLASTp were also very high, both greater than 96% for phages Tweety19 and Liebe. The top hits in both of the BLAST programs all listed the function as a portal protein. The CDD and HHpred hits were also informative, and both listed the function as a phage portal protein. /note=Transmembrane domains: There are no transmembrane domains for my protein, which makes sense because a phage portal protein should not insert itself into the bacterial membrane or else it might pump out the phage DNA out of the cell. No predicted TMDs by DeepTMHMM, so the protein is not a transmembrane protein. /note= /note=Primary Annotator #2 Name: Nguyen, Mya /note=Auto-annotation: Glimmer start: 2305, Genemark start: 2305 /note=Coding Potential: Good coding potential in forward direction, some coding potential in the reverse region. The chosen start site covers all the coding potential. /note=SD (Final) Score: -2.034, good score and is the least negative value. /note=Gap/overlap: 21, reasonable. There are no other start candidates and the length of the gene is acceptable since it is the longest out of all the candidate start sites. /note=Phamerator: 68544, date 2/5/23. The pham in which the gene is conserved also has other members of the same cluster. The function calls were consistent, being mostly portal proteins. 
/note=Starterator: Start number 92, which corresponds to start site 2305, is the most manually annotated start and corresponds with Glimmer and Genemark. However, it is not the most conserved start site. /note=Location call: Real gene, start site at 2305. This is not the most conserved start across all other hits; however, it has good manual annotations. This start site is called by both Glimmer and Genemark as well. /note=Function call: Predicted function is portal protein, based on two hits from NCBI BLAST and PhagesDb Blast, which had low e-values of 0. Query cover for the two was also high, at over 96%. All BLAST programs, CDD and HHpred listed top hits as portal proteins as well. /note=Transmembrane domains: The absence of TMDs makes sense because a portal protein would not have TMDs, or else the genome would not be packaged properly. The protein is not a transmembrane protein. CDS 3676 - 4473 /gene="4" /product="gp4" /function="hypothetical protein" /locus tag="VroomVroom_4" /note=Original Glimmer call @bp 3676 has strength 10.33; Genemark calls start at 3676 /note=SSC: 3676-4473 CP: yes SCS: both ST: SS BLAST-Start: [head maturation protease [Arthrobacter phage Tweety19] ],,NCBI, q1:s1 83.7736% 1.08324E-106 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.96, -2.6013996449736907, yes F: hypothetical protein SIF-BLAST: ,,[head maturation protease [Arthrobacter phage Tweety19] ],,YP_010678396,70.5882,1.08324E-106 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chawla, Esha /note=Auto-annotation: Glimmer Start Site: 3676, GeneMark Start Site: 3676. Both Glimmer and GeneMark called (auto-annotated) the same start site for this gene. The start codon at this site is ATG. /note=Coding Potential: There is high coding potential contained in the putative ORF. There was high coding potential observed in the first forward reading frame in both GeneMark Host and Self. 
Furthermore, from the GeneMark and Glimmer proposed start site (3676) to the proposed stop site (4473), there is fairly high coding potential throughout almost the entirety of the gene – the chosen start site, thus, covers all of the observed coding potential. As such, this gene is likely a real gene with a start site at 3676. /note=SD (Final) Score: -2.601. It is the best SD score of those listed on PECAAN. /note=Gap/overlap: 4 base pair gap. This gap is far too small for the addition of an entirely new gene. Moreover, this small of a gap is reasonable/acceptable for an inter-gene gap. Moreover, 4bp is a fairly well-conserved, known gap for ideal ribosomal binding. There is no coding potential upstream of the currently proposed start site. As such, no new gene needs to be inserted, and the auto-annotated/called start site does not need to be moved further upstream. /note=Phamerator: On the day of my investigation, 2/6/2023, this gene was found in Pham 67762. This Pham has 11 members, 4 of which are drafts. The gene is conserved in 5 other phages, 3 of which are drafts – these phages are also found in the same cluster as VroomVroom (AZ). Liebe and Maureen are examples of some of the non-draft phages that have this gene and are found in the same cluster, AZ. The phamerator and phams database called the function of this gene to be a head maturation protease. /note=Starterator: The auto-annotated start site is also the most manually annotated start site. It was manually annotated in 7 of 11 genes in this pham. The auto-annotated start site is site 3, which has a corresponding base pair of 3676. This start site choice is a reasonable choice, as it is well-conserved among the members of the pham AZ, which my gene belongs to. /note=Location call: Based on the above evidence, this is a real gene, as it is well-conserved in phamerator and has good coding potential. 
Moreover, the most likely start site is 3676, as it is well-conserved in starterator and covers all of the coding potential. /note=Coding Potential Drop-Down Menu (see end of PECAAN Notes Instructions): yes /note=Function call: Based on the collected data, we can conclude that there is not very conclusive evidence. Both NCBI and BLASTp suggest that this gene is likely a head maturation protease, specifically acting in the minor capsid. However, the identity values are all lower than 70%, which is not very convincing – NCBI has an identity value of 69.37%, and BLASTp has an identity value of 68%. On the flip-side, the e-values are fairly consistently low for genes found in the same cluster AZ, including Maureen, Tweety19, and Liebe, and because these phages also annotate this gene as a head maturation protease/minor capsid protein, I think this is a fairly good match. Therefore, at the present moment, the most likely function is a head maturation protease in the minor capsid, but further evidence needs to be presented for confirmation of this function. /note=Subsequently, analysis was done in Module 7. Based on all of the data I have collected from BLASTp, PECAAN, CDD, and HHpred at the culmination of Module 7, it appears there is a lack of agreement on the function of this gene. BLASTp and PECAAN seem to propose that Gene 4 encodes for a head maturation protease. However, CDD proposes that this gene assists with energy production and conversion as a pyruvate dehydrogenase. Finally, HHpred suggests the gene encodes for a toxin. The CDD and HHpred proposed functions, however, had relatively high e-values, and as such, I am not inclined to use them as much evidence in this functional analysis. 
While I currently think that the most likely function of this gene is that it is a head maturation protease, based on the BLASTp and PECAAN analysis, given the general amount of disagreement of the function based on the type of program used, I am hesitant to assign this gene any function conclusively yet. /note=Transmembrane domains: No transmembrane domains – unable to conclude if this is in-line with the function, as the current function of this gene is NKF. /note= /note=Primary Annotator #2: Okumura, Joey /note=Auto-annotation: Glimmer and Genemark used → agree on start 3676 with ATG codon /note=Coding Potential: start site covers good coding potential in first forward ORF → similar results in host and self trained GeneMark (some coding potential in third reverse ORF but worse than forward + does not cover entire coding region with suggested start site of 3676) /note=SD (Final) Score: best FS of -2.601 (z score of 2.96) /note=Gap/overlap: 3 = too small to fit another gene /note=Phamerator: 67762. Date 2/07/2023. Pham has 11 members, 4 are drafts. The gene is conserved in 2 nondraft phages and 3 drafts. These phages are also found in the same cluster as VroomVroom (AZ); found in Liebe (AZ) and Tweety19 (AZ). /note=Starterator: Start site 3 in Starterator was called in 3/7 non-draft genes in the pham. Start 3 is 3676 in VroomVroom. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on above evidence, this is likely a real gene with the start site at 3676. /note=Function call: Multiple PhageDB and NCBI BLAST hits with varying functions (i.e. MuF-like minor capsid protein, head maturation protease, ADP-ribosyltransferase domain, and VIP2-like ADP-ribosyltransferase toxin). CDD had one hit of a pyruvate/2-oxoglutarate/acetoin dehydrogenase complex but e-Value was relatively high (6.15e-03). The HHpred hit e values were too high (>15). /note=Transmembrane domains: Not a transmembrane protein. 
/note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 4551 - 5138 /gene="5" /product="gp5" /function="scaffolding protein" /locus tag="VroomVroom_5" /note=Original Glimmer call @bp 4551 has strength 16.71; Genemark calls start at 4551 /note=SSC: 4551-5138 CP: yes SCS: both ST: SS BLAST-Start: [scaffolding protein [Arthrobacter phage Adolin]],,NCBI, q1:s1 100.0% 2.65721E-42 GAP: 77 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.96, -2.66371296573402, yes F: scaffolding protein SIF-BLAST: ,,[scaffolding protein [Arthrobacter phage Adolin]],,QHB36588,67.9558,2.65721E-42 SIF-HHPRED: Scaffold protein; major capsid protein, HK97-like fold, scaffolding protein, procapsid, VIRUS; 3.72A {Staphylococcus phage 80alpha},,,6B0X_g,54.359,97.3 SIF-Syn: scaffolding protein, upstream is portal protein and downstream is major capsid protein, just like in phages DrManhattan and Crewmate /note=Primary Annotator Name #1: Critzer, Nicole /note=Auto-annotation: Used both Glimmer and Genemark for auto-annotation. Both agree on the start site being 4551 with the ATG start codon. /note=Coding Potential: There is strong coding potential between the start and stop site; however, the coding potential ends about 100bp before the stop site. /note=SD (Final) Score: -2.664. This is not really relevant since we are only given one start site. /note=Gap/overlap: The gap is 77bp - this is good because it's too small for another gene to be there (it's <120 bp) and when looking at pham maps it doesn't look like another gene is normally inserted in the space. /note=Phamerator: Gene is in pham 1850 which is in agreement with Starterator, Date: 2/7/23, conserved in Crewmate (AZ) and DrManhattan (AZ). /note=Starterator: Starterator gave the auto-annotated start site 17@4551 on 1/27/23 from database version 501. This makes sense given that our gene has the Most Annotated start site 17 which is called in 37 of the 40 non-draft genes in the pham. 
In addition to this, the start site encompasses all the coding potential and has 37 manual annotations, telling us a lot of other humans thought this start site was reasonable. /note=Location call: 4551, gives us the LORF and encompasses all coding potential with least negative final score (only Final score) /note=Function call: scaffolding protein - If we look at the graphic summary provided by phagesdb the majority of hits are for genes that function as scaffolding proteins. In addition to this, for both phagesdb and ncbi BLAST the genes with the lowest e-values, highest percent identity match with the gene, and highest alignment function as scaffolding proteins. This agrees with the best HHpred hit (6B0X_g). Also if we look at HHpred’s amino acid sequence alignment there is a high confidence in predicting helix structures which is compatible with the structure provided for the scaffolding protein. Despite the high e-value for the HHpred hit, synteny suggests that the function call is logical because it is flanked by a major capsid protein and a gene right before the portal protein, just like in phages DrManhattan and Crewmate. /note=Transmembrane domains: No hits, protein is inside membrane, so this is not a transmembrane protein. /note= /note=Primary Annotator #2 Name: Ortiz-Gomez, Diana /note=Auto-annotation: Glimmer and Genemark. Both indicate the start at 4551. /note=Coding Potential: Coding potential is seen in the forward strand which confirms that this is a forward gene. Genemark Self and Host also indicate this coding potential. /note=SD (Final) Score: -2.664. This is the only potential final score. /note=Gap/overlap: There is a gap of 77bp, which is not large enough for another gene. There is also no coding region upstream from this start site, so it’s the only and best start site. /note=Phamerator: Pham 1850. Date 2/6/2023. It is conserved, such as in Warda(AZ) and Yang(AZ). 
/note=Starterator: Start site 17 is found in 100% of genes in this pham and this correlates with basepair 4551 for our gene. This agrees with Glimmer and Genemark. /note=Location call: Evidence shows that this is a real gene with a start site at 4551 because this start is conserved and the only potential start. /note=Function call: Scaffolding protein. Phages from PhagesDB BLAST hits have functions as scaffolding proteins (3e-46 and 1e-40). Top NCBI BLAST hits have the scaffolding function as well (identities 47%+, e-values of 3e-42 and 4e-42). HHpred provides information that this might be a scaffold protein, but the e-value is slightly high. Despite this high e-value, synteny confirms that this is most likely the function because this scaffolding protein gene comes before the major capsid protein and after the portal protein. /note=Transmembrane domains: TMHMM does not predict any TMDs, therefore, this is not a membrane protein. CDS 5167 - 6108 /gene="6" /product="gp6" /function="major capsid protein" /locus tag="VroomVroom_6" /note=Original Glimmer call @bp 5167 has strength 18.69; Genemark calls start at 5167 /note=SSC: 5167-6108 CP: yes SCS: both ST: SS BLAST-Start: [major capsid protein [Arthrobacter phage Lizalica]],,NCBI, q1:s1 99.6805% 3.52521E-178 GAP: 28 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.737, -3.123974469270304, yes F: major capsid protein SIF-BLAST: ,,[major capsid protein [Arthrobacter phage Lizalica]],,UIW13491,87.9365,3.52521E-178 SIF-HHPRED: Major capsid protein; P22 Bacteriophage, VIRUS; 3.3A {Salmonella phage P22},,,5UU5_D,92.9712,100.0 SIF-Syn: There is good synteny for this gene, as it was compared to multiple other bacteriophages and found similar to genes in their phams. These bacteriophages include gene 7 in Tweety19, gene 7 in powerpuff, and gene 7 in Niobe. /note=Gene Stop@37126 /note=Primary Annotator Name: Dawson, Niels /note=Auto-annotation: Both glimmer and genemark agree on start site and stop site. 
The start codon is called at 36896. The coding potential is covered with this start site. /note=Coding Potential: Yes, the start site covers the entire coding potential. There is sufficient evidence to conclude this is a real gene. This is evidenced by glimmer and genemark calls and coding potential calls. There is also good synteny for this gene as discussed in the next box. /note=SD (Final) Score: -3.124, z-score is >2. These are good scores for the start site and are further evidence for its existence and for the correct choice of start site. /note=Gap/overlap: 28. This gap is the smallest reasonable possible gap for this gene. There is not enough room for reverse coding potential to exist and there is not enough space for a reasonable new gene to be inserted. Alternative gaps are much larger and more unreasonable. /note=Phamerator: 37126 - 5 members, 2 are drafts. 37126 - All the other phages are in AZ. I used liebe, marine and tweety19. /note=Starterator: Start site is at 36896. This is predicted by autoannotation and predicted by starterator.Date - 1/27/2023. 37126 - Start 2 was called in 3 of 3 non-draft genes. Start 2 was not listed in the vroomvroom genome on starterator. However, starterator called start 1 100% of the time when present. This start site (start site 1), 36896, was called once and was called by starterator for vroomvroom. This means that starterator did not call the most common start site for vroomvroom, but did call the glimmer/genemark/manually predicted start site, adding further evidence for the chosen start site. /note=Location call: This is a real gene and the start site is at 36896. Start site is at 36896. This is predicted by autoannotation and not predicted by starterator. However, the start found most commonly on starterator was not listed on vroomvroom. /note=Function call: This function is unknown. 
Although there were some hits on this gene from other phages that shared synteny for this gene, the functions of those genes are currently unknown, meaning that this gene’s function is also unknown. These hits were from phagesdb BLASTp and NCBI BLASTp. There were good e-values for hits, but only hypothetical proteins were found to exist on these hits. There were no hits on CDD. HHPRED had only hits with positive e-values, and therefore were insignificant. CDD and HHPRED provided no clues to functional characterization. /note=Transmembrane domains: 0 TMD hits on TMHMM, no transmembrane domains detected. The protein was labeled as outside. /note= /note=Primary Annotator Name: Pan, Crystal /note=Auto-annotation: glimmer and genemark both agree on the start and stop sites. The start codon is at 5167. /note=Coding Potential: This start site includes all of the coding potential. We don’t really see any reverse coding potential here. Both genemark and glimmer agree that there is good coding potential here. /note=SD (Final) Score: -3.124, least negative score, z score >2. These scores are good and indicate that this is a real gene and that the start site that is predicted is likely the real start site of this gene. /note=Gap/overlap: +28 nucleotides. This gap is not very big relative to many of the other gaps, so it would make sense to not add a gene in this gap. The gap at this start site seems to be the most reasonable of them all – once we get past this start site, the next start site has a significantly larger gap between the genes. /note=Phamerator: 274 members, 42 drafts. 56 in cluster AZ. Also from other clusters as well. Adolin and Adumb2043 were used as comparisons. /note=Starterator: start 8 called in 74/173 in 230 of the non-draft genomes called it. Start 8 in vroomvroom was autoannotated at 5167. 
The start site at 5167 is agreed on by auto-annotation, starterator (most called), as well as glimmer and genemark, so we can be fairly confident that this is the start site of the gene. /note=Location call: Auto-annotation and starterator predicted start site agree with 5167 and end at 6108. This is likely a real gene. /note=Function call: This is likely a major capsid or head protein. Both databases concur, and there is a lot of evidence. There is good synteny with many other capsid and head proteins of other Arthrobacter phages of both AZ and other clusters. /note=Transmembrane domains: There are 0 TMD hits, which means that there are no transmembrane domains. It makes sense for the absence of TMDs in the context of this hypothesized function of this gene, which is a major capsid protein. It does not function as a membrane where there are receptors for recognition, etc. as the tail would help with that. The capsid’s function is just to hold and protect all the genetic material of the bacteriophage. CDS 6184 - 6585 /gene="7" /product="gp7" /function="head-to-tail adaptor" /locus tag="VroomVroom_7" /note=Original Glimmer call @bp 6184 has strength 10.44; Genemark calls start at 6184 /note=SSC: 6184-6585 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail adaptor [Arthrobacter phage Janeemi]],,NCBI, q1:s1 98.4962% 1.02494E-67 GAP: 75 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.274, -1.953940808934884, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail adaptor [Arthrobacter phage Janeemi]],,UVK63529,80.4196,1.02494E-67 SIF-HHPRED: Hypothetical protein yqbG; ALPHA, GFT NMR, Structural Genomics, Protein Structure Initiative, PSI, NESG, SR215, Northeast Structural Genomics Consortium, Unknown; NMR {Bacillus subtilis} SCOP: a.229.1.1,,,1XN8_A,81.9549,99.2 SIF-Syn: Head-to-tail adaptor, upstream gene is major capsid protein, just like in phage Adolin. Downstream gene is NKF belonging to an orpham. 
/note=Primary Annotator #1 Name: Deal, Milena /note=Auto-annotation: Glimmer and GeneMark both show this gene at the same start (6184) and end (6585) sites. The start codon is ATG, which is a more common start codon. /note=Coding Potential: Glimmer and GeneMark both show high coding potential in the same ORF. The gene encompasses all the coding potential and there is high coding potential throughout the gene. /note=SD (Final) Score: -1.954 which is the best score. /note=Gap/overlap: There is a 75 bp gap upstream of the gene, which is on the larger side but not a cause for concern. A gene cannot fit between this gene and the upstream gene. There also isn`t great coding potential in this gap. This is also the longest ORF of the two candidate start sites. Also, in other AZ phage genomes, there is not another gene between this gene and the upstream gene that corresponds to gene 6 in VroomVroom, so overall, another gene does not need to be added. /note=Phamerator: As of 2/6/23, this gene is in pham 68870 with 64 members, 23 of them being draft genomes. Many other phages with a gene in this pham are in cluster AZ, like Adolin, Adumb2043, and Amyev. Also, other members of this pham note that this gene encodes a head-to-tail adapter. /note=Starterator: As of 2/6/23, the autoannotated start site is the most commonly called start site in other phage genomes. The start number called the most often in the published annotations is 7, it was called in 24 of the 40 non-draft genes in the pham. The auto-annotated start site (6184) is the same as the most called start site. /note=Location call: 6184 is the best start site for this gene. 
The Starterator results showed us that this start site was called the most common, Glimmer and Genemark both agreed on this start site, this start site has the better final score and Z-score out of the two candidate start sites, and finally, this start site (which is the longest ORF) encompasses all the coding potential and has high coding potential throughout the length of the gene. /note=Function call: Many other genes in this pham are labeled as head-to-tail adapters (which we can see in the NCBI and Phagesdb BLASTp results), and these phages are also in cluster AZ with Arthrobacter as their host. There were no CDD hits, but this is not concerning. For HHpred, hits included gp6/gp15/gp16 connector complex of bacteriophage SPP1 and Bacillus protein yqbG. Either of these is required by SEA-PHAGES to determine the function of this protein as head-to-tail adapter, so we have sufficient evidence in determining that our protein is the head-to-tail adapter. /note=Transmembrane domains: DeepTMHMM did not show any transmembrane domains. These results align with the predicted function of the protein because the head-to-tail adapter does not interact with the bacterial cell membrane. /note= /note=Primary Annotator #2 Name: Pisipati, Kirthana /note=Auto-annotation: Both Glimmer and Genemark agree on a start site of 6184. The start codon is ATG, which has a high probability. /note=Coding Potential: Both host trained and self trained Genemark show high coding potential throughout the gene, and all of the coding potential is covered by the start site. /note=SD (Final) Score: -1.954. This is the best score for the start sites. /note=Gap/overlap: The gap is 75bp, which is not enough room to add a new gene. The ORF is 402bp. /note=Phamerator: This gene is in pham 68870 as of 2/6/23, and almost all non draft genomes in this pham belong to cluster AZ (Adolin, JohnDoe, Tuck). Other members of this pham list the function as head to tail adaptor. 
/note=Starterator: Start site 7, which corresponds to 6184 bp, is called in 24 of the 40 non-draft genomes. Start site 7 is the auto-annotated site and the most commonly called site. /note=Location call: The gene is real and the best start site is 6184, according to annotations in other phage genomes in the same cluster. Also, this start encompasses all of the coding potential and has the best final score. /note=Function call: The predicted function is head-to-tail adaptor. Other genes from the same pham and cluster called this as a head-to-tail adaptor, which was also reflected in the phagesDB BLASTp and NCBI BLASTp output. Several hits with low E-values suggested this function. While there were no hits from CDD, the HHpred hits correspond with the SEA-PHAGES requirement to call this gene as a head-to-tail adaptor: gp6 connector complex of bacteriophage SPP1 and Bacillus protein yqbG. /note=Transmembrane domains: There were no transmembrane domains shown on the DeepTMHMM output, which makes sense for the function of head-to-tail adaptor. CDS 6582 - 6683 /gene="8" /product="gp8" /function="hypothetical protein" /locus tag="VroomVroom_8" /note= /note=SSC: 6582-6683 CP: yes SCS: neither ST: NA BLAST-Start: GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.857, -2.8779978280336396, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Wang, Xinyi /note=Auto-annotation: No auto-annotation start sites are available. Newly added gene. /note=Coding Potential: There is a peak in coding potential within this designated area. /note=SD (Final) Score: -2.878 /note=Gap/overlap: 4 bp overlap with the previous gene stop codon, 13 bp overlap with the next gene (determined from Sequence map) /note=Phamerator: NA, newly added gene /note=Starterator: NA, newly added gene /note=Location call: 6582, real gene /note=Function call: Neither phagesDB BLASTp nor NCBI BLASTp gives confident results with low enough E-values and reasonably high scores.
Also, there is no significant CDD or HHpred result. The best hit in HHpred is an envelope glycoprotein from a virus that may be related to HIV, which is unlikely to be related to phage function. /note=Transmembrane domains: No TMD is indicated by the website or by PECAAN TMHMM, indicating that this is not a membrane-related protein. /note= /note=Primary Annotator Name: Wu, Grace /note=Auto-annotation: There is no auto-annotation /note=Coding Potential: There is good coding potential in both host-trained GeneMark and self-trained GeneMark in the forward direction. /note=SD (Final) Score: -2.878, this is a good score /note=Gap/overlap: -4 overlap with the previous gene /note=Phamerator: NA /note=Starterator: NA /note=Location call: 6582, this gene is a real gene /note=Function call: There is no CDD hit for this gene. In NCBI BLASTp and phagesDB BLASTp, the coverage and probability are both low for all the top hits, and the E-values are too high to indicate valuable results. In HHpred, the gene function envelope glycoprotein is given. After detailed analysis, the alignment and E-values are not strong enough to give gene 8 the function of envelope glycoprotein. In addition, this function is not approved by SEA-PHAGES, which further indicates that the function should be NKF. /note=Transmembrane domains: There are no transmembrane domains. CDS 6670 - 7011 /gene="9" /product="gp9" /function="head-to-tail stopper" /locus tag="VroomVroom_9" /note=Original Glimmer call @bp 6670 has strength 16.84; Genemark calls start at 6670 /note=SSC: 6670-7011 CP: yes SCS: both ST: SS BLAST-Start: [head closure Hc1 [Arthrobacter phage Liebe] ],,NCBI, q1:s1 97.3451% 1.82941E-37 GAP: -14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.881, -3.4175091270247986, yes F: head-to-tail stopper SIF-BLAST: ,,[head closure Hc1 [Arthrobacter phage Liebe] ],,YP_009817041,70.4348,1.82941E-37 SIF-HHPRED: SIF-Syn: Head-to-tail stopper, upstream gene is NKF belonging to an orpham.
Downstream gene belongs to pham 68754 (2/23/2023), just like in phage YesChef. /note=Primary Annotator Name: Douglas, Katherine /note=Auto-annotation: 6670 by both Glimmer and Genemark /note=Coding Potential: Covers all the coding potential in the region. The self-trained Genemark has some reverse coding regions that overlap but, overall, the evidence indicates strong coding potential that does not overlap significantly with the genes around it. Additionally, there is strong synteny of this gene across other phages in the AZ cluster such as Adolin and Asa16, and it is long enough to code for a protein. There are also many very strong BLAST hits within the AZ cluster. All of this suggests this is a real gene and has coding potential. The chosen start site covers all coding potential. /note=SD (Final) Score: -3.418 (least negative value. reasonable) /note=Gap/overlap: -14 (a bit of overlap but nothing too drastic. reasonable) /note=Phamerator: Pham 67302 as of 2/6/2023. Pham present in Adolin and Asa16, both in AZ cluster. Function called as a head-to-tail stopper. /note=Starterator: Reasonable start site that is conserved in 33/40 nondraft phages is start site #6 at bp 6670. /note=Location call: This is a real gene with a start site at 6670. There is coding potential and the gene has synteny. The start site is annotated in 33/40 of the nondraft phages and covers the entire coding region with a slight but acceptable overlap of 14 bp. There are also good BLAST hits that share a cluster and function with each other. The RBS final score is -3.418 and the Z-score is 2.881, both of which are good scores. The start codon is ATG, which has a high probability. /note=Function call: Based on BLASTp results in both Phagesdb and NCBI, this protein is a head-to-tail stopper. Other phages in the AZ cluster (Liebe, Maureen, Tuck) all called this protein as a head-to-tail stopper.
There are also several other Arthrobacter phages in both databases with strong E-values that call the protein as a head-to-tail stopper. CDD had no hits. HHpred showed strong alignment of several hits including SPP1_16 with an E-value of 5.9x10^-16. According to the SEA-PHAGES database, this is one of the requirements needed to assign the head-to-tail stopper function to a gene. /note=Transmembrane domains: 0 TMDs detected. This makes sense as a head-to-tail stopper is used for the assembly of the new phage capsids and stopping newly packaged DNA from slipping out of the capsid. It therefore does not need to interact with the bacterial membrane. /note= /note=Primary Annotator Name #2: Reyimjan, Diana /note=Auto-annotation: GeneMark and Glimmer both predicted this gene. Both determined the start site as 6670 with an ATG start codon. /note=Coding Potential: The gene has good coding potential in the first ORF. The start site covers all the coding potential. Coding potential is found in both self-trained and host-trained GeneMark. /note=SD (Final) Score: -3.418. This is the least negative SD value and therefore the best out of all the start sites. /note=Gap/overlap: -14. A bit large for an overlap, but acceptable. This overlap is also seen in non-draft genomes like Eraser and Maureen, so it is not concerning. /note=Phamerator: As of 2/7/23, this gene belongs to pham 67302. Other members of the cluster AZ have this pham, such as Eraser and Lego. The phams database shows that the function called for this gene is the head-to-tail stopper, which is consistently called and is found in the approved function list. /note=Starterator: Start site #6 is conserved among members of the pham. This start site corresponds to 6670 in VroomVroom. There are 64 members of this pham, with 40 being non-draft genomes. 33 of the 40 non-draft genomes call this start number. /note=Location call: Gathered evidence suggests that the gene is real.
Not only does it have good coding potential, but it has good BLAST hits in non-draft phage genomes of the same cluster and there is synteny. The potential start site is 6670. The gap is now -14 (it used to be 84) because a gene was added, which is quite an overlap. However, such an overlap is possible. /note=Function call: Based on informative BLAST hits from phages Liebe and Maureen (<1e-32) as well as HHPred hits with low e-values (<1e-12), this is a head-to-tail stopper gene. It has homology with phage SPP1’s gp16 protein, with a coverage of 97.3451% and e value of 1.14 e-16. CDD had no hits. /note=Transmembrane domains: 0 domains predicted, which is expected for head-to-tail stopper protein because its function is to keep the newly packaged phage genome in the capsid, and therefore does not need to interact with the cell wall. CDS 7020 - 7301 /gene="10" /product="gp10" /function="hypothetical protein" /locus tag="VroomVroom_10" /note=Original Glimmer call @bp 7020 has strength 17.62; Genemark calls start at 7020 /note=SSC: 7020-7301 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_REEDO_11 [Arthrobacter phage Reedo]],,NCBI, q1:s1 100.0% 1.69484E-18 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.733, -3.7234890726758456, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_REEDO_11 [Arthrobacter phage Reedo]],,UJQ86801,63.3663,1.69484E-18 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name #1: Estampa, Julia /note=Auto-annotation: Glimmer and GeneMark both call the gene and indicate that the start site is at 7020 bp. The start codon is ATG for both. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. Host-Trained and Self-Trained GeneMark both reflect similar high coding potential that is consistent with the ORF. The start site covers all the coding potential. The gene is 282 bp long (7020 to 7301), which encodes 93 amino acids. /note=SD (Final) Score: The SD score is -3.723 and is the best from the list.
/note=Gap/overlap: The gap is 8 bp long upstream of the gene, which is small and reasonable. The length of the gene is acceptable. /note=Phamerator: Pham: 68754. Date found: 02/06/23. It is conserved; found in phage Adolin (AZ), phage YesChef (AZ), phage Warda (AZ) /note=Starterator: Start site 27 in Starterator was manually annotated in 36 of the 82 non-draft genes in this Pham. Start number 27 (start @7020) has been found in 58 of 111 (52.3%) of genes in Pham. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Gathered evidence suggests this is a real gene with good coding potential and that the strongest candidate for the start site is 7020 bp. /note=Function call: NKF. The majority of phagesDB BLAST hits revealed “function unknown” for phages with E-values ranging from 7e-45 to 1e-06. NCBI BLAST hits revealed the same. While PhagesDB function frequency suggested “minor tail protein” and “head-to-tail connector protein,” the phages with such function are of a different Pham and cluster, and have very high E-values, and thus provide weak evidence for this gene’s function. No CDD hits returned. No relevant HHpred results with the top hit having a poor sequence alignment score of 27.48 and a high E-value of 1.5. Thus, it’s most reasonable to assume NKF. /note=Transmembrane domains: Since there are no predicted TMHs or TMDs returned from DeepTMHMM, it is not a membrane protein. /note= /note=Primary Annotator #2: Robles, Angel /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree on the start site at 7020 bp. Both start codons are ATG. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -3.723 . This is the best from the list. /note=Gap/overlap: The gap is 8 which is small and reasonable. /note=Phamerator: 68754 on 02/07/23. 
Pham number 68754 has 111 members, 29 are drafts. It is found in Yang_11. /note=Starterator: The start number called the most often in the published annotations is 27; it was called in 36 of the 82 non-draft genes in the Pham. Found in 58 of 111 (52.3%) genes in Pham. Called 100.0% of the time when present. It is found in AEgle_11 (AZ), AGrandiflora_11 (AZ). (Start: 27 @7020 has 36 MA’s), (34, 7053), (35, 7062), (37, 7080), (51, 7176). /note=Location call: A real gene with a start site at 7020. /note=Function call: Function unknown. The top phagesdb BLAST hits have an unknown function (E-value greater than 10^-45), and the NCBI BLAST hits also have an unknown function. HHpred had a hit for unknown function with 92.3% probability, 75% coverage, and E-value of 1.7. CDD had no relevant hits. /note=Transmembrane domains: TMHMM does not predict any TMDs, therefore it is not a membrane protein. CDS 7298 - 7693 /gene="11" /product="gp11" /function="tail terminator" /locus tag="VroomVroom_11" /note=Original Glimmer call @bp 7298 has strength 9.88; Genemark calls start at 7298 /note=SSC: 7298-7693 CP: yes SCS: both ST: NI BLAST-Start: [tail terminator [Arthrobacter phage Lizalica]],,NCBI, q2:s4 99.2366% 1.34736E-44 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.532, -4.3934175870462076, yes F: tail terminator SIF-BLAST: ,,[tail terminator [Arthrobacter phage Lizalica]],,UIW13496,71.5328,1.34736E-44 SIF-HHPRED: Tail terminator protein Rcc01690; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_F,96.9466,99.0 SIF-Syn: Tail terminator, upstream gene is in pham 68754, downstream is major tail protein, just like in phage Adumb2043. /note=Primary Annotator #1 Name: Gowdy, Griffin /note=Auto-annotation: Glimmer & Genemark; same start site @ 7298; start codon: GTG /note=Coding Potential: Reasonable, agreed upon between self- and host- trained algorithms. Chosen start included. /note=SD (Final) Score: Final score = -4.393.
This is the best possible score. However this score is likely irrelevant, as stop@7693F is likely transcribed with its upstream neighbor. /note=Gap/overlap: There is an overlap of -4, which is reasonable for a polycistronic operon. /note=Phamerator: Pham 68868; 2/6/23. This pham is well conserved in clusters AZ and EH. Currently, there are 40 non-draft members of this pham, including in phages Powerpuff, Reedo, DrSierra, and Amyev. The function tail-terminator is consistently called for this gene. /note=Starterator: This gene does not share the most conserved start site (Starterator site #2), however, start at 7298 (Starterator site #4) falls very close. Per Starterator, 20/40 call site #2. While site #4 was not the most annotated start, Starterator data was still useful in deciding upon start #4 due to its proximity to #2. /note=Location call: All together, there is strong evidence to support this gene being real, and that the start site is at 7298. This start site includes all coding potential, results in a favorable overlap, is nearest to the most annotated start in Starterator, and results in the longest ORF. /note=Function call: Predicted function is tail terminator, multiple phagesDB and NCBI BLASTp hits with low e-values (less than e-35) and % identities > 35%. My rationale for this function is synteny with other tail terminators, frequent BLASTp hits with other tail terminators, and an HHpred alignment with structure 5A21_G as is required by SEA-PHAGES per case study “portal and head-to-tail connectors” from the 2018 faculty meeting. /note=Transmembrane domains: None per DeepTMHMM. This supports the function tail terminator, as these proteins have no need to associate with membranes (only other proteins). /note= /note=Primary Annotator #2 Name: Rodriguez, Justin /note=Auto-annotation: Glimmer and GeneMark called the same start site of 7298. The start codon is GTG. 
/note=Coding Potential: The gene has reasonable coding potential for both Glimmer and GeneMark. It covers the majority of the ORF in both cases. The coding potential starts soon after the start site. /note=SD (Final) Score: -4.393 and it is the best one /note=Gap/overlap: -4 which is less than a maximum 50 bp gap. The overlap is representative of an operon. /note=Phamerator: 68868, 2/7/2023. 51 other AZ cluster phages were in the pham. Adolin and Cassia are examples. The function of tail terminator was called for the majority of pham members, but not for VroomVroom. /note=Starterator: The start site is (4, 7298). There is a reasonable start site but it is not found in any other phages of the pham. There are 63 other members in this pham and 47/64 call the start number of 2. Starterator overall is somewhat uninformative, at least in regard to VroomVroom, because it does not call the most-annotated start site. VroomVroom’s recommended start site is close to the most annotated one, so that is helpful. /note=Location call: Even though Starterator isn’t very informative, this is most likely a real gene taking other things into consideration like coding potential and calls from Glimmer and GeneMark. The start site at 7298 seems most likely. /note=Function call: The predicted function is a tail terminator. PhagesDB and NCBI Blast both supported the function with significant hits with e-values smaller than 1e-3 and identities bigger than 35%. HHpred showed that the best hit overall was for a tail terminator gene, and other hits are related even though they don’t code for tail proteins. This ORF also aligns with 5A21_G (not shown in my best PDB hits but still 99% probability) which the Approved Functions List says is required for it to code for a tail terminator. /note=Transmembrane domains: The absence of TMDs makes sense if this protein is a tail terminator, meaning it would be either on the inside or outside of the cell but not within the membrane.
In pham maps there is a gene that codes for a major tail protein downstream of this one. Other phages in the AZ cluster show synteny with this such as DrSierra. Therefore, the genome location of this tail terminator makes sense. CDS 7705 - 8259 /gene="12" /product="gp12" /function="major tail protein" /locus tag="VroomVroom_12" /note=Original Glimmer call @bp 7705 has strength 16.56; Genemark calls start at 7705 /note=SSC: 7705-8259 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Arthrobacter phage Janeemi]],,NCBI, q1:s1 97.8261% 9.77604E-85 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.285, -1.993391246735709, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Arthrobacter phage Janeemi]],,UVK63534,80.3279,9.77604E-85 SIF-HHPRED: YSD1_22 major tail protein; Bacteriophage tail, helical assembly, VIRAL PROTEIN; 3.5A {Bacteriophage sp.},,,6XGR_J,90.7609,98.2 SIF-Syn: Major Tail protein, the upstream gene is tail terminator, downstream is tail assembly chaperone, as seen in phage Janeemi. /note=Primary Annotator Name: Hamid, Bilal /note=Auto-annotation: Genemark and Glimmer agree on 7705 /note=Coding Potential: Very high coding potential seen in both host and self-trained Genemark. Matches ORF. /note=SD (Final) Score: -1.993 is the best final score. /note=Gap/overlap: A gap of 11 provides enough space for a separate SD sequence, making sense for a non-operon gene. /note=Phamerator: 02/07/23 - pham is 65730 with 51 other cluster AZ phages of the 118 phages represented in this Pham (26 are drafts). Within the cluster, it is always called as a major tail protein such as by phages DrManhattan and Janeemi. /note=Starterator: Indicates start site 8 @7705, the auto-annotated site, is likely the best start site as it minimizes the gap between the previous gene and the start of this one. 88 of 90 MAs for non-drafts called this site. Start site 6 was called the other 2 times (100% of the times it appeared), but this gene does not have that start site.
/note=Location call: 7705 is most likely the best location call based on starterator and phamerator data. /note=Function call: Based on the consistent and low e-value results from the two sources, I believe we have adequate data to hypothesize the function. CDD did not provide any hits for this gene. I believe the function is a major tail protein based on the HHpred blast results. Additionally, all genes called in this pham were major tail proteins consistent with the best blast data in addition to the strongest HHpred Hits from both PDB and Pfam. /note=Transmembrane domains: 0 transmembrane domains were determined by DeepTMHMM /note= /note=Primary Annotator Name: LastName, FirstName /note=Auto-annotation: /note=Coding Potential: /note=SD (Final) Score: /note=Gap/overlap: /note=Phamerator: /note=Starterator: /note=Location call: /note=Function call: /note=Transmembrane domains: CDS 8366 - 8632 /gene="13" /product="gp13" /function="tail assembly chaperone" /locus tag="VroomVroom_13" /note=Original Glimmer call @bp 8366 has strength 17.77; Genemark calls start at 8366 /note=SSC: 8366-8632 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Arthrobacter phage VResidence]],,NCBI, q1:s1 100.0% 4.63427E-29 GAP: 106 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.285, -2.0111200136961407, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage VResidence]],,UYL87619,73.8636,4.63427E-29 SIF-HHPRED: Phage_TAC_10 ; Phage tail assembly chaperone,,,PF10963.11,79.5455,95.0 SIF-Syn: Tail assembly chaperone, upstream gene is second tail assembly chaperone, downstream is a major tail protein, just like in phages Liebe and Maureen /note=Primary Annotator #1: Hernandez, Edgar /note=Auto-annotation: Glimmer and GeneMark both display the same start site #8366, with the start codon ATG. /note=Coding Potential: There is a reasonable amount of coding potential present in both the Host-Trained and Self-Trained GeneMark.
The chosen start site includes all the coding potential. /note=SD (Final) Score: There’s an SD Final Score of -2.011, which is indicative of a higher sequence match. Meanwhile, the Z-score is 3.285, which is good since anything higher than 2 indicates that the RBS was above the mean. /note=Gap/Overlap: A gap of 106 bp means that there is no overlap with the upstream gene. The gap is relatively small, and it does not include other potential start sequences. The gap seems to be somewhat conserved when compared to the other genes in the genome. /note=Phamerator: The gene is located in Pham 27587, and there are other AZ cluster group members like Powerpuff and Rowa, which were used to compare synteny with VroomVroom. The function associated with the gene is a tail assembly chaperone. /note=Starterator: There’s strong evidence suggesting that the start site 5 at #8366 is conserved across members of Pham 27587 because 38 out of 45 final genes have claimed it as a real start site. /note=Location call: Based on all the pieces of evidence gathered from Phamerator, Starterator, synteny comparison, and coding potential, the possible start site for the gene is #8366. /note=Function call: PhagesDB and NCBI BlastP both returned significant hits with tail assembly chaperone as the gene function. Hits Maureen_13 and Liebe_13 yielded significantly low E-values when examined with the AZ cluster. Additionally, CDD did not provide any useful information seeing as no hits were found. However, PDB HHpred suggested a hit with a 95% probability that also possessed a relatively small E-value. Therefore, the gene functions as a tail assembly chaperone. /note=Transmembrane Domains: TMHMM predicted 0 transmembrane domains and as a result, the gene product is not a membrane protein since 2 transmembrane domains are required.
/note= /note=Primary Annotator #2: Sacristan, Ariana /note=Auto-annotation: Glimmer and GeneMark agree on 8366 as a start site (ATG) /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF that encompasses the chosen start site. /note=SD (Final) Score: This is the best Final Score of -2.011 and is supported by a high Z-score of 3.285. /note=Gap/overlap: The 106 gap is reasonable as it is too small to indicate a missing gene and does not contain other possible start sequences that contain coding potential or ORFs. This start site also exhibits the true LORF. /note=Phamerator: As of 02/06/2023 this gene is located in Pham 27587. This Pham has several other AZ cluster group members, such as Maureen and Liebe, which I used to compare against VroomVroom. The function consistently called in Phamerator for this gene was tail assembly chaperone. /note=Starterator: There is a reasonable start site choice that is conserved among the Pham 27587 members. The start site number conserved in this pham is 5, which begins at 8366, and is called by 38/45 non-draft genes. /note=Location call: The gathered evidence suggests that this is a real gene that most likely starts at 8366. /note=Function call: The predicted function of this ORF is a tail assembly chaperone. NCBI and PhagesDB BLASTp provided several strong hits; the top two hits from each database have tail assembly chaperone as the called function, with corresponding E-values below 5e-25. HHPred also had a hit for a tail assembly chaperone with a probability of 95%, a coverage of 80%, and an E-value of 0.7. There were no hits on CDD. /note=Transmembrane domains: There were no predicted TMDs by TMHMM, therefore it is not a membrane protein.
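The Gap/overlap notes above compare each start site with the stop of the upstream gene. As a minimal sketch (not part of the PECAAN or DNA Master output), the Python below recomputes those values from the coordinates recorded in this file for genes 6 through 13. It assumes the convention gap = start - upstream stop - 1, with a negative result read as an overlap; that convention is inferred from the GAP values recorded here, not taken from PECAAN documentation. The names GENES, gap_to_upstream, orf_length, and gaps_by_gene are hypothetical helpers introduced only for illustration.

```python
# Forward-strand coordinates (gene number, start, stop) copied from this file.
# Gene 6 uses the start (5167) and end (6108) given in its Location call note.
GENES = [
    ("6", 5167, 6108),
    ("7", 6184, 6585),
    ("8", 6582, 6683),
    ("9", 6670, 7011),
    ("10", 7020, 7301),
    ("11", 7298, 7693),
    ("12", 7705, 8259),
    ("13", 8366, 8632),
]


def gap_to_upstream(start, upstream_stop):
    """Bases strictly between the upstream stop and this start.

    Assumed convention: a negative value is an overlap of that many bases.
    """
    return start - upstream_stop - 1


def orf_length(start, stop):
    """Length in bp of an ORF given inclusive start and stop coordinates."""
    return stop - start + 1


def gaps_by_gene(genes):
    """Map each gene number (except the first listed) to its upstream gap."""
    result = {}
    for (_, _, prev_stop), (name, start, _) in zip(genes, genes[1:]):
        result[name] = gap_to_upstream(start, prev_stop)
    return result


if __name__ == "__main__":
    for name, gap in gaps_by_gene(GENES).items():
        label = "overlap" if gap < 0 else "gap"
        print(f"gene {name}: {gap} bp ({label})")
    # Gene 7 is recorded above as a 402 bp ORF.
    print("gene 7 ORF length:", orf_length(6184, 6585))
```

Under this assumed convention the recomputed values agree with the GAP fields recorded for genes 7 through 13 (75, -4, -14, 8, -4, 11, and 106 bp). The same arithmetic can be used to sanity-check a proposed alternate start before weighing it against coding potential and Starterator data.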
CDS join(8366..8626,8626..8973) /gene="14" /product="gp14" /function="tail assembly chaperone" /locus tag="VroomVroom_14" /note= /note=SSC: 8366-8973 CP: yes SCS: neither ST: NI BLAST-Start: [tail assembly chaperone [Arthrobacter phage VResidence]],,NCBI, q1:s1 97.5247% 8.01412E-77 GAP: -267 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.285, -2.0111200136961407, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage VResidence]],,UYL87620,71.2195,8.01412E-77 SIF-HHPRED: SIF-Syn: Tail assembly chaperone; upstream gene is tail assembly chaperone preceded by major tail protein, downstream is tape measure protein followed by minor tail protein, just like in phages Crewmate, Reedo, and Adumb2043. Overall, this synteny supports the functional call of this gene as a tail assembly chaperone. /note=Primary Annotator #1: Hoang, Ryan /note=Auto-annotation: Glimmer states that the start site is likely at 8,701 bp. GeneMark does not have a start site. /note=Coding Potential: Most of the ORF has a very high coding potential on both the GeneMark self and host, which indicates that the gene is an actual real gene. There is coding potential only in the 1st open reading frame for the first gene. The one downside to this coding potential is that the current start site does not encode for all the coding potential. However, as it is the only start site that encapsulates most of the coding potential, we`d have to go with this one. /note=SD (Final) Score: The final score is -5.143. This is not the highest final score, with another start site having a final score of -4.720. However, given the gene`s uniqueness, the large amount of coding potential present at the original start site, the fact that there would be a large gap at the alternative start site, and the fact that the original start site has a larger Z-score than the alternative, we`d continue with this start site. 
/note=Gap/overlap: 68 bp gap from the upstream gene 13, and 10 bp gap for the downstream gene, 15. While the gap from the upstream gene 13 is rather large, the gap to the downstream gene is not. Furthermore, there are no other suitable start codons that would decrease the 68 bp gap, indicating that the selected start site is accurate. /note=Phamerator: This gene is in pham 61451 as of 2/6/2023. The pham is conserved in other members of the AZ cluster, to which VroomVroom belongs. I used Acela as the main phage of comparison. All the members of the pham are draft phages, so the Phamerator function call may not be very helpful. Phamerator did not have a function for this gene. /note=Starterator: Other members of the pham have conserved start sites, namely start sites 3 and 4. Unfortunately, VroomVroom does not contain either of these start sites. There were 4 other members of the pham, with 2 calling start site 3 and 2 others calling start site 4. Overall, Starterator was uninformative because most of the other members of the pham were draft annotations, and also because VroomVroom does not contain the start sites that the other members of the pham had. /note=Location call: There is enough evidence to suggest that this start site is correct. Although there is some coding potential prior to this gene, the uniqueness of tail assembly chaperones might explain it. Furthermore, the gap of 68 bp is reasonable. Overall, this start call should be accurate. Additional information suggests we should keep the original start site. Some good coding potential is missed because VroomVroom has no conserved start site at 3 or 4.
However, overall, the gene does appear to be real, and the auto-annotated start site at 8701 appears to be correct because neither Starterator nor Phamerator provides evidence for an alternative. /note=Function call: Based on the information from NCBI Blast and the PhagesDB Blast, the function of this protein appears to be a tail assembly chaperone protein. Blast results had low E-values of 1e-26 and 1e-25. This information is not corroborated by CDD and HHpred, however: CDD gave no hits back, and the HHpred hits only suggested that the protein was an NKF protein. Still, based on NCBI Blast and PhagesDB Blast, I think we can call this a TAC protein. Additionally, its position among other genes follows the expected pattern of synteny, which lends evidence to the idea that this is a tail assembly chaperone. We cannot call it a membrane protein due to its lack of transmembrane domains. /note=Transmembrane domains: There were no transmembrane domains, and as a result, we cannot call it a membrane protein. /note= /note=Primary Annotator #2: Scriven, Savannah /note=Auto-annotation: Glimmer calls the gene at start site 8701bp. GeneMark does not call the gene. /note=Coding Potential: Most of the ORF has high coding potential on the forward strand in both GeneMark Self and Host, indicating that this is a real forward gene. Start site does not include all coding potential; high coding potential upstream of start site. /note=SD (Final) Score: -5.143. This is not the best final score on PECAAN, but the Z score is highest at 2.362. While the alternate start site has a higher final score, the alternate start site would introduce a 290bp gap which is not favorable. /note=Gap/overlap: There is a reasonable gap of 68bp and the gene has a reasonable length.
For now, the called start site is the most reasonable option as the alternate site has a gap of 290bp, which is too large to justify. However, there is synteny and coding potential evidence to suggest that this gene could be combined with the previous gene @stop 8632. /note=Phamerator: Pham 61451; composed entirely of draft AZ phages (e.g. TforTroy, Wildwest, Cassia). No function called. /note=Starterator: Not informative. This gene does not have the most commonly annotated start site. /note=Location call: Above evidence suggests this is a real gene at start site 8701. As this start site does not cover all coding potential and synteny data shows this gene may be longer, it is possible this gene should be combined with gene @stop 8632. /note=Function call: Predicted function is tail assembly chaperone based on hits from NCBI Blastp and phagesDB BLAST. Top 2 hits had high query coverage (>96%), medium % identity (57%), and low E-values (1e-26 & 2e-25). CDD found 0 hits and HHpred had uninformative hits; the lowest e-value was 0.32. Based on high synteny (see box below), this gene is likely a tail assembly chaperone. /note=Transmembrane domains: 0 TMDs predicted by deep TMHMM. CDS 8983 - 11226 /gene="15" /product="gp15" /function="tape measure protein" /locus tag="VroomVroom_15" /note=Original Glimmer call @bp 8983 has strength 11.93; Genemark calls start at 8983 /note=SSC: 8983-11226 CP: yes SCS: both ST: SS BLAST-Start: [tape measure protein [Arthrobacter phage Tweety19]],,NCBI, q1:s1 93.5743% 0.0 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.126, -4.325675514574479, no F: tape measure protein SIF-BLAST: ,,[tape measure protein [Arthrobacter phage Tweety19]],,QNO12677,67.7207,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_AF,83.1325,99.7 SIF-Syn: Tape measure protein. Upstream gene is major tail protein, downstream is minor tail protein, just like in phage London.
/note=Primary Annotator Name: Kim, Cindy /note=Auto-annotation: Both Glimmer and GeneMark call the same start site at 8983 bp. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene on both the Host and Self GeneMarks. /note=SD (Final) Score: -4.326. It is not the best but still one of the most favorable scores. /note=Gap/overlap: Gap: 0 bp. There is no overlap between upstream and downstream genes. This is reasonable and conserved in other phage genomes (YesChef, Powerpuff). /note=Phamerator: 2/3/23: Pham 68667. It is conserved, found in Aquarius and AppleCider, which are in the same AZ cluster as VroomVroom. /note=Starterator: 75 of 155 non-draft genes call start site 15, which is not found in this gene. Start site 17 was a better fit and was most manually annotated, corresponding to a start site at 8983 bp which is in agreement with Glimmer and GeneMark predictions. /note=Location call: Based on the above evidence, this is most likely a real gene with a likely start site at 8983 bp. /note=Function call: Tape measure protein. PhagesDB BLAST gave around 7 hits with E values of 0, and NCBI Blastp also yielded at least 7 hits with E values of zero and identities of 55-59%. There were no significant CDD hits, but on HHpred there was a significant hit from 6V8I_AF (99.7% probability, e value of 4.8e-9, 83.1325% coverage). /note=Transmembrane domains: DeepTMHMM predicted zero TMDs. Therefore, this gene cannot be said to be a membrane protein. /note= /note=Primary Annotator #2 Name: Smith, Steven /note=Auto-annotation: Both GeneMark and Glimmer called the same start site at 8983. /note=Coding Potential: The Host-trained and the Self-trained GeneMark both show good coding potential in the region before the stop codon. /note=SD (Final) Score: -4.326. While it is not the best final score shown, it is still very good.
/note=Gap/overlap: There is a 9 bp gap which is very similar to other gaps in the genomes of other AZ cluster phages. /note=Phamerator: 2/7/23. Pham 68667 which is found in multiple other AZ cluster phages like Amyev and Asa16. /note=Starterator: The most frequently called non-draft start site is 15; however, our gene does not contain this start site. The called start site is 17, which has 29 of 155 MAs and has been called 100% of the time when present. /note=Location call: Looking at the above evidence, this call is most likely a functional gene starting at 8983. /note=Function call: Tape measure protein. CDD and HHpred both gave strong results and had multiple hits of very low e-values that were tape measure proteins. /note=Transmembrane domains: There were no TMDs predicted by DeepTMHMM, so this cannot be called a membrane protein. CDS 11238 - 12098 /gene="16" /product="gp16" /function="minor tail protein" /locus tag="VroomVroom_16" /note=Original Glimmer call @bp 11238 has strength 12.86; Genemark calls start at 11238 /note=SSC: 11238-12098 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Arthrobacter jiangjiafuii] ],,NCBI, q1:s4 99.3007% 6.0685E-80 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.274, -1.953940808934884, yes F: minor tail protein SIF-BLAST: ,,[hypothetical protein [Arthrobacter jiangjiafuii] ],,WP_210231523,63.8889,6.0685E-80 SIF-HHPRED: HYPOTHETICAL PROTEIN 19.1; VIRAL PROTEIN, DISTAL TAIL PROTEIN; 2.95A {BACILLUS PHAGE SPP1},,,2X8K_C,98.951,100.0 SIF-Syn: Thomas: Minor tail protein, upstream gene is tape measure protein, downstream is minor tail protein, just like phage Adumb2043_16. Kaemin: Based on the synteny this is a minor tail protein, with a gene upstream being a tape measure protein. Downstream of the gene is also usually another minor tail protein, similar to phage Adumb2043_16.
/note=Primary Annotator #1 Name: Kretschmer, Thomas /note=Auto-annotation: Both GLIMMER and GENEMARK estimated the start site to be 11238. The start codon is expected to be ATG. /note=Coding Potential: The gene has reasonable coding potential, and the expected start site covers all of this coding potential. /note=SD (Final) Score: This start site has the highest Final score, -1.954. /note=Gap/overlap: The gap with the upstream gene is 11 bp. This is very reasonable. The length of the gene is also reasonable, and this is the start site that leads to the longest ORF with no overlap. /note=Phamerator: 2/10/2023: Pham 67415. This Pham is conserved in most phages in this cluster. Adumb2043_17, Amyev_17, and Aoka_18 all share this conserved pham. This is a minor tail protein pham. /note=Starterator: There is a reasonable start site but it is not conserved within other phages in the cluster. The start site is 11238 and corresponds to ATG. Start 4: 2 of 44 call this start site. /note=Location call: This suggests the gene is real, and starts at 11238. /note=Function call: Minor Tail Protein /note=Transmembrane domains: None. This makes sense considering the tail protein function of the gene. This is marked as an "outside" gene. /note= /note=Primary Annotator #2 Name: Kaemin Tosasuk /note=Auto-annotation: Both GLIMMER and GENEMARK agree that the start site is at 11238. The start codon is expected to be ATG. /note=Coding Potential: The gene has reasonable coding potential on both the host-trained and self-trained GeneMark. /note=SD (Final) Score: This start site has a Z-score of 3.274 and a final score of -1.954, which is the best final score. /note=Gap/overlap: There is a gap upstream of the gene which is 11 bp long. This is very reasonable. This start site leads to the longest ORF with no overlap. /note=Phamerator: 2/10/2023: Pham 67415. The gene is conserved in most phages in this cluster, including Adumb2043_17, Amyev_17, and Aoka_18. This is a minor tail protein Pham.
/note=Starterator: Start site 11238 is a reasonable start site but it is not conserved within other phages in the cluster. The start site is 11238 and corresponds to ATG. Start 4: 2 of 44 call this start site. /note=Location call: This suggests the gene is real, and starts at 11238. /note=Function call: Minor tail protein. This gene is conserved in most phages within this cluster. Furthermore, all of the hits obtained from BlastP, CDD, and HHpred indicate that this is a minor tail protein. /note=Transmembrane domains: There were no transmembrane domains, indicating that this is not a membrane protein. This makes sense as this protein is most likely a minor tail protein; therefore, it would be an "outside" gene. CDS 12110 - 13087 /gene="17" /product="gp17" /function="minor tail protein" /locus tag="VroomVroom_17" /note=Original Glimmer call @bp 12110 has strength 12.56; Genemark calls start at 12110 /note=SSC: 12110-13087 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Brevibacterium phage LuckyBarnes] ],,NCBI, q1:s1 99.6923% 8.37013E-60 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.285, -1.993391246735709, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Brevibacterium phage LuckyBarnes] ],,YP_009792200,53.7143,8.37013E-60 SIF-HHPRED: Receptor Binding Protein; beta sandwich domain, phage receptor binding protein, Lactococcus lactis pellicle cell wall polyphosphosaccharide, VIRAL PROTEIN; 1.75A {Lactococcus phage 1358},,,4L9B_A,51.6923,99.3 SIF-Syn: Upstream gene is minor tail protein, downstream is minor tail protein, just like in many other phages such as Kaylissa, Amyev, Yang, Tuck, and Powerpuff. /note=Primary Annotator #1 Name: Le, Vivian /note=Auto-annotation: Both Glimmer and GeneMark call the start at 12110. /note=Coding Potential: Reasonable coding potential is found for both GeneMark self and host. The start site also covers all the coding potential. There was no overlap either. /note=SD (Final) Score: -1.993.
It is the best final score on PECAAN. /note=Gap/overlap: 11 bp. This is a reasonable gap and there is no coding potential in the gap to suggest that a new gene should be added. /note=Phamerator: The pham number as of 02.07.2023 is 49124. The gene is conserved in Yang (AZ) and YesChef (AZ). /note=Starterator: The start number 4 was called in 24/24 non-draft genes in the pham. Start 4 is 12110 in VroomVroom. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 12110. Starterator agrees with Glimmer and Genemark. /note=Function call: Based on the PhagesDB BLASTp and NCBI BLASTp, there is strong evidence for the function to be a minor tail protein. There were no strong CDD or HHpred hits; however, based on the BLAST hits, e-scores, and synteny, there is strong evidence for minor tail protein. The protein also seems to be involved in receptor binding based on the evidence from HHpred, where it is aligned with a receptor binding protein. /note=Transmembrane domains: There were 0 predicted TMDs. The topology graph also showed the gene/protein to only be on the outside. This makes sense, because we predicted that the gene is a minor tail protein involved with receptor binding. If this is the case, then it would not need to cross the membrane and would need to be on the outside to bind with other cells. /note= /note=Primary Annotator #2 Name: Tran, Krysten /note=Auto-annotation: Glimmer and GeneMark; both agree on the same start site (12110); start codons called - ATG and GTG /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF and the start site does cover all the coding potential. /note=SD (Final) Score: -1.993 is the most favorable score on PECAAN.
/note=Gap/overlap: There is a gap of 11 bp, which is a reasonable gap and not large enough to indicate a new gene. /note=Phamerator: The gene is found in the pham 49124 as of 02/08/23. The gene is conserved in other members of the cluster AZ, such as Tuck and AEgle. /note=Starterator: The start number 4 was called in all 24 of the 24 non-draft genes in the pham. Start number 4 is at position 12110 in this phage. This was the start site with the most manual annotations and is consistent with the start sites called by both Glimmer and GeneMark. /note=Location call: Based on all the evidence gathered, the start site for this gene is likely at 12110 (start codon ATG). /note=Functional call: Based on the data from PhagesDB BLASTp and NCBI BLASTp in addition to the PDB hit from HHpred, I hypothesize the function of the gene is a minor tail protein. The PDB hit from HHpred was strong because it had a high probability of 99.34, a percent coverage of 51.6923%, and a low E-value of 3.7e-11. /note=Transmembrane domains: The hypothesized function for the gene is a minor tail protein, specifically one that may be involved in recognizing receptors on the host cell. Therefore, it makes sense that there are no TMDs for this ORF. Since the specific hypothesized function of the minor tail protein is a receptor binding protein, this phage most likely infects by binding to the cell receptor, so the protein does not need to cross a membrane to infect the cell and would not have a transmembrane domain.
CDS 13087 - 14322 /gene="18" /product="gp18" /function="minor tail protein" /locus tag="VroomVroom_18" /note=Original Glimmer call @bp 13087 has strength 13.99; Genemark calls start at 13087 /note=SSC: 13087-14322 CP: yes SCS: both ST: NI BLAST-Start: [Tail protein [Brevibacterium phage Rousseau]],,NCBI, q2:s4 99.0268% 1.57411E-85 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.235, -6.166667396375102, no F: minor tail protein SIF-BLAST: ,,[Tail protein [Brevibacterium phage Rousseau]],,CAH1193710,61.3941,1.57411E-85 SIF-HHPRED: Tail protein, 43 kDa; tail protein, structural genomics, PSI, MCSG, Protein Structure Initiative, Midwest Center for Structural Genomics, UNKNOWN FUNCTION; 2.1A {Neisseria meningitidis MC58} SCOP: b.106.1.1,,,3D37_A,93.1873,99.4 SIF-Syn: Minor tail protein, upstream and downstream genes are also minor tail proteins, just like in phage Lizalica. /note=Primary Annotator #1 Name: Li, Anna /note=Auto-annotation: Both Glimmer and GeneMark agree on the same start site (13087); start codon called: ATG and GTG /note=Coding Potential: Yes. The gene has reasonable coding potential. The chosen start site covers all of the coding potential. /note=SD (Final) Score: -6.167 (not best, but irrelevant for start call because of -1bp overlap suggesting gene is part of an operon) /note=Gap/overlap: -1bp, suggesting that the gene is part of an operon; length of gene given overlap is reasonable /note=Phamerator: 67210 as of 2023-02-06, other non-draft members of subcluster (AZ) contained this pham (i.e. Adumb2043, Amyev) /note=Starterator: The most manually annotated start site is 14, called in 25/62 non-draft genomes. This start site is not present in VroomVroom. There are no manually annotated start sites in VroomVroom. The start site 11 (@13087) called in VroomVroom is present in Emotion.
/note=Location call: Likely a real gene with start site @13087 based on agreeing data between Starterator, coding potential and synteny /note=Function call: minor tail protein: Both PhagesDB and NCBI BlastP return hits with minor tail protein function with very small E-values (<10^-68) and within phages in the same cluster as VroomVroom (AZ). CDD does not provide useful information; only PDB HHpred hits were useful in suggesting tail protein function, from PDB entries 3D37, 3CDD and 3GS9 with acceptable E-values (<10^-8), high probabilities (>99%) and high coverage (>90%). SEA-PHAGES requirement of the presence of collagen-like or glycine-rich proteins is not fulfilled, but the gene is within a portion of the genome that also codes for minor tail proteins. /note=Transmembrane domains: TMHMM does not call any transmembrane domains in this region of the genome. /note= /note=Primary Annotator Name: Unanwa, Nnaemeka /note=Auto-annotation: Start site at 13087 in both Glimmer and Genemark. ATG is predicted, which is a common start codon. /note=Coding Potential: Yes, there is coding potential found where the gene is located. The Glimmer and GeneMark start site covers the entire coding potential. /note=SD (Final) Score: -6.167. While this is a low score, it is likely part of an operon due to the -1 BP overlap. /note=Gap/overlap: -1 BP, which indicates that this gene is likely part of an operon. This means that this overlap is reasonable. /note=Phamerator: Pham 67210 on 1/27/23. Conserved in Warda_19 and YesChef_19. /note=Starterator: The most annotated start site is site 14, but this start site does not exist in our phage's gene. There are no manual annotations for any of the other start sites on our phage's gene. The start site 11 (@13087) is also present in Emotion_18. /note=Location call: This is most likely a real gene at start 13087 due to synteny, coding potential, following annotation guideline rules, etc.
/note=Function call: This gene seems to be a minor tail protein gene. Evidence from the phagesDB BLAST search includes LuckyBarnes_26 and Janeemi_20, which are known to be minor tail protein genes and have e-values of 5e-71 and 4e-68, respectively. These are strong e-values. Evidence from NCBI BLASTp includes Tail protein [Brevibacterium phage Rousseau] and minor tail protein [Brevibacterium phage LuckyBarnes], with e-values of 2e-85 and 2e-82 respectively (which are also strong hits). CDD does not yield any useful information. HHpred hits 3D37_A and 3CDD_C (e-values of 1.1e-9 and 1.2e-9, respectively; both have >99% probability) have results that relate to tail protein function. All of this evidence leads me to believe that this gene encodes a minor tail protein. /note=Transmembrane domains: No transmembrane domains detected, meaning that this is not a membrane protein. CDS 14338 - 17349 /gene="19" /product="gp19" /function="minor tail protein" /locus tag="VroomVroom_19" /note=Original Glimmer call @bp 14338 has strength 12.84; Genemark calls start at 14338 /note=SSC: 14338-17349 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage Powerpuff]],,NCBI, q497:s502 50.349% 3.27844E-65 GAP: 15 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.347, -4.008839636156049, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage Powerpuff]],,QGZ17318,27.8788,3.27844E-65 SIF-HHPRED: SIF-Syn: #2 Primary Annotator Name: Vajragiri, Shreya This minor tail protein is downstream of 3 other minor tail proteins preceded by the tape measure protein. It is upstream of NKF genes. There are few comparators, but even in different Phams, most phages call the 4th gene downstream of the tape measure protein a minor tail protein.
/note=#1 /note=Primary Annotator Name: Li, Mulin /note=Auto-annotation: Called by both Glimmer and GeneMark, which agree on the same start codon position - 14338 /note=Coding Potential: Coding potential is high throughout the called ORF, and the start codon is covered. /note=SD (Final) Score: The SD score is -4.009. This is the second best RBS score, but this start site leaves a more reasonable gap upstream of the gene. /note=Gap/overlap: The upstream gap is reasonable (15 bp) and the length of the gene is reasonable. /note=Phamerator: This gene product belongs to Pham 61615, which is conserved among three phages. Of the other two, one belongs to the AZ cluster and the other is a Singleton. All three phage genomes are draft genomes. /note=Starterator: Starterator (Date 1/27/2023) reported Start 1 as the most frequent start site, although it has not been called in any published phage genome. This start site is supported by a strong RBS score and coding potential, so I believe it is a good start site for VroomVroom. /note=Location call: This gene is a real gene and has a start site of 14338, which is supported by general gene rules and RBS scores. /note=Function call: Minor Tail Protein. While this gene is reported to be a minor tail protein by PhagesDB BLAST and function frequency, no relevant crystal structure can be found with HHpred or CDD analysis. Neither do TmHmm and TopCons predict transmembrane domains. This gene does show synteny with phage genomes from the same cluster. The minor tail protein is usually a large protein and downstream of tape measure protein, which fits the description of this protein. /note=Transmembrane domains: Neither TmHmm nor TopCons reports transmembrane domains. /note=Secondary Annotator Name: /note=Secondary Annotator QC: /note= /note=#2 /note=Primary Annotator Name: Vajragiri, Shreya /note=Auto-annotation: Glimmer and GeneMark both agree there is a gene, and both call the start site to be 14338.
/note=Coding Potential: Host-trained GeneMark shows good coding potential for most of the gene (except for one dip toward the end). Self-trained GeneMark has a lot more dips but in general still shows coding potential. Coding potential is on forward strand only, therefore it is a forward gene. /note=SD (Final) Score: -4.009. It’s the 2nd best RBS score, but best overall accounting for coding potential. /note=Gap/overlap: 15bp. This is pretty small; there is no coding potential in this gap. /note=Phamerator: 61615. Date 1/27/2023. It is conserved in Emotion (AZ) and Gilgamesh (Singleton). All members are drafts so there were no good comparisons. /note=Starterator: Start site 1 was automatically annotated for ⅔ of the Pham members. There are no manual annotations for start site 1. This corresponds to start position 14338. Agrees with Glimmer and GeneMark. /note=Location call: Evidence supports that this gene is real and the start site is 14338 (ATG). /note=Functional call: This protein is a minor tail protein, supported by PhagesDB and especially synteny, which says that the few genes downstream of tape measure protein are usually minor tail proteins. However, HHPred and CDD do not support this; the e-values are very high. Generally none of the data supports the call except for synteny and BLAST. SEA-PHAGES requirements say that minor tail protein function can be called if supported by synteny alone. /note=Transmembrane domains: DeepTMHMM does not call any transmembrane domains. It does say that it is an ‘outside’ protein. 
/note=Secondary Annotator Name: /note=Secondary Annotator QC: CDS 17419 - 17760 /gene="20" /product="gp20" /function="hypothetical protein" /locus tag="VroomVroom_20" /note=Original Glimmer call @bp 17419 has strength 14.2; Genemark calls start at 17455 /note=SSC: 17419-17760 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_SILENTRX_36 [Arthrobacter phage SilentRX]],,NCBI, q1:s1 92.0354% 3.44052E-17 GAP: 69 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.785, -3.4897410997597818, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_SILENTRX_36 [Arthrobacter phage SilentRX]],,QWY82776,57.265,3.44052E-17 SIF-HHPRED: SIF-Syn: /note=Primary Annotator: Name: LIM, MADELEINE /note=Auto-annotation: Yes (Glimmer: 17419, Genemark: 17455) /note=Coding Potential: High (First Reading Frame in forward direction) /note=SD (Final) Score: -3.490 (highest possible score among potential start sites) /note=Gap/overlap: 70 (with low probability for gene in between) /note=Phamerator: Pham Number: 2534 (As of 2/17/23) /note=Starterator: Start: 11 @17419 has 19 MAs /note=Location call: 17419 /note=Function call: BLAST results return "hypothetical protein", thus providing no insight into the gene's possible function. /note=Transmembrane domains: 0, appears to be an internal protein /note= /note=Primary Annotator: Name: Vanderpool, Lauren /note=Auto-annotation: Both Glimmer and Genemark, though they disagree on the exact start site. (17419 for Glimmer, 17455 for Genemark) /note=Coding Potential: There is reasonable coding potential in the forward direction, without any apparent overlap. /note=SD (Final) Score: -3.490 : This is the best final score given /note=Gap/overlap: 70 bp, which is over the recommended limit, but there is not enough room for an additional gene in that space.
/note=Phamerator: Pham 2534 (as of 2/6/2023) /note=Starterator: Start: 11 at 17419 (19/26) /note=Location call: 17419 /note=Function call: The results showed "hypothetical protein", which does not yield any information about what the function of this gene could be /note=Transmembrane domains: CDS 17771 - 18499 /gene="21" /product="gp21" /function="endolysin" /locus tag="VroomVroom_21" /note=Original Glimmer call @bp 17771 has strength 19.89; Genemark calls start at 17771 /note=SSC: 17771-18499 CP: yes SCS: both ST: NI BLAST-Start: [LysM peptidoglycan-binding domain-containing protein [Renibacterium salmoninarum] ],,NCBI, q1:s9 72.314% 1.01597E-54 GAP: 10 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.285, -1.9310779259753799, yes F: endolysin SIF-BLAST: ,,[LysM peptidoglycan-binding domain-containing protein [Renibacterium salmoninarum] ],,WP_012245948,41.3669,1.01597E-54 SIF-HHPRED: N-acetylmuramoyl-L-alanine amidase; amidase, zinc binding, cell wall degradation, endolysine, hydrolase; HET: PO4, GOL; 1.21A {Clostridium intestinale},,,6SSC_A,49.5868,99.5 SIF-Syn: /note=Primary Annotator #1 Name: Martin, Kyle /note=Auto-annotation: Both Glimmer and GeneMark were used in this case, and both start at 17771. /note=Coding Potential: The Coding potential is in both the forward and reverse direction. The coding potential is similar in both the Host and Self-Trained GeneMark. /note=SD (Final) Score: -1.931 (best of all candidates) /note=Gap/overlap: The gap is 10 BP, which indicates that there is no overlap and minimal space between this gene and the one before it. The length of the gene is also reasonable. /note=Phamerator: The Pham number is 48120. The analysis was run on 01/27/23. The Pham was present in other members of the cluster AZ and was compared against phage Emotion. The function was unknown. /note=Starterator: The start site choice conserved among the members is 17771. The start site number 2 corresponds to the BP position of 17771. 
However, Starterator is uninformative since the 2 members of the Pham are both draft annotations, despite agreeing on the start site. /note=Location call: This is a real gene with a likely start site of 17771. /note=Function call: The HHpred data supports the conclusion that the gene is an N-acetylmuramoyl-L-alanine amidase and the CDD data supports the conclusion that the gene is a peptidoglycan recognition protein. The gene encodes some protein that binds to peptidoglycan walls of bacteria and degrades them, which would fall under endolysin. /note=Transmembrane domains: Zero /note= /note=Primary Annotator #2 Name: Vu, Thomas /note=Auto-annotation: Glimmer and GeneMark agree that the start site is at 17771. The start codon is ATG. /note=Coding Potential: There is high coding potential throughout the suspected region of the real gene. The self-trained and host-trained GeneMark are consistent with each other in indicating that this is a real gene. The start site covers all the coding potential. /note=SD (Final) Score: -1.931 (best of all candidates) /note=Gap/overlap: Gap of 10 BP. It indicates that there is no overlap and minimal space between this gene and the one before it. The length of the gene is also reasonable. /note=Phamerator: The pham number is 48120. The analysis was run on 01/27/23. The pham was present in other members of the cluster AZ and was compared against phage Emotion. The function was unknown. /note=Starterator: The start site choice conserved among the members is 17771. The start site number of 2 corresponds to the BP position of 17771. However, Starterator is uninformative since the 2 members of the pham are both draft annotations, despite agreeing on the start site. /note=Location call: This is a real gene with a likely start site of 17771. /note=Function call: The HHpred data support the conclusion that the gene is likely an N-acetylmuramoyl-L-alanine amidase.
The CDD data and 2 NCBI Blast hits support the conclusion that the gene is a peptidoglycan recognition protein. Ultimately, both datasets agree that the gene encodes some protein that binds to peptidoglycan walls of bacteria and degrades them. For example, the best PDB hit had an e-value on the order of 1e-12, almost 50% coverage, and a probability of 99.53; it suggested that the gene function was an amidase with peptidoglycan-wall degrading abilities. Under the Approved Functions list, this would best fall under the criteria of an endolysin. /note=Transmembrane domains: No TMDs were predicted by TMHMM or TOPCONS, so no transmembrane function inferred. CDS 18499 - 18783 /gene="22" /product="gp22" /function="minor tail protein" /locus tag="VroomVroom_22" /note=Original Glimmer call @bp 18496 has strength 6.57; Genemark calls start at 18499 /note=SSC: 18499-18783 CP: yes SCS: both-gm ST: NI BLAST-Start: [membrane protein [Arthrobacter phage Lizalica] ],,NCBI, q1:s1 96.8085% 4.95738E-25 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 0.542, -7.660966145662663, no F: minor tail protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Lizalica] ],,YP_010677587,68.5393,4.95738E-25 SIF-HHPRED: DNA stabilization protein; Viral protein, HK620, Tail Needle, Membrane penetration; 2.202A {Salmonella phage HK620},,,5BU8_A,47.8723,96.1 SIF-Syn: Membrane protein, downstream gene belongs to pham 42320, just like in phage Adolin. Upstream gene in VroomVroom is an endolysin which differs from phage Adolin's minor tail protein. /note=AF: Called as a minor tail protein after discussion with Debbie (following investigation into potential tail needle function) /note= /note=Primary Annotator #1 Name: Martinez, Daniela /note=Auto-annotation: Glimmer and Genemark both called the gene. Glimmer called a start site at 18496 and Genemark called it at 18499. The start codon at 18496 is GTG and the start codon at 18499 is ATG.
/note=Coding Potential: This gene has reasonable coding potential predicted within the putative ORF. The region between 18496 and 18783 contains good coding potential which suggests this is a real gene. Both start sites predicted by Glimmer and Genemark cover all of the coding potential. /note=SD (Final) Score: The final score for the start site at 18496 is -7.497. This is not the best final score on PECAAN. However, since it is likely this gene is part of an operon, the final score is not important. /note=Gap/overlap: There is a 4 bp overlap for the start site at 18496 and a 1 bp overlap for the start site at 18499. /note=Phamerator: As of February 7, 2023, this gene belongs to Pham 68918. /note=Starterator: Start site #15 corresponds with a start at 18496. Start 15 is found in only 1 out of 51 genes in the pham and there are no manual annotations of this start. Based on this information, this start is probably incorrect. The start at 18499 corresponds with start site #16 on Starterator. This start is more common and is found in 6/51 genes in the pham with 4 manual annotations. Start #16 was also called 66.7% of the time. /note=Location call: This is a real gene based on substantial coding potential. The gene starts at 18499 and stops at 18783. 18499 is the most reasonable start site due to the -1 gap. /note=Function call: Based on the BLAST results it is unclear whether this gene is a tail needle protein or a membrane protein. A majority of the top hits from the phagesDB BLAST had unknown functions. The second top hit on the NCBI BLAST also had an unknown function. The phagesDB BLAST results had slightly lower e-values than the ones found on the NCBI. However, since the two databases are different and the e-values are all below the preferred 10^-6, I believe the true function of this gene is either a tail needle protein or a membrane protein. Further analysis from HHpred indicates similar results.
However, there is an important domain missing from our protein which means we can`t confidently call this a tail needle protein. The membrane protein predicted by HHpred is a haemolysin, which does not make sense in the context of a phage genome. Since our TMHMM results determined the existence of one transmembrane domain, we agree to call this gene a membrane protein at minimum. /note=Transmembrane domains: Our TMHMM results indicate the presence of one transmembrane domain. This gene is a membrane protein with unknown function. /note= /note=Secondary Annotator Name: Wang, Xinyi /note=Auto-annotation: Glimmer shows start at 18496, and GeneMark shows start at 18499. /note=Coding Potential: Strong in the forward direction, weak in the reverse direction. /note=SD (Final) Score: -7.497 for Glimmer start at 18496. This is not the best final score, but may be inconsequential since this gene is probably part of an operon. /note=Gap/overlap: 4 bp overlap with the previous gene stop site, 0 bp gap from the next gene start site. /note=Phamerator: Pham 68918, 2/8/2023 /note=Starterator: I agree with the original autoannotated start site at 18496 although it doesn’t agree with the most conserved start pointed out by the starterator report. The most conserved start is called in about 25% of the non-draft genes, which shows that there is great variation in the gene. Besides, the suggested most common start at 18499 has a less convincing Z-score and Final score than the autoannotated start. /note=Location call: start at 18499, real gene. After checking with Dr. Friese, changed to 18499 due to previous wet-lab experience with operon binding that has a 1bp overlap. /note=Function call: Based on the NCBI BLASTp result, It is hypothesized that the gene encodes the function of membrane protein. However, HHpred best PDB hit suggests that the tail needle protein is also a candidate function call for the gene. After checking with Dr. 
Friese, we figured out that our protein partially meets the suggested protein in HHpred, but lacks a key domain for it to become a tail needle protein. Therefore, the functional call is determined to be membrane protein temporarily. /note=Transmembrane domains: There is 1 TMD according to TmHmm in PECAAN and the website, which corresponds with the hypothesis that the gene codes for a membrane protein. CDS 18784 - 19068 /gene="23" /product="gp23" /function="membrane protein" /locus tag="VroomVroom_23" /note=Original Glimmer call @bp 18784 has strength 13.82; Genemark calls start at 18784 /note=SSC: 18784-19068 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage Iter]],,NCBI, q3:s2 73.4043% 7.42489E-29 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.945, -4.839397248555723, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage Iter]],,URQ05011,75.0,7.42489E-29 SIF-HHPRED: SIF-Syn: /note=Primary Annotator #1 Name: Nguyen, Angelynn /note=Auto-annotation: Both Glimmer and Genemark note the start site to be 18784. /note=Coding Potential: There is good coding potential based on the host and self-trained genemark graphs. In this ORF there is only coding potential in the forward direction. /note=SD (Final) Score: -4.839, this is the score that is closest to 0 and therefore it is the best final score. /note=Gap/overlap: 0, No gene needs to be added here. /note=Phamerator: 42320, date 2/6/23 this was found in the AZ cluster. The pham has 51 members total with 23 drafts. It is conserved. /note=Starterator: This analysis was run on 2/6/23 and it was confirmed that the start number 6 is correct. Although, this was far from the most common start number, it was conserved in genes that are most similar to vroomvroom (like emotion). /note=Location call: All the evidence supports that the gene is real and the start site is at 18784. 
Although this region is not the most conserved from the Starterator data, the auto annotation and manual annotations agree that this should be the start site. /note=Function call: Membrane protein. There was no hit in the NCBI CDD. In the HHpred, the hits with the lowest e-values were phage holins with really high e-values of 2.9, which is not reliable. However, this hit did have 58% coverage and 92% probability. Moreover, HHpred also had potential membrane protein hits, and note that holins are membrane proteins. Considering that this is a transmembrane protein based on the DeepTMHMM, it is most likely that the function of this gene is membrane protein. /note=Transmembrane domains: This is a transmembrane protein since there were confident hits in the DeepTMHMM. /note= /note=Primary Annotator #2 Name: Wu, Grace /note=Auto-annotation: Both Glimmer and GeneMark agreed that the start is at 18784 /note=Coding Potential: There is an ORF in the forward direction at both host-trained and self-trained GeneMark. This gene demonstrates high coding potential. /note=SD (Final) Score: At start 18784, the final score is -4.839. This is the closest score to zero compared to other hypothetical starts. /note=Gap/overlap: There is no overlap or gap with other genes at start 18784. /note=Phamerator: 42320, this gene is found in cluster AZ. There are in total 51 genes in this Pham and 23 of them are draft genes. /note=Starterator: Pham 42320, start 5, which is not conserved in the majority of the members in the pham. The most conserved is at Start 6 for this Pham; 21 of 28 non-draft genes call this start. /note=Location call: start 18784 (ATG) is the best start site for gene 22. Even though this gene does not have the most called start, because not every non-draft gene is called start 6, we can neglect this variation. Start 18784 has the highest final score, no overlap or gap with other genes, and a z-score of 1.945 that is extremely close to 2.
Because of the evidence shown in Phamerator and Starterator, it is confirmed that start 18784 is the best location call. /note=Function call: Membrane Protein. There is no hit found in CDD, and in HHpred the highest hits lead to two functions: holin and membrane protein. After checking TMHMM, since there are 2 TMDs found in this gene, the criteria for membrane protein are met, and the hypothetical function is membrane protein. /note=Transmembrane domains: 2 CDS 19136 - 19384 /gene="24" /product="gp24" /function="hypothetical protein" /locus tag="VroomVroom_24" /note= /note=SSC: 19136-19384 CP: yes SCS: neither ST: NA BLAST-Start: GAP: 67 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.812, -5.502226811307842, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=AF: Added gene. Gene also found in Emotion. Overlaps a lot of CP in the reverse direction, but went with F gene because it doesn't interrupt many F genes in a row. /note= /note=Primary Annotator Name: Zorawik, Michelle /note=Auto-annotation start source: Neither GeneMark nor Glimmer calls a start site. The gene was added manually. /note=Coding Potential: The ORF has reasonable coding potential in the second strand of the forward direction that is covered by the chosen start site. Coding potential is found in host- and self-trained GeneMark. There is overlapping coding potential in the third strand of the reverse direction, however, it appears within a stretch of forward genes, thus the ORF called for this area was excluded. /note=SD (Final) Score: -5.502. It is not the best SD score but other start sites with better Final scores would not result in reasonable gaps. /note=Gap/overlap: 67bp. The gap is relatively large, however, it is the only reasonable one for this gene. The genes placed within this genomic area in other phages have similar gaps to the (conserved) upstream gene. /note=Phamerator: The gene is part of an orpham as of 3/8/23. It is not found in any other phage genomes.
/note=Starterator: There is no Starterator report for this Pham. /note=Location call: Based on the above evidence, this gene is real and the most likely start site is 19136. This determination was made based on the presence of reasonable coding potential in GeneMark host & self in the forward direction, as well as on this genomic area appearing to commonly contain orphams. /note=Function call: NKF. There are no PhagesDB and NCBI BLAST or reasonable HHpred or CDD hits that would suggest a function. /note=Transmembrane domains: Neither TMHMM, TOPCONS, or Deep TMHMM predicted the presence of transmembrane domains, therefore this is not a membrane protein. CDS 19454 - 19903 /gene="25" /product="gp25" /function="hypothetical protein" /locus tag="VroomVroom_25" /note=Original Glimmer call @bp 19454 has strength 15.6; Genemark calls start at 19454 /note=SSC: 19454-19903 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein HOU48_gp62 [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 42.2819% 3.25472E-18 GAP: 69 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.96, -2.6013996449736907, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU48_gp62 [Arthrobacter phage DrManhattan] ],,YP_009815405,70.3125,3.25472E-18 SIF-HHPRED: SIF-Syn: /note=Primary Annotator #1: Okumura, Joey /note=Auto Annotation: Glimmer and Genemark used → agree on start site 19454 with ATG start codon /note=Coding Potential: start site covers good coding potential in second forward ORF and appears to align with start site 19454 → very similar results in host and self trained GeneMark /note=SD (Final) Score: best FS of -2.601 (z score of 2.96) /note=Gap/overlap: 1 = too small to fit another gene /note=Phamerator: 61872. Date 2/07/2023. Gene is orpham. /note=Starterator: NA /note=Location call: Based on above evidence, this is likely a real gene with the start site at 19454. /note=Function call: Multiple PhageDB and NCBI BLAST hits with unknown functions. 
In PhagesDB, there were some hits that had the function of a minor tail protein but the e values were greater than 1. No CDD hits and the HHpred hits had high e values (lowest e value was 38). /note=Transmembrane domains: Not a transmembrane protein. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: /note= /note=Primary Annotator #2: Barden, Sophia /note=Auto-annotation: Glimmer and GeneMark both call the start at 19454 with a start codon of ATG. Strong indication that this gene exists. /note=Coding Potential: The coding potential in this ORF is located on the forward strand, indicating that the orientation of this gene is Forward. Coding potential is found from 19454 to 19903 in both Host-trained and Self GeneMark. /note=SD (Final) Score: Best gene candidate with a Final Score of -2.601 and a Z score of 2.96. /note=Gap/overlap: There is a 1 bp gap, this is not a notable gap. /note=Phamerator: As of 2/7/2023, this gene was found in Pham 61872. This Pham represents 1 member. Gene is likely orpham. /note=Starterator: None /note=Location call: Considering the evidence above, we conclude this gene is real, with a start site at 19454. /note=Function call: No significant BLAST hits recorded in either database. Majority of PhagesDB and NCBI BLAST hits had unknown functions. Hits that had identified functions were bad candidates, with e values greater than 1. No significant CDD and HHpred hits, all potential candidates had very large e-values. /note=Transmembrane domains: No evidence of Transmembrane Domains present from TMHMM prediction. We conclude that this is not a membrane protein.
/note=Secondary Annotator Name: Okumura, Joey /note=Secondary Annotator QC: CDS 20017 - 20631 /gene="26" /product="gp26" /function="deoxynucleoside monophosphate kinase" /locus tag="VroomVroom_26" /note=Original Glimmer call @bp 20017 has strength 19.95; Genemark calls start at 20017 /note=SSC: 20017-20631 CP: yes SCS: both ST: SS BLAST-Start: [deoxynucleoside monophosphate kinase [Arthrobacter phage Lego] ],,NCBI, q5:s6 86.2745% 4.61224E-63 GAP: 113 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.037, -2.442961286954254, yes F: deoxynucleoside monophosphate kinase SIF-BLAST: ,,[deoxynucleoside monophosphate kinase [Arthrobacter phage Lego] ],,QIN94424,65.6863,4.61224E-63 SIF-HHPRED: DEOXYNUCLEOSIDE MONOPHOSPHATE KINASE; TRANSFERASE, PHOSPHOTRANSFERASE; HET: DGP, OCS; 2.0A {Enterobacteria phage T4} SCOP: c.37.1.1,,,1DEK_A,90.6863,99.7 SIF-Syn: deoxynucleoside monophosphate kinase; upstream is a membrane protein and downstream is an NKF gene followed by an exonuclease gene just like in phages Yang and DrManhattan /note=Primary Annotator #1 Name: Ortiz-Gomez, Diana /note=Auto-annotation: Glimmer and Genemark. Both indicate the start at 20017. /note=Coding Potential: Coding potential is seen in the forward strand which confirms that this is a forward gene. Genemark Self and Host also indicate this coding potential. /note=SD (Final) Score: This start site has the best final score of -2.443. /note=Gap/overlap: The gap is 113 which is pretty large, but there is no coding potential in this area so it is not likely that there are any genes here. /note=Phamerator: Pham 58367. 2/7/2023. Conserved, found in YesChef(AZ) and Yang (AZ). /note=Starterator: Start site 40 is conserved, and corresponds to 20017. 57/210 call the site. /note=Location call: The evidence shows that this is a real gene and the most likely start site is at 20017. /note=Function call: Deoxynucleoside monophosphate kinase. The top two phages from PhagesDB BLAST have this function (4e-54 and 6e-56).
Even though there’s variation in the functions of top phages in NCBI BLAST, there’s good evidence for this function (5e-63 and 56% identity). The CDD and the PDB HHpred hits show low e-values and all agree that this is the most likely function. /note=Transmembrane domains: TMHMM does not predict any TMDs, therefore, this is not a membrane protein which makes sense because this enzyme occurs inside the cell. /note= /note=Primary Annotator 2 Name: Berber-Pulido, Rodrigo /note=Auto-annotation: Both Glimmer and Genemark agree that the start site is 20017 /note=Coding Potential: Both Host-trained and phagesdb genemark agree that the coding potential is very high in this gene and throughout the entire gene. /note=SD (Final) Score: -2.443, This is the best final score out of all potential start sites /note=Gap/overlap: 113, which has a high gap but I would not worry about it since it still is not a big enough gap for another gene to fit. /note=Phamerator: 58367, 2/8/23, found in Tbone_24, Tweety19_25. Phage is a part of AZ cluster along with other phages mentioned /note=Starterator: Start site 40 (40, 20017) had the most MAs, thus I would suggest to keep this as the start site as the auto annotation agreed. It was not the most annotated start site but this is great evidence that start site 40 was the correct start site for this gene. /note=Location call: The evidence shows that the start site is 20017 /note=Function call: Deoxynucleoside monophosphate kinase has the highest evidence of it being the function for this gene. PhagesDB BLAST specifically has good e values for this function along with other evidence of other genes from different genomes sharing this function. /note=Transmembrane domains: No hits on TMHMM on TMDs, thus not a membrane protein, makes sense because enzyme occurs in the cell. 
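The Gap/overlap values quoted in these notes can be reproduced from the CDS coordinates alone. The sketch below is illustrative only and is not part of PECAAN or DNA Master; the helper names `intergenic_gap` and `gaps` are hypothetical, and it assumes all listed genes are on the forward strand and that the gap is counted as the number of bases strictly between the upstream stop and the downstream start (negative values are overlaps).

```python
# Illustrative sketch: recompute Gap/overlap values from CDS coordinates.
# Assumption: gap = start - previous_stop - 1 (negative = overlap).
# Function names are hypothetical, not from any annotation tool.

def intergenic_gap(prev_stop: int, start: int) -> int:
    """Return the number of bases between two forward genes."""
    return start - prev_stop - 1


def gaps(features):
    """Map each gene (after the first) to its gap from the upstream gene."""
    result = {}
    for (_, _, prev_stop), (name, start, _) in zip(features, features[1:]):
        result[name] = intergenic_gap(prev_stop, start)
    return result


# (name, start, stop) copied from the CDS lines of this file.
GENES = [
    ("gp25", 19454, 19903),
    ("gp26", 20017, 20631),
    ("gp27", 20734, 21327),
    ("gp28", 21561, 22400),
    ("gp29", 22410, 22715),
    ("gp30", 22712, 23095),
]

if __name__ == "__main__":
    for gene, gap in gaps(GENES).items():
        print(f"{gene}: {gap} bp")
```

Under this convention a value of -4 corresponds to the 4 bp start/stop overlap that the annotators treat as a sign of a possible operon.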
CDS 20734 - 21327 /gene="27" /product="gp27" /function="hypothetical protein" /locus tag="VroomVroom_27" /note=Original Glimmer call @bp 20734 has strength 14.04; Genemark calls start at 20734 /note=SSC: 20734-21327 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Pseudarthrobacter siccitolerans]],,NCBI, q5:s1 97.9695% 1.12793E-59 GAP: 102 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.9, -6.857740919865087, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Pseudarthrobacter siccitolerans]],,WP_050053699,68.3673,1.12793E-59 SIF-HHPRED: SIF-Syn: There is good synteny between this gene and other genes (especially those with unknown functions as of yet). Because of the synteny, we can conclude that this is in fact a real gene. The function, however, cannot be deduced through synteny with other genes. /note=Secondary Annotator Name: Bursulaya, Isabelle /note=Auto-annotation: Glimmer and GeneMark were used, and both call the start at 20734. /note=Coding Potential: Coding potential is found in mainly the forward gene. The coding potential was similar in both the Host and Self Trained GeneMark. /note=SD (Final) Score: Final score is -6.858, which is not the best score on PECAAN /note=Gap/overlap: The gap is 102, which is fairly large but overall still reasonable especially because other phages seem to contain that same gap. /note=Phamerator: pham is 1819 as of 2/6/2023. Gene is conserved in Adolin and Asa16, both of which are in the same cluster (AZ) as VroomVroom /note=Starterator: Start site 2 has 28 manual annotations out of 41 non-draft genes which corresponds to start site 20734, which agrees with both Glimmer and GeneMark. /note=Location call: Based on the evidence compiled, this gene is most likely real and in the correct start site, despite being a little suspicious. It most likely starts at position 20734. /note=Function call: The function of this gene/protein is unknown. 
Both PhagesDB BLASTp and NCBI BLASTp did not yield very good results. The top hits for both listed the function as unknown, or a hypothetical protein. The E values for phages that had the gene with the function listed (as a capsid maturation protein) were very high (1.5) and therefore cannot be trusted. The results from PhagesDB BLASTp contained a score of 31.2 and identities of 24% for the phage Kareem, which had a function listed for the protein. However, other phages, such as DrManhattan, claimed the function was unknown and had an E score of 5e-47 with a score of 185 and identities of 51%. Similar results are seen in the NCBI BLASTp. Therefore, the true function of the gene cannot be stated with certainty. CDD and HHpred also did not give the most informative hits, with CDD giving no hits at all and HHpred claiming that the protein function was an ATP synthase. However, the E value was not acceptable (22) and it does not make sense for a phage to have an ATP synthase. Therefore, there is NKF. /note=Transmembrane domains: No predicted TMDs by DeepTMHMM, so the protein is not a transmembrane protein. /note= /note=Primary Annotator Name: Pan, Crystal /note=Auto-annotation: both glimmer and genemark agree with the start site at 20734. /note=Coding Potential: There is good coding potential in the F strand of the gene and the start site covers the coding potential. This is good evidence that this is a real gene. /note=SD (Final) Score: -6.858, z score <2. These are not very good scores, which make this gene a bit iffy, however, combined with the other evidence, this seems to still be a real gene. /note=Gap/overlap: +102 bp, not a very small gap, but this gap has synteny with other phages. There doesn’t seem to be coding potential in the gap, so even though the gap is kind of big, it would not make sense to add a gene in this gap. /note=Phamerator: pham 1819. 65 members. Many in cluster AZ, but are from other clusters as well. 
Adolin and Adumb2043 were used as comparisons. /note=Starterator: start 2 called 28/41 non-draft genes. Start 2 in vroomvroom was auto-annotated at 20734, which is the same as the start that glimmer and genemark predicted. /note=Location call: Auto-annotation and starterator predicted start site agree with the start site at 20734. The final score seems to be a little iffy as well as the z score, but along with all the evidence, this seems to be a real gene. The start site at 20734 is the best possible start site that we can choose, as all of the other start sites would have even greater gaps in between, though the z scores become better. /note=Function call: There is no known function in this gene. Both the databases predict that this either has an unknown function or is a hypothetical protein, and the e-values for the hypothetical functions indicate that the data is unreliable. HHPred did say that this was an ATP Synthase, but this is the only hit for ATP synthase and the E value is not acceptable, and it wouldn’t really make sense either since ATP synthase exists in the mitochondria and the bacteriophage shouldn’t have a mitochondria. This protein should be marked NKF. /note=Transmembrane domains: There are no transmembrane domains in this gene, so this gene is not a transmembrane protein. 
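The Starterator notes quote start-site support as raw fractions (for example 28/41). As a rough aid for comparing genes, the sketch below converts those fractions to percentages. It is an assumption-laden illustration: `call_rate` is a hypothetical helper, and the counts are simply the numbers written in the notes for gp26, gp27, and gp28, which do not all use the same denominator (some are non-draft genes, some are all pham members).

```python
# Illustrative sketch: express Starterator support as a percentage.
# `call_rate` is a hypothetical helper, not a Starterator function.

def call_rate(calls: int, total: int) -> float:
    """Percentage of genes calling a start, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * calls / total, 1)


# (calls, total) as quoted in the Starterator notes above.
STARTERATOR_COUNTS = {
    "gp26 start 40": (57, 210),
    "gp27 start 2": (28, 41),
    "gp28 start 37": (42, 104),
}

if __name__ == "__main__":
    for label, (calls, total) in STARTERATOR_COUNTS.items():
        print(f"{label}: {calls}/{total} = {call_rate(calls, total)}%")
```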
CDS 21561 - 22400 /gene="28" /product="gp28" /function="Cas4 exonuclease" /locus tag="VroomVroom_28" /note=Original Glimmer call @bp 21561 has strength 14.11; Genemark calls start at 21492 /note=SSC: 21561-22400 CP: yes SCS: both-gl ST: SS BLAST-Start: [exonuclease [Arthrobacter phage Amyev]],,NCBI, q1:s1 98.2079% 5.54276E-172 GAP: 233 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.96, -4.669046746593814, yes F: Cas4 exonuclease SIF-BLAST: ,,[exonuclease [Arthrobacter phage Amyev]],,UIW13443,93.0909,5.54276E-172 SIF-HHPRED: DUF3799 ; PDDEXK-like domain of unknown function (DUF3799),,,PF12684.10,74.552,99.0 SIF-Syn: /note=Primary Annotator #1 Name: Pisipati, Kirthana /note=Auto-annotation: Glimmer lists a start site of 21561, while Genemark lists a start site of 21492. The start codon is GTG, which has a high probability. /note=Coding Potential: Host trained and self trained Genemark show fairly high coding potential in the third reading frame that extends beyond the putative ORF. There is some coding potential before the suggested start but it isn`t very high. /note=SD (Final) Score: The final score for start site 21561 is -4.669, which is the best SD score. /note=Gap/overlap: The gap is 233bp, which is a little large but is still reasonable, especially because the gap exists in other phage genomes and is too small to insert a gene. The start site with a longer ORF has a worse Z value and final score. The length of the gene at the autoannotated start site is 840bp. /note=Phamerator: This gene is in pham 67087 as of 2/8/23, and there are several members of this pham that belong to cluster AZ (Amyev, Kaylissa, Tweety19). This pham has 147 members, 36 of which are draft genomes. /note=Starterator: Start site 37 is conserved in 42 out of 104 non draft genomes, which corresponds to 21561bp. Start site 37 is the autoannotated start site, and the most manually annotated site. 
/note=Location call: The gene is real, and the most likely start site is 21561, according to other annotated phage genomes. This start site also has the best RBS final score and Z value, although it does not include all of the coding potential. /note=Function call: The predicted function is exonuclease. Other genes from the same pham and cluster called this as an exonuclease or Cas4 family exonuclease. BLASTp had several hits with low e values that suggested this function, including phages Amyev, Adumb, and BaileyBlu. While there were no hits from CDD, the HHpred output had hits with low e values suggesting a function of DNA exonuclease. There were no specifications from SEA-PHAGES to call this function. /note=Transmembrane domains: There were no transmembrane domains according to DeepTMHMM, which makes sense for the function of exonuclease, which would stay with DNA and not interact with bacterial cell membranes. /note= /note= /note=Primary Annotator #2 Name: Chawla, Esha /note=Auto-annotation: Glimmer Start Site: 21561, GeneMark Start Site: 21492. While both Glimmer and GeneMark called potential start sites, the start sites called by these two programs are slightly different – the proposed/auto-annotated/called start sites are approximately 69 base pairs apart. However, regardless of the start site, the start codon is, interestingly, the same: GTG. /note=Coding Potential: There is high coding potential contained in the third forward reading frame in both GeneMark Host and Self. Furthermore, from the GeneMark proposed start site of 21492 and Glimmer proposed start site of 21561, there is fairly high coding potential throughout almost the entirety of the gene, until the stop codon is reached at 22400. Thus, the chosen start site does cover all of this coding potential. As such, this gene is likely a real gene with a start site at either 21492 or 21561.
/note=SD (Final) Score: The currently best-proposed start site at 21561 has the best SD score, at -4.669. However, considering this SD score is not very close to 0 and the large inter-gene gap (discussed below), I think it is worth considering if a new start site needs to be proposed, which would also result in a better SD score. /note=Gap/overlap: For the currently best-proposed start site at 21561, the gap is 233 base pairs. This gap is not very reasonable to have as just empty space, as this gap is large enough to include an entirely new gene. While this inter-gene gap is large, the coding potential in this region is low, making it unlikely an entirely new gene should be inserted. As such, if this gap cannot be addressed by adding a new gene, the start site of this gene, at least, needs to be further upstream to make this gap slightly smaller. However, there are currently no proposed start sites that have a better SD or z-score, suggesting that we may need further primary bioinformatic analysis to discover this potentially new, further-upstream start site. /note=Phamerator: On the day of my investigation, 2/6/2023, this gene was found in Pham 67087. This Pham has 144 members, 40 of which are drafts. This gene is conserved in many other cluster AZ phages, including Adolin, Adumb2043, and Asa16. The phamerator and phams database called the function of this gene to be an exonuclease. /note=Starterator: The auto-annotated start site of 21561 is also the most manually annotated start site. It was manually annotated in 7 of 11 genes in this pham. The auto-annotated start site is 37, which has a corresponding base pair of 21561. This start site choice is a reasonable choice, as it is well-conserved among the members of the pham AZ, which this gene belongs to. /note=Location call: Taken together, the evidence is currently suggesting that this gene is real and has a start site of either 21492 or 21561.
This gene is likely real, as it is decently well-conserved in phamerator and has good coding potential. However, there is currently some discrepancy and uncertainty in regards to the start site. Of these two proposed start sites, 21561 is the more likely, as there is a better SD and z-score for this start site. However, with this start site, a very large inter-gene gap of 233 base pairs is found. Thus, 21561 is currently the best proposed start site, on the basis of the SD and z-score, but further primary bioinformatic analysis should be done on this gap to determine if there are even better start sites, so that this inter-gene gap is better addressed. Based on Module 4, the auto-annotated start site of 21561 is also the most manually annotated start site. It was manually annotated in 7 of 11 genes in this pham. /note=Function call: Based on the collected data, we can conclude that there is decently strong evidence that this gene is likely an exonuclease, potentially specifically acting in the Cas4 family. Because the e-values are fairly consistently low for genes found in the same cluster AZ, including Amyev (e-137), Baileyblue (e-171), and Adumb2043 (e-136), and because these phages also annotate this gene as an exonuclease, I think this is a strong match. It is also important to note that the identity values are fairly high, at higher than 80%. Thus, considering how small the e-values and how large the identity values are, I am fairly confident that the gene is an exonuclease. /note=Overall, I currently hypothesize that the function for this ORF is an exonuclease. BLASTp and PECAAN showed that the most likely function is an exonuclease, and this was determined with a very small e-value.
Moreover, while CDD did not propose any functions, HHpred also proposed that this ORF is likely an exonuclease, again with a very small e-value and a very high probability. As such, considering 3 different databases – BLASTp, PECAAN, and HHpred – are in agreement about this ORF’s function with low e-values and high probability, I think the function of the ORF is exonuclease. /note=Transmembrane domains: No transmembrane domains – the absence of TMDs does make sense in the context of the hypothesized function (exonuclease) for this gene. Considering the exonuclease needs to remain inside the cell, where it acts on DNA, the lack of TMDs does make sense in the context of this hypothesized function. CDS 22410 - 22715 /gene="29" /product="gp29" /function="hypothetical protein" /locus tag="VroomVroom_29" /note=Original Glimmer call @bp 22410 has strength 14.3; Genemark calls start at 22410 /note=SSC: 22410-22715 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Microbacterium resistens] ],,NCBI, q29:s4 71.2871% 7.04521E-6 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.285, -2.0111200136961407, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Microbacterium resistens] ],,WP_231820071,52.1127,7.04521E-6 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name #1: Reyimjan, Diana /note=Auto-annotation: Both Glimmer and Genemark have auto-annotated this gene. They agree on the same start site, which is 22410. The start codon called is ATG. /note=Coding Potential: The gene has good coding potential predicted within the 3rd ORF for both self-trained and host-trained Genemark. The chosen start site covers all the coding potential. /note=SD (Final) Score: -2.011. This is the least negative score and therefore the best. /note=Gap/overlap: 9bp, which is an acceptable gap. There is no coding potential before this gap that would suggest absence of a gene.
/note=Phamerator: This gene is in pham 68264, which is an orpham, so no evidence from phamerator can support this or any start site. /note=Starterator: Starterator is uninformative because this gene doesn`t belong to a pham. /note=Location call: While this gene is not found in any other phage genomes, the region has promising coding potential. The start site of 22410 was predicted by both Glimmer and GeneMark and has a good final score. There are no large non-coding gaps before the gene, which is encouraging because phage genomes are tightly packed. /note=Function call: BLAST, CDD, and HHpred could not yield any evidence that points to a known function for this gene. It’s possible that this gene came from a prophage due to a NCBI hit that corresponded to an Actinobacteria, but further research is required. /note=Transmembrane domains: 0 transmembrane domains predicted. Does not yield any additional information about function, and function is still unknown. There is a signal peptide region predicted, but this does not yield much information about function. /note= /note= /note=Primary Annotator Name #2: Critzer, Nicole /note=Auto-annotation: Used both Genemark and Glimmer to auto-annotate. Both agree on the start site being 22410. The start codon is ATG. /note=Coding Potential: There is strong coding potential predicted on the 3rd ORF. Although the start site and stop site encompass all the coding potential, one should note the potential stop before the stop site. /note=SD (Final) Score: -2.011 - least negative, so best sequence match and the corresponding ORF is the LORF /note=Gap/overlap: The gap is 9bp, this is very small and there is no coding potential preceding it to suggest another gene could fit in between. /note=Phamerator: Pham 68264 - aside from this, it is an orpham so phamerator cannot give us any other useful information. /note=Starterator: It is an orpham so starterator is uninformative because there are no other genes within the pham to compare it to. 
/note=Location call: 22410 - both Glimmer and Genemark agree on this call and it encloses all coding potential while having the least negative final score. /note=Function call:NKF - there is no strong evidence supporting any one function from any of the databases referenced (BLAST, CDD, HHpred) that being said the coding potential is strong within the region supporting that a gene is within the start and stop site. /note=Transmembrane domains: There is a signal peptide however this is not a transmembrane protein and 0 TMDs were predicted. CDS 22712 - 23095 /gene="30" /product="gp30" /function="HNH endonuclease" /locus tag="VroomVroom_30" /note=Original Glimmer call @bp 22712 has strength 10.67; Genemark calls start at 22712 /note=SSC: 22712-23095 CP: yes SCS: both ST: NI BLAST-Start: [HNH endonuclease, partial [Dehalococcoidia bacterium]],,NCBI, q2:s4 85.8268% 1.36684E-12 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.253, -4.413617268040186, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease, partial [Dehalococcoidia bacterium]],,RPJ55164,48.7603,1.36684E-12 SIF-HHPRED: HNH homing endonuclease; HNH catalytic motif, Helix-turn-helix DNA binding domain, protein-DNA complex, DNA binding protein-DNA COMPLEX; HET: EDO; 2.92A {Bacillus phage SPO1} SCOP: d.4.1.3, d.285.1.1,,,1U3E_M,90.5512,99.9 SIF-Syn: This gene is most closely aligned to genes found in cluster E. The best synteny for this gene was found with ABCcat, e-value 3e-11, Badstone, e-value 3e-11, and Dusk, e-value 3e-11. /note=Primary Annotator #1: Robles, Angel /note=Auto-annotation: Both Glimmer and GeneMark call the gene and they agree on the start site at 22712 bp. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.414. This is the best Z-score for this gene given the gap size. /note=Gap/overlap: The gap is -4 which is small and reasonable. 
/note=Phamerator: Pham number 68714 has 148 members, 9 are drafts. 02/07/2023. It is found in TBrady12_109, Lilizi_106. /note=Starterator: The start number called the most often in the published annotations is 25; it was called in 108 of the 139 non-draft genes in the pham. Found in 58 of 111 ( 52.3% ) of genes in pham. Called 100.0% of time when present. Found in 109 of 148 ( 73.6% ) of genes in pham. However, VroomVroom_30 Start: 22712, Stop: 23095, Start Num: 46 /note=Candidate Starts for VroomVroom_30: /note=(8, 22532), (46, 22712), (96, 22946), /note=Location call: Real gene with start site at 22712; possibly part of an operon. /note=Function call: The top three PhagesDB BLAST hits have the function of HNH endonuclease (E-value = 10^-11), and 5 out of 5 top NCBI BLAST hits also have the function of HNH endonuclease (E-value < 10^-12). HHpred had a hit for HNH endonuclease with 99.9% probability, 90% coverage, and an E-value of 1.4e-22. CDD had a hit for HNH endonuclease (E-value 1.8e-10) with 43% identity, 60% aligned, 34% coverage. /note=Transmembrane domains: TMHMM does not predict any TMDs, therefore it is not a membrane protein. /note= /note=Primary Annotator #2 /note=Name: Dawson, Niels /note=Auto-annotation: Glimmer and Genemark agree. The start site is also agreed upon. /note=Coding Potential: There is great coding potential. Glimmer and Genemark agree. The start site is at 22712. /note=SD (Final) Score: -4.414 /note=Gap/overlap: -4, indicating this gene is likely part of an operon. /note=Phamerator: Pham number 68714 has 148 members, 9 are drafts. 02/07/2023. It is found in TBrady12_109, Lilizi_106. /note=Starterator: The most commonly called start was 25, and it was found in 108 of 139 genes in the pham. VroomVroom had a different start, 46, which was the same as the auto-annotated start. /note=Location call: This gene may be part of an operon, as the gap is -4, and the start site is called by both Glimmer and GeneMark.
It also includes all the coding potential. /note=Function call: In the PhagesDB BLAST results, there were many hits (checked below) that point to this protein being an HNH endonuclease. 5 out of 5 hits on NCBI BLAST had this gene as an HNH endonuclease. CDD had one hit for HNH endonuclease as well. These hits had good E-values (10^-10 and below). HHpred also called this gene as an HNH homing endonuclease, with an E-value on the order of 10^-22. /note=Transmembrane domains: No TMDs predicted by TMHMM and this is not a membrane protein. CDS 23092 - 23409 /gene="31" /product="gp31" /function="hypothetical protein" /locus tag="VroomVroom_31" /note=Original Glimmer call @bp 23092 has strength 17.3; Genemark calls start at 23092 /note=SSC: 23092-23409 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Arthrobacter sp. B2a2-09]],,NCBI, q1:s1 98.0952% 2.61523E-46 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.174, -4.753679153475695, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Arthrobacter sp. B2a2-09]],,WP_269998110,79.4393,2.61523E-46 SIF-HHPRED: SIF-Syn: /note=Primary Annotator #1 Name: Rodriguez, Justin /note=Auto-annotation: The gene was called by both Glimmer and GeneMark, and the calls were aligned. They both agree on the same start site of 23092. GTG is the start codon. /note=Coding Potential: Yes, there is reasonable coding potential as shown by the Glimmer and GeneMark graphs. Coding potential covers the proposed length of the gene. /note=SD (Final) Score: -4.754, which is reasonable. /note=Gap/overlap: -4, which is reasonable; shows that it is part of an operon. /note=Phamerator: 9051, 2/6/2023. 7 pham members total, 3 are drafts. One other draft genome (Emotion) in the AZ cluster is in the pham. No gene function called. /note=Starterator: There is a reasonable start site and it is conserved in one other draft genome (Emotion). Start site number is 4 and it corresponds to the base pair coordinate of 23092.
This gene does not call the most annotated site, but 6 other members are in this pham and two call the most annotated site. Starterator is not very informative in this way. No start is held by a majority of the pham members. /note=Location call: The start site is likely 23092. This is most likely a real gene taking coding potential and calls from Glimmer and GeneMark (23092 called for both) into consideration. Starterator reports 23092 as a start site as well so it is likely valid. /note=Function call: The predicted function is NKF as significant hits (e-values of less than e-15) from NCBI or PhagesDB BLASTs represent hypothetical proteins. CDD and HHPRED have no hits at all, so little information on function can be worked with. /note=Transmembrane domains: The absence of TMDs does not inform much about the function of this protein since the protein has NKF at the moment. Based on pham maps in PECAAN, there is a gene directly upstream that codes for an HNH endonuclease. Therefore, this gene may have a similar function and would support there being no TMDs. /note= /note=Primary Annotator #2 Name: Milena, Deal /note=Auto-annotation: Both Glimmer and Genemark agree on the start of 23092 and stop of 23409. The start codon is GTG, which is one of the more common start codons. /note=Coding Potential: Both Glimmer and Genemark show high coding potential for this gene throughout the gene. The gene encompasses all the coding potential. /note=SD (Final) Score: -4.754 which is decent. /note=Gap/overlap: 5 base pair overlap, which suggests that this gene is most likely part of an operon. /note=Phamerator: This pham has 7 members. 3 of the phages are drafts. The only other phage in cluster AZ is Emotion, which is a draft genome. /note=Starterator: The start site called the most is 2. 2 out of 4 phage genomes call it, so this is not that helpful. Auto-annotation called start site 4 rather than 2. 
The most chosen start site does not appear to be present for our phage, so I think the auto-annotated start site is the best option. The auto-annotated start site, start site 4, is at 23092. /note=Location call: 23092 is the best possible start site for this real gene. It encompasses all the coding potential in Glimmer and GeneMark, and it is the only suggested start site present. Although it is not the most called start site in Starterator, the pham is very small and one draft genome (Emotion) also called this start site. The Z-score is good and is 2.174. /note=Function call: I do not think we have enough information to hypothesize the function. There are very few hits that have protein functions and those that do have higher E-values in BLASTp. CDD had no hits. For HHpred, all the hits had high E-values (the lowest E-value for a protein of known function is 79), so HHpred was not informative. Therefore, our protein is NKF. /note=Transmembrane domains: DeepTMHMM did not show any transmembrane domains. Therefore, these results did not help decide on a function for this protein. CDS 23412 - 23807 /gene="32" /product="gp32" /function="hypothetical protein" /locus tag="VroomVroom_32" /note=Original Glimmer call @bp 23412 has strength 16.21; Genemark calls start at 23412 /note=SSC: 23412-23807 CP: yes SCS: both ST: SS BLAST-Start: [lipoprotein [Arthrobacter phage SilentRX]],,NCBI, q1:s1 96.9466% 1.79703E-50 GAP: 2 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.881, -3.6727816321281046, yes F: hypothetical protein SIF-BLAST: ,,[lipoprotein [Arthrobacter phage SilentRX]],,QWY82786,71.8519,1.79703E-50 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Rosenbluh, Riley /note=Auto-annotation: Glimmer and GeneMark were used to annotate; both indicated the start site to be 23412. /note=Coding Potential: Minimal overlap observed; the start covers the coding potential sufficiently. /note=SD (Final) Score: -3.673 /note=Gap/overlap: 2 observed /note=Phamerator: The Pham is 55263 (2/20/2023).
The analysis was run on 1/27/2023 on Database version 501. The Pham was not found in any other AZ cluster phages. It has been called as a lipoprotein in some other phages, but there is no consistently called function. /note=Starterator: /note=Location call: /note=Function call: /note=Transmembrane domains: /note= /note= /note=Primary Annotator Name: Douglas, Katherine /note=Auto-annotation: 23412 was called by both Glimmer and Genemark. /note=Coding Potential: Good coding potential in the forward direction on the self-trained GeneMark. Although there were some suspicious results on the host-trained GeneMark, there seems to be strong coding potential in at least the middle region of the gene, and the bit of coding potential in the reverse direction is not strong enough to refute this conclusion. There is no synteny, but if this gene were not real there would be a large gap in the genome, so it likely does exist. Additionally, there are strong BLAST hits within the same pham. The chosen start site covers all coding potential. /note=SD (Final) Score: -3.673 (least negative value) /note=Gap/overlap: 2 (very small gap) /note=Phamerator: Pham 55263 as of 2/6/2023. Pham was not found in any other AZ cluster phages. Called as a lipoprotein in a few phages but many do not call a function. /note=Starterator: Start site #108 was found in 153/323 non-draft genomes but VroomVroom did not contain this start site. The next most annotated start site that VroomVroom contained was site #110. Although this was only in 21 other phages, it was called 100% of the time when present. /note=Location call: This is a real gene with a probable start site at 23412. There is coding potential in the region which is covered by this start site. Although this was not the highest annotated start site among the phages with this gene, it was annotated in 11 other phages and, when it was present, was picked to be the start site 85% of the time.
VroomVroom did not contain the most annotated start site, though the chosen start site was very near it. There is a gap of 2, which is an acceptable gap. The start codon is ATG, which has a high probability. /note=Function call: No known function. Although there were several hits in both PhagesDB and NCBI, the hits did not have any known functions and, in the case of NCBI, did not even correspond to other phages. Rather, they corresponded to some of the host bacteria. There was only 1 phage that assigned a function and, since there was no additional evidence to back up this claim, this protein was assigned NKF. CDD had no hits. HHpred had some hits but these had poor E-values (more than 10^-5) and coverage under 45%. Since many of these also had unknown functions or belonged to proteins not in bacteriophages, these results did not provide any evidence for a proposed function. /note=Transmembrane domains: 0 predicted domains. There is no known function for this protein so it is reasonable that there are no TMDs.
CDS 23804 - 24178 /gene="33" /product="gp33" /function="nucleoside deoxyribosyltransferase" /locus tag="VroomVroom_33" /note=Original Glimmer call @bp 23804 has strength 7.87; Genemark calls start at 23828 /note=SSC: 23804-24178 CP: yes SCS: both-gl ST: SS BLAST-Start: [nucleoside 2-deoxyribosyltransferase [Arthrobacter phage Liebe] ],,NCBI, q1:s1 83.0645% 1.16188E-31 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.532, -3.627004739933808, no F: nucleoside deoxyribosyltransferase SIF-BLAST: ,,[nucleoside 2-deoxyribosyltransferase [Arthrobacter phage Liebe] ],,YP_009817060,57.5758,1.16188E-31 SIF-HHPRED: NUCLEOSIDE 2-DEOXYRIBOSYLTRANSFERASE; active site, alpha/beta protein, biocatalyst, nucleoside, TRANSFERASE; HET: 5MD; 2.4A {Lactobacillus leichmannii} SCOP: c.23.14.1,,,1F8Y_A,97.5806,99.6 SIF-Syn: [nucleoside deoxyribosyltransferase, upstream gene has NKF, downstream gene is LAGLIDADG endonuclease, just like in phage Janeemi] /note=Primary Annotator Name #1: Sacristan, Ariana /note=Auto-annotation: Glimmer and Genemark both call the gene, but they don't agree on the start site. Glimmer indicates 23804 as the start (GTG) and Genemark indicates 23828 as the start (ATG). /note=Coding Potential: This gene demonstrates reasonable coding potential within the putative ORF, which contains both candidate start sites. /note=SD (Final) Score: The Final score for the selected start site is -3.627. Although this was not the best score, this piece of data can be ignored, as the 4 bp overlap suggests that this gene is part of an operon. /note=Gap/overlap: Yes, the overlap of 4 is reasonable as it indicates that the gene is part of an operon where the start codon overlaps with the upstream stop codon. This start site also includes the true LORF. /note=Phamerator: As of 02/06/2023 this gene is located in Pham 68744. This pham has several other AZ cluster group members, such as Janeemi and Phives, which were used to compare against VroomVroom for synteny.
The function consistently called in Phamerator for this gene was nucleoside deoxyribosyltransferase. /note=Starterator: There is a reasonable start site choice that is conserved within Pham 68744. It is Start 32, which is called in 22/85 non-draft genes in the pham. However, the start site for this gene is called at Start 33, which is very close to the "most annotated" start, suggesting an evolutionary change. /note=Location call: Collectively the evidence suggests that this is a real gene with the start site most likely located at 23804. /note=Function call: The predicted function of this ORF is a nucleoside deoxyribosyltransferase. NCBI and PhagesDB BLASTp provided several strong hits; the top two hits from each database have nucleoside deoxyribosyltransferase as the called function, with corresponding E-values below 2e-28. HHpred also had numerous hits for a nucleoside deoxyribosyltransferase. A top PDB HHpred hit for nucleoside deoxyribosyltransferase has a probability of 99.58%, a coverage of 97.58% and an E-value of 1.7e-13. Additionally, there was a CDD hit to an RCL superfamily corresponding to nucleoside 2-deoxyribosyltransferase function, with an E-value of 9.89e-03. /note=Transmembrane domains: There were no predicted TMDs by TMHMM, therefore it is not a membrane protein. /note= /note=Primary Annotator Name #2: Estampa, Julia /note=Auto-annotation: Glimmer and GeneMark don't agree on the start site. Glimmer suggested the start site is at 23804bp, while GeneMark indicated it is at 23828bp. Glimmer's reported start codon is GTG, while GeneMark's is ATG. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. Host-Trained and Self-Trained GeneMark demonstrate strong coding potential that is consistent with the ORF. /note=SD (Final) Score: The SD score is -3.627, which is not the best from the list.
However, since this gene has an overlap of 4bp, this suggests that this may possibly be an operon and thus the SD score may be ignored. /note=Gap/overlap: The gap is -4 bp, which suggests an overlap of 4 bp. It is small, reasonable, and also possibly suggests the presence of an operon. The length of the gene is acceptable. /note=Phamerator: Pham: 68744. Date found: 02/07/23. It is conserved with another draft phage, phage Adolin (AZ), phage Asa16 (AZ), and phage YesChef (AZ). Phamerator consistently called the function for this gene to be a nucleoside deoxyribosyltransferase. /note=Starterator: Start 33 @23804 has 1 manual annotation, while start 64 @23828 has 4 manual annotations. However, start 33 @23804 has the LORF and covers more coding potential than start 64. /note=Location call: The gathered evidence suggests that this gene is real and that the start site is most likely located at 23804 bp. /note=Function call: Nucleoside deoxyribosyltransferase. PhagesDB BLAST and NCBI BLASTp both called the function for multiple phages with this gene to be a nucleoside deoxyribosyltransferase. Results from PhagesDB BLAST revealed good scores and reasonably low E-values (top hit score is 131 and the E-value reported is 5e-31). Top hit score for NCBI BLASTp is 119 with an E-value of 1e-31. The second top hit in CDD yielded nucleoside 2-deoxyribosyltransferase with an E-value of 9.89e-03 (Accession number COG3613 ). Top 4 hits in HHpred revealed unknown function, while the 5th hit suggested nucleoside 2-deoxyribosyltransferase with an E-value of 1.7e-13, score of 85.05, probability of 99.58%, and 15% identities. /note=Transmembrane domains: Because there are no predicted TMHs or TMDs returned from DeepTMHMM, this suggests it is not a membrane protein. 
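Worked check of the gap/overlap values (not part of the original notes): the GAP values reported for these forward-strand genes are consistent with start minus upstream stop minus 1, on 1-based inclusive coordinates. The sketch below is a minimal illustration under that assumption; the helper name gap_bp is hypothetical, and the coordinates are copied from the CDS lines for genes 32 to 34.

```python
# Hypothetical helper: gap (positive) or overlap (negative), in bp, between
# two adjacent forward-strand genes with 1-based inclusive coordinates.
def gap_bp(prev_stop: int, start: int) -> int:
    return start - prev_stop - 1

# (gene number, start, stop) copied from the CDS lines for genes 32-34.
genes = [(32, 23412, 23807), (33, 23804, 24178), (34, 24175, 24591)]

# Pair each gene with its upstream neighbor and report the gap.
for (_, _, prev_stop), (gene, start, _) in zip(genes, genes[1:]):
    print(f"gene {gene}: {gap_bp(prev_stop, start)} bp")
```

Both genes 33 and 34 come out at -4, matching the GAP fields in their records: the start codon begins on the upstream gene's stop coordinate minus 3, so the two ORFs share 4 bp.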
CDS 24175 - 24591 /gene="34" /product="gp34" /function="LAGLIDADG endonuclease" /locus tag="VroomVroom_34" /note=Original Glimmer call @bp 24175 has strength 9.82; Genemark calls start at 24175 /note=SSC: 24175-24591 CP: yes SCS: both ST: NI BLAST-Start: [LAGLIDADG family homing endonuclease [Herbiconiux moechotypicola]],,NCBI, q9:s2 93.4783% 2.45015E-66 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.532, -4.3934175870462076, yes F: LAGLIDADG endonuclease SIF-BLAST: ,,[LAGLIDADG family homing endonuclease [Herbiconiux moechotypicola]],,WP_259478894,89.1473,2.45015E-66 SIF-HHPRED: RRNA intron-encoded endonuclease; protein-DNA complex, LAGLIDADG, homing, endonuclease, DNA recognition, HYDROLASE-DNA COMPLEX; 2.5A {Vulcanisaeta distributa},,,3E54_A,78.2609,99.4 SIF-Syn: LAGLIDADG endonuclease; upstream gene is nucleoside deoxyribosyltransferase just like in phages Janeemi and Tuck, although VroomVroom has a different downstream gene than these phages. Downstream gene appears to be inserted between LAGLIDADG endonuclease and its downstream gene, recombination directionality factor. This insertion site appears shared, for example phages Liebe and Maureen have genes inserted at this site. /note=Primary Annotator #1 Name: Gowdy, Griffin /note=Auto-annotation: Glimmer and Genemark; same start site @ 24175; start codon: ATG /note=Coding Potential: Reasonable, agreed upon between self- and host- trained algorithms. Chosen start included. /note=SD (Final) Score: Final score = -4.393. However, this score is likely irrelevant, as stop@24591F is likely transcribed with its upstream neighbor. /note=Gap/overlap: There is an overlap of -4, which is reasonable for a polycistronic operon. /note=Phamerator: Pham 60911; 2/7/23. This pham is well conserved in clusters AZ and EH. Currently, there are 42 non-draft members of this pham, including in phages Powerpuff, Reedo, DrSierra, and Amyev. The function endonuclease is consistently called for this gene. 
/note=Starterator: VroomVroom does not share the most conserved start site (start 23, shared by 28/42 manually annotated genes), and multiple possible starts exist equidistant from the most conserved. Therefore, Starterator is uninformative /note=Location call: All together, there is strong evidence that this gene is real, and that the start site is at 24175. This start includes all coding potential, results in the longest ORF, uses a favorable start codon, results in a favorable overlap, and is generally well supported by the available evidence. /note=Function call: LAGLIDADG endonuclease. Multiple BLASTp hits with favorable e-values and % identities, a hit in CD search, and numerous strong hits in HHpred support calling this gene as an LAGLIDADG endonuclease. No requirements are listed for this function, however upon manual inspection, the sequence LAGLLEGEG is present in this region. While the actual sequence has diverged, the character of each amino acid remains consistent. PhagesDB and NCBI BLASTp hits had low e-values (less than e-57) and % identities > 35%. Further, traits mentioned here https://seaphages.org/forums/topic/5401/ are all displayed: the LAGLLEGEG motif is preceded by an aromatic amino acid, and this gene displays the requisite secondary structure of alpha-beta-beta-alpha-beta-beta-alpha (per HHpred). /note=Transmembrane domains: None per DeepTMHMM. This supports the functional call of LAGLIDADG endonuclease, because homing endonucleases must be soluble to locate their recognition sequence for cleavage. /note= /note=Primary Annotator #2 Name: Scriven, Savannah /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 24175. ATG start codon. /note=Coding Potential: The ORF has high coding potential on the forward strand in both GeneMark Self and Host, indicating that this is a real forward gene. Start site includes all coding potential. /note=SD (Final) Score: -4.393. 
This is the best final score on PECAAN, although SD score is irrelevant as this gene is part of an operon. Z score for start is highest at 2.532. /note=Gap/overlap: Upstream overlap of -4 and downstream gap of 0 indicate gene is part of an operon. /note=Phamerator: (02/06/23) Pham 60911 is in 27 of 40 cluster AZ phages (e.g. Janeemi, Liebe, Maureen, DrSierra). Function in every non-draft genome is called as either an endonuclease, LAGLIDADG endonuclease or HNH endonuclease. /note=Starterator: Not Informative, VroomVroom does not have most conserved start site. /note=Location call: Based on above evidence, this is a real gene and the most likely start site is 24175. Start site includes all coding potential, is the LORF, and leaves reasonable gap lengths characteristic of an operon. /note=Function call: Predicted function is endonuclease based on hits from NCBI BLAST and phagesDB BLASTp. PhagesDB BLAST has many strong hits with the suggested function with small e values of 8e-58 to 1e-50. Top hits had 83% identity and 87% coverage. NCBI had similar strong hits calling the same function with 75 - 83% identity, 94% coverage, and e values 9e-66 to 5e-72. There was one CDD hit calling a LAGLIDADG domain, which indicates this gene encodes an endonuclease belonging to the LAGLIDADG family. HHPRED had strong hits (>99.5% probability, >74% coverage) calling homing endonuclease function with low e values (9e-14 to 3e-12). Hits from CDD and HHPRED aligned with unique SEAPHAGE guidelines used to call function as LAGLIDADG endonuclease. /note=Transmembrane domains: DeepTMHMM predicts 0 TMDs. Not a membrane protein. 
CDS 24591 - 25298 /gene="35" /product="gp35" /function="ThyX-like thymidylate synthase" /locus tag="VroomVroom_35" /note=Original Glimmer call @bp 24591 has strength 13.86; Genemark calls start at 24591 /note=SSC: 24591-25298 CP: yes SCS: both ST: SS BLAST-Start: [thymidylate synthase [Gordonia phage Neville] ],,NCBI, q7:s5 97.0213% 1.19405E-70 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.638, -4.174392870950849, no F: ThyX-like thymidylate synthase SIF-BLAST: ,,[thymidylate synthase [Gordonia phage Neville] ],,YP_010245955,65.3846,1.19405E-70 SIF-HHPRED: Thymidylate synthase ThyX; Tetramer, UMP/dUMP methylase, ThyX homolog, TRANSFERASE; HET: 5BU, FAD; 1.76A {Streptomyces cacaoi subsp. asoensis} SCOP: d.207.1.0,,,4P5A_A,99.1489,100.0 SIF-Syn: ThyX-like thymidylate synthase, the upstream gene is LAGLIDADG endonuclease, downstream is likely recombination directionality factor, just like in phage Kaylissa. The actual gene is not present in any other cluster AZ final annotations. /note=Primary Annotator Name: Ismael Sindha, /note=Auto-annotation: 24591 start site /note=Coding Potential: strong through entire ORF /note=SD (Final) Score: -4.174 /note=Gap/overlap: -1 /note=Phamerator: /note=Starterator: /note=Location call: /note=Function call: /note=Transmembrane domains: /note= /note=Primary Annotator Name: Hamid, Bilal /note=Auto-annotation: Start sites at 24591 for both programs /note=Coding Potential: Strong coding potential shown throughout the whole ORF. /note=SD (Final) Score: -4.174 is the highest final score /note=Gap/overlap: -1 bp gap indicates a likely operon /note=Phamerator: 02/07/23 - pham is 118, which includes 379 members, 35 of which are drafts; none of the members are from the same cluster. Some other clusters represented are BD, BK, C, CD, GD, EL, DU, FC. Across the pham, the gene is almost exclusively (>90%) called as a ThyX-like thymidylate synthase, with occasional calls as just ThyX or thymidylate synthase covering all other instances.
/note=Starterator: Indicates start site 57 @24591 as the auto-annotated site. Likely the best site as it overlaps by only 1 bp with the previous gene. Start 58 was the most often called site in 238/339 genes. This gene did not have start site 58. Site 57 was called 66.7% of the time when it appeared, for 3 total MAs. /note=Location call: 24591 is likely the best call based on the collection of Starterator and Phamerator data as well as coding potential. /note=Function call: ThyX-like thymidylate synthase based on very strong CDD and HHpred hits indicating a very strongly conserved function. The CDD hit indicated a FAD-dependent ThyX thymidylate synthase with an E-value of 9.15e-56. While the best PDB hit called the function a UMP/dUMP methylase PolB, the best Pfam hit indicated Thymidylate synthase complementing protein with 100% probability and an E-value of 1.9e-31. /note=Transmembrane domains: 0 TMDs were called by DeepTMHMM, which makes sense as this function does not require membrane binding. CDS 25446 - 26147 /gene="36" /product="gp36" /function="recombination directionality factor" /locus tag="VroomVroom_36" /note=Original Glimmer call @bp 25446 has strength 16.82; Genemark calls start at 25446 /note=SSC: 25446-26147 CP: yes SCS: both ST: SS BLAST-Start: [recombination directionality factor [Arthrobacter phage Tuck]],,NCBI, q1:s1 100.0% 5.61151E-123 GAP: 147 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.541, -4.119075175954406, yes F: recombination directionality factor SIF-BLAST: ,,[recombination directionality factor [Arthrobacter phage Tuck]],,WAB10807,82.7004,5.61151E-123 SIF-HHPRED: Gp3-like ; Recombination directionality factor-like,,,PF18897.3,87.5536,100.0 SIF-Syn: /note=Primary Annotator #1: Smith, Steven /note=Auto-annotation: Glimmer and GeneMark both display the same start site.
/note=Coding Potential: There is coding potential shown on both the host-trained and self-trained GeneMark. /note=SD (Final) Score: -4.119; while it is not the best final score shown, it is still very good. /note=Gap/overlap: There is a 147 bp gap, which is very large but also very similar to other gaps in the genomes of other AZ cluster phages, so I do not think adding another gene is necessary. /note=Phamerator: 2/7/23. Pham 848, which is found in multiple other AZ cluster phages like Adolin and Adumb2043. /note=Starterator: Start 35 is the start called for this gene and is also the most commonly called start, at 46 of 99 non-draft genomes. /note=Location call: Looking at the above evidence, this call is most likely a functional gene starting at 25446. /note=Function call: NKF. PhagesDB and NCBI BLAST returned multiple hits with low E-values that pointed towards this protein being a recombination directionality factor. However, HHpred only gave one result and CDD gave zero, which is not significant enough evidence to make a function call, so this gene should be labeled as NKF. /note=Transmembrane domains: No TMDs were called by TMHMM, so this gene does not encode a transmembrane protein. /note= /note=Primary Annotator #2: Hernandez, Edgar /note=Auto-annotation: Glimmer and GeneMark both display the same start site #25446, with the start codon ATG. /note=Coding Potential: There is a reasonable amount of coding potential present in both the Host-Trained and Self GeneMark. The chosen start site includes all the coding potential. /note=SD (Final) Score: There’s an SD Final Score of -4.119, which is indicative of a higher sequence match. Meanwhile, the Z-score is 2.541, which is good since anything higher than 2 indicates that the RBS was above the mean. /note=Gap/Overlap: A gap of 147 indicates that there’s no overlap present, and although the gap is on the larger side, the gap seems to be somewhat conserved when compared to the other genes in the genome.
/note=Phamerator: The gene is located in Pham 848, and there are other AZ cluster group members like Brahms and Dismas, which were used to compare synteny with VroomVroom. The function of the gene is unknown as of now (NKF). /note=Starterator: There’s strong evidence suggesting that the start site 35 at #25446 is conserved across members of Pham 848 because 46 out of 99 final genes have claimed it as a real start site. /note=Location call: Based on all the pieces of evidence gathered from Phamerator, Starterator, synteny comparison, and coding potential, the possible start site for the gene is #25446. /note=Function call: PhagesDB and NCBI BLASTp both returned significant hits with recombination directionality factor as the gene function. Hits Tuck_33 and Yang_30 yielded significantly low E-values when examined within the AZ cluster. Additionally, CDD did not provide any useful information seeing as no hits were found. However, PDB HHpred suggested a hit with 100% probability that also possessed a small E-value. SEA-PHAGES does not require the presence of any protein to claim the recombination directionality factor as the gene function, and thus the gene function would also be NKF. /note=Transmembrane Domains: TMHMM predicted 0 transmembrane domains and as a result, the gene is definitely not a membrane protein since 2 transmembrane domains are required. CDS 26147 - 26347 /gene="37" /product="gp37" /function="membrane protein" /locus tag="VroomVroom_37" /note=Original Glimmer call @bp 26147 has strength 17.96; Genemark calls start at 26147 /note=SSC: 26147-26347 CP: yes SCS: both ST: NA BLAST-Start: GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.131, -4.455662434380964, yes F: membrane protein SIF-BLAST: SIF-HHPRED: SIF-Syn: Ryan: Synteny does not tell us much additional information, as the upstream and downstream genes (Genes 36 and 38 respectively), were both genes of unknown function.
Comparing the synteny to phages Adolin and Ascela also provided little additional information. Kaemin: Synteny does not give us any information about the genes, as both the upstream and downstream genes (genes 36 and 38, respectively) are also of unknown function. Comparing the gene with other phages provides little information as well. /note=Primary Annotator Name: Tosasuk, Kaemin /note=Auto-annotation: Both GeneMark and Glimmer agree that there is a start site at 26147. /note=Coding Potential: There is good coding potential on both the self-trained and host-trained GeneMark from the start site 26147. /note=SD (Final) Score: -4.456, which is the least negative value. /note=Gap/overlap: Overlap of -1, which is the smallest out of all the possible values. There is also /note=Phamerator: 2/6/2023. The pham number was 61906, with no other members within this Pham. /note=Starterator: Starterator was not helpful as there are no other members of the Pham. /note=Location call: This location for the start site does seem accurate as it has the overall highest final score, Z-score, and the smallest possible overlap/gap, which is also under 4 bp. /note=Function call: NKF; all matches, including those found on HHpred, CDD, and BLASTp, had high E-values. These results indicate that this protein's function is most likely NKF. /note=Transmembrane domains: 2 TMDs, each 20 amino acids in length, as per DeepTMHMM. /note= /note=Annotator #2: Hoang, Ryan /note=Auto-annotation: Both GeneMark and Glimmer state that there is a start site at 26147. /note=Coding Potential: There is coding potential in both the Host-trained and Self-trained GeneMark maps. Both are in the 2nd ORF. However, it looks like there is some tapering off of coding potential before the stop sequence. Still, it appears that there is sufficient coding potential to say that there is a real gene.
There is coding potential that starts right after the suggested start site, indicating that the start site encapsulates the coding potential of this gene. /note=SD (Final) Score: -4.456 and Z-score of 2.131. This is higher than the alternative start sites present, indicating this is the strongest candidate for a start site. /note=Gap/overlap: There is a 1bp overlap between the upstream gene and this gene. There is also a 3bp overlap between the downstream gene as well. These are very reasonable overlaps in general, indicating that the start site that was selected is sensible. /note=Phamerator: Phamerator was run on 2/6/2023. The pham number was 61906. There were no other members of the pham, suggesting that it was an orpham. /note=Starterator: Starterator was not informative because the pham was an orpham. /note=Location call: Overall, this location of the start site does indeed look accurate. It has the highest final score and Z-score. Furthermore, any overlaps are reasonable, and below the 4bp requirement present. /note=Function call: NKF; While HHpred did predict that there was some indication that this gene may be a membrane and cell surface protein from E. coli, the E-values were rather high, with E-values of 4.8 and 8.9 for the highest scores. Furthermore, there were no other hits on Blast from NCBI or Blast on PhagesDB, indicating that this is most likely best to be called as a NKF. /note=Transmembrane domains: There were no transmembrane domains indicating that this is not a membrane protein. 
CDS 26344 - 26469 /gene="38" /product="gp38" /function="hypothetical protein" /locus tag="VroomVroom_38" /note=Original Glimmer call @bp 26344 has strength 9.26; Genemark calls start at 26344 /note=SSC: 26344-26469 CP: yes SCS: both ST: SS BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.532, -6.4610646886663305, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator #1 Name: Tran, Krysten /note=Auto-annotation: Both Glimmer and GeneMark call the gene and agree on the same start site at 26344; start codon called: GTG /note=Coding Potential: The gene does have reasonable coding potential predicted within the putative ORF and the start site does cover all the coding potential. /note=SD (Final) Score: -6.461; The start site does not have the best RBS score, as it is the more negative score out of the two start sites. However, I still believe it is the better start because the gap of -4 bp is more reasonable than the gap of -259 bp. /note=Gap/overlap: There is a 4 bp overlap between this gene’s suggested start and the previous gene’s stop. This is not the longest ORF, as the longest ORF has a start at 26089; however, it is the most reasonable ORF because the overlap is only 4 bp while the other start has an overlap of 259 bp. Therefore, it is more likely that the start at 26344 is the start to this gene, as the other start site may indicate a separate gene. Additionally, a 4 bp overlap is indicative of an operon, which is a more reasonable justification for an overlap at the start site. /note=Phamerator: The gene is found in pham 48566 as of 02/08/23. The gene is conserved in only one other member of cluster AZ, the draft phage Emotion. /note=Starterator: There are no other non-draft genes in the pham. Start number 4 is at position 26344 in this phage. This start site had no manual annotations and is consistent with the start sites called by both Glimmer and GeneMark.
/note=Location call: Based on the above evidence, the start site is most likely 26344 (start codon: GTG). /note=Function call:Based on the BLAST results for the sequence in addition to the CDD and HHpred data, there is not enough data to hypothesize a function. PhagesDB BLASTp only had two results, one was VroomVroom and the other was a draft gene in Emotion. For Emotion, the function was labeled as unknown, which was uninformative in forming a hypothesis about the gene function. Additionally, there were no results in NCBI BLASTp. CDD had no hits at all for this gene. For HHpred, none of the hits meet all the requirements for being a good hit (values are not in the target range listed on the slides). The majority of the results have a low probability, low percent coverage, and high E-values. I was not able to identify any related genes or their functions to hypothesize a function for this gene. /note=Transmembrane domains: There are NO TMDs present for this protein based on the text and the topology graph as well as the probability graph. The absence of TMDs makes sense because there is no hypothesized function for this gene. /note= /note=Primary Annotator Name #2: Hughes, Audia /note=Auto-annotation: Both; Agree on start site 26344; Start codon GTG /note=Coding Potential: Reasonable Coding Potential; Start Site includes all coding potential /note=SD (Final) Score: -6.46, RBS score negligible as gap is -4 basepairs suggesting an operon. /note=Gap/overlap: -4, length of gene acceptable given start acceptable. Chose this option over candidate with longer ORF as the longer ORF candidate had an unreasonable gap (-259) /note=Phamerator: Investigation Date; 2-3-23, pham # 48566. Gene is present in genome draft Emotion of same cluster AZ. No function called for gene. /note=Starterator: Conserved start site: 4, coordinate: 26344, 2/2 call site #4, MEMBERS ARE ALL DRAFT ANNOTATIONS. 
/note=Location call: 26344 potential start site most likely, coding potential suggests real gene. 2 start site candidates were proposed, start site with the smallest overlap was chosen. Gene candidate with overlap -4 favored over gene candidate with overlap -259 /note=Function call: No program returned informative results. No CDD, No NCBI, all HHPRED have e-value above 19, phagesdb blast contains only a draft gene for comparison that has no known function. /note=Transmembrane domains: No transmembrane proteins detected. Does not assist in supporting the hypothesis of function as there is no known function of the gene in question. CDS 26557 - 26892 /gene="39" /product="gp39" /function="NrdH-like glutaredoxin" /locus tag="VroomVroom_39" /note=Original Glimmer call @bp 26557 has strength 15.59; Genemark calls start at 26557 /note=SSC: 26557-26892 CP: yes SCS: both ST: SS BLAST-Start: [glutaredoxin-like protein NrdH [Devriesea agamarum]],,NCBI, q6:s3 67.5676% 1.52088E-10 GAP: 87 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.037, -2.970161406017234, yes F: NrdH-like glutaredoxin SIF-BLAST: ,,[glutaredoxin-like protein NrdH [Devriesea agamarum]],,WP_058234513,55.1724,1.52088E-10 SIF-HHPRED: c.47.1.0 (A:) automated matches {Baker`s yeast (Saccharomyces cerevisiae) [TaxId: 559292]} | CLASS: Alpha and beta proteins (a/b), FOLD: Thioredoxin fold, SUPFAM: Thioredoxin-like, FAM: automated matches,,,SCOP_d2m80a_,64.8649,99.1 SIF-Syn: NrdH-like glutaredoxin, upstream gene is Holliday junction resolvase and downstream gene is function unknown, just like in phage Elezi /note=Primary Annotator Name: Unanwa, Nnaemeka /note=Auto-annotation: Start site at 26557 in both Glimmer and GeneMark. ATG codon predicted at this site, which is a common start codon. /note=Coding Potential: Yes, there is reasonable coding potential where the gene is located on the forward strand. The Glimmer and GeneMark start site covers the entire coding potential. /note=SD (Final) Score: -2.970. 
This is the least negative RBS final score, making it the most likely start site. /note=Gap/overlap: 87 bp. This is reasonable because there is little space for another gene to be inserted and this gap shows synteny with other phages. /note=Phamerator: Pham 68549 on 01/27/23. Conserved in VResidence_33 and Warda_33. /note=Starterator: The most annotated start in the Pham is start 109, but start 109 does not exist for our specific phage. The most annotated start in our phage's gene is start 99, with 7 MAs. This corresponds to the start at 26557. /note=Location call: This is most likely a real gene with a start at 26557, based on synteny, coding potential, and agreement with the annotation guidelines. /note=Function call: This gene is likely a NrdH-like glutaredoxin protein. The top non-draft PhagesDB hits (Jinkies_45 and DS6A_74) did not have strong e-values but still pointed to the NrdH-like glutaredoxin protein. The top hits from the NCBI BLASTp are from Devriesea agamarum and Bifidobacterium phasiani, which also encode glutaredoxin-like protein NrdH (e-values of 2e-10 and 3e-10, respectively). There is one specific hit on CDD for a NrdH family protein (e-value 2.76e-10) and a HHpred hit, SCOP_d2m80a_, for a yeast dithiol glutaredoxin protein (e-value 1.8e-8). This makes me fairly confident that this gene is a NrdH-like glutaredoxin protein due to the low e-values of the results. /note=Transmembrane domains: No transmembrane domains detected on DeepTMHMM, meaning that this is not a membrane protein. /note= /note=Primary Annotator #2 Name: Kim, Cindy /note=Auto-annotation: Glimmer and GeneMark both called the start site at 26557 bp. /note=Coding Potential: There is good coding potential on the forward reading frame only, indicating that it is a forward gene (on both Host and Self). /note=SD (Final) Score: -2.970. This is the best Final score on PECAAN. /note=Gap/overlap: Gap: 87 bp.
This is a reasonable gap and appears to be conserved in other phage genomes as well (YesChef and Powerpuff). /note=Phamerator: Pham 68549 on 2/3/23. It is conserved, found in Adonis and Adumb (AZ). /note=Starterator: The most annotated start site was site 109, with 142 out of 853 non-draft genes calling this, but this site does not exist in this gene. Instead, start site 99 corresponds to a start site at 26557 bp, which is in agreement with the Glimmer and GeneMark prediction. Start site 99 was manually annotated 7 times. /note=Location call: Based on the above evidence, this is a real gene with a likely start site at 26557 bp. /note=Function call: NrdH-like glutaredoxin. PhagesDB BLAST’s top two hits were draft genomes with unknown function but low E-values (8e-63 to 9e-29), but the next top hits from non-draft genomes had a function of NrdH/NrdH-like glutaredoxin (worse E-values of 5e-9 to 1e-8). The top hits from NCBI BLASTp also yielded functions of NrdH-like glutaredoxin with 41-45% identities. CDD has a hit with 70.27% coverage and an E-value of 2.46e-10 that corresponds to a NrdH-like glutaredoxin. HHpred showed a hit with a thioredoxin function with a 99.1 probability. /note=Transmembrane domains: DeepTMHMM predicted zero TMDs. Therefore this gene does not encode a membrane protein.
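The gap/overlap values quoted in these records can be reproduced from the CDS coordinates alone. The following is a minimal sketch, assuming the convention gap = start - previous stop - 1 for adjacent forward-strand genes, which matches the PECAAN values recorded for genes 39 (87 bp) and 40 (-14 bp). The function and variable names are illustrative only and are not part of PECAAN, DNA Master, or any other annotation tool.

```python
def gap_bp(prev_stop: int, start: int) -> int:
    """Number of bases between the previous gene's last base and this
    gene's first base. A negative value means the genes overlap."""
    return start - prev_stop - 1


def adjacent_gaps(genes):
    """Map each gene (except the first) to its gap relative to the
    preceding gene in the list. Assumes forward-strand genes sorted by start."""
    return {cur[0]: gap_bp(prev[2], cur[1]) for prev, cur in zip(genes, genes[1:])}


# (name, start, stop) copied from the CDS lines of genes 38, 39, and 40
GENES = [
    ("gp38", 26344, 26469),
    ("gp39", 26557, 26892),
    ("gp40", 26879, 27325),
]

GAPS = adjacent_gaps(GENES)

if __name__ == "__main__":
    for name, gap in GAPS.items():
        label = "gap" if gap >= 0 else "overlap"
        print(f"{name}: {abs(gap)} bp {label}")
```

A negative result corresponds to an overlap, so the -14 returned for gp40 is the 14 bp overlap discussed in that gene's notes.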
CDS 26879 - 27325 /gene="40" /product="gp40" /function="Holliday junction resolvase" /locus tag="VroomVroom_40" /note=Original Glimmer call @bp 26879 has strength 11.25; Genemark calls start at 26879 /note=SSC: 26879-27325 CP: yes SCS: both ST: SS BLAST-Start: [holliday junction resolvase [Arthrobacter phage BaileyBlu]],,NCBI, q1:s1 92.5676% 2.33187E-61 GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.615, -5.907662168305453, no F: Holliday junction resolvase SIF-BLAST: ,,[holliday junction resolvase [Arthrobacter phage BaileyBlu]],,UJQ87175,81.4286,2.33187E-61 SIF-HHPRED: Holliday junction resolvase; archeal holliday junction resolvase helicase DNA binding enzyme phage 15-6 thermus thermophilus, RECOMBINATION; HET: MSE, SO4; 2.5A {Thermus thermophilus phage 15-6},,,7BGS_B,76.3513,99.6 SIF-Syn: Holiday junction resolvase, upstream gene is NrdH-like glutaredoxin, just like in phage Asa16. Downstream gene is NKF belonging to an orpham. /note=Primary Annotator #2 Name: Kretschmer, Thomas /note=Auto-annotation: GLIMMER and GENEMARK both predict the start site to be 26879. The start codon is predicted to be ATG. /note=Coding Potential: There is strong evidence of coding potential in this gene with the predicted start site covering the entirety of the coding potential. /note=SD (Final) Score: The above start site`s final score is -5.908 which is not the best. This start site is still fairly reasonable, but there appears to be a more reasonable start site at 26978 with a final score of -3.348. /note=Gap/overlap: With the suggested start site, there is a 14 bp overlap. This is fairly large, the alternative being to start at 26978 with an 85bp gap, the smallest gap without any overlap. The length of this gene is reasonable. /note=Phamerator: 62018 as of 2/10/2023. This pham is conserved within the cluster as noted by phages: Adolin_35, Adumb2043_16, and Amyev_35. The function of this pham is Holliday Junction resolvase. 
/note=Starterator: There is a reasonable start site that is conserved at 26879. The start codon is ATG. /note=Location call: The evidence suggests this is a real gene, and starts at 26978. Start site 71; 31 of 237 manual annotations call this start site. /note=Functional call: Holliday junction resolvase /note=Transmembrane domains: None, this is marked as an "inside" gene. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: /note= /note=#2 /note=Primary Annotator Name: Vajragiri, Shreya /note=Auto-annotation: Both Glimmer and GeneMark call the gene, both agreeing the start site is 26879. /note=Coding Potential: There is strong coding potential in both the Self-trained and Host-trained GeneMark; coding potential is on the forward strand, therefore it is a forward gene. /note=SD (Final) Score: -5.908; this is the 4th best RBS score. There were better RBS scores, but those starts either left gaps upstream of the start site where there is strong coding potential or created even larger overlaps. /note=Gap/overlap: -14 bp. This is a fairly large and unusual overlap. However, all alternate start sites either widened the overlap or created large gaps where there was strong coding potential. Further, slight overlaps (~3 bp) have been conserved between this gene and the preceding one (e.g. Liebe). /note=Phamerator: 1/27/23 - Pham is 62018. Gene is conserved in phages Liebe, KeAlli, Adolin, and most phages in AZ cluster. Function call is Holliday junction resolvase, which aligns with other members of the Pham in the AZ cluster (Liebe, etc.) /note=Starterator: The start site is called to be 71 by 55/302 genes in the Pham, and manually called by 31/237. It is called 98.2% of the time when present, and is the start site for most Phams belonging to AZ phages. It corresponds to start position 26879. Agrees with Glimmer/GeneMark. /note=Location call: Evidence supports that this is a real gene, and its start site is 26879 (ATG).
/note=Functional call: PhagesDB, NCBI Blast, and HHPred supports that this protein is a holliday junction resolvase. Multiple PhagesDB and NCBI BLAST hits for other phages, with e-values of around 3e-50, and many HHPred hits with high probabilities and identities. /note=Transmembrane domains: DeepTMHMM does not call any transmembrane domains. It does say that it is an ‘inside’ protein. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS complement (27322 - 27510) /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="VroomVroom_41" /note=Genemark calls start at 27510 /note=SSC: 27510-27322 CP: yes SCS: genemark ST: NA BLAST-Start: [hypothetical protein SEA_ITER_38 [Arthrobacter phage Iter]],,NCBI, q4:s8 88.7097% 0.00865706 GAP: 124 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.037, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ITER_38 [Arthrobacter phage Iter]],,URQ05026,36.7816,0.00865706 SIF-HHPRED: SIF-Syn: /note=Primary Annotator #1 Name: Vanderpool, Lauren /note=Auto-annotation: GeneMark, start 27510 /note=Coding Potential: There is sufficient coding potential in the reverse direction, without any overlap. /note=SD (Final) Score: -5.345, which correlates to the best z-score listed above. /note=Gap/overlap: 2 bp, this is well below the recommended max gap /note=Phamerator: The pham number is 69987 (as of 2/8/23). It is the only gene in that pham. /note=Starterator: The report has not been made /note=Location call: 27657 /note=Function call: There are very few hits, so the function is unknown based on lack of evidence /note=Transmembrane domains: /note= /note= /note=Primary Annotator #2 Name: Le, Vivian /note=Auto-annotation: GeneMark calls for a start of 27510. Glimmer does not call a start. /note=Coding Potential: There is reasonable coding potential in the reverse direction. There does not seem to be any overlap with genes in the reverse direction either. 
/note=SD (Final) Score: -2.443. It is the best final score on PECAAN. /note=Gap/overlap: 124 bp. This is a reasonable gap and there is no coding potential in the gap to suggest a new gene should be added. /note=Phamerator: The pham number as of 02.08.2023 is 69987. It is the only gene in that pham. /note=Starterator: A report was not found/has not been made. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 27510. However, since it is the only gene in the pham and there was no starterator report, the current suggested start site is not informative enough. /note=Function call: For now, the function is unknown, because there are very few to little hits. /note=Transmembrane domains: CDS 27635 - 30127 /gene="42" /product="gp42" /function="DNA primase/helicase" /locus tag="VroomVroom_42" /note=Original Glimmer call @bp 27635 has strength 12.48; Genemark calls start at 27635 /note=SSC: 27635-30127 CP: yes SCS: both ST: SS BLAST-Start: [DNA primase [Arthrobacter phage DrManhattan] ],,NCBI, q1:s1 100.0% 0.0 GAP: 124 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.362, -3.898968357315439, no F: DNA primase/helicase SIF-BLAST: ,,[DNA primase [Arthrobacter phage DrManhattan] ],,YP_009815380,85.1852,0.0 SIF-HHPRED: DNA primase; Helicase, DNA binding, AMPPNP, REPLICATION; HET: ANP; 3.1A {Staphylococcus aureus},,,7OM0_B,43.8554,100.0 SIF-Syn: DNA primase/helicase, upstream and downstream genes are both NKF, just like in Reedo. /note=Primary Annotator #1 Name: Vu, Thomas /note=Auto-annotation: GeneMark and Glimmer both agree that the start site is at 27635. The start codon is ATG. /note=Coding Potential: There is high gene coding potential across the region suspected to be the real gene. The self-trained and host-trained GeneMarks agree with this conclusion as well. /note=SD (Final) Score:-3.899 (2nd best). 
The gene candidate corresponding to this SD score is still the best possible candidate since it still has a very strong SD score but does the most to minimize the gap between this gene and the one before it. /note=Gap/overlap: Gap of 124 BP, indicating there is no overlap and the spacing of 124 BP is reasonable. There is also synteny with other genes on Pham Maps. This start candidate is the most favorable since it has a strong SD score but minimizes the gap between the previous gene. The length of the gene is acceptable. /note=Phamerator: The pham is 880 and the analysis was run on 01/27/23. The pham is in other clusters besides AZ such as EB and BL. The phages compared against VroomVroom were AbbeyMikolon and Abigail phages. /note=Starterator: The start site is conserved among other members of the pham. The start site number is 42 which corresponds to a BP of 27635. There are 134 members in this pham of which 97 are non-draft genomes. 50/97 call this start site of 42. /note=Location call: This is a real gene with a likely start site of 27635 BP. /note=Function call: The function of this gene is likely DNA primase/DNA helicase activity. This is supported by the CDD database. For example, a protein hit was recorded as the COG3378 superfamily which had an e-value of 7.25e-88 and coverage of ~55% described the function as being a phage-associated DNA primase. This strong e-value as well as a strong score provides evidence that our gene, which shares conserved domains with COG3378, also has a similar function of being a DNA primase. Likewise, the HHpred supported the conclusion that the gene’s function is DNA primase as there were multiple strong hits (such as 8APM with an e-value of e-31 and a probability of 100) that also described their protein’s function as “DNA primase/helicase”. Lastly, NCBI Blast contained multiple hits at 100% coverage and an e-value of 0, supporting the gene function of DNA primase/helicase. 
/note=Transmembrane domains: No TMDs were predicted by TMHMM or TOPCONS, so no transmembrane function is inferred. /note= /note=Primary Annotator #2 Name: Li, Anna /note=Auto-annotation: Both Glimmer and GeneMark; agree at the same start site (site #: 27635); start codon called: ATG and GTG (2 TTG) /note=Coding Potential: Yes. The gene has reasonable coding potential. The chosen start site covers all of the coding potential. /note=SD (Final) Score: -3.899 (2nd best, reasonable to suggest a credible RBS) /note=Gap/overlap: +124 bp (Upstream gap is reasonable considering synteny with other final genomes) /note=Phamerator: 880 as of 2023-02-08, other non-draft members of cluster AZ contained this pham (e.g. Amyev, Asa1) /note=Starterator: Most manually annotated start site 42 in 50/97 non-draft genomes. Start site 42 @27635 in VroomVroom. /note=Location call: Gene is most likely real; start site 27635 seems most likely based on coding potential, the Glimmer and GeneMark start site calls, and Starterator /note=Function call: DNA primase/helicase: CDD hits call for DNA primase and related activity with low E-values, and thus are more likely to be positive hits for the function of this gene. HHpred hits predicted both primase and helicase function with low E-values (<10^-18), high probability (>99%) and acceptable coverage (>40%) using 7OM0_B and PF04735.15. No additional information is necessary to call the DNA primase/helicase function. Top hits from PhagesDB and NCBI BLAST gave E-values of 0, with all genes giving DNA primase/helicase hits (Adolin, DrManhattan, YP_009815380). /note=Transmembrane domains: TMHMM does not call any transmembrane domains in this region of the genome.
CDS 30137 - 30256 /gene="43" /product="gp43" /function="hypothetical protein" /locus tag="VroomVroom_43" /note=Original Glimmer call @bp 30137 has strength 17.24; Genemark calls start at 30137 /note=SSC: 30137-30256 CP: yes SCS: both ST: NI BLAST-Start: GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.126, -4.325675514574479, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=#1 /note=Primary Annotator Name: Wang, Xinyi /note=Auto-annotation: Glimmer and GeneMark both indicate the start codon at 30137. /note=Coding Potential: There is coding potential in both forward and reverse directions, but the forward direction is slightly better than the reverse one. However, since the gene is only 120 bp long, further examination is needed to find more evidence. /note=SD (Final) Score: -4.326 /note=Gap/overlap: For the start codon, a 9 bp gap before the previous gene stop site, and a 1 bp gap before the last codon stop. For the stop codon, 1 bp overlaps with the next gene's start codon. /note=Phamerator: Pham 48297, 2/6/2023 /note=Starterator: I agree with the autoannotated start site at 30137 because the Starterator report is not informative due to the lack of published genes in this pham. Also, the autoannotated start site has a good z-score and final score. /note=Location call: 30137, real gene /note=Function call: Both BLASTp programs, CDD, and HHpred provide little confident information on the protein, which makes it difficult to hypothesize the function. This situation is expected for a newly identified gene, given that the sequence is short and no synteny is displayed. The function needs to be validated through wet lab experiments instead of database hits. /note=Transmembrane domains: The website result corresponds to the PECAAN result, which shows that the protein isn’t a membrane protein.
/note= /note= /note= /note=#2 /note=Primary Annotator Name: Li, Mulin /note=Auto-annotation: Both Glimmer and GeneMark call and agree on the start site. /note=Coding Potential: Both self-trained and host-trained GeneMark predict high coding potential at the called gene. /note=SD (Final) Score: -4.326 /note=Gap/overlap: The upstream gap is 9 bp and the downstream gap is -58. /note=Phamerator: This gene product belongs to Pham 48297, which includes two phages in total. Both are draft genomes and belong to the AZ cluster. The Phamerator report was generated on 01/27/23. /note=Starterator: The most conserved start site is an autoannotated start (2, 30851). This start site is not supported by manual annotation. This start site is supported by a high RBS score and GeneMark coding potential reading. /note=Location call: This is probably not a real gene. The gene is shorter than 40 codons, and there is severe overlap with the next called gene position. /note=Function call: NKF; This gene does not receive functional support from PhagesDB BLAST, NCBI BLAST, HHpred, or CDD analysis. All analyses report unknown function, suggesting that this gene might be a novel gene. /note=Transmembrane domains: Neither TMHMM nor TOPCONS predicts transmembrane domains for this gene.
/note=Secondary Annotator Name: Li, Mulin /note=Secondary Annotator QC: CDS 30256 - 30390 /gene="44" /product="gp44" /function="hypothetical protein" /locus tag="VroomVroom_44" /note=Original Glimmer call @bp 30256 has strength 4.49 /note=SSC: 30256-30390 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein SEA_CREWMATE_45 [Arthrobacter phage Crewmate]],,NCBI, q1:s1 90.9091% 1.1823E-13 GAP: -1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.285, -6.204365880072431, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CREWMATE_45 [Arthrobacter phage Crewmate]],,UIW13297,87.5,1.1823E-13 SIF-HHPRED: SIF-Syn: /note=Primary Annotator #1 Name: Wu, Grace /note=Auto-annotation: Only Glimmer called the gene, with a start at 30256. /note=Coding Potential: There is an ORF on the self-trained and host-trained GeneMark maps, but the coverage is low. We can see low coding potential in both self-trained and host-trained GeneMark. /note=SD (Final) Score: -6.204; this is not the best final score, as start 30199 has a score of -5.092. /note=Gap/overlap: The overlap is 1 bp at start 30256, a common overlap for an operon. The gene overlaps the previous gene by 1 bp, which is within the acceptable range, so no sequence needs to be deleted or added. /note=Phamerator: Pham 10251, in cluster AZ. This Pham has 6 members, with 3 draft genes. /note=Starterator: Pham 10251, with start 3, which is the most common (and the only) start that is conserved for this Pham. /note=Location call: Start 30256 (ATG) is the best-supported start. This start does not have a good final score or z-score and is not the LORF, but it has a low overlap (1 bp), which is strong evidence that it is a better candidate than start 30199, which has an overlap of 58 bp. /note=Function call: The function is hypothesized as NKF because there is no CDD hit, and HHpred indicated unknown function on the highest hits. After running TMHMM, we ruled out the possibility of it being a membrane protein.
The best hypothesis for the function of this gene is NKF /note=Transmembrane domains: 0 /note= /note=Primary Annotator #2 Name: Lim, Madeleine /note=Auto-annotation: Glimmer: 30256; no Genemark /note=Coding Potential: Low /note=*important note: more coding potential in reverse region /note=SD (Final) Score: -6.204 (lower score than the start @30199) /note=Gap/overlap: -1 (less overlap than the start @30199, thus making @30256 a more likely candidate) /note=Phamerator: 10251 /note=Starterator: Start 3 @30256 has 3 MAs /note=Location call: 30256 /note=Function call: BLAST results were inconclusive; the top three genes with the lowest e-value were marked as "hypothetical protein". The two closest HHpred matches to the protein sequence (6LKF_A and 7UNW_E) both have an e-value higher than 1 and very different functions; therefore this specific gene's function cannot be determined from HHpred. /note=Transmembrane domains: CDS 30595 - 32457 /gene="45" /product="gp45" /function="DNA polymerase I" /locus tag="VroomVroom_45" /note=Original Glimmer call @bp 30595 has strength 19.85; Genemark calls start at 30595 /note=SSC: 30595-32457 CP: no SCS: both ST: SS BLAST-Start: [DNA polymerase I [Arthrobacter phage Adumb2043]],,NCBI, q1:s1 100.0% 0.0 GAP: 204 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.274, -2.4811409279978642, yes F: DNA polymerase I SIF-BLAST: ,,[DNA polymerase I [Arthrobacter phage Adumb2043]],,QOP65098,87.1176,0.0 SIF-HHPRED: Apicoplast DNA polymerase; DNA polymerase, exonulease, apicoplast, Plasmodium falciparum, REPLICATION, TRANSFERASE; HET: PEG, EDO; 2.5A {Plasmodium falciparum (isolate 3D7)},,,7SXQ_A,96.9355,100.0 SIF-Syn: Arthrobacter phage Adumb2043 Arthrobacter phage Janeemi These phages have similar sequences compared to our genes. The Blast result shows a low E value and a good score.
/note=Primary Annotator Name #1: Zhu, Yichen /note=Auto-annotation: Gene (stop@32457 F) /note=Coding Potential: Good /note=SD (Final) Score: -2.481 /note=Gap/overlap: 204 /note=Phamerator: 68543 (1616 members) /note=Starterator: 215 (The start site is conserved among the Cluster AZ) /note=Location call: 30595 /note=Function call: DNA Polymerase I (All databases have good hits for this function) /note=Transmembrane domains: 0 /note= /note=Primary Annotator #2 Name: Martin, Kyle /note=Auto-annotation: Both Glimmer and GeneMark call this gene, and both start at 30595. /note=Coding Potential: The coding potential is in both the forward and reverse direction. The coding potential is similar in both the Host and Self-Trained GeneMark. /note=SD (Final) Score: -2.481 (best of all candidates) /note=Gap/overlap: The gap is 204 bp, so there is no overlap with the previous gene. The length of the gene is also reasonable. /note=Phamerator: The pham number is 68543. The analysis was run on 01/27/23. The pham was present in other members of the cluster AZ and was compared against phage Emotion. The function was unknown. /note=Starterator: The start site choice conserved among the members is 68556. The start site is number 216. /note=Location call: The gene is real and the start site is 30595. /note=Function call: The HHpred data supports the conclusion that the gene is an apicoplast DNA polymerase and the CDD data supports the conclusion that the gene is a DNA polymerase I. Both fall under DNA polymerase I.
/note=Transmembrane domains: Zero CDS 32510 - 32839 /gene="46" /product="gp46" /function="hypothetical protein" /locus tag="VroomVroom_46" /note=Original Glimmer call @bp 32510 has strength 11.81; Genemark calls start at 32510 /note=SSC: 32510-32839 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KEALII_41 [Arthrobacter phage KeAlii]],,NCBI, q1:s1 93.578% 6.28211E-18 GAP: 52 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.116, -4.697746622201849, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KEALII_41 [Arthrobacter phage KeAlii]],,UDL14647,62.3762,6.28211E-18 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Barden, Sophia /note=Auto-annotation: Glimmer and GeneMark agree on the designation for the start at 32510. /note=Coding Potential: Prominent and notable coding potential on both Host- and Self-trained GeneMark maps, in the forward direction, on this gene starting at 32510 and persisting until stop 32839. This indicates presence of a real gene with a start codon at 32510 and a stop at 32839. /note=SD (Final) Score: The RBS final score is -4.698, indicating that this gene candidate is the best supported of all the potential candidates provided. /note=Gap/overlap: There is a notable 52 bp gap prior to the designated start site of our gene. (A gene could potentially be added before it, but this is unlikely due to the lack of coding potential, the short length of the gap, and the lack of synteny indicating another gene.) /note=Phamerator: As of 2/7/2023, this gene was found in Pham 965. This Pham represents 121 members, and 36 are denoted as drafts. We see the presence of many phages in the AZ cluster containing the gene present in Pham 965. /note=Starterator: This analysis was run 01/27/23 on database version 501. The “Most Annotated” start for this gene in Pham 965 was start # 37, called in 83 of the 85 (97.6%) non-draft genes within the pham 965.
However, the auto-annotated start for VroomVroom_46, G(Stop@32839) was start #39, called in 1 of 121 (0.8%) of the genes in pham. We decided to accept the start number 39 at position 32510 due to the other evidence collected pointing towards this being the most likely and best possible start site. /note=Location call: Considering all the evidence collected above, we conclude the start site for the gene(stop@32839F) is at 32510. We conclude that this gene is real. /note=Function call: There is low conservation of this gene across the AZ cluster and few BLAST hits. We are unable to conclude the function of this gene based on BLAST results. There were no significant CDD or HHpred hits for this sequence. We conclude NKF. /note=Transmembrane domains: No evidence of Transmembrane Domains present from TMHMM prediction. We conclude that this is not a membrane protein. /note=Secondary Annotator Name: Martinez, Daniela /note=Secondary Annotator QC: /note= /note=Primary Annotator #2 Name: Martinez, Daniela /note=Auto-annotation: Both Glimmer and Genemark called this gene. Both algorithms also agreed on the same start at 32510. /note=Coding Potential: The region between the predicted start codon at 32510 and the stop codon at 32839 contains good coding potential in the forward direction. The region of coding potential covers the entire putative ORF in both Genemark and Glimmer. /note=SD (Final) Score: -4.698. This is the best final score as determined by PECAAN. /note=Gap/overlap: There is a 52 bp gap between this gene and the previous one. A gene does not need to be added in this region since other genomes with a gene within this gap have a larger one. There is not enough space for a gene to be added. /note=Phamerator: As of February 7, 2023, this gene is part of pham 965. /note=Starterator: There are 121 members in this pham, 36 of which are drafts.
The start predicted by Glimmer and Genemark is start site #39 on starterator and was found only in one of 121 genes in this pham. There are no manual annotations of this start. /note=Location call: Based on the previous evidence this is a real gene. The start site is at 32510. This is the most likely start since the other start options would create a large gap. The z-score and final score indicate this as the most likely start as well. /note=Function call: Both Phagesdb and NCBI BLAST results return unknown functions for this gene. Upon further analysis we have concluded there is no known function for this gene. /note=Transmembrane domains: There are no TMDs as observed on DeepTMHMM. This indicates this is not a membrane protein. CDS 33099 - 33911 /gene="47" /product="gp47" /function="DNA binding protein" /locus tag="VroomVroom_47" /note=Original Glimmer call @bp 33099 has strength 18.12; Genemark calls start at 33099 /note=SSC: 33099-33911 CP: yes SCS: both ST: SS BLAST-Start: [DNA binding protein [Arthrobacter phage Tuck]],,NCBI, q3:s2 99.2593% 4.57599E-50 GAP: 259 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.939, -3.2975203404035645, yes F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Arthrobacter phage Tuck]],,WAB10822,58.5821,4.57599E-50 SIF-HHPRED: RNA polymerase sigma factor RpoS; transcription initiation, Pseudomonas aeruginosa, RNA polymerase, sigmaS, SutA, RNAP beta lobe, open beta lobe, TRANSCRIPTION; 3.13A {Pseudomonas aeruginosa PAO1},,,7XL3_F,95.9259,100.0 SIF-Syn: Nguyen, Angelynn: [DNA binding protein, upstream is no known function and downstream is another DNA binding protein, just like in Phives] /note=Primary Annotator Name:Berber-Pulido,Rodrigo /note=Auto-annotation: Both Glimmer and Genemark agree that the start site is 33099 /note=Coding Potential: Both Host-trained and phagesdb genemark agree that this gene has very high coding potential throughout the entire gene. 
/note=SD (Final) Score: -3.298. This is not the lowest of the results; however, this start site is probably correct, as it covers all the coding potential while the candidate with the lowest score does not. /note=Gap/overlap: 259 bp. This is a big gap, which may be a reason to add another gene. Moreover, in the pham maps of the genomes we compared against, this gene and the next are often combined into a single gene, which is interesting. However, I do not think there is reason to add a gene, because in other genomes this gene often has a big gap before it, so I assume it is normal. /note=Phamerator: Part of pham 63728. Other phages in the AZ cluster that share this gene include Adolin_45, Adumb2043_42, and Amyev_44. The pham has 123 members, 38 of which are drafts. /note=Starterator: The most annotated start is 34, which was called in 35 of the 85 non-draft genes in the pham. For this gene, start (33, 33099) has 1 MA, more than the rest. I conclude that this gene starts at 33099, as it is the start that the auto-annotation called and it has one MA. /note=Location call: 33099 /note=Function call: RNA polymerase sigma factor has the strongest evidence, as the HHpred hits for this function show high similarity with significantly low e-values. While others may call it a DNA binding protein, I would argue that, given hits within the same cluster (AZ) and the lower e-value, this function is an RNA polymerase sigma factor. /note=Transmembrane domains: No hits indicating transmembrane domains. /note= /note=Primary Annotator 2 Name: Nguyen, Angelynn /note=Auto-annotation: Both Glimmer and GeneMark agree that the start is 33099. /note=Coding Potential: There is very good coding potential in the forward direction according to both host and self-trained GeneMark. /note=SD (Final) Score: -3.298. Even though this is not the highest final score, there is high coding potential from this start through the rest of the gene. 
/note=Gap/overlap: There is a 259 gap before and a 175 gap after the gene. Even though the gap is rather large, I don’t believe there is enough reason to add another gene. When comparing to the Pham maps of other phages (like Liebe), there is also a large gap before and after similar genes. /note=Phamerator: 63728, Date 2/6/23. This is in the AZ cluster. And the pham has 123 members with 38 drafts. It is conserved since it was found in other phages like Adolin_45 which is also in the AZ cluster. /note=Starterator: 63728, This analysis was run on 2/6/23 and it was confirmed that start number 33 is correct. However, the most annotated start number is 34 whereas the start site for vroomvroom is 33. But since it is called by both auto-annotation I believe that the start site is correct. /note=Location call: All the evidence supports that the gene is real and the start site is at 33,099. Although this region is not the most conserved from the Starterator data, the auto annotation and manual annotations agree that this should be the start site. /note=Function call: DNA binding protein. PhageDB BLASTp and NCBI BLAST both agree on this function, both with very low e-values (1.17e-5). The HHpred had the top 5 hits indicated that the function is the RNA polymerase sigma factor with a probability of 100%, a 95% identity, and a 1e-29 e-value. The CDD had 5 hits for some sort of RNA polymerase with a low e-value for all of them. Even though both indicate that this gene is an RNA polymerase sigma factor, the SEA-Phages functional assignments state that we should not name it that. And therefore it should be called a DNA binding protein instead. /note=Transmembrane domains: There are no transmembrane domains since the DeepTMHMM indicates that the gene is inside. 
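The gap figures quoted in the Gap/overlap notes can be cross-checked from the CDS coordinates alone. The sketch below is illustrative and not part of the original annotation record. It assumes that all listed genes are on the forward strand, that coordinates are 1-based and inclusive, and that a gap is the number of bases strictly between one gene's stop and the next gene's start. The names `CDS` and `gaps_between` are hypothetical, chosen only for this example; the coordinates are the CDS ranges for genes 46 to 53 as listed in this record.

```python
# Forward-strand CDS coordinates (gene number, start, stop), 1-based and
# inclusive, copied from the CDS lines of this record (genes 46-53).
CDS = [
    (46, 32510, 32839),
    (47, 33099, 33911),
    (48, 34087, 35061),
    (49, 35179, 36042),
    (50, 36162, 36767),
    (51, 36896, 37126),
    (52, 37268, 37579),
    (53, 37900, 38262),
]


def gaps_between(cds):
    """Return {gene: gap_bp} for every gene after the first in the list.

    The gap is the count of bases strictly between the previous gene's
    stop and this gene's start. A negative value would mean an overlap.
    """
    gaps = {}
    for (_, _, prev_stop), (gene, start, _) in zip(cds, cds[1:]):
        gaps[gene] = start - prev_stop - 1
    return gaps


if __name__ == "__main__":
    for gene, gap in gaps_between(CDS).items():
        print(f"gene {gene}: {gap} bp gap")
```

Under these assumptions the computed values agree with the GAP fields in the SSC summary lines for genes 47 to 53 (259, 175, 117, 119, 128, 141, and 320 bp). Where an annotator's note quotes a different figure, the coordinates offer a quick cross-check.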
CDS 34087 - 35061 /gene="48" /product="gp48" /function="DNA binding protein" /locus tag="VroomVroom_48" /note=Original Glimmer call @bp 34087 has strength 19.13; Genemark calls start at 34087 /note=SSC: 34087-35061 CP: yes SCS: both ST: SS BLAST-Start: [DNA binding protein [Arthrobacter phage BaileyBlu]],,NCBI, q15:s11 80.2469% 4.00098E-29 GAP: 175 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.755, -5.092497832398658, no F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Arthrobacter phage BaileyBlu]],,UJQ87182,52.0295,4.00098E-29 SIF-HHPRED: RNA polymerase sigma factor SigB; DNA-dependent RNA polymerase, alternative sigma, TRANSCRIPTION; 3.84A {Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)},,,7PP4_f,82.716,100.0 SIF-Syn: DNA binding protein. Adumb2043 also states the function of this protein is a RNA polymerase sigma factor, which is a DNA binding protein. Both genes upstream and downstream have NKF for VroomVroom, while other phages, such as Amyev, have functions listed for their nearest downstream genes, such as endonuclease. /note=Primary Annotator Name: Bursulaya, Isabelle /note=Auto-annotation: Both Genemark and Glimmer call the start at 34087 /note=Coding Potential: The coding potential for both the host and self trained gene-mark were very similar and both showed strong coding potential in the forward direction /note=SD (Final) Score: The final score is -8.653, which is not the best score on PECAAN but is in fact the worst /note=Gap/overlap: The gap is 141, which is large but not entirely unreasonable. However, when looking at the Pham Maps, this gene shows something very strange when compared to other phages. The gene is often tied (connected) to its corresponding gene on the other phage, but instead of only this gene being connected to it, there are two. This means that there was a duplication of the gene. /note=Phamerator: pham as of 2/6/2023 is 63728. 
It is conserved in Adumb2043 and Adolin, both of which are in the same cluster (AZ) /note=Starterator: Start site number 31, which corresponds to position 34087 and agrees with Glimmer and GeneMark, had no manual annotations. This was the auto-annotation. This gene is most likely a duplication. /note=Location call: Based on the evidence, this does look like a real gene, and it was just duplicated. I believe this is the correct start site as well. /note=Function call: I would hypothesize that the function of this gene/protein is a DNA binding protein. The top hits in the NCBI BLASTp (BaileyBlu, Janeemi, and DrSierra) listed the function as a DNA binding protein, with query covers of 80% and E values all less than 5e-29. The percent identities were fairly low, but were greater than 32%. Usually, anything 35% or greater is good, so I believe that 32% is close enough. For PhagesDB BLASTp, DrSierra and Crewmate had good E values (5e-32 and 8e-31 respectively) with fair identities (33% and 34% respectively). Their functions were also listed as DNA binding proteins. The alignments were fairly good, with mostly 80-200 bases being conserved on PhagesDB BLASTp. Both CDD and HHpred called the function mainly as a sigma factor for RNA polymerase, but this function is not on the Approved Functions List, which instead asks us to call the protein a DNA binding protein. /note=Transmembrane domains: No predicted TMDs by DeepTMHMM, so the protein is not a transmembrane protein. This makes sense based on the predicted function, because a DNA binding protein should remain inside of the cell in order to be able to bind to the DNA, and not be in the membrane of the bacteria. 
/note= /note=Primary Annotator Name: Nguyen, Mya /note=Auto-annotation: Glimmer and Genemark start at 34087 /note=Coding Potential: good coding potential in forward direction /note=SD (Final) Score: -8.653, bad score, most negative out of all the hits /note=Gap/overlap: 145, slightly large gap, but nothing of concern after looking at Pham maps. /note=Phamerator: 63728, 2/6/23. The pham that the gene is in also contains genes from other phages of the same cluster. /note=Starterator: Start number 31, which corresponds to start site 34087. No manual annotations. Start site 34087 matches the start site that Genemark and Glimmer called. /note=Location call: Most of the genes that we compared VroomVroom to had start number 34, but for our gene, the closest were start numbers 31 and 32. Because number 31 has the best final score without making the gene too short, I think that the actual start site is 34087. /note=Function call: Predicted function is DNA binding protein based on evidence from PhagesDb BLAST and NCBI BLAST hits where each of them had function calls of DNA binding protein and low e-values (less than e-28). /note=Transmembrane domains: No TMDs, which is reasonable since it is a DNA binding protein which cannot reside in the membrane. CDS 35179 - 36042 /gene="49" /product="gp49" /function="hypothetical protein" /locus tag="VroomVroom_49" /note=Original Glimmer call @bp 35179 has strength 19.72; Genemark calls start at 35179 /note=SSC: 35179-36042 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_CREWMATE_54 [Arthrobacter phage Crewmate]],,NCBI, q3:s4 99.3031% 2.25168E-47 GAP: 117 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.96, -2.6013996449736907, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CREWMATE_54 [Arthrobacter phage Crewmate]],,UIW13306,63.0996,2.25168E-47 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Chawla, Esha /note=Auto-annotation: Glimmer Start Site: 35179, GeneMark Start Site: 35179. 
Both Glimmer and GeneMark have chosen the same start site in this case, and thus, we should have good confidence in this start site. The start codon at position 35179 is ATG. /note=Coding Potential: There is high coding potential contained in the first forward reading frame in both GeneMark Host and Self. Furthermore, from the GeneMark and Glimmer proposed start site (35179), there is fairly high coding potential throughout almost the entirety of the gene and the putative ORF, from the start site at 35179 to the stop site at 36042. Thus, the chosen start site covers all of this coding potential. As such, this gene is likely a real gene with a start site at 35179. /note=SD (Final) Score: -2.601 – this SD score is the best of those presented on PECAAN. /note=Gap/overlap: 118 bp, this is a slightly large gap. However, considering there is not much good coding potential upstream of the currently proposed start site, I do not think an entirely new gene should be inserted or the start codon should be pushed further upstream. Further, other conserved genomes also have a similar gap presented upstream of this gene, which further suggests that the start site should not be changed or an entirely new gene should not be inserted. /note=Location call: 35179, but may potentially need to be further upstream, considering there is a 118 bp gap between the start site of this gene and the stop site of the previous gene /note=Phamerator: On the day of my investigation, 2/6/2023, this gene was found in Pham 67087. This Pham has 3 members, 2 of which are drafts. This gene is conserved in many other cluster AZ phages, including Tweety19, Liebe, and Maureen. The phamerator and phams database did not call a particular function for this gene. /note=Starterator: The auto-annotated start site of 35179 is also the most manually annotated start site. It was manually annotated in 2 of 3 genes in this pham. The auto-annotated start site is 37, which has a corresponding base pair of 35179. 
The start site is a reasonable choice, as it is well conserved among the cluster AZ members of the pham this gene belongs to. /note=Location call: Taken together, the gathered evidence suggests that this gene is a real gene and has a start site at 35179. This gene is a real gene, as it is conserved in phamerator and has good coding potential. The most likely potential start site candidate is at position 35179, as it is well conserved in starterator and covers all of the coding potential. There is not much coding potential upstream of the currently-proposed start site. /note=Starterator Drop-Down Menu (see end of PECAAN Notes Instructions): SS “suggested start” /note=Coding Potential Drop-Down Menu (see end of PECAAN Notes Instructions): yes /note=Function call: Based on the collected data, we cannot conclusively determine the function of this gene. When comparisons of this sequence were made in the phagesDB platform, the smallest e-value was obtained with Crewmate_54, an A. globi phage in cluster AZ, whose function is unknown. Moreover, the e-value was not very small, at just 1e-50. All other phages had even larger e-values than Crewmate_54. Moreover, when the analysis was run through NCBI Blast, all observed results were with hypothetical, or draft, proteins. Even with these hypothetical proteins, the percent identity was also fairly low, with the highest being 42.16%. As such, considering the e-values are fairly large, the percent similarity is fairly low, and only hypothetical proteins/drafts appear, we cannot conclusively determine the function of this gene or its produced protein. /note=Based on all of the data I have collected from BLASTp, PECAAN, CDD, and HHpred, it appears there is a lack of agreement on the function of this gene. BLASTp and PECAAN were unable to give a conclusive function. Similarly, CDD proposed that there are no identifiable conserved domains. 
Finally, HHpred did not have any hits that had low e-values – the smallest e-value was 5.1, which is fairly high. Thus, because all of the programs used were unable to provide a conclusive function for this ORF, this ORF’s current function is NKF. /note=Transmembrane domains: No transmembrane domains – unable to conclude if this is in-line with the function, as the current function of this gene is NKF. /note= /note=Primary Annotator #2: Okumura, Joey /note=Auto-annotation: Glimmer and Genemark used → agree on start 35179 with ATG codon /note=Coding Potential: start site covers good coding potential in first forward ORF → similar results in host and self trained GeneMark (some coding potential in third reverse ORF but worse than forward + does not cover entire coding region with suggested start site of 35179) /note=SD (Final) Score: best FS of -2.601 (z score of 2.96) /note=Gap/overlap: 117 = on the larger side but most genes have minimum of 120 bp /note=Phamerator: Pham 69612. Date 2/07/2023. Only 2 other pham members: FP draft phage and singleton nondraft phage. /note=Starterator: Start site 2 in Starterator was called in 1/1 non-draft genes in the pham. Start 1 is 35179 in VroomVroom. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on above evidence, this is likely a real gene with the start site at 35179. /note=Function call: NKF. PhagesDB BLAST and NCBI Blast had various results with good e values (90%) such as Tweety19 and Crewmate, but the function was unknown. No hits in CDD + lowest e value for hits in HHpred was 5. /note=Transmembrane domains: Not a transmembrane protein. 
/note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 36162 - 36767 /gene="50" /product="gp50" /function="SprT-like protease" /locus tag="VroomVroom_50" /note=Original Glimmer call @bp 36162 has strength 15.23; Genemark calls start at 36162 /note=SSC: 36162-36767 CP: yes SCS: both ST: SS BLAST-Start: [SprT-like protease [Arthrobacter phage Iter]],,NCBI, q7:s2 95.5224% 3.09155E-104 GAP: 119 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.96, -2.6013996449736907, yes F: SprT-like protease SIF-BLAST: ,,[SprT-like protease [Arthrobacter phage Iter]],,URQ05034,83.3333,3.09155E-104 SIF-HHPRED: SprT-like domain-containing protein Spartan; DPC repair protease, DNA BINDING PROTEIN; HET: ADP, MLZ, FLC; 1.5A {Homo sapiens},,,6MDW_A,51.2438,99.7 SIF-Syn: /note=Primary Annotator Name: Critzer, Nicole /note=Auto-annotation: Used both Glimmer and Genemark for auto-annotation. Both agree on the start site being 36162 with the start codon ATG. /note=Coding Potential: The coding potential is reasonable and is within the start and stop site. Its presence in the 3rd ORF confirms it is a forward gene. /note=SD (Final) Score: -2.601, which is the least negative score so it has the best sequence match. /note=Gap/overlap:The gap is 119 bp - although it is a little suspicious that the gap is just below the minimum 120 bp for a gene to be present, when we compare with other synteny maps the gap remains relatively conserved among other phages (about 50-150 bp) and the given gap is the smallest of the other proposed start sites. /note=Phamerator: The gene is in pham 1210 on 2/7/23 conserved in Crewmate and DrManhattan in cluster AZ. /note=Starterator: (Start: 45 @36162 has 3 MA`s) on 1/27/23 on database version 501. Does not contain most conserved start site 46, so starterator auto annotated start 45@36162. This start site 45 agrees with Glimmer and Genemark and includes all the coding potential. It is also called 100% of the time when present. 
This plus the fact that there are 3 manual annotations makes me agree that the autoannotated start site is reasonable. /note=Location call: 36162 - both Glimmer and Genemark agree, it has the best final score, and this start site encompasses all the coding potential /note=Function call: SprT-like protease - the hits with some of the lowest e-values and high percent identity for the BLAST hits were KeAlii_44 (4e-83) and Reedo_46 (2e-83) which both function as SprT-like proteases. Both Phagesdb and NCBI BLAST agreed on this. Other hits with low e-values would be Adolin (2e-104) and Iter (3e-104) which are also SprT-like proteases and have percent identity around 76%. HHpred also agrees with this call in that it gave both a strong PDB hit and Pfam hit related to the SprT-like domain. /note=Transmembrane domains: 0, there were 0 predicted TMDs so the evidence suggests this is not a transmembrane protein /note= /note=Primary Annotator # 2 Name: Ortiz-Gomez, Diana /note=Auto-annotation: Glimmer and Genemark. Both show the start site at 36162. /note=Coding Potential: Coding potential is seen in the forward strand which confirms that this is a forward gene. Genemark Self and Host also indicate this coding potential. /note=SD (Final) Score: The best final score is -2.601. /note=Gap/overlap: There is a gap of 119bp which is large, but smaller than other potential start sites. There is no coding potential in this area, so it is unlikely that there's a gene that needs to be added here. /note=Phamerator: Pham 1210. 2/7/2023. Conserved in EZ cluster, such as Caron and Barnstormer. /note=Starterator: Start 45 conserved and corresponds to 36162. 5/97 call this start. /note=Location call: The evidence shows that this is a real gene with the best start site at 36162. Both Genemark and Glimmer agree with this start site and this site has the best Z-value and Final Score. /note=Function call: SprT-like protease. 
Top phages in PhagesDB BLAST (Reedo and KeAlii) have this function (2e-83 and 4e-83). Top phages in NCBI BLAST (Iter and Adolin) also show strong evidence with this function (3e-104 and 2e-102). Both HHpred and CDD hits also indicate that this is a SprT protein with e-values of 4e-17 and 1.2e-11 in HHpred and 7.54e-8 for CDD. /note=Transmembrane domains: No TMDs predicted, indicating that this is not a transmembrane protein. CDS 36896 - 37126 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="VroomVroom_51" /note=Original Glimmer call @bp 36896 has strength 18.19; Genemark calls start at 36896 /note=SSC: 36896-37126 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU70_gp50 [Arthrobacter phage Liebe] ],,NCBI, q1:s1 86.8421% 1.10342E-15 GAP: 128 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.549, -4.5016403230594895, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU70_gp50 [Arthrobacter phage Liebe] ],,YP_009817082,62.0253,1.10342E-15 SIF-HHPRED: SIF-Syn: Three other phages have this gene. There was good synteny with Liebe, at gene 50 e-value 3e-14, Maureen at gene 50 e-value 3e-14, and Tweety19 gene 49 e-value 1e-13. There is no synteny for this gene with Warda. There is also no known function for these genes listed. /note=Gene 51 Stop@37126 /note=Primary Annotator Name: Dawson, Niels /note=Auto-annotation: Both glimmer and genemark agree on start site and stop site. The start codon is called at 36896. The coding potential is covered with this start site. /note=Coding Potential: Yes, the start site covers the entire coding potential. There is sufficient evidence to conclude this is a real gene. This is evidenced by glimmer and genemark calls and coding potential calls. There is also good synteny for this gene as discussed in the next box. /note=SD (Final) Score: -4.502, z-score is >2. 
These are good scores for the start site and are further evidence for its existence as well as good evidence for the chosen start site. /note=Gap/overlap: 128. Gap seems reasonable as there are no other gene candidates. There is a reverse coding potential. However, there is not enough space around the reverse coding potential for the polymerase to turn around. Therefore, the gap is the most reasonable gap for this gene. /note=Phamerator: 37126 - 5 members, 2 are drafts. 37126 - All the other phages are in AZ. I used liebe, marine and tweety19. /note=Starterator: Start site is at 36896. This is predicted by autoannotation and predicted by starterator.Date - 1/27/2023. 37126 - Start 2 was called in 3 of 3 non-draft genes. Start 2 was not listed in the vroomvroom genome on starterator. However, starterator called start 1 100% of the time when present. This start site (start site 1), 36896, was called once and was called by starterator for vroomvroom. This means that starterator did not call the most common start site for vroomvroom, but did call the glimmer/genemark/manually predicted start site, adding further evidence for the chosen start site. /note=Location call: This is a real gene and the start site is at 36896. Start site is at 36896. This is predicted by autoannotation and not predicted by starterator. However, the start found most commonly on starterator was not listed on vroomvroom. /note=Function call: This function is unknown. Although there were some hits on this gene from other phages that shared synteny for this gene, the functions of those genes are currently unknown, meaning that this gene’s function is also unknown. These hits were from phagesdb BLASTp and NCBI BLASTp. There were good e-values for hits, but only hypothetical proteins were found to exist on these hits. There were no hits on CDD. HHPRED had only hits with positive e-values, and therefore were insignificant. CDD and HHPRED provided no clues to functional characterization. 
/note=Transmembrane domains: 0 TMD hits on TMHMM, no transmembrane domains detected. The protein was labeled as outside. /note= /note=Primary Annotator Name: Pan, Crystal /note=Auto-annotation: Both Glimmer and Genemark agree on the stop and start site. The start site is at 36896. /note=Coding Potential: There is good coding potential and the start site includes all of the coding potential. Genemark and Glimmer both concur that there is good coding potential at this start site. This is sufficient evidence that this gene is real. /note=SD (Final) Score: -4.502, z score >2. These scores are good scores that indicate the start site is good and that this gene is real. /note=Gap/overlap: +128 nucleotides. The gap is quite big, but there is not really coding potential there, so it makes sense to not add any genes here. There is no reverse coding potential either, so there are no other genes that would make sense to add in this gap. /note=Phamerator: 5 members, 2 drafts. All are in AZ cluster. Maureen and Tweety19 were used as comparisons. /note=Starterator: Start 2 was called in 3/3 non-draft genomes, but does not exist in the VroomVroom genome. The start site at 36896 was called by Starterator, which agrees with the start sites that are called by Glimmer and Genemark, rather than the most common start site (that does not exist in our genome). /note=Location call: Auto-annotation predicted our start site at 36896, but the most common start in Starterator did not concur with this result. That start site does not exist in our genome, and thus can be ignored. /note=Function call: Neither database allowed for determination of what this protein does, so we cannot conclude what kind of protein it is. There are hits of other genes that have good synteny with this gene; however, those genes have also not been assigned any function as of yet. 
In the few hits that we got, the protein was still annotated as hypothetical; the e-values were decent, but the function has not been verified and so is not reliable. Thus this gene should be assigned NKF. /note=Transmembrane domains: There are 0 TMD hits, which means that there are no transmembrane domains. CDS 37268 - 37579 /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="VroomVroom_52" /note=Original Glimmer call @bp 37268 has strength 18.12; Genemark calls start at 37268 /note=SSC: 37268-37579 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_KEALII_43 [Arthrobacter phage KeAlii]],,NCBI, q1:s1 97.0874% 3.09181E-18 GAP: 141 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.037, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_KEALII_43 [Arthrobacter phage KeAlii]],,UDL14649,63.1068,3.09181E-18 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Deal, Milena /note=Auto-annotation: Both Glimmer and GeneMark agree on the same start site of 37268 with the first codon being ATG, which is a more common start codon. /note=Coding Potential: There is high coding potential for the entirety of the gene and the start and end positions encompass all the coding potential. /note=SD (Final) Score: -2.443, and this is the best of the two scores. /note=Gap/overlap: There is a 141 base pair gap, which is on the longer side. This is the longest ORF though. This gap does not appear to have much coding potential. Also, other phages that have genes in this pham (Asa16, DrManhattan, Adolin) have a gene that corresponds to genes 47 and 48 in VroomVroom before this gene, so overall it does not appear that we need to add a new gene upstream. /note=Phamerator: This phage is in pham 4404. The pham has 21 members, with 8 being draft genomes. Other phages with a gene in this pham that are in cluster AZ include Adolin and Asa16. /note=Starterator: The start site number called most often was 10. 
It was called in 13 of the 13 non-draft phage genomes in the pham. The auto-annotated start site is the same as the most commonly called start site. This start site corresponds to 37268 bp. /note=Location call: This is a real gene with a start site at 37268. The auto-annotated start site seems to be the best option because it has been called by every non-draft phage genome in Starterator. Also, Glimmer and GeneMark both show that the start site encompasses all the coding potential and has high coding potential throughout. Finally, the Z-score is high (3.037) and the final score is good (-2.443) which are better than the Z-score and final score for the other candidate start site. /note=Function call: We do not have enough information to hypothesize a function for this protein. None of the top hits for phagesdb blast or NCBI blast had predicted functions, and the only hits with functions had high e-values. No results came up on CDD. For HHpred, some hits showed up, but with high E-values and/or unknown functions. Therefore, this gene is NKF. /note=Transmembrane domains: DeepTMHMM did not show any transmembrane domains. Therefore, these results did not help decide on a function for this protein. /note= /note=Primary Annotator #2 Name: Pisipati, Kirthana /note=Auto-annotation: Both Glimmer and Genemark have a start site of 37268. The start codon is ATG, which has a high probability. /note=Coding Potential: High coding potential on both host trained and self trained Genemark. The start site includes all of the coding potential. /note=SD (Final) Score: The final score is -2.443, which is the best score. /note=Gap/overlap: The gap is 141bp, and this start site has the longest ORF at 312bp. The gap is not long enough to add a gene, and there is no coding potential upstream of the gene. /note=Phamerator: This gene is in pham 4404 as of 2/8/23, which has 21 members, 8 of which are drafts. Most of the genomes in this pham belong to cluster AZ (DrManhattan, London, Reebo). 
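The 312 bp ORF length quoted above follows directly from the CDS coordinates. The sketch below is illustrative and not part of the original annotation record. It assumes 1-based inclusive coordinates with the stop codon included in the range; `orf_length` and `codon_count` are hypothetical helper names chosen for this example.

```python
def orf_length(start, stop):
    """Length in bp of a CDS given 1-based, inclusive coordinates."""
    return stop - start + 1


def codon_count(start, stop):
    """Number of codons in the CDS, counting the stop codon.

    Raises ValueError if the length is not a multiple of three, which
    would point to a coordinate error or a frameshift.
    """
    length = orf_length(start, stop)
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3


if __name__ == "__main__":
    # Gene 52 of this record: CDS 37268 - 37579
    print(orf_length(37268, 37579), "bp,", codon_count(37268, 37579), "codons")
```

For gene 52 (37268 to 37579) this gives 312 bp, matching the ORF length stated in the Gap/overlap note.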
/note=Starterator: Start site 10 was called the most often, and was manually annotated in 13 of 13 non draft genomes. /note=Location call: This gene is real, and has a likely start site of 37268. /note=Function call: This gene has no known function (NKF). BLASTp outputs were not informative, since the hits had unknown functions. There were no results from CDD, and the HHpred output only had hits with high e values and unknown functions. /note=Transmembrane domains: There were no transmembrane domains according to DeepTMHMM, which does not provide us with any information about this gene's function. CDS 37900 - 38262 /gene="53" /product="gp53" /function="hypothetical protein" /locus tag="VroomVroom_53" /note=Original Glimmer call @bp 37900 has strength 15.64; Genemark calls start at 37900 /note=SSC: 37900-38262 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_TBONE_51 [Arthrobacter phage Tbone] ],,NCBI, q1:s1 100.0% 3.50361E-5 GAP: 320 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.362, -3.9776535502172963, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_TBONE_51 [Arthrobacter phage Tbone] ],,QPX62382,57.1429,3.50361E-5 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Douglas, Katherine /note=Auto-annotation: Glimmer and Genemark agree. Start site at 37900 with start codon ATG. /note=Coding Potential: There is good coding potential and few large gaps in both the host and self-trained genemark. There is no synteny among other AZ phages and there are very few significant BLAST hits, none of which have assigned functions. Despite this, the strong coding potential and large gap in the genome that would result if this were not a gene suggest this is a real gene with strong coding potential. The chosen start site covers all coding potential. /note=SD (Final) Score: -3.978 /note=Gap/overlap: 320 (this is a larger gap which might mean a different starting site such as 37804) /note=Phamerator: Pham 68916 on 2/6/2023. 
All other phages within pham belong to the AZ cluster. No called function. /note=Starterator: Start site #12 at 37900 is found in 28/29 nondraft phages including VroomVroom. /note=Location call: This is a real gene with a probable start site at 37900. This site covers the coding potential and it was the most annotated start site found in all but 1 of the nondraft phages. There is a large gap of 320 bp; however, there is no coding potential in that region, so selecting one of the less annotated start sites with a smaller gap is not needed. Based on this and the high number of annotations for the start site, the start site is most likely at 37900. Although this was not the longest ORF, it was sufficiently long for a protein-coding region and covers the coding region on genemark. The longest ORF included a large stretch that did not contain coding potential, so it was disregarded. This, combined with all the other evidence, supports this start site being the best start site for the phage. The start codon is ATG, which has a high probability. /note=Function call: No known function. Although there are matches to other phages in phagesdb, there are no called functions associated with this protein. Additionally, there are no significant hits within the NCBI database. The fact that no other person has been able to assign a function suggests that this gene is NKF at this point in time. CDD had no hits. HHpred also only had one hit and it had a high e-value (>10^-5). It was therefore considered not significant. /note=Transmembrane domains: 0 TMDs predicted. This is reasonable as this is a no known function gene. Suggests this protein is not a transmembrane protein. /note= /note=Primary Annotator Name #2: Reyimjan, Diana /note=Auto-annotation: Both Glimmer and GeneMark predicted this gene. They agree on the same start site of 37900 with an ATG start codon. /note=Coding Potential: The gene has reasonable coding potential predicted within the region for both self and host trained genemark.
The chosen start site covers this coding potential. /note=SD (Final) Score:-3.978. This is the most positive score and therefore the best for the start sites. /note=Gap/overlap: 320. Pretty large but there is no other coding potential in the region that could indicate a gene not being present. /note=Phamerator: As of 2/7/23, this gene is in pham 68916. All the phages in this pham belong to cluster AZ. There is no function called for this gene. However, we can be confident that this gene is real from this data. /note=Starterator: Start site #12 is the most conserved start site. 28 out of 29 non-draft genomes manually called this start. Start #12 corresponds to start 37900. There are 52 members in the pham. /note=Location call: The gene is real, and the potential start site is 37900. This start site covers all coding potential, although the large gap indicates that there may be a gene missing upstream. However, no more coding potential was called in this ORF upstream, so the start site is likely correct. Additionally, this is not the longest ORF, but the final scores and z scores are the best out of all the starts. Starterator confirms that the start site is correct through its conservation in other phages of the same cluster. /note=Function call: Function is unknown. No blast hits with a low enough e value call a function for this protein. Neither CDD nor HHPred show alignments that correspond to a known function. /note=Transmembrane domains: 0 called. This does not give any additional information about a possible function. 
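The length and gap figures quoted in the notes for this gene can be cross-checked with simple coordinate arithmetic. The following is a minimal sketch, assuming forward-strand genes with 1-based inclusive coordinates and the convention gap = start - upstream stop - 1 (negative values are overlaps). That convention is inferred from the numbers in this file, not taken from PECAAN documentation, and the function names are hypothetical. The upstream stop is also an assumption, derived from the upstream gene's start (37268) and its reported 312 bp ORF.

```python
def orf_length(start: int, stop: int) -> int:
    """Length in bp of a forward-strand gene with 1-based inclusive coordinates."""
    return stop - start + 1


def gap_to_upstream(start: int, upstream_stop: int) -> int:
    """Bases between the upstream stop and this start; negative means overlap."""
    return start - upstream_stop - 1


# Assumed stop of the upstream gene: start 37268 plus a 312 bp ORF.
upstream_stop = 37268 + 312 - 1

print(orf_length(37900, 38262))               # gene 53 length: 363
print(gap_to_upstream(37900, upstream_stop))  # gene 53 gap: 320
print(gap_to_upstream(38259, 38262))          # gene 54 vs gene 53: -4
```

Under these assumptions, 363 bp is the length of the gene and 320 bp is the gap upstream of it, which matches the GAP field in the SSC note.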
CDS 38259 - 39080 /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="VroomVroom_54" /note=Original Glimmer call @bp 38259 has strength 17.48; Genemark calls start at 38259 /note=SSC: 38259-39080 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_CREWMATE_58 [Arthrobacter phage Crewmate]],,NCBI, q128:s1 51.2821% 1.74042E-11 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.96, -2.7423981586358774, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_CREWMATE_58 [Arthrobacter phage Crewmate]],,UIW13310,56.2963,1.74042E-11 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name #1: Estampa, Julia /note=Auto-annotation: Glimmer and GeneMark both call the gene and agree that the start site is at 38259 bp. The start codon for both is GTG. /note=Coding Potential: The gene has reasonable coding potential predicted within the putative ORF. Host-Trained and Self-Trained GeneMark both reflect similar high coding potential that is consistent with the ORF. The chosen start site covers approximately all the coding potential. /note=SD (Final) Score: The SD score is -2.742, which is the best from the list. However, the SD score may be ignored due to the possibility of the gene being a part of an operon. /note=Gap/overlap: The gap is -4 bp (overlap of 4 bp) long upstream from the gene, and is small and reasonable. The length of the gene is acceptable. An overlap of 4 bp suggests the gene is likely part of an operon. /note=Phamerator: Pham: 48281. Date found: 02/06/23. It is conserved with another draft phage, phage Emotion (AZ). /note=Starterator: There are no manual annotations for start site 1 in Starterator. Start 1 is found in 2 of 2 genes in this Pham, both of which are draft annotations. Thus, Starterator is uninformative. /note=Location call: The gathered evidence suggests that this is a real gene (good coding potential) and that the start site is most likely 38259 bp. This start site also includes the LORF. 
/note=Function call: NKF. Top 47 hits on PhagesDB BLAST demonstrated “function unknown” for phages with E-values ranging from e-155 to 0.87. Phages with suggested functions have very poor sequence alignment, high E-values, and are from a different Pham. NCBI BLASTp results also revealed “function unknown” for top hits. No CDD hits returned. No relevant HHpred results with the top hit having a poor score of 33.7 and a high E-value of 13. Thus, it’s most reasonable to assume NKF. /note=Transmembrane domains: Since there are no predicted TMHs or TMDs returned from DeepTMHMM, it is not a membrane protein. /note= /note= /note= /note=Primary Annotator #2: Robles, Angel /note=Auto-annotation: Glimmer and GeneMark both call the start at 38259. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -2.742, which is the best final score on PECAAN. /note=Gap/overlap: Gap of -4 (overlap). This is small and reasonable. /note=Phamerator: Pham: 48281 found on 02/09/2023. It is conserved and found in Emotion_59 (AZ) /note=Starterator: No Manual Annotations of this start. Start 1 found in 2 of 2 ( 100.0% ) of genes in pham /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 38259 /note=Function call: Function unknown. The top phagesdb BLAST hits have an unknown function (E-value < 10^-155), and the NCBI BLAST hits also have an unknown function. HHpred had a hit for Dual specificity mitogen-activated protein with 58.2% probability, 21% coverage, and E-value of 15. CDD had no relevant hits. 
/note=Transmembrane domains: TMHMM does not predict any TMDs, therefore it is not a membrane protein. CDS 39077 - 39274 /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="VroomVroom_55" /note=Original Glimmer call @bp 39077 has strength 14.02; Genemark calls start at 39077 /note=SSC: 39077-39274 CP: yes SCS: both ST: NA BLAST-Start: [hypothetical protein [Arthrobacter mobilis] ],,NCBI, q3:s4 61.5385% 2.80919E-5 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.368, -5.972387497942108, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Arthrobacter mobilis] ],,WP_168488253,44.7761,2.80919E-5 SIF-HHPRED: SIF-Syn: /note=Primary Annotator #1 Name: Gowdy, Griffin /note=Auto-annotation: Glimmer and Genemark; same start @ 39077; start codon: GTG /note=Coding Potential: Reasonable, agreed upon between self- and host-trained algorithms. Chosen start included. /note=SD (Final) Score: Final score = -5.972. This is the best possible score. However, this score is probably irrelevant, as stop@39274 is likely transcribed with its upstream neighbor. /note=Gap/overlap: There is an overlap of -4, which is reasonable for a polycistronic operon. /note=Phamerator: Pham 61846; 2/7/23. VroomVroom gene stop@39274 F is an orpham. /note=Starterator: N/A for an orpham. /note=Location call: Altogether, there is some evidence to support this gene being real, and that the start site is at 39077. Start at 39077 is the best possible start site; however, more evaluation (including searches for conserved domains) will be needed to determine the validity of this gene as it is an orpham. /note=Function call: NKF. No significant hits in phagesDB or NCBI BLASTp. Additionally, no significant hits in CD search or HHpred. /note=Transmembrane domains: None found. DeepTMHMM predicts that this protein is globular, indicating it is likely soluble.
/note= /note=Primary Annotator #2 Name: Rodriguez, Justin /note=Auto-annotation: Glimmer and GeneMark call the gene at the same start site of 39077. The start codon is GTG. /note=Coding Potential: Reasonable for both Glimmer and GeneMark; it spans the ORF for the most part including the start site. /note=SD (Final) Score: -5.972 and it is the best one. It is likely irrelevant though since the gene is likely part of an operon. /note=Gap/overlap: -4 which usually indicates the gene as part of an operon /note=Phamerator: 61846, 2/7/2023. No other genes are in this pham so it is an orpham. No function is called /note=Starterator: No Starterator report since it is an orpham /note=Location call: Besides there not being a starterator report, the auto-annotated start site at 39077 is most likely the correct one based on the gene length, gap, and coding potential of Glimmer and GeneMark. /note=Function call: The predicted function is NKF. There are no significant hits from PhagesDB BLAST and NCBI Blast has one significant hit (e-value 3e-05) that is for a hypothetical protein (no function assigned). No significant hits in CDD or HHPRED. /note=Transmembrane domains: DeepTMHMM shows no evidence of transmembrane domains. It does suggest that the protein is globular, meaning it could be hydrophilic and not membrane-associated. Location in the genome is not informative since genes directly upstream and downstream are designated NKF as well. 
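Both annotators read the -4 gap as a sign that this gene sits in an operon. With a 4 bp overlap, the start codon of this gene and the stop codon of the upstream gene share bases, which the sketch below illustrates. It assumes forward-strand genes with 1-based inclusive coordinates; the helper name is hypothetical, and the code only shows the coordinate overlap, not that the genes are co-transcribed.

```python
def shared_codon_positions(start: int, upstream_stop: int) -> list[int]:
    """Genome positions shared by this gene's start codon and the upstream stop codon."""
    start_codon = set(range(start, start + 3))
    stop_codon = set(range(upstream_stop - 2, upstream_stop + 1))
    return sorted(start_codon & stop_codon)


# Gene 55 starts at 39077; the upstream gene 54 stops at 39080 (4 bp overlap).
print(shared_codon_positions(39077, 39080))  # [39078, 39079]

# Hypothetical start 8 bp inside the same upstream gene: the codons no longer touch.
print(shared_codon_positions(39073, 39080))  # []
```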
CDS 39271 - 39462 /gene="56" /product="gp56" /function="hypothetical protein" /locus tag="VroomVroom_56" /note=Original Glimmer call @bp 39271 has strength 18.07; Genemark calls start at 39271 /note=SSC: 39271-39462 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein SEA_GUDMIT_42 [Gordonia phage Gudmit]],,NCBI, q18:s19 71.4286% 6.09246E-8 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.609, -3.4685663819143713, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_GUDMIT_42 [Gordonia phage Gudmit]],,QHB37271,47.8261,6.09246E-8 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hamid, Bilal /note=Auto-annotation: Both GeneMark and Glimmer indicate the best start site of 39271. This start site covers all of the coding potential. /note=Coding Potential: Both Host- and Self-trained GeneMark indicate coding potential throughout the totality of the auto-annotated gene. /note=SD (Final) Score: -3.469. While the other start site has a higher score, it ignores coding potential and gives a bigger gap to the prior gene. /note=Gap/overlap: -4 gap indicates overlapping start and stop codons, indicating a likely operon /note=Phamerator: 02/07/23 - 69166 with 21 members of the Pham. 2 singletons and 17 Cluster K genes. The only other Cluster AZ gene is from Emotion, which is currently (02/09/23) a draft. No functions for this pham have been called for any of the genes. /note=Starterator: Autoannotation indicates start site 8 @ 39271 with no MAs. The only other gene with this start site is from phage Emotion (the only other Cluster AZ within the Pham). Start site 6 has the most MAs for this gene (18 of 19) but does not appear in my gene. /note=Location call: Start site 8 @39271 is likely the best site due to limited overlap /note=Function call: The highest-scoring non-draft genes were Gudmit and Sidious, which had e-values of 9e-8 and 3e-7 associated with them. Neither of these phages had a function called for this gene.
Additionally, within all top NCBI blastp hits, none of the results give a function call, and instead merely say “hypothetical protein” indicating not enough is known to make the call. No CDD hits appeared. The best HHpred hits were 33.54% probability with Herpesvirus UL56 protein, a tail anchor protein. /note=Transmembrane domains: 0 TMBs were called through TMHMM. With no known function, it is hard to say whether this evidence is corroborated by our other data or not. /note= /note=Primary Annotator Name: LastName, FirstName /note=Auto-annotation: /note=Coding Potential: /note=SD (Final) Score: /note=Gap/overlap: /note=Phamerator: /note=Starterator: /note=Location call: /note=Function call: /note=Transmembrane domains: CDS 39455 - 39667 /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="VroomVroom_57" /note=Original Glimmer call @bp 39455 has strength 8.2; Genemark calls start at 39455 /note=SSC: 39455-39667 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein HOU70_gp61 [Arthrobacter phage Liebe] ],,NCBI, q1:s1 97.1429% 0.00118821 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.054, -4.553144366458975, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein HOU70_gp61 [Arthrobacter phage Liebe] ],,YP_009817093,51.4286,0.00118821 SIF-HHPRED: SIF-Syn: /note=Primary Annotator #1: Hernandez, Edgar /note=Auto-annotation: Glimmer and GeneMark both display the same start site #39455, with the start codon GTG. /note=Coding Potential: There is a reasonable amount of coding potential present in both the Host-Trained and Self GeneMark. The chosen start site includes all the coding potential. /note=SD (Final) Score: There’s an SD Final Score of -4.553, which is indicative of a higher sequence match. Meanwhile, the Z-score is 2.054, which is good since anything higher than 2 indicates that the RBS was above the mean. /note=Gap/Overlap: A gap of -8 indicates that there is a small overlap present. 
/note=Phamerator: The gene is located in Pham 62962, and there are other AZ cluster group members like Liebe and Maureen, which were used to compare synteny with VroomVroom. The function of the gene is unknown as of now (NKF). /note=Starterator: There’s strong evidence suggesting that the start site 1 at #39455 is conserved across all members of Pham 62962 because 2 out of 2 final genes have claimed it as a real start site. /note=Location call: Based on all the pieces of evidence gathered from Phamerator, Starterator, synteny comparison, and coding potential, the possible start site for the gene is #39455. /note=Function call: PhagesDB and NCBI BlastP did not return any significant hits. Similarly, CDD did not return any significant hits, and PDB HHpred was not able to predict a function for the gene. There were hits present on PECAAN that had a relatively low E-value, aligned with high probability and high coverage when compared to the other hits, but they were declared as originating from a domain with unknown function. Therefore, NKF seems to be the best function to call for this gene. /note=Transmembrane Domains: TMHMM predicted 0 transmembrane domains and as a result, the gene is definitely not a membrane protein since 2 transmembrane domains are required. /note= /note=Primary Annotator #2: Sacristan, Ariana /note=Auto-annotation: Glimmer and Genemark both indicated that the start site was 39455 and that the start codon was GTG. /note=Coding Potential: This gene demonstrates reasonable coding potential predicted within the putative ORF that contains either of the chosen start sites. /note=SD (Final) Score: The final score for the selected start site is -4.553. Although this is not the best score, it is still reasonably good, and a Z-score of 2.054 further supports this start site. /note=Gap/overlap: The overlap of 8 bp is reasonably small, the length of 213 bp is reasonably long, and it is called as the true LORF.
/note=Phamerator: As of 02/06/2023 this gene is located in Pham 62962. This Pham has several other AZ cluster group members, such as Maureen and Liebe, which were used to compare against VroomVroom to examine synteny. There is no known function assigned to this gene in Phamerator. /note=Starterator: There is a reasonable start site choice that is conserved among the Pham 62962 members. The start number called the most often in the published annotations is 1 which was called in 2/2 non-draft genes in the pham. /note=Location call: The gathered evidence suggests that this is a real gene that most likely starts at 39455. /note=Function call: There was no convincing evidence defined by PhagesDB or NCBI Blastp that demonstrated any evidence of a known gene function. Additionally, HHpred did not indicate a known function for the gene and there were no indicated CDD hits. All hits with low E-values, high probability and high coverage had an unknown gene function designation or hypothetical protein description. Therefore, the gene is identified as a NKF. /note=Transmembrane domains:There were no predicted TMDs by TMHMM, therefore it is not a membrane protein. CDS 39660 - 39872 /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="VroomVroom_58" /note=Original Glimmer call @bp 39660 has strength 13.7; Genemark calls start at 39660 /note=SSC: 39660-39872 CP: yes SCS: both ST: NA BLAST-Start: GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.955, -4.741583905672826, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator #1: Hoang, Ryan /note=Auto-annotation: Both call that the start site is at 39660. This is likely evidence that the start site is indeed there. /note=Coding Potential: The 3rd ORF does appear to have some coding potential, with coding potential starting at the start site. However, through the gene, there does appear to be a dip in the coding potential in the gene. 
This occurs in both the host trained and phage trained GeneMark. The coding potential is indeed present for the start site of the gene. Furthermore, the coding potential begins to rise just after the start site, indicating that the start site encapsulates all the coding potential. /note=SD (Final) Score: The final score is -4.742 and the Z-score is 1.955. These are both higher than the alternative start site’s score. /note=Gap/overlap: There is an overlap of 8 bp between the start site and the stop site of the upstream gene. This is not the longest ORF that could be possible, but this is indeed the most realistic ORF as the other ORF would have a very large overlap with the upstream gene. There is a reasonable 4 bp overlap between this gene and the downstream gene. /note=Phamerator: The pham it was in was 61848, and the pham was run on 2/6/2023. The pham was an orpham so there were no conserved start sites, nor any other members within the pham. There was no function called for this gene. /note=Starterator: As this was an orpham, there was no reasonable start site conserved amongst the members of this pham. Starterator as a whole, therefore, was rather uninformative. /note=Location call: Based on all this information, I would have to believe that the location call of the start site is indeed accurate. There is some possibility that due to the 8 bp overlap, the gene might not be correct, and should be combined with the upstream gene, but I believe that there is enough evidence to suggest otherwise. Furthermore, the gene does appear to be real with good coding potential. However, it was not conserved in Phamerator, as it was an orpham. As a result, there was really no information that we could glean from Starterator or Phamerator, and we’d go with the original start site that was auto-annotated.
/note=Function call: Based on the uninformative CDD hits and high e-values for HHpred results, alongside no information from PhagesDB Blast or NCBI Blast, I think it is best that this is called as a NKF gene. /note=Transmembrane domains: There were 0 transmembrane domains that were present. /note= /note=Primary Annotator #2 Name: Scriven, Savannah /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 39660. GTG start codon. /note=Coding Potential: The ORF contains 2 sharp peaks of high coding potential with a gap of low coding potential in both GeneMark Self and Host. Coding potential starts to go up near the start site; start site includes all coding potential. /note=SD (Final) Score: -4.742. Best SD score. Best Z score of 1.955 /note=Gap/overlap: Overlap of 8bp, which is very reasonable. Not longest ORF, but longest ORF would have an unreasonable 95bp overlap. Length still reasonable at 213bp. /note=Phamerator: (02/06/23) 61848; orpham. /note=Starterator: Not informative. /note=Location call: Above evidence suggests this is a real gene at start site 39660. /note=Function call: NKF. No blastp hits on PhagesDB or NCBI. No CDD hits. Uninformative HHpred hits as all e values >>1. /note=Transmembrane domains: 0 TMDs predicted by deep TMHMM, not a membrane protein. CDS 39869 - 40252 /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="VroomVroom_59" /note=Original Glimmer call @bp 39869 has strength 11.13; Genemark calls start at 39854 /note=SSC: 39869-40252 CP: yes SCS: both-gl ST: NA BLAST-Start: [hypothetical protein PQE15_gp56 [Arthrobacter phage KeAlii] ],,NCBI, q10:s2 43.3071% 8.06967E-6 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.093, -4.744199388040119, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQE15_gp56 [Arthrobacter phage KeAlii] ],,YP_010678173,54.1667,8.06967E-6 SIF-HHPRED: SIF-Syn: /note=Primary Annotator #1: Hughes, Audia /note=Auto-annotation: Both Glimmer and GeneMark find the start site. 
They disagree on the start site. Genemark: 39854 Glimmer: 39869 /note=Coding Potential: There is coding potential within the putative ORF; both start sites cover all coding potential /note=SD (Final) Score: -4.744, RBS score negligible as the overlap is -4 base pairs, suggesting an operon /note=Gap/overlap: -4. Chose this option over a candidate with a longer ORF, as the longer ORF candidate had a bigger overlap (-19) /note=Phamerator: Date of investigation 2/17/23, Pham: 48283. Two members of pham that are both cluster AZ and are both draft genomes (phage in question and Emotion). No function call in Phamerator or PhagesDB. /note=Starterator: Starterator is uninformative, all members of pham are draft /note=Location call: 39869 /note=Function call: Function unknown. All PhagesDB BLAST hits were function unknown, there were no NCBI hits and no CDD hits, and there were no strong HHpred hits, as the smallest e-value was 34 /note=Transmembrane domains: No transmembrane domains predicted CDS 40245 - 40349 /gene="60" /product="gp60" /function="membrane protein" /locus tag="VroomVroom_60" /note=Genemark calls start at 40245 /note=SSC: 40245-40349 CP: yes SCS: genemark ST: NA BLAST-Start: GAP: -8 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.96, -2.6814417326944517, yes F: membrane protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kim, Cindy /note=Auto-annotation: Glimmer did not call a start site. GeneMark called a start site at 40245 bp. /note=Coding Potential: Coding potential was good on the Host and Self GeneMarks, both with coding potential found on the third forward reading frame. /note=SD (Final) Score: -2.681. This is the best Final Score on PECAAN. /note=Gap/overlap: Overlap of 8 bp. This is a reasonable overlap, but is not small enough to be an operon. /note=Phamerator: As of February 3rd, 2023 this gene is in Pham 61838, but is an Orpham. The gene is conserved in other phages (YesChef and Powerpuff), both of which are also in AZ.
/note=Starterator: N/A as this gene is an Orpham. /note=Location call: Based on the above evidence, this is a real gene with a likely start site at 40245 bp. /note=Function call: No conclusion can be made about this gene’s function. The top 2 hits on PhagesDB had unfavorable E-values of 7.1 and NCBI Blastp yielded no results as this gene is an Orpham. CDD had no hits, and HHpred yielded no significant hits, again likely due to this gene being an Orpham. /note=Transmembrane domains: DeepTMHMM predicted 1 TMD, and thus we can infer that this gene is a membrane protein with a real TMD. /note= /note=Primary Annotator #2 Name: Smith, Steven /note=Auto-annotation: Only GeneMark called a start site at bp 40245 /note=Coding Potential: Both the Host-Trained and Self-Trained GeneMark showed good coding potential in the region before and around the stop site. /note=SD (Final) Score: -2.681, which is the best final score shown. /note=Gap/overlap: There is an 8 bp overlap, which is possible, but does not signal the inclusion of the gene in an operon. /note=Phamerator: 2/7/23: this gene is in pham 61838 but it is an orpham. /note=Starterator: No data, gene is an orpham. /note=Location call: Looking at the above evidence, this call is most likely a functional gene starting at 40245. /note=Function call: We cannot call a function for this gene. There were no significant hits in CDD or HHpred, which is most likely because this gene is an orpham. /note=Transmembrane domains: DeepTMHMM predicted 1 TMD that is 22 AA long. We can assume that this gene is a membrane protein with a real TMD. CDS 40466 - 40726 /gene="61" /product="gp61" /function="hypothetical protein" /locus tag="VroomVroom_61" /note=Original Glimmer call @bp 40466 has strength 14.51; Genemark calls start at 40466 /note=SSC: 40466-40726 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Streptomyces sp.
DI166] ],,NCBI, q5:s1 39.5349% 0.0412149 GAP: 116 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.037, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Streptomyces sp. DI166] ],,WP_176711819,40.0,0.0412149 SIF-HHPRED: SIF-Syn: /note=Primary Annotator #1 Name: Kretschmer, Thomas /note=Auto-annotation: Glimmer and GeneMark both predict that the start site is likely 40466; the predicted start codon is GTG. /note=Coding Potential: There is reasonable coding potential for this gene. Its start site contains the entirety of the coding potential in the Host-trained GeneMark map, but not the whole coding potential in the Self-trained map. /note=SD (Final) Score: The final score is -2.443. This is the best final score. /note=Gap/overlap: The gap is 116 bp. This is fairly large, but if we assume the gene upstream is real, then this is reasonable and the smallest gap available. If we remove that gene and switch to the 40298 start site, then the gap becomes 45 bp. However, this start site's Z score is poor. /note=Phamerator: As of 2/10/2023, pham 48339. This pham is not well conserved and only has one other draft genome with the same pham. There is no stated function for this gene. /note=Starterator: There is no reasonable start site conserved because the pham is not conserved, and the one other phage does not share this start site. The start site is start 3 at 40466; 1 of 2 genes have this start site. Therefore, Starterator is uninformative. /note=Location call: This suggests the gene is real and starts at 40466 /note=Function call: NKF; no good hits on HHpred, CDD, or BlastP. No information obtained from synteny. /note=Transmembrane domains: There were no transmembrane domains, indicating that this is not a membrane protein. /note= /note=Primary Annotator #2 Name: Tosasuk, Kaemin /note=Auto-annotation: Both Glimmer and Genemark predict the start site to be at 40466, with a start codon of ATG.
/note=Coding Potential: Yes, the start covers all of the coding potential from GeneMark maps. /note=SD (Final) Score: This start site has a final score of -2.443, which is the highest out of the three possible start sites. /note=Gap/overlap: This start site (40466) has a gap of 116 bp, which is the most reasonable, as start site 40709 has a larger gap of 359 bp. Furthermore, the other start site at 40298 has an overlap of -52 bp. /note=Phamerator: pham 48339. This Pham is not well conserved and only has one gene from another draft genome in it. There is no stated function for this gene. /note=Starterator: Starterator was uninformative. The Pham is not conserved and there is only one other member, which is a draft gene. The start site is start 3 at 40466, which is 1/2 found start sites. /note=Location call: This evidence suggests that the gene is real and has a start site of 40466. /note=Function call: NKF; no good hits on BlastP, HHpred, or CDD. No synteny with other phages. /note=Transmembrane domains: N/A, none detected CDS 40905 - 41510 /gene="62" /product="gp62" /function="hypothetical protein" /locus tag="VroomVroom_62" /note=Original Glimmer call @bp 40905 has strength 12.56; Genemark calls start at 40905 /note=SSC: 40905-41510 CP: yes SCS: both ST: NI BLAST-Start: GAP: 178 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.274, -1.953940808934884, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: There is only one other genome in the same pham, which is Emotion, another draft genome. Since that is the only other genome, there is very little useful information that can be used to determine characteristics confidently. However, it seems like the two align well with each other. The genes upstream and downstream in the genomes currently have NKF. /note=Primary Annotator #1 Name: Le, Vivian /note=Auto-annotation: Both Glimmer and GeneMark call for a start of 40905. /note=Coding Potential: There is reasonable coding potential in the forward direction.
However, there is some in the reverse direction as well, but it is not significant enough for us to declare this not a real gene. /note=SD (Final) Score: -1.954. It is the best final score on PECAAN. /note=Gap/overlap: 178 bp. There is a gap, but there was no reasonable coding potential found in the gap. Any other gap/overlap would be unreasonable with the previous gene. /note=Phamerator: The pham number as of 02.08.2023 is 48567. The gene is conserved in Emotion, a draft genome. There were no non-draft genes in the pham. /note=Starterator: The start number 7 was called in 0/0 of non-draft genes in the pham. Start 7 is 40905 in VroomVroom. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Considering all of the evidence above, this gene is currently most likely at start 7 at 40905 bp; however, there is too little information/synteny to be confident. It is not informative enough. /note=Function call: No known function. For now, the function is unknown, because there were no hits in the NCBI BLASTp and the PhagesDB had very few hits and did not have great e-values. The same goes for CDD and HHpred. /note=Transmembrane domains: There were no transmembrane domains. The topology graph was on the inside throughout. There was no known function predicted for this gene so the TMD result makes sense. /note= /note=Primary Annotator #2 Name: Tran, Krysten /note=Auto-annotation: Both Glimmer and GeneMark; both agree on the same start site at 40905; Start codons called - GTG, ATG /note=Coding Potential: The gene does have reasonable coding potential predicted within the putative ORF and the start site does cover all the coding potential. /note=SD (Final) Score: -1.954 is the most favorable score on PECAAN. /note=Gap/overlap: There is a gap of 178 bp, which is a large gap but there is no coding potential indicated on GeneMark in this region to justify adding a new gene or changing the start site of this gene.
/note=Phamerator: The gene is found in the pham 48567 as of 02/07/23. The gene is conserved in other members of the cluster AZ, such as Tuck and AEgle. /note=Starterator: The start number 7 was not called in any of the non-draft genes in the pham. Start number 7 is at position 40905 in this phage. This was the start site with the most manual annotations and is consistent with the start sites called by both Glimmer and GeneMark. /note=Location call: Based on all the evidence gathered, the start site for this gene is likely at 40905, as this site has the most reasonable gap and RBS score. Additionally, it covers all the coding potential in the region without eliminating any other coding potential based on the GeneMark. /note=Function call: I cannot hypothesize the function of the gene as the PhagesDB BLASTp did not have any good hits (all the e-values for non-draft genes were very high) and NCBI BLASTp had no results. Additionally, CDD had no hits at all for this gene. For HHpred, none of the hits meet all the requirements for being a good hit (values are not in the target range listed on the slides). The majority of the results have a low probability, low percent coverage, and high E-values. /note=Transmembrane domains: There are NO TMDs present for this protein based on the text and the topology graph as well as the probability graph. The absence of TMDs makes sense because there is no hypothesized function for this gene. 
CDS 41602 - 41874 /gene="63" /product="gp63" /function="hypothetical protein" /locus tag="VroomVroom_63" /note=Original Glimmer call @bp 41608 has strength 14.66; Genemark calls start at 41608 /note=SSC: 41602-41874 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_ADOLIN_66 [Arthrobacter phage Adolin]],,NCBI, q3:s2 97.7778% 2.09682E-25 GAP: 91 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.881, -2.7653702713535186, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_ADOLIN_66 [Arthrobacter phage Adolin]],,QHB36648,68.4783,2.09682E-25 SIF-HHPRED: SIF-Syn: /note=Gene (stop@41874 F) /note=Primary Annotator Name: Unanwa, Nnaemeka /note=Auto-annotation: ATG and GTG start codons predicted, which are common start codons. Glimmer and GeneMark agree with 41608 as the start. /note=Coding Potential: Yes, coding potential along the length of the gene. The Glimmer and GeneMark start site covers the entire coding potential. /note=SD (Final) Score: -3.867. Somewhat weak score, there are two other sites that have higher scores. /note=Gap/overlap: 97 bp gap. This is a reasonable gap because a gene cannot be inserted here. /note=Phamerator: Pham 3945 on 01/27/23. Conserved in Tweety19_63 and VResidence_65. /note=Starterator: The most annotated start site for this gene is start 11 @41602 with 10 MAs. This is also the Most Annotated start site for 10 of 16 non-draft genes in the pham. This is strong evidence that the true start site for this gene is start 11 @41602. /note=Location call: This gene is most likely real based on evidence from coding potential, synteny, etc. Real start codon may be at 41602 (z-score is closer to 0) /note=Function call: Unknown. Top hits on PhagesDB were function unknown genes, and top hits on NCBI were hypothetical genes (also with no function). There were no hits on CDD, and the top hits on HHpred had unacceptably high e-values (lowest e-value was 60). This means that this gene's function is still unknown. 
/note=Transmembrane domains: No transmembrane domains detected on DeepTMHMM, meaning that this is not a membrane protein. /note= /note=Primary Annotator #2 Name: Li, Anna /note=Auto-annotation: Both Glimmer, GeneMark; agree at the same start site (site #: 41608); start codon called: ATG and GTG /note=Coding Potential: Yes. gene has reasonable coding potential. Chosen start site covers all of the coding potential. /note=SD (Final) Score: -3.867 (3rd best FS; score may still be reasonable with Z-score > 2) /note=Gap/overlap: +97bp; gap is reasonable, no gene can be reasonably inserted into this gap /note=Phamerator: 3945 as of 2023-02-08, other non-draft members of subcluster (AZ) contained this pham (i.e. Adolin37, VResidence) /note=Starterator: Most manually annotated start site 11 in 10/16 non-draft genomes. Start site 11 @41602 in VroomVroom. /note=Location call: Gene is most likely real; start site most likely at 41602 compared to 41608 (gene displays very little synteny, former start site called by Starterator and supported by RBS and Z-score) /note=Function call: NKF: HHPred fails to call any hits with significantly small E-values (all above 90), all hits in NCBI Blast call hypothetical proteins, hits in PhagesDB that call tape measure protein have E-values of 3.3. Failure of agreement between NCBI, HHPred and PhagesDB along with insignificant hits (high E-values) makes this gene a "NKF" gene. /note=Transmembrane domains: TMHMM does not call any transmembrane domains in this region of the genome. CDS 41876 - 42163 /gene="64" /product="gp64" /function="hypothetical protein" /locus tag="VroomVroom_64" /note=Original Glimmer call @bp 41876 has strength 11.52; Genemark calls start at 41876 /note=SSC: 41876-42163 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Arthrobacter sp. 
EPSL27] ],,NCBI, q15:s7 82.1053% 2.93334E-18 GAP: 1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.797, -3.080676611399484, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Arthrobacter sp. EPSL27] ],,WP_066435982,59.5506,2.93334E-18 SIF-HHPRED: SIF-Syn: /note=#1 /note=Primary Annotator Name: Li, Mulin /note=Auto-annotation: Both Glimmer and GeneMark call and agree on the same start site (@41876) /note=Coding Potential: Both self-trained and host-trained GeneMark predict high coding potential at the second forward codon position. /note=SD (Final) Score: the SD score is -3.081. /note=Gap/overlap: The upstream gap is 1 bp and the downstream gap is 439 bp. No coding potential is displayed in this gap. /note=Phamerator: This gene product belongs to Pham 67487, which includes 32 members and 15 of them are draft. 30 phages in this Pham belong to the AZ cluster, which makes this Pham a good reference point. The phamerator was generated on 01/27/23 /note=Starterator: Two start sites are proposed for VroomVroom which are start 14 @41876 and start 30 @ 42158. Start 14 is only called by VroomVroom and is not manually annotated. This start site has high RBS score and is supported by GeneMark coding potential graphs. /note=Location call: This gene is called to start at 41876 and ends at 42163. /note=Function call: While PhagesDB blast and NCBI blast report HNH endonuclease as top hits with credible e-value, HHpred and CDD analysis do not produce any aligned protein crystal structure. With not enough evidence, I will call this protein NKF. /note=Transmembrane domains: No transmembrane domains are predicted by either TMHMM or TOPCONS. /note=Secondary Annotator Name: /note=Secondary Annotator QC: /note= /note=#2 /note=Primary Annotator Name: Vajragiri, Shreya /note=Auto-annotation: Both Glimmer and GeneMark call a gene here, both saying the start site is 41876. 
/note=Coding Potential: There is strong coding potential in both Self-trained and Host-trained GeneMark; it is on the forward strand so it is a forward gene. /note=SD (Final) Score: -3.081; this is the best possible RBS score on PECAAN. /note=Gap/overlap: 1bp. The gap is very small and inconsequential. /note=Phamerator: 1/27/23 - Pham is 67487. Gene is conserved in phages Liebe and Kaylissa, both of which are in the AZ cluster. Function call is HNH endonuclease, which aligns with some of the other members of the Pham in the AZ cluster (Liebe, Kaylissa, etc.) /note=Starterator: The start site is called to be 14 by 1/32 genes in the Pham (no manual annotations). This is different from the start sites other AZ Pham members use, which vary. However, VroomVroom does not contain any of the other called start sites. It corresponds to start position 41876. Agrees with Glimmer/GeneMark. /note=Location call: Evidence supports that this is a real gene, and its start site is 41876. /note=Function call: There is no strong evidence for any protein function. Most BLAST and HHPred hits are for one kind of protein, HNH endonuclease, but the e-values are very high (>0.11). Therefore, I don’t think I can assign the protein a function. /note=Transmembrane domains: DeepTMHMM predicts 0 transmembrane domains and places the protein ‘inside’. 
/note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: tRNA 42372 - 42446 /gene="65" /product="tRNA-Trp(cca)" /locus tag="VROOMVROOM_65" /note=tRNA-Trp(cca) CDS 42603 - 42812 /gene="66" /product="gp66" /function="hypothetical protein" /locus tag="VroomVroom_66" /note=Original Glimmer call @bp 42603 has strength 6.43; Genemark calls start at 42603 /note=SSC: 42603-42812 CP: yes SCS: both ST: NI BLAST-Start: GAP: 439 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.037, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Vanderpool, Lauren /note=Auto-annotation: Glimmer and GeneMark agree, start at 42603 /note=Coding Potential: There seems to be a great deal of coding potential based on the Host-Trained GeneMark map. /note=SD (Final) Score: -2.443, which is the best final score out of the options available. /note=Gap/overlap: 440 bp gap, which is far greater than the 50 bp limit, but there is potential for a gene /note=Phamerator: 48236 (as of 2/6/2023) /note=Starterator: Start: 1 @ 42603 (found in 2/2) /note=Location call: 42603 /note=Function call: NKF, because the e-values from the hits were far too high to be taken into account /note=Transmembrane domains: /note= /note=Primary Annotator Name: Lim, Madeleine /note=Auto-annotation: Glimmer and GeneMark: 42603 /note=Coding Potential: High (3rd reading frame in Genemark) /note=SD (Final) Score: -2.443 (higher than -4.167 in other potential start site) /note=Gap/overlap: 440 bp, but there is potential for a gene around ~42450 to 42550 /note=Phamerator: 48236 (as of 2/6/2023) /note=Starterator: Start: 1 @ 42603 /note=Location call: 42603 /note=Function call: NKF (Only two possible aligned sequences (PF12577.11 & 6YXX_E7) were found via HHPred, but both have a very low probability and coverage and a very high e-value; there were no hits for CDD) /note=Transmembrane domains: 0, appears to be an internal protein CDS 42799 - 43164 /gene="67" 
/product="gp67" /function="HNH endonuclease" /locus tag="VroomVroom_67" /note=Original Glimmer call @bp 42799 has strength 1.59 /note=SSC: 42799-43164 CP: yes SCS: glimmer ST: SS BLAST-Start: [HNH endonuclease [Rathayibacter sp. VKM Ac-2803] ],,NCBI, q1:s1 84.2975% 2.54995E-41 GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.881, -2.827683592113848, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Rathayibacter sp. VKM Ac-2803] ],,WP_159988569,72.549,2.54995E-41 SIF-HHPRED: HNH endonuclease; Thermophilic bacteriophage, HNH Endonuclease, DNA nicking, HYDROLASE; 1.52A {Geobacillus virus E2},,,5H0M_A,61.9835,97.3 SIF-Syn: Phage Consensus and ss_pred have similar sequences compared to our genes. The Blast result shows a low E value and a good score to show evidence. /note=Primary Annotator Name: Martin, Kyle /note=Auto-annotation: Glimmer starts at 42799. GeneMark was not used. /note=Coding Potential: The Coding potential is in both the forward and reverse direction. The coding potential is similar in both the Host and Self-Trained GeneMark. /note=SD (Final) Score: -2.828 (best of all candidates) /note=Gap/overlap: The gap is -14 BP, which indicates an overlap. The length of the gene is also reasonable. /note=Phamerator: The pham number is 68556. The analysis was run on 01/27/23. The pham was present in other members of the cluster AZ and was compared against phage Emotion. The function was unknown. /note=Starterator: The start site choice conserved among the members is 68556. The start site is number 35. /note=Location call: The gene is real and the start site is 42799. /note=Function call: The predicted function is HNH endonuclease, based on multiple hits from HHPRED. The NCBI Blast also indicates multiple hits with e-values of e-41 and coverage of 80% or higher. /note=Transmembrane domains: Zero /note= /note=Secondary Annotator Name: Vu, Thomas /note=Auto-annotation: Only Glimmer provides a reading with a start site at 42799. 
/note=Coding Potential: There is moderate coding potential throughout frame 6 (reverse direction). Both the self-trained and host-trained GeneMark corroborate each other to suggest that the nucleotide region is a real gene. /note=SD (Final) Score: -2.828 (best) /note=Gap/overlap: Gap of -14 bp. This indicates a small overlap of 14 bp between this gene and the one before it. This candidate still remains the most reasonable as it exhibits synteny and still minimizes the gap. /note=Phamerator: The Pham number is 68558 and the analysis was run on 01/27/23. The pham is shared among other members of cluster AZ as well as BD and DD. VroomVroom was compared against phage Amyev. The function is not yet determined. /note=Starterator: The most conserved start site is at start site 104 which corresponds to a bp coordinate of 42799. This was the most annotated start site with 3 MAs. There are 757 members of which 37 are drafts. /note=Location call: This is a real gene with a likely start site of 42799. /note=Function call: Predicted function is HNH endonuclease, based on multiple hits from HHPRED such as 5H0M_A which has a probability of 97.3%, coverage of 62%, and e-value of 1.4e-3. NCBI BLAST also indicates multiple hits with e-values of e-41 and coverage of 80% or higher. /note=Transmembrane domains: No TMDs were predicted by TMHMM or TOPCONS, so no transmembrane function inferred.
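The Gap/overlap values quoted in these notes can be checked by hand from the feature coordinates. The sketch below is only an illustration, not part of PECAAN or DNA Master: the function name `gap_bp` is hypothetical, and it assumes 1-based inclusive coordinates and two forward-strand features, with a negative result meaning an overlap.

```python
def gap_bp(prev_stop: int, next_start: int) -> int:
    """Bases strictly between two forward-strand features.

    Hypothetical helper. Assumes 1-based inclusive coordinates;
    a negative value is the size of the overlap.
    """
    return next_start - prev_stop - 1


# Coordinates taken from the feature lines above.
print(gap_bp(41874, 41876))  # gp63 stop to gp64 start
print(gap_bp(42163, 42603))  # gp64 stop to gp66 start (tRNA 65 lies in this gap)
print(gap_bp(42812, 42799))  # gp66 stop to gp67 start, an overlap
```

Under these assumptions the three calls reproduce the GAP fields in the SSC notes (1 bp, 439 bp, and -14 bp).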