CDS 132 - 560 /gene="1" /product="gp1" /function="terminase, small subunit" /locus tag="JasmineDragon_1" /note=Original Glimmer call @bp 132 has strength 10.98; Genemark calls start at 132 /note=SSC: 132-560 CP: yes SCS: both ST: SS BLAST-Start: [terminase small subunit [Arthrobacter phage Emotion]],,NCBI, q1:s1 100.0% 6.2762E-73 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.303, -1.953940808934884, yes F: terminase, small subunit SIF-BLAST: ,,[terminase small subunit [Arthrobacter phage Emotion]],,WGH21350,86.6197,6.2762E-73 SIF-HHPRED: Terminase small subunit; genome packaging, bacteriophage, DNA binding, VIRAL PROTEIN; 1.4A {Enterobacteria phage HK97},,,6Z6E_A,56.338,98.5 SIF-Syn: Terminase, small subunit; no upstream gene available; downstream is terminase large subunit, which is conserved in both Emotion and VroomVroom /note=Primary Annotator Name: TRAN, MICHELLE /note=Auto-annotation start source: Glimmer and GeneMark have both annotated this gene and agree that the start site is 132 (start codon: ATG). /note=Coding Potential: The most support for the coding potential in the ORF is on the forward strand, which supports this gene being a forward gene. This is consistent across both GeneMark Self and Host. The chosen start site also includes all of the coding potential. /note=SD (Final) Score: The final score for the start site is -1.954, which is the best start site available on PECAAN. /note=Gap/overlap: 131-bp gap upstream. This gap is large – but it seems to be reasonable because it is conserved in approved phages within the same subcluster (VroomVroom and Emotion), and there is no coding potential in the gap to suggest the addition of a new gene. /note=Phamerator: As of October 30, 2023, this gene is part of pham 120420. This is conserved in Emotion (AZ4). /note=Starterator: Start site 19, which corresponds to 132 in JasmineDragon, was manually annotated in 55/66 of the non-draft genes in the pham. 
This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, the gene is most likely to be real, with a start site of 132. /note=Function call: Terminase small subunit. The top three PhagesDB BLAST hits have this function (e-value <10e-45); the top three NCBI BLAST hits are consistent with this result (e-value <10e-55). HHpred has a hit for this function with ~98% probability, 56.338% coverage, and an e-value of 1.5e-6. CDD had no relevant hits. /note=Transmembrane domains: This is not a membrane protein because DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: CHAMORRO, MARCO /note=Secondary Annotator QC: I have QC'ed this location call and I agree with the primary annotator's assessment. GeneMark and Glimmer call the same start site, and the region from 132 to 560 exhibits high coding potential in the forward direction. The Z-score (3.303) is above 3, which is strong. The final score of this start site is the highest (least negative) of all the potential start sites. In addition, the start site has synteny with VroomVroom and Emotion. Thus, the start site and location call made by the primary annotator are legitimate. /note=For the Module 9 QC: I agree with the primary annotator’s function call. HHpred and NCBI BLAST both provide strong evidence that this gene encodes a terminase small subunit. The TMD call and the synteny notes are correct. 
CDS 550 - 2277 /gene="2" /product="gp2" /function="terminase, large subunit" /locus tag="JasmineDragon_2" /note=Original Glimmer call @bp 550 has strength 17.83; Genemark calls start at 550 /note=SSC: 550-2277 CP: yes SCS: both ST: SS BLAST-Start: [terminase large subunit [Arthrobacter phage VroomVroom]],,NCBI, q2:s3 99.8261% 0.0 GAP: -11 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.711, -3.8130952303033587, yes F: terminase, large subunit SIF-BLAST: ,,[terminase large subunit [Arthrobacter phage VroomVroom]],,WIC90152,97.9167,0.0 SIF-HHPRED: Terminase large subunit; genome packaging, bacteriophage, ATPase, nuclease, VIRAL PROTEIN; HET: BR; 2.2A {Enterobacteria phage HK97},,,6Z6D_A,92.0,100.0 SIF-Syn: Terminase, large subunit; upstream gene is terminase small subunit, downstream gene is portal protein, just like in phages Emotion and VroomVroom. /note=Primary Annotator Name: Kim, Abby /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 550. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -3.813. It is the best final score on PECAAN. /note=Gap/overlap: -11 bp overlap. This is a small overlap and is reasonable because it is conserved in other phages such as Emotion and VroomVroom. /note=Phamerator: pham: 121169. Date 10/28/23. It is conserved and found in VroomVroom (AZ) and Emotion (AZ). /note=Starterator: Start site 47 in Starterator was manually annotated in 3 non-draft genes in this pham. Start 47 is 550 in VroomVroom and this evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location Call: Based on the evidence, this is a real gene and the most likely start site is 550. /note=Function call: Terminase large subunit. 
BLAST and HHpred analyses show highly probable matches to known terminase large subunits (HHpred probability 100%, E-value 6.8e-39), supporting a role for this ORF in phage DNA packaging. CDD also indicates a terminase-like function. The predicted ATPase and nuclease activities, which are needed for genome packaging, are inferred from the low E-values and high similarity to functionally characterized proteins. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note= /note=Secondary Annotator Name: Kalliomaa, Kira /note=Secondary Annotator QC: /note=I have QC’ed this annotation and agree with this annotation. However: /note= -Function call: Are the “probable matches” your top hits? How many hits agree with this call on BLAST and HHpred? Is there an HHpred alignment, and to what is it aligned?
CDS 2299 - 3663 /gene="3" /product="gp3" /function="portal protein" /locus tag="JasmineDragon_3" /note=Original Glimmer call @bp 2299 has strength 18.27; Genemark calls start at 2299 /note=SSC: 2299-3663 CP: yes SCS: both ST: SS BLAST-Start: [portal protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: 21 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.303, -2.033982896655645, yes F: portal protein SIF-BLAST: ,,[portal protein [Arthrobacter phage VroomVroom]],,WIC90153,98.4615,0.0 SIF-HHPRED: Portal protein; Bacteriophage, SPP1, Portal Protein, Head completion proteins, Connector Complex, DNA Channel, VIRAL PROTEIN; 2.7A {Bacillus subtilis},,,7Z4W_L,91.8502,100.0 SIF-Syn: Portal protein, upstream is terminase large subunit, downstream is NKF; conserved in phage Emotion. /note=Primary Annotator Name: To, Nathan /note=Auto-annotation start source: Glimmer and GeneMark both agree on start site 2299 with start codon ATG. /note=Coding Potential: Both GeneMark Self and Host show that coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. 
/note=SD (Final) Score: -2.034; this is the best RBS final score available. /note=Gap/overlap: 21 bp. This is reasonable; this start gives the LORF, with a gene length of 1365 bp, which is acceptable. /note=Phamerator: This gene is in pham 121167 as of 10/29/23. This pham includes other members of cluster AZ, including Emotion and VroomVroom. There is a consistent call for this gene as a portal protein. /note=Starterator: The called start site is conserved among other phages in the pham. The called start is start 98 @2299. 158 of 1634 genomes call site #98. When site #98 is present, it is called 94.3% of the time. /note=Location call: The gathered evidence suggests that this is a real gene, with start site @2299. This gene is conserved, has good coding potential, and does not have large gaps before or after it. The start site 2299 seems most likely due to its conservation among other genomes in the same cluster and its high likelihood of being called when present. /note=Function call: Predicted function is portal protein, based on multiple PhagesDB BLAST hits with predicted function portal protein and E-value = 0. Additionally, CDD returned a function call of portal protein with an E-value of 1.87e-31. HHpred called function portal protein with E-value 5.4e-34, probability 100%, and coverage 91.8502%. /note=Transmembrane domains: 0. This makes sense because the portal protein sits within the capsid, connecting the capsid to the tail, and is not a transmembrane protein. /note=Secondary Annotator Name: Mahadev, Anirudh /note=Secondary Annotator QC: I have QC'd this gene and agree with the Primary Annotator's call that this is a real gene and should start at position 2299. After reviewing the Host-trained GeneMark, I agree with the claims about strong coding potential, and both the highest Z-score of 3.303 and the least negative final score of -2.034 support the claim that position 2299 is the most reasonable start site. 
Both Glimmer and GeneMark agree on the start site at 2299, which the Primary Annotator claims is the best start site. Based on the Primary Annotator's justification of Start Site 98 (2299) in their annotation notebook, I agree that Start Site 98 is the best start site even though the most frequently annotated start site is Start Site 71: JasmineDragon does not have Start Site 71, and Start Site 98 is called 94.3% of the time when it is present. A gap of 21 bp is reasonable, and no gene needs to be inserted before this gene. In addition, there is no coding potential in the gap, and the other ORFs support this conclusion. The only thing I think the Primary Annotator needs to change is to fill in the "All GM Coding Capacity" drop-down menu, which I think should be "Yes" because of the high coding potential. /note=The following comments are for the Module 9 QC. I agree with the primary annotator’s function call and also believe this is a portal protein. Please make sure to check off the evidence boxes under HHpred and NCBI BLAST, as both provide strong evidence that the gene encodes a portal protein. Also, make sure to set the coding potential drop-down menu to “yes”, as there is coding potential within the ORF of the gene. 
CDS 3667 - 4464 /gene="4" /product="gp4" /function="hypothetical protein" /locus tag="JasmineDragon_4" /note=Original Glimmer call @bp 3667 has strength 10.93; Genemark calls start at 3667 /note=SSC: 3667-4464 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_4 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: 3 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.985, -2.6013996449736907, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_4 [Arthrobacter phage VroomVroom]],,WIC90154,99.2453,0.0 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Valente, Nina /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 3667. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. The selected start site covers all of the coding potential in this gene. /note=SD (Final) Score: -2.60. It is the best final score on PECAAN. /note=Gap/overlap: Gap: 3 bp, an extremely small gap that is conserved in VroomVroom and Emotion. /note=Phamerator: 118017, pham: 74806. Date 10/23/2023. It is conserved; found in VroomVroom (AZ) and Emotion (AZ). /note=Starterator: Start site 3 was called in 6/10 non-draft genes in this pham. Start site 3 is 3667 in JasmineDragon. Suggested start is the same, 3667. /note=Location call: Based on the above evidence, this is a real gene with a start site at 3667. /note=Function call: The top two PhagesDB hits are both listed as having an unknown function: VroomVroom (E-value e-144, 94% alignment) and Emotion (E-value e-120, 80% alignment). NCBI BLASTp also shows that VroomVroom (E-value 0, 95% alignment) and Emotion (E-value 5e-152, 81% alignment) are the most similar; both are listed there as hypothetical proteins as well. CDD returned no results, and HHpred returned only one hit, with very low probability. Based on this evidence, the function call is NKF. 
/note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Second Annotator Name: Hon, Darren /note=Second Annotator QC: The coding potential was identified as good, and the GeneMark maps support this claim. According to the table, the gap is 3 bp, which is small enough that no gene insertion is necessary. The Z-score is 2.985 and the Final Score is -2.601, and these values are optimal. There is synteny when comparing this pham to Emotion and VroomVroom. The start site is sufficiently supported by the Starterator and Phamerator reports. The only necessary addition to this annotation is citing the GM Coding Potential in the gene candidates table.
CDS 4542 - 5126 /gene="5" /product="gp5" /function="scaffolding protein" /locus tag="JasmineDragon_5" /note=Original Glimmer call @bp 4542 has strength 16.34; Genemark calls start at 4542 /note=SSC: 4542-5126 CP: yes SCS: both ST: SS BLAST-Start: [scaffolding protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 1.9385E-104 GAP: 77 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.985, -2.66371296573402, yes F: scaffolding protein SIF-BLAST: ,,[scaffolding protein [Arthrobacter phage VroomVroom]],,WIC90155,92.8205,1.9385E-104 SIF-HHPRED: Scaffold protein; major capsid protein, HK97-like fold, scaffolding protein, procapsid, VIRUS; 3.72A {Staphylococcus phage 80alpha},,,6B0X_b,57.2165,96.9 SIF-Syn: Scaffolding protein, upstream gene is portal protein, downstream gene is major capsid protein, just like in phage Emotion /note=Primary Annotator Name: Laureano, Ryan /note=Auto-annotation: Glimmer and GeneMark both call the start site at 4542, and ATG is the start codon. /note=Coding Potential: The putative ORF has coding potential covering >80% of the region from the start codon to the stop codon. The 4542 start site covers most of the potential. 
/note=SD (Final) Score: The SD score is the best one available because it is the only one available. This SD score is -2.664. /note=Gap/overlap: The gap upstream of the gene is a little longer than expected at 77 bp. This start still gives the longest ORF. The length of the gene is acceptable at over 500 bp (585 bp). The gap is present in other non-draft pham members. /note=Phamerator: The gene is in pham 1850 as of 11/1/23. This gene shares the pham with Phives_7; both phages are in the AZ cluster. No function is called for this gene, but other genes in the pham are called as scaffolding proteins. /note=Starterator: There is a conserved start site that is reasonable. The start number is 17 and the start site is 4542. 47/50 call start #17. /note=Location call: The gene is a real gene, and the marked start site is the only possible start site. The coding potential is mostly covered, and it is the longest possible ORF. The start site is at 4542. /note=Function call: The predicted function is a scaffolding protein. The PhagesDB and NCBI BLASTp runs showed hits with VroomVroom_5, which is a scaffolding protein. The E-value is low (7e-96) and identity is 89%. The HHpred PDB hit has a high probability (96.9%) and a coverage of 57.21%. The hit is to a scaffolding protein that interacts with the major capsid protein. There is no CDD data. /note=Transmembrane domains: There are no TMRs called by DeepTMHMM. /note=Secondary Annotator Name: Valente, Nina /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. The manual and auto-annotated start sites agree, and coding potential is almost entirely covered by the ORF. Just make sure to include the actual SD final score and check whether the gap is conserved in other non-draft pham members. /note=I agree with this annotation (11/16). The PhagesDB and NCBI BLASTp both show the most significant non-draft alignment as VroomVroom, which is listed in both as a scaffolding protein. 
All subsequent hits are listed as scaffolding protein or head scaffolding protein. My only suggestion would be to add a note about the CDD results, or lack thereof. Also, consider adding something about VroomVroom in the synteny box.
CDS 5155 - 6096 /gene="6" /product="gp6" /function="major capsid protein" /locus tag="JasmineDragon_6" /note=Original Glimmer call @bp 5155 has strength 16.05; Genemark calls start at 5155 /note=SSC: 5155-6096 CP: yes SCS: both ST: SS BLAST-Start: [major capsid protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: 28 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.76, -3.123974469270304, no F: major capsid protein SIF-BLAST: ,,[major capsid protein [Arthrobacter phage VroomVroom]],,WIC90156,99.0415,0.0 SIF-HHPRED: P22_CoatProtein ; P22 coat protein - gene protein 5,,,PF11651.12,91.6933,100.0 SIF-Syn: Major capsid protein, upstream gene is a scaffolding protein, downstream gene is a head-to-tail adaptor, just like in phage VroomVroom. /note=Primary Annotator Name: Ryan, Kaitlin /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 5155. /note=Coding Potential: High coding potential in the forward direction, as indicated by the direct sequence in both the GeneMark Host and GeneMarkS output. /note=SD (Final) Score: -3.124. It is the least negative and thus the best final score on PECAAN. /note=Gap/overlap: 28 base pairs. This gap is conserved in other phages (VroomVroom, Emotion), and there is no coding potential in the gap to indicate a new gene. /note=Phamerator: pham: 228. Date: 10/28/23. It is conserved; found in VroomVroom (AZ) and Emotion (AZ). /note=Starterator: Start site 8 was manually annotated in 134/252 of the non-draft genes in this pham, including JasmineDragon. /note=Location call: Based on the above evidence, this is a real gene and the start site is 5155. /note=Function call: Major capsid protein. 
The top two PhagesDB BLAST hits have the function of major capsid protein (E-value <10^-44), and 3 of the top 5 NCBI BLAST hits also have the function of major capsid protein (78%+ identity, E-value <10^-175). HHpred had a hit for major capsid protein with 99% probability, 92.97% coverage, and an E-value of 4.6e-27. CDD had a relevant hit belonging to the Coat Protein superfamily, with an E-value of 3.52e-10. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, this is not a membrane protein. /note= /note=Secondary Annotator Name: Sanchez, Kayla /note=Secondary Annotator QC: I agree with the annotation, and all of the evidence has been considered appropriately.
CDS 6172 - 6573 /gene="7" /product="gp7" /function="head-to-tail adaptor" /locus tag="JasmineDragon_7" /note=Original Glimmer call @bp 6172 has strength 11.41; Genemark calls start at 6172 /note=SSC: 6172-6573 CP: yes SCS: both ST: SS BLAST-Start: [head-to-tail adaptor [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 99.2481% 1.48183E-87 GAP: 75 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.303, -1.953940808934884, yes F: head-to-tail adaptor SIF-BLAST: ,,[head-to-tail adaptor [Arthrobacter phage VroomVroom]],,WIC90157,99.2481,1.48183E-87 SIF-HHPRED: Phage_Gp19 ; Phage protein Gp19/Gp15/Gp42,,,PF09355.14,77.4436,99.4 SIF-Syn: Head-to-tail adaptor, upstream gene is major capsid protein, downstream is NKF, just like in phages Emotion and VroomVroom /note=Primary Annotator Name: Sanchez, Kayla /note=Auto-annotation start source: Both Glimmer and GeneMark. Both call the start at 6172. /note=Coding Potential: Good coding potential is found in both GeneMark Self and Host. The start and stop codons flank the region of coding potential. Coding potential is on the forward strand, so this is a forward gene. /note=SD (Final) Score: -1.954. It is the best final score on PECAAN because it is the least negative value. /note=Gap/overlap: Gap of 75 bp. 
This is a reasonable gap that is conserved in other phages (Emotion). Acceptable gene length of 402 bp. Although the gap is larger than 50 bp, it is still reasonable because it is also conserved in other phages such as VroomVroom. /note=Phamerator: pham: 121482. Date 11/1/23. It is conserved; found in VroomVroom (AZ4). /note=Starterator: Start site 7 in Starterator was manually annotated in 32/60 non-draft genes in this pham. Start 7 is 6172 in JasmineDragon. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Function call: Head-to-tail adaptor. The top two PhagesDB BLASTp hits have the function of head-to-tail adaptor (E-value <10^-54), and the top two NCBI BLAST hits also have the function of head-to-tail adaptor (79%+ identity, E-value <10^-67). One of the PDB hits names a head completion protein. There is also an HHpred alignment to SPP1, HK97, and YqbG, which is required by the Approved Functions List. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. It predicts that the protein is located on the inside of the membrane, which is consistent with our function call. /note= /note= /note=Secondary Annotator Name: Kalliomaa, Kira /note=Secondary Annotator QC: /note=I have QC’ed this annotation and agree with this annotation's location and function call. However: /note= -You only need 2-3 pieces of evidence from each tool (3 maximum). Under the HHpred section, I would narrow it down to your top 3 pieces of evidence and most likely leave out the one with the e-7 E-value: although the rest of that entry is good, that E-value is weak, and you have better evidence to choose from. 
CDS 6570 - 6671 /gene="8" /product="gp8" /function="hypothetical protein" /locus tag="JasmineDragon_8" /note=Genemark calls start at 6570 /note=SSC: 6570-6671 CP: yes SCS: genemark ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_8 [Arthrobacter phage VroomVroom]],,NCBI, q5:s4 75.7576% 0.00455561 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.88, -2.8779978280336396, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_8 [Arthrobacter phage VroomVroom]],,WIC90158,63.6364,0.00455561 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Hon, Darren /note=Auto-annotation: Gene 8 (stop@6671 F) /note=Coding Potential: Good coding potential: the coding potential maps show a single reading frame consistent with a protein-coding gene, in the forward orientation only. The only reservation is that there is no Glimmer prediction. /note=SD (Final) Score: -2.878, and there is no other final score available for comparison. /note=Gap/overlap: -4 bp overlap, which is reasonable; no additional gene insertions are needed. /note=Phamerator: As of 11/1/23, the pham number is 121982. This pham is in cluster AZ and is present in other phages, including Adolin_9 and DrManhattan_9. /note=Starterator: As of 11/1/23, the Starterator report is for pham 121982. Start number 1 is manually annotated in 4 non-draft genes of the pham, matching the auto-annotated start number 1, and it encompasses all of the coding potential, so start site number 1 is sufficiently supported. /note=Location call: Based on the evidence above, this is a real gene with a stop site at 6,671 and a likely start site at 6570. /note=Function call: All data indicate that this gene has no known function. Among the BLASTp hits, the non-draft sequences have no known function. Additionally, HHpred returned no significant hits; their E-values are all greater than 1. CDD found no domains. 
/note=Transmembrane domains: There are no predicted TMDs: the DeepTMHMM output contains neither “TMhelix” nor “M”. The sequence indicates a signal within the center. The absence of TMDs does not conflict with this protein having no known function. /note=Secondary Annotator Name: Hosford, Ryan /note=Secondary Annotator QC: The -4 bp overlap suggests that this gene's start lies within an operon. GeneMark calls the start at 6570, and this is confirmed by one other manual annotation in the AZ cluster. The gene shares start number 1 with the other 7 genes in pham 121982, of which 4 are manual annotations; VroomVroom and Adolin are notable comparisons. Synteny is observed with VroomVroom, which has the same phams surrounding this gene: 121482 before and 121512 after it. A Z-score of 2.88 and a Final score of -2.878 also support this start, so I agree that this is the start site for this gene. /note=HHpred had no decisive hits, and CDD had no hits with a considerable E-value either. 
BLASTp also pointed to NKF, so I agree with the primary annotator that this gene is NKF.
CDS 6658 - 6999 /gene="9" /product="gp9" /function="head-to-tail stopper" /locus tag="JasmineDragon_9" /note=Original Glimmer call @bp 6658 has strength 17.83; Genemark calls start at 6697 /note=SSC: 6658-6999 CP: yes SCS: both-gl ST: SS BLAST-Start: [head-to-tail stopper [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 4.50288E-71 GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.905, -2.827683592113848, yes F: head-to-tail stopper SIF-BLAST: ,,[head-to-tail stopper [Arthrobacter phage VroomVroom]],,WIC90159,98.2301,4.50288E-71 SIF-HHPRED: Stopper protein Rcc01689; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_E,97.3451,99.7 SIF-Syn: Head-to-tail stopper protein, upstream gene has unknown function, downstream gene has unknown function, just like phage VroomVroom /note=Primary Annotator Name: Claire, Monjov /note=Auto-annotation: Glimmer and GeneMark call nearby start sites: Glimmer calls 6658, while GeneMark calls 6697. /note=Coding Potential: There is high coding potential for the gene in both programs; more than 80 percent of the gene has coding potential. Also, the start site is just outside the coding potential, which is another good sign. The coding potential is in the forward direction and this gene is a forward gene, so it checks out. /note=SD (Final) Score: -2.828. This is not very negative, which is a good sign; other candidates have much more negative final scores. -2.828 is the highest final score among the candidates, which indicates a strong sequence match. /note=Gap/overlap: 7 base pairs. Not a large gap and very reasonable. This gene is conserved in many other phages, and that gap is seen in some of those phages as well. /note=Phamerator: Pham 121512 (10/30/23). Many members of the pham have been annotated, and many members are also in cluster AZ and subcluster AZ4. 
I compared against VroomVroom (AZ4) and Emotion (AZ4) because they are in the same cluster and subcluster as JasmineDragon. 56/69 of the other phages are also in the AZ cluster. Many of the phages also have a head-to-tail stopper function called for this gene. /note=Starterator: There are a total of 50 non-draft phage members in pham 121512, and 43 of them call start site 8. Start site 8 corresponds to base pair 6658 in JasmineDragon. This is strong evidence in favor of this start site. /note=Location call: 6658. From the evidence presented, the gene itself is real, and there is strong support for 6658 as the start site. Given the many manual annotations and its conservation across the pham, this is the best start site in my opinion. /note=Function call: Based on the NCBI BLASTp and PhagesDB BLASTp results, the most probable function of this gene is head-to-tail stopper. Both databases show that Emotion and VroomVroom have a conserved gene that is highly similar to this gene in phage JasmineDragon, and that the function of that gene is known. The HHpred hits also agree with the head-to-tail stopper function. Because PhagesDB, NCBI BLAST, and HHpred all independently support a head-to-tail stopper protein, this is solid evidence for the call. /note=Transmembrane domains: 0 TMDs found by DeepTMHMM. This makes sense because we called a head-to-tail stopper protein, which is involved in DNA exiting the capsid. This protein functions inside the cell during the infection stage and does not act through the membrane, so this result is consistent with our call. 
/note=Secondary Annotator Name: Indiresan, Neeti /note=Secondary Annotator QC: It seems that some of the information, such as the gap and pham number, has been updated between the primary annotation and my annotation; however, I still agree with the annotations done and the chosen start site of 6658. I agree with the function call and that all the evidence was considered. All the boxes are filled, and the evidence checked off supports the function call made.
CDS 7008 - 7289 /gene="10" /product="gp10" /function="hypothetical protein" /locus tag="JasmineDragon_10" /note=Original Glimmer call @bp 7008 has strength 15.01; Genemark calls start at 7008 /note=SSC: 7008-7289 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_10 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 4.32718E-37 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.755, -3.7234890726758456, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_10 [Arthrobacter phage VroomVroom]],,WIC90160,87.0968,4.32718E-37 SIF-HHPRED: SIF-Syn: /note=PECAAN Notes /note=Primary Annotator Name: Bhattarai, Aryan /note=Auto-annotation: Glimmer Start and GeneMark Start are both 7008. /note=Coding Potential: There is high coding potential in the forward direction, as indicated by the graphical line of coding potential in both Host-Trained GeneMark and Self-Trained GeneMark. /note=SD (Final) Score: -3.723. It is the best final score on PECAAN. /note=Gap/overlap: 8 bp. The gene is likely not part of an operon. /note=Phamerator: As of 11/01/2023, the pham number is 122627. The gene is conserved in phages VroomVroom and Emotion, which are both in the same subcluster (AZ4) as JasmineDragon. The function of the gene is not explicitly stated, nor is it indicated by synteny among non-draft genes of the same subcluster. /note=Starterator: Start site 27 in Starterator was manually annotated in 47/94 non-draft genes in this pham. Start 27 in JasmineDragon is 7008. 
This evidence agrees with the site predicted by both Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 7008. /note=Function call: No known function. This gene has strong synteny with non-draft phages VroomVroom and Emotion, both of which have no known function for their homologous gene. The only strong hits on PhagesDB BLASTp and NCBI BLASTp are to genes with no known function. Additionally, CDD and HHpred are both uninformative. The CDD page for this gene is blank, and all HHpred hits but one (which had a still-high E-value of 0.0021) have E-values >1. /note=Transmembrane domains: 0. This is not a membrane protein because DeepTMHMM does not predict any TMDs. Additionally, because this gene has no known function (NKF), and its syntenic homologs in the non-draft phages of its subcluster (VroomVroom and Emotion) are also NKF, it is currently not possible to determine this protein’s function beyond it being an external protein. /note=Secondary Annotator Name: Potter, Sofia /note=Secondary Annotator QC: I don't disagree with any of your points in your PECAAN notes, but I noticed that this gene is listed twice with the same start and end sites in PECAAN, and I wasn't sure whether this had been addressed by you and/or the instructional staff, since it wasn't mentioned above. My version of the PECAAN notes is below. Second QC 11/17/23: I also see synteny with VroomVroom's and Emotion's NKF genes in the same pham and agree with the primary annotator that the correct function call here is NKF. All evidence provided looks good and thorough, along with the PECAAN notes. /note= /note=Auto-annotation: Both Glimmer and GeneMark call the start at 7008. /note=Coding Potential: There is good, visible coding potential in GeneMark Self and Host. /note=SD (Final) Score: -3.723, the best (least negative) final score on PECAAN. 
/note=Gap/overlap: 8 bp gap /note=Phamerator: As of November 3, 2023, the pham number is 122627. There are corresponding genes within the same pham in both VroomVroom and Emotion, the two non-draft phages in the same subcluster, AZ4. /note=Starterator: Starterator is not loading (may be updating), so I am not able to verify the start site numbers. /note=Location call: Based on the (currently available, minus Starterator) evidence, this appears to be a real gene with a start site of 7008. However, in PECAAN, the start and end sites for this gene appear to be listed twice, as genes 11 and 12. Gene 11 has the Glimmer and GeneMark start calls, but Gene 12 does not list either of those calls. Is this an error with PECAAN? CDS 7286 - 7681 /gene="11" /product="gp11" /function="tail terminator" /locus tag="JasmineDragon_11" /note=Original Glimmer call @bp 7286 has strength 12.27; Genemark calls start at 7286 /note=SSC: 7286-7681 CP: yes SCS: both ST: SS BLAST-Start: [tail terminator [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 1.89119E-83 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.551, -4.3934175870462076, yes F: tail terminator SIF-BLAST: ,,[tail terminator [Arthrobacter phage VroomVroom]],,WIC90161,96.9466,1.89119E-83 SIF-HHPRED: Tail terminator protein Rcc01690; "neck", "portal", "capsid", "tail tube", VIRUS; 3.58A {Rhodobacter capsulatus},,,6TE9_F,96.9466,99.5 SIF-Syn: Tail terminator, upstream gene is NKF, downstream is major tail protein, just like in phage VroomVroom. /note=Primary Annotator Name: Kang, Alix /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 7286. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found both in GeneMark Self and Host. /note=SD (Final) Score: -4.393. It is the best final score on PECAAN. /note=Gap/overlap: Overlap: 4bp. Gene is likely part of an operon. 
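The GAP values recorded by PECAAN for consecutive forward genes follow from simple coordinate arithmetic: the next gene's start minus the previous gene's stop, minus one, with a negative result meaning an overlap. A minimal Python sketch, assuming 1-based inclusive coordinates on the forward strand; the helper name `gap_between` is hypothetical and not part of PECAAN:

```python
def gap_between(prev_stop: int, next_start: int) -> int:
    """Return the gap (positive) or overlap (negative) in bp between two
    forward-strand genes, using 1-based inclusive coordinates."""
    return next_start - prev_stop - 1


# Coordinates taken from the JasmineDragon records in this section.
print(gap_between(7289, 7286))  # gene 10 -> gene 11: -4 (a 4 bp overlap)
print(gap_between(7681, 7693))  # gene 11 -> gene 12: 11
print(gap_between(8247, 8354))  # gene 12 -> gene 13: 106
```

For gene 11, the -4 value means the start codon (7286-7288) and the upstream stop codon (7287-7289) share bases, the tight spacing that the notes read as a hint that the two genes may be part of an operon.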
/note=Phamerator: The pham number as of October 30, 2023 is 121479. The gene is conserved in phages VroomVroom and Emotion, both in the same cluster as JasmineDragon. The function call for the gene is a tail terminator, and it is consistent between Phamerator and the phams database. It is on the approved SEA-PHAGES list. /note=Starterator: Start site 10 in Starterator was manually annotated in 1/62 non-draft genes in this pham. Start 10 is 7286 in JasmineDragon. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 7286. /note=Function call: Tail terminator. The top three PhagesDB BLAST hits (not including those with an unknown function) have this function, with E-values of 2e-38 to 2e-66. The top two NCBI BLAST hits also have the function of tail terminator (98-100% coverage, 77%+ identity, and E-value <7e-67). HHpred had a hit for tail terminator protein with 99.5% probability, 96.94% coverage, and E-value of 8.3e-13. CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Zaragoza, Evelin /note=Secondary Annotator QC: /note=Auto-annotation: GeneMark and Glimmer. Both call the start at 7286, which produces the LORF. The gene contains all the coding potential, and this start has a Z-score over 2 and the highest final score, suggesting the best Shine-Dalgarno match. The GTG start codon is also not uncommon. /note=Coding Potential: The chosen start site at 7286 includes all of the coding potential between the start and stop sites. There is no coding potential on the complementary strand. /note=SD (Final) Score: The final score is the highest, at -4.393, and the Z-score is favorable at a value over 2. This suggests the called start site is likely.
/note=Gap/overlap: There is a 4 bp overlap with the upstream gene and a 12 bp gap with the downstream gene, neither of which is significant. It is possible the gene is part of an operon due to this 4 bp overlap and the favorable RBS score. Strong synteny is observed with VroomVroom and Emotion. /note=Phamerator: The pham number is 121479 as of 10/28/2023. It is conserved in phages such as VroomVroom and Emotion, both of which are non-draft phages. /note=Starterator: Start site 8 was the most frequently manually annotated start, called in 34/62 non-draft genes in this pham. Start 8 is not present in JasmineDragon. Start site 10, which corresponds to 7286, is present in Starterator, and the only other option is start site 38, which corresponds to 7478. This evidence suggests that the site predicted by Glimmer and GeneMark is still the best option, because 7478 would not include all of the coding potential. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 7286. The gene does not need to be deleted, because the coding potential indicates that a gene is present there, and the start site of this gene includes all of the coding potential. /note=Function call: I agree with the primary annotator. The function is tail terminator. The VroomVroom and Emotion hits in PhagesDB BLAST have this function with significant e-values. The top NCBI BLAST and HHpred hits also have the function of tail terminator with significant coverages, identities, and e-values; however, CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM did not predict any TMDs, so it is not a membrane protein.
/note=Note: Primary annotator should be more descriptive in the synteny box (refer to the lab manual). CDS 7693 - 8247 /gene="12" /product="gp12" /function="major tail protein" /locus tag="JasmineDragon_12" /note=Original Glimmer call @bp 7693 has strength 17.72; Genemark calls start at 7693 /note=SSC: 7693-8247 CP: yes SCS: both ST: SS BLAST-Start: [major tail protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 1.2294E-127 GAP: 11 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.314, -1.993391246735709, yes F: major tail protein SIF-BLAST: ,,[major tail protein [Arthrobacter phage VroomVroom]],,WIC90162,98.913,1.2294E-127 SIF-HHPRED: YSD1_22 major tail protein; Bacteriophage tail, helical assembly, VIRAL PROTEIN; 3.5A {Bacteriophage sp.},,,6XGR_L,90.7609,98.4 SIF-Syn: The function of this gene is major tail protein in Pham 120341. The upstream gene is a tail terminator in Pham 124381. The downstream gene is a tail assembly chaperone in Pham 124366. These characteristics are seen in both VroomVroom and Emotion. /note=Primary Annotator Name: Saud, Arnav /note=Auto-annotation: Glimmer and GeneMark both call the start at 7693. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this gene is a forward gene. Coding potential is found in both GeneMark Host and GeneMark Self. /note=SD (Final) Score: -1.993. PECAAN ranks this score as the best one. /note=Gap/overlap: The gap between the start site and the previous stop is 11 base pairs long. This gap is small and is conserved in other phages within the same subcluster, AZ4 (VroomVroom, Emotion). There is no coding potential in this small gap. /note=The gap between the gene of interest`s stop site and the next start site is 106 base pairs. This gap is relatively large, but it is also conserved in other phages within the same cluster (VroomVroom, Emotion). There is no coding potential demonstrated within this gap between the two genes.
/note=Phamerator: pham: 120341 Date: 10/30/2023. The gene is conserved; found in VroomVroom (AZ) and Emotion (AZ). /note=Starterator: Start site 8 in Starterator was manually annotated in 99/101 non-draft genes in this pham. Start 8 is 7693 in JasmineDragon. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this gene is real and the most likely start site is 7693. /note=Function call: Major Tail Protein. The top two PhagesDB BLASTp hits have the function of major tail protein (E-value < 10^-101), and the top 2 NCBI BLAST hits have also been assigned the function of major tail protein (96%+ coverage, 70%+ identity, and e-value < 1e-104). HHpred had a hit for major tail protein with 98.4% probability, 90.76% coverage, and E-value of 0.000036. CDD had no valid hits. /note=Transmembrane Domains: DeepTMHMM does not predict any transmembrane domains for this gene; therefore, it is not a membrane protein. /note=Secondary Annotator name: Ryan, Kaitlin /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered and agree with the annotation. I noticed that the Starterator menu and GM coding capacity menu have not been filled out yet. // All the evidence supports the function call and annotation. I agree with the evidence; good job!
(11/16/23) CDS 8354 - 8620 /gene="13" /product="gp13" /function="tail assembly chaperone" /locus tag="JasmineDragon_13" /note=Original Glimmer call @bp 8354 has strength 14.75; Genemark calls start at 8354 /note=SSC: 8354-8620 CP: yes SCS: both ST: SS BLAST-Start: [tail assembly chaperone [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 6.14824E-47 GAP: 106 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.314, -2.0111200136961407, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage VroomVroom]],,WIC90163,92.0455,6.14824E-47 SIF-HHPRED: SIF-Syn: tail assembly chaperone; upstream gene is major tail protein and no downstream gene available, but the upstream gene is conserved in VroomVroom and Emotion. /note=Primary Annotator Name: Daniel, Mila /note=Auto-annotation: Glimmer and GeneMark both call the start at 8354. /note=Coding Potential: Coding potential is on the forward strand, and there is good coding potential in both GeneMark Self and Host, as indicated by the solid blocked-out region between the START and STOP sites. /note=SD (Final) Score: -2.011. It is the best final score on PECAAN. /note=Gap/overlap: Gap is 106 bp. This is relatively large, but the gap is also conserved in other complete phage genomes (VroomVroom & Emotion), and there is no evidence of coding potential in this gap, so a new gene is not recommended. /note=Phamerator: Pham 120445. Date 10/31/2023. It is conserved; found in both Eraser (AZ) and Lego (AZ). /note=Starterator: Start site 6 in Starterator was manually annotated in 48/53 non-draft genes in this pham. Start 6 is 8354 in JasmineDragon. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 8354.
/note=Function call: tail assembly chaperone; the top two PhagesDB BLAST hits have the function of tail assembly chaperone (E-value <10^-38), and the top 2 NCBI BLAST hits also have the function of tail assembly chaperone (100% coverage, 82%+ identity, and E-value <10^-47). HHpred (on Pfam) had a hit for phage tail assembly chaperone with 94.49% probability, 79.5455% coverage, and E-value of 1.2 (despite this high e-value, the hit is consistent with the other evidence for this function). CDD had no relevant hits. /note=Secondary Annotator Name: Indiresan, Neeti /note=Secondary Annotator QC: I agree with this annotation and the chosen start site. All the evidence categories have been considered. I agree with the functional call and believe that all evidence has been taken into account. The boxes have been filled, and the evidence checked for the BLAST searches supports the functional call. CDS join(8354..8611,8611..8961) /gene="14" /product="gp14" /function="tail assembly chaperone" /locus tag="JasmineDragon_14" /note= /note=SSC: 8354-8961 CP: no SCS: neither ST: SS BLAST-Start: [tail assembly chaperone [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 5.11497E-133 GAP: -267 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.314, -2.0111200136961407, yes F: tail assembly chaperone SIF-BLAST: ,,[tail assembly chaperone [Arthrobacter phage VroomVroom]],,WIC90164,96.0396,5.11497E-133 SIF-HHPRED: SIF-Syn: The gene directly upstream is a tail assembly chaperone gene and overlaps with this gene. The gene downstream is a tape measure gene. There is good synteny observed. The genes further upstream are all related to tail function, solidifying the functional call of this gene. This comparison is made based on the final phages VroomVroom and Emotion, both of which demonstrated the same trends. /note=Gene 15 (Stop@8961 F) /note= /note=Primary Annotator Name: Labib, Youstina /note=Auto-annotation: Glimmer identifies start at 8689 (no GeneMark start given).
The start site was modified to 8354 due to the presence of a programmed translational frameshift. /note=Coding Potential: There is good coding potential, as demonstrated by the graph in the first open reading frame. /note=SD (Final) Score: -2.011. This is the only SD score listed for the gene candidates and is the least negative compared to prior options. /note=Gap/overlap: 267 bp overlap with the upstream gene and no overlap with the next downstream gene. The upstream genes are annotated as tail assembly chaperone genes, which gives us insight into the potential function of this gene. The instructional team modified the start site of this gene because it needed to be moved upstream, essentially filling the gap that was previously present. /note=Phamerator: Pham: 107294, observed on 10/31/2023; it is conserved. This pham has five members, and all five are drafts. The pham that my gene belongs to is present in other members of the cluster/subcluster to which my phage belongs. Phages that I used for comparison include MiniMommy and MissSwiss. These are both within cluster AZ, as is my phage. /note=Starterator: 10/28/23: The start site called most often in published annotations was start 12, in 3 of 5. No non-draft phage annotations called it. The auto-annotated start number is 5, and the start position is 8689. The start site was modified (11/13/23) due to a programmed translational frameshift. /note=Location call: Based on the above evidence, this is a real gene, and the start site is not likely 8689. It will need to be changed to start site 8354 using DNA Master, as discussed with the instructional staff. /note=Starterator Drop-Down Menu: Based on all the evidence I have collected so far, I do not agree with the auto-annotated start site. After discussion with the instructional staff, it appears that the start site may need to be extended upstream to position 8354.
We discussed that we may need to use DNA Master to manually change the start site of the gene based on the number of base pairs displayed relative to the final comparison. I would propose that the new start site be 8354, which is the start site of the upstream gene. /note=Coding Potential Drop-Down: The gene has reasonable coding potential predicted within the putative ORF. The chosen start site does not include all coding potential. /note=DNA Master. /note=Function call: Tail assembly chaperone, based on the corresponding pham maps and BLAST data. The HHpred and CDD data support this function call, as a couple of the top hits with the greatest % coverage and probability are also tail assembly chaperones. /note=Transmembrane domains: There are no transmembrane domains. The graphs indicate that the entire sequence is predicted to be inside, so there are no significant TMDs and this is not a membrane protein. /note= /note=Secondary Annotator Name: Potter, Sofia /note=Secondary Annotator QC: I see nothing wrong with the work you`ve done, although there is some discrepancy in the pham number between 10/31 and today (11/3), and it looks like Starterator is being updated to reflect new info since I couldn`t access it. It looks like you`re working out the confusing parts (which I also found confusing) with the instructional staff. My version of the PECAAN notes is below. Second QC 11/17/23: The lack of conclusive results for CDD and HHpred on this gene makes it less certain to assign a function to this gene, and the fact that the top BLAST hits have alignment scores of only 50-80 does not help on that front either. However, I do agree that the available evidence points to this being a tail assembly chaperone, as the primary annotator concluded. There is strong synteny with other AZ phages, and the fact that the surrounding genes are also being classified as tail assembly-related is strong evidence as well.
/note= /note=Auto-annotation: Glimmer calls start at 8689, GeneMark does not call a start. /note=Coding Potential: I see good coding potential in the forward direction at a corresponding start and stop site in the 8689-8961 region on GeneMark. /note=SD (Final) Score: Of the two possible start sites given, the start at 8689 does have a worse/more negative final score by about -1, but the z-score is only at an acceptable level (above 2) in the selected start site (2.379 vs. 1.946). /note=Gap/overlap: 68 bp (gap) /note=Phamerator: As of November 3, 2023, this gene is in pham 123103. There is no synteny with this gene and any gene in VroomVroom or Emotion, the only AZ4 non-draft phages. Quickly surveying other AZ cluster phages, I did not find any synteny either. /note=Starterator: The Starterator report is not currently available (11/3/23), possibly because it is being updated. /note=Location call: Neither of the auto-generated possible start sites on PECAAN appear to be excellent options. This combined with a lack of synteny evidence might mean that there’s a separate explanation for this gene, even though the coding potential appears to be there on GeneMark. 
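The gene 14 location, join(8354..8611,8611..8961), is how the programmed frameshift is written: base 8611 ends the first segment and begins the second, so it is read twice, as happens in a -1 ribosomal frameshift. A minimal Python sketch, assuming 1-based inclusive segments as in GenBank location syntax; `join_length` is a hypothetical helper, not part of DNA Master or PECAAN:

```python
def join_length(segments: list[tuple[int, int]]) -> int:
    """Total length in nt of a join() location given as (start, end) pairs,
    each segment 1-based and inclusive."""
    return sum(end - start + 1 for start, end in segments)


# Gene 14: join(8354..8611,8611..8961). Base 8611 is counted in both
# segments, so the joined sequence is one base longer than 8354..8961 (608 nt).
segments = [(8354, 8611), (8611, 8961)]
total = join_length(segments)
print(total)           # 609
print(total % 3 == 0)  # True: a whole number of codons
print(total // 3 - 1)  # 202 amino acids, assuming the final codon is the stop
```

The first segment (258 nt, 86 codons) starts at the same position, 8354, as gene 13, which is consistent with the 267 bp overlap reported between the two genes.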
CDS 8971 - 11214 /gene="15" /product="gp15" /function="tape measure protein" /locus tag="JasmineDragon_15" /note=Original Glimmer call @bp 8971 has strength 13.33; Genemark calls start at 8971 /note=SSC: 8971-11214 CP: yes SCS: both ST: SS BLAST-Start: [tape measure protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: 8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.139, -4.325675514574479, no F: tape measure protein SIF-BLAST: ,,[tape measure protein [Arthrobacter phage VroomVroom]],,WIC90165,94.5114,0.0 SIF-HHPRED: Tape Measure Protein, gp57; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_CF,12.5837,99.9 SIF-Syn: Tape measure protein, upstream gene is tail assembly chaperone, downstream is minor tail protein, just like in phages VroomVroom and Emotion (AZ4). /note=Primary Annotator Name: Carnes, Julianne /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 8,971. /note=Coding Potential: Coding potential is found in both GeneMark Self and Host. The coding potential in the ORF is primarily on the forward strand. /note=SD (Final) Score: -4.326; this is the least negative final score. The Z-score is also above 2. /note=Gap/overlap: 9 base pairs. The gap is small and reasonable, and this gene is conserved across clusters and among Emotion and VroomVroom (AZ4). /note=Location call: Based on the evidence, it is likely that this is a real gene and that its start site is at 8971 base pairs. /note=Phamerator: Pham 121521. Date reported: 10/31/23. This start site is conserved, and it is found in both VroomVroom (AZ4) and Emotion (AZ4). It is also the most-annotated start site for this gene. /note=Starterator: Start site 2 in Starterator was manually annotated in 34/45 of the non-draft genomes, and was "Most Annotated". Start 2 is 8971 in JasmineDragon. This evidence also corresponds to the site predicted by both Glimmer and GeneMark.
/note=Location call: Based on the evidence, this is a real gene with a start site at 8971. /note=Function call: Tape measure protein. BLASTp suggests similar hosts and similar lengths for this protein. This protein is seen across multiple clusters with the same function of tape measure protein. The first NCBI hit was from Arthrobacter phage VroomVroom, for a tape measure protein, with an e-value of 0.0 and 90.90% identity. A CDD hit, TMP_3 (pfam20155), has an e-value of 1.61e-27 and excellent sequence alignment for tape measure protein. An HHpred PDB hit, 6V8I_CF, showed a probability of 99.83% and an e-value of 8.8e-12, with 78% coverage, for tape measure protein. /note=Transmembrane domains: There is evidence of 1 transmembrane domain. This is consistent with the function call of tape measure protein. /note=Secondary Annotator Name: Kibria, Kamille /note=Secondary Annotator QC: I agree that the gene is real and agree with the start site. PhagesDB BLAST and NCBI BLASTp generated hits within the same cluster (VroomVroom and Emotion) with high scores (between 700-1000) and an e-value of 0. Be sure to explicitly list the function generated by the hits in the different BLAST systems. CDD generated multiple significant hits that also suggest it is a tape measure protein. HHpred generated multiple significant PDB hits; these have probabilities of about 99%, but their coverage values are generally under 50% (which is the threshold). This may be something to investigate further. Also, in your AnnoNB, you are missing the coverage value for 6V8I_CF; it may have been missing at the time of your investigation, but it is in PECAAN now. Really good evidence that this gene is a tape measure protein. Good evidence for a TM domain in TMHMM. Be sure to include a screenshot of TMHMM from PECAAN.
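The QC note above distinguishes HHpred hits with high probability from hits that also cover enough of the query. A minimal Python sketch of that two-part check; the `HHpredHit` record and `is_strong_hit` function are hypothetical, the 50% coverage cut-off is the threshold mentioned in the note, and the 90% probability cut-off is an assumption, not a value stated in this record:

```python
from dataclasses import dataclass


@dataclass
class HHpredHit:
    pdb_id: str
    probability: float  # percent
    coverage: float     # percent of the query covered by the alignment


def is_strong_hit(hit: HHpredHit, min_prob: float = 90.0, min_cov: float = 50.0) -> bool:
    """Treat a hit as strong only if both probability and coverage pass.

    min_cov=50 follows the threshold cited in the QC note; min_prob=90
    is an assumed cut-off chosen for illustration.
    """
    return hit.probability >= min_prob and hit.coverage >= min_cov


# Coverage and probability copied from the SIF-HHPRED fields of genes 12 and 15.
hits = [HHpredHit("6XGR_L", 98.4, 90.7609), HHpredHit("6V8I_CF", 99.9, 12.5837)]
for h in hits:
    print(h.pdb_id, is_strong_hit(h))
# 6XGR_L True
# 6V8I_CF False
```

The second case matches what the QC note describes: a near-certain probability, but over only a small fraction of the query, so the alignment supports the function for that region rather than the whole protein.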
CDS 11227 - 12087 /gene="16" /product="gp16" /function="minor tail protein" /locus tag="JasmineDragon_16" /note=Original Glimmer call @bp 11227 has strength 13.55; Genemark calls start at 11227 /note=SSC: 11227-12087 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: 12 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.303, -1.953940808934884, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage VroomVroom]],,WIC90166,95.8042,0.0 SIF-HHPRED: Distal Tail Protein, gp58; phage tail, tail tip, tape measure protein, VIRAL PROTEIN; 3.7A {Staphylococcus virus 80alpha},,,6V8I_BD,98.951,100.0 SIF-Syn: Upstream of this gene is a tape measure protein, and downstream of this gene are more minor tail proteins. These genes tend to be grouped together, which provides further evidence for the function call. This trend can be observed in phages VroomVroom and Emotion (AZ4), which are part of the same cluster as JasmineDragon and exhibit synteny for this particular gene. /note=Primary Annotator Name: Chamorro, Marco /note=Auto-annotation: Glimmer and GeneMark both suggest 11227 as the start site. /note=Coding Potential: Coding potential is on the forward strand. GeneMark Self and Host suggest strong coding potential throughout the gene. /note=SD (Final) Score: The SD score is -1.954, which is strong. /note=Gap/overlap: There is no overlap, but there is a gap of 14 nucleotides between the start site and the previous stop site. This is a reasonable gap. /note=Phamerator: As of November 1st, this gene is part of Pham 121425, which is conserved in Emotion (AZ4). /note=Starterator: Start site 8 was the most annotated, called in 33 of the 81 non-draft genes. This evidence disagrees with the start site called by Glimmer and GeneMark (start 10). /note=Location call: Based on the above evidence, this is a real forward gene and the most likely start site is 11227.
/note=Function Call: The predicted function is minor tail protein because of the multiple hits on PhagesDB BLASTp with VroomVroom (e-value = e^-152) and Emotion (e-value = e-119), both of which belong to the minor tail protein class of genes. NCBI hits give the same result, with e-values of 0 and 2e-135, respectively. HHpred alignments indicate that this protein could be an 80alpha baseplate protein, a protein involved in the folding of the tail tube. PhagesDB BLASTp, NCBI hits, and HHpred collectively agree that the gene encodes a minor tail protein. /note=Transmembrane Domains: According to DeepTMHMM, this protein has no transmembrane domains, so it is not a transmembrane protein. /note= /note=Secondary Annotator Name: Labib, Youstina /note=Secondary Annotator QC: /note=Auto-annotation: Glimmer and GeneMark both suggest the same start site, at position 11227 bp. /note=Coding Potential: There is strong coding potential, as demonstrated in both GeneMark Self and Host. There is a positive signal within the first ORF of the forward strand, indicating good coding potential and that this gene appears real. /note=SD (Final) Score: The SD score of -1.954 is good. Again, the objective is to have the least negative value, and among the SD scores, -1.954 is the least negative and hence the best match. The Z-score is 3.303, which is the highest of the options and is strong considering it is above the ideal threshold of 2. /note=Gap/Overlap: There is a gap of 12 base pairs, but this is considered minor and won`t need to be filled by an added gene. /note=Phamerator: This gene is a member of pham 121425, which has 104 other members. The pham includes a variety of genes from cluster AZ, but many other clusters are listed as well. Other draft genes within the same cluster include those of Asa16 and Berrie.
/note=Starterator: 11/3/2023: The start site called most often was start site 8, which was called in 33 of 81 non-draft genes. This suggests that the start site is 11227. /note=Location call: Based on the evidence, this appears to be a real gene, and the suggested start site is 11227 bp. /note=Notes: Select from the Starterator menu, select the All GM Coding Capacity, finalize the location call, and review gap/overlap inconsistencies. /note=Function call: The original function call appears correct. I agree with the function call of minor tail protein, as supported by the PhagesDB function frequency, PhagesDB BLAST, NCBI BLAST, and HHpred calls. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, so we can confirm that this is not a membrane protein. /note=Note: 11/16/23: The location and function calls are both correct, and there is no evidence of TMDs, which is consistent with the functional call. CDS 12097 - 13074 /gene="17" /product="gp17" /function="minor tail protein" /locus tag="JasmineDragon_17" /note=Original Glimmer call @bp 12097 has strength 12.38; Genemark calls start at 12097 /note=SSC: 12097-13074 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.314, -2.0111200136961407, yes F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage VroomVroom]],,WIC90167,98.1538,0.0 SIF-HHPRED: Receptor Binding Protein; beta sandwich domain, phage receptor binding protein, Lactococcus lactis pellicle cell wall polyphosphosaccharide, VIRAL PROTEIN; 1.75A {Lactococcus phage 1358},,,4L9B_A,50.4615,98.8 SIF-Syn: This gene shows synteny: in VroomVroom, another phage in subcluster AZ4, the minor tail protein gene is in the same location. In both phages, the gene is located adjacent to other minor tail proteins, which supports the function call.
Both phages also have the tape measure protein directly before the minor tail protein, which also supports a similar order of genes. /note=PECAAN Notes /note=Primary Annotator Name: Mahadev, Anirudh /note=Auto-annotation: Both Glimmer and GeneMark auto-annotated this gene. They both agree on the forward start site at 12097. An ATG start codon was called. /note=Coding Potential: The gene has reasonable coding potential within the putative ORF, and the chosen start site covers the coding potential. Two other start codons appear to lie within the gene, but neither is chosen by Glimmer or GeneMark. /note=SD (Final) Score: The Z-score of the called start site is 3.314, which is much better than the other five options. This suggests a credible ribosome binding site and the right starting point for the gene. /note=Gap/overlap: The gap is 9 base pairs, which seems reasonable. The other options have far longer gaps, some over 500 base pairs. The gene is also fairly long, at 978 base pairs. /note=Phamerator: The gene is found in Pham 122021 on 10/31/2023. This Pham is highly conserved and was compared to VroomVroom and Emotion, both of which have genes in the same Pham. PhagesDB BLAST stated that the function is unknown, and I did not see a function assigned to the gene in Phamerator. However, the gene appears to be a minor tail protein among other phages in the Pham, so I suspect it has a similar function in JasmineDragon. /note=Starterator: The start site at 12170 seems most reasonable because it is conserved across all six members of the Pham. It is Start Site 1, and the coordinates are (1, 12170). In the Pham, 6/6 call Start Site 1. Starterator is informative, as the three final annotations agree on the same start site, as do the three draft annotations. /note=Location call: Taken together, the gathered evidence suggests that this is a real gene.
The agreement on the start site, good coding potential, and synteny all support that conclusion, and the start site at 12097 seems the most likely. /note=Function call: The predicted function is minor tail protein because of multiple hits in PhagesDB BLAST with other phages in the same cluster, such as VroomVroom (e-value 1e-179, 94.77% identity) and Emotion (e-value 1e-166, 87.1% identity). In addition, HHpred shows protein sequence alignments that match the SEA-PHAGES requirements for this gene, such as a glycine-rich domain. While HHpred and CDD did not specifically identify it as a minor tail protein, they called this gene a large tegument protein and a receptor binding protein, which indirectly supports the function, as minor tail proteins may also have these activities. /note=Transmembrane domains: According to DeepTMHMM, this protein does not have any transmembrane domains and is not a transmembrane protein. /note= /note=Secondary Annotator Name: Kim, Abby /note=Secondary Annotator QC: I agree with this annotation. The gene’s start site at 12097 is called by both Glimmer and GeneMark, and based on the Host-Trained GeneMark, it has good coding potential in the first reading frame; however, there are noticeable gaps in the other two reading frames. Synteny is observed with VroomVroom and Emotion, whose genes belong to the same Pham 122021; however, in the PECAAN notes there is a minor error in the SD (Final) Score: the Z-score for the called start site is 3.314, not -2.011. The most called start number in the published annotations was 1, at position 12170, which is the same as the “most annotated” start, further confirming that the start site for this gene is 12097. /note=Note: 11/17/23: I agree with the function call of minor tail protein, which is supported by the PhagesDB function frequency, PhagesDB BLAST, NCBI BLAST, HHpred calls, and CDD call. Additionally, there is no evidence of TMDs, which is consistent with the functional call.
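The gene length quoted for gene 17 (978 base pairs) follows directly from its CDS coordinates, and a quick divisibility check confirms that the start and stop fall in the same reading frame. A minimal Python sketch, assuming 1-based inclusive forward-strand coordinates and that the last codon is the stop codon; both helper names are hypothetical:

```python
def cds_length(start: int, stop: int) -> int:
    """Length in bp of a forward CDS with 1-based inclusive coordinates."""
    return stop - start + 1


def protein_length(start: int, stop: int) -> int:
    """Number of amino acids encoded, assuming the last codon is the stop."""
    length = cds_length(start, stop)
    if length % 3 != 0:
        raise ValueError(f"{start}..{stop} is not a whole number of codons")
    return length // 3 - 1


print(cds_length(12097, 13074))      # 978 (gene 17)
print(protein_length(12097, 13074))  # 325
print(protein_length(8971, 11214))   # 747 (gene 15)
```

Raising an error for a length that is not a multiple of three flags a start/stop pair that cannot belong to one ORF, which is a useful sanity check before comparing candidate start sites.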
/note=I have QC’ed this annotation and agree with it. CDS 13074 - 14306 /gene="18" /product="gp18" /function="minor tail protein" /locus tag="JasmineDragon_18" /note=Original Glimmer call @bp 13074 has strength 15.15; Genemark calls start at 13074 /note=SSC: 13074-14306 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.236, -6.166667396375102, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage VroomVroom]],,WIC90168,98.2968,0.0 SIF-HHPRED: Tail protein, 43 kDa; tail protein, structural genomics, PSI, MCSG, Protein Structure Initiative, Midwest Center for Structural Genomics, UNKNOWN FUNCTION; 2.1A {Neisseria meningitidis MC58} SCOP: b.106.1.1,,,3D37_A,95.8537,99.6 SIF-Syn: This gene shows synteny with VroomVroom. The genes on either side of this one are minor tail proteins, which supports the function call of minor tail protein. /note=Primary Annotator Name: Hosford, Ryan /note=Auto-annotation: Glimmer and GeneMark both called the start site at 13074. This is an ATG and gives the LORF. /note=Coding Potential: Coding potential is highly likely: the pham maps and both the host-trained and self-trained models show coding potential in the forward direction, and the synteny shared with the two other known AZ4 phages adds to the validity. Also, the gene has a gap of -1 (a 1 bp overlap), which could indicate an operon, and this is the LORF, which minimizes gaps. /note=SD (Final) Score: The final score is -6.167, which is far from the best, but since the gap is -1, the gene seems to be within an operon, and therefore the final score is less relevant. /note=Gap/overlap: The gap is -1, which suggests that the start site lies within an operon. With this also being the LORF, it makes a strong case for this being the most reasonable start site.
/note=Phamerator: As of 10/30/23, this gene is in the AZ4 cluster pham 121434. This is a commonly annotated pham, with 101 other genes annotated to date. I used VroomVroom and Emotion when comparing the pham. The function was not explicitly called, but the notes for all the other genes in this pham indicated minor tail protein. /note=Starterator: There is a reasonable start site conserved across multiple genes in this pham, site 11, which corresponds to the auto-annotated site of 13074. The Most Annotated site, 14, was called by 38 of the 78 non-draft phages. /note=Location call: GeneMark and Glimmer both called the gene at the same start site of 13074, which has an ATG start codon. Together with the gap of -1, meaning it could be within an operon, and complete coding potential between the start and stop codons, I believe this is the correct start site. Starterator also shows that other genes in the pham have similar start sites, and 2 manual annotations indicate this site as the correct start. /note=Function call: The function of this gene is minor tail protein, based on BLASTp evidence showing many similar sequences in the same cluster (AZ) and in others. VroomVroom and Emotion appear to have the same sequence, with e-values of 0.0, and all evidence points to minor tail protein being the function. HHpred shows comparable genes with high probability and low e-values, supporting minor tail protein as the function. The BLASTp comparison with VroomVroom and Emotion, which both have similar protein sequences, also shows that the gene is conserved and has been annotated as a minor tail protein. Using this collection of data, along with the PhagesDB results showing many different clusters with a similar sequence, I felt confident calling the function minor tail protein.
/note=Transmembrane domains: No transmembrane domains present. /note=Secondary Annotator Name: Bhattarai, Aryan /note=Secondary Annotator QC: I agree with the annotation and location calls made by the primary annotator. All of the evidence categories have been considered. I also agree with the function call made by the primary annotator. Note: the annotator needs to fill out the synteny for their gene correctly, mentioning the functions of the upstream and downstream genes, as shown in the sample annotated notebook. Also, for the HHpred hits, the primary annotator needs to give the job a name. CDS 14321 - 17332 /gene="19" /product="gp19" /function="minor tail protein" /locus tag="JasmineDragon_19" /note=Original Glimmer call @bp 14321 has strength 18.0; Genemark calls start at 14321 /note=SSC: 14321-17332 CP: yes SCS: both ST: SS BLAST-Start: [minor tail protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: 14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.364, -3.9301544432541915, no F: minor tail protein SIF-BLAST: ,,[minor tail protein [Arthrobacter phage VroomVroom]],,WIC90169,96.1117,0.0 SIF-HHPRED: SIF-Syn: Minor tail protein; upstream gene is a minor tail protein and downstream gene has no function call, just like in phages Emotion and VroomVroom. /note=Primary Annotator Name: Indiresan, Neeti /note=Auto-annotation: Both Glimmer and GeneMark call start site 14321, start codon ATG. /note=Coding Potential: The gene shows coding potential in the forward direction according to both GeneMark Host and Self. The chosen start site covers this coding potential. /note=SD (Final) Score: -3.930. It is the second-best RBS Final Score on PECAAN, but the best is very close, at -3.703. /note=Gap/overlap: 14 bp, which is not too large, contains no coding potential, and is similar to the corresponding genes in other phages in the cluster. /note=Phamerator: pham: 122052. Date 11/01/2023. It is conserved; found in VroomVroom (AZ) and Emotion (AZ).
/note=Starterator: Start site 2 in Starterator was manually annotated in 2/3 non-draft genes in this pham. Start 2 is 14321 in JasmineDragon. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence shown above, this is likely a real gene and the most reasonable start site is 14321. /note=Function call: Minor tail protein. The top PhagesDB hits all have the function of minor tail protein (e-values of 0), and the top 2 NCBI hits are also minor tail proteins (e-values 0, coverage 100% and 56.7298%, and identity 91.326% and 34.5794%). CDD had no hits and HHpred had no informative hits. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Daniel, Mila /note=Secondary Annotator QC: I agree with this annotation. All of these evidence categories have been considered, and the primary annotator made sure to fill out the two drop-down menus. However, note that you only need to mark 2-3 pieces of evidence from each category. Furthermore, make sure to include information about PhagesDB BLAST e-values and the NCBI BLAST coverage, identity, and e-values. CDS 17408 - 17749 /gene="20" /product="gp20" /function="hypothetical protein" /locus tag="JasmineDragon_20" /note=Original Glimmer call @bp 17408 has strength 12.58; Genemark calls start at 17444 /note=SSC: 17408-17749 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_20 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 4.5474E-67 GAP: 75 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.808, -3.4897410997597818, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_20 [Arthrobacter phage VroomVroom]],,WIC90170,95.5752,4.5474E-67 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kalliomaa, Kira /note=Auto-annotation: Glimmer and GeneMark. Glimmer calls start at 17408. GeneMark calls start at 17444.
/note=Coding Potential: Coding potential in this open reading frame (ORF) is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -3.490. This is the best final score on PECAAN. /note=Gap/overlap: 75 bp. This is a somewhat large gap (larger than the 50 bp limit), but it is reasonable because other phages (VroomVroom and Emotion) conserve the gap. There is also no coding potential in the gap that could indicate a new gene. /note=Phamerator: pham: 2534. Date 11/04/2023. It is conserved; found in VroomVroom (AZ). /note=Starterator: Start site 12 in Starterator was manually annotated in 26/31 non-draft genes in this pham (2534). Start 12 is 17408 in JasmineDragon. This evidence agrees with the site predicted by Glimmer. /note=Location call: Based on the prior evidence, this is a real gene with a start site of 17408. /note=Function call: NKF /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Woodward, Lauren. /note=Secondary Annotator QC: I agree with the called start site, based on the evidence listed above. I agree with the function call.
CDS 17763 - 18488 /gene="21" /product="gp21" /function="endolysin" /locus tag="JasmineDragon_21" /note=Original Glimmer call @bp 17763 has strength 13.96; Genemark calls start at 17763 /note=SSC: 17763-18488 CP: yes SCS: both ST: SS BLAST-Start: [endolysin [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 3.89898E-110 GAP: 13 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.314, -2.2821867859826788, yes F: endolysin SIF-BLAST: ,,[endolysin [Arthrobacter phage VroomVroom]],,WIC90171,89.2562,3.89898E-110 SIF-HHPRED: Endolysin; amidase-2 domain, HYDROLASE; HET: ZN; 2.27A {Staphylococcus phage GH15},,,4OLS_B,76.3485,99.2 SIF-Syn: Endolysin, upstream gene is NKF in Pham 2534, downstream gene is membrane protein in Pham 124430, just like in VroomVroom, a non-draft phage in the same cluster (AZ). /note=PECAAN Notes /note=Primary Annotator Name: Potter, Sofia /note=Auto-annotation: Glimmer and GeneMark both call the start at 17763. /note=Coding Potential: In the ORF, coding potential is on the forward strand only, indicating this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -2.282, which is below the acceptable threshold of -2. /note=Gap/overlap: 13 bp gap, an expected length for a run of forward genes; it does not indicate an operon. /note=Phamerator: As of November 1, 2023, the pham number is 89745. This gene is conserved in VroomVroom and Emotion, the only two non-draft phages also in AZ4. The function call is endolysin. /note=Starterator: There are only two non-draft phages in the same cluster, both of which call start site number 2, corresponding to 17763 bp in JasmineDragon. /note=Location call: Based on the given evidence, this is a real gene with a likely start site of 17763. Starterator, Glimmer, and GeneMark all agree on this start site.
/note=Function call: Because the PhagesDB and NCBI BLASTp results align closely with each other, with very high matches to proteins of known function in Arthrobacter phages, I believe there is enough data to hypothesize that the function of this gene is endolysin. There are no explicit requirements on the Approved Functions List for calling a gene an endolysin. All significant hits I have reviewed from BLASTp, HHpred, and CDD have had functions consistent with lysins, and in VroomVroom and Emotion there is significant synteny between this gene and the genes marked as endolysins in their genomes. /note=Transmembrane domains: DeepTMHMM did not predict any transmembrane domains, so we can conclude that this is not a membrane protein. /note=Secondary Annotator Name: Claire, Monjov /note=Secondary Annotator QC: Both Glimmer and GeneMark call the start site at 17763. For the ORF in the Host-Trained GeneMark tab and GeneMarkS, there is very high coding potential throughout the gene, and the start site is right outside of that coding potential. This is a very good sign. The final score of -2.282 is relatively high compared to the other scores; it is the most positive value on PECAAN, and it has a z-score of 3.314, which is the highest one. Since the final score is the most positive and the z-score is relatively high, I would consider this a good indicator of the gene being real and starting at 17763. There is a 14 base pair gap between this gene and the one upstream (the primary annotator wrote 13, but it’s not a huge difference, maybe just a math error). The length of the gene is 726 base pairs, which is within an acceptable range as well. The pham is 89745 (11/1/2023) and there are 4 total members, 2 of which are drafts. Both of the non-draft phages are in the same cluster (AZ), and they both call start site 2, corresponding to a start of 17763 bp for JasmineDragon.
This gene was manually annotated 2 times and was called 100% of the time when present. From what I see, I think the gene start site is 2, at 17763. All of the evidence, such as the high coding potential, manual annotations, auto-annotation, and more, points to this being a real gene with a start site of 17763. For function, there is strong evidence that this gene codes for an endolysin. Endolysins are hydrolases that degrade the peptidoglycan layer of the host bacterium at the end of the lytic cycle. On PhagesDB, the function endolysin is called by both Emotion and VroomVroom, 2 phages that we’ve seen consistently and that are closely related to JasmineDragon. NCBI BLAST gave very similar results, agreeing that this gene is an endolysin gene. Phage VroomVroom even had 100 percent coverage, making this a great indicator of the function. In the Google Sheet of approved functions, endolysin is an approved function with no prerequisites that need to be checked. Even though this gene acts in lysis from inside the host cell, DeepTMHMM did not predict any transmembrane domains, so the protein does not cross the membrane. This makes sense because it only needs to break the peptidoglycan layer from the inside. I agree with the primary annotator’s work and that this is the suggested start. I would also suggest that they select “yes” for All GM Coding Capacity, which is currently unselected, because of the coding potential map and how the gene was more than 80% covered by coding potential in the forward direction.
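The gap and gene-length values in these notes follow from simple coordinate arithmetic. The sketch below is a minimal illustration, assuming 1-based inclusive coordinates and the convention gap = start - upstream stop - 1, which reproduces the PECAAN GAP fields here; the 50 bp flag reflects the limit mentioned in the gp20 gap note, and the helper names are hypothetical, not part of PECAAN.

```python
def gene_length(start: int, stop: int) -> int:
    """Length in bp of a feature with 1-based, inclusive coordinates."""
    return abs(stop - start) + 1


def forward_gap(upstream_stop: int, start: int) -> int:
    """Bases between the upstream gene's last base and the next forward
    gene's first base. Negative values indicate overlapping genes."""
    return start - upstream_stop - 1


# (start, stop) coordinates taken from the CDS lines in these notes.
genes = {
    "gp19": (14321, 17332),
    "gp20": (17408, 17749),
    "gp21": (17763, 18488),
    "gp22": (18506, 18772),
}

order = ["gp19", "gp20", "gp21", "gp22"]
for up, down in zip(order, order[1:]):
    gap = forward_gap(genes[up][1], genes[down][0])
    flag = " (over 50 bp: check whether the gap is conserved)" if gap > 50 else ""
    print(f"{up} -> {down}: gap {gap} bp{flag}")

print("gp21 length:", gene_length(*genes["gp21"]), "bp")
```

With these coordinates the sketch prints gaps of 75, 13, and 17 bp and a gp21 length of 726 bp, in line with the GAP fields and the gene length quoted above. The same helpers give a 2 bp gap and a 282 bp length for the alternative gp22 start at 18491.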
CDS 18506 - 18772 /gene="22" /product="gp22" /function="membrane protein" /locus tag="JasmineDragon_22" /note=Original Glimmer call @bp 18506 has strength 10.81; Genemark calls start at 18491 /note=SSC: 18506-18772 CP: yes SCS: both-gl ST: SS BLAST-Start: [membrane protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s6 100.0% 1.13455E-49 GAP: 17 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.134, -4.687276900021401, no F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage VroomVroom]],,WIC90172,89.3617,1.13455E-49 SIF-HHPRED: SIF-Syn: Synteny with VroomVroom and Emotion exists at this gene. Based on information from NCBI, VroomVroom has an identity of 90.91%, 100% coverage, and an e-value of 1e-49. Emotion has an identity of 82.95%, 100% coverage, and an e-value of 2e-43. These are all significant and provide evidence that the gene in JasmineDragon is also a membrane protein. This synteny also exists downstream of this gene (gene 24), where both VroomVroom and Emotion also call a membrane protein. /note=Primary Annotator Name: Salinas Pinon, Juan Carlos /note=Auto-annotation start source: Glimmer and GeneMark disagree on this call, at start sites 18506 and 18491 respectively; however, this is not the LORF. The Glimmer call has the higher final score of the two at -4.687, while the GeneMark call has a final score of -5.576. The start codon is ATG, which is preferred due to its commonality, but GTG is also a favorable option. The Glimmer call gives a gene 267 bp long; however, there are longer options listed on PECAAN (GeneMark). The GeneMark call gives a gene that is 282 bp long. /note=Coding Potential: This has high coding potential in the forward direction only. There is no coding potential in the complementary direction, suggesting that there are no overlapping reverse genes. The coding potential covers both of the auto-annotated calls. /note=SD (Final) Score: -4.687. It is not the best score on PECAAN.
The best final score is -3.486; however, this start site is not conserved in any other phage, which counts against it. /note=Gap/overlap: 17. This gap is fairly small. No other start site would close this gap, as there is no coding potential in this area that would indicate a new gene. The other call has a gap of 2, which is an odd and unfavorable gap. /note=Phamerator: pham: 121560 as of 10/28/2023. It is conserved in the AZ4 cluster, including phages such as Emotion and Aegle. There are 54 other members; 37 of these are non-drafts. /note=Starterator: The most called start site is site 12 as of 11/03/2023. Given the evidence and its conservation in 2 other non-draft genes, site 21 is a better option, as it has been manually annotated twice as of 11/03/2023. Starterator agrees with the Glimmer start call at 18506. /note=Location call: Glimmer @ 18506. This start site is the most likely call based on the evidence above. /note=Function call: Membrane protein. The hits from NCBI have low e-values (1e-49, 1e-43) and high identity percentages (87-90%), suggesting that the function may be membrane protein. However, PhagesDB does not agree with this; its best hits, which have very high e-values, suggest that the protein is a tail protein. Suggestions from PhagesDB BLAST will not be considered. The strongest evidence for a function comes from NCBI, and I would call this a membrane protein. Additionally, TMHMM calls 1 transmembrane domain, providing additional evidence that this is a membrane protein. /note=Transmembrane domains: There is one transmembrane domain, as suggested by TMHMM. The length is , which is within the favorable range. This furthers the evidence that this is a membrane protein. /note=Secondary Annotator Name: Kang, Alix Kang /note=Secondary Annotator QC: Please select the Starterator report (I would select NI) and coding potential (I would select YES).
I would choose the Glimmer start site 18506 over 18491 because start site 18491 has an odd gap of 2. Therefore, the gap would be 17, and the SD score would be -4.687. Also, I do not think the most manually annotated start site is 14. The start number called most often in the published annotations is 12; it was called in 10 of the 37 non-draft genes in the pham. Start site 18 in Starterator has no manual annotations in non-draft members. 2 other draft members call start site 18, which corresponds to a start site of 18506 bp for JasmineDragon. /note=Please review the “Starterator.” I noted the following: Start site 21 in Starterator was manually annotated in 2 non-draft genes in this pham. Start 21 is 18506 for JasmineDragon. This evidence agrees with the site predicted by Glimmer. Please review “Location call.” Considering all of the evidence above, this gene is a real gene and has a start site at 18506 bp. Starterator agrees with Glimmer (not GeneMark, as the first annotator has noted). Additionally, the secondary annotator may want to follow the given format for "Synteny," "Membrane protein, upstream gene is endolysin, downstream is membrane protein, just like in phages Emotion and VroomVroom". DeepTMHMM predicts 1 TMD. I agree with the first annotator that this gene can be assumed to have a real TMD and is therefore a “membrane protein.” CDS 18773 - 19057 /gene="23" /product="gp23" /function="membrane protein" /locus tag="JasmineDragon_23" /note=Original Glimmer call @bp 18773 has strength 14.38; Genemark calls start at 18773 /note=SSC: 18773-19057 CP: yes SCS: both ST: SS BLAST-Start: [membrane protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 2.85016E-52 GAP: 0 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.957, -4.7784408226142965, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage VroomVroom]],,WIC90173,97.8723,2.85016E-52 SIF-HHPRED: SIF-Syn: Phages such as VroomVroom and Emotion contain this gene in the same neighborhood.
NCBI BLAST shows VroomVroom has 94.6809% identity with 100% coverage and e-value 2.85016e-52, which makes it significant. The pham number of the similar gene is also 122805, and the upstream gene in VroomVroom and JasmineDragon’s upstream gene 22 both have pham number 121560. This upstream JasmineDragon gene (22) is specified by NCBI BLAST to be a membrane protein, which is also the case for the related VroomVroom and Emotion genes. NCBI BLAST shows Emotion has 82.9545% identity with 100% coverage and e-value 3.2932e-38, which also makes it significant. The pham number of the similar gene is not the same, but 121560 is the pham number for both JasmineDragon’s upstream gene 22 and the upstream gene in Emotion. NCBI BLAST also shows that downstream gene 25 has NKF in VroomVroom. These are two phages that have been consistently used as evidence. /note=Primary Annotator Name: Zaragoza, Evelin /note=Auto-annotation: Both Glimmer and GeneMark agree that the start site is at 18773. /note=Coding Potential: There is no coding potential overlap with surrounding genes; the suggested start site has all of the coding potential, and further evidence, such as synteny with VroomVroom, suggests that this is the start site. There is no coding potential on the complementary strand (in the reverse direction). /note=SD (Final) Score: -4.778 is the final score and is the highest compared to the other values; therefore, it is reasonable to assume this start site. /note=Gap/overlap: There is no gap upstream. The gap downstream (12 bp) is too small to be of any concern. This is the longest reasonable ORF because there is no coding potential to be filled. Synteny with VroomVroom also suggests that there needs to be another gene added. This gene appears to be part of an operon because the preferred start site has a favorable RBS score and a gap of 0 bp. /note=Phamerator: The pham number is 121564 as of 10/28/2023.
It is conserved in phages such as VroomVroom and Emotion; similar genes include Ascela_23, Asa16_24, and Amyev_23, which are members of the AZ cluster and of the pham but are not part of the same subcluster; all are non-draft genes. /note=Starterator: Start site 7 in Starterator was manually annotated in 26/36 non-draft genes in this pham. Start 7 is not present in JasmineDragon. This evidence suggests that the site predicted by Glimmer and GeneMark is still the best option. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 18773. The gene does not need to be deleted because the coding potential shows that there needs to be a gene there, and the start site of this gene includes all of the coding potential. /note=Function call: Membrane protein. The top PhagesDB BLAST hits have no known function (E-value <10^-3), but NCBI BLAST gives several significant hits with genes that are membrane proteins (e-value less than 10^-3, coverage over 70%). HHpred had no significant hits (e-values were whole numbers) and CDD had no hits at all. Two TMDs have been identified, one 19 aa long and the other 21 aa long, which further supports this hypothesis. /note=Transmembrane domains: DeepTMHMM predicts two TMDs. One is 19 aa long and the other 21 aa long. Based on this evidence, this gene can be assumed to have a real TMD and is, therefore, a “membrane protein”, which is in line with our previous call. /note=Secondary Annotator Name: Salinas Pinon, Juan Carlos /note=Secondary Annotator QC: /note=Auto-annotation: Glimmer and GeneMark agree and both call the start at 18773. /note=Coding Potential: There is coding potential in the area of interest, as well as both upstream and downstream of this start site. The primary annotator states that there is overlapping coding potential with other genes, but that is not observed. Coding potential only exists in the forward direction, suggesting a forward gene.
There is no coding potential on the complementary strand. /note=SD (Final) Score: -4.778. This is the highest final score on PECAAN. This is supporting evidence for the called start site. /note=Gap/overlap: The gap upstream of the gene is 0. The primary annotator states there is a gap downstream of the gene of 384 bp; however, the gene downstream only has a gap of 11 bp. Regardless, this does not change the strong evidence supporting the call made by the annotator. /note=Phamerator: pham: 121564. This pham is highly conserved within the AZ cluster (54 members), in phages such as DrManhattan, Emotion, and BaileyBlu. /note=Starterator: Although this start site is not the most annotated, it has been manually annotated twice. The most annotated start site is not called in this gene. /note=Location call: I agree with the primary annotator and would call this gene at start site 18773. /note=Function call: Membrane Protein. I agree with the primary annotator, as NCBI BLAST calls this function a membrane protein in two phages: Emotion and VroomVroom. Both of these phages show synteny with JasmineDragon and have low e-values (2.85016e-52 VroomVroom, 3.2932e-38 Emotion) and identity percentages higher than 80%. /note=Transmembrane domains: 2. After running DeepTMHMM, 2 transmembrane domains were called, with lengths of 19 and 22. Hence, I agree with the primary annotator and call this a membrane protein. CDS complement (19109 - 19339) /gene="24" /product="gp24" /function="membrane protein" /locus tag="JasmineDragon_24" /note=Original Glimmer call @bp 19339 has strength 4.19; Genemark calls start at 19339 /note=SSC: 19339-19109 CP: no SCS: both ST: NI BLAST-Start: GAP: 101 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.948, -5.367638368103365, no F: membrane protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=AF: Tricky call.
We examined this area in VroomVroom and determined that VroomVroom should have a reverse gene in this location, and I think the same is true here. The HHpred hits are better and match a membrane protein, and there is one TMD predicted. Good coding potential. The forward gene has less supporting evidence. /note=Primary Annotator Name: Kibria, Kamille /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 19339. /note=Coding Potential: Both host-trained and self-trained GeneMark show reasonable coding potential in the ORF. The self-trained GeneMark does not include all of the coding potential within the chosen start site, but the host-trained GeneMark chosen start site does. /note=SD (Final) Score: -5.368. This is not the best score on PECAAN. However, the start site with the best final score of -3.306 has a TTG start codon, which is used very rarely in bacteriophages. /note=Gap/overlap: 101. This is larger than expected; however, the PhamMap shows that my gene of interest is in a cluster of genes transcribed in the forward direction, so a sizable gap (>50 bp) is required for transcription promoters.
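For a reverse (complement) feature such as gp24, the start codon lies at the higher coordinate, which is why its SSC reads 19339-19109. A small sketch of a strand-aware parser, assuming location strings written as in these notes (`complement (19109 - 19339)`) or in GenBank style (`complement(19109..19339)`); `parse_location` is a hypothetical helper, and the amino-acid count assumes the CDS includes its stop codon.

```python
import re

# Accepts "19441 - 19806" and "complement (19109 - 19339)" as written in
# these notes, plus GenBank-style "complement(19109..19339)".
LOCATION = re.compile(r"^(complement\s*\()?\s*(\d+)\s*(?:-|\.\.)\s*(\d+)\s*\)?$")


def parse_location(text: str) -> dict:
    match = LOCATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    low, high = int(match.group(2)), int(match.group(3))
    reverse = match.group(1) is not None
    length = high - low + 1
    return {
        "strand": "-" if reverse else "+",
        # On the reverse strand the start codon sits at the high coordinate.
        "start": high if reverse else low,
        "stop": low if reverse else high,
        "length_bp": length,
        # Number of codons minus the stop codon (assumes length % 3 == 0).
        "protein_aa": length // 3 - 1,
    }


print(parse_location("complement (19109 - 19339)"))
print(parse_location("19441 - 19806"))
```

For gp24 this gives strand "-", start 19339, stop 19109, and a 231 bp CDS (76 aa); for gp25 it gives a 366 bp forward CDS (121 aa).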
/note=Phamerator: /note=Starterator: /note=Location call: /note=Function call: /note=Transmembrane domains: /note=Secondary Annotator Name: Carnes, Julianne /note=Secondary Annotator QC: CDS 19441 - 19806 /gene="25" /product="gp25" /function="hypothetical protein" /locus tag="JasmineDragon_25" /note=Original Glimmer call @bp 19441 has strength 18.17; Genemark calls start at 19441 /note=SSC: 19441-19806 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein PQD70_gp080 [Brevibacterium phage Cantare] ],,NCBI, q2:s12 99.1736% 8.53441E-40 GAP: 101 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.063, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein PQD70_gp080 [Brevibacterium phage Cantare] ],,YP_010676655,71.3178,8.53441E-40 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: TRAN, MICHELLE /note=Auto-annotation start source: Both Glimmer and GeneMark designate the start site for this gene as 19441 (start codon: ATG). /note=Coding Potential: The strongest coding potential in the ORF is on the forward strand, which supports this gene being a forward gene. This is consistent across both GeneMark Self and Host. The chosen start site also includes all of the coding potential. /note=SD (Final) Score: The final score proposed for the mentioned start site on PECAAN is -2.443, which is the best value available. /note=Gap/overlap: As of November 15, 2023, there is a 39-bp gap between this gene and the gene upstream. (The gap used to be 101 bp, then was expanded to 383 bp when the gene upstream was deleted. The initial upstream gene has since been replaced with a forward counterpart, therefore reducing the gap to 39 bp.) This gap is reasonable because it is much too small for another gene to be fitted into it. /note=Phamerator: As of November 15, 2023, this gene is part of pham 6677. This is not conserved in any non-draft phages within JasmineDragon’s subcluster (AZ4), but it is conserved in MiniMommy.
/note=Starterator: The Starterator results appear to be uninformative as of October 30, 2023, because the only phage within the same subcluster and pham as JasmineDragon is a draft phage. Starterator reports the following gene start sites for JasmineDragon in its pham: (9, 19441), (21, 19624), (27, 19702). /note=Location call: Based on the above evidence, this gene is likely to be real, with a start site of 19441. /note=Function call: The product for this gene likely has no known function. The only strong hits on PhagesDB BLASTp and NCBI BLASTp are attributed to proteins with no known function. On those programs, proteins with a known function have too high of an e-value and too low of a score to be relevant to the protein. CDD and HHpred are similarly uninformative; the CDD page for this protein leads to a blank article, while the HHpred job only yields results with e-values >1. /note=Transmembrane domains: This is not a membrane protein because DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: KIBRIA, KAMILLE /note=Secondary Annotator QC: Gap for start site at 19441 is 39 as of 11/1/23. This is a much more reasonable gap than previously, providing more evidence that the gene is real and that the start site is correct. Other than that I agree with the annotation. I agree function is NKF. I have nothing to add to primary analysis as of 11/15/23. 
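The gp25 gap history can be reproduced by measuring the gap to whichever upstream feature ends closest to the start, regardless of strand. This is a sketch under the assumption that only the gp23 and gp24 coordinates given in these notes are used; the replacement forward gene is left out because its coordinates are not listed here, and `gap_to_nearest_upstream` is a hypothetical helper.

```python
def gap_to_nearest_upstream(start: int, features: list[tuple[int, int]]) -> int:
    """Gap between `start` and the closest feature that ends before it.

    Features are (low, high) coordinate pairs. Strand does not matter for
    the gap, only the highest coordinate each feature occupies."""
    ends = [high for _low, high in features if high < start]
    if not ends:
        raise ValueError("no upstream feature")
    return start - max(ends) - 1


gp23 = (18773, 19057)  # forward gene
gp24 = (19109, 19339)  # complement gene, later deleted
gp25_start = 19441

print(gap_to_nearest_upstream(gp25_start, [gp23, gp24]))  # gp24 present
print(gap_to_nearest_upstream(gp25_start, [gp23]))        # gp24 deleted
```

With gp24 present the gap is 101 bp, matching the GAP field in the gp25 summary; with gp24 removed it grows to 383 bp, the value noted when the upstream gene was deleted.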
CDS 19923 - 20534 /gene="26" /product="gp26" /function="deoxynucleoside monophosphate kinase" /locus tag="JasmineDragon_26" /note=Original Glimmer call @bp 19923 has strength 17.04; Genemark calls start at 19923 /note=SSC: 19923-20534 CP: no SCS: both ST: SS BLAST-Start: [deoxynucleoside monophosphate kinase [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 5.05977E-128 GAP: 116 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.063, -2.523003374675015, yes F: deoxynucleoside monophosphate kinase SIF-BLAST: ,,[deoxynucleoside monophosphate kinase [Arthrobacter phage VroomVroom]],,WIC90176,93.6274,5.05977E-128 SIF-HHPRED: DEOXYNUCLEOSIDE MONOPHOSPHATE KINASE; TRANSFERASE, PHOSPHOTRANSFERASE; HET: OCS, DGP; 2.0A {Enterobacteria phage T4} SCOP: c.37.1.1,,,1DEK_A,91.133,99.8 SIF-Syn: Deoxynucleoside Monophosphate Kinase; upstream gene has an unknown function, downstream gene has an unknown function, just like in phages Emotion and VroomVroom. /note=Primary Annotator Name: Kim, Abby /note=Auto-annotation: Glimmer and GeneMark. Both call the start at 19923. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -2.523. It is the best final score on PECAAN. /note=Gap/overlap: 116 bp, which is somewhat large. However, the gap is conserved in other phages such as VroomVroom and Emotion. There is also no coding potential in the gap that might indicate a new gene. /note=Phamerator: pham: 97531. Date 10/28/23. It is conserved and found in VroomVroom (AZ) and Emotion (AZ). /note=Starterator: Start site 45 in Starterator was manually annotated in 39 non-draft genes in this pham. Start 45 is 19923 in VroomVroom, and this evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence, this is a real gene and the most likely start site is 19923.
/note=Function call: Deoxynucleoside monophosphate kinase. BLAST results supported by CDD (PHA02575, E-value 7.32e-18) and HHpred (PDB 1DEK_B, Probability 99.81%) suggest that the ORF encodes a kinase involved in nucleotide metabolism. This aligns with its function of catalyzing the phosphorylation of deoxynucleoside monophosphates, which are essential in phage DNA replication. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Labib, Youstina /note=Secondary Annotator QC: /note=Auto-Annotation: Both Glimmer and GeneMark call the start site at 19923, indicating this is likely the start site since it appears in both. /note=Coding Potential: There is good coding potential, as indicated by the position signal on the graph in the first ORF of the forward gene. This indicates that this gene is a forward gene and supports that it is real. /note=SD (Final) Score: The SD final score is -2.523. The goal is to have the least negative number, which indicates a strong SD score, and this score is the best among the options listed. The z-score also supports this start site: it is 3.063, which is above the ideal threshold of 2, indicating a strong z-score. /note=Gap/Overlap: This gene has a gap of 116 base pairs. This is a large gap, but there is no coding potential in this region according to the coding potential graphs, so it should not be a concern, and no gene needs to be inserted in this region. /note=Phamerator: 11/3/23 This gene belongs to pham 97531, which has 222 members. There are a variety of phage clusters and subclusters within this gene set, including AZ, to which this gene belongs. Other AZ cluster matches include the phages Adolin and AEgle. /note=Starterator: 11/3/23 The start number called most often in the published annotations is 51.
It was called in 51 of 190 non-draft genes. This suggests that the start site is 19923. /note=Location Call: Based on the evidence, the suggested start site appears to be 19923 bp. /note=Function call: The original function call appears correct. I agree with the function call of deoxynucleoside monophosphate kinase, as supported by the PhagesDB function frequency, PhagesDB BLAST, NCBI BLAST, HHpred, and CDD. /note=Transmembrane domains: DeepTMHMM does not provide evidence of TMDs, so we can confirm that this is not a membrane protein. /note=Note: Select a choice from the Starterator drop-down menu, and select all of the GM coding capacity options. /note=Note: 11/16/23 The location and function calls are both correct, and the absence of TMDs is consistent with the function call. CDS 20637 - 21230 /gene="27" /product="gp27" /function="hypothetical protein" /locus tag="JasmineDragon_27" /note=Original Glimmer call @bp 20637 has strength 14.13; Genemark calls start at 20637 /note=SSC: 20637-21230 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_27 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 8.51775E-125 GAP: 102 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 0.897, -6.857740919865087, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_27 [Arthrobacter phage VroomVroom]],,WIC90177,94.9239,8.51775E-125 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: To, Nathan /note=Auto-annotation start source: Glimmer and GeneMark both agree on start site 20637, with start codon ATG. /note=Coding Potential: Both GeneMark Self and Host show coding potential in this ORF on the forward strand only, indicating that this is a forward gene. /note=SD (Final) Score: -6.858. This is not the best RBS final score, but it is reasonable: most of the start sites have final scores around -5.8 to -7, so the options with better scores are only marginally better and would create extremely large, improbable gaps.
/note=Gap/overlap: 102 bp. This is somewhat large but still reasonable, and it is conserved among non-draft genomes in the cluster, such as phage Emotion. This start gives the LORF, with a gene length of 594 bp, which is acceptable. /note=Phamerator: This gene is in pham 1819 as of 10/29/23. The pham includes other members of cluster AZ, including Emotion and VroomVroom. There are no function calls for this gene. /note=Starterator: The called start site is conserved among other phages in the pham. The called start is start 21 @20637. 4 of 71 genomes call site #21. When site #98 is present, it is called 75% of the time. /note=Location call: The gathered evidence suggests that this is a real gene, with start site @20637. This gene is conserved, has good coding potential, and does not have large gaps before or after it. The start site 20637 seems most likely because it is conserved among other genomes in the same cluster and is very likely to be called when present. /note=Function call: NKF, based on lack of evidence: there are no function calls from BLAST or CDD, and the only HHpred hits have very low probability (48.7%), low coverage (24.8731%), and a very high E-value (280). /note=Transmembrane domains: 0. Whether this is consistent with the function cannot be determined because the gene is NKF. /note=Secondary Annotator Name: Kim, Abby /note=Secondary Annotator QC: I agree with this annotation. The gene’s start site at 20637 is called by both Glimmer and GeneMark, and the Host-Trained GeneMark shows good coding potential, with the start and stop sites lying outside the region of coding potential. Synteny is observed with VroomVroom and Emotion, whose genes belong to the same pham, 1819, and although the 102 bp gap is quite large, it is conserved in these two non-draft genomes. Even though this gene does not use the “Most Annotated” start site, 20637 is the only start site that has been annotated in subcluster AZ4, further confirming that the start site for this gene is 20637.
/note=Note: 11/17/23 I agree with the primary annotator’s call that this gene has NKF; no edits need to be made. /note=I have QC’ed this annotation and agree with it. CDS 21466 - 22305 /gene="28" /product="gp28" /function="exonuclease" /locus tag="JasmineDragon_28" /note=Original Glimmer call @bp 21397 has strength 12.22; Genemark calls start at 21397 /note=SSC: 21466-22305 CP: no SCS: both-cs ST: SS BLAST-Start: [exonuclease [Arthrobacter phage Amyev] ],,NCBI, q1:s1 98.2079% 1.86614E-171 GAP: 235 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.971, -4.669046746593814, yes F: exonuclease SIF-BLAST: ,,[exonuclease [Arthrobacter phage Amyev] ],,YP_010677730,92.7273,1.86614E-171 SIF-HHPRED: Mitochondrial genome maintenance exonuclease 1; human MGME1, DNA complex, DNA exonuclease, DNA BINDING PROTEIN; 2.702A {Homo sapiens},,,5ZYT_D,82.7957,99.8 SIF-Syn: Cas4 exonuclease. Both VroomVroom and Emotion are syntenic and call the function Cas4 exonuclease. /note=Primary Annotator Name: Valente, Nina /note=Auto-annotation: Glimmer and GeneMark both call the start at 21397. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. /note=SD (Final) Score: -6.898 (for the auto-annotated start at 21397). It is not the best final score on PECAAN. /note=Gap/overlap: There is a 166 bp gap upstream of 21397. There is no coding potential in the gap; there is a dip in the coding potential, but it does not coincide with the gap. /note=Phamerator: pham: 106209. Date: 10/23/2023. It is conserved, found in Emotion and VroomVroom. /note=Starterator: The start site called most often is 38, called in 53/125 non-draft genes. JasmineDragon has the most annotated start but does not call it; it calls start site 22. The suggested start is the same, 21397. /note=Location call: Based on the above evidence, this is most likely a real gene with a start site at 21466. 21466 has a better Z score of 1.971, which is closer to 2. It also has a better final score of -4.669.
21466 also corresponds to the most annotated Starterator site, 39. It is called 97.3% of the time when present and is found in 48.7% of the genes in this pham. JasmineDragon is one of only two genomes that do not call the site even though it is present. Since SEA-PHAGES ranks Starterator as more substantial evidence than Glimmer and GeneMark, and both the final score and the Z score would improve, I think the start site should be revised to 21466. /note=Function call: The top two PhagesDB hits have Cas4 exonuclease as their listed function. The top two hits are VroomVroom (e-value E-160, 98% identity) and Cassia (e-value E-137, 92% identity). The first NCBI hit was VroomVroom (e-value 0, 99% identity), which is also listed as a Cas4 exonuclease. The second NCBI hit was Amyev (e-value 2E-171, 83% identity), which is listed as an exonuclease. There were no results for CDD. The top three HHpred hits are mitochondrial genome maintenance exonuclease (99.85% probability, e-value 3.1E-19), mitochondrial genome exonuclease (99.74% probability, e-value 7.3E-16), and exonuclease (98.77% probability, e-value 5.1E-6). Based on this evidence, the most likely function is an exonuclease. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Indiresan, Neeti /note=Secondary Annotator QC: I agree with the chosen start site. For the function call, I agree with all the evidence, but I believe the SEA-PHAGES approved function name that should be used in the function box is Cas4 exonuclease, not just exonuclease. The evidence you have and your PECAAN notes support this. In addition, make sure to check off the evidence boxes for the PhagesDB/NCBI BLASTs and HHpred.
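The proposed revision trades a 166 bp upstream gap for a 235 bp gap in exchange for a better final score. A minimal Python sketch of that coordinate arithmetic, assuming forward-strand genes with 1-based inclusive coordinates as in the CDS lines; the final scores (-6.898 for 21397, -4.669 for 21466) come from the notes above, while the `Candidate` class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    start: int          # 1-based start coordinate on the forward strand
    final_score: float  # PECAAN SD (Final) score; less negative is better


def upstream_gap(prev_stop: int, start: int) -> int:
    """Bases between the upstream gene's stop and this start; negative means overlap."""
    return start - prev_stop - 1


def gene_length(start: int, stop: int) -> int:
    """Length in bp of a forward CDS with inclusive coordinates."""
    return stop - start + 1


PREV_STOP = 21230  # stop of gene 27
STOP = 22305       # stop of gene 28
candidates = [Candidate(21397, -6.898), Candidate(21466, -4.669)]

for c in candidates:
    length = gene_length(c.start, STOP)
    # start, upstream gap, length, whether the length is a whole number of codons, score
    print(c.start, upstream_gap(PREV_STOP, c.start), length, length % 3 == 0, c.final_score)

# Pick the candidate with the least negative final score.
best = max(candidates, key=lambda c: c.final_score)
print("best final score:", best.start)
```

The final score alone favors 21466; in the annotation itself, Starterator conservation carries the decision, and the larger gap is the accepted trade-off.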
CDS 22315 - 22620 /gene="29" /product="gp29" /function="hypothetical protein" /locus tag="JasmineDragon_29" /note=Original Glimmer call @bp 22315 has strength 15.65; Genemark calls start at 22315 /note=SSC: 22315-22620 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_29 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 1.28543E-59 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.264, -4.150999100097377, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_29 [Arthrobacter phage VroomVroom]],,WIC90179,95.0495,1.28543E-59 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Laureano, Ryan /note=Auto-annotation: Glimmer and GeneMark both identify the start site at 22315, with start codon ATG. /note=Coding Potential: The coding potential is covered by the putative ORF in both GeneMark programs that were run. /note=SD (Final) Score: The final score, -4.151, is the best one available, as it is the highest score. /note=Gap/overlap: The upstream gap is 9 bp. This does not suggest an operon, but the gap is small, and this start gives the longest ORF. The gene length of 306 bp is acceptable. /note=Phamerator: As of 11/1/23, the gene is in pham 121360, which it shares with VroomVroom_29; both are in cluster AZ. There is no function called for this gene. /note=Starterator: There is a reasonably conserved start site, number 39, but it does not correspond to a base-pair position in JasmineDragon. 54/146 members of the pham call start 39, the most conserved site. /note=Location call: The gathered evidence suggests that the gene is real, and this start site seems the most likely because it covers the most coding potential. The gene is of acceptable length and has a high final score. The start site is at 22315. /note=Function call: Neither program run, NCBI BLAST nor PhagesDB BLAST, provided a known function for this gene; the hits came back as "hypothetical protein" or "unknown function".
The best-matching HHpred hit had a high probability (96.9%) and high coverage (61%), but it was described as an uncharacterized protein. A possible function is electron transport, as part of a respiratory supercomplex, but this is not a SEA-PHAGES approved function. /note=Transmembrane domains: There are no predicted TMDs. /note=Secondary Annotator Name: Kibria, Kamille /note=Secondary Annotator QC: I agree with the annotation overall, and I think the start site should remain at 22315 based on the evidence. However, here is what I found in my investigation. There is good coding potential in both Host-Trained and Self-Trained GeneMark. In Host-Trained GeneMark, the proposed start site does not contain all of the coding potential, although it does contain all of the atypical coding potential. In Self-Trained GeneMark, the proposed start site contains all of the coding potential. The atypical coding potential in Host-Trained GeneMark looks nearly identical to the Self-Trained GeneMark coding potential. This is not a major cause for concern but could be looked at. Also, although this gene in JasmineDragon does not use the most annotated start site (39), start 22 was manually annotated in VroomVroom, which is in the same subcluster as JasmineDragon (AZ4). This provides further evidence that the location call is correct. /note=Notes on 11/15/23: You checked JasmineDragon_Draft as evidence in PhagesDB BLAST, but I don’t think we should use any drafts. Make sure to include the HHpred screenshot in the PECAAN Notebook. The HHpred hit could be informative (its e-value is low, but not very low), but it might need extra QC. Otherwise, I agree with the NKF function call, although the gene may have something to do with the electron transport chain (ETC).
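Several function calls in this stretch, including the HHpred discussion above, weigh hits by probability and e-value. A sketch of that screen, assuming the cutoffs these notes cite elsewhere (probability above 80% and e-value at or below 1e-3); the class, function, and cutoff constants are illustrative and are not a PECAAN or HHpred feature. The example hits are the ones reported for genes 27 and 28, since gene 29's hit is given without an e-value.

```python
from dataclasses import dataclass

# Assumed cutoffs, echoing the "<80%" and ">e-3" remarks in these notes.
MIN_PROBABILITY = 80.0  # percent
MAX_EVALUE = 1e-3


@dataclass
class HHpredHit:
    label: str
    probability: float  # HHpred probability, in percent
    evalue: float


def is_informative(hit: HHpredHit) -> bool:
    """True when a hit clears both assumed cutoffs."""
    return hit.probability > MIN_PROBABILITY and hit.evalue <= MAX_EVALUE


hits = [
    HHpredHit("gene 28: mitochondrial genome maintenance exonuclease", 99.85, 3.1e-19),
    HHpredHit("gene 28: mitochondrial genome exonuclease", 99.74, 7.3e-16),
    HHpredHit("gene 28: exonuclease", 98.77, 5.1e-6),
    HHpredHit("gene 27: top hit", 48.7, 280.0),
]

for h in hits:
    print(h.label, is_informative(h))
```

Under these assumed cutoffs the three gene 28 hits pass and the gene 27 hit fails, in line with the exonuclease and NKF calls; passing the screen is not enough on its own, because the hit description must still map to an approved SEA-PHAGES function.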
CDS 22617 - 23000 /gene="30" /product="gp30" /function="HNH endonuclease" /locus tag="JasmineDragon_30" /note=Original Glimmer call @bp 22617 has strength 7.38; Genemark calls start at 22791 /note=SSC: 22617-23000 CP: yes SCS: both-gl ST: SS BLAST-Start: [HNH endonuclease [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 4.42973E-66 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.269, -4.413617268040186, no F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage VroomVroom]],,WIC90180,85.8268,4.42973E-66 SIF-HHPRED: d.4.1.3 (M:1-105) Intron-encoded homing endonuclease I-HmuI {Bacteriophage SPO1 [TaxId: 10685]} | CLASS: Alpha and beta proteins (a+b), FOLD: His-Me finger endonucleases, SUPFAM: His-Me finger endonucleases, FAM: Intron-encoded homing endonucleases,,,SCOP_d1u3em1,89.7638,99.9 SIF-Syn: The function is HNH endonuclease, and both the upstream and downstream genes are NKF. This pattern is also observed in VroomVroom. /note=Primary Annotator Name: Ryan, Kaitlin /note=Auto-annotation: Glimmer and GeneMark call different start sites, 22617 and 22791, respectively. /note=Coding Potential: There is high coding potential in the forward direction, as indicated by the coding potential line and the thicker black line marking a protein-coding region. The coding potential appears to begin at 22617, indicating that this is the likely start site of the gene, as predicted by Glimmer. /note=SD (Final) Score: -4.414. This is the second least negative final score, which, along with the operon overlap, provides strong evidence for the call. /note=Gap/overlap: -4 bp overlap, suggesting that this gene is part of an operon. /note=Phamerator: pham: 124188. Date: 11/11/23. It is conserved; found in VroomVroom (cluster AZ) and MiniMommy (cluster AZ, although this is a draft gene). /note=Starterator: Start site 28 was manually annotated in 113/151 non-draft genes in this pham. However, this start site was not manually annotated for this phage gene.
/note=Location call: Based on the above evidence, this is a real gene and the start site is 22617. /note=Function call: HNH endonuclease. The top three non-draft PhagesDB BLAST hits all call the function HNH endonuclease (only the top hit, VroomVroom, has an E-value < 10^-44). The top NCBI BLAST hit also calls it an HNH endonuclease (E-value 5e-66, 80.31% identity). HHpred had a hit for an HNH endonuclease with 99.85% probability, 90.55% coverage, and an E-value of 4.1e-20. CDD had a hit for an HNH endonuclease with an E-value of 1.24e-10. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, this is not a membrane protein. /note=Secondary Annotator Name: Tran, Michelle /note=Secondary Annotator QC: I agree with the location and function calls shown above. However, the Starterator and Phamerator sections should be updated, because the pham and Starterator site numbers have changed since the previous iteration. (The conclusions are still the same, though, so this should just be a minor edit.) The Starterator box should also be changed to NA, because its most annotated start site does not match the start site suggested for JasmineDragon. (Note that I forgot to add earlier: the synteny box has not been filled out even though a function has been called. The pham maps indicate synteny with VroomVroom.)
CDS 22997 - 23317 /gene="31" /product="gp31" /function="hypothetical protein" /locus tag="JasmineDragon_31" /note=Original Glimmer call @bp 22997 has strength 18.51; Genemark calls start at 22997 /note=SSC: 22997-23317 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_EMOTION_31 [Arthrobacter phage Emotion]],,NCBI, q1:s1 100.0% 4.39609E-54 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.364, -3.867841122493862, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_EMOTION_31 [Arthrobacter phage Emotion]],,WGH21380,88.6792,4.39609E-54 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sanchez, Kayla /note=Auto-annotation: Glimmer and GeneMark both call the start at 22997. /note=Coding Potential: There is good coding potential in both Host-Trained and Self-Trained GeneMark. The start and stop sites both lie outside the region of coding potential. The coding potential is in the second forward row, so this is a forward gene. /note=SD (Final) Score: The only final score reported on PECAAN is -3.868. /note=Gap/overlap: -4 bp overlap. This is a reasonable overlap that is conserved in other phages (Emotion and VroomVroom), and it suggests that the gene is part of an operon. The gene length of 321 bp is acceptable. /note=Phamerator: pham: 107560. Date 11/1/2023. It is conserved; found in Emotion (AZ4) and VroomVroom (AZ4). /note=Starterator: Start site 1 in Starterator was manually annotated in 2/2 non-draft genes in this pham. Start 1 is 22997 in JasmineDragon. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 22997. /note=Function call: Function unknown. The top two PhagesDB BLASTp hits have an unknown function (E-value <10^-44), and the top two NCBI BLAST hits are annotated as hypothetical proteins (78%+ identity and E-value <10^-51).
All of the HHpred hits had very large e-values and low probability (<80%). /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. It predicts that the protein is located inside the cell. /note=Secondary Annotator Name: TRAN, MICHELLE /note=Secondary Annotator QC: I agree with the annotation, and all of the evidence has been considered appropriately. CDS 23314 - 23712 /gene="32" /product="gp32" /function="hypothetical protein" /locus tag="JasmineDragon_32" /note=Original Glimmer call @bp 23314 has strength 15.85; Genemark calls start at 23314 /note=SSC: 23314-23712 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_EMOTION_32 [Arthrobacter phage Emotion]],,NCBI, q1:s1 100.0% 1.52603E-63 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.905, -2.8454123590742793, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_EMOTION_32 [Arthrobacter phage Emotion]],,WGH21381,90.1515,1.52603E-63 SIF-HHPRED: SIF-Syn: The syntenic genes in phages Emotion and VroomVroom have no known function. The previous gene (gene 31) also has no known function. However, the next gene (gene 33) is supported by both Emotion and VroomVroom as a nucleoside deoxyribosyltransferase. /note=Primary Annotator Name: Hon, Darren /note=Auto-annotation: Gene 32 (stop@23712 F). Glimmer and GeneMark both call the start at 23314. /note=Coding Potential: Good coding potential; the coding potential maps show a single forward reading frame with coding potential. Both GeneMark maps also indicate a long protein-coding region of at least 120 bp. /note=SD (Final) Score: -2.845. No other final score is available for comparison, so this is the best final score. /note=Gap/overlap: -4 bp overlap. This overlap is good, and no additional gene needs to be inserted. /note=Phamerator: As of 11/1/23, the pham number is 121223.
The pham also includes VroomVroom_32 and Emotion_32, which are likewise in cluster AZ. The pham has 366 members (24 of them drafts), and many members use a different start site; this gene does not have the "most annotated" start site, 116. /note=Starterator: As of 11/1/23, the Starterator report for pham 121223 gives this gene's start as site 119, at position 23314. No gene is needed to fill the region upstream of the start of the gene. /note=Location call: Based on the above evidence, this is a real gene with a stop site at 23712 and a start site most likely at 23314. /note=Transmembrane domains: There are no predicted TMDs, so the protein does not meet the criteria for a membrane protein. The absence of TMDs makes sense, because the hypothesized function is an RDF, which does not necessarily require the protein to be at the membrane. /note=Secondary Annotator Name: Laureano, Ryan /note=Secondary Annotator QC: The information in the PECAAN notes was verified. The only point worth noting is that this gene may be part of an operon, as it has a 4 bp overlap. The function and transmembrane domain sections were verified in the second QC.
CDS 23709 - 24083 /gene="33" /product="gp33" /function="nucleoside deoxyribosyltransferase" /locus tag="JasmineDragon_33" /note=Original Glimmer call @bp 23733 has strength 12.16; Genemark calls start at 23709 /note=SSC: 23709-24083 CP: yes SCS: both-gm ST: SS BLAST-Start: [nucleoside deoxyribosyltransferase [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 4.13238E-72 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.551, -3.627004739933808, yes F: nucleoside deoxyribosyltransferase SIF-BLAST: ,,[nucleoside deoxyribosyltransferase [Arthrobacter phage VroomVroom]],,WIC90183,91.9355,4.13238E-72 SIF-HHPRED: purine trans deoxyribosylase; PTD, native, 2`-purine deoxyribosyltansferase, TRANSFERASE; 2.1A {Lactobacillus helveticus} SCOP: c.23.14.1,,,1S2L_A,87.9032,99.6 SIF-Syn: nucleoside deoxyribosyltransferase; the upstream gene is a recombination directionality factor and the downstream gene is a LAGLIDADG endonuclease, just like in phage VroomVroom. /note=Primary Annotator Name: Claire, Monjov /note=Auto-annotation: The Glimmer and GeneMark start sites are close: 23733 and 23709, respectively. /note=Coding Potential: The gene has high coding potential in the forward direction, with more than 80 percent of the ORF covered, and the start site lies outside the region of coding potential, which is a good sign. /note=SD (Final) Score: -3.627. This is not a very negative score, indicating a good sequence match, and it comes with a high z-score of 2.551. It is also the highest final score among all the candidates, which is a good indicator that this is the right start site. /note=Gap/overlap: 20 bp gap when measured from the Glimmer start at 23733; with the GeneMark start at 23709, the gene instead overlaps the upstream gene by 4 bp. Either is small and very reasonable. /note=Phamerator: Pham 118829 (10/31/23). It is fairly conserved, with 63 of the 343 phages in the pham belonging to cluster AZ. The gene is conserved in phages such as VroomVroom, which is also an AZ4 phage.
Function calls for this gene within the pham include deoxycytidylate deaminase, nucleoside deoxyribosyltransferase, and hydrolase. Most AZ phages call it a nucleoside deoxyribosyltransferase, so I assume this is the most probable function for our phage. /note=Starterator: After further analysis, start site 72, which corresponds to a base-pair start of 23709, had 22/296 manual annotations, many more than start site 81. This start site matches the GeneMark call. I now believe that start site 72 (23709) is the right start. /note=Location call: Based on what I've seen and the evidence from the previous models, I am confident this is a real gene, and the best start site is 23709, in line with the Starterator analysis. /note=Function call: Based on the NCBI BLASTp and PhagesDB BLASTp results, the most probable function of this gene is nucleoside deoxyribosyltransferase. Both databases show that Emotion and VroomVroom have a conserved gene, very similar to this JasmineDragon gene, with this known function. There is strong evidence that this gene is a nucleoside deoxyribosyltransferase. The function is on the approved SEA-PHAGES list, with no additional requirements. /note=Transmembrane domains: DeepTMHMM finds 0 TMDs. It makes sense that this is not a transmembrane protein, because the function called is “nucleoside deoxyribosyltransferase”, an enzyme that catalyzes the cleavage of beta-2'-deoxyribonucleosides and the subsequent transfer of the deoxyribosyl moiety to an acceptor purine or pyrimidine base. This enzyme acts on nucleosides inside the cell, where the DNA is, rather than moving molecules across the membrane. /note=Secondary Annotator Name: Sanchez, Kayla /note=Secondary Annotator QC: I agree with the annotation, and all of the evidence has been considered appropriately.
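The 20 bp gap quoted for gene 33 and the -4 bp overlap in its CDS line describe the same stop codon paired with two different starts. A small sketch, assuming forward-strand, 1-based inclusive coordinates; the `describe` helper is hypothetical. It also checks that both candidate starts sit in the same reading frame as the stop at 24083, which they must if they are alternative starts of one ORF.

```python
PREV_STOP = 23712  # stop of gene 32
STOP = 24083       # stop of gene 33
starts = {"GeneMark": 23709, "Glimmer": 23733}


def describe(start: int) -> tuple[int, int, bool]:
    """Return (upstream gap, gene length, in-frame flag) for a candidate start."""
    gap = start - PREV_STOP - 1   # negative values are overlaps
    length = STOP - start + 1     # inclusive length in bp
    in_frame = length % 3 == 0    # start and stop share a reading frame
    return gap, length, in_frame


for caller, start in starts.items():
    print(caller, start, describe(start))
```

Both starts are in frame with the stop (they differ by 24 bp, eight codons), so they are alternative starts of the same ORF; the 20 bp figure applies only to the Glimmer start.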
CDS 24080 - 24496 /gene="34" /product="gp34" /function="LAGLIDADG endonuclease" /locus tag="JasmineDragon_34" /note=Original Glimmer call @bp 24080 has strength 7.84; Genemark calls start at 24080 /note=SSC: 24080-24496 CP: yes SCS: both ST: SS BLAST-Start: [LAGLIDADG endonuclease [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 3.57604E-91 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.551, -4.3934175870462076, yes F: LAGLIDADG endonuclease SIF-BLAST: ,,[LAGLIDADG endonuclease [Arthrobacter phage VroomVroom]],,WIC90184,97.1014,3.57604E-91 SIF-HHPRED: LAGLIDADG endonuclease; LAGLIDADG homing endonuclease, I-AniI, Hydrolase, Intron homing, Mitochondrion, mRNA processing, mRNA splicing, hydrolase-DNA COMPLEX; HET: MSE; 2.4A {Emericella nidulans (strain FGSC A4 / ATCC 38163 / CBS 112.46 / NRRL 194 / M139)} SCOP: d.95.2.1, l.1.1.1,,,2QOJ_Z,80.4348,99.1 SIF-Syn: LAGLIDADG endonuclease; the upstream gene is a nucleoside deoxyribosyltransferase and the downstream gene is a ThyX-like thymidylate synthase, just like in phage VroomVroom. The downstream gene has no synteny with non-draft phage Emotion, but this gene (LAGLIDADG endonuclease) and the upstream gene (nucleoside deoxyribosyltransferase) do. All genes were in the same pham, 122432. /note=Primary Annotator Name: Bhattarai, Aryan /note=Auto-annotation: The Glimmer and GeneMark starts are both 24080. /note=Coding Potential: There is high coding potential in the forward direction, as indicated by the coding potential line in both Host-Trained GeneMark and Self-Trained GeneMark. /note=SD (Final) Score: -4.393. It is the best final score on PECAAN. /note=Gap/overlap: -4 bp overlap. The gene is likely part of an operon. /note=Phamerator: As of 11/01/2023, the pham number is 116388. The gene is conserved in phages VroomVroom and Emotion, which are both in the same subcluster (AZ4) as JasmineDragon.
The function for the gene is not explicitly stated, but a LAGLIDADG endonuclease function is indicated by synteny with genes in VroomVroom and Emotion, two non-draft phages in the same subcluster, AZ4. /note=Starterator: Start site 17 in Starterator was manually annotated in 2/52 non-draft genes in this pham. Start 17 in JasmineDragon is 24080. This evidence agrees with the site predicted by both Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 24080. /note=Function call: LAGLIDADG endonuclease. This gene has strong synteny with non-draft phages VroomVroom and Emotion, which both have a LAGLIDADG endonuclease at this position. For both PhagesDB BLASTp and NCBI BLASTp, the top hits have scores >200 and e-values < 3e-56. HHpred and CDD also support this function. HHpred hits 2QOJ_Z and 4EFJ_A both carry the description “LAGLIDADG endonuclease.” Both hits had a probability of 99.1%, with coverage of 80.4348% (2QOJ_Z) and 81.8841% (4EFJ_A), and both e-values were < 3.5e-9. The CDD hit had a higher e-value of 5.27e-04, but it placed the domain in the LAGLIDADG superfamily, a group of related domains characterized by LAGLIDADG-like features. /note=Transmembrane domains: 0. This is not a membrane protein because DeepTMHMM does not predict any TMDs. In addition, LAGLIDADG endonucleases are a family of homing endonucleases that typically do not need transmembrane domains to function. They are known for cleaving DNA, usually within introns or inteins, and they act on DNA inside the cell rather than at the membrane. Given this, the absence of TMDs aligns well with the hypothesized function.
/note=Secondary Annotator Name: Salinas Pinon, Juan Carlos /note=Secondary Annotator QC: /note=Auto-annotation: GeneMark and Glimmer both call this gene’s start site at 24080. /note=Coding Potential: There is coding potential that covers this start site. There is no coding potential in the complementary direction, suggesting that this is a forward gene. There is overlapping coding potential near the end of this gene’s coding potential; however, this shouldn’t be an issue, since there is no tick suggesting a start site. /note=SD (Final) Score: -4.393. This is the highest final score on PECAAN, which is favorable evidence for this start site. /note=Gap/overlap: -4 bp overlap, which is indicative of an operon. There is no gap downstream of this gene, so no gene needs to be added. /note=Phamerator: pham: 116388 as of 11/02/2023. There are 71 other members, mostly in cluster AZ. This pham is highly conserved and is present in non-draft phages such as Emotion and BaileyBlu. /note=Starterator: This start site has been manually annotated twice. Although it is not the most annotated start site, this is still evidence supporting it. The most annotated start site is not called in this gene. /note=Location call: I agree with the primary annotator and would also call the start site at 24080, given the evidence above. /note=Function call: All sources (PhagesDB BLAST, NCBI BLAST, and HHpred) call this a LAGLIDADG endonuclease, with significant e-values well below our acceptance cutoff of 1e-3. The consistent evidence and synteny with VroomVroom lead me to agree with the primary annotator's function call. Additionally, HHpred reports high probabilities for both hits, each greater than 80%, suggesting that this function has been called correctly. /note=Transmembrane domains: 0. This suggests that this is not a membrane protein.
CDS 24496 - 25200 /gene="35" /product="gp35" /function="ThyX-like thymidylate synthase" /locus tag="JasmineDragon_35" /note=Original Glimmer call @bp 24496 has strength 15.7; Genemark calls start at 24496 /note=SSC: 24496-25200 CP: yes SCS: both ST: SS BLAST-Start: [ThyX-like thymidylate synthase [Arthrobacter phage VroomVroom]],,NCBI, q2:s3 99.5726% 2.7242E-165 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.659, -4.174392870950849, no F: ThyX-like thymidylate synthase SIF-BLAST: ,,[ThyX-like thymidylate synthase [Arthrobacter phage VroomVroom]],,WIC90185,98.2979,2.7242E-165 SIF-HHPRED: Thymidylate synthase ThyX; Tetramer, UMP/dUMP methylase, ThyX homolog, TRANSFERASE; HET: FAD, 5BU; 1.76A {Streptomyces cacaoi subsp. asoensis} SCOP: d.207.1.0,,,4P5A_D,97.8633,100.0 SIF-Syn: ThyX-like thymidylate synthase; the upstream gene is a LAGLIDADG endonuclease and the downstream gene is a recombination directionality factor, just like in phage VroomVroom. /note=Primary Annotator Name: Kang, Alix /note=Auto-annotation: Glimmer and GeneMark both call the start at 24496. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -4.174. It is not the best final score on PECAAN; however, the gene is likely part of an operon, which makes the final score less significant. /note=Gap/overlap: 1 bp overlap. The gene is likely part of an operon. /note=Phamerator: The pham number as of November 1, 2023 is 121173. The gene is conserved in phage VroomVroom, which is in the same cluster as JasmineDragon. The function call for the gene is ThyX-like thymidylate synthase, and it is consistent between Phamerator and the phams database. It is on the approved SEA-PHAGES list. /note=Starterator: Start site 90 in Starterator was manually annotated in 25/878 non-draft genes in this pham. Start 90 is 24496 in JasmineDragon. This evidence agrees with the site predicted by Glimmer and GeneMark.
/note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 24496. /note=Function call: ThyX-like thymidylate synthase. The top three PhagesDB BLAST hits (excluding those with an unknown function) have this function, with E-values of 5e-58 to e-128. The top three NCBI BLAST hits also have the function ThyX-like thymidylate synthase (97-99% coverage, 50%+ identity, and E-value <1e-70). HHpred had hits for ThyX-like thymidylate synthase with 99.97-100% probability, 88.03-97.86% coverage, and E-values of 6.3e-29 to 1.5e-35. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Chamorro, Marco /note=Secondary Annotator QC: I have QC'ed the location call and I agree with the primary annotator. The region between the start and stop sites called by the primary annotator exhibits high coding potential in the forward direction. Additionally, the start site is called by both GeneMark and Glimmer. However, there is not much synteny with genes from other phages in the same cluster, so not much can be gained from the comparative pham maps. Starterator also agrees with the auto-annotated start site. Thus, there is strong evidence to support the primary annotator's location call. /note=For the Module 9 QC: first, be sure to click the drop-down menu at the top of the page and select the suggested start site (SS) option. This is the only issue. Otherwise, I agree with the primary annotator’s function call. HHpred and NCBI BLAST both provide strong evidence that the gene is a ThyX-like thymidylate synthase. The TMD call is correct and the synteny notes are correct.
CDS 25346 - 26047 /gene="36" /product="gp36" /function="recombination directionality factor" /locus tag="JasmineDragon_36" /note=Original Glimmer call @bp 25346 has strength 18.59; Genemark calls start at 25346 /note=SSC: 25346-26047 CP: yes SCS: both ST: SS BLAST-Start: [recombination directionality factor [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 5.2108E-169 GAP: 145 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.561, -4.374347681057712, yes F: recombination directionality factor SIF-BLAST: ,,[recombination directionality factor [Arthrobacter phage VroomVroom]],,WIC90186,99.5708,5.2108E-169 SIF-HHPRED: Gp3-like ; Recombination directionality factor-like,,,PF18897.4,87.5536,100.0 SIF-Syn: The gene is a recombination directionality factor in pham 848. The upstream gene is in pham 122432. The downstream gene is in pham 118342. These characteristics are seen in both VroomVroom and Emotion /note=Primary Annotator Name: Saud, Arnav /note=Auto-annotation: Glimmer and Genemark both call the start at 25346. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this gene is a forward gene. Coding potential is found in both GenemarkHost and GenemarkSelf. /note=SD (Final) Score: -4.374. This is the best final score on PECAAN. /note=Gap/Overlap: The gap between the start site and the previous stop is 145 base pairs. This gap is large; however, it is conserved in VroomVroom as well. There is also no coding potential within this gap. /note=The gap between the stop site of our gene of interest and the next gene`s start site is 1 base pair. This gap is small and is conserved in VroomVroom as well. /note=Phamerator: pham: 848 Date: 10/30/2023. The gene is conserved among phages of the same subcluster (AZ4); it`s found in VroomVroom (AZ4) and Emotion (AZ4). /note=Starterator: Start site 37 in Starterator was manually annotated in 56/119 non-draft genes in this pham.
Start 37 is 25346 in JasmineDragon. This evidence agrees with the site predicted by Glimmer and Genemark. /note=Location call: Based on the above evidence, this gene is real and the most likely start site is 25346. /note=Function call: Recombination directionality factor. The top two PhagesDB BLASTp hits have the function of recombination directionality factor (E-value < 10^-103), and the top two NCBI BLAST hits have also been assigned the function of recombination directionality factor (100% coverage, 75%+ identity, and E-value < 6.4e-130). HHpred had a hit for a recombination directionality factor protein with 100% probability, 87.55% coverage, and an E-value of 7.7e-32. CDD had one valid hit that placed the gene of interest in the DUF5653 superfamily. This family of genes is characterized by having the function of recombination directionality factor. /note=Transmembrane domains: DeepTMHMM does not predict any transmembrane domains for this gene; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Valente, Nina /note=Secondary Annotator QC: I agree with this annotation. All of the evidence categories have been considered. The manual and auto-annotated start sites agree, coding potential is appropriately covered by the ORF, gaps are conserved and lack coding potential, and the specific gene is conserved in other members of the pham. /note=I agree with this annotation (11/16). The above evidence has been thoroughly considered. Both the top PhagesDB and NCBI BLASTp hits suggest this gene has the function of recombination directionality factor. Both HHpred and CDD showed results that align with this hypothesis, also suggesting a function of recombination directionality factor. The lack of TMDs makes sense given the context of the hypothesized gene function, and the synteny box is completed.
CDS 26049 - 26249 /gene="37" /product="gp37" /function="membrane protein" /locus tag="JasmineDragon_37" /note=Original Glimmer call @bp 26049 has strength 17.89; Genemark calls start at 26049 /note=SSC: 26049-26249 CP: yes SCS: both ST: NI BLAST-Start: [membrane protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 7.29721E-13 GAP: 1 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.145, -4.9668027763900575, yes F: membrane protein SIF-BLAST: ,,[membrane protein [Arthrobacter phage VroomVroom]],,WIC90187,84.8485,7.29721E-13 SIF-HHPRED: SIF-Syn: membrane protein; upstream gene is recombination directionality factor and no downstream gene is available, but the genes on either side are conserved in VroomVroom. /note=Primary Annotator Name: Daniel, Mila /note=Auto-annotation: Both Glimmer and GeneMark call the start at 26,049. /note=Coding Potential: Coding potential is seen in the ORF in the forward direction, with solid blocks of coding potential between the start and stop sites. /note=SD (Final) Score: -4.967. This is the best final score on PECAAN. /note=Gap/overlap: 1 bp gap. This small gap may indicate that this gene is in an operon. /note=Phamerator: Pham 118342. Date 10/31/2023. It is conserved; found in both MiniMommy (AZ) (draft) and VroomVroom (AZ). /note=Starterator: Start site 3 was manually annotated in 1/1 non-draft genes in this pham. Start 3 is not present in JasmineDragon, but start site 4 is the auto-annotated start site. The Starterator evidence is too limited to determine whether either is correct, but start site 4 does agree with the site predicted by Glimmer and Genemark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 26049. /note=Function call: Membrane protein. The top two NCBI BLAST hits have the function of membrane protein, but only one has an e-value low enough to be considered good evidence (100% coverage, 73%+ identity, and E-value <10^-12).
HHpred and CDD had no relevant hits. /note=Transmembrane domains: DeepTMHMM predicts 2 TMDs; therefore, it may be a membrane protein. /note=Secondary Annotator Name: Saud, Arnav /note=Secondary Annotator QC: I agree with this annotation. I think the annotator needs to indicate that MiniMommy is a draft phage, as this may have some implications for the PECAAN notes. All of the other categories have been considered. (11/2/23) /note=I agree with this annotation. The annotator needs to indicate the e-values for the PhagesDB BLASTp hits in the function call, and they need to indicate the coverage, identity, and e-value for the NCBI BLAST hits. (11/14/23) CDS 26246 - 26371 /gene="38" /product="gp38" /function="hypothetical protein" /locus tag="JasmineDragon_38" /note=Original Glimmer call @bp 26246 has strength 12.43 /note=SSC: 26246-26371 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_38 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 8.10583E-17 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.236, -6.307665910037288, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_38 [Arthrobacter phage VroomVroom]],,WIC90188,92.6829,8.10583E-17 SIF-HHPRED: SIF-Syn: NrdH-like glutaredoxin, upstream gene Pham # 48566 NKF, downstream gene is Holliday junction resolvase, just like in phage VroomVroom and Emotion (AZ4) /note=Gene 39 (stop@26371 F) /note=Primary Annotator Name: Labib, Youstina /note=Auto-annotation: Only Glimmer calls the gene; GeneMark does not. The start site called by Glimmer is position 26246. The other start site listed is at position 26153. /note=Coding Potential: The coding potential is good, as indicated by the signal present in the second open reading frame. The start codon is GTG and the position of the original call is 26246.
There is a high probability that this codon is used, since GTG and ATG (the only proposed start codons) are the most commonly seen start codons and appear at equivalent frequencies. This start covers all of the coding potential from the GeneMark map; as shown, the coding region lies within the open reading frame that begins at this start codon. /note=SD (Final) Score: This start has the best final (RBS) score, -6.308, compared with -8.073 for the other start. A less negative final score is better, which is why this score is the best. /note=Gap/overlap: There is no gap upstream of my gene; instead, there is a 4 bp overlap with the upstream gene, which is an indication of an operon. The gene directly upstream is not annotated, but the downstream gene is labeled as an NrdH-like glutaredoxin, which gives us insight into the potential function of this gene. A gene does not need to be added in this case. /note=Phamerator: Pham 48566, observed on 10/31/2023. It is conserved: the pham that my gene belongs to is present in other members of the cluster/subcluster to which my phage belongs. Phages that I used for comparison include Emotion and VroomVroom, which are both in subcluster AZ4, as is JasmineDragon. /note=Starterator: The start site called most often in published annotations was start 5, found in 2 of 4. It was called in 2 of 2 non-draft phage annotations. /note=Location call: Based on the evidence above, this is a real gene and the originally called start codon is the best of the options. The start site is 26246. The auto-annotated start number is 4 and the start position is 26246. /note=Starterator Drop-Down Menu: Based on all the evidence I have collected thus far, I agree with the auto-annotated start site: this gene begins at base pair 26246. The rationale is that the Starterator data show that this start site has the most calls across phages.
Additionally, this gene has matches in other non-draft phages. /note=Coding Potential Drop-Down Menu: The gene does have reasonable coding potential predicted within the putative ORF. The chosen start site covers all this coding potential. /note=Function call: No known function. The HHpred and CDD data revealed no significant hits, so there is no evidence of a known function. /note=Transmembrane domains: There are no transmembrane domains in this gene. As shown by the graph, the topology is predominantly inside, so there is no indication of transmembrane components. This module could have refuted our hypothesis that this gene has no known function; however, since no transmembrane domains were found, the NKF call stands. /note=Secondary Annotator Name: Mahadev, Anirudh /note=Secondary Annotator QC: I have QC`d this gene and agree with the Primary Annotator`s call on the location and that this is a real gene. Glimmer calls the start site at 26246 and GeneMark does not call the gene. I reviewed the Primary Annotator`s annotation notebook and agree with the claims about the gene`s coding potential on the GeneMark map and synteny on the Pham Map. It seems to have good coding potential in only one ORF, but I think that can be overruled by the Starterator data, which show that it is both the Most Annotated and Best start site. The gene has a low Z score of 1.236 and a highly negative final score of -6.308, but again, these two are the best of the two options and are supported by Starterator. PhagesDB BLAST has very low e-values near zero, so it seems to be a highly conserved gene. Corrections: In the Location Call notes you say that the auto-annotated start number is 4 for 26246, but the Starterator data say 26246 is start site 5, which I believe should be the correct start site because it is conserved in the pham.
In the Gap/Overlap section you say the overlap is 3 base pairs upstream, but it is 4 base pairs. I would also state somewhere that the gene could be part of an operon, which could be the case because of the gap of -4 and the relatively low Z score and highly negative final score relative to other genes. The following comments are for Module 9 QC. I agree with the primary annotator’s call that this gene has NKF; no edits need to be made. CDS 26463 - 26750 /gene="39" /product="gp39" /function="NrdH-like glutaredoxin" /locus tag="JasmineDragon_39" /note=Original Glimmer call @bp 26463 has strength 16.72; Genemark calls start at 26463 /note=SSC: 26463-26750 CP: yes SCS: both ST: SS BLAST-Start: [NrdH-like glutaredoxin [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 97.8947% 2.54797E-53 GAP: 91 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.063, -2.583959800616441, yes F: NrdH-like glutaredoxin SIF-BLAST: ,,[NrdH-like glutaredoxin [Arthrobacter phage VroomVroom]],,WIC90189,80.1802,2.54797E-53 SIF-HHPRED: c.47.1.0 (A:) automated matches {Baker`s yeast (Saccharomyces cerevisiae) [TaxId: 559292]} | CLASS: Alpha and beta proteins (a/b), FOLD: Thioredoxin fold, SUPFAM: Thioredoxin-like, FAM: automated matches,,,SCOP_d2m80a_,75.7895,99.2 SIF-Syn: /note=Primary Annotator Name: Carnes, Julianne /note=Auto-annotation: Glimmer and GeneMark both call the start at 26463. /note=Coding Potential: Coding potential is found in both GeneMark Self and Host. Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. /note=SD (Final) Score: -2.584; this is the least negative score, and it belongs to the second start site option. /note=Gap/overlap: Gap: 91 base pairs. This is a larger gap but reasonable because this gene is conserved among VroomVroom and Emotion (AZ4). There is no coding potential within this 91 base pair gap. /note=Location call: Based on the evidence, it is likely that this is a real gene and that its start site is at 26463.
/note=Phamerator: Pham 121171. Date reported: 10/31/23. This start site is conserved among phages in the same subcluster (AZ4) (VroomVroom and Emotion). /note=Starterator: Start site 90 in Starterator was manually annotated in 22/26 phages with this start site, and in 22/988 members of this pham. It is not the most annotated start, but it does reflect the coding potential of this gene. /note=Location call: Based on the evidence, this is a real gene and its start site is @26463. /note=Function call: NrdH-like glutaredoxin. BLASTp returns hits from phages of multiple different hosts, with similar lengths and the same function for this protein. This protein is seen across a variety of different clusters. The first NCBI hit was from Arthrobacter phage VroomVroom for an NrdH-like glutaredoxin with an e-value of 3e-53 and 86.60% identity. A CDD hit found NrdH [cd02976] with an e-value of 2.29e-11. HHpred found 1H75 with a probability of 99.19% and an e-value of 5.3e-9; its function is NrdH-like glutaredoxin. /note=Transmembrane domains: There are no TMDs; therefore, this is not a membrane protein. /note=Secondary Annotator Name: To, Nathan /note=Secondary Annotator QC: I QC`d this gene and agree with the Primary Annotator that this is a real gene with start position 26463. I agree that there is strong coding potential, and the Z score and final score indicate that this is a strong start site. Both Glimmer and GeneMark agree on the start site, which the Primary Annotator claims is the best start site. Although somewhat large, the gap is reasonable and no gene needs to be inserted before this gene, as supported by the lack of coding potential in the gap and the conservation of the gene among other members of the cluster (Emotion, VroomVroom). The start site also gives the longest ORF, which is much longer than the ORF of the other start.
/note=Final QC Comments: I agree with the function call made by the primary annotator, due to the high probability of the hits from BLAST and HHpred. CDS 26737 - 27180 /gene="40" /product="gp40" /function="Holliday junction resolvase" /locus tag="JasmineDragon_40" /note=Original Glimmer call @bp 26737 has strength 13.29; Genemark calls start at 26737 /note=SSC: 26737-27180 CP: yes SCS: both ST: SS BLAST-Start: [holliday junction resolvase [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 1.44008E-99 GAP: -14 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.165, -6.374982888595457, no F: Holliday junction resolvase SIF-BLAST: ,,[holliday junction resolvase [Arthrobacter phage VroomVroom]],,WIC90190,97.973,1.44008E-99 SIF-HHPRED: Holliday junction resolvase; archeal holliday junction resolvase helicase DNA binding enzyme phage 15-6 thermus thermophilus, RECOMBINATION; HET: SO4, MSE; 2.5A {Thermus thermophilus phage 15-6},,,7BGS_B,74.1497,99.6 SIF-Syn: Phages VroomVroom and Emotion (AZ4) are part of the same cluster as JasmineDragon. This particular gene is conserved in both phages, which shows synteny and supports the function call. /note=Primary Annotator Name: Chamorro, Marco /note=Auto-annotation: Glimmer and GeneMark both suggest the start at 26737. /note=Coding Potential: There is high coding potential along the forward strand, suggesting this is a forward gene. /note=SD (Final) Score: The SD score is -6.375, which is the second best score on PECAAN. /note=Gap/overlap: There is an overlap of 14 nucleotides with the previous gene (GAP: -14 bp), which is not extreme. /note=Phamerator: As of November 1st 2023, this gene is part of pham 121244. This is conserved in Emotion (AZ4). /note=Starterator: Start site 67 was manually annotated in 62/263 non-draft genes in the pham. This evidence disagrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 26737.
/note=Function call: The predicted function for this gene is Holliday junction resolvase, based on the multiple PhagesDB BLASTp hits with VroomVroom (e-value = 3e-56) and Emotion (e-value = 3e-56), both of which belong to the Holliday junction resolvase class of proteins. NCBI BLAST hits give the same result, with e-values of 1e-99 and 1e-63, respectively. HHpred alignments also indicate that this protein is a Holliday junction resolvase, which resolves four-way DNA intermediates (Holliday junctions) during recombination. PhagesDB BLASTp, NCBI BLAST, and HHpred collectively agree that the gene encodes a Holliday junction resolvase. /note=Transmembrane domains: According to DeepTMHMM, this protein has no transmembrane domains, so it is not a transmembrane protein. /note=Secondary Annotator Name: Laureano, Ryan /note=Secondary Annotator QC: The information in PECAAN is not complete yet. Phamerator shows that the gene is in pham 122383 as of 11/3/2023. The suggested start site is most likely correct, as it has the most coding potential and the longest ORF, but start site 26836 should be looked at as it has a better SD and Z score. The function call and transmembrane domains are verified after the second QC.
CDS complement (27177 - 27365) /gene="41" /product="gp41" /function="hypothetical protein" /locus tag="JasmineDragon_41" /note=Original Glimmer call @bp 27365 has strength 9.91; Genemark calls start at 27371 /note=SSC: 27365-27177 CP: yes SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_41 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 1.94172E-33 GAP: 124 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.156, -2.253486910374644, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_41 [Arthrobacter phage VroomVroom]],,WIC90191,98.3871,1.94172E-33 SIF-HHPRED: SIF-Syn: This gene has synteny with Emotion, but not with VroomVroom, which it was compared to for the function call data. It has the same gene order and it appears to be a conserved gene. /note=Primary Annotator Name: Mahadev, Anirudh /note=Auto-annotation: Glimmer and GeneMark do not agree on the start site. Glimmer called the start site at 27365 and GeneMark called the start codon at 27371. Start codon 27365 was called by the auto-annotation, and the start codon is ATG. /note=Coding Potential: Within the putative open reading frame, there is reasonable coding potential in the Host Trained GeneMark and in the Self Trained GeneMark. The chosen start site does cover the coding potential. /note=SD (Final) Score: The final score, -2.253, is the best among the four options, as it is the least negative. /note=Gap/overlap: The gap is 124 base pairs for the start site that Glimmer called, which seems a little long for a gene that is only 189 base pairs long, but it is not unreasonable enough to invalidate the start site. /note=Phamerator: This gene was found in Pham 121635 on 10/31/2023. It is conserved in the Pham and is called in 24/27 of the non-draft genes in the Pham. I compared it on the Starterator map to phages like Tweety, Wildwest, VResidence and Reedo, all of which had the same called start site at position 21.
However, Emotion, the phage that I compared the Pham Map to, had a different called start site at position 24. I did not see a function that was called for this gene, and PhagesDB BLAST shows "Function unknown" for other phages as well. /note=Starterator: Start site 21 seems to be a reasonable start site for the members of this Pham, as it is the most highly conserved and appears in the most phages. Start 21 @27365 has 24 MAs (coordinate: (21, 27365)). The Pham has 40 members in total, and 27 are non-draft annotations. Start 21 is present in 38/40 of the genes in this Pham (95%) and is called 86.8% of the time when present. Starterator is informative and agrees on the start site at position 21. This also agrees with the location call, which calls the same start site at 27365. /note=Location call: The gathered evidence suggests that this is a real gene because of its high coding potential, reasonable gap and length, and synteny with VroomVroom and Emotion, and the start site at 27365 seems the most reasonable. /note=Function call: I think it is NKF because of a lack of informative data from all sources. I did not find any informative hits on CDD, HHpred, NCBI BLAST, or PhagesDB BLAST: I disregarded the NCBI BLAST data because its match is a hypothetical protein from VroomVroom, and the HHpred data because of the abnormally high e-values. In addition, the corresponding gene in VroomVroom is also called NKF, which I think should be the call for this gene in JasmineDragon as well. /note=Transmembrane domains: There are no transmembrane domains, and the predicted topology places this protein on the inside. /note=Secondary Annotator Name: Woodward, Lauren /note=Secondary Annotator QC: I agree with this location call based on the evidence presented above. I agree with the function call as well.
CDS 27490 - 30000 /gene="42" /product="gp42" /function="DNA primase/helicase" /locus tag="JasmineDragon_42" /note=Original Glimmer call @bp 27490 has strength 17.0; Genemark calls start at 27490 /note=SSC: 27490-30000 CP: yes SCS: both ST: SS BLAST-Start: [DNA primase/helicase [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: 124 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.379, -3.9166971242758706, no F: DNA primase/helicase SIF-BLAST: ,,[DNA primase/helicase [Arthrobacter phage VroomVroom]],,WIC90192,96.747,0.0 SIF-HHPRED: Putative primase C962R; polymerase, primase, PrimPol, Helicase, DNA BINDING PROTEIN; HET: ANP;{African swine fever virus BA71V},,,8IQI_E,55.3828,100.0 SIF-Syn: /note=Primary Annotator Name: Hosford, Ryan /note=Auto-annotation: Glimmer and GeneMark both agreed on the start site of 27490, which is an ATG. /note=Coding Potential: Highly likely; the start site includes the entire coding potential of the gene, and there is reasonable coding potential throughout the self-trained and host-trained maps. /note=SD (Final) Score: The SD score, -3.917, is not the highest, but it is within the suitable range of scores, suggesting that the gene has a high probability of starting at that site. /note=Gap/overlap: The gap with the upstream gene, 124 bp, is on the higher end of the acceptable range, but this is most likely because the gene before it is a reverse gene, so the gap should be a little larger than normal. I think that GeneMark and Glimmer choosing this start site, along with the coding potential, gives the start site credibility. The other sites, one with a longer open reading frame and one with a better final score and Z-value, do not look as correct: the LORF (27337) has a gap of -29 (a 29 bp overlap), which is a bit much, and the one with the better scores of 2.568 (Z) and -3.803 (Final) (@27700) has a gap of 334 bp.
/note=Phamerator: The gene is in pham 85082 as of 10/30/23. It is heavily conserved among the AZ cluster; I referenced Warda, Yang, YesChef, Powerpuff, and Obitoo. Phamerator has notes under the final annotations of these phages calling the function DNA primase/helicase. /note=Starterator: The start choice of 49 @ 27490 was conserved among 57% of the genes in this pham, and 57 of the 119 final annotations (about 48%) called this site as correct. /note=Location call: The gene is definitely real; synteny with VroomVroom and Emotion supports this, as do its E-values with those two phages and others, which are 0 or extremely close to 0. The start site with the most potential is 27490, which was called by both Glimmer and GeneMark; it uses an ATG start codon and contains the entire coding potential in the host-trained GeneMark map. 57 MAs also called this site, which makes it the Most Annotated start and leads me to believe it is correct. /note=Function call: Based on BLASTp results from both NCBI and PhagesDB, the sequence corresponds to the same gene in VroomVroom and Emotion, with E-values of 0.0 and a high percentage match, so I can confidently call its function DNA primase/helicase. /note=Transmembrane domains: None present. /note=Secondary Annotator Name: Potter, Sofia /note=Secondary Annotator QC: Everything seems to match what I found for this gene; I only saw a small typo of -3.197 instead of -3.917 when you`re discussing the final score for the start site. My version of the PECAAN notes is below. Second QC 11/17/23: To conclude that this is a DNA primase/helicase, there should be evidence of both primase and helicase (as stated in the SEA-PHAGES Approved Function List), but I do also see the low e-values and agree with the primary annotator that this is still the most likely function for this gene. /note= /note=Auto-annotation: Both Glimmer and GeneMark call the start at 27490.
/note=Coding Potential: There is good coding potential in GeneMark Self and Host. There is a small gap caused by detection of a stop codon towards the end of the gene, but this does not appear to be significant against the graphical evidence of coding potential and the synteny evidence that this is one continuous gene. /note=SD (Final) Score: -3.917; not the best final score on PECAAN, but a very narrow second. The site with the best PECAAN score lies about 210 bp downstream (27700), which would not match the extent of the coding potential we see on GeneMark. /note=Gap/overlap: 124 bp gap. /note=Phamerator: As of November 3, 2023, this gene is in pham 85082. The gene is also conserved in VroomVroom and Emotion, the two non-draft phages of the same subcluster, AZ4. /note=Starterator: The auto-annotated start site call of 49 aligns with VroomVroom and Emotion, which both also call start site 49 for this gene. Start site 49 in JasmineDragon corresponds to 27490. /note=Location call: Based on the above evidence, this is a real gene with a most likely start site of 27490. CDS 30010 - 30129 /gene="43" /product="gp43" /function="hypothetical protein" /locus tag="JasmineDragon_43" /note=Original Glimmer call @bp 30010 has strength 11.4; Genemark calls start at 30010 /note=SSC: 30010-30129 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_43 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 97.4359% 4.78328E-16 GAP: 9 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.139, -4.325675514574479, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_43 [Arthrobacter phage VroomVroom]],,WIC90193,89.7436,4.78328E-16 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Indiresan, Neeti /note=Auto-annotation: Glimmer and Genemark both call the gene start site at 30010 with start codon ATG. /note=Coding Potential: The gene shows coding potential in the forward direction in both GeneMark Host and Self.
The chosen start site covers this coding potential. /note=SD (Final) Score: -4.326. It is the best final score on PECAAN. /note=Gap/overlap: 9 bp, which is a small gap and is consistent with the corresponding genes in other similar phages in the cluster. /note=Phamerator: pham: 89866. Date 11/01/2023. It is conserved; found in VroomVroom (AZ) and Emotion (AZ). /note=Starterator: Start site 2 in Starterator was manually annotated in 2/2 non-draft genes in this pham. Start 2 is 30010 in JasmineDragon. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence shown above, this is likely a real gene and the most reasonable start site is 30010. /note=Function call: Function unknown. The top two PhagesDB BLAST hits have unknown functions, and the top two NCBI BLAST hits have the function hypothetical protein. In addition, there was not enough evidence from CDD and HHpred to assign a function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: To, Nathan Joseph /note=Secondary Annotator QC: I QC`d this gene and agree with the Primary Annotator that the gene is real with start site 30010. I agree that there is strong coding potential, and the strong Z score and final score indicate the strength of this start site. Both Glimmer and GeneMark agree on the start site at 30010, which is the call made by the annotator. The gap of 9 bp is reasonable and no gene needs to be inserted before this gene. The gene is conserved among other members of the cluster (Emotion, VroomVroom). The start site is also supported since there are only two available sites and the selected one gives a more favorable final score and gap, as well as a significantly longer ORF. /note=Final QC Comments: I agree with the NKF call due to the lack of strong hits from BLAST, HHpred, or CDD.
CDS 30221 - 30355 /gene="44" /product="gp44" /function="hypothetical protein" /locus tag="JasmineDragon_44" /note=Original Glimmer call @bp 30221 has strength 10.18 /note=SSC: 30221-30355 CP: yes SCS: glimmer ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_44 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 4.21525E-19 GAP: 91 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.905, -2.8454123590742793, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_44 [Arthrobacter phage VroomVroom]],,WIC90194,95.4545,4.21525E-19 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kalliomaa, Kira /note=Auto-annotation: Glimmer only. Glimmer calls the start at 30221. /note=Coding Potential: Little coding potential is found in both GeneMark Self and Host. However, what little coding potential there is in this open reading frame (ORF) is on the forward strand only, indicating that this is a forward gene. /note=SD (Final) Score: -2.845. This is the best final score on PECAAN. /note=Gap/overlap: 91 bp. This is a somewhat large gap (larger than the 50 bp limit), but it is reasonable because there is no coding potential in the gap that would indicate a new gene. /note=Phamerator: pham: 88080. Date 11/01/2023. It is conserved; found in VroomVroom (AZ). /note=Starterator: Start site 9 in Starterator was manually annotated in 7/7 non-draft genes in this pham (88080). Start 9 is 30221 in JasmineDragon. This evidence agrees with the site predicted by Glimmer. /note=Location call: Based on the prior evidence, this is a real gene with a start site of 30221. /note=Function call: NKF /note=Transmembrane domains: 0; DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Daniel, Mila /note=Secondary Annotator QC: I agree with this annotation. All of these evidence categories have been considered, and the primary annotator made sure to fill out the two drop-down menus.
CDS 30557 - 32419 /gene="45" /product="gp45" /function="DNA polymerase I" /locus tag="JasmineDragon_45" /note=Original Glimmer call @bp 30557 has strength 19.78; Genemark calls start at 30557 /note=SSC: 30557-32419 CP: yes SCS: both ST: SS BLAST-Start: [DNA polymerase I [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 0.0 GAP: 201 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.303, -2.4811409279978642, yes F: DNA polymerase I SIF-BLAST: ,,[DNA polymerase I [Arthrobacter phage VroomVroom]],,WIC90195,99.6774,0.0 SIF-HHPRED: Apicoplast DNA polymerase; DNA polymerase, exonulease, apicoplast, Plasmodium falciparum, REPLICATION, TRANSFERASE; HET: PEG, EDO; 2.5A {Plasmodium falciparum (isolate 3D7)},,,7SXQ_B,96.7742,100.0 SIF-Syn: DNA polymerase I, upstream gene and downstream are both NKF as seen in phage Emotion, which displays close synteny. /note=PECAAN Notes /note=Primary Annotator Name: Potter, Sofia /note=Auto-annotation: Glimmer and GeneMark both call the start at 30557. /note=Coding Potential: Coding potential in the ORF is found only on the forward strand, indicating that this is a forward gene. Coding potential was found in both GeneMark Self and Host. /note=SD (Final) Score: -2.481, the best final score on PECAAN, and lower than the accepted -2 cutoff. /note=Gap/overlap: 201 bp gap. This is a relatively large gap, but there is no coding potential in it that would justify extending this gene upstream. There is also synteny with Emotion, which has a similar gap. /note=Phamerator: As of November 1, 2023, the pham number is 121166. This gene is conserved in VroomVroom and Emotion, the two non-draft phages in the same subcluster (AZ4). The function call is DNA polymerase I. /note=Starterator: Start site 219 is conserved in VroomVroom and Emotion, the only two non-draft phages also in AZ4. This corresponds to start 30557 in JasmineDragon.
/note=Location call: Given the Starterator and Phamerator calls and the evidence that this is a real gene, the start site is at 30557 bp, corresponding to start site 219. /note=Function call: As seen especially clearly in the PhagesDB and BLASTp graphical outputs, this gene shares high sequence conservation with many genes already classified as encoding DNA polymerase I. I think it is a compelling hypothesis that this gene functions as DNA polymerase I. This gene has an extensive list of hits with low or 0 e-values to DNA polymerase I in other viruses, and the sequence appears to be highly conserved. Every major hit from BLASTp, HHpred, and CDD matches another DNA polymerase. /note=Transmembrane domains: DeepTMHMM did not predict any transmembrane domains, so this is not a membrane protein. /note=Secondary Annotator Name: Carnes, Julianne /note=Secondary Annotator QC: I agree with this start site call. The evidence for this gene is well supported. The gap is quite large; however, the coding potential shows no coding potential or start site that would close it. Although this start site did not have the most manual annotations in Starterator, it had 48 manual annotations, and many phages with this start site appear to be in the AZ cluster. Based on the function call, I agree that this is DNA polymerase I. There are over 20 non-draft BLAST hits with a 0.0 e-value for the DNA polymerase I function. NCBI provided good evidence for the function. A PDB hit (7SXQ_A) had 100% probability and high coverage. Overall, I agree with this function call. Corrections: In your CDD evidence box, you checked off pfam00476 as evidence, and it has 29% identity and 46% coverage. This may need further investigation. Also, in your gap/overlap notes, please add more information (what is considered a large gap and why there is one).
/note=> I've modified the CDD evidence box to reflect choices with higher coverage and % identity, and added more information to the gap/overlap notes. CDS 32416 - 32613 /gene="46" /product="gp46" /function="hypothetical protein" /locus tag="JasmineDragon_46" /note=Original Glimmer call @bp 32434 has strength 9.48; Genemark calls start at 32434 /note=SSC: 32416-32613 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_EMOTION_47 [Arthrobacter phage Emotion]],,NCBI, q3:s2 89.2308% 1.64211E-28 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.987, -4.717114992838803, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_EMOTION_47 [Arthrobacter phage Emotion]],,WGH21396,74.6479,1.64211E-28 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Salinas Pinon, Juan Carlos /note=Auto-annotation: Glimmer and GeneMark both call the start at 32434; however, this is not the LORF. Additionally, this call does not have the highest final score or Z-score. The start codon is ATG, which is preferred because it is the most common, but GTG is also a favorable option. This call is 180 bp long; however, there are longer options listed on PECAAN. /note=Coding Potential: There is high coding potential only in the forward direction, indicating that this is a forward gene. Only start site 32416 covers all of the coding potential in the region of interest. There is no coding potential on the complementary strand. /note=SD (Final) Score: -4.717. This is the best final score on PECAAN. /note=Gap/overlap: -4, meaning a 4 bp overlap with the upstream gene, which may indicate an operon. The gene is syntenic with phage Emotion but not with VroomVroom. Its absence from VroomVroom does not mean that this is not a real gene, since the gene is conserved elsewhere. /note=Phamerator: Pham: 85757. Date 11/01/2023. This pham is conserved in 23 non-draft phages and genes, such as Emotion, Dr.Manhattan_41, and Nitro_41. /note=Starterator: The most manually annotated site is site 4.
This site was called in 23/35 non-draft genes. The start site bp location is 32416. Note that the Glimmer and GeneMark calls do not match this. Given the evidence and its conservation across many other non-draft genes, site 4 is a better option. /note=Location call: The best-supported start site is start site 4 at 32416. This start site has the smallest upstream gap, the highest final score, and a Z-score close to 2. It also closes the gap that would otherwise exist upstream of the gene. /note=Function call: NKF. Data from PhagesDB BLAST, NCBI, and HHpred provide little evidence for a function. When compared with other non-draft phages, all three databases primarily return NKF hits. The few functions that are called have little supporting evidence, with very high e-values in the case of HHpred and insufficient evidence in the case of NCBI. /note=Transmembrane domains: No TMDs are predicted, so this is not a transmembrane protein. /note=Secondary Annotator Name: Zaragoza, Evelin /note=Secondary Annotator QC: /note=Auto-annotation: GeneMark and Glimmer both call the start at 32434, which neither produces the LORF nor has the highest final score (-6.307). The Z-score is not above 2, but none of the others are either. The start codon is ATG, which is common. However, I believe the start site should be 32416 for three reasons: its start codon (GTG) is also common, it gives a longer gene (198 bp as opposed to 180 bp), and its final score is higher. This start site would also include more of the coding potential. 31570 is not an option due to the large overlap with the upstream gene. /note=Coding Potential: Start site 32434 does not include all of the coding potential, so 32416, which includes all of it, is a better choice. There is no coding potential on the complementary strand, so this is a forward gene.
/note=SD (Final) Score: The final score is on the higher end at -4.717, but the Z-score is below 2. Compared with the other final scores, this start site seems the most reasonable. /note=Gap/overlap: There is a 4 bp overlap with the upstream gene, which may suggest an operon. This gene is syntenic with Emotion but is not present in VroomVroom. However, it does not need to be deleted, because there is coding potential in this region that a gene must account for. /note=Phamerator: The pham number is 85757 as of 10/28/2023. It is conserved in phages such as Emotion, a non-draft phage. Similar gene members of the cluster, but not the subcluster, include Adolin_44, Adumb2043_39, and Amyev_41. /note=Starterator: The start number called most often in the published annotations is 4, and it was called in 23 of the 35 non-draft genes in the pham. Start 4, which corresponds to 32416, is present but not called in JasmineDragon. Instead, start 7 (32434) is called. This evidence suggests that the site predicted by Glimmer and GeneMark is not the best option and that 32416 is our best option. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 32416. The gene does not need to be deleted, because the coding potential shows that a gene is needed there, and this start site includes all of the coding potential. Other factors include a strong RBS and a small overlap with the upstream gene as well as /note=Function call: NKF. I agree with the primary annotator. HHpred, CDD, PhagesDB BLAST, and NCBI BLAST provide no significant evidence about the function. CDD gave no hits at all, and HHpred gave no significant hits, so neither was informative. No PhagesDB/NCBI BLAST hits had a known function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein.
CDS 32601 - 32918 /gene="47" /product="gp47" /function="hypothetical protein" /locus tag="JasmineDragon_47" /note=Original Glimmer call @bp 32601 has strength 11.66; Genemark calls start at 32601 /note=SSC: 32601-32918 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_46 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 1.17257E-56 GAP: -13 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.063, -2.583959800616441, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_46 [Arthrobacter phage VroomVroom]],,WIC90196,88.0734,1.17257E-56 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Zaragoza, Evelin /note=Auto-annotation: GeneMark and Glimmer both call the start at 32601. This call is reasonable because, although it is not the longest ORF, it gives a gene with coding potential, a large Z-score (over 2), and the highest final score, which suggests the best sequence match. Furthermore, this start site is similar to start sites in other phages (more explanation below). /note=Coding Potential: The chosen start site in the forward direction includes all of the coding potential, and the ORF shows reasonable potential despite not being the LORF. The region from the start site at 32601 to the stop site contains all of the coding potential, so there is no need to move the start site upstream. There is no coding potential on the complementary (reverse) strand. /note=SD (Final) Score: The final score is -2.584, and the Z-score is 3.063. A Z-score over 2 is favorable, and the final score is the highest of all the possible start sites, so this start site is very likely. /note=Gap/overlap: There is little overlap with the upstream gene and no significant gap (anything over 50 bp) with the next gene. The overlap is 13 bp, and the gap with the next gene is 48 bp. Although the gap appears large, it is reasonable given the synteny with related phages: phages such as VroomVroom and Emotion have a gap in the corresponding region.
I also chose this start site despite it not being the LORF because the coding potential lies within this ORF and does not significantly overlap with that of the upstream gene. The high Z-score and higher final score support this conclusion. /note=Phamerator: The pham number is 965 as of 10/28/2023. It is conserved in phages such as VroomVroom and Emotion. Adolin_44, Adumb2043_41, and Amyev_43 are members of the AZ cluster and of the pham but are not part of the same subcluster; all are non-draft genes. Most genes have lengths in the 300s, like mine, which further supports my start site. /note=Starterator: Start site 37 in Starterator was manually annotated in 87/104 non-draft genes in this pham. Start 37 is not present for JasmineDragon. This evidence suggests that the site predicted by Glimmer and GeneMark is still the best option. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 32601. The gene does not need to be deleted, because the coding potential shows that a gene is needed there, and this start site includes all of the coding potential. /note=Function call: NKF. HHpred, CDD, PhagesDB BLAST, and NCBI BLAST provide no significant evidence about the function. CDD gave no hits at all, and HHpred gave no significant hits, so neither was informative. No PhagesDB/NCBI BLAST hits had a known function. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Bhattarai, Aryan /note=Secondary Annotator QC: I agree with the annotation and location calls made by the primary annotator. All of the evidence categories have been considered. I also agree with the function call made by the annotator. Note: the annotator needs to check off the evidence that supports their call on PECAAN (e.g., PhagesDB BLAST, HHpred, NCBI BLAST).
Also, for the HHpred hits, the primary annotator needs to give the job a name. CDS 32967 - 33116 /gene="48" /product="gp48" /function="hypothetical protein" /locus tag="JasmineDragon_48" /note=Original Glimmer call @bp 32967 has strength 0.41 /note=SSC: 32967-33116 CP: yes SCS: glimmer ST: NA BLAST-Start: GAP: 48 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.985, -3.9067510144203146, no F: hypothetical protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kibria, Kamille /note=Auto-annotation: Only Glimmer called the gene, with the start site at 32967. GeneMark did not call the gene. /note=Coding Potential: Both host-trained and self-trained GeneMark show reasonable coding potential (below or slightly above 0.5). The atypical coding potential in the host-trained GeneMark goes above 0.5. In both the self- and host-trained GeneMark, the proposed start site includes all of the coding potential. /note=SD (Final) Score: -3.907. This is not the best RBS final score in PECAAN; however, the start site with the best final score (-3.095) has a length of 51 bp and a gap of 147 bp, so it is not the best candidate. /note=Gap/overlap: The gap is 48 bp, which is larger than expected but reasonable. The pham map shows little space for a gene to be added, and PECAAN proposes no earlier alternative start site. Both the host- and self-trained GeneMark show no coding potential in the gap. No gene should be added. /note=Phamerator: Pham 100344. Date 10/30/23. There are only 2 members in this pham, both of which are draft genes. /note=Starterator: Start 1 (bp 32967) is the auto-annotated start site. This evidence agrees with the site predicted by Glimmer. There are no manual annotations of this start. /note=Location call: Based on the evidence, this is a real gene, and the most likely start site is 32967. /note=Function call: NKF. All hits generated in PhagesDB BLASTp were from draft genomes.
There were no significant hits in NCBI BLASTp. CDD generated no relevant hits. Hits generated in HHpred were not significant, as they did not meet the thresholds for probability, % coverage, or e-value. We cannot assign a function to this gene. /note=Transmembrane domains: No TMDs were detected by TMHMM or TOPCONS. Not a membrane protein. /note=Secondary Annotator Name: Hosford, Ryan /note=Secondary Annotator QC: The Gene 49 start at 32967 raises some red flags. The host-trained GeneMark shows very little coding potential, and there is no synteny with VroomVroom or Emotion, the two final phages in this subcluster. Starterator has only one other gene with this start position, and it is also a draft. The gap upstream of the start site is larger than expected at 48 bp, and the gene length is on the shorter end. Although the start gives the LORF and was called by Glimmer, with a Z-score of 2.985 and a final score of -3.907, I think there may be a question about whether the gene is real. The annotator also needs to select drop-down options for Starterator and coding capacity. /note=CDD had no hits whatsoever. HHpred displayed no significant hits: the best E-value was 37, which is well above the acceptable range, and the highest probability (50%) and coverage (67%) were not favorable. There are no predicted TMDs either, so all signs point to NKF, just as the primary annotator called.
CDS 33177 - 33989 /gene="49" /product="gp49" /function="DNA binding protein" /locus tag="JasmineDragon_49" /note=Original Glimmer call @bp 33177 has strength 19.69; Genemark calls start at 33132 /note=SSC: 33177-33989 CP: yes SCS: both-gl ST: SS BLAST-Start: [DNA binding protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 3.09286E-167 GAP: 60 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.964, -2.996490344739583, yes F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Arthrobacter phage VroomVroom]],,WIC90197,94.4444,3.09286E-167 SIF-HHPRED: RNA polymerase sigma factor RpoS; Transcription-activator, DNA/RNA, SigmaS, beta`, TRANSCRIPTION, Transferase-DNA complex; 3.26A {Escherichia coli},,,6OMF_F,95.9259,99.9 SIF-Syn: DNA binding protein; upstream gene is NKF and downstream gene is DNA binding protein, which is conserved in Emotion. /note=Primary Annotator Name: TRAN, MICHELLE /note=Auto-annotation: Glimmer designated the start site as 33177, while GeneMark designated it as 33132. The Gene Candidates section appears to have selected Glimmer's call (start codon: GTG). /note=Coding Potential: The most support for the coding potential in the ORF is on the forward strand, which supports this gene being a forward gene. This is consistent across both GeneMark Self and Host. The chosen start site also includes all of the coding potential. /note=SD (Final) Score: -2.996, which is the best final score offered for this gene on PECAAN. /note=Gap/overlap: The gene has a 60 bp gap upstream. Although the other approved phages in the database appear to have larger gaps, the alternative start sites that would accommodate this have final scores and Z-scores too low to be viable options. /note=Phamerator: As of November 15, 2023, this gene is part of pham 102451. It is conserved in all of the phages within JasmineDragon's subcluster, of which Emotion and VroomVroom are the non-draft phages available for comparison.
/note=Starterator: Start site 34, which corresponds to 33177 in JasmineDragon, was manually annotated in 3/107 of the non-draft genes in the pham. This evidence agrees only with the site predicted by Glimmer. /note=Location call: Based on the above evidence, the gene is most likely real and has a start site of 33177. /note=Function call: DNA-binding protein. This protein has strong synteny with VroomVroom's homologous protein, which is known to have a DNA-binding function. In both programs, this hit has a score of >400 and an e-value of <1e-165. The second strongest hit, Emotion's homologous protein, also has a DNA-binding function; both programs give this protein a score of >200 and an e-value of <1e-65. HHpred and CDD also support this function: HHpred has a hit with a low e-value (<1e-23), while CDD has a hit with a higher but still relatively low e-value (<1e-6). /note=Transmembrane domains: This is not a membrane protein because DeepTMHMM does not predict any TMDs. /note=Secondary Annotator Name: CHAMORRO, MARCO /note=Secondary Annotator QC: I have QCed the location, and I agree with the primary annotator's assessment. The Glimmer start site has a better Z-score and final score than the GeneMark start site. Additionally, the GeneMarkS graph shows that the Glimmer start site is within an area of high coding potential, whereas the GeneMark start site is not. There is no synteny with other phages of the same cluster, so Pham Maps does not support either start site. Starterator shows support for the 33177 start site. Thus, there is high confidence that the primary annotator's location call is correct. /note=For the Module 9 QC: I agree with the primary annotator's function call. HHpred and NCBI BLAST both appear to provide strong evidence that the gene is a DNA binding protein. The TMD call is correct, and the synteny notes are correct.
CDS 34166 - 35137 /gene="50" /product="gp50" /function="DNA binding protein" /locus tag="JasmineDragon_50" /note=Original Glimmer call @bp 34166 has strength 14.48; Genemark calls start at 34166 /note=SSC: 34166-35137 CP: yes SCS: both ST: SS BLAST-Start: [DNA binding protein [Arthrobacter phage VroomVroom]],,NCBI, q1:s11 99.6904% 0.0 GAP: 176 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.763, -5.092497832398658, no F: DNA binding protein SIF-BLAST: ,,[DNA binding protein [Arthrobacter phage VroomVroom]],,WIC90198,92.8144,0.0 SIF-HHPRED: RNA polymerase sigma factor RpoS; Transcription-activator, DNA/RNA, SigmaS, beta`, TRANSCRIPTION, Transferase-DNA complex; 3.26A {Escherichia coli},,,6OMF_F,82.9721,100.0 SIF-Syn: DNA binding protein; upstream gene is DNA binding protein, downstream gene has an unknown function, just like in phage Emotion and VroomVroom. /note=Primary Annotator Name: Kim, Abby /note=Auto-annotation: Glimmer and GeneMark both call the start at 34166. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -5.092. It is not the best final score on PECAAN (second best). /note=Gap/overlap: 176 bp gap. The gap is conserved in other phages such as VroomVroom and Emotion, and there is no coding potential in the gap that might indicate a new gene. /note=Phamerator: pham: 102451. Date 10/28/23. It is conserved and found in VroomVroom (AZ) and Emotion (AZ). /note=Starterator: Start site 31 in Starterator was manually annotated in 2 non-draft genes in this pham. Start 31 is 34166 in JasmineDragon, and this evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the evidence, this is a real gene, and the most likely start is 34166, which is called by Glimmer and GeneMark. /note=Function call: DNA binding protein.
BLAST results show significant alignment to DNA binding proteins from Arthrobacter phages (E-value 4e-60, 41.26% identity). This is corroborated by a high-probability HHpred match to RNA polymerase sigma factor RpoS, which plays a role in DNA-dependent RNA transcription (99.96% probability, E-value 1.6e-25). The CDD analysis supports the presence of a sigma-70 domain, which is typically involved in DNA binding and recognition during transcription initiation (E-values 1.09e-05 to 2.81e-06). Together, this evidence indicates that the ORF functions in DNA binding related to transcription regulation. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Ryan, Kaitlin Josephine /note=Secondary Annotator QC: Everything looks great! Just remember to update the drop-down menus for Starterator and GM coding capacity. // I agree with all of the evidence listed in these notes, great job! Just remember to fill out the synteny box (11/15/23). CDS 35204 - 35473 /gene="51" /product="gp51" /function="hypothetical protein" /locus tag="JasmineDragon_51" /note=Original Glimmer call @bp 35204 has strength 14.93; Genemark calls start at 35204 /note=SSC: 35204-35473 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_EMOTION_51 [Arthrobacter phage Emotion]],,NCBI, q1:s1 100.0% 1.17827E-37 GAP: 66 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.303, -2.033982896655645, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_EMOTION_51 [Arthrobacter phage Emotion]],,WGH21400,85.3933,1.17827E-37 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: To, Nathan /note=Auto-annotation start source: Glimmer and GeneMark both call start site 35204, with start codon ATG. /note=Coding Potential: Both GeneMark Self and Host show that coding potential in this ORF is on the forward strand only, indicating that this is a forward gene.
/note=SD (Final) Score: -2.034. This is the best RBS final score available. /note=Gap/overlap: 66 bp. This is reasonable because this start gives the LORF, with a gene length of 270 bp, which is acceptable. /note=Phamerator: This gene is in pham 80399 as of 10/29/23. This pham includes other members of cluster AZ, such as Emotion, and is found only in phages of the AZ cluster. None of the genomes have a function call for this gene. /note=Starterator: The called start site is conserved among all other phages in the pham. The called start is start 2 @35204. All 3/3 genomes call site #2. However, there is only one final genome and one other draft genome for comparison. /note=Location call: The gathered evidence suggests that this is a real gene with start site @35204. This gene is conserved, has good coding potential, and does not have large gaps before or after it. Start site 35204 seems most likely due to its conservation among other genomes in the same cluster. /note=Function call: NKF, due to mostly unreliable results. Neither BLAST nor CDD returned a function call. HHpred found a PDB hit to a mechanosensitive channel protein with a probability of 93.6, but it had only 33.7079% coverage and an e-value of 0.32. This could be a possible candidate, but without corroboration from other sources, there is currently not enough evidence to call it. /note=Transmembrane domains: 0; it cannot be determined whether this is consistent with the function, because the function is NKF. /note=Secondary Annotator Name: Hon, Darren /note=Secondary Annotator QC: The coding potential was stated to be good, and the GeneMark maps support this claim. This potential is encompassed by the start site at 35204; based on comparison with phage Emotion and on the GeneMark maps, this start site encompasses all of the coding potential. The final score was stated to be -2.034 with a Z-score of 3.303, both of which are optimal. The large 66 bp gap was dismissed because this gene candidate is the LORF.
The only necessary addition is to indicate “All GM Coding Potential” as yes. CDS 35594 - 36457 /gene="52" /product="gp52" /function="hypothetical protein" /locus tag="JasmineDragon_52" /note=Original Glimmer call @bp 35594 has strength 17.73; Genemark calls start at 35594 /note=SSC: 35594-36457 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_49 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 2.55973E-155 GAP: 120 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.985, -2.6013996449736907, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_49 [Arthrobacter phage VroomVroom]],,WIC90199,87.4564,2.55973E-155 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Valente, Nina /note=Auto-annotation: Glimmer and GeneMark both list the start as 35594. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. In the host-trained GeneMark, coding potential covers a very large area of the direct strand, and similar patterns are observed in the self-trained GeneMark. There is no coding potential in the gaps near the gene. There are no gaps in coding potential within the gene, and all of the coding potential lies between the suggested start site and the stop site. /note=SD (Final) Score: -2.601. It is the best final score on PECAAN. /note=Gap/overlap: 120 bp gap. This is a relatively large gap, and it is not conserved in Emotion (very little synteny) or VroomVroom (more synteny than Emotion, but still not much). /note=Phamerator: Pham 117964. Date 11/1/23. It is conserved: it is found in VroomVroom but not in Emotion. /note=Starterator: The most annotated start site is 6, which is called by 7/13 non-draft genes in the pham. JasmineDragon also calls this start site. The suggested start is the same, 35594. /note=Location call: Based on the evidence, this is most likely a real gene (which lacks synteny) with a start site at 35594.
/note=Function call: Every result from PhagesDB BLASTp is listed as function unknown, including VroomVroom (e-value E-128, 78% identity) and CallinAllBarbz (e-value 8E-56, 46% identity). Every result from NCBI BLASTp is listed as a hypothetical protein, including VroomVroom (e-value 3E-155) and CallinAllBarbz (e-value 2E-49). There were no results from CDD. One result from HHpred (e-value 22) listed possible functions as coiled coil helix, zinc ribbon, or hypothetical protein. Based on this evidence, the function call is most likely NKF. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. /note=Secondary Annotator Name: Saud, Arnav /note=Secondary Annotator QC: I agree with this annotation. However, in the Coding Potential section, I believe the primary annotator needs to state the direction of the coding potential within the ORF. All of the other categories have been considered. The GM coding capacity menu has not been filled out yet. (11/2/23) /note=I agree with this annotation. However, for the function call, the annotator needs to check off the phages they used as evidence for PhagesDB BLAST and NCBI BLAST. For both PhagesDB BLAST and NCBI BLAST, they need to list the number of good hits out of the top 5 hits. They also need to indicate the coverage, identity, and e-value for the NCBI BLAST hits. The annotator also needs to indicate whether there were any good hits from HHpred or CDD, and to fill out their synteny box.
(11/16/23) CDS 36578 - 37183 /gene="53" /product="gp53" /function="SprT-like protease" /locus tag="JasmineDragon_53" /note=Original Glimmer call @bp 36578 has strength 14.22; Genemark calls start at 36578 /note=SSC: 36578-37183 CP: yes SCS: both ST: SS BLAST-Start: [SprT-like protease [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 5.8694E-132 GAP: 120 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.985, -2.6013996449736907, yes F: SprT-like protease SIF-BLAST: ,,[SprT-like protease [Arthrobacter phage VroomVroom]],,WIC90200,97.0149,5.8694E-132 SIF-HHPRED: SprT-like domain-containing protein Spartan; DPC repair protease, DNA BINDING PROTEIN; HET: FLC, MLZ, ADP; 1.5A {Homo sapiens},,,6MDW_A,93.0348,99.8 SIF-Syn: Sprt-like protease, upstream gene has same pham, downstream gene has same pham, as compared to phage VroomVroom /note=Primary Annotator Name: Laureano, Ryan /note=Auto-annotation start source: Glimmer and GeneMark both mark the start site at 36578. The start codon is listed as ATG. /note=Coding Potential: Coding potential covers >80% of the putative ORF from the start site to the stop site. The 36578 start site covers all of the coding potential. /note=SD (Final) Score: The SD score for the suggested start site is the best (least negative) at -2.601. /note=Gap/overlap: The gap from the previous gene to this start site is 120 bp. This is not the smallest gap, but the start site with a smaller gap has a much higher Z-score at -8558. /note=Phamerator: The gene is in Pham 1210 as of 11/1/23. JasmineDragon_Draft_55 shares a cluster with Adolin_47, and both are in Pham 1210. There is no called function for this gene, but non-draft genes are called as proteases. /note=Starterator: There is a conserved start site at number 46. This differs from the start site number for this gene, which is 44. 61/102 genes call the most conserved start, number 46.
/note=Location call: This gene seems to be real, with sufficient length (606 bp) and matching coding potential. The Z-score is above 2 at 2.985, and the final score is the highest among the start sites at -2.601, which indicates that start site 36578 is the most likely. /note=Function call: The function of this gene is likely SprT-like protease. The two best hits in PhagesDB and NCBI were VroomVroom50 and Reedo46, with high probability, high coverage, and low e-values. Both of these genes are SprT-like proteases. The HHpred, CDD, and PDB data show high probability (99%), high coverage (93 and 49%), and low e-values (e-17 and e-11). These well-supported hits point toward SprT-like proteases, which adds to the conclusion that this gene is a SprT-like protease. /note=Transmembrane domains: There are no TMRs. /note=Secondary Annotator Name: Zaragoza, Evelin /note=Secondary Annotator QC: /note=Auto-annotation: GeneMark and Glimmer both call the start at 36578, which does not produce the LORF but has the highest final score and a high Z-score. The start codon is ATG, which is common. This start site includes all of the coding potential. No other start site is chosen because the alternatives have gaps that are too large, give a gene that is too small, or have less favorable final scores and Z-scores. /note=Coding potential: Start site 36578 includes all of the coding potential. There is no significant coding potential on the complementary strand, so this is most likely a forward gene (which is also suggested by synteny with the related phages VroomVroom and Emotion). /note=SD (Final) Score: The final score is the highest at -2.601. The Z-score is also above 2. Compared with the other final scores, this start site seems to be one of the most reasonable. /note=Gap/overlap: There is a 120 bp gap with the upstream gene, which may seem large, but VroomVroom and Emotion have similar gaps.
Synteny is shared with Emotion and VroomVroom, so this gap, while it does not make for the most compact genome, may be reasonable. /note=Phamerator: The pham number is 1210 as of 10/28/2023. It is conserved in phages such as Emotion and VroomVroom, both non-draft phages. Similar gene members of the cluster, but not the subcluster, include Adolin_47, Adumb2043_44, and Amyev_48. /note=Starterator: The start number called most often in the published annotations is 46, and it was called in 43 of the 82 non-draft genes in the pham. 46 corresponds to 32921. This start is not called in JasmineDragon. Instead, start 44, which corresponds to our start of interest, 36578, has 10 MAs (manual annotations). This evidence suggests that the site predicted by Glimmer and GeneMark is the best option. /note=Location call: Based on the above evidence, this is a real gene and the most reasonable start site is 36578. The gene does not need to be deleted because the coding potential indicates that a gene is present there, and this start site includes all of the coding potential. Other supporting factors include strong RBS and Z-scores. /note=Function call: Agree with the primary annotator. The function for this gene is likely a SprT-like protease due to significant hits across all databases. PhagesDB and NCBI pointed towards VroomVroom50 and Reedo46 with high probability, high coverage, and low e-values. HHpred, CDD, and PDB data also show high probability and coverage with low e-values. /note=Transmembrane domains: I have confirmed that DeepTMHMM predicts no TMDs, so it is not a membrane protein. /note=Note: Primary annotator should try to be more descriptive in the synteny box (refer to the lab manual for an example). 
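Starterator support is reported as the number of non-draft pham members that manually annotated a given start number. A minimal Python sketch of that tally, using the counts quoted in the gp53 secondary QC (start 46 in 43 of 82 non-draft genes; start 44 with 10 manual annotations). The helper name `start_support` is hypothetical, and the primary note cites a different tally (61/102), so the result depends on the database snapshot used.

```python
def start_support(manual_calls: dict[int, int], non_draft_total: int) -> dict[int, float]:
    """Fraction of non-draft pham members that manually called each start number.

    manual_calls maps a Starterator start number to its count of manual annotations.
    """
    if non_draft_total <= 0:
        raise ValueError("non_draft_total must be positive")
    return {start: count / non_draft_total for start, count in manual_calls.items()}


# gp53 counts quoted in the secondary QC note
gp53_calls = {46: 43, 44: 10}
support = start_support(gp53_calls, non_draft_total=82)
consensus = max(support, key=support.get)  # most frequently called start number

for start, fraction in sorted(support.items(), key=lambda kv: kv[1], reverse=True):
    print(f"start {start}: {gp53_calls[start]}/82 non-draft genes ({fraction:.1%})")
print("consensus start:", consensus)
```

With these counts the consensus start is 46, not the start 44 used for JasmineDragon, consistent with the Starterator notes above.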
CDS 37230 - 37544 /gene="54" /product="gp54" /function="hypothetical protein" /locus tag="JasmineDragon_54" /note=Original Glimmer call @bp 37314 has strength 15.96; Genemark calls start at 37314 /note=SSC: 37230-37544 CP: yes SCS: both-cs ST: NI BLAST-Start: [hypothetical protein SEA_VROOMVROOM_51 [Arthrobacter phage VroomVroom]],,NCBI, q29:s1 72.1154% 3.0618E-35 GAP: 46 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.915, -4.845388397338397, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_51 [Arthrobacter phage VroomVroom]],,WIC90201,86.8421,3.0618E-35 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Ryan, Kaitlin /note=Auto-annotation: Glimmer and GeneMark called the same start site of 37314. However, based on the GeneMark output, gap size, and LORF, I believe there is more evidence that the true start site is 37230. /note=Coding Potential: High coding potential in the forward direction, as indicated by the graphical output from GeneMark S and GeneMark Host. /note=SD (Final) Score: -4.708. This is the least negative final score. /note=Gap/overlap: 46 bp gap. There is no coding potential in this gap that would indicate the presence of another gene. /note=Phamerator: pham: 9258. Date: 10/28/23. It is conserved; found in VroomVroom (cluster AZ). /note=Starterator: Start site 5 was auto-annotated but not manually annotated in any non-draft (final) genes. However, it is auto-annotated in the draft genes of JasmineDragon and MiniMommy and agrees with the start predicted by both Glimmer and GeneMark. /note=Location call: Based on the above evidence, this is a real gene and its start site is called at 37230. /note=Function call: No known function (NKF). PhagesDB BLAST returned matching hits with unknown function (with no E-value < 10^-29). NCBI BLAST also called all hits hypothetical proteins, with no E-value less than 10^-35. HHpred also provided inconclusive evidence, with no E-value < 4.8. CDD hits were irrelevant. 
/note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, this is not a membrane protein. /note=Secondary Annotator Name: Carnes, Julianne /note=Secondary Annotator QC: This may need more attention. I think this is a real gene; however, the gap is quite large. There is coding potential that extends past the initial start site call. The start site is also only called in draft genes, and no non-draft genes have yet called this particular start in Starterator. I do see that this gap is conserved among other phages in the AZ cluster, but I would suggest looking into start site 37230 based on feedback from the instructional team. In terms of function call, I agree with this annotation. There isn't strong evidence pointing towards one gene function. PhagesDB BLASTp had no hits with a known function and an e-value below 10^-6, CDD had no specific hits, NCBI BLASTp had a few hits but with no known functions, and HHpred had no significant hits (e-value above threshold). Overall, I agree that this protein has no known function. Corrections: For NCBI BLASTp hits listed on PECAAN, accession WIC90201 can be selected as evidence even though it is a hypothetical protein.
CDS 37686 - 37970 /gene="55" /product="gp55" /function="hypothetical protein" /locus tag="JasmineDragon_55" /note=Original Glimmer call @bp 37686 has strength 20.2; Genemark calls start at 37626 /note=SSC: 37686-37970 CP: no SCS: both-gl ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_52 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 97.8723% 7.74491E-48 GAP: 141 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.063, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_52 [Arthrobacter phage VroomVroom]],,WIC90202,81.5534,7.74491E-48 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Sanchez, Kayla /note=Auto-annotation: Glimmer calls the start at 37686, while GeneMark calls it at 37626. We will use 37686 as the start site. 
/note=Coding Potential: Both GeneMark Self and Host show coding potential in the ORF. This coding potential is not great because only the stop codon is outside of the coding potential. /note=SD (Final) Score: -2.443. It is the best final score on PECAAN because it is the least negative value. /note=Gap/overlap: Gap of 141 bp. This gap is questionable because it is fairly large; however, it is conserved in other phages (VroomVroom). Acceptable gene length of 285 bp. /note=Phamerator: pham: 4404. Date 11/01/2023. It is conserved; found in VroomVroom (AZ4). /note=Starterator: Start site 11 in Starterator was manually annotated in 16/16 non-draft genes in this pham. Start 11 is 37686 in JasmineDragon. This evidence agrees with the site predicted by Glimmer, but not GeneMark. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 37686. /note=Function call: Function unknown. The top two PhagesDB BLASTp hits have unknown function (E-value <10^-16), and the top two NCBI BLAST hits are hypothetical proteins (47%+ identity and E-value <10^-18). The hits returned by HHpred all had very large e-values. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs; therefore, it is not a membrane protein. It predicts that the protein is located inside the cell. /note= /note=Secondary Annotator Name: Daniel, Mila /note=Secondary Annotator QC: I agree with this annotation. All of these evidence categories have been considered, and the primary annotator made sure to fill out the two drop-down menus. 
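Gene lengths such as the 285 bp quoted for gp55 come directly from the CDS coordinates, which are 1-based and inclusive. A minimal Python sketch, assuming a forward-strand gene whose stop coordinate is the last base of the stop codon; `orf_summary` is a hypothetical helper, not a PECAAN function.

```python
def orf_summary(start: int, stop: int) -> dict[str, int]:
    """Length and codon counts for a forward-strand CDS with 1-based, inclusive coordinates."""
    length_bp = stop - start + 1  # inclusive on both ends
    if length_bp % 3 != 0:
        raise ValueError(f"{start}-{stop} is not a whole number of codons")
    codons = length_bp // 3
    return {
        "length_bp": length_bp,
        "codons": codons,          # includes the stop codon
        "protein_aa": codons - 1,  # the stop codon is not translated
    }


print(orf_summary(37686, 37970))  # gp55 → {'length_bp': 285, 'codons': 95, 'protein_aa': 94}
print(orf_summary(36578, 37183))  # gp53, the 606 bp gene discussed above
```

A length that is not a multiple of 3 would point to a coordinate typo rather than a real ORF, which is why the helper raises instead of rounding.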
CDS 38032 - 38379 /gene="56" /product="gp56" /function="membrane protein" /locus tag="JasmineDragon_56" /note=Original Glimmer call @bp 38032 has strength 10.61; Genemark calls start at 38032 /note=SSC: 38032-38379 CP: yes SCS: both ST: NI BLAST-Start: [hypothetical protein [Frankiales bacterium]],,NCBI, q5:s33 95.6522% 1.24554E-40 GAP: 61 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.164, -4.354778061539587, no F: membrane protein SIF-BLAST: ,,[hypothetical protein [Frankiales bacterium]],,MCU1623523,58.0,1.24554E-40 SIF-HHPRED: SIF-Syn: There is no evidence of synteny with either Emotion or VroomVroom. According to the pham maps, no lines connect this gene to any gene in those phages. This suggests that the gene was inserted into JasmineDragon. /note=PECAAN Notes /note=Primary Annotator Name: Hon, Darren /note=Auto-annotation: Gene 56 (stop@38379 F) /note=Coding Potential: Good coding potential, as a single frame shows a protein-coding signal. The gene itself is more than 120 bp, but there is a gap of 61 bp. There are no gene orientation switches. /note=SD (Final) Score: -4.355; compared with the second candidate's final score, this final score reflects a better sequence match. /note=Gap/overlap: 61 bp; though the gap is large, it is the better gap of the two candidate start sites. /note=Phamerator: As of 11/1/23, there is only one available start site (start 1), and the pham is present only in MiniMommy_58, which is in the same cluster. The base pair coordinate is 38,032. According to GeneMark, the coding potential from this start site is good. Therefore, this start site is sufficient, as the second potential start site does not include all of the coding potential. /note=Starterator: As of 11/1/23, the auto-annotated start number is 1, with a starting position of 38,032. There are no final (non-draft) pham members to compare this start site to. 
/note=Location call: Based on the evidence presented, this is a real gene with a stop site at 38,379, and the most likely start site is 38032, even though it does not show up on the GeneMark maps, because both Phamerator and Starterator support this LORF. /note=Function call: The PhagesDB BLAST hits did not reveal any evidence from non-draft phages, as all of them have e-values above 1. NCBI BLAST, however, does show some hits to a DUF898 family protein. This is not an approved protein function, and external resources indicate that this family consists of bacterial proteins of unknown function. CDD results also indicate that it is part of the DUF898 superfamily, and HHpred supports this, though no crystal structure of these proteins is available on the website. /note=Transmembrane domains: There are 2 predicted TMRs, both TMhelix predictions with an amino acid length of 20. Though this protein has no known function, the evidence that it belongs to the DUF898 family is consistent with it being a membrane protein. /note=Secondary Annotator Name: Kang, Alix /note=Secondary Annotator QC: If we look at the sample PECAAN notes, the format we can follow for “Auto-annotation” is: Both Glimmer and GeneMark call the gene and they agree on the start site at 38032 bp. SD (Final) Score is -4.355. It is not the best final score on PECAAN. The gap with the upstream gene is a little large at 61 bp. However, this gap is also seen in other draft phages, such as MiniMommy. For “Phamerator”: pham: 100315. Date 11/02/2023. It is conserved; found in MiniMommy (AZ). For “Starterator”: There are only 2 draft members and no non-draft members of this Pham. 2/2 draft members call start site 1, which corresponds to a start site of 38032 bp for JasmineDragon. For “Location call”: Based on the above evidence, this is a real gene and the most likely start site is 38032. 
/note= /note=For NCBI BLAST, the first annotator checked evidence for “MCU1623523”, which is a hypothetical bacterial protein, and “WP_283559315”, which is another bacterial protein. I do not think we should check these for evidence. DeepTMHMM predicts 2 TMDs. I agree with the first annotator that this gene can be assumed to have a real TMD and is, therefore, a “membrane protein.” However, the first annotator should check the evidence box because this protein has 2 TMDs.
CDS 38704 - 39057 /gene="57" /product="gp57" /function="hypothetical protein" /locus tag="JasmineDragon_57" /note=Original Glimmer call @bp 38704 has strength 13.2; Genemark calls start at 38704 /note=SSC: 38704-39057 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_53 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 4.28316E-38 GAP: 324 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.379, -3.898968357315439, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_53 [Arthrobacter phage VroomVroom]],,WIC90203,75.8333,4.28316E-38 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Claire, Monjov /note=Auto-annotation: Both Glimmer and GeneMark call the gene and claim a start site of 38704. /note=Coding Potential: The Host-Trained GeneMark for this gene shows strong coding potential, with more than 90% of the gene covered. The start site is just outside the region of potential, which is a good sign, and the start includes almost all of the coding potential. /note=SD (Final) Score: -3.899. This is the least negative final score, and its Z-score of 2.379 is the highest Z-score value. The high final score and high Z-score indicate a strong sequence match. These were the best score values on PECAAN. /note=Gap/overlap: There is a 324 bp gap, which is unusually large. This value is not a very useful indicator because we don't have much to compare it to. 
/note=Phamerator: 115693 (10/31/23). This gene is conserved in VroomVroom and Emotion. There is no indication of function, and there are only 4 other phages in the pham, so there isn't much help there either. /note=Starterator: There are only 2 non-draft phages in Pham 115693, and they both call the same start site. They call start site 5, which corresponds to base pair start 38704 for JasmineDragon. /note=Location call: Based on the relatively limited information I have, I still agree that the start site of 38704 is correct. This gene seems to be real, and even though there are only 4 phages in the pham, it is conserved among all 4 (even though 2 are draft genomes). /note=Function call: VroomVroom and Emotion (both non-draft) were the two best hits on both websites, with e-values much smaller than 10^-6, but both have unknown functions. Based on those results, no function is matched for this gene, and there isn't much evidence pointing towards a function either. I believe this gene is a real gene, but I cannot conclude that it has a known function yet. CDD had no hits, and HHpred had one, but it was very light blue and not very convincing: 54% probability and 37% coverage. This is a good indication that we don't know the function of this gene. /note=Transmembrane domains: No TMHs called by DeepTMHMM. Though we don't know the function of the protein, and I am assuming the protein is real, we can say with some certainty that it is not a transmembrane protein, as the results don't give any conclusive evidence of TMHs. /note= /note=Secondary Annotator Name: Kalliomaa, Kira /note=Secondary Annotator QC: /note=I have QC’ed this annotation and agree with it. However: /note= -For coding potential, I would mention whether there is coding potential in GeneMark Self and GeneMark Host. If it is missing from one, note it. Also, is the coding potential on the forward or reverse strand? 
/note= -SD: Good, but I would mention that it is the best on PECAAN out of all the options. /note= -GAP: I would double-check the gap; when I looked on 11/17 at 8:56 am, there was not a 3 bp overlap. Is the gap conserved in other genes? /note= - Phamerator: Do not use draft genomes at all! Only mention final/non-draft genomes. /note= - Location call: I would change the wording here. It sounds like you are not sure, when we want to be absolutely sure. I would check the manual for another way to phrase your location call statement. /note= - Function call: The evidence is there, but I would start with the function call and end with it, rather than only ending with it. We want to see right away what the function is. /note= - TMD: We already concluded that the gene is most likely real; saying that we assume it's real makes it sound like we are unsure. We want to be sure, so I would change the wording here and consult the annotation manual.
CDS 39054 - 39932 /gene="58" /product="gp58" /function="hypothetical protein" /locus tag="JasmineDragon_58" /note=Original Glimmer call @bp 39054 has strength 18.48; Genemark calls start at 39054 /note=SSC: 39054-39932 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_54 [Arthrobacter phage VroomVroom]],,NCBI, q3:s1 91.7808% 5.66819E-82 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 2.985, -2.7423981586358774, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_54 [Arthrobacter phage VroomVroom]],,WIC90204,70.3297,5.66819E-82 SIF-HHPRED: SIF-Syn: /note=PECAAN Notes /note=Primary Annotator Name: Bhattarai, Aryan /note=Auto-annotation: Glimmer and GeneMark both call the start at 39054. /note=Coding Potential: There is high coding potential in the forward direction, as indicated by the graphical line of coding potential in both the Host-Trained GeneMark and Self-Trained GeneMark. /note=SD (Final) Score: -2.742. It is the best final score on PECAAN. /note=Gap/overlap: -4. 
Gene is likely part of an operon. /note=Location call: Based on the evidence and reasoning above, this is a real gene, with the start site most probably at 39054. /note=Phamerator: As of 11/01/2023, the pham number is 48281. The gene is conserved in phages VroomVroom and Emotion, which are both in the same subcluster (AZ4) as JasmineDragon. The gene's function is neither stated explicitly nor indicated by synteny among non-draft genes of the same subcluster. /note=Starterator: Start site 1 in Starterator was manually annotated in 2/2 non-draft genes in this pham. Start 1 in JasmineDragon is 39054. This evidence agrees with the site predicted by both Glimmer and GeneMark. /note=Location call: Based on the evidence above, this is a real gene and the most likely start site is 39054. /note=Function call: This gene has strong synteny with non-draft phages VroomVroom and Emotion, which both have no known function for their homologous gene. This gene likely has no known function. The only strong hits on PhagesDB BLASTp and NCBI BLASTp correspond to genes with no known function. Additionally, CDD and HHpred are both uninformative: the CDD page for this gene is blank, and the HHpred hits only yield results with e-values >1. /note=Transmembrane domains: 0. This is not a membrane protein because DeepTMHMM does not predict any TMDs. Additionally, because this gene has no known function (NKF), and synteny cannot be used to infer a function from the non-draft phages within its subcluster (VroomVroom and Emotion), it is currently not possible to determine this protein’s function beyond it being an internal protein. /note=Secondary Annotator Name: Woodward, Lauren /note=Secondary Annotator QC: I agree with the selected start site, based on the evidence presented above. The PhagesDB BLAST section on PECAAN has a draft gene selected as evidence. I agree with the function call. 
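The gap and overlap values in these notes follow from adjacent forward-strand coordinates: gap = start of the downstream gene minus stop of the upstream gene minus 1, where a negative result is an overlap. A minimal Python sketch, assuming both genes are on the forward strand with 1-based, inclusive coordinates; `gap_bp` is a hypothetical helper name, and the coordinate pairs are copied from the CDS lines in this section.

```python
def gap_bp(upstream_stop: int, downstream_start: int) -> int:
    """Bases between two adjacent forward-strand genes (negative means overlap)."""
    return downstream_start - upstream_stop - 1


# (upstream stop, downstream start) pairs taken from the CDS lines in these notes
ADJACENT = {
    "gp53/gp54": (37183, 37230),
    "gp54/gp55": (37544, 37686),
    "gp55/gp56": (37970, 38032),
    "gp56/gp57": (38379, 38704),
    "gp57/gp58": (39057, 39054),
}

for pair, (stop, start) in ADJACENT.items():
    g = gap_bp(stop, start)
    kind = "gap" if g >= 0 else "overlap"
    print(f"{pair}: {abs(g)} bp {kind} (GAP field: {g})")
```

These reproduce the GAP fields in the auto-annotation lines (46, 141, 61, 324, and -4 bp).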
CDS 39929 - 40144 /gene="59" /product="gp59" /function="hypothetical protein" /locus tag="JasmineDragon_59" /note=Original Glimmer call @bp 39929 has strength 17.26; Genemark calls start at 39929 /note=SSC: 39929-40144 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein [Arthrobacter mobilis] ],,NCBI, q3:s4 94.3662% 8.17107E-4 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.299, -6.565547700744353, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein [Arthrobacter mobilis] ],,WP_168488253,53.7313,8.17107E-4 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Kang, Alix /note=Auto-annotation: Glimmer and GeneMark both call the start at 39929. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. Coding potential is found in both GeneMark Self and Host. /note=SD (Final) Score: -6.566 (Z-score 1.299). It is not the best final score on PECAAN; the alternative start at 39647 scores -3.627. /note=Gap/overlap: Overlap: 4 bp. Gene is likely part of an operon. /note=Phamerator: pham: 100410. Date 11/01/2023. It is conserved; found in MiniMommy (AZ). /note=Starterator: There are only 2 draft members and no non-draft members of this Pham. 2/2 draft members call start site 2, which corresponds to a start site of 39929 bp for JasmineDragon. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 39929. /note=Function call: No known function. PhagesDB BLAST, NCBI BLAST, CDD, and HHpred had no relevant hits. /note=Transmembrane domains: No known function. Phagesdb BLAST, NCBI BLAST, CDD, and HHpred had no relevant hits. /note=Secondary Annotator Name: Mahadev, Anirudh /note=Secondary Annotator QC: I have QC'd this gene and agree with the Primary Annotator's call. The Host-trained GeneMark shows good coding potential in the forward direction, and the gene has strong synteny with MiniMommy, which is also in the same Pham. 
I think the best justification for the start site is the coding potential graph, since it shows a spike in potential after the 39929 start site and no potential between 39647 and 39929. I agree with the call that it may be part of an operon given the 4 bp overlap; that justifies choosing the start site at 39929 over the start site at 39647, even though 39647 has a higher Z-score of 2.551 and a less negative final score of -3.627. The Starterator evidence for this gene seemed vague, with only two members of the Pham, both of which are drafts, but this start site was conserved in both genes, so it should be the most likely/best start site. The only thing I would edit is to make sure you fill in the Starterator drop-down menu at the top with "suggested start". The following comments are for the Module 9 QC. I agree with the primary annotator’s call that this gene has NKF. Please make sure to select an option under the Starterator drop-down menu. Everything else looks good to me.
CDS 40141 - 40287 /gene="60" /product="gp60" /function="hypothetical protein" /locus tag="JasmineDragon_60" /note=Original Glimmer call @bp 40141 has strength 19.84; Genemark calls start at 40141 /note=SSC: 40141-40287 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_56 [Arthrobacter phage VroomVroom]],,NCBI, q1:s16 83.3333% 7.59048E-19 GAP: -4 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.063, -2.583959800616441, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_56 [Arthrobacter phage VroomVroom]],,WIC90206,63.4921,7.59048E-19 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Saud, Arnav /note=Auto-annotation: Glimmer and GeneMark both call the start at 40141. /note=Coding Potential: Coding potential in this ORF is on the forward strand only, indicating that this is a forward gene. High coding potential is seen in both GeneMark Host and GeneMark Self. /note=SD (Final) Score: -2.584. 
This is the best final score called by PECAAN. /note=Gap/overlap: There is an overlap of 4 bp between the start site of the gene of interest (40141) and the previous gene's stop (40144). This small overlap may suggest an operon. Additionally, there is an overlap of 8 bp between the stop site of the gene of interest (40,287) and the start site of the next gene (40,280). /note=The gap is small, and in other phages within the AZ cluster (Adolin, DrManhattan), this arrangement of genes with small overlaps is conserved. Also, the coding potentials in both GeneMark Host and Self drop sharply at predicted stop sites and rise sharply at predicted start sites, indicating that the gene of interest is actually its own gene. /note=Phamerator: pham: 121348. Date: 11/1/2023. The gene is conserved; found in Adolin (AZ1) and DrManhattan (AZ1). /note=Starterator: Date: 11/1/2023. The most annotated start site, 6, was not called for this gene. Instead, start site 14 was manually annotated in 4/147 non-draft genes in this pham, indicating that this may be the best start site for this gene. Start 14 is 40141 in JasmineDragon. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Based on the above evidence, this gene is real and the most likely start site is 40141. /note=Function call: Unknown function. The top 5 PhagesDB BLASTp hits have unknown function listed (E-value < 3e-8), and the top 5 NCBI BLAST hits have also been assigned unknown function (64%+ coverage, 56%+ identity, and e-value < 4e-7). HHpred and CDD had no hits for this gene. /note=Transmembrane domains: DeepTMHMM does not predict any transmembrane domains for this gene; therefore, it is not a membrane protein. 
/note= /note=Secondary Annotator Name: Salinas Pinon, Juan Carlos /note=Secondary Annotator QC: /note= /note=Auto-annotation: Both Glimmer and GeneMark agree and call this start site at 40141. /note=Coding Potential: There is coding potential in the forward direction, and the called start site is covered by this coding potential. There is no coding potential in the complementary direction, suggesting that this is a forward gene. The coding potential suggests that this is the correct start site. /note=SD (Final) Score: The final score is -2.584. This is the highest score on PECAAN. /note=Gap/overlap: -4. This is an overlap that may suggest an operon. The primary annotator states a gap of 3 bp, which needs to be updated. There are no significant gaps or overlaps that would suggest that this gene needs to be deleted or adjusted. /note=Phamerator: pham: 121348 as of 11/02/2023. This pham is mostly conserved in the E cluster and is not prevalent in the AZ cluster (JasmineDragon). This leads me to believe that conservation is not strong evidence for this start site; however, it does not provide any evidence against that start site either. /note=Starterator: 11/02/2023. The most annotated start site (6) was not called for this gene; however, start site 14 was manually annotated 4 times, suggesting that 40141 is the best start call. /note=Location call: I agree with the primary annotator and would also call this gene at 40141 given the evidence above. I would revise the information provided regarding the gap. /note=Function Call: NKF. I agree with the primary annotator. All significant evidence, such as hits with e-values less than 1e-3, suggests that the function is unknown. No consistent function was found across the various sources, and most call it an unknown or hypothetical function. /note=Transmembrane domains: 0. The absence of transmembrane domains suggests that this is not a membrane protein. 
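The SD (Final) score comparisons in these notes rank candidate starts by the least negative final score and treat a Z-score above 2 as favorable. A minimal Python sketch of that ranking, using the two gp59 candidates from the notes (39929: final -6.566, Z 1.299, from the auto-annotation line; 39647: final -3.627, Z 2.551, from the secondary QC). The `StartCandidate` type and the 2.0 cutoff constant are assumptions for illustration, not PECAAN code.

```python
from dataclasses import dataclass

Z_SCORE_THRESHOLD = 2.0  # assumed cutoff, matching the "Z-score above 2" wording in the notes


@dataclass(frozen=True)
class StartCandidate:
    position: int       # 1-based start coordinate
    final_score: float  # SD (Final) score; less negative is better
    z_score: float


def rank_starts(candidates: list[StartCandidate]) -> list[StartCandidate]:
    """Order candidate starts best-first by final score (least negative first)."""
    return sorted(candidates, key=lambda c: c.final_score, reverse=True)


gp59 = [
    StartCandidate(39929, -6.566, 1.299),
    StartCandidate(39647, -3.627, 2.551),
]

for c in rank_starts(gp59):
    flag = "Z > 2" if c.z_score > Z_SCORE_THRESHOLD else "Z <= 2"
    print(f"{c.position}: final {c.final_score}, {flag}")
```

The score alone favors 39647; as the gp59 QC explains, 39929 was kept because of the 4 bp overlap and the coding-potential profile, so the ranking is one line of evidence rather than the decision.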
CDS 40280 - 40492 /gene="61" /product="gp61" /function="hypothetical protein" /locus tag="JasmineDragon_61" /note=Original Glimmer call @bp 40280 has strength 12.17; Genemark calls start at 40280 /note=SSC: 40280-40492 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_58 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 98.5714% 1.50396E-39 GAP: -8 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 1.529, -6.097136544885063, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_58 [Arthrobacter phage VroomVroom]],,WIC90208,97.1429,1.50396E-39 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Woodward, Lauren /note=Auto-annotation: Glimmer and GeneMark; both marked the start site as 40280. /note=Coding Potential: Coding potential in the forward direction in the first ORF and in the reverse direction in the second ORF. The forward direction had higher coding potential that aligned more closely with the gene start/stop sites. Potential appears in both Host and Self GeneMark outputs. /note=SD (Final) Score: -6.097; it is the only score listed on PECAAN. /note=Gap/overlap: Overlap of 8 bp; the same overlap is seen in the VroomVroom gene with which this gene has synteny. /note=Phamerator: Pham: 100283. Date: 10/30/23; conserved in Cluster AZ in member VroomVroom. /note=Starterator: Start site 2 in Starterator was manually annotated in 1/1 non-draft genes in this pham. Start 2 is at position 40280 in JasmineDragon. Glimmer and GeneMark also predicted this as the start site. /note=Location call: Based on the above evidence, this is a real gene and the most likely start site is 40280. /note=Function call: NKF. PhagesDB BLASTp and NCBI BLASTp each had one significant non-draft phage hit, but that gene, VroomVroom_58, did not have a function listed. There were no significant CDD or HHpred hits. /note=Transmembrane domains: This gene has no TMDs. 
/note=Secondary Annotator Name: Claire, Monjov /note=Secondary Annotator QC: Both Glimmer and GeneMark call the start site 40280. When looking at the coding potential for the Host-Trained GeneMark and GeneMarkS, there is high coding potential in the forward direction, which we’d expect for a forward gene, but there is also coding potential in the reverse direction, which isn’t typical. There is more coding potential in the forward direction than in the reverse, so it is a better indicator that the gene is a forward gene, and the potential indicates it is real. There was only one final score in PECAAN, -6.097, with a Z-score of 1.529. The final score is quite negative, which isn’t a good sign, and the Z-score is relatively high but not as high as we’d like to be certain of the results. There is an overlap of 8 base pairs with the upstream gene, which is an acceptable value and not out of the ordinary. VroomVroom is the only non-draft genome in the same pham as JasmineDragon, and that overlap is conserved there. JasmineDragon is part of Pham 100283 with 3 total members, 1 of which is a non-draft genome. All 3 members are AZ4 subcluster phages, and all have a length of 213 base pairs (including the one non-draft genome, VroomVroom). The most annotated start site was 2, which is the same as the start site for JasmineDragon. Start site 2 corresponds to base pair start 40280. The one non-draft genome was manually annotated and agreed with the start site. Looking at PhagesDB, there isn’t a function for any of the other phages, which leads me to believe that there is no known function. VroomVroom was the only non-draft phage that we could use for analysis, and even VroomVroom didn’t have a known function for the gene. VroomVroom is very similar to JasmineDragon, so the fact that the most similar phage also has an unknown function for this gene is strong evidence. 
HHpred also was not very helpful, as the lowest e-value was 4.6, which is much larger than what we would want to use as evidence. Everything leads me to believe that there isn’t a known function for this gene yet, but hopefully with more sequencing and time, there will be. For transmembrane proteins, we can also say this gene does not code for a transmembrane protein, as there were no predicted TMHs. Based on everything I’ve seen and looked up, I think that this gene is definitely real and has a start site of 2 that corresponds to base pair start 40280. The coding potential is high even though there is some in the reverse direction, both GeneMark and Glimmer call the same start site, all of the phages in the pham are in subcluster AZ4, and the overlap is minimal. My conclusion is that I agree with the primary annotator (Lauren) that this is a real gene with start site 2 at base pair start 40280. I also agree with SS and “yes”.
CDS 40489 - 40872 /gene="62" /product="gp62" /function="hypothetical protein" /locus tag="JasmineDragon_62" /note=Original Glimmer call @bp 40489 has strength 12.43; Genemark calls start at 40489 /note=SSC: 40489-40872 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_59 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 92.126% 7.68647E-68 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.106, -4.744199388040119, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_59 [Arthrobacter phage VroomVroom]],,WIC90209,84.252,7.68647E-68 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Woodward, Lauren /note=Auto-annotation: Glimmer and GeneMark; both marked the start site as 40489. /note=Coding Potential: Coding potential in the forward direction in the first ORF and in the reverse direction in the third ORF. The forward direction had higher coding potential that aligned more closely with the gene start/stop sites. Potential appears in both Host and Self GeneMark outputs. 
/note=SD (Final) Score: The final score was the least negative option at -4.744. The Z-score was the highest available at 2.106. /note=Gap/overlap: There is a 4 bp overlap with the previous gene. Short overlaps like this are common between adjacent co-transcribed genes and suggest that this gene is part of an operon. /note=Phamerator: Pham: 48483. Date: 10/30/23; conserved in cluster AZ members Emotion and VroomVroom. /note=Starterator: Start site 2 in Starterator was manually annotated in 2/2 non-draft genes in this pham. Start 2 is at position 40489 in JasmineDragon. Glimmer and GeneMark also predicted this as the start site. /note=Location call: Considering the above evidence, this is a real gene, and the start site is 40489. /note=Function call: NKF. PhagesDB BLASTp and NCBI BLAST each had two significant non-draft phage hits, but the genes, VroomVroom_59 and Emotion_63, did not have functions listed. There were no significant CDD or HHpred hits. /note=Transmembrane domains: This gene has no TMDs. /note= /note=Secondary Annotator Name: Labib, Youstina /note=Secondary Annotator QC: /note=Auto-annotation: Both Glimmer and GeneMark call the start site at 40489 bp. Agreement between the two programs is strong support for this start site. /note=Coding Potential: There is strong coding potential, indicated by the positive signal in the ORF on the forward strand, which supports the gene being real. This is observed in both the Host and Self GeneMark data sets. /note=SD (Final) Score: The SD (final) score is -4.744, the best of the options listed. The aim is to select the start with the least negative SD final score, and of the options listed, the start site at 40489 has the best one. This start also has the best Z-score, 2.106, which meets the threshold of a Z-score above 2. /note=Gap/Overlap: This gene has a 4 bp overlap upstream, which likely indicates an operon.
The overlap is so minor that it raises no concerns in the genome annotation process. /note=Phamerator: 11/3/23 This gene belongs to pham 48483, which has 3 other members. JasmineDragon belongs to cluster AZ, and the other cluster AZ phages in the same pham include MiniMommy and VroomVroom. /note=Starterator: 11/3/23 The most frequently called start is start 2, which appears in 2 published annotations; it was called in 2 of 2 non-draft genes in the pham. This supports a start site at 40489. /note=Location Call: Based on the evidence, the suggested start site is 40489. /note=Function call: The original function call appears correct. I agree that this gene has no known function, as none of the databases (PhagesDB function frequency, PhagesDB BLAST, NCBI BLAST, or HHpred) provides evidence for one. /note=Transmembrane domains: DeepTMHMM does not predict any TMDs, so we can confirm that this is not a membrane protein. /note=Note: 11/16/23 The location and function calls are both correct, and there is no evidence of TMDs, validating this as NKF. CDS 40869 - 40967 /gene="63" /product="gp63" /function="membrane protein" /locus tag="JasmineDragon_63" /note=Genemark calls start at 40869 /note=SSC: 40869-40967 CP: yes SCS: genemark ST: SS BLAST-Start: GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.985, -2.9525085049809894, yes F: membrane protein SIF-BLAST: SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Janga, Geethika /note=Auto-annotation: Only GeneMark calls a start site, at position 40869 bp with an ATG start codon. /note=Coding Potential: This ORF has good coding potential on the forward strand in both the Self and Host GeneMark; the chosen start site contains all the coding potential. /note=SD (Final) Score: The final score at our start site is -2.953, with a corresponding Z-score of 2.985, and the start codon is ATG.
This was not the LORF, but it had the best final score and one of the best Z-scores. /note=Gap/overlap: The overlap with the upstream gene is only 4 bp, which is small, especially compared with the alternative start sites, which would create a much larger overlap (238 bp) or gaps of 50 or 68 bp. This overlap is also conserved in other phages of this cluster. /note=Phamerator: As of 11/26/23, this gene belongs to pham 100474, of which it is the only member. Since no other phages contain genes in this pham, we cannot compare our gene to other phages in the cluster for conservation. /note=Starterator: Due to this being such a small pham, there are only two non-draft phages, and neither of them calls the same start site as our gene, at 40869 bp. /note=Location call: Although the pham report for this start site had not yet been generated, the start site called by GeneMark at 40869 bp seems to be the best, as all the coding potential is contained within this site and the accompanying start codon, ATG, is the most common. /note=Function call: This gene has no known function, with no hits in either PhagesDB BLAST or NCBI BLASTp. All HHpred hits had E-values too high to be considered. /note=Transmembrane domains: There were no TMD predictions in DeepTMHMM, so this is not a membrane protein. /note=Secondary Annotator Name: Hosford, Ryan /note=Secondary Annotator QC: I feel that the start site of 40,869 is correct because it has a 4 bp overlap (gap of -4), which suggests it is within an operon. Also, Host-trained GeneMark shows full coding potential within the region, and comparing the Phamerator maps with Emotion, there seems to be a similar gene in the same location. Lastly, in support of the start site, the start codon is ATG, one of the more common start codons. On the other hand, only GeneMark (not Glimmer) called the site, and the Starterator report returned a 404 error.
For the function of the gene, I lean toward no known function (NKF): HHpred has some hits with decent probability and coverage, but their E-values are far too high. This is also an orpham with no other pham members to compare to. I did see a similar gene in Emotion, which is in the same cluster (AZ) as JasmineDragon. CDD had no hits, but DeepTMHMM predicts a TMD, likely a helix within the membrane. CDS 41082 - 41342 /gene="64" /product="gp64" /function="hypothetical protein" /locus tag="JasmineDragon_64" /note=Original Glimmer call @bp 41082 has strength 19.87; Genemark calls start at 41082 /note=SSC: 41082-41342 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_61 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 3.1005E-23 GAP: 114 bp gap LO: no RBS: Kibler 6, Karlin Medium, 3.063, -2.442961286954254, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_61 [Arthrobacter phage VroomVroom]],,WIC90211,70.9302,3.1005E-23 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Janga, Geethika /note=Auto-annotation: Both Glimmer and GeneMark call the same start site, at position 41082 bp, with an ATG start codon. /note=Coding Potential: This ORF has good coding potential on the forward strand in both the Self and Host GeneMark; the chosen start site contains all the coding potential. /note=SD (Final) Score: The final score at our start site is -2.443, with a corresponding Z-score of 3.063, and the start codon is ATG. Although this was not the LORF, it has the best final score and Z-score of all possible start sites. /note=Gap/overlap: The gap with the upstream gene is 114 bp, which is acceptable because there is no coding potential in this region and the gap is conserved in other phages of the cluster.
/note=Phamerator: As of 11/01/23, this gene belongs to pham 48339, which has 4 members; the pham is conserved in other non-draft phages of subcluster AZ4, such as Emotion and VroomVroom. /note=Starterator: Due to this being such a small pham, there are only two non-draft phages, and neither of them calls the same start site as our gene, at 41082 bp. /note=Location call: Because Glimmer and GeneMark call the same start site, at 41082 bp, and this is also the most frequently manually annotated start site, this is a real gene that starts at 41082 bp. /note=Function call: This gene has no known function. The best PhagesDB BLAST result had conserved length, 58% identity, and an E-value of 5e-21, and it is also a protein of unknown function. The best NCBI BLASTp result had conserved length, 58.1% identity, and an E-value of 3e-23, and was a hypothetical protein. CDD provided no hits, and all the HHpred hits had high E-values and low probabilities. /note=Transmembrane domains: There were no TMD predictions in DeepTMHMM, so this is not a membrane protein. /note=Secondary Annotator Name: Darren Hon /note=Secondary Annotator QC: Both GeneMark and Glimmer support the claim that there is good coding potential. The gene has a length of 261 bp, with a 114 bp gap and a spacer of 10. The large gap has no coding potential according to both GeneMark and Glimmer, so it is not a concern. The Z-score and final score were good, at 3.063 and -2.443 respectively. The start codon is ATG. This was not the LORF, but the LORF does not encompass the entire coding potential. There were strong BLAST hits, with an E-value of 5e-21 for VroomVroom. Synteny is present with both non-draft phages VroomVroom and Emotion. As of 12/8/23 the gene is in pham 48339, and Starterator indicates that the most frequently called start is 7, at start site 41082.
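The GAP values and gene lengths quoted throughout these notes follow directly from the CDS coordinates. The sketch below is a minimal illustration, assuming 1-based inclusive coordinates, forward-strand genes listed in order, and the convention gap = start - previous stop - 1, where a negative value is an overlap; under these assumptions it reproduces the PECAAN GAP fields for gp63 through gp70. The helper names are hypothetical, and only CDS features are compared (PECAAN's 439 bp gap for gp69 is measured from the gp67 stop, not from the tRNA).

```python
# Forward-strand CDS coordinates (start, stop) copied from the features in this section.
# Assumption: 1-based, inclusive coordinates, genes listed in genome order, tRNA omitted.
CDS = {
    "gp62": (40489, 40872),
    "gp63": (40869, 40967),
    "gp64": (41082, 41342),
    "gp65": (41521, 42096),
    "gp66": (42093, 42320),
    "gp67": (42320, 42616),
    "gp69": (43056, 43256),
    "gp70": (43243, 43605),
}


def gene_length(start: int, stop: int) -> int:
    """Length in bp, counting both endpoints (inclusive coordinates)."""
    return stop - start + 1


def gap(prev_stop: int, start: int) -> int:
    """Bases between the upstream stop and this start; a negative value is an overlap."""
    return start - prev_stop - 1


def gap_table(cds: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Gap to the preceding CDS for every gene after the first one listed."""
    names = list(cds)
    return {
        name: gap(cds[prev][1], cds[name][0])
        for prev, name in zip(names, names[1:])
    }


if __name__ == "__main__":
    for name, value in gap_table(CDS).items():
        start, stop = CDS[name]
        print(f"{name}: length {gene_length(start, stop)} bp, gap {value} bp")
```

Running it prints one line per gene, for example `gp64: length 261 bp, gap 114 bp`; the -4 values for gp63 and gp66, the -1 for gp67, and the -14 for gp70 are the overlaps discussed in those records.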
CDS 41521 - 42096 /gene="65" /product="gp65" /function="hypothetical protein" /locus tag="JasmineDragon_65" /note=Original Glimmer call @bp 41521 has strength 15.35; Genemark calls start at 41521 /note=SSC: 41521-42096 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_62 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 98.4293% 6.81905E-36 GAP: 178 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.303, -1.953940808934884, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_62 [Arthrobacter phage VroomVroom]],,WIC90212,57.2139,6.81905E-36 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Maddineni, Saineha /note=Auto-annotation: Glimmer and GeneMark agree on the auto-annotated start site at 41521 bp. The start codon is GTG. /note=Coding Potential: Reasonable coding potential in the first frame is present in self-trained and host-trained GeneMark. The start site covers all coding potential. /note=SD (Final) Score: The SD score of -1.954 is the best (highest) of the candidates. The corresponding Z-score of 3.303 is higher than 2, indicating a good sequence match for the start site. /note=Gap/overlap: The 178 bp gap is reasonable because it avoids any overlap between coding regions and there is no coding potential in the gap that would need to be covered. The gene length (576 bp) is acceptable, as the start site covers the coding potential and lies within the ORF. /note=Phamerator: 11/2/2023: pham 48567. The gene is conserved in other members of cluster AZ, such as Emotion and VroomVroom. No function is called for the gene. /note=Starterator: Start site choice conserved among pham members is site #7. Start site #7 is at 41521 bp coordinate in JasmineDragon. There are 4 total pham members, and 2/4 non-draft members call site #4. All draft members also call site #4. /note=Location call: Evidence suggests it is a real gene with good coding potential, conserved in Phamerator and Starterator.
The likely start site is #4 at bp coordinate 41521 with a GTG codon. /note=Function call: The predicted function is unknown (NKF). Two hits from PhagesDB BLAST had low E-values (1e-35 and 3e-29): VroomVroom and Emotion. The same 2 hits were found by NCBI BLASTp, with high query coverage (98% and 95%), low E-values (7e-36 and 5e-25), and decent identity (43.22% and 40.59%). CDD and HHpred had no informative results. DeepTMHMM indicated that it is a globular protein. /note=Transmembrane domains: No TMDs were predicted in DeepTMHMM, which indicates that it is not a membrane protein. Even though the function is unknown, we can predict that it is more likely a globular protein. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 42093 - 42320 /gene="66" /product="gp66" /function="hypothetical protein" /locus tag="JasmineDragon_66" /note=Original Glimmer call @bp 42099 has strength 11.45; Genemark calls start at 42099 /note=SSC: 42093-42320 CP: yes SCS: both-cs ST: SS BLAST-Start: [hypothetical protein SEA_EMOTION_68 [Arthrobacter phage Emotion]],,NCBI, q1:s1 100.0% 5.12342E-20 GAP: -4 bp gap LO: no RBS: Kibler 6, Karlin Medium, 1.878, -4.857784300446364, no F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_EMOTION_68 [Arthrobacter phage Emotion]],,WGH21417,74.6835,5.12342E-20 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Maddineni, Saineha /note=Auto-annotation: Glimmer and GeneMark both call auto-annotated start site #15 at 42099 bp, with an ATG start codon. However, the best start site is at 42093 bp, with GTG as the start codon. /note=Coding Potential: Reasonable coding potential in the third frame is present in self-trained and host-trained GeneMark. The start site at 42093 bp covers all coding potential. /note=SD (Final) Score: For the chosen start site at 42093 bp, the SD score of -4.858 is the highest and best, with a Z-score of 1.878.
The 4 bp overlap suggests that the gene is organized into an operon, which makes this a good start site with a credible ribosome binding site. /note=Gap/overlap: For the chosen start site at 42093 bp, the 4 bp overlap is reasonable since it indicates an operon. The alternative start site at 41694 bp creates a longer ORF with a better Z-score but has an unreasonably large overlap and not the best RBS score. The 4 bp overlap and higher RBS score of the chosen start site make it the best candidate. The gene length of 228 bp is acceptable. /note=Phamerator: 11/14/2023: pham 3945. The gene is conserved in other members of cluster AZ, such as Adolin and DrManhattan. No function is called for the gene. /note=Starterator: Start site #13 is the most conserved among members, called in 11/19 non-draft genes, but it is not found in the JasmineDragon gene. Start site #15 is the auto-annotated site in the JasmineDragon gene at 42099 bp, and 2/24 members call site #15. The chosen start site #11 is at the 42093 bp coordinate in the JasmineDragon gene; it has no manual annotations yet but is found in 2/24 members. Starterator may therefore be less informative for this start site call, since the best start site (42093 bp, site #11) has no manual annotations. /note=Location call: Evidence suggests it is a real gene with good coding potential, conserved in Phamerator. The likely start site is #11 at bp coordinate 42093 with a GTG codon, since its 4 bp overlap is indicative of an operon and it gives a long ORF with the highest SD score. /note=Function call: Based on the hits, the function is NKF. PhagesDB BLAST returned 2 significant hits of unknown function with low E-values (4e-18 and 2e-17): Emotion and VroomVroom. NCBI BLASTp returned 2 hits of unknown function with high query coverage (100% and 98%), low E-values (5e-20 and 7e-20), and good identity (62.03% and 61.36%): Emotion and VroomVroom. There were no significant results from CDD or HHpred. The lack of TMDs in DeepTMHMM indicates a globular protein.
/note=Transmembrane domains: No transmembrane regions were predicted by DeepTMHMM, so it is not a membrane protein; instead, it is predicted to be a globular protein. /note=Secondary Annotator Name: LastName, FirstName /note=Secondary Annotator QC: CDS 42320 - 42616 /gene="67" /product="gp67" /function="hypothetical protein" /locus tag="JasmineDragon_67" /note=Original Glimmer call @bp 42320 has strength 12.35; Genemark calls start at 42320 /note=SSC: 42320-42616 CP: yes SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_64 [Arthrobacter phage VroomVroom]],,NCBI, q12:s14 88.7755% 1.71474E-32 GAP: -1 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.057, -3.284137222879636, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_64 [Arthrobacter phage VroomVroom]],,WIC90214,71.5789,1.71474E-32 SIF-HHPRED: SIF-Syn: /note=Primary Annotator Name: Estampa, Julia /note=Auto-annotation: Glimmer and GeneMark both call the gene and place the start site at 42320 bp. The start codon is ATG. Host-Trained and Self-Trained GeneMark both show relatively high coding potential consistent with the ORF. The chosen start site covers all the coding potential. /note=Coding Potential: The gene has reasonable coding potential within the putative ORF. /note=SD (Final) Score: The SD score is -3.284, the best in the list, and this start site gives the LORF. /note=Gap/overlap: There is a 1 bp overlap (-1 gap) with the upstream gene, which is small and reasonable. /note=Phamerator: Pham: 122787. Date found: 11/11/23. There are 6 clusters represented in this pham: FP, AZ3, AZ1, AZ2, UNK, AZ4. Function unknown was called for most phages in this pham, with the exception of five cluster AZ phages and one cluster FP phage that called HNH endonuclease.
/note=Starterator: The start number called most often in the published annotations is 19, called in 9 of the 39 non-draft genes in the pham. However, JasmineDragon’s start is number 32, which is found in 2 of 59 (3.4%) genes in the pham. There are no manual annotations for this start number so far, but start number 32 was called 100% of the time when present. This evidence agrees with the site predicted by Glimmer and GeneMark. /note=Location call: Gathered evidence suggests this is a real gene with good coding potential and that the strongest candidate for the start site is 42320 bp. This start site is consistent with the ORF. /note=Function call: NKF. The top PhagesDB BLAST hit was “function unknown” with an E-value of 1e-28, followed by hits that called the function “HNH endonuclease” with E-values of 2e-14 and 9e-14. NCBI BLAST hits showed similar results, suggesting NKF and HNH endonuclease, with the top hit being NKF against cluster AZ phage VroomVroom. No CDD hits were returned. In HHpred, the top hit suggested a restriction endonuclease function with a probability of 96.25% but a poor E-value of 0.029. Ultimately, the majority of the evidence suggests NKF. /note=Transmembrane domains: Since no TMHs or TMDs were predicted by PECAAN or DeepTMHMM, it is not a membrane protein.
/note=Secondary Annotator Name: /note=Secondary Annotator QC: tRNA 42824 - 42900 /gene="68" /product="tRNA-Trp(cca)" /locus tag="JASMINEDRAGON_68" /note=tRNA-Trp(cca) CDS 43056 - 43256 /gene="69" /product="gp69" /function="hypothetical protein" /locus tag="JasmineDragon_69" /note=Original Glimmer call @bp 43056 has strength 8.64; Genemark calls start at 43056 /note=SSC: 43056-43256 CP: no SCS: both ST: SS BLAST-Start: [hypothetical protein SEA_VROOMVROOM_66 [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 7.26839E-19 GAP: 439 bp gap LO: yes RBS: Kibler 6, Karlin Medium, 3.063, -2.523003374675015, yes F: hypothetical protein SIF-BLAST: ,,[hypothetical protein SEA_VROOMVROOM_66 [Arthrobacter phage VroomVroom]],,WIC90215,75.3623,7.26839E-19 SIF-HHPRED: SIF-Syn: /note=PECAAN Notes /note=Primary Annotator Name: Bhattarai, Aryan /note=Auto-annotation: The Glimmer and GeneMark starts are both 43056. /note=Coding Potential: There is high coding potential in the forward direction, as shown by the coding-potential line in both Host-Trained and Self-Trained GeneMark. /note=SD (Final) Score: -2.523. It is the best final score in PECAAN. /note=Gap/overlap: 439 bp. The gene is likely not part of an operon. A slight peak is observed from 42350; however, there is no graphical line of coding potential in either Host-Trained or Self-Trained GeneMark. No extra gene needs to be added in this gap. /note=Phamerator: As of 11/15/2023, the pham number is 48236. The gene is conserved in phages VroomVroom and Emotion, which are both in the same subcluster (AZ4) as JasmineDragon. No function is stated for the gene, nor is one indicated by synteny among non-draft genes of the same subcluster. /note=Starterator: Start site 1 in Starterator was manually annotated in 4/4 non-draft genes in this pham. Start 1 in JasmineDragon is 43056. This evidence agrees with the site predicted by both Glimmer and GeneMark.
/note=Location call: Based on the evidence above, this is a real gene, and the most likely start site is 43056. /note=Function call: /note=Transmembrane domains: /note=Secondary Annotator Name: Kayla Sanchez /note=Secondary Annotator QC: I agree with the primary annotator after completing my QC; however, it is worth noting that the gap upstream of this gene is large and could contain a gene. The function call and transmembrane domains have not yet been filled in by the primary annotator. CDS 43243 - 43605 /gene="70" /product="gp70" /function="HNH endonuclease" /locus tag="JasmineDragon_70" /note=Original Glimmer call @bp 43243 has strength 0.12; Genemark calls start at 43513 /note=SSC: 43243-43605 CP: yes SCS: both-gl ST: SS BLAST-Start: [HNH endonuclease [Arthrobacter phage VroomVroom]],,NCBI, q1:s1 100.0% 1.12315E-71 GAP: -14 bp gap LO: no RBS: Kibler 6, Karlin Medium, 2.905, -2.827683592113848, yes F: HNH endonuclease SIF-BLAST: ,,[HNH endonuclease [Arthrobacter phage VroomVroom]],,WIC90216,91.7355,1.12315E-71 SIF-HHPRED: HNH endonuclease; Thermophilic bacteriophage, HNH Endonuclease, DNA nicking, HYDROLASE; 1.52A {Geobacillus virus E2},,,5H0M_A,62.5,95.7 SIF-Syn: HNH endonuclease, upstream gene is NKF, pham #48236, just like in phages VroomVroom and Emotion. /note=Primary Annotator Name: Woodward, Lauren /note=Auto-annotation: Glimmer marked the start site as 43243. GeneMark marked the start site as 43513. /note=Coding Potential: The gene has some coding potential in the forward direction in the first frame based on host-trained GeneMark. There is still potential in self-trained GeneMark, but it is lower overall. /note=SD (Final) Score: The final score was the least negative option at -2.828. The Z-score was the highest available at 2.905. /note=Gap/overlap: There is a 14 bp overlap between this gene and the preceding gene. This overlap is conserved in the syntenic gene from VroomVroom (stop@43164 F).
/note=Phamerator: Pham: 121172. Date: 11/01/23; this pham has 965 members, and this gene is conserved in 46 members of cluster AZ. /note=Starterator: Start site 143 is the most conserved, but it was not a candidate for this gene. Start site 137 was auto-annotated by Glimmer and has been manually called in 6/927 phages, including VroomVroom and Emotion. Start site 170 @43318 has 114 manual annotations, but this site doesn’t include all of the coding potential and would leave a 61 bp gap. /note=Location call: The gene has coding potential and synteny with genes from published phage genomes VroomVroom and Emotion. Considering the above evidence, this is a real gene, and the start site is 43243. /note=Function call: HNH endonuclease. The top 42 non-draft PhagesDB BLAST hits all have the function HNH endonuclease (E-value <10^-32). The top 2 NCBI BLAST hits also had the function HNH endonuclease (100% coverage, 70%+ identity, E-value <10^-50). CDD had one specific hit for HNHc (E-value 1.60e-03), which lends further support to this gene encoding an HNH endonuclease. The top HHpred hit was for an HNH endonuclease (98.11% probability, E-value of 0.0000083). The E-values did not match between PECAAN and HHpred. HHpred results: https://toolkit.tuebingen.mpg.de/jobs/JasmineDragon_70 /note=Transmembrane domains: This gene has no TMDs. /note=Secondary Annotator Name: Dom Garza /note=Secondary Annotator QC: Everything looks good.